  1. Last 7 days
    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The data in Figure 1 is not novel, similar data has been reported elsewhere.

      We are grateful for the critical evaluation of our findings. Although a few studies have reported the prevalence of FGFR2 amplification in GC patients, our study provides a new dataset of 161 GC patients in China profiled by next-generation sequencing (NGS), further emphasizing the high frequency of FGFR2 amplification in gastric cancer. Moreover, the proportion of FGFR2-amplified GC patients at our center (6.2%) is slightly higher than that in the TCGA cohort (5%).

      We have moved the original Figure 1C and 1D to the supplementary figures and created a new pie chart for the Nanjing Drum Tower Hospital cohort for comparison with the TCGA cohort.

      It is unclear why the two panels in Fig 2a and 2b can not be integrated into one panel, which will make it easier to compare the activities.

      Thank you for pointing this out. In the first panel of Figure 2a and 2b, we used the CCK8 assay to measure the cytotoxicity of SHP099 against tumor cells across a concentration gradient. In the second panel, we fixed SHP099 at 10 μM (IC50) and tested its combined efficacy with a concentration gradient of AZD4547. Because the horizontal axes of the two panels use different units, the two panels within Figure 2a (or within Figure 2b) cannot be merged into one.

      To make comparison easier, we instead merged the first panels of Figure 2a and 2b into one panel, and merged the second panels in the same way.

      The synergetic effects of azd4547 and shp099 are not significant in Fig 2e and 2f, as well as in Fig. 3g and fig. 4f

      In Fig 2e and 2f, we analyzed the synergistic effect not only of 3 nM AZD4547 (a relatively low dose) with 10 μM SHP099, but also of 10 nM AZD4547 (a relatively high dose) with 10 μM SHP099. The effect of each combination should be compared against the matching single-agent doses. In our view, the combination treatment led to stronger inhibition of phospho-FGFR, phospho-SHP2 and FGFR2-initiated downstream signaling molecules, especially in KATOIII.

      For ease of comparison, we circled 10 μM SHP099, 10 nM AZD4547 and 10 nM AZD4547 + 10 μM SHP099 in red.

      Author response image 1.

      Author response image 2.

      We also circled 10 μM SHP099, 3 nM AZD4547 and 3 nM AZD4547 + 10 μM SHP099 in blue.

      Author response image 3.

      Author response image 4.

      For ease of comparison, we also performed grayscale value analysis and normalization using ImageJ.

      Author response image 5.

      Author response image 6.

      Author response image 7.

      Author response image 8.
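      The response does not describe how the grayscale values were normalized. As a minimal sketch of one common convention, and not necessarily the procedure used here, each target band can be divided by its loading-control band and then rescaled so that the control lane equals 1. The function name, the choice of loading control, and the numbers below are hypothetical.

```python
def normalize_bands(target, loading, control_lane=0):
    """Return target/loading ratios rescaled so the control lane equals 1.0.

    target and loading hold one grayscale (densitometry) value per lane.
    Assumption: normalization to a loading control, then to the control lane.
    """
    if len(target) != len(loading):
        raise ValueError("target and loading must have one value per lane")
    ratios = [t / l for t, l in zip(target, loading)]
    reference = ratios[control_lane]
    return [r / reference for r in ratios]


# Hypothetical grayscale values for four lanes; not data from the study.
p_erk = [1200.0, 900.0, 600.0, 300.0]
loading_control = [1000.0, 1000.0, 1200.0, 1000.0]
relative = normalize_bands(p_erk, loading_control)
```

      With this convention, a value below 1 in a treated lane indicates a weaker band than in the control lane after correcting for loading.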

      In Fig. 3g, the combination therapy exhibited relatively stronger inhibitory effects on phospho-ERK, phospho-AKT and phospho-mTOR.

      For ease of comparison, we performed grayscale value analysis and normalization using ImageJ.

      The unclear effect of combination therapy may be due to the presence of impurities other than tumor cells in patient’s ascites.

      Author response image 9.

      In Fig. 4f, it was obvious that phospho-AKT and phospho-mTOR were further suppressed in combination group.

      For ease of comparison, we performed grayscale value analysis and normalization using ImageJ.

      Author response image 10.

      Therefore, in our opinion, our data sufficiently support the synergistic effects of AZD4547 and SHP099.

      Data in Fig. 5 is weak and can be removed. It is unclear why FGFR inhibitor has some activities toward t cells since t cells do not express FGFR.

      The activating effect of SHP099 on T cells has been validated in many articles. A previous study published in Cancer Immunology Research reported that the combination of the FGFR inhibitor erdafitinib and a PD-1 antibody can activate T cells and downregulate exhaustion-related factors on the T cell surface (including PD-1) in vivo. Therefore, the anti-tumor immune effect of FGFR2 inhibitors cannot be ignored. Although T cells do not express FGFR, FGFR2 inhibitors may still affect PD-1 expression on the surface of T cells in other ways, which requires further research. We have deleted Fig. 5D from our article. We believe that the combination of an FGFR2 inhibitor and an SHP2 inhibitor not only kills tumor cells directly but also promotes anti-tumor immunity by activating T cells. Therefore, we believe that the in vitro data in Figure 5 are also meaningful.

      Reviewer #2 (Public review):

      Strengths:

      The data is generally well presented and the study invokes a novel patient data set which could have wider value. The study provides additional evidence to support the combined therapeutic approach of RTK and phosphatase inhibition.

      We sincerely thank the reviewer for the critical evaluation and appreciation of our findings.

      Weaknesses:

      Combined therapy approaches targeting RTKs and SHP2 have been widely reported. Indeed, SHP099 in combination with FGFR inhibitors has been shown to overcome adaptive resistance in FGFR-driven cancers. Furthermore, the inhibition of SHP2 has been documented to have important implications in both targeting proliferative signalling as well as immune response. Thus, it is difficult to see novelty or a significant scientific advance in this manuscript. Although the data is generally well presented, there is inconsistency in the interpretation of the experimental outcomes from ex vivo, patient and mouse systems investigated. In addition, the study provides only minor or circumstantial understanding of the dual mechanism.

      We acknowledge that our investigation of the mechanism of dual inhibition is not deep enough. The mechanisms underlying the combination of SHP2 inhibitors and RTK inhibitors remain to be explored in more depth, and this will be the main direction of our future work.

      Using data from a 161 patient cohort FGFR2 was identified as displaying amplification of FGFR2 in ~6% with concomitant elevation of mRNA of patients which correlated with PTPN11 (SHP2) mRNA expression. The broader context of this data is of value and could add a different patient demographic to other data on gastric cancer. However, there is no detail on patient stratification or prior therapeutic intervention.

      Thank you for pointing this out. We have added a table of patient stratification data such as age and gender. Unfortunately, data on patients’ prior therapeutic interventions were not collected.

      In SNU16 and KATOIII cells the combined therapy is shown to be effective and appears to be correlated with increased apoptotic effects (i.e. not immune response).

      Fig 2E suggests that the combined therapy in SNU16 cells is a little better than FGFR2-directed AZD4547 inhibitor alone, particularly at the higher dose.

      The individual patient case study described via Fig 3 suggests efficacy of the combined therapy (at very high dosage), however, the cell biopsies only show reduced phosphorylation of ERK, but not AKT. This is at odds with the ex vivo cell-based assays. Thus, it is not clear how relevant this study is.

      The mouse xenograft study shows a convincing reduction in tumor mass/volume and clear reduction in pAKT, whilst pERK remains largely unaffected by the combined therapeutic approach. This is in conflict with the previous data which seems to show the opposite effect. In all, the impact of the dual therapy is unclear with respect to the two pathways mediated by ERK and AKT.

      Thank you for the comment. Previous studies have confirmed that RAS/ERK and PI3K/AKT are two important signaling pathways downstream of FGFR2. In Fig 2E and F, we observed that in FGFR2-amplified cell lines dual blockade significantly inhibited both p-ERK and p-AKT, with a greater effect on p-ERK than on p-AKT. Similarly, in Fig 3G, dual blockade mainly suppressed p-ERK and slightly inhibited p-AKT and p-mTOR in cancer cells derived from the individual patient. Thus, in both types of in vitro model, dual inhibition suppressed the RAS/ERK and PI3K/AKT pathways simultaneously, with the RAS/ERK pathway affected most strongly, so these results are not contradictory.

      Author response image 11.

      Author response image 12.

      Author response image 13.

      In the in vivo animal model, although dual inhibition had inhibitory effects on both pathways, it mainly suppressed p-AKT.

      In both the in vivo and in vitro models, combination therapy inhibits the RAS/ERK and PI3K/AKT pathways to some extent, but the pathway that is inhibited more strongly differs between the two settings. Given the substantial differences between in vivo and in vitro models, we believe that this difference in emphasis is understandable.

      Author response image 14.

      Finally, the authors demonstrate the impact of SHP2 on PD-1 expression and propose that the SHP099/AZD4547 combination therapy significantly induces the production of IFN-γ in CD8+ T cells. This part of the study is unconvincing and would benefit from the investigation of the tumor micro-environment to assess T cell infiltration.

      To investigate the tumor micro-environment to assess T cell infiltration, we have to establish our research model in immunocompetent mice. However, there is currently only one type of gastric cancer cell line derived from mice, MFC, which is not a cell line with FGFR2 amplification. We attempted to transfect FGFR2 amplification plasmids into MFC, but the transfection effect was poor, making it difficult to conduct in vivo animal experiments.

      Reviewer #3 (Public review):

      Strengths:

      The authors demonstrate that FGFR2 amplification positively correlates with PTPN11 in human gastric cancer samples, providing rationale for combination therapies. Furthermore, convincing data are provided demonstrating that targeting both FGFR and SHP2 is more effective than targeting either pathway alone using in vitro and in vivo models. The use of cells derived from a gastric cancer patient that progressed following treatment with an FGFR inhibitor is also a strength. The findings from this study support the conclusion that SHP2 inhibitors enhance the efficacy of FGFR-targeted therapies in cancer patients. This study also suggests that targeting SHP2 may also be an effective strategy for targeting cancers that are resistant to FGFR-targeted therapies.

      Weaknesses:

      The main caveat with these studies is the lack of an immune competent model with which to test the finding that this combination therapy enhances T cell cytotoxicity in vivo. Discussing this limitation within the context of these findings and future directions for this work, particularly since the combination therapy appears to work quite well without the presence of T cells in the environment, would be beneficial.

      Thank you for the great suggestion. To investigate the tumor micro-environment to assess T cell infiltration, we have to establish our research model in immunocompetent mice. However, there is currently only one type of gastric cancer cell line derived from mice, MFC, which is not a cell line with FGFR2 amplification. We attempted to transfect FGFR2 amplification plasmids into MFC, but the transfection effect was poor, making it difficult to conduct in vivo animal experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points. The manuscript is poorly written and loaded with language errors.

      We sincerely thank you for your constructive suggestion and apologize for these mistakes. We have polished the article and corrected the language errors.

      Reviewer #2 (Recommendations for the authors):

      In addition to the comments made in the Public Review the manuscript lacks detail on statistical analysis of experimental results.

      Thank you for your advice. In response, we have added details of the statistical analysis of the experimental results to the “Methods” section.

      Reviewer #3 (Recommendations for the authors):

      There are numerous grammatical errors throughout, and incorrect wording is used in some places (such as "syngeneic mouse tumor model" rather than "xenograft tumor model", line 253). Careful proofreading and editing of this manuscript is recommended.

      Thank you for your suggestion. We have made corrections to the relevant content of the article.

      AZD4547 is an FGFR-selective inhibitor and is not specific for FGFR2 as it also targets FGFR1 and FGFR3, this should be clarified in the text.

      Thank you for raising this point. We have clarified in the “Introduction” section that AZD4547 is an FGFR-selective inhibitor targeting FGFR1-3.

      The specific FGFR inhibitor(s) used to treat the patient with FGFR2 amplification, are the authors able to provide this information?

      Thank you for raising this important issue. Indeed, because small-molecule drug development is difficult, the fastest clinical progress to date has been made with pan-FGFR inhibitors. Recently, Relay Therapeutics has also developed a highly FGFR2-selective inhibitor, RLY-4008, which is in phase I/II clinical trials but lacks preclinical research in gastric cancer.

      Figure 2F: the p38 and p-p38 bands are cut off at the bottom

      We sincerely thank you for your thoughtful feedback. We have improved our experimental methods and re-examined p38 and p-p38 in Figure 2F by western blotting.

      Author response image 15.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates the thermal and mechanical unfolding pathways of the doubly knotted protein TrmD-Tm1570 using molecular simulations, optical tweezers experiments, and other methods. In particular, the detailed analysis of the four major unfolding pathways using a well-established simulation method is an interesting and valuable result.

      Strengths:

      A key finding that lends credibility to the simulation results is that the molecular simulations at least qualitatively reproduce the characteristic force-extension distance profiles obtained from optical tweezers experiments during mechanical unfolding. Furthermore, a major strength is that the authors have consistently studied the folding and unfolding processes of knotted proteins, and this paper represents a careful advancement building upon that foundation.

      We thank the reviewer for reading our manuscript.

      Weaknesses:

      While optical tweezers experiments offer valuable insights, the knowledge gained from them is limited, as the experiments are restricted to this single technique.

      The paper mentions that the high aggregation propensity of the TrmD-Tm1570 protein appears to hinder other types of experiments. This is likely the reason why a key aspect, such as whether a ribosome or molecular chaperones are essential for the folding of TrmD-Tm1570, has not been experimentally clarified, even though it should be possible in principle.

      We appreciate the suggestion and agree that clarifying whether molecular chaperones or the ribosome are required for TrmD-Tm1570 folding is crucial. We are pleased to report that the role of molecular chaperones in the folding of TrmD-Tm1570 is currently under investigation in our laboratory. The results will clarify this aspect and will be incorporated into a future manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors combined coarse-grained structure-based model simulation, optical tweezer experiments, and AI-based analysis to assess the knotting behavior of the TrmD-Tm1570 protein. Interestingly, they found that while the structure-based model can fold the single knot from TrmD and Tm1570, the double-knot protein TrmD-Tm1570 cannot form a knot itself, suggesting the need for chaperone proteins to facilitate this knotting process. This study has strong potential to understand the molecular mechanism of knotted proteins, supported by much experimental and simulation evidence. However, there are a few places that appear to lack sufficient details, and more clarification in the presentation is needed.

      Strengths:

      A combination of both experimental and computational studies.

      We thank the reviewer for reading our manuscript.

      Weaknesses:

      There is a lack of detail to support some statements.

      (1) The use of the AI-based method, SOM, can be emphasized further, especially in its analysis of the simulated unfolding trajectories and discovery of the four unfolding/folding pathways. This will strengthen the statistical robustness of the discovery.

      We thank the reviewer for this observation. However, the AI-based method, SOM, was applied only to obtain the main representative trajectories for the mechanical unfolding MD simulations. Specifically, for TrmD, Tm1570, and the fusion protein (TrmD-Tm1570), we extracted the representative conformational states by selecting the most highly populated SOM clusters shown in SI Figure 5 - figure supplement 3. Then, after identifying the cluster centroid, we selected the simulation nearest to it. These correspond to cluster number 1 for Tm1570, number 11 for TrmD, and number 7 for TrmD-Tm1570. A sentence was added to the main manuscript to clarify how the main representative conformation was obtained.
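      The selection step described above (most populated cluster, then the member nearest its centroid) can be sketched as follows. This is only an illustration under stated assumptions: the SOM training itself is not shown, the cluster labels are taken as given, the centroid is assumed to be the plain mean of the cluster members rather than a SOM neuron weight vector, and the function name and toy data are hypothetical.

```python
from collections import Counter
import math


def representative_index(features, labels):
    """Return (cluster_id, index) of the member closest to the centroid
    of the most populated cluster.

    features: one feature vector per trajectory (list of equal-length lists).
    labels:   one cluster label per trajectory, e.g. from a trained SOM.
    """
    top_cluster = Counter(labels).most_common(1)[0][0]
    members = [i for i, lab in enumerate(labels) if lab == top_cluster]
    dim = len(features[0])
    # Assumption: centroid = arithmetic mean of the member feature vectors.
    centroid = [
        sum(features[i][d] for i in members) / len(members) for d in range(dim)
    ]
    best = min(members, key=lambda i: math.dist(features[i], centroid))
    return top_cluster, best


# Toy data: five trajectories, two clusters (labels 7 and 1).
toy_features = [[0.0, 0.0], [1.0, 0.0], [0.4, 0.1], [5.0, 5.0], [6.0, 5.0]]
toy_labels = [7, 7, 7, 1, 1]
cluster_id, rep = representative_index(toy_features, toy_labels)
```

      In this toy example, cluster 7 is the most populated and the third trajectory (index 2) lies closest to its centroid, so it would be reported as the representative.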

      On the other hand, no AI‑based methods were applied to the thermal unfolding simulations. The four thermal unfolding trajectories shown in Figure 3 were obtained as follows: (i) trajectories where TrmD unfolds first and its knot unties before Tm1570 unfolds, corresponding to pathway 1 (Figure 3A and E); (ii) trajectories where Tm1570 unfolds and unties first, followed by TrmD, corresponding to pathway 3 (Figure 3C and G); and (iii) trajectories where TrmD unfolds first, then Tm1570, after which the TrmD knot unties and finally the Tm1570 knot unties—this corresponds to pathway 2. Pathway 4 follows the same sequence but in the reverse order.

      (2) The manuscript would benefit from a clearer description of the correlation between the simulation and experimental results. The current correlation, presented in the paragraph starting from Line 250, focuses on measured distances. The authors could consider providing additional evidence on the order of events observed experimentally and computationally. More statistical analyses on the experimental curves presented in Figure 4 supplement would be helpful.

      We thank the reviewer for this suggestion. In response, we prepared additional statistical analyses in table format, reporting the average length‑change increments together with their standard deviations, and we clarified in the revised text that the ± values correspond to standard deviations. In addition, we quantified the percentage of TrmD, Tm1570, and TrmD-Tm1570 molecules that unfold completely, providing a clearer comparison of the order of events observed experimentally and computationally. These analyses have been incorporated into the revised manuscript as Tables 1 and 2.

      (3) How did the authors calibrate the timescale between simulation and experiment? Specifically, what is the value \tau used in Line 270, and how was it calculated? Relevant information would strengthen the connection between simulation and experiment.

      In our model, the time unit is defined by the relation 𝜏 = √(mσ<sup>2</sup>/𝜀), where m is the reduced mass unit, taken as the average mass of an amino acid, m = 110 Da = 110 × 1.66 × 10<sup>-27</sup> kg; 𝜀 is the reduced energy unit, the average interaction energy between amino acids, which we may assume to be around 2–3 kcal/mol = 2–3 × 6.95 × 10<sup>-21</sup> J; and σ is the distance unit, equal to 1 nm.

      After plugging these values into the equation defining 𝜏, we get 𝜏 = 3.2 ps.

      This definition of the time unit follows from dimensional analysis: it is how units of mass, distance and energy can be combined into an expression that has a unit of time.

      The pulling speeds used in the simulations (0.05–0.15 Å/𝜏) correspond to approximately 1.6–4.7 m/s in real units. These speeds are necessarily much higher than the experimental pulling speed (20 nm/s), which is a well‑known limitation of steered molecular dynamics. Moreover, our coarse‑grained model is run in an implicit solvent regime and does not explicitly include hydrodynamic friction. As a consequence, the simulated dynamics do not reproduce absolute real-time kinetics. Instead, the comparison between simulation and experiment is made through relative unfolding pathways, force-extension behavior, and contour length changes, which remain robust across the range of simulated pulling speeds.

      Thus, 𝜏 = 3.2 ps is derived directly from the coarse‑grained model parameters rather than calibrated to experiment, and the connection between simulation and experiment is established through mechanistic agreement rather than by matching absolute timescales.
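      The numbers quoted above can be checked with a short calculation. The sketch below assumes the standard dimensional form of the time unit, 𝜏 = σ·√(m/𝜀), with σ = 1 nm and m = 110 Da, and takes 𝜀 = 2.5 kcal/mol as the midpoint of the stated 2–3 kcal/mol range; that midpoint and all function names are illustrative assumptions, not values taken from the manuscript.

```python
import math

DALTON_KG = 1.66e-27        # kg per Da, as quoted in the response
KCAL_PER_MOL_J = 6.95e-21   # J per molecule for 1 kcal/mol, as quoted


def time_unit_seconds(mass_da=110.0, epsilon_kcal_mol=2.5, sigma_m=1.0e-9):
    """tau = sigma * sqrt(m / epsilon), all quantities in SI units.

    Assumption: epsilon = 2.5 kcal/mol (midpoint of the quoted 2-3 range).
    """
    mass_kg = mass_da * DALTON_KG
    epsilon_j = epsilon_kcal_mol * KCAL_PER_MOL_J
    return sigma_m * math.sqrt(mass_kg / epsilon_j)


def speed_m_per_s(speed_angstrom_per_tau, tau_s):
    """Convert a pulling speed from angstrom per tau to metres per second."""
    return speed_angstrom_per_tau * 1.0e-10 / tau_s


tau = time_unit_seconds()            # close to 3.2e-12 s
tau_quoted = 3.2e-12                 # value stated in the response
slow = speed_m_per_s(0.05, tau_quoted)
fast = speed_m_per_s(0.15, tau_quoted)
ratio_to_experiment = slow / 20e-9   # experimental speed: 20 nm/s
```

      Under these assumptions 𝜏 comes out close to 3.2 ps, and the simulated pulling speeds exceed the experimental one by roughly eight orders of magnitude, which is why the comparison rests on pathways and length changes rather than on absolute time.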

      We have now added a clarifying sentence to the manuscript (Methods and Materials - Mechanical unfolding simulations) explaining how the timescale was defined and how the value of 𝜏 was obtained.

      Reference: 

      Szymczak, P., and Marek Cieplak. "Stretching of proteins in a uniform flow." The Journal of chemical physics 125.16 (2006).

      (4) In Line 342, the authors comment that whether using native contacts or not, they cannot fold double-knotted TrmD-Tm1570. Could the authors provide more details on how non-native interactions were analyzed?

      To analyze the role of non‑native interactions, we calculated two non‑native contact maps, the first using a distance cutoff criterion and the second by identifying highly frustrated contacts based on the frustration index computed with Frustratometer (http://frustratometer.qb.fcen.uba.ar/); see the figure below. The resulting non‑native interactions were then incorporated into the SBM C-alpha model to potentially assist refolding or knot formation. However, in neither case did we observe successful refolding or formation of the double‑knotted native topology. These results indicate that the addition of these non‑native contacts is insufficient to drive the refolding of the TrmD–Tm1570 protein. This may suggest that the protein needs the support of chaperones or an active role of the ribosome to tie the two knots. We have now clarified this point more explicitly in the revised manuscript.

      Author response image 1.

      Native and non‑native contact maps for TrmD–Tm1570. The upper triangle (blue dots) corresponds to the cutoff‑based contact map and shows only unique contacts not present in the native contact map. The lower triangle (red dots) represents highly frustrated contacts, again showing only unique contacts absent from the native map. Black dots indicate the native contacts derived from the structure, and the contact map was generated using the Shadow Contact Map software. The blue and orange shadows correspond to the knot position for TrmD and Tm1570 proteins, respectively. 
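      The cutoff-based half of this procedure, keeping only contacts that are absent from the native map, can be sketched as below. The cutoff value, the minimum sequence separation, and the toy coordinates are assumptions chosen for illustration; the response does not state the parameters used, and the frustration-based map is not reproduced here.

```python
import math


def cutoff_contacts(coords, cutoff, min_separation=3):
    """Return the set of residue pairs (i, j), i < j, whose C-alpha atoms
    lie within `cutoff` (same length unit as coords) and are at least
    `min_separation` residues apart in sequence."""
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def non_native_only(candidate, native):
    """Keep only candidate contacts that are not in the native contact map."""
    return candidate - native


# Toy U-shaped chain of six C-alpha positions (angstroms); hypothetical.
toy_coords = [
    (0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0),
]
toy_native = {(0, 5), (1, 4)}
candidates = cutoff_contacts(toy_coords, cutoff=6.0)
extra = non_native_only(candidates, toy_native)
```

      In the toy chain, the cutoff criterion finds four contacts, two of which are already native, leaving (0, 4) and (1, 5) as the unique non-native contacts that would be added to the model.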

      (5) It appears that the manuscript lacks simulation or experimental evidence to support the statement at Line 343: While each domain can self-tie into its native knot, this process inhibits the knotting of the other domain. Specifically, more clarification on this inhibition is needed.

      Explaining this phenomenon remains challenging, and several contributing factors are likely.

      (1) The folding success rates of the individual TrmD and Tm1570 domains are low (<3%); folding of the double-knotted protein is therefore expected to be even less efficient. 

      (2) While formation of a single knot is observed when the two domains are examined, the folded domain adopts a native-like but not fully native conformation, regardless of whether it is TrmD or Tm1570. (2A) Fluctuations of the unfolded second domain may impose a destabilizing load, promoting unfolding of the folded domain. (2B) Conversely, folding of one domain restricts the conformational space available to the other. Such restriction may have either stabilizing or destabilizing effects: although reduced conformational space (crowding) is generally thought to increase the probability of knot formation in polymers, in this system the constraint is localized rather than global.

      (3) It is possible that extending the simulations to much longer timescales would allow formation of the second knot; however, within the timescales accessible here, unfolding of the first knot is observed instead.

      (4) The TrmD–Tm1570 protein forms a dimer with a well-defined interface, whereas our simulations were performed on a monomeric unit. Consequently, both domains are solvent-exposed, forming an open two-domain system with tRNA-binding elements that are not stabilized by intermolecular interactions.

      Taken together, these factors preclude a quantitative assessment of the dominant contribution. Our results suggest that efficient folding may require assistance from molecular chaperones or an active role of the ribosome in coordinating formation of the two knots.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The paper notes at the beginning of its results section that simulations aiming to fully fold the TrmD-Tm1570 protein from a denatured state were unsuccessful. While the failure to achieve complete folding is itself an instructive and important result, there is room for improvement in how it's presented. The authors provide no specific details on what actually occurred during these simulations. It is plausible that some intermediate state was reached, and one can imagine that the knotting of the C-terminal part, Tm1570, was partially completed. A more detailed description of these outcomes would have been beneficial.

      In the main manuscript (Figure 3), we reported the folding trajectories and the probability of native contact formation for the TrmD–Tm1570 protein, focusing on the four main unfolding pathways observed in our simulations. In addition to these common pathways, we also examined a small number of trajectories in which one or both domains may refold. These are presented in Figure 3 - figure supplements 1 and 2, where we highlight a set of trajectories that we classify as rare events. In these rare trajectories, partial refolding and the formation of intermediate states can indeed be observed. However, as described in the main text, successful refolding of the fusion protein occurs only when the knot remains close to its native position and does not undergo large fluctuations along the chain. When the knot drifts significantly, refolding is not completed.

      Figure 3 - figure supplement 1 shows six representative examples of intermediate states sampled during these simulations. As the reviewer suggested, some intermediate conformations were reached, including partial reformation of structural elements. However, only the trajectory that keeps the knot sufficiently close to its native location undergoes substantial refolding. We have now clarified this point more explicitly in the revised manuscript to better explain why full folding was not achieved and how the knot dynamics constrain the refolding process.

      (2) Is it not possible to plot the degree of knot formation as a function of time or Q in Figure 3A-H? Doing so would make the verbally described results much clearer.

      We thank the reviewer for the suggestion. In response, we have added a new figure to the SI (Figure 3 - figure supplement 3) showing knot translocation as a function of frame number, together with representative structures for the transitions from the folded to the unfolded state and for the knot-untying events.

      (3) Placement of a paragraph starting from line 250 looks odd to me. The paragraph describes simulation results of the mechanical unfolding, which is fully described in the following section. Specifically, the simulation result is discussed before describing its method/outline, which is to be avoided as far as possible.

      According to the standard journal style, the Methods section follows the Discussion section. However, we included a sentence describing the method in the simulation results to guide the reader through the text.

      (4) This is only an optional request. It is highly desired to examine the in vitro folding of TrmD-Tm1570 with and without molecular chaperones. At least, authors can envision/discuss this direction.

      We agree that examining the in vitro folding of TrmD–Tm1570 with and without molecular chaperones would provide important mechanistic insights into the folding of knotted proteins. We are planning to perform these experiments as part of our ongoing work, and in the revised manuscript we will add a discussion of this direction and its potential impact.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 6C was not referenced or discussed in the manuscript.

      We thank the reviewer for pointing this out. Figure 6C is indeed referenced and discussed in the manuscript.

      (2) Several places refer to figures in the Supporting Information, and should be updated to refer to the supplement figures associated with the main figures. 

      In the revised version we ensure that all references are updated and clearly labeled.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Since dimerization is essential for SARS-CoV-2 Mpro enzymatic activity, the authors investigated how different classes of inhibitors, including peptidomimetic inhibitors (PF-07321332, PF-00835231, GC376, boceprevir), non-peptidomimetic inhibitors (carmofur, ebselen, and its analog MR6-31-2), and allosteric inhibitors (AT7519 and pelitinib), influence the Mpro monomer-dimer equilibrium using native mass spectrometry. Further analyses with isotope labeling, HDX-MS, and MD simulations examined subunit exchange and conformational dynamics. Distinct inhibitory mechanisms were identified: peptidomimetic inhibitors stabilized dimerization and suppressed subunit exchange and structural flexibility, whereas ebselen covalently bound to a newly identified site at C300, disrupting dimerization and increasing conformational dynamics. This study provides detailed mechanistic evidence of how Mpro inhibitors modulate dimerization and structural dynamics. The newly identified covalently binding site C300 represents novelty as a druggable allosteric hotspot.

      Strengths:

      This manuscript investigates how different classes of inhibitors modulate SARS-CoV-2 main protease dimerization and structural dynamics, and identifies a newly observed covalent binding site for ebselen.

      Weaknesses:

      The major concern is the absence of mutagenesis data to support the proposed inhibitory mechanisms, particularly regarding the role of the inhibitor binding site.

      We thank the reviewer for the comments and recognition of our study. We agree that mutagenesis experiments are very helpful to validate the proposed mechanisms. We will perform site-directed mutagenesis of the key residue C300 and assess the effects of those C300 mutants on dimerization and enzymatic activity of Mpro, and integrate the results and discussion into the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This is a mechanistic study that provides new insights into the inhibition of SARS-CoV-2 Mpro.

      Strengths:

      The identification of dimer interface stabilization/destabilization as distinct inhibitory mechanisms and the discovery of C300 as a potential allosteric site for ebselen are important contributions to the field. The experimental approach is modern, multi-faceted, and generally well-executed.

      We thank the reviewer for the positive comments and recognition of our study.

      Weaknesses:

      The primary weaknesses relate to linking the biophysical observations more directly to functional enzymatic outcomes and providing more quantitative rigor in some analyses. While the study is overall strong, addressing its weaknesses and limitations would elevate the impact and translational relevance of the current manuscript.

      We thank the reviewer for the comments that are very helpful for improving the quality and impact of our manuscript.

      (1) Correlation with Functional Activity:

      The most significant gap is the lack of direct enzymatic activity assays under the exact conditions used for MS and HDX. While EC50 values are listed from literature, demonstrating how the observed dimer stabilization (by peptidomimetics) or dimer disruption (by ebselen) directly correlates with inhibition of proteolytic activity in the same experimental setup would solidify the functional relevance of the biophysical observations. For instance, does the fraction of monomer measured by native MS quantitatively predict the loss of activity? Also, the single inhibitor concentration used in each MS experiment needs to be specified in the main text and legends. A discussion on whether the inhibitor concentrations required to observe these dimerization effects (in native MS) or structural dynamics (in HDX-MS) align with EC50 values would be helpful for contextualizing the findings.

      We thank the reviewer for these points and agree that directly linking our biophysical observations to functional outcomes under identical conditions would be more meaningful. We will perform enzymatic activity assays to investigate whether the fraction of monomer measured by native MS can predict the loss of activity. The inhibitor concentrations used in each MS experiment will be explicitly stated in the main text and figure legends, and we will also discuss how these concentrations relate to the EC50/IC50 values, providing context for the biophysical observations.

      (2) For the two Cys residues found to be targeted by ebselen, what are their respective modification stoichiometry related to the ebselen concentration? Especially for the covalent binding site C300, which is proposed in this study to represent a novel allosteric inhibition mechanism of ebselen, more direct experimental evidence is needed to support this major hypothesis. Does mutation or modification of C300 affect the Mpro dimerization/monomer equilibrium and alter the enzymatic activity? If ebselen acts as a covalent inhibitor linked to multiple Cys, why is its activity only in the uM range?

      We thank the reviewer for the insightful comments. To address the stoichiometry of ebselen modification, we will further analyze the data and discuss the results accordingly. To provide more direct evidence for C300 as a novel allosteric inhibition site of ebselen, we will perform site-directed mutagenesis and investigate whether these C300 mutants affect Mpro dimerization and enzymatic activity. Regarding the modification of C300, several independent studies cited in this manuscript have shown that oxidation (by glutathione, Davis et al., 2021) or chemical modification of C300 (by glutathione bismuth drugs, Tao et al., 2021, and Tixocortol, Davis et al., 2024) leads to Mpro inactivation and promotes monomer formation. We will cite and further discuss these studies in the Discussion. The µM-range activity of ebselen can be explained by its multi-target covalent binding to multiple cysteines. The variable efficacy of cysteine modification may account for ebselen's moderate potency, as not all modifications equally inhibit their targets.

      (3) For the allosteric inhibitor pelitinib with low-uM activity, no significant differences in deuterium uptake of Mpro were observed. In terms of the binding affinity, what is the difference between pelitinib and ebselen? Some explanations could be provided about the different HDX-MS results between the two non-peptidomimetic inhibitors with similar activities.

      Pelitinib has non-covalent binding with Mpro, while the binding between ebselen and Mpro is covalent. We will add some explanations and discussion about their different HDX-MS results in the revised version.

      (4) Native MS Quantification:

      The analysis of monomer-dimer ratios from native MS spectra appears qualitative or semi-quantitative. A more rigorous and quantified analysis of the percentage of dimer/monomer species under each condition, with statistical replicates, would strengthen the equilibrium shift claims. For native MS analysis of each inhibitor, the representative spectrum can be shown in the main figure together with quantified dimer/monomer fractions from replicates to show significance by statistical tests.

      We thank the reviewer for the suggestion, and we will perform a more rigorous and quantitative analysis of the monomer-dimer equilibrium. For each condition (unbound Mpro and Mpro bound to each inhibitor), native MS experiments will be shown in triplicate. As suggested, we will include a representative native MS spectrum for each condition. The quantified monomer/dimer ratios from replicates will be added. The results with statistical analysis will be provided to show significance.
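
      The planned replicate-based quantification could be sketched as follows. This is an illustration only, not the authors' analysis pipeline: the function names and the intensity values are hypothetical placeholders, and the monomer fraction is computed as a simple share of summed peak intensity, assuming no correction for differences in ionization or detection efficiency between monomer and dimer.

```python
from statistics import mean, stdev


def monomer_fraction(monomer_intensity, dimer_intensity):
    """Share of the summed signal assigned to the monomer (0 to 1).

    Assumption: raw summed peak intensities are used without any
    response-factor correction.
    """
    total = monomer_intensity + dimer_intensity
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return monomer_intensity / total


def summarize(replicates):
    """Mean and sample standard deviation of the monomer fraction.

    `replicates` is a list of (monomer_intensity, dimer_intensity) pairs,
    one pair per replicate spectrum.
    """
    fractions = [monomer_fraction(m, d) for m, d in replicates]
    return mean(fractions), stdev(fractions)


# Hypothetical triplicate for one condition (placeholder numbers, not data).
example_condition = [(20.0, 80.0), (25.0, 75.0), (15.0, 85.0)]
example_mean, example_sd = summarize(example_condition)
```

      Reporting the per-replicate fractions alongside the mean and standard deviation lets a reader judge whether a shift between unbound and inhibitor-bound Mpro exceeds replicate-to-replicate variation.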

      (5) Changes of HDX rates in certain regions seem very subtle. For example, as it states 'residues 296-304 in the C-terminal region of Mpro were more flexible upon ebselen binding (Figure 4c)', the difference is barely observable. The percentage of HDX rate changes between two conditions (with p values) can be specified in the text for each fragment discussed, and any change below 5% or 10% is negligible.

      We agree with the reviewer about the need for quantitative rigor in reporting HDX changes. We will calculate the fractional deuterium uptake difference for each peptide fragment discussed in the text between the inhibitor-bound and unbound states. These values, along with their statistical significance (p-values from a two-tailed t-test), will be provided in the revised figures.
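
      A minimal sketch of such a per-peptide comparison is given below, under stated assumptions. The response specifies only a two-tailed t-test; the unequal-variance (Welch) form used here is an assumption, as are the function names and the placeholder uptake values. The sketch returns the t statistic and degrees of freedom; the two-tailed p-value would then be read from the t distribution in a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def uptake_difference(bound, unbound):
    """Difference in mean fractional deuterium uptake (bound minus unbound).

    Each argument is a list of replicate fractional uptake values (0 to 1)
    for one peptide at one labeling time.
    """
    return mean(bound) - mean(unbound)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Assumption: unequal-variance form; requires at least two replicates
    per group and non-zero variance in at least one group.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    se2 = va + vb
    t = (mean(a) - mean(b)) / sqrt(se2)
    df = se2 ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical triplicates for one peptide (placeholder numbers, not data).
bound = [0.30, 0.32, 0.34]
unbound = [0.20, 0.22, 0.24]
delta = uptake_difference(bound, unbound)
t_stat, dof = welch_t(bound, unbound)
```

      Stating the uptake difference together with the test result, rather than the p-value alone, makes it possible to apply a minimum-change cutoff such as the 5% or 10% level raised by the reviewer.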

    1. Author response:

      The following is the authors’ response to the previous reviews

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have adequately addressed all of my concerns. I have no further questions or concerns.

      We thank Reviewer #1.

      Reviewer #2 (Recommendations for the authors):

      We thank Reviewer #2 for the thoughtful recommendations.

      (1) Figure 1A, 1B, 2B, 2C, etc.: The Y-axis label is confusing. I assume the intention was to make big numbers small by dividing by 1000. The comma makes the label confusing. Perhaps, make the label more "mathematical" as in "Avp density ((transcript/µm<sup>2</sup>) × 10<sup>-3</sup>)" or rearrange the math to be clearer as in "Avp density (transcript/1000 per µm<sup>2</sup>)".

      Great suggestion and done exactly as suggested in Figures 1, 2 and 4.

      (2) Figure 1B and 1C: The figure and legend do not match up. Either switch the figures or the legends. Currently, legend 1B describes image 1C.

      Agreed and done as suggested.

      (3) Figure 2A is broken up into separate pages/panels. It could be integrated better or separated to make A and B, then shift B and C to C and D.

      Great suggestion and we have done exactly as suggested.

      (4) Figure 2 legend: I recommend putting the scale bar info with (A) rather than at the end. The stars used in the figure are not explained in the legend.

      Good points. We have made all necessary changes as suggested.

      (5) Supplementary Figure 1B: The legend states that the data are the number of transcript-containing cells, but the figure states transcript number.

      We thank the Reviewer for pointing out this typo. We corrected all graph legends in the Supplementary Figure 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The authors use a confusing timeline for their behavioral experiments, i.e., day 1 is the first day of training in the MWM, and day 6 is the probe trial, but in reality, day 6 is the first day after the last training day. So this is really day 1 post-training, and day 20 is 14 days post-training.

      We have revised the timeline accordingly. Briefly, mice were trained in the Morris water maze (MWM) with a hidden platform for five consecutive days (training days 1–5). Probe tests were then conducted on day 6 and day 20, which correspond to post-training day 1 and post-training day 15, respectively. We have stated this clearly in the revised manuscript (see Results, lines 108–113) and Figure S1 (see figure legend, lines 1747–1749).

      (2) The authors inaccurately use memory as a term. During the training period in the MWM, the animals are learning, while memory is only probed on day 6 (after learning). Thus, day 6 reflects memory consolidation processes after learning has taken place.

      We have revised the manuscript to distinguish between "learning" and "memory". We refer to the performance during the 5-day training period as "spatial learning" and restrict the term "memory" to the probe tests on day 6, which reflect memory consolidation after learning has taken place.

      (3) The NAT10 cKO mice are useful... but all the experiments used AAV-CRE injections in the dorsal hippocampus that showed somewhat modest decreases... For these experiments, it would be better to cross the NAT10 floxed animals to CRE lines where a better knockdown of NAT10 can be achieved, with less variability.

      We want to clarify the reason for using AAV-Cre injection rather than Cre lines. Indeed, we attempted to generate Nat10 conditional knockouts by crossing Nat10<sup>flox/flox</sup> mice with several CNS-specific Cre lines. Crossing with Nestin-Cre and Emx1-Cre resulted in embryonic and premature lethality, respectively, consistent with the essential housekeeping function of NAT10 during neurodevelopment. We will use the Camk2α-Cre line, which starts to express Cre after postnatal week 3 specifically in hippocampal pyramidal neurons (Tsien et al., 1996).

      (4) Because knockdown is only modest (~50%), it is not clear if the remaining ac4c on mRNAs is due to remaining NAT10 protein or due to an alternative writer (as the authors pose).

      Our results suggest the existence of alternative writers. As shown in Figure 6D, we identified a population of "NAT10-independent" MISA mRNAs (present in MISA but not downregulated in NASA). Remarkably, these mRNAs possess a consensus motif (RGGGCACTAACY) that is fundamentally different from the canonical NAT10 motif (AGCAGCTG). This distinct motif usage suggests that the residual ac4C signals are not merely due to incomplete knockdown of NAT10, but reflect the activity of other, as-yet-unidentified ac4C writers. We will perform ac4C immunostaining in Nat10-reporter mice, which express red fluorescent proteins in Nat10-positive cells. A finding that ac4C is present in both Nat10-positive and Nat10-negative cells would support the presence of as-yet-unidentified ac4C writers.

      Reviewer #2 (Public review):

      (1) It is known that synaptosomes are contaminated with glial tissue... So the candidate mRNAs identified by acRIP-seq might also be mixed with glial mRNAs. Are the GO BP terms shown in Figure 3A specifically chosen, or unbiasedly listed for all top ones?

      This reviewer is correct that some ac4C-mRNAs identified by acRIP-seq from the synaptosomes are highly expressed in astrocytes, such as Aldh1l1, ApoE, Sox9 and Aqp4 (see list of ac4C-mRNAs in the synaptosomes, Table S3). In agreement, we found that NAT10 was also expressed in astrocytes in addition to neurons. We have provided a representative image showing NAT10-Cre expression in astrocytes in the revised manuscript (Figure 4F and H). In Figure 3A of the original submission, we showed 10 of the top 16 BP items for MISA mRNAs. In Figure 3A of the revised manuscript, we show all of the top 16 BP items for MISA mRNAs, which were chosen in an unbiased manner (also see Table S4).

      (2) Where does NAT10-mediated mRNA acetylation take place within cells generally? Is there evidence that NAT10 can catalyze mRNA acetylation in the cytoplasm?

      Previous studies in non-neuronal cells showed that NAT10 can catalyze mRNA acetylation in the cytoplasm and enhance translational efficiency (Arango et al., 2018; Arango et al., 2022). In this study, we showed that mRNA acetylation occurred both in the homogenates and synapses (see ac4C-mRNA lists in Table S2 and S3). However, spatial memory upregulated mRNA acetylation mainly in the synapses rather than in the homogenates (Fig. 2 and Fig. S2).

      (3) "The NAT10 proteins were significantly reduced in the cytoplasm (S2 fraction) but increased in the PSD fraction..." The small increase in synaptic NAT10 might not be enough to cause a decrease in soma NAT10 protein level.

      We showed that the NAT10 protein levels were increased by one-fold in the PSD fraction, but were reduced by about 50% in the cytoplasm after memory formation (Fig. 5J and K). The protein levels of NAT10 in the homogenates and nucleus were not altered after memory formation (Fig. 5F and I). Based on these observations, we hypothesized that NAT10 proteins may relocate from the cytoplasm to synapses after memory formation, which was also supported by the immunofluorescence results from cultured neurons (Fig. S4). However, we agree with this reviewer that drawing such a conclusion may require time-lapse imaging of NAT10 protein trafficking in living animals, which is technically challenging at present.

      (4) It is difficult to separate the effect on mRNA acetylation and protein acetylation when doing the loss of function of NAT10.

      This is a good point. We agree with this reviewer that NAT10 may acetylate both mRNA and proteins. We examined the acetylation levels of α-tubulin and histone H3, two substrate proteins of NAT10, in the hippocampus of Nat10 cKO mice. As shown in Fig. S5C, E, and F, the acetylation levels of α-tubulin and histone H3 remained unchanged in the Nat10 cKO mice, likely due to compensation by other protein acetyltransferases. In contrast, mRNA ac4C levels were significantly decreased in the Nat10 cKO mice (Figure S5G–H). These results suggest that the memory deficits seen in Nat10 cKO mice may be largely due to impaired mRNA acetylation. Nonetheless, we believe that developing a new technology that enables selective erasure of mRNA acetylation would help to address the function of mRNA acetylation. We discussed these points in the manuscript (see Discussion, lines 582–589).

      References

      Arango, D., Sturgill, D., Alhusaini, N., Dillman, A. A., Sweet, T. J., Hanson, G., Hosogane, M., Sinclair, W. R., Nanan, K. K., & Mandler, M. D. (2018). Acetylation of cytidine in mRNA promotes translation efficiency. Cell, 175(7), 1872-1886. e1824.

      Arango, D., Sturgill, D., Yang, R., Kanai, T., Bauer, P., Roy, J., Wang, Z., Hosogane, M., Schiffers, S., & Oberdoerffer, S. (2022). Direct epitranscriptomic regulation of mammalian translation initiation through N4-acetylcytidine. Molecular cell, 82(15), 2797-2814. e2711.

      Tsien, J. Z., Chen, D. F., Gerber, D., Tom, C., Mercer, E. H., Anderson, D. J., Mayford, M., Kandel, E. R., & Tonegawa, S. (1996). Subregion-and cell type–restricted gene knockout in mouse brain. Cell, 87(7), 1317-1326.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Review of the manuscript titled "Mycobacterial Metallophosphatase MmpE acts as a nucleomodulin to regulate host gene expression and promotes intracellular survival".

      The study provides an insightful characterization of the mycobacterial secreted effector protein MmpE, which translocates to the host nucleus and exhibits phosphatase activity. The study characterizes the nuclear localization signal sequences and residues critical for the phosphatase activity, both of which are required for intracellular survival.

      Strengths:

      (1) The study addresses the role of nucleomodulins, an understudied aspect in mycobacterial infections.

      (2) The authors employ a combination of biochemical and computational analyses along with in vitro and in vivo validations to characterize the role of MmpE.

      Weaknesses:

      (1) While the study establishes that the phosphatase activity of MmpE operates independently of its NLS, there is a clear gap in understanding how this phosphatase activity supports mycobacterial infection. The investigation lacks experimental data on specific substrates of MmpE or pathways influenced by this virulence factor.

      We thank the reviewer for this insightful comment and agree that identification of the substrates of MmpE is important to fully understand its role in mycobacterial infection. MmpE is a putative purple acid phosphatase (PAP) and a member of the metallophosphoesterase (MPE) superfamily. Enzymes in this family are known for their catalytic promiscuity and broad substrate specificity, acting on phosphomonoesters, phosphodiesters, and phosphotriesters (Matange et al., Biochem J, 2015). In bacteria, several characterized MPEs have been shown to hydrolyze substrates such as cyclic nucleotides (e.g., cAMP) (Keppetipola et al., J Biol Chem, 2008; Shenoy et al., J Mol Biol, 2007), nucleotide derivatives (e.g., AMP, UDP-glucose) (Innokentev et al., mBio, 2025), and pyrophosphate-containing compounds (e.g., Ap4A, UDP-DAGn) (Matange et al., Biochem J., 2015). Although the binding motif of MmpE has been identified, determining its physiological substrates remains challenging due to the low abundance and instability of potential metabolites, as well as the limited sensitivity and coverage of current metabolomic technologies in mycobacteria.

      (2) The study does not explore whether the phosphatase activity of MmpE is dependent on the NLS within macrophages, which would provide critical insights into its biological relevance in host cells. Conducting experiments with double knockout/mutant strains and comparing their intracellular survival with single mutants could elucidate these dependencies and further validate the significance of MmpE's dual functions.

      We thank the reviewer for the comment. Deletion of the NLS motifs did not impair MmpE’s phosphatase activity in vitro (Figure 2F), indicating that MmpE's enzymatic function operates independently of its nuclear localization. Indeed, we confirmed that Fe<sup>3+</sup>-binding ability via the residues H348 and N359 is required for enzymatic activity of MmpE. We have expanded on this point in the Discussion section “MmpE is a bifunctional virulence factor in Mtb”.

      (3) The study does not provide direct experimental validation of the MmpE deletion on lysosomal trafficking of the bacteria.

      We thank the reviewer for the comment. To validate the role of MmpE in lysosome maturation during infection, we conducted fluorescence colocalization assays in THP-1 macrophages infected with BCG strains, including WT, ∆MmpE, Comp-MmpE, Comp-MmpE<sup>ΔNLS1</sup>, Comp-MmpE<sup>ΔNLS2</sup>, and Comp-MmpE<sup>ΔNLS1-2</sup>. These strains were stained with the lipophilic membrane dye DiD, while macrophages were treated with the acidotropic probe LysoTracker<sup>TM</sup> Green (Martins et al., Autophagy, 2019). The results indicated that the ΔMmpE and MmpE<sup>ΔNLS1-2</sup> mutants exhibited significantly higher co-localization with LysoTracker compared to the WT and Comp-MmpE strains (New Figure 5G), suggesting that MmpE deletion leads to enhanced lysosomal maturation during infection.

      (4) The role of MmpE as a mycobacterial effector would be more relevant using virulent mycobacterial strains such as H37Rv.

      We thank the reviewer for the comment. Previously, the role of Rv2577/MmpE as a virulence factor has been demonstrated in M. tuberculosis CDC 1551, where its deletion significantly reduced bacterial replication in mouse lungs at 30 days post-infection (Forrellad et al., Front Microbiol, 2020). However, that study did not explore the underlying mechanism of MmpE function. In our study, we found that MmpE enhances M. bovis BCG survival in macrophages (both THP-1 and RAW264.7) and in mice (Figure 3, Figure 7A), consistent with its proposed role in virulence. To investigate the molecular mechanism by which MmpE promotes intracellular survival, we used M. bovis BCG as a biosafe surrogate; this model is widely accepted for studying mycobacterial pathogenesis (Wang et al., Nat Immunol, 2015; Wang et al., Nat Commun, 2017; Péan et al., Nat Commun, 2017).

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors have characterized Rv2577 as a Fe<sup>3+</sup>/Zn<sup>2+</sup>-dependent metallophosphatase and a nucleomodulin protein. The authors have also identified His348 and Asn359 as critical residues for Fe<sup>3+</sup> coordination. The authors show that the protein encodes two nuclear localization signals. Using C-terminal Flag expression constructs, the authors have shown that the MmpE protein is secretory. The authors have prepared genetic deletion strains and show that MmpE is essential for intracellular survival of M. bovis BCG in THP-1 macrophages, RAW264.7 macrophages, and a mouse model of infection. The authors have also performed RNA-seq analysis to compare the transcriptional profiles of macrophages infected with wild-type and MmpE mutant strains. The relative levels of ~175 transcripts were altered in MmpE mutant-infected macrophages and the majority of these were associated with various immune and inflammatory signalling pathways. Using these deletion strains, the authors proposed that MmpE inhibits inflammatory gene expression by binding to the promoter region of a vitamin D receptor. The authors also showed that MmpE arrests phagosome maturation by regulating the expression of several lysosome-associated genes such as TFEB, LAMP1, LAMP2, etc. These findings reveal a sophisticated mechanism by which a bacterial effector protein manipulates gene transcription and promotes intracellular survival.

      Strength:

      The authors have used a combination of cell biology, microbiology, and transcriptomics to elucidate the mechanisms by which Rv2577 contributes to intracellular survival.

      Weakness:

      The authors should thoroughly check the mice data and show individual replicate values in bar graphs.

      We thank the reviewer for the advice. We have now updated the relevant mouse data in the revised manuscript.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript titled "Mycobacterial Metallophosphatase MmpE Acts as a Nucleomodulin to Regulate Host Gene Expression and Promote Intracellular Survival", Chen et al describe biochemical characterisation, localisation and potential functions of the gene using a genetic approach in M. bovis BCG and perform macrophage and mice infections to understand the roles of this potentially secreted protein in the host cell nucleus. The findings demonstrate the role of a secreted phosphatase of M. bovis BCG in shaping the transcriptional profile of infected macrophages, potentially through nuclear localisation and direct binding to transcriptional start sites, thereby regulating the inflammatory response to infection.

      Strengths:

      The authors demonstrate using a transient transfection method that MmpE when expressed as a GFP-tagged protein in HEK293T cells, exhibits nuclear localisation. The authors identify two NLS motifs that together are required for nuclear localisation of the protein. A deletion of the gene in M. bovis BCG results in poorer survival compared to the wild-type parent strain, which is also killed by macrophages. Relative to the WT strain-infected macrophages, macrophages infected with the ∆mmpE strain exhibited differential gene expression. Overexpression of the gene in HEK293T led to occupancy of the transcription start site of several genes, including the Vitamin D Receptor. Expression of VDR in THP1 macrophages was lower in the case of ∆mmpE infection compared to WT infection. This data supports the utility of the overexpression system in identifying potential target loci of MmpE using the HEK293T transfection model. The authors also demonstrate that the protein is a phosphatase, and the phosphatase activity of the protein is partially required for bacterial survival but not for the regulation of the VDR gene expression.

      Weaknesses:

      (1) While the motifs can most certainly behave as NLSs, the overexpression of a mycobacterial protein in HEK293T cells can also result in artefacts of nuclear localisation. This is not unprecedented. Therefore, to prove that the protein is indeed secreted from BCG, and is able to elicit transcriptional changes during infection, I recommend that the authors (i) establish that the protein is indeed secreted into the host cell nucleus, and (ii) the NLS mutation prevents its localisation to the nucleus without disrupting its secretion.

      We thank the reviewer for this insightful comment. To confirm the translocation of MmpE into the host nucleus during BCG infection, we first detected the secretion of MmpE by M. bovis BCG, using Ag85B as a positive control and GlpX as a negative control (Zhang et al., Nat Commun, 2022). Our results showed that MmpE-Flag was present in the culture supernatant, indicating that MmpE is indeed secreted by BCG (New Figure S1C).

      Next, we performed immunoblot analysis of the nuclear fractions from infected THP-1 macrophages expressing FLAG-tagged wild-type MmpE and NLS mutants. The results revealed that only wild-type MmpE was detected in the nucleus, while MmpE<sup>ΔNLS1</sup>, MmpE<sup>ΔNLS2</sup> and MmpE<sup>ΔNLS1-2</sup> were not detectable in the nucleus (New Figure S1D). Taken together, these findings demonstrated that MmpE is a secreted protein and that its nuclear translocation during infection requires both NLS motifs.

      Demonstration that the protein is secreted: Supplementary Figure 3 - Immunoblotting should be performed for a cytosolic protein, also to rule out detection of proteins from lysis of dead cells. Also, for detecting proteins in the secreted fraction, it would be better to use Sauton's media without detergent, and grow the cultures without agitation or with gentle agitation. The method used by the authors is not a recommended protocol for obtaining the secreted fraction of mycobacteria.

      We thank the reviewer for the advice. To avoid the effects of bacterial lysis, we cultured the BCG strains expressing MmpE-Flag in Middlebrook 7H9 broth with 0.5% glycerol, 0.02% Tyloxapol, and 50 µg/mL kanamycin at 37 °C with gentle agitation (80 rpm) until an OD<sub>600</sub> of approximately 0.6 (Zhang et al., Nat Commun, 2022). Subsequently, we assessed the secretion of MmpE-Flag in the culture supernatant, using Ag85B as a positive control and GlpX as a negative control (New Figure S1C). The results showed that GlpX was not detected in the supernatant, while MmpE and Ag85B were detected, indicating that MmpE is indeed a secreted protein in BCG.

      Demonstration that the protein localises to the host cell nucleus upon infection: Perform an infection followed by immunofluorescence to demonstrate that the endogenous protein of BCG can translocate to the host cell nucleus. This should be done for an NLS1-2 mutant expressing cell also.

      We thank the reviewer for the suggestion. We agree that this experiment would be helpful to further verify the nuclear import of MmpE. However, an MmpE-specific antibody is not available to us for immunofluorescence experiments. As an alternative, we performed nuclear-cytoplasmic fractionation of THP-1 cells infected with M. bovis BCG strains expressing FLAG-tagged wild-type MmpE or the NLS deletion mutants (MmpE<sup>ΔNLS1</sup>, MmpE<sup>ΔNLS2</sup>, and MmpE<sup>ΔNLS1-2</sup>). WT MmpE was detectable in both the cytoplasmic and nuclear compartments, while MmpE<sup>ΔNLS1</sup>, MmpE<sup>ΔNLS2</sup> and MmpE<sup>ΔNLS1-2</sup> were almost undetectable in the nuclear fractions (New Figure S1D), suggesting that both NLS motifs are necessary for nuclear import.

      (2) In the RNA-seq analysis, the directionality of change of each of the reported pathways is not apparent in the way the data have been presented. For example, are genes in the cytokine-cytokine receptor interaction or TNF signalling pathway expressed more, or less in the ∆mmpE strain?

      We thank the reviewer for the comment. The KEGG pathway enrichment diagrams in our RNA-seq analysis primarily reflect the statistical significance of pathway enrichment based on differentially expressed genes, but do not indicate the directionality of gene expression changes. To address this concern, we conducted qRT-PCR on genes associated with the cytokine-cytokine receptor interaction pathway, specifically IL23A, CSF2, and IL12B. The results showed that, compared to the WT strain, infection with the ΔMmpE strain resulted in significantly increased expression levels of these genes in THP-1 cells (Figure 4F, Figure S4B), consistent with the RNA-seq data. Furthermore, we have submitted the complete RNA-seq dataset to the NCBI GEO repository [GSE312039], which includes normalized expression values and differential expression results for all detected genes.

      (3) Several of these pathways are affected as a result of infection, while others are not induced by BCG infection. For example, BCG infection does not, on its own, produce changes in IL1β levels. As the authors did not compare the uninfected macrophages as a control, it is difficult to interpret whether ∆mmpE induced higher expression than the WT strain, or simply did not induce a gene while the WT strain suppressed expression of a gene. This is particularly important because the strain is attenuated. Does the attenuation have anything to do with the ability of the protein to induce lysosomal pathway genes? Does induction of this pathway lead to attenuation of the strain? Similarly, for pathways that seem to be downregulated in the ∆mmpE strain compared to the WT strain, these might have been induced upon infection with the WT strain but not sufficiently by the ∆mmpE strain due to its attenuation/lower bacterial burden.

      We thank the reviewer for the comment. Previous studies have shown that wild-type BCG induces relatively low levels of IL-1β, while retaining partial capacity to activate the inflammasome (Qu et al., Sci Adv, 2020). Our data (Figure 3G) show that infection with the ΔMmpE strain results in enhanced IL-1β expression, consistent with findings by Master et al. (Cell Host Microbe, 2008), in which deletion of zmp1 in BCG or M. tuberculosis led to increased IL-1β levels due to reduced inhibition of inflammasome activation.

      In the revised manuscript, we have provided additional qRT-PCR data using uninfected macrophages as a baseline control. These results demonstrate that the WT strain suppresses lysosome-associated gene expression, whereas the ΔMmpE strain upregulates these genes, indicating that MmpE inhibits lysosome-related gene expression (Figure 4G). Furthermore, bacterial burden analysis revealed that ∆mmpE exhibited ~3-fold lower intracellular survival than the WT strain in THP-1 cells. However, when lysosomal maturation was inhibited, the difference in bacterial load between the two strains was reduced to ~1-fold (New Figures S6B and C). These findings indicate that MmpE promotes intracellular survival primarily by inhibiting lysosomal maturation, which is consistent with a previous study (Chandra et al., Sci Rep, 2015).

      (4) ChIP-seq should be performed in THP1 macrophages, and not in HEK293T. Overexpression of a nuclear-localised protein in a non-relevant line is likely to lead to several transcriptional changes that do not inform us of the role of the gene as a transcriptional regulator during infection.

      We thank the reviewer for the comment. We performed ChIP-seq in HEK293T cells based on their high transfection efficiency, robust nuclear protein expression, and well-annotated genome (Lampe et al., Nat Biotechnol, 2024; Marasco et al., Cell, 2022). These characteristics make HEK293T an ideal system for the initial identification of the genome-wide chromatin binding profile of MmpE.

      Further, we performed comprehensive validation of the ChIP-seq findings in THP-1 macrophages. First, CUT&Tag and RNA-seq analyses in THP-1 cells revealed that MmpE modulates genes involved in the PI3K–AKT signaling and lysosomal maturation pathways (Figure 4C; Figure S5A-B). Correspondingly, compared to infection with the WT strain, infection with the ΔMmpE strain led to reduced phosphorylation of AKT (S473), mTOR (S2448), and p70S6K (T389) (New Figure 5E-F), upregulation of lysosomal genes such as TFEB, LAMP1, and LAMP2 (Figure 4G), and more pronounced lysosomal maturation (New Figure 5G). Additionally, CUT&Tag profiling identified MmpE binding at the promoter region of the VDR gene, which was further validated by EMSA and ChIP-qPCR. Also, qRT-PCR demonstrated that MmpE suppresses VDR transcription, supporting its role as a transcriptional regulator (Figure 6). Collectively, these data confirm the biological relevance and functional significance of the ChIP-seq findings obtained in HEK293T cells.

      (5) I would not expect to see such large inflammatory reactions persisting 56 days post-infection with M. bovis BCG. Is this something peculiar for an intratracheal infection with 1×10<sup>7</sup> bacilli? For images of animal tissue, the authors should provide images of the entire lung lobe with the zoomed-in image indicated as an inset.

      We thank the reviewer for the comment. The lung inflammation peaked at days 21–28 and had clearly subsided by day 56 across all groups (New Figure 7B), consistent with the expected resolution of immune responses to an attenuated strain like M. bovis BCG. This temporal pattern is in line with previous studies using intravenous or intratracheal BCG vaccination in mice and macaques, which also demonstrated robust early immune activation followed by resolution over time (Smith et al., Nat Microbiol, 2025; Darrah et al., Nature, 2020).

      In this study, the infectious dose (1×10<sup>7</sup> CFU intratracheal) was selected based on previous studies in which intratracheal delivery of 1×10<sup>7</sup> CFU produced consistent and measurable lung immune responses and pathology without causing overt illness or mortality (Xu et al., Sci Rep, 2017; Niroula et al., Sci Rep, 2025). We have provided whole-lung lobe images with zoomed-in insets in the source dataset.

      (6) For the qRT-PCR based validation, infections should be performed with the MmpE-complemented strain in the same experiments as those for the WT and ∆mmpE strain so that they can be on the same graph, in the main manuscript file. Supplementary Figure 4 has three complementary strains. Again, the absence of the uninfected, WT, and ∆mmpE infected condition makes interpretation of these data very difficult.

      We thank the reviewer for the comment. As suggested, we have conducted the qRT-PCR experiment including the uninfected, WT, ∆mmpE, Comp-MmpE, and the three complementary strains infecting THP-1 cells (Figure 4F and G; New Figure S4B–D).

      (7) The abstract mentions that MmpE represses the PI3K-Akt-mTOR pathway, which arrests phagosome maturation. There is not enough data in this manuscript in support of this claim. Supplementary Figure 5 does provide qRT-PCR validation of genes of this pathway, but the data do not indicate that higher expression of these pathways, whether by VDR repression or otherwise, is driving the growth restriction of the ∆mmpE strain.

      We thank the reviewer for the comment. In the updated manuscript, we have provided more evidence. First, the RNA-seq analysis indicated that MmpE affects the PI3K-AKT signaling pathway (Figure 4C). Second, CUT&Tag analysis suggested that MmpE binds to the promoter regions of key pathway components, including PRKCB, PLCG2, and PIK3CB (Figure S5A). Third, confocal microscopy showed that the ΔMmpE strain promotes significantly increased lysosomal maturation compared to the WT, a process downstream of the PI3K-AKT-mTOR axis (New Figure 5G).

      Further, we measured protein phosphorylation to validate activation of the pathway (Zhang et al., Stem Cell Reports, 2017). Our results showed that cells infected with WT strains exhibited significantly higher phosphorylation of Akt, mTOR, and p70S6K compared to those infected with ΔMmpE strains (New Figures 5E and F). Moreover, the dual PI3K/mTOR inhibitor BEZ235 abolished the survival advantage of WT strains over ΔMmpE mutants in THP-1 macrophages (New Figure S6B and C). Collectively, these results support the conclusion that MmpE activates the PI3K–Akt–mTOR signaling pathway to enhance bacterial survival within the host.

      (8) The relevance of the NLS and the phosphatase activity is not completely clear in the CFU assays and in the gene expression data. Firstly, there needs to be immunoblot data provided for the expression and secretion of the NLS-deficient and phosphatase mutants. Secondly, CFU data in Figure 3A, C, and E must consistently include both the WT and ∆mmpE strain.

      We thank the reviewer for the comment. We have now added immunoblot analysis of the expression and secretion of the MmpE mutants. The results show that the NLS-deficient and phosphatase mutants can be detected in the supernatant (New Figure S1C). Additionally, we have revised Figures 3A, 3C, and 3E to consistently include both the WT and ΔMmpE strains in the CFU assays.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors should attempt to address the following comments:

      (1) Please perform densitometric analysis for the western blot shown in Figure 1E.

      We sincerely thank the reviewer for the suggestion. In the updated manuscript, we have performed densitometric analysis of the western blot shown in New Figure 1F and G.

      (2) Is it possible to measure the protein levels for MmpE in lysates prepared from infected macrophages?

      We thank the reviewer for the comment. In the revised manuscript, we performed immunoblot analysis to measure MmpE levels in lysates from infected macrophages. The results demonstrated that wild-type MmpE was present in both the cytoplasmic and nuclear fractions during infection in THP-1 cells (New Figure S1D).

      (3) The authors should perform circular dichroism studies to compare the secondary structure of wild type and mutant proteins (in particular MmpEHis348 and MmpEAsn359).

      We thank the reviewer for this valuable suggestion. We agree that circular dichroism spectroscopy could provide useful information for comparing the secondary structures. However, due to technical limitations, we instead compared the structures of wild-type MmpE and the His348 and Asn359 mutant proteins predicted by AlphaFold. These structural models showed almost no differences in secondary structure between the wild-type and mutant proteins (Figure S1B).

      (4) The authors should perform more experiments to determine the binding motif for MmpE in the promoter region of VDR.

      We thank the reviewer for this suggestion. In the current study, we have identified the MmpE-binding motif within the promoter region of VDR using CUT&Tag sequencing. This prediction was further validated by ChIP-qPCR and EMSA (Figure 6). These complementary approaches collectively support the identification of a specific MmpE-binding motif and demonstrate its functional relevance. Such an approach has been accepted in previous publications (Wen et al., Commun Biol, 2020; Li et al., Nat Commun, 2022).

      (5) Were the transcript levels of VDR also measured in the lung tissues of infected animals?

      We thank the reviewer for this suggestion. In the revised manuscript, we have performed qRT-PCR to assess VDR transcript levels in the lung tissues of infected mice (New Figure S8B).

      (6) How does MmpE regulate the expression of lysosome-associated genes?

      We thank the reviewer for this question. Our experiments suggested that MmpE suppresses lysosomal maturation, probably by activating the host PI3K–AKT–mTOR signaling pathway (New Figure 5E–I). This pathway is well established as a negative regulator of lysosome biogenesis and function (Yang et al., Signal Transduct Target Ther, 2020; Cui et al., Nature, 2023; Cui et al., Nature, 2025). During infection, THP-1 cells infected with the WT showed increased phosphorylation of Akt, mTOR, and p70S6K compared to those infected with ΔMmpE (New Figure S5C, New Figure 5E and F), and concurrently downregulated key lysosomal maturation markers, including TFEB, LAMP1, LAMP2, and multiple V-ATPase subunits (Figure 4G). Given that PI3K–AKT–mTOR signaling suppresses TFEB activity and lysosomal gene transcription (Palmieri et al., Nat Commun, 2017), we propose that MmpE modulates lysosome-associated gene expression and lysosomal function, probably through the PI3K–AKT–mTOR signaling pathway.

      (7) Mice experiment:

      (a) The methods section states that mice were infected intranasally, but the legend for Figure 6 states intratracheally. Kindly check?

      (b) Supplementary Figure 7 - this is not clear. The legend says bacterial loads in spleens (CFU/g) instead of DNA expression, as shown in the figure.

      (c) The data in Figure 6 and Figure S7 seem to be derived from the same experiment, but the number of animals is different. In Figure 6, it is n = 6, and in Figure S7, it is n=3.

      We thank the reviewer for the comments.

      (a) The infection was performed intranasally, and the figure legend for New Figure 7 has now been corrected.

      (b) We used quantitative PCR to measure bacterial DNA levels in the spleens of infected mice. We have now revised the legend.

      (c) We have conducted new experiments, each of which now includes six mice. The results are shown in Figure 7B and C, as well as in the new Figure S8.

      (8) The authors should show individual values for various replicates in bar graphs (for all figures).

      We thank the reviewer for this helpful suggestion. We have now updated all relevant bar graphs to include individual data points for each biological replicate.

      (9) The authors should validate the relative levels of a few DEGs shown in Figure 3F, Figure 3G, and Figure S4C, in the lung tissues of mice infected with wild-type, mutant, and complemented strains.

      We thank the reviewer for this suggestion. In the revised manuscript, we have performed qRT-PCR to validate the expression levels of selected DEGs, including inflammation-related and lysosome-associated genes, in lung tissues from mice infected with wild-type, mutant, and complemented strains (New Figure S8C-H).

      (10) Did the authors perform an animal experiment using a mutant strain complemented with the phosphatase-deficient MmpE (Comp-MmpE-H348AN359H)?

      We appreciate the reviewer's comment. We agree that an additional animal experiment would be useful to assess the effects of the phosphatase activity. However, our study mainly focused on interpreting the function of the nuclear localization of MmpE during BCG infection. Additionally, we have assessed the role of the phosphatase activity of MmpE during infection in a cell model (Figure 3E).

      Minor comment:

      The mutant strain should be verified by either Southern blot or whole genome sequencing.

      We thank the reviewer for this comment. We verified deletion of the mmpE gene by PCR (Figure S3A-D), an approach that has been accepted in previous publications (Zhang et al., PLoS Pathog, 2020; Zhang et al., Nat Commun, 2022).

      Reviewer #3 (Recommendations for the authors):

      (1) Line 195: cytokine.

      We thank the reviewer for the comments. We have now corrected it.

      (2) Line 225: rewording required.

      Corrected.

      (3) Figure 4A. "No difference" instead of "No different".

      Corrected.

      (4) "KommpE" should be replaced with "∆mmpE strain" (∆=delta symbol).

      Corrected.

      (5) Supplementary Figure 7. The figure legend states CFU assays, but the y-axis and the graph seem to depict IS1081 quantification.

      We thank the reviewer for the comment. The figure is based on IS1081 quantification using qRT-PCR, not CFU assays. We have now revised the legend for New Figure S8A.

      References

      Chandra P, Ghanwat S, Matta SK, Yadav SS, Mehta M, Siddiqui Z, Singh A, Kumar D (2015) Mycobacterium tuberculosis Inhibits RAB7 Recruitment to Selectively Modulate Autophagy Flux in Macrophages Sci Rep 5:16320.

      Darrah PA, Zeppa JJ, Maiello P, Hackney JA, Wadsworth MH 2nd, Hughes TK, Pokkali S, Swanson PA 2nd, Grant NL, Rodgers MA, Kamath M, Causgrove CM, Laddy DJ, Bonavia A, Casimiro D, Lin PL, Klein E, White AG, Scanga CA, Shalek AK, Roederer M, Flynn JL, Seder RA (2020) Prevention of tuberculosis in macaques after intravenous BCG immunization Nature 577:95-102.

      Forrellad MA, Blanco FC, Marrero Diaz de Villegas R, Vázquez CL, Yaneff A, García EA, Gutierrez MG, Durán R, Villarino A, Bigi F (2020) Rv2577 of Mycobacterium tuberculosis Is a virulence factor with dual phosphatase and phosphodiesterase functions Front Microbiol 11:570794.

      Innokentev A, Sanchez AM, Monetti M, Schwer B, Shuman S (2025) Efn1 and Efn2 are extracellular 5'-nucleotidases induced during the fission yeast response to phosphate starvation mBio 16:e0299224.

      Keppetipola N, Shuman S (2008) A phosphate-binding histidine of binuclear metallophosphodiesterase enzymes is a determinant of 2',3'-cyclic nucleotide phosphodiesterase activity J Biol Chem 283:30942-9.

      Lampe GD, King RT, Halpin-Healy TS, Klompe SE, Hogan MI, Vo PLH, Tang S, Chavez A, Sternberg SH (2024) Targeted DNA integration in human cells without double-strand breaks using CRISPR-associated transposases Nat Biotechnol 42:87-98.

      Li Z, Sheerin DJ, von Roepenack-Lahaye E, Stahl M, Hiltbrunner A (2022) The phytochrome interacting proteins ERF55 and ERF58 repress light-induced seed germination in Arabidopsis thaliana Nat Commun 13:1656.

      Marasco LE, Dujardin G, Sousa-Luís R, Liu YH, Stigliano JN, Nomakuchi T, Proudfoot NJ, Krainer AR, Kornblihtt AR (2022) Counteracting chromatin effects of a splicing-correcting antisense oligonucleotide improves its therapeutic efficacy in spinal muscular atrophy Cell 185:2057-2070.e15.

      Martins WK, Santos NF, Rocha CS, Bacellar IOL, Tsubone TM, Viotto AC, Matsukuma AY, Abrantes ABP, Siani P, Dias LG, Baptista MS (2019) Parallel damage in mitochondria and lysosomes is an efficient way to photoinduce cell death Autophagy 15:259-279.

      Master SS, Rampini SK, Davis AS, Keller C, Ehlers S, Springer B, Timmins GS, Sander P, Deretic V (2008) Mycobacterium tuberculosis prevents inflammasome activation Cell Host Microbe 3:224-32.

      Matange N, Podobnik M, Visweswariah SS (2015) Metallophosphoesterases: structural fidelity with functional promiscuity Biochem J 467:201-16.

      Niroula N, Ghodasara P, Marreros N, Fuller B, Sanderson H, Zriba S, Walker S, Shury TK, Chen JM (2025) Orally administered live BCG and heat-inactivated Mycobacterium bovis protect bison against experimental bovine tuberculosis Sci Rep 15:3764.

      Palmieri M, Pal R, Nelvagal HR, Lotfi P, Stinnett GR, Seymour ML, Chaudhury A, Bajaj L, Bondar VV, Bremner L, Saleem U, Tse DY, Sanagasetti D, Wu SM, Neilson JR, Pereira FA, Pautler RG, Rodney GG, Cooper JD, Sardiello M (2017) mTORC1-independent TFEB activation via Akt inhibition promotes cellular clearance in neurodegenerative storage diseases Nat Commun 8:14338.

      Péan CB, Schiebler M, Tan SW, Sharrock JA, Kierdorf K, Brown KP, Maserumule MC, Menezes S, Pilátová M, Bronda K, Guermonprez P, Stramer BM, Andres Floto R, Dionne MS (2017) Regulation of phagocyte triglyceride by a STAT-ATG2 pathway controls mycobacterial infection Nat Commun 8:14642.

      Qu Z, Zhou J, Zhou Y, Xie Y, Jiang Y, Wu J, Luo Z, Liu G, Yin L, Zhang XL (2020) Mycobacterial EST12 activates a RACK1-NLRP3-gasdermin D pyroptosis-IL-1β immune pathway Sci Adv 6:eaba4733.

      Shenoy AR, Capuder M, Draskovic P, Lamba D, Visweswariah SS, Podobnik M (2007) Structural and biochemical analysis of the Rv0805 cyclic nucleotide phosphodiesterase from Mycobacterium tuberculosis J Mol Biol 365:211-25.

      Smith AA, Su H, Wallach J, Liu Y, Maiello P, Borish HJ, Winchell C, Simonson AW, Lin PL, Rodgers M, Fillmore D, Sakal J, Lin K, Vinette V, Schnappinger D, Ehrt S, Flynn JL (2025) A BCG kill switch strain protects against Mycobacterium tuberculosis in mice and non-human primates with improved safety and immunogenicity Nat Microbiol 10:468-481.

      Wang J, Ge P, Qiang L, Tian F, Zhao D, Chai Q, Zhu M, Zhou R, Meng G, Iwakura Y, Gao GF, Liu CH (2017) The mycobacterial phosphatase PtpA regulates the expression of host genes and promotes cell proliferation Nat Commun 8:244.

      Wang J, Li BX, Ge PP, Li J, Wang Q, Gao GF, Qiu XB, Liu CH (2015) Mycobacterium tuberculosis suppresses innate immunity by coopting the host ubiquitin system Nat Immunol 16:237-245.

      Wen X, Wang J, Zhang D, Ding Y, Ji X, Tan Z, Wang Y (2020) Reverse Chromatin Immunoprecipitation (R-ChIP) enables investigation of the upstream regulators of plant genes Commun Biol 3:770.

      Xu X, Lu X, Dong X, Luo Y, Wang Q, Liu X, Fu J, Zhang Y, Zhu B, Ma X (2017) Effects of hMASP-2 on the formation of BCG infection-induced granuloma in the lungs of BALB/c mice Sci Rep 7:2300.

      Zhang L, Hendrickson RC, Meikle V, Lefkowitz EJ, Ioerger TR, Niederweis M (2020) Comprehensive analysis of iron utilization by Mycobacterium tuberculosis PLoS Pathog 16:e1008337.

      Zhang L, Kent JE, Whitaker M, Young DC, Herrmann D, Aleshin AE, Ko YH, Cingolani G, Saad JS, Moody DB, Marassi FM, Ehrt S, Niederweis M (2022) A periplasmic cinched protein is required for siderophore secretion and virulence of Mycobacterium tuberculosis Nat Commun 13:2255.

      Zhang X, He X, Li Q, Kong X, Ou Z, Zhang L, Gong Z, Long D, Li J, Zhang M, Ji W, Zhang W, Xu L, Xuan A (2017) PI3K/AKT/mTOR Signaling Mediates Valproic Acid-Induced Neuronal Differentiation of Neural Stem Cells through Epigenetic Modifications Stem Cell Reports 8:1256-1269.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study examines the role of E2 ubiquitin enzyme, Uev1a in tissue resistance to oncogenic RasV12 in Drosophila melanogaster polyploid germline cells and human cancer cell lines. The incomplete evidence suggests that Uev1a works with the E3 ligase APC/C to degrade Cyclin A, and the strength of evidence could be increased by addressing the expression of CycA in the ovaries and the uev1a loss of function in human cancer cells. This work would be of interest to researchers in germline biology and cancer.

      Thank you for your valuable assessment. The requested data on CycA expression (Figure 4E-G) and uev1a loss-of-function in human cancer cells (Figure 8 and Figure 8-figure supplement 2) have been added to the revised manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study uncovers a protective role of the ubiquitin-conjugating enzyme variant Uev1A in mitigating cell death caused by over-expressed oncogenic Ras in polyploid Drosophila nurse cells and by RasK12 in diploid human tumor cell lines. The authors previously showed that overexpression of oncogenic Ras induces death in nurse cells, and now they perform a deficiency screen for modifiers. They identified Uev1A as a suppressor of this Ras-induced cell death. Using genetics and biochemistry, the authors found that Uev1A collaborates with the APC/C E3 ubiquitin ligase complex to promote proteasomal degradation of Cyclin A. This function of Uev1A appears to extend to diploid cells, where its human homologs UBE2V1 and UBE2V2 suppress oncogenic Ras-dependent phenotypes in human colorectal cancer cells in vitro and in xenografts in mice.

      Strengths:

      (1) Most of the data is supported by a sufficient sample size and appropriate statistics.

      (2) Good mix of genetics and biochemistry.

      (3) Generation of new transgenes and Drosophila alleles that will be beneficial for the community.

      We greatly appreciate your comments.

      Weaknesses:

      (1) Phenotypes are based on artificial overexpression. It is not clear whether these results are relevant to normal physiology.

      Downregulation of Uev1A, Ben, and Cdc27 together significantly increased the incidence of dying nurse cells in normal ovaries (Figure 5-figure supplement 2), indicating that the mechanism we uncovered also protects nurse cells from death during normal oogenesis.

      (2) The phenotype of "degenerating ovaries" is very broad, and the study is not focused on phenotypes at the cellular level. Furthermore, no information is provided in the Materials and Methods on how degenerating ovaries are scored, despite this being the most important assay in the study.

      Thank you for pointing out this issue. We quantified the phenotype of nurse cell death using “degrading/total egg chambers per ovary”, not “degenerating ovaries”. Normal nurse cell nuclei exhibit a large, round morphology in DAPI staining (see the first panel in Figure 1D). During early death, they become disorganized and begin to condense and fragment (see the second panel in Figure 1D). In late-stage death, they are completely fragmented into small, spherical structures (see the third panel in Figure 1D), making cellular-level phenotypic quantification impossible. Since all nurse cells within the same egg chamber are interconnected, their death process is synchronous. Thus, quantifying the phenotype at the egg-chamber level is more practical than at the cellular level. We have added the description of this death phenotype and its quantification to the main text (Lines 104-108).

      (3) In Figure 5, the authors want to conclude that uev1a is a tumor-suppressor, and so they over-express ubev1/2 in human cancer cell lines that have RasK12 and find reduced proliferation, colony formation, and xenograft size. However, genes that act as tumor suppressors have loss-of-function phenotypes that allow for increased cell division. The Drosophila uev1a mutant is viable and fertile, suggesting that it is not a tumor suppressor in flies. Additionally, they do not deplete human ubev1/2 from human cancer cell lines and assess whether this increases cell division, colony formation, and xenograft growth.

      We apologize for any misleading description. We aimed to demonstrate that UBE2V1/2, like Uev1A in Drosophila “nos>Ras<sup>G12V</sup>+bam-RNAi” germline tumors, suppress oncogenic KRAS-driven overgrowth in diploid human cancer cells. Importantly, this function of Uev1A and UBE2V1/2 is dependent on Ras-driven tumors; there is no evidence that they act as broad tumor suppressors in the absence of oncogenic Ras. Drosophila uev1a mutants were lethal, not viable (see Lines 135-137), and germline-specific knockdown of uev1a (nos>uev1a-RNAi) caused female sterility without inducing tumors. These findings suggest that Uev1A lacks tumor-suppressive activity in the Drosophila female germline in the absence of Ras-driven tumors. We have revised the manuscript to prevent misinterpretation. Furthermore, we have added data demonstrating that the combined knockdown of UBE2V1 and UBE2V2 significantly promotes the growth of KRAS-mutant human cancer cells, as suggested (Figure 8 and Figure 8-figure supplement 2).

      (4) A critical part of the model does not make sense. CycA is a key part of their model, but they do not show CycA protein expression in WT egg chambers or in their over-expression models (nos>RasV12 or bam>RasV12). Based on Lilly and Spradling 1996, Cyclin A is not expressed in germ cells in region 2-3 of the germarium; whether CycA is expressed in nurse cells in later egg chambers is not shown but is critical to document comprehensively.

      We appreciate your critical comment. CycA is a key cyclin that partners with Cdk1 to promote cell division (Edgar and Lehner, 1996). Notably, nurse cells are post-mitotic endocycling cells (Hammond and Laird, 1985) and typically do not express CycA (Lilly and Spradling, 1996) (see the last sentence, page 2518, paragraph 3 in this 1996 paper). However, their death induced by oncogenic Ras<sup>G12V</sup> is significantly suppressed by monoallelic deletion of either cycA or cdk1 (Zhang et al., 2024). Conversely, ectopic CycA expression in nurse cells triggers their death (Figure 4C, D). These findings suggest that polyploid nurse cells exhibit high sensitivity to aberrant division-promoting stress, which may represent a distinct form of cellular stress unique to polyploid cells. In the revised manuscript, we have provided the CycA-staining data, comparing its expression in normal nurse cells versus cells undergoing oncogenic Ras<sup>G12V</sup>-induced death (Figure 4E-G).

      (5) The authors should provide more information about the knowledge base of uev1a and its homologs in the introduction.

      Thank you for your suggestion. In the revised introduction, we have provided a more detailed description of Uev1A (Lines 72-79). Additionally, we have introduced its human homologs, UBE2V1 and UBE2V2, in the main text (Lines 143-145).

      Reviewer #2 (Public review):

      Summary:

      The authors performed a genetic screen using deficiency lines and identified Uev1a as a factor that protects nurse cells from RasG12V-induced cell death. According to a previous study from the same lab, this cell death is caused by aberrant mitotic stress due to CycA upregulation (Zhang et al.). This paper further reveals that Uev1a forms a complex with APC/C to promote proteasome-mediated degradation of CycA.

      In addition to polyploid nurse cells, the authors also examined the effect of RasG12V-overexpression in diploid germline cells, where RasG12V-overexpression triggers active proliferation, not cell death. Uev1a was found to suppress its overgrowth as well.

      Finally, the authors show that the overexpression of the human homologs, UBE2V1 and UBE2V2, suppresses tumor growth in human colorectal cancer xenografts and cell lines. Notably, the expression of these genes correlates with the survival of colorectal cancer patients carrying the Ras mutation.

      Strength:

      This paper presents a significant finding that UBE2V1/2 may serve as a potential therapy for cancers harboring Ras mutations. The authors propose a fascinating mechanism in which Uev1a forms a complex with APC/C to inhibit aberrant cell cycle progression.

      We greatly appreciate your comments.

      Weakness:

      The quantification of some crucial experiments lacks sufficient clarity.

      Thank you for highlighting this issue. We have provided more details regarding the quantification data in the revised manuscript.

      References

      Edgar, B.A., and Lehner, C.F. (1996). Developmental control of cell cycle regulators: a fly's perspective. Science 274, 1646-1652.

      Hammond, M.P., and Laird, C.D. (1985). Chromosome structure and DNA replication in nurse and follicle cells of Drosophila melanogaster. Chromosoma 91, 267-278.

      Lilly, M.A., and Spradling, A.C. (1996). The Drosophila endocycle is controlled by Cyclin E and lacks a checkpoint ensuring S-phase completion. Genes Dev 10, 2514-2526.

      Zhang, Q., Wang, Y., Bu, Z., Zhang, Y., Zhang, Q., Li, L., Yan, L., Wang, Y., and Zhao, S. (2024). Ras promotes germline stem cell division in Drosophila ovaries. Stem Cell Reports 19, 1205-1216.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The figure legends insufficiently describe the figures. One example is Figure 3, where there are no details in the figure legend about what conditions apply to each panel and each lane of the gels.

      For clarity and brevity, detailed experimental conditions are described in the Materials and Methods section. Figure legends therefore focus on summarizing the key findings. Thank you for your understanding!

      (2) The font size on the figure is too small.

      Thank you for your constructive suggestion. In response, we have enlarged all font sizes to improve readability.

      (3) There are places where the authors overstate their results, and there are issues with the clarity of the text:

      (3a) Lines 170: "excessive" is not appropriate. Their prior study showed a mild increase in proliferation.

      “Excessive” has been removed in the revised manuscript (Lines 215-216).

      (3b) Line 187-8: The authors should restate this sentence. Here's a possibility. Over-expression of Uev1a suppressed the phenotypes caused by CycA over-expression.

      This sentence has been restated as “Notably, this cell death was suppressed by co-overexpression of CycA and Uev1A, indicating a genetic interaction between them”. (Lines 229-231).

      (3c) Lines 266-7: The properties of Uev1a (ie, lacking a conserved Cys) should be in the introduction.

      This information has been added to the revised introduction (Lines 74-76).

      (3d) Line 318: "markedly" is an overstatement of the prior results.

      Our quantification data revealed that “nos>Ras<sup>G12V</sup>; bam<sup>-/-</sup>” ovaries are three times larger than “nos>GFP; bam<sup>-/-</sup>” control ovaries (see Figure 4A-C in Zhang et al., Stem Cell Reports 19, 1205-1216). Given this substantial difference, we think that using "markedly" is not an overstatement.

      (4) Data not shown occurs in a few places in the text. Given the ability to supply supplemental information in eLife preprints, these data should be shown.

      Thanks for your suggestion. All “not shown” data have been added to the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      Major Comments

      (1) Cyclin A (CycA) is a key player in this study, but the authors do not provide evidence showing the upregulation of CycA following Ras overexpression in either polyploid or diploid cells. Data on CycA expression should be included.

      Thank you for your constructive suggestion. These data have been added to the revised manuscript (Figure 4E-G).

      (2) DNA replication stress, cellular senescence, and cell death should be assessed under Ras overexpression (RasOE) and RasOE + Uev1A RNAi conditions to support the model proposed in Figure 4F.

      We apologize for any confusion caused by our initial model. We do not have evidence that DNA replication stress and cellular senescence occur under these conditions. Cell death can be readily detected through the presence of fragmented nuclei and condensed DNA (see Figure 1D). The model has been updated accordingly (Figure 9E).

      (3) Appropriate controls should be performed alongside the experimental sets. The same nos>Ras+GFPi data set was repeatedly used in Figures 1I, 2B, 2H, and Figures 2, S2B, which is not ideal.

      All these experiments were performed under identical conditions. Therefore, we deem it appropriate to use the same control data across these analyses.

      (4) Overall, the microscopic images are too small and hard to see.

      Thank you for raising this important point. In the revised manuscript, all images and the font size on figures have been enlarged for improved clarity.

      (5) Figure 1H

      Why is the frequency of egg chamber degradation much lower in nos>RasG12V+GFP-RNAi (about 40%) than in nos>RasG12V (about 80%)? The authors also do not show whether the difference between these two conditions is significant, although it should be. We need an explanation from the authors of why there is a difference here.

      These overexpression experiments were conducted using the GAL4/UAS system. While both “nos>Ras<sup>G12V</sup>+GFP-RNAi” and “nos>Ras<sup>G12V</sup>” contain a single nos-GAL4 driver, they differ in UAS copy number: the former incorporates two UAS elements compared to only one in the latter (see the detailed genotypes in Source data 2). These results demonstrate that UAS copy number impacts experimental outcomes in our system.

      In the previous paper (Zhang et al., 2024), Figure 7H shows that the frequency of egg chambers in nos>RasG12V is 33%, although this paper shows it as about 80%. There seems to be a difference in the flies' age (previous paper: 7d, this paper: 3d), but these data raise the question of why nos>RasG12V shows more egg chamber degradation this time.

      We greatly appreciate your careful observation. The nurse-cell-death phenotype exhibits a spectrum from mild to severe manifestations [see Figure 1D and our response to weakness (2) in Reviewer #1’s public reviews]. While our 2024 paper exclusively quantified egg chambers with severe phenotypes as degrading, the current study included both mild and severe cases in this classification. We do not think fly age could account for this substantial phenotypic difference. A detailed description of the nurse-cell-death phenotype and its quantification have been added to the revised manuscript (Lines 104-108).

      In the following experiments, only nos>RasG12V+GFP-RNAi is used as a control (Figures 2B, H, S2B). I wonder if these results would give us a different conclusion if nos>RasG12V were used as a control.

      As explained above, the UAS copy number does matter in our analyses, so it is important to keep them identical for comparison.

      (6) In the abstract, the authors mention that uev1a is an intrinsic factor to protect cells from RasG12V-induced cell death. RasG12V does not induce much cell death of cystocytes with bam-gal4, whereas it induces a lot of nurse cells' death. Does it mean the intrinsic expression level of uev1a is low in nurse cells (or polyploid cells) compared to cystocytes (or diploid cells)?

      Overexpression of Ras<sup>G12V</sup> driven by bam-GAL4 exhibited only minimal nurse cell death (Figure 1D, E). Additionally, Uev1A exhibited low intrinsic expression levels in both cystocytes and nurse cells (Figure 3E and Figure 5-figure supplement 1).

      (7) Is uev1a-RNAi alone sufficient to induce egg chamber degradation? Or does it have any effect on ovarian development? (Related to question #1 in minor comments)

      While nos>uev1a-RNAi resulted in female sterility, it alone was insufficient to induce egg chamber degradation. However, simultaneous downregulation of Uev1A, Ben, and Cdc27 triggered significant egg chamber degradation (Figure 5-figure supplement 2).

      (8) Which stages of egg chambers get degraded with RasG12V induction?

      This is a good question. In our analyses, we noted that degrading egg chambers exhibited considerable size variability (Figure 1D). Because degradation disrupts normal morphological cues, precise staging of these egg chambers is nearly impossible.

      (9) I suggest testing the cellular senescence marker as well if the authors mention that CycA-degradation by Uev1a-APC/C complex prevents cellular senescence induced by RasG12V in a schematic image of Figure 4 (e.g., Dap/p21, SA-β-gal).

      As addressed in our response to your Major Comment (2), we lacked experimental evidence to support cellular senescence in this context. We have therefore revised the model accordingly (Figure 9E). While this study focuses specifically on cell death, investigating potential roles of cellular senescence remains an important direction for future research. Thank you for your suggestion!

      Minor Comments

      (1) Figure 1D: Df#7584

      It seems that the late-stage egg chamber is missing in this condition. Why does this occur without egg chamber degradation? Is there a possibility that we do not see egg chamber degradation because this deficiency line does not have a properly developed egg chamber that can have a degradation?

      While this image represents only a single sample, we have confirmed the presence of late-stage egg chambers in other samples. If “Df#7584/+” females were unable to support late-stage egg chamber development, complete sterility would be expected due to the lack of mature eggs. However, as shown in this image (Figure 1D), the ovary contains mature eggs, and the “Df#7584/+” fly strain remains fertile.

      (2) Based on the result that DDR signaling protects egg chambers from degradation, the authors would do well to check DNA-damage markers (e.g. γ-H2AX) in nos>RasG12V and nos>RasG12V +uev1a.

      Thank you for your constructive recommendation. These data have been added to the revised manuscript (Figure 3C).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #2 (Public review):

      Points to be addressed:

      (1) As a statistical test, the authors report having used unpaired t-tests; however, often three groups are compared for which t-tests are inadequate. This is faulty as, amongst other things, it does not take multiple comparison testing into account.

      We have adopted the reviewers' suggestion and used analysis of variance (ANOVA) to reanalyze the experimental results involving three or more condition groups. We have retained the t-test results for experiments with only two condition groups.
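      As a minimal sketch of the kind of test described above, the following computes the one-way ANOVA F statistic for three groups by hand. The group values are hypothetical and are not data from the study; the function name `one_way_anova_f` is likewise an illustrative assumption. A significant F only indicates that at least one group mean differs, so pairwise conclusions would still need a post hoc test with multiple-comparison correction.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical readouts for three conditions, n=3 each (not study data).
lss = [1.0, 2.0, 3.0]
mss = [2.0, 3.0, 4.0]
hss = [5.0, 6.0, 7.0]
f_stat, df_b, df_w = one_way_anova_f([lss, mss, hss])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```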

      (2) Both B-Actin and GAPDH seem to have been used for protein-level normalization. Why? The Figure 2HL first panel reports B-actin, whereas the other three report GAPDH. The same applies to Figures 3E-F, where both are shown, and it is not mentioned which of the two has been used. Moreso, uncropped blots seem to be unavailable as supplementary data for proper review. These should be provided as supplementary data.

      In Figures 2G and 3E-F, both β-actin and GAPDH were used for protein-level normalization. The two housekeeping proteins were mixed because consistency was not planned for in advance. The expression levels of these two proteins show no significant differences in response to the different fluid shear stresses. The uncropped blot images have been organized and provided in the supplementary data.

      (3) LSS and MSS were compared based on transcriptomic analysis. Conversely, RNA sequencing was not reported for the HSS. Why is this data missing? It would be valuable to assess transcriptomics following HSS, and also to allow transcriptomic comparison of LSS and HSS.

      In the current study, we conducted the comparative transcriptomic analysis only between the LSS and MSS conditions, mainly because most current research focuses on endothelial dysfunction and atherosclerosis under LSS. In addition, our HSS condition is about 24 dyn/cm<sup>2</sup> overall, which some reports also regard as within the normal physiological range. Moreover, the transcriptomic data are primarily used to identify the targets in our study. Interestingly, the selected genes show the same trend in endothelial cell ferroptosis induced by LSS and by HSS. At the same time, we strongly agree with the reviewer’s point that RNA sequencing results under HSS would also be valuable. Therefore, we plan to perform transcriptomic sequencing under HSS or higher levels of shear stress in the future, aiming to discover new insights.

      (4) Actual sample sizes should be reported rather than "three or more". Moreso, it would be beneficial to show individual datapoints in bar graphs rather than only mean with SD if sample sizes are below 10 (e.g., Figures 1B-H, Figure 2G, etc.).

      After rechecking our original data, we confirmed that all analyzed results came from three biological replicates, so they are uniformly marked as 'n=3' in the article. As the reviewer suggested, the individual data points have been added to the statistical charts along with the standard deviation bars.

      (5) The authors claim that by modifying the thickness of the middle layer, shear stress could be modified, whilst claiming to keep on-site pressure within physiological ranges (approx. 70 mmHg) as a hallmark of their microfluidic devices. Has it been experimentally verified that pressures indeed remain around 70 mmHg?

      This is a very interesting question. In this article, the cross-sectional area of each tunnel-like channel depends on the thickness of the middle layer, which results in different levels of shear stress. Since the flow rate is kept the same (1.6 ml/min) under all three conditions, the average pressure is calculated to be around 70 mmHg based on our previously reported formula (PMID: 37662690). To address the reviewer's question about the actual pressure values, we connected a water-filled tube to a chip and measured the height of the water surface at the elevated end relative to the chip position, as shown in Author response image 1. As expected, when the middle layer bulges to the same height (0.7 mm) as under the LSS condition, the water level reaches 900 mm, which corresponds to about 70 mmHg.

      Author response image 1.

      Schematic diagram of on-chip pressure detection
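      The conversion from a water-column height to a pressure in mmHg can be sketched as follows. This is a minimal illustration, assuming pure water at a density of 1000 kg/m^3 and standard gravity; only the 900 mm height comes from the response, and the function name is hypothetical. Under these assumptions the column works out to roughly 66 mmHg, in the neighborhood of the stated value of about 70 mmHg.

```python
# Standard physical constants; the 900 mm column height is taken from the response.
WATER_DENSITY = 1000.0   # kg/m^3, assumes pure water near room temperature
GRAVITY = 9.80665        # m/s^2, standard gravity
PA_PER_MMHG = 133.322    # pascals per mmHg

def water_column_to_mmhg(height_mm):
    """Hydrostatic pressure (P = rho * g * h) of a water column, in mmHg."""
    pressure_pa = WATER_DENSITY * GRAVITY * (height_mm / 1000.0)
    return pressure_pa / PA_PER_MMHG

pressure = water_column_to_mmhg(900.0)
print(f"{pressure:.1f} mmHg")
```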

      (6) A coculture model (VSMC, EC, monocytes) is mentioned in the last part of the results section without any further information. Information on this model should be provided in the methods section (seeding, cell numbers, etc.). Moreover, comparison of LSS vs LSS+KLF6 OE and HSS vs HSS+KLF6 OE is shown. It would benefit the interpretation of the outcomes if MSS were also shown. It would also be beneficial to demonstrate differences between LSS, MSS, and HSS in this coculture model (without KLF6 OE).

      The specific methods for constructing the co-culture model (vascular smooth muscle cells, endothelial cells, monocytes) mentioned in the results section were introduced in our previous paper. For the reader's convenience, we have added a brief description to the “Methods and materials” section of this paper, including cell seeding and cell numbers. In this study, the results of LSS vs LSS+KLF6 OE and HSS vs HSS+KLF6 OE are presented to verify the role of KLF6 in the LSS- or HSS-induced promotion of early atherosclerotic events. In our previously published paper (PMID: 37662690), we showed the effects of the three different shear stresses on atherosclerotic events (Fig. 4 in that paper). Those results demonstrated that both LSS and HSS significantly promote early atherosclerotic events compared with MSS.

      (7) The experiments were solely performed with a venous endothelial cell line (HUVECs). Was the use of an arterial endothelial cell line considered? It may translate better towards atherosclerosis, which occurs within arteries. HUVECs are not accustomed to the claimed near-physiological pressures.

      The human umbilical vein endothelial cell (HUVEC) is a commonly used cell line for in vitro studies of vascular endothelium under fluid shear stress conditions. Although human arterial endothelial cells (HAECs) may be more suitable than HUVECs for responding to physiologically relevant pressure, HUVECs are easier to obtain and maintain. Nevertheless, we are going to order HAECs and will use them to validate the conclusions and assess their potential translatability.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Information on seeding of the microfluidic device is absent in the methods section (i.e., seeding, cell density, passage number, confluence, etc.). Moreso, treatment with Fer-1 is not reported in the methods section.

      We have described the cell seeding information in ‘Preparation of cell culture in the microfluidic chip’ and the Fer-1 treatment in ‘Cell death assay’ in the Methods section.

      (2) Figure 3F has "MSS", "HSS", and "LSS+KLF6" as groups on the x-axis; the latter should probably be "HSS+KLF6".

      Thank you for pointing out this error in Figure 3F. We have made the correction.

      (3) Data should be made available in online repositories rather than "making it available upon reasonable request". As it was not provided, the sequencing data could not be reviewed. In addition, it was stated that a preprint was available on BioRxiv, but I could not find it.

      Thank you for the suggestion. We have uploaded the RNA-seq data to the NCBI GEO database, where it became publicly available on December 9, 2025.

    1. Author response:

      eLife Assessment

      Using genome databases, the authors performed solid bioinformatic analyses to trace the genomic history of the clinically relevant Staphylococcus aureus tetracycline resistance plasmid pT181 over the last seven decades. They discovered that this element has transitioned from a multicopy plasmid to a chromosomally integrated element, and the work represents a valuable demonstration of the use of publicly available data to investigate plasmid biology and inform clinical epidemiology. This work will appeal to researchers interested in staphylococcal evolution and plasmid biology.

      Thank you, we agree with this overview. We also think this work is interesting to people interested in antimicrobial resistance and bacterial genome structure.

      Public Reviews:

      Reviewer #1 (Public review):

      The study provides a robust bioinformatic characterization of the evolution of pT181. My main criticism of the work is the lack of experimental validation for the hypotheses proposed by the authors.

      Comments on the study:

      (1) One potential reason for the decline in pT181 copy number over time may be a high cost associated with the multicopy state. In this sense, it would be interesting if the authors could use (or construct) isogenic strains differing only in the state of the plasmid (multicopy/integrated). With this system, the authors could measure the fitness of the strains in the presence and absence of tetracycline, and they could be able to understand the benefit associated with the plasmid transition. The authors discuss these ideas, but it would be nice to test them.

      We agree that the relative fitness of integrated versus multicopy plasmids is interesting and a costly multicopy state could explain the transition of independent pT181 replicons to chromosomal integration. This is a project we are exploring for a future study. However, we think that this additional experimental work goes beyond the scope of the paper.

      (2) It would be interesting to know the transfer frequencies of the multicopy mobilizable pT181 plasmid, compared to the transfer frequency of the plasmid integrated into the SCCmec element (which can be co-transferred, integrated in conjugative plasmids, or by transduction).

      We agree with the reviewer that this is an interesting question. However, we think inferring these rates from natural sequence data is not feasible in this case given the low heterogeneity of the plasmid sequence. A laboratory-based experimental study could not address the real transfers we observe over the course of decades, as in vitro S. aureus transfer rates are often not good proxies for in vivo (McCarthy et al., 2014). In addition, we do not know what is moving the integrated plasmid. pT181 could be moved by a phage or plasmid, so we are uncertain what the correct experiment would be to explore this.

      (3) One important limitation of the study that should be mentioned is that inferring pT181 PCN from whole genome data can be problematic. For example, some DNA extraction methods may underestimate the copy number of small plasmids because the small, circular plasmids are preferentially depleted during the process (see, for example, https://www.nature.com/articles/srep28063).

      We will investigate this issue further in the revisions. The kits used to extract DNA for the earlier-collected samples may possibly yield more plasmid DNA relative to the chromosome compared to newer ones on average; however, we think this is not driving the decline that we observe in multicopy pT181 copy number. Multiple BioProjects find the same result, where earlier samples have higher copy number compared to later samples. We expect extraction methods to be consistent within a BioProject, suggesting that this decline is genuine and not technical. In revisions, we intend to evaluate the effect of date of sequencing and additional metadata on copy number.
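      For context, one common way to approximate plasmid copy number from whole-genome sequencing is the ratio of read depth over the plasmid to read depth over the chromosome. The sketch below illustrates that ratio only; it is an assumption for illustration, not a description of the method used in the study, and the depth values and function name are hypothetical. Because the estimate depends directly on how much plasmid DNA survives extraction, any kit that depletes small circular plasmids would bias the ratio downward, which is the concern raised above.

```python
from statistics import median

def estimate_copy_number(plasmid_depths, chromosome_depths):
    """Ratio of median plasmid read depth to median chromosomal read depth."""
    chrom = median(chromosome_depths)
    if chrom <= 0:
        raise ValueError("chromosomal depth must be positive")
    return median(plasmid_depths) / chrom

# Hypothetical per-position read depths (not data from the study).
chromosome_depths = [48, 50, 52, 49, 51]
plasmid_depths = [390, 410, 400, 405, 395]
pcn = estimate_copy_number(plasmid_depths, chromosome_depths)
print(f"Estimated copy number: {pcn:.1f}")
```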

      Reviewer #2 (Public review):

      Summary:

      The authors performed bioinformatic analyses to trace the genomic history of the clinically relevant pT181 plasmid. Specifically, they:

      (1) Tracked the presence of pT181 across different S. aureus strain backgrounds through time. It was first found in one, later multiple strains, though this may reflect changes in sampling over time.

      (2) Estimated the mutation rate of the chromosome and plasmid.

      (3) Estimated the plasmid copy number of pT181, and found that it decreased over time. The latter was supported by two sets of statistical analyses, first showing that the number of single-copy isolates increased over time, and second, that the multicopy isolates demonstrated a lower PCN over time.

      (4) Reported the different integration sites at which pT181 integrated into the genome.

      As a caveat, they mentioned that identical plasmid sequences have variable plasmid copy numbers across different genomes in their dataset.

      Strengths:

      This is a very solid, well-considered bioinformatic study on publicly available data. I greatly appreciate the thoughtful approach the authors have taken to their subject matter, neither over- nor underselling their results. It is a strength that the authors focused on a single plasmid in a single bacterial species, as it allowed them to take into account unique knowledge about the biology of this system and really dive deep into the evolution of this specific plasmid. It makes for a compelling case study. At the same time, I think the introduction and discussion can be strengthened to demonstrate what lessons might be drawn from this case study for other plasmids.

      Weaknesses:

      The finding that the pT181 copy number declined over time is the most interesting claim of the paper to me, and not something that I have seen done before. While the authors have looked at some confounders in this analysis, I think this could be strengthened further in a revision.

      In the revisions, we will further explore the impact that technical variation could have in contributing to copy number variation and update our claims for the decline in copy number of the independent replicon over time and variation for the same plasmid sequence accordingly. Multiple BioProjects show earlier samples have higher copy number compared to later samples; we expect extraction methods to be consistent within a BioProject, supporting our initial findings that this decline over time is not due to technical variation.

      For the flow of the storyline, I also think the estimation of mutation rates (starting L181) and integration into the chromosome (starting L255) could be moved to the supplement or a later position in the main text.

      We will revisit the text organization for flow and clarity of storyline.

      Clearly, the use of publicly available data prevents the authors from controlling the growth and sequencing conditions of the isolates. It is striking that they observe a clear signal in spite of this, but I would have loved to see more discussion of the metadata that came with the publicly available sequences and even more use of that metadata to control for confounding.

      In revisions, we will further investigate possible contributors to the observed decline in copy number of multicopy pT181 over time. We have incorporated the date of sample collection and BioProject in our analysis, but not the date of sequencing or extraction technique.

      References

      McCarthy, A. J., Loeffler, A., Witney, A. A., Gould, K. A., Lloyd, D. H., & Lindsay, J. A. (2014). Extensive horizontal gene transfer during Staphylococcus aureus co-colonization in vivo. Genome Biology and Evolution, 6(10), 2697–2708. https://doi.org/10.1093/gbe/evu214

    1. Author response:

      We thank the reviewers for their thorough and constructive evaluation of our manuscript titled “PSD-95 drives binocular vision maturation critical for predation”. The reviewers raised several important conceptual and technical points. Here, we address and provide additional context on the major themes and outline our planned revisions.

      We acknowledge that the current prey capture task cannot directly adjudicate between PSD-95 binocular vision impairments and sensorimotor processing deficits. However, we did not observe any major impairment supporting a sensorimotor processing deficit, in contrast to a major impairment in line with binocular vision impairment. Evidence from Huang et al. (2015) [1], Favaro et al. (2018) [2], and our data with the visual water task (VWT), which requires identical sensorimotor but differential visual processing, clearly demonstrated intact visual acuity but impaired orientation discrimination in PSD-95 KO mice. Therefore, we believe that a binocular integration deficit is the most likely explanation of PSD-95 KO binocular impairments. In line with this, it is unlikely that aberrations in binocular eye movements account for the observations. We appreciate that alternative explanations remain possible and merit explicit discussion. Accordingly, we intend to expand the discussion of these alternatives.

      Importantly, we will provide additional experimental data demonstrating that knock-down of PSD-95 in V1, but not in the superior colliculus, significantly decreases orientation discrimination as analyzed with the VWT, as we had shown for PSD-95 KO mice (while control knock-down does not have this effect). We believe that this new evidence better delineates the potential neuroanatomical locus of the PSD-95-associated deficits.

      Furthermore, we will provide additional head movement analyses, as suggested by Reviewer 1. Specifically, we will investigate the head angle in relation to the cricket (azimuth) in time (±1 second) around prey contact under light and dark conditions.

      We will also address the potential impact of PSD-95 KO learning deficits. We agree that the PSD-95 KO brain has additional impairments, as has been published previously. Strikingly, however, the binocular impairment dominated sensory processing. This cannot be convincingly explained by learning deficits. In fact, we have observed improved learning of PSD-95 KO mice in some tasks (e.g. cocaine conditioned place preference) [3], but no significant differences in the VWT [1,2]. Learning differences were described for another PSD-95 mouse line, expressing the N-terminus with two PDZ domains [4]. To limit potential learning-dependent confounds, we chose tasks with salient stimuli, such as water aversion and prey capture.

      We agree on the strength of random dot stereograms for isolating stereoscopic computations. However, they require special filters in front of either eye, which renders them unsuitable for the VWT. The lengthy training with the less salient stimulus of a water reward could add further confounds from PSD-95 KO deficits. Thus, we think that this is a matter for future experiments, to allow for integration of different visual inputs. However, the combined improved performance of WT mice with binocular vision in prey capture (depth percept) and orientation discrimination (summation) already supports the importance of binocular vision in mice and the dominant defect in PSD-95 KO mice.

      Finally, we will address the other points raised by the reviewers through clearer exposition and reorganization of the manuscript.

      Once again, we would like to thank the reviewers for their thoughtful and constructive feedback, which we believe will substantially strengthen the manuscript.

      (1) Huang, X., Stodieck, S. K., Goetze, B., Cui, L., Wong, M. H., Wenzel, C., Hosang, L., Dong, Y., Löwel, S., and Schlüter, O. M. (2015). Progressive maturation of silent synapses governs the duration of a critical period. Proc. Natl. Acad. Sci. 112, E3131–E3140. https://doi.org/10.1073/pnas.1506488112.

      (2) Favaro, P.D., Huang, X., Hosang, L., Stodieck, S., Cui, L., Liu, Y., Engelhardt, K.-A., Schmitz, F., Dong, Y., Löwel, S., et al. (2018). An opposing function of paralogs in balancing developmental synapse maturation. PLOS Biol. 16, e2006838. https://doi.org/10.1371/journal.pbio.2006838.

      (3) Shukla, A., Beroun, A., Panopoulou, M., Neumann, P.A., Grant, S.G., Olive, M.F., Dong, Y., and Schlüter, O.M. (2017). Calcium‐permeable AMPA receptors and silent synapses in cocaine‐conditioned place preference. EMBO J. 36, 458–474. https://doi.org/10.15252/embj.201695465.

      (4) Migaud, M., Charlesworth, P., Dempster, M., Webster, L.C., Watabe, A.M., Makhinson, M., He, Y., Ramsay, M.F., Morris, R.G.M., Morrison, J.H., et al. (1998). Enhanced long-term potentiation and impaired learning in mice with mutant postsynaptic density-95 protein. Nature 396, 433–439. https://doi.org/10.1038/24790.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      I am happy with the revisions the authors made, and believe that the manuscript is now stronger, representing an important contribution.

      We are truly thankful to this reviewer for the very constructive comments.

      Reviewer #2 (Public review):

      In their response, the authors state that they do not treat the EAK evidence as decisive, yet the manuscript repeatedly characterizes the assemblage in very definitive terms. For example, EAK is described as "the oldest unambiguous proboscidean butchery site at Olduvai" and as "the oldest secure proboscidean butchery evidence." These phrases communicate a high level of confidence that does not align with the more qualified position articulated in the rebuttal and extends beyond what the documented evidence securely supports.

      We decided to sound less dogmatic and removed the emphasis by adding “potentially” the oldest…. We emphasize that even if we had documented cut marks, we would be on the same epistemological ground, since there is never 100% certainty that marks identified as cut marks are in fact cut marks.

      I appreciate the authors' clarification regarding the fracture features, and I agree that these are well-established outcomes of dynamic hammerstone percussion. At the same time, several of these traits have been documented in non-anthropogenic contexts, including helicoidal spiral fractures resulting from trampling and carnivore activity (Haynes 1983), adjacent or flake-like scars created by carnivore gnawing (Villa and Bartram 1996), hackled break surfaces produced by heavy passive breakage such as trampling or sediment pressure (Haynes 1983), and impact-related bone flakes observed in carnivore-modified assemblages (Coil et al. 2020).

      We added this explanation to the final version of the article:

      “This interpretation is epistemologically problematic because it does not satisfy the fundamental criteria for valid analogy as outlined by Bunge (1981), namely substantial, structural, and environmental affinity. Specifically, the cited examples involve agents, materials, and contexts that differ markedly in composition, mechanical properties, and loading regimes from those considered here. Experimental and actualistic studies demonstrate that carnivores—rather than trampling—are also capable of producing spiral fractures and overlapping bone scarring, but these observations are restricted to faunal remains of substantially smaller body size than elephants, which they can gnaw (Haynes 1983; see also Figures S30–S36). To date, no carnivore has been documented as producing comparable fracture morphologies or surface damage on elephant bones. Consequently, the proposed analogy is not supported. Moreover, Haynes (1983) provides no empirical evidence that sediment pressure or trampling can generate hackled fracture surfaces. Such features are instead associated with dynamic loading conditions, whereas passive breakage processes have not been shown to produce these types of modifications. This reasoning also applies to impact flakes on elephant bones, which can only be produced by the sole modern agent documented to dynamically fracture green proboscidean long bones: humans.”

      One of the biggest issues is that there is no quantitative data or images of the bone fracture features that the authors refer to as the main diagnostic criteria at EAK. The only figures that show EAK specimens (S21, S22, S23) illustrate general green-bone fracture morphology but none of the specific traits listed in the text. In contrast, clear examples of similar features come from other Olduvai assemblages, which may be misleading to readers if they mistakenly interpret those as images from EAK. The manuscript also states that these traits "co-occur," but it is not defined whether this refers to multiple features on the same fragment or within the broader assemblage. Without images or counts that document these traits on EAK fossils, readers cannot evaluate the strength of the interpretation. Including that information would substantially strengthen the manuscript.

      Our arguments addressed the general criteria that the reviewer criticized in the previous review, which encompass all documented green-broken elephant bones. If we now restrict the arguments to EAK, it suffices to restate the arguments from the previous reply. The images (Figs S21-23) show the EAK broken specimens and clearly indicate human agency by two factors: a) at least one of them is a long bone flake with overlapping scars (Fig. S23 shows its medullary side), and b) elephant bones impacted by carnivores (namely, hyenas) have always shown intensive gnawing and tooth-marking; the lack thereof in both EAK specimens refutes a non-human carnivore agency. The former argument is interpreted as human agency because carnivores have not been documented to produce such features on elephant bones.

      Regarding the statement that "natural elephant long limb breaks have been documented only in pre or peri-mortem stages when an elephant breaks a leg, and only in femora (Haynes et al., 2021)," it is not entirely clear what this example is intended to illustrate in relation to the EAK assemblage. My understanding is that the authors are suggesting that naturally produced green bone fractures in elephants are very limited, perhaps occurring only in pre or peri-mortem broken leg cases, and that fractures on other elements should therefore be attributed to hominin activity. If that is not the intended argument, I would encourage clarifying this point. This appears to conflate pre-mortem injury with the broader issue of equifinality. My original comment was not referring to pre-mortem breaks but to the range of natural (i.e., non-hominin) and post-mortem processes that can generate spiral or green bone fractures similar to those described by the authors.

      We elaborated that argument exclusively to address the reviewer's previous point that natural limb breaking produces spiral breaks on elephant long bones. That point is correct: as Haynes describes it, this is the only process not involving human agency that can generate a helicoidal spiral fracture on an elephant long bone. Non-human post-mortem processes on fresh bone do not generate these features, nor have extant carnivores been documented to produce them on elephant bones.

      Finally, in considering the authors' response on the Nyayanga material, I still find the basis for their dismissal of that evidence difficult to follow and the contrasting treatment of the Nyayanga and EAK evidence raises concerns about interpretive consistency. Plummer et al. (2023) specify that bone surface modifications were examined using low-power magnification (10×-40×) and strong light sources to identify modifications and that they attributed agency (e.g., hominin, carnivore) to modifications only after excluding possible alternatives. The rebuttal does not engage with the procedures reported. The existence of newer analytical techniques does not diminish the validity of long-standing methods that have been applied across many studies. It is also unclear why abrasion is presented as a more likely explanation than stone tool cutmarks. The authors dismiss the Nyayanga images as "blurry," but this is irrelevant to the interpretation, since the analysis was based on the fossils, not the photographs. The Nyayanga dataset is dismissed without a thorough engagement, while the EAK material, despite similar uncertainties and potential for alternative explanations, is treated as definitive.

      We believe the rebuttal engages with these arguments. The protocol “bone surface modifications were examined using low-power magnification (10×-40×) and strong light sources to identify modifications and that they attributed agency (e.g., hominin, carnivore) to modifications only after excluding possible alternatives” does not guarantee that any derived interpretation is correct. These methods have been used consistently for decades in contexts in which different researchers draw different conclusions from the same marks. The underlying variables are subjectively interpreted and tallied, and they are equifinal when overlapping factors such as sediment abrasion and trampling are not considered. For example, the marks on the Nyayanga hippo bones that the original authors interpreted as cut marks are, in our view, indistinguishable from trampling marks in the published images.

      It is clear in the final version of our article that the EAK evidence is not treated as definitive, since that would be dogmatic, and thus, non-scientific. We thank this reviewer for having given us the chance to reconsider our original phrasing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary:

      This study investigates the molecular mechanism by which warm temperature induces female-to-male sex reversal in the ricefield eel (Monopterus albus), a protogynous hermaphroditic fish of significant aquacultural value in China. The study identifies Trpv4 - a temperature-sensitive Ca<sup>2+</sup> channel - as a putative thermosensor linking environmental temperature to sex determination. The authors propose that Trpv4 causes Ca<sup>2+</sup> influx, leading to activation of Stat3 (pStat3). pStat3 then transcriptionally upregulates the histone demethylase Kdm6b (aka Jmjd3), leading to increased dmrt1 gene expression and ovo-testes development. This work aims to bridge ecological cues with molecular and epigenetic regulators of sex change and has potential implications for sex control in aquaculture.

      Strengths:

      (1) This study proposes the first mechanistic pathway linking thermal cues to natural sex reversal in adult ricefield eel, extending the temperature-dependent sex determination paradigm beyond embryonic reptiles and saltwater fish.

      (2) The findings could have applications for aquaculture, where skewed sex ratios apparently limit breeding efficiency.

      We thank you for the encouraging comments on our work; answering your questions has greatly improved the quality of the manuscript.

      Weaknesses:

      (A) Scientific Concerns:

      (1) There is insufficient replication and data transparency. First, the qPCR data are presented as bar graphs without individual data points, making it impossible to assess variability or replication. Please show all individual data points and clarify n (sample size) per group. Second, the Western blotting is only shown as single replicates. If repeated 2-3 times as stated, quantification and normalization (e.g., pStat3/Stat3, GAPDH loading control) are essential. The full, uncropped blots should be included in the supplementary data.

      We thank you for the critical comments. We have remade the bar graphs to show individual data points and have added the sample size per group where possible. Quantification and/or normalization of the WB data, based on at least two replicates, is now included. Representative uncropped blots have also been uploaded as supplementary data.

      (2) The biological significance of the results is not clear. Many reported fold changes (e.g., kdm6b modulation by Stat3 inhibition, sox9a in S3A) are modest (<2-fold), raising concerns about biological relevance. Can the authors define thresholds of functional relevance or confirm phenotypic outcomes in these animals?

      We thank you for the inspiring comments. Most of the experiments were transient in nature (for instance, warm temperature treatment of fish for 3-4 days), so the fold changes in gene expression were modest.

      We admit that there are some shortcomings in this work. The major one is the lack of data showing that Trpv4 inhibition/activation or pStat3 inhibition/activation can cause a gonadal phenotype change, for instance, from ovary to ovotestis, or from female to intersex fish. We only showed that pharmacological or RNAi treatment can change sex-biased gene expression or affect temperature-induced gene expression, but not gonadal transformation.

      In natural populations, the sex change of ricefield eel may take several months to a year or even longer. We propose that the magnitude and duration of temperature exposure promote sex change in ricefield eel by driving the accumulation of testicular differentiation gene products in sufficient quantities. Under experimental conditions, to achieve a gonadal phenotype change, animals may need to undergo repeated pharmacological treatment (at 3-day intervals) for a longer time to reach a threshold. However, long-term treatment significantly increases the death rate of the animals, owing to stress or frequent manipulation.

      Inspired by your comment, we are optimizing the experimental conditions in order to obtain phenotypic outcomes. Thank you.

      (3) The specificity of key antibodies is not validated. Key antibodies (Stat3, pStat3, Foxl2, Amh) were raised against mammalian proteins. Their specificity for ricefield eel proteins is unverified. Validation should include siRNA-mediated knockdown with immunoblot quantification with 3 replicates. Homemade antibodies (Sox9a, Dmrt1) also require rigorous validation.

      We thank you for the comments about the specificity of the antibodies. First, when choosing the commercial antibodies, we compared each immunogen with the corresponding amino acid sequence of ricefield eel, making sure that it was conserved to some extent (at least 85% similarity). Second, we referred to published work in which the antibodies have been shown to work in zebrafish, frogs, turtles, and other species. This was true for the pStat3 and Stat3 antibodies (Weber et al. 2020; Ge et al., 2024). Third, the specificity of each antibody was assessed by WB, based on the predicted size of the protein and appropriate controls.

      For instance, we are very confident in the specificity of the Dmrt1 antibody. First, Dmrt1 protein was readily detected in the testes of males but barely detected in the ovaries of females (Author response image 1). Second, Dmrt1 protein was not detected in the ovaries of fish at cool temperature, but was clearly detected in the nuclei of follicles in warm temperature-treated fish (Figure 3C, 4B), in line with our qPCR results. Third, by IF, Dmrt1 was not detected in females reared at lower temperature. By contrast, after warm temperature treatment or Trpv4 activation, it was detected in the nuclei of specific cell types but not everywhere (Figure 3E, 6C).

      Author response image 1.

      Although we carefully evaluated the antibodies before the experiments as described above, in response to your concerns we went on to validate the Amh, Dmrt1, Sox9a, and Stat3 antibodies using the corresponding siRNAs (Author response image 2). The results indicated that the antibodies, although not perfect, can be used in this work, as the expected band disappeared or was reduced in intensity. The experiments were repeated twice, and representative results are shown.

      Author response image 2.

      (4) Most of the imaging data (immunofluorescence) is inconclusive. Immunofluorescence panels are small and lack monochrome channels, which severely limits interpretability. Larger, better-contrasted images (showing the merge and the monochrome of important channels) and quantification would enhance the clarity of these findings.

      We apologize for the poor quality of the IF images. At your suggestion, we have repeated the majority of the IF experiments, and better-quality imaging data are presented in the revised manuscript. Quantification of WB and IF has also been included to enhance clarity. Please see the revised manuscript. Thank you.

      (B) Other comments about the science: 

      (1) In S3A, sox9a expression is not dose-responsive to Trpv4 modulation, weakening the causal inference.

      We have repeated the experiments, and the new data replace the old in the revised manuscript.

      (2) An antibody against Kdm6b (if available) should be used to confirm protein-level changes.

      We thank you for the nice suggestion. Unfortunately, the currently available commercial antibodies against Kdm6b were raised against the mammalian protein and did not work in ricefield eel. At your suggestion, we plan to generate one in the future.

      In sum, the interpretations are limited by the above concerns regarding data presentation and reagent specificity.

      Reviewer #2 (Public review):

      Summary:

      This study presents valuable findings on the molecular mechanisms driving the female-to-male transformation in the ricefield eel (Monopterus albus) during aging. The authors explore the role of temperature-activated TRPV4 signaling in promoting testicular differentiation, proposing a TRPV4-Ca<sup>2+</sup>-pSTAT3-Kdm6b axis that facilitates this gonadal shift.

      We thank you for the encouraging comments. Answering your questions has greatly improved our understanding of Trpv4 function in ricefield eel, and the quality of the manuscript.

      Strengths:

      The manuscript describes an interesting mechanism potentially underlying sex differentiation in M. albus.

      Weaknesses:

      The current data are insufficient to fully support the central claims, and the study would benefit from more rigorous experimental approaches.

      (1) Overstated Title and Claims:

      The title "TRPV4 mediates temperature-induced sex change" overstates the evidence. No histological confirmation of gonadal transformation (e.g., formation of testicular structures) is presented. Conclusions are based solely on molecular markers such as dmrt1 and sox9a, which, although suggestive, are not definitive indicators of functional sex reversal.

      We thank you for pointing this out. The title has been changed to “Trpv4 links environmental temperature to testicular differentiation in hermaphroditic ricefield eel.”

      (2) Temperature vs Growth Rate Confounding (Figure 1E):

      The conclusion that warm temperature directly induces gonadal transformation is confounded by potential growth rate effects. The authors state that body size was "comparable" between 25C and 33C groups, but fail to provide supporting data. In ectotherms, growth is intrinsically temperature-dependent. Given the known correlation between size and sex change in M. albus, growth rate, rather than temperature per se, may underlie the observed sex ratio shifts. Controlled growth-matched comparisons or inclusion of growth rate metrics are needed.

      We thank you for the critical comments. We have repeated the experiments and carefully compared body length and weight; the results showed no substantial difference between the 25 °C and 33 °C groups. Please see Figure S1D-E and the last paragraph of the “Warm temperature promotes gonadal transformation” section in the Results.

      (3) TRPV4 as a Thermosensor-Insufficient Evidence:

      The characterisation of TRPV4 as a direct thermosensor lacks biophysical validation. The observed transcriptional upregulation of Trpv4 under heat (Figure 2) reflects downstream responses rather than primary sensor function. Functional thermosensors, including TRPV4, respond to heat via immediate ion channel activity (typically measurable within seconds), not mRNA expression over hours. No patch-clamp or electrophysiological data are provided to confirm TRPV4 activation thresholds in eel gonadal cells.

      We thank you for the critical comments. Patch-clamp and other electrophysiological experiments require specialized equipment and well-trained experts; unfortunately, our lab members and nearby collaborators have no experience with this kind of experiment. Trpv4 is a well-known cation channel that is activated by moderate heat (> 27 °C), and a body of published work has demonstrated its role in regulating Ca<sup>2+</sup> signals by changing its conformation in response to temperature (J Physiol. 2017 Oct 25;595(22):6869–6885. doi: 10.1113/JP275052; Cell Death Dis 11, 1009 (2020). https://doi.org/10.1038/s41419-020-03181-7; Cell Death Dis 10, 497 (2019). https://doi.org/10.1038/s41419-019-1708-9; Cell calcium, https://doi.org/10.1016/j.ceca.2026.103108).

      Consistently, warm temperature increased calcium influx within an hour, similar to Trpv4 agonist treatment (Figure 2E, 5D), and addition of a Trpv4 channel inhibitor prevented the calcium signals induced by warm temperature treatment. Moreover, calcium signaling activity is closely linked with pStat3 activity and the expression of sex-biased genes (Figures 5G, 6F). Although we did not show biophysical data, these results imply that Trpv4 is a thermosensor and regulates the downstream pathway via calcium signals, in line with its function as an ion channel.

      Additionally, the Ca<sup>2+</sup> imaging assay (Figure 2F) lacks essential details: the timing of GSK1016790A/RN1734 administration relative to imaging is unclear, making it difficult to distinguish direct channel activity from indirect transcriptional effects.

      We have added more information on the Ca<sup>2+</sup> imaging assay (now Figure 2E and the corresponding text in the Figure 2 legend of the revised manuscript). In particular, we added the timing of treatment to better show that the effect was direct.

      (4) Cellular Context of TRPV4 Activity Is Unclear:

      In situ hybridisation suggests TRPV4 expression shifts from interstitial to somatic domains under heat (Figures 2H, S2C), implying potential cell-type-specific roles. However, the study does not clarify: (i) whether TRPV4 plays the same role across these cell types, (ii) why somatic cells show stronger signal amplification, or (iii) the cellular composition of explants used in in vitro assays. Without this resolution, conclusions from pharmacological manipulation (e.g., GSK1016790A effects) cannot be definitively linked to specific cell populations.

      We thank you for the inspiring comments. We have performed IF experiments using Trpv4-specific antibodies (antibody specificity was confirmed), which clearly showed that Trpv4 was expressed in a portion of follicle cells. To explore the identity of the Trpv4-expressing somatic cells, we performed double IF experiments using Trpv4 and Foxl2, a granulosa cell marker. The results (Figure 2H) clearly showed that Trpv4-expressing cells are a subset of Foxl2-positive granulosa cells. We propose that Trpv4-expressing granulosa cells may play an important role in sensing temperature, and that they transdifferentiate into Sertoli cells upon warm temperature exposure, because Dmrt1, a Sertoli cell marker, first appears within follicles in a typical granulosa cell location. Unfortunately, the current Dmrt1 and Trpv4 antibodies are both raised in rabbit. To overcome this, we are ordering mouse Dmrt1 antibodies, and in the future we will perform Trpv4/Dmrt1 double IF to test whether Dmrt1-positive cells co-localize with Trpv4-expressing cells. We will update you with the results once the antibody is available.

      As our animal experiments (Figure 2H) have clearly shown the identity of the Trpv4-expressing somatic cells, we did not repeat the experiments using explants to explore the cellular composition of the explants used in the in vitro assays.

      (5) Rapid Trpv4 mRNA Elevation and Channel Function:

      The authors report a dramatic increase in Trpv4 mRNA within one day of heat exposure (Figures 4D, S2B). Given that TRPV4 is a membrane channel, not a transcription factor, its rapid transcriptional sensitivity to temperature raises mechanistic questions. This finding, while intriguing, seems more correlational than functional. A clearer explanation of how TRPV4 senses temperature at the molecular level is needed.

      We appreciate your inspiring comments. We have also been wondering how trpv4 mRNA is regulated by warm temperature. First of all, the up-regulation of trpv4 mRNA is genuine, as evidenced by multiple qPCR and ISH experiments. It appears that ovarian cells respond to temperature changes by increasing calcium influx via the Trpv4 ion channel, as well as by increasing trpv4 mRNA expression levels.

      How, then, is trpv4 mRNA regulated by heat? It is well known that gene expression can be regulated by subtle temperature changes via some direct temperature-sensing genes (Haltenhof et al., 2020). We hypothesize that trpv4 is a downstream target of these thermosensors, through a mechanism similar to that in mammals. We have performed some experiments and obtained preliminary data that support our hypothesis.

      Because this mechanistic study is ongoing and unpublished, we chose not to discuss it in detail in the revised manuscript. We hope to report it by the end of this year and will be pleased to update you on the progress then.

      (6) Inconclusive Evidence for the Ca<sup>2+</sup>-pSTAT3-Kdm6b Axis: Although the authors propose a TRPV4-Ca<sup>2+</sup>-pSTAT3-Kdm6b-dmrt1 pathway, intermediate steps remain poorly supported. For example, western blot data (Figures 3C, 4B) do not convincingly demonstrate significant pSTAT3 elevation at 34C. Higher-resolution and properly quantified blots are essential. The inferred signalling cascade is based largely on temporal correlation and pharmacological inhibition, which are insufficient to establish direct regulatory relationships.

      We thank you for the critical comments. In response to your concerns, we have repeated the experiments, and better-resolution WB data with proper quantification are included in the revised manuscript. In particular, we demonstrate that treatment at 34 °C caused a significant elevation of pStat3.

      To directly establish the regulatory relationships among the pathway members, at your suggestion we have provided genetic and molecular biology data to support our conclusion in the revised manuscript. For instance, we knocked down the stat3 gene using siRNAs and, as shown in Figure 6F, further showed that pStat3 is functionally downstream of Trpv4. Moreover, ChIP and luciferase assays were performed to show that pStat3 directly binds and activates kdm6b (Figure 7B-C). We have also performed various pharmacological inhibition experiments to further strengthen our conclusion (Figures 6B-E).

      (7) Species-Specific STAT3-Kdm6b Regulation Is Unresolved:

      The proposed activation of Kdm6b by pSTAT3 contrasts with findings in the red-eared slider turtle (Trachemys scripta), where pSTAT3 represses Kdm6b. This divergence in regulatory direction between the two TSD species is surprising and demands further justification. Cross-species differences in binding motifs or epigenetic context should be explored. Additional evidence, such as luciferase reporter assays (using wild-type and mutant pSTAT3 binding motifs in the Kdm6b promoter) is needed to confirm direct activation.

      We thank you for the inspiring comments. At your suggestion, we have performed luciferase assays using an intact or mutated kdm6b promoter. The results support our statement. Please see Figure 7C and the related text.

      A rescue experiment testing whether Kdm6b overexpression can compensate for pSTAT3 inhibition would also greatly strengthen the model.

      We thank you for the nice suggestion. It is technically challenging to perform kdm6b overexpression or any Kdm6b gain-of-function experiment (we tried to make lentivirus, but it did not work), and there are no Kdm6b-specific agonists.

      Inspired by your comment, we are establishing a constitutive kdm6b transgenic ricefield eel line. Although it will require at least a year for the fish to grow up for functional experiments, once the line is established we will be able to directly answer some important questions.

      (8) Immunofluorescence-Lack of Structural Markers:

      All immunofluorescence images should include structural markers to delineate gonadal boundaries. Furthermore, image descriptions in the figure legends and main text lack detail and should be significantly expanded for clarity.

      We thank you for the critical comments. In response, we first performed IF using beta-catenin as a structural marker; however, the results were not good, for unknown reasons. We then used Vimentin as a structural marker, as it can label all the cells in the gonads. Foxl2 was used as a granulosa cell marker, and Dmrt1 as a Sertoli cell marker.

      Essential descriptions have been added to the figure legends as requested. Please see the details in the revised manuscript.

      (9) Pharmacological Reagents-Mechanisms and References:

      The manuscript lacks proper references and mechanistic descriptions for the pharmacological agents used (e.g., GSK1016790A, RN1734, Stattic). Established literature on their specificity and usage context should be cited to support their application and interpretation in this study.

      These pharmacological agents have been used by others (Ge et al., 2017; Liu et al., 2021; Weber et al., 2020; Wu et al., 2024), and these studies are properly cited in the manuscript.

      (10) Efficiency of Experimental Interventions:

      The percentage of gonads exhibiting sex reversal following pharmacological or RNAi treatments should be reported in the Results. This is critical for evaluating the strength and reproducibility of the interventions.

      We thank you for the critical and important comments. Another reviewer raised the same question. We admit that this is the major shortcoming of the work, as we did not provide data demonstrating that Trpv4 inhibition/activation or pStat3 inhibition/activation can cause a gonadal phenotype change, for instance, from ovary to ovotestis, or sex reversal of the fish. We only showed that pharmacological or RNAi treatment can alter sex-biased gene expression or affect temperature-induced gene expression.

      In wild populations, the entire sex change of ricefield eel may take months to a year or even longer. We propose that the magnitude and duration of temperature exposure promote sex change in ricefield eel by driving the accumulation of testicular differentiation gene products in sufficient quantities. Under experimental conditions, to achieve a gonadal phenotype change, animals may need to undergo repeated pharmacological treatment (at 3-day intervals) for a longer time to reach a threshold; however, long-term treatment significantly increases the death rate of the animals, owing to stress or frequent manipulation. Our students have attempted these experiments; unfortunately, either the number of sex-reversing animals was small or the experiments lacked replication. Therefore, no percentage of gonadal transformation after treatment can be provided at this time, but we have indicated the number of samples used in the molecular experiments (showing expression of sex-biased genes).

      Inspired by your important comment, we are optimizing the experimental conditions in order to obtain phenotypic outcomes. Once this is achieved, the percentage of gonads exhibiting sex reversal following pharmacological or RNAi treatments can be calculated to demonstrate biological significance.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Editorial Concerns: 

      (1) The term "sex reversal" should be clearly defined upfront as female-to-male, and the developmental consequences (e.g., increase in body size post-transition) should be explicitly stated early in the introduction.

      We thank you for pointing this out. We have added this information to the Introduction. It reads “The species begins life as a female and then develops into a male through an intersex stage, thus displaying a female-to-male sex reversal during aging. Females are small in size (< 25 cm), and during and after sex change, there is a gradual increase in body size (> 55 cm for the majority of males).”

      Additional information is given in the first and second paragraphs of the Results.

      (2) The manuscript references skewed sex ratios in cultured ricefield eel but fails to specify the direction (e.g., too many males or females). This should be clarified to contextualize the biological and commercial problem. 

      Following your suggestion, we have added further information, which reads “The reproductive mode of ricefield eel, which leads to much more females than males in spawning season, severely affects the sex ratio, and decreases the productivity of broodstock. Moreover, adult females lay limited eggs (~200) due to its small size.”

      (3) Define TSD (temperature-dependent sex determination) upon first use, not at the second mention.

      We have checked this and made sure that TSD is defined at first use.

      (4) The phrase "quality fries for aquaculture" should be reworded or defined; it is unclear to non-specialists.

      We thank you for pointing this out. Now it reads “adult females lay limited eggs (~200) due to its small size, which is a limiting factor for massive production of seedling for aquaculture industry”.

      (5) Several in-text citations (e.g., Weber 2020, Wu 2024) are absent from the bibliography.

      We have double-checked the references. Thank you.

      (6) The inclusion of page and line numbers would facilitate peer review.

      We have now included page and line numbers.

      (7) The discussion is written vaguely. Clarify species names when discussing comparative biology and consider breaking down complex sentences to aid comprehension for a broad audience, such as that of eLife. 

      We have added the species names and tried our best to use concise expressions. Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Abdelmageed et al. investigate age-related changes in the subcellular localization of DNA polymerase kappa (POLK) in the brains of mice. Although POLK has been actively investigated for its role in translesion DNA synthesis and its involvement in other DNA repair pathways in proliferating cells, very little is known about POLK in a tissue-specific context, let alone in post-mitotic cells. The authors investigated POLK subcellular distribution in the brains of young, middle-aged, and old mice via immunoblotting of fractionated tissue extracts and immunofluorescence (IF). Immunoblotting revealed a progressive decrease in the abundance of nuclear POLK, while cytoplasmic POLK levels concomitantly increased. Similar findings were present when IF was performed on brain sections. Further, IF studies of the cingulate cortex (Cg1), the motor cortex (M1, M2), and the somatosensory (S1) cortical regions all showed an age-related decline in nuclear POLK. Nuclear speckles of POLK decrease in each region; meanwhile, the number of cytoplasmic POLK granules decreases in all four regions, while granule size increases. The authors report similar findings for REV1, another Y-family DNA polymerase.

      The authors then investigate the colocalization of POLK with other DNA damage response (DDR) proteins in either pyramidal neurons or inhibitory interneurons. At 18 months of age, DNA damage marker gH2AX demonstrated colocalization with nuclear POLK, while strong colocalization of POLK and 8-oxo-dG was present in geriatric mice. The authors find that cytoplasmic POLK granules colocalize with stress granule marker G3BP1, suggesting that the accumulated POLK ends up in the lysosome.

      Brain regions were further stained to identify POLK patterns in NeuN+ neurons, GABAergic neurons, and other non-neuronal cell types present in the cortex. Microglia associated with pyramidal neurons or inhibitory interneurons were found to have a higher abundance of cytoplasmic POLK. The authors also report that POLK localization can be regulated by neuronal activity induced by Kainic acid treatment. Lastly, the authors suggest that POLK could serve as an aging clock for brain tissue, but POLK deserves further characterization and correlation to functional changes before being considered as a biomarker.

      Strengths:

      Investigation of TLS polymerases in specific tissues and in post-mitotic cells is largely understudied. The potential changes in sub-cellular localization of POLK and potentially other TLS polymerases open up many questions about DNA repair and damage tolerance in the brain and how it can change with age.

      Weaknesses:

      The work is quite novel and interesting, and the authors do suggest some potentially interesting roles for POLK in the brain, but these are in and of themselves a bit speculative. The majority of the findings of this paper draw upon findings from POLK antibody and its presumed specificity for POLK. However, this antibody has not been fully validated and needs further work. Further validation experiments using Polk-deficient or knocked-down cells to investigate antibody specificity for both immunoblotting and immunofluorescence should be performed. More mechanistic investigation is needed before POLK could be considered as a brain aging clock.

      We are thankful for the overall enthusiasm and positive comments.

      (a) Concern over POLK antibody characterization in mouse:

      We performed siRNA and shRNA knockdowns in mouse primary cortical neurons, as well as in efficiently transfectable murine lines such as 4T1 and Neuro-2A, showing knockdown of the 99 kDa and 120 kDa bands recognized by the sc-166667 anti-POLK antibody (Figures 1 and S1). We show that, in IF, sc-166667 and A12052 give similar immunostaining patterns (Figure S1G); we used sc-166667 in all reported figures and western blots.

      (b) More mechanistic investigation is needed before POLK could be considered as a brain aging clock:

      We sincerely appreciate the valuable suggestion. We agree that, as a terminal assay, POLK nucleo-cytoplasmic status is not practical for longitudinal studies. However, we believe it may serve as an investigative/correlative endogenous signal for determining tissue age that may be useful to "date" brain sections, since not many such cell biological markers exist. We have added clarifying text to address this.

      Reviewer #2 (Public review):

      Summary:

      Abdelmageed et al., demonstrate POLK expression in nervous tissue and focus mainly on neurons. Here they describe an exciting age-dependent change in POLK subcellular localization, from the nucleus in young tissue to the cytoplasm in old tissue. They argue that the cytosolic POLK is associated with stress granules. They also investigate the cell-type specific expression of POLK, and quantitate expression changes induced by cell-autonomous (activity) and cell nonautonomous (microglia) factors.

      I think it is an interesting report but requires a few more experiments to support their findings in the latter half of the paper. Additionally, a more mechanistic understanding of the pathways regulating POLK dynamics between the nucleus and cytosol, what is POLK doing in the cytosol, and what is it interacting with; would greatly increase the impact of this report. However, additional mechanistic experiments are mostly not needed to support much of the currently presented results, again, it would simply increase the impact.

      (a) Concern on more mechanistic understanding of the pathways regulating POLK dynamics between the nucleus and cytosol:

      We sincerely appreciate the reviewer’s enthusiasm and valuable guidance in helping us better understand the mechanism of nuclear-cytoplasmic POLK dynamics. Previously, we developed a modified aniPOND (accelerated native isolation of proteins on nascent DNA) protocol, which we termed iPoKD-MS (isolation of proteins on Pol kappa synthesized DNA followed by mass spectrometry), to capture proteins bound to nascent DNA synthesized by POLK in human cell lines (bioRxiv https://www.biorxiv.org/content/10.1101/2022.10.27.513845v3). In this dataset, we identified potential candidates that may regulate nuclear/cytoplasmic POLK dynamics. These candidates are currently undergoing validation in human cell lines, and we are preparing a manuscript on these findings. Among these, some candidates, including previously identified proteins such as exportin and importin (Temprine et al., 2020, PMID: 32345725), are being explored further as potential POLK nuclear/cytoplasmic shuttles. We are also conducting tests on these candidates in mouse cortical primary neurons to assess their role in POLK dynamics. In the revised version of the manuscript, we have included a discussion of our current understanding.

      (b) Question on “… what is POLK doing in the cytosol, and what is it interacting with …”: Our data so far indicate that POLK accumulates in stress granules and lysosomes. We are very grateful for the reviewer’s insightful suggestions and will make every effort to incorporate them in the revised manuscript. We characterized POLK accumulation in the cytoplasm using six additional endo-lysosomal markers, as recommended by the reviewer. These data are now part of an entirely new Figure 3.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors show that DNA polymerase kappa POLK relocalizes in the cytoplasm as granules with age in mice. The reduction of nuclear POLK in old brains is congruent with an increase in DNA damage markers. The cytoplasmic granules colocalize with stress granules and endo-lysosome. The study proposes that protein localization of POLK could be used to determine the biological age of brain tissue sections.

      Strengths:

      Very few studies focus on the POLK protein in the peripheral nervous system (PNS). The microscopy approach used here is also very relevant: it allows the authors to highlight a radical change in POLK localization (nuclear versus cytoplasmic) depending on the age of the neurons. 

      The conclusions of the study are strong. Several types of neurons are compared, the colocalization with several proteins from the NHEJ and BER repair pathways is tested, and microscopy images are systematically quantified.

      Weaknesses:

      The authors do not discuss the physical nature of POLK granules. There is a large field of research dedicated to the nature and function of condensates: in particular, numerous studies have shown that some condensates, but not all, exhibit liquid-like properties (https://www.nature.com/articles/nrm.2017.7, https://pubmed.ncbi.nlm.nih.gov/33510441/, https://www.mdpi.com/2073-4425/13/10/1846). The change of physical properties of condensates is particularly important in cells undergoing stress and during aging. The authors should discuss this literature.

      We highly appreciate the reviewer bringing up the context of biomolecular condensates. Our iPoKD-MS data referenced above suggest candidates from various biomolecular condensates that we are currently investigating. We appreciate the reviewer providing important literature; we have cited these articles in the text, and potential biomolecular condensates are discussed in the revised version.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The work is quite novel and interesting, and the authors do suggest some potentially interesting roles for POLK in the brain, but these are in and of themselves a bit speculative. The majority of the findings of this paper rely upon the POLK antibody and its specificity for POLK, which is not fully characterized and needs further work (validation of antibodies using immunoblots of Polk KO cells or siRNA KD of POLK in murine cells) to provide confidence in the authors' findings.

      Points

      siRNA knockdown of Polk in primary neurons showed a dramatic reduction in signal by IF even though qPCR analysis showed a reduction of only ~35% at the transcript level. Typically many DNA repair genes need to be knocked down by 80% or more to see discernable differences at the protein level. siRNA knockdown in a murine cell line (MEFs, neurons, or some other easily transfectable cell type) needs to be performed with immunoblotting with whole cell and fractionated (nuclear/cytoplasmic) lysates in order to better validate the anti-POLK antibodies and which bands that are visualized during immunoblotting are specific to POLK.

      We performed siRNA and shRNA knockdowns in mouse primary cortical neurons, as well as in efficiently transfectable murine lines such as 4T1 and Neuro-2A, showing knockdown of the 99kDa and 120kDa bands recognized by the sc-166667 anti-POLK antibody (Figure 1 and S1). We show that, in IF, sc-166667 and A12052 (Figure S1G) show similar immunostaining patterns, and we used sc-166667 in all reported figures and western blots.

      Figure 1B and C, it is not clear which antibody(ies) are used for the immunoblotting of nuclear and cytoplasmic fractions and for a blot with whole tissue lysates. Please place the antibody vendor or clone next to the corresponding blot or describe it in the figure legend. Bands of varying sizes are present in 1B (and Figure S1) but only a band at 99 kDa was shown in 1C. Because there are no bands of equivalent size present in the nuclear and cytoplasmic fractions in Figure 1B, please describe or denote which bands were used for quantification purposes for nuclear and cytoplasmic POLK.

      This has been clarified by using only one antibody, sc-166667, throughout the manuscript. In whole cell lysate we observed an intense ~99kDa band and a faint ~120kDa band, which becomes intense in the nuclear fraction and is absent from the cytoplasmic fraction. We have noted this in multiple human cell lines and hiPSC-derived neurons, which is our ongoing work. We do not yet know whether the ~120kDa band is a modification or an isoform of POLK. We have hints from our proteomics data that it may be SUMOylated, ubiquitinylated, or carry other post-translational modifications. We added this to the discussion section.

      Figure 1I, is there a quantification beyond just the representative image? There is no green staining pattern outside the cytoplasm in the 1-month-old M1 images that is present in all the other images in the panel.

      Fig 1I is now Fig S1G in the revised manuscript. Since REV1 and POLH were not central to the study, which focused on POLK, they were meant to be exploratory data panels, and as such we did not quantify beyond the qualitative evaluation, which broadly resembled POLK’s disposition with age. We have noted that there is some sample-to-sample variability in the background signal. In general, the signal outside the cytoplasm, as subcellularly segmented by fluorescent Nissl staining, tends to vary by brain area and to be higher in older brains.

      "Association with PRKDC further suggests POLK's role in the "gap-filling" step in the NHEJ repair pathway in neurons." There is no strong evidence in the literature for mammalian POLK playing a role in NHEJ. Some description of a role in HR has been described, however. The reference regarding the iPoKD-MS data set that provides evidence of POLK associating with BER and NHEJ factors is listed as Paul, 2022 but is in the reference list as Shilpi Paul 2022.

      We removed this speculative statement and fixed the citation.

      Figure 4A, what is the age of the mouse for the representative images?

      19 months; this is now mentioned in the figure legend.

      Figure 4C, Could the data from the different ages be plotted side by side to better evaluate the differences for each cell type/region?

      The data are now plotted side by side.

      Why was the one-month time point chosen as this could still represent the developing and not mature murine brain? 

      The reviewer correctly noted that a 1-month-old brain is still developing, but mostly from the standpoint of behavioral and circuit maturation. From the perspective of cell division and neurogenesis, however, development is considered complete by the first postnatal month, with neuron production thereafter largely restricted to specialized adult niches in the dentate gyrus and the subventricular zone–olfactory bulb pathway; these adult neurogenic stem cells are embryonically derived and are regulated in ways that are distinct from the early, expansionary developmental waves of neurogenesis. In our study, we performed our measurements in cortical areas only (Caviness et al., 1995, PMID: 7482802; Ansorg et al., 2012, PMID: 22564330; Ming & Song, 2011, PMID: 21609825; Bond et al., 2015, PMID: 26431181; Bond et al., 2021, PMID: 33706926; Bartkowska et al., 2022, PMID: 36078144). Also, Figure 6A incorrectly stated that the young brains were just 1 month old; we rechecked our metadata and found that the young group comprised 1- and 2-month-old brains, and this has now been corrected.

      Furthermore, can the authors describe which sex of mice was used in these experiments and the justification if a single sex was used? If both sexes were used, were there any dimorphic differences in POLK localization patterns?

      This is an important aspect. Initially, to keep mouse numbers within manageable limits, we focused on the age component. Although both male and female brains were assayed, the uneven sample distribution between sexes meant that we could not estimate whether there were any statistically significant sexually dimorphic differences in INs, PNs and NNs. Future studies will investigate the sex component as a function of age.

      The suggestion of POLK as a brain aging clock may be a bit premature as the functional and behavioral consequences of cytoplasmic POLK sequestration are not fully known. Furthermore, investigation of POLK levels in other genetic models of neurodegeneration or with gerotherapeutics would be needed to establish if the POLK brain clock is responsive to changes that shift brain aging. Lastly, this clock may be impractical and not useful for longitudinal studies due to the terminal nature of assessing POLK levels.

      We agree that, as a terminal assay, POLK nucleo-cytoplasmic status is not practical for longitudinal studies. However, we believe it may serve as an investigative/correlative endogenous signal for determining tissue age that may be useful to "date" brain sections, since not many such cell biological markers exist. We have added clarifying text.

      Some discussion of the Polk-null mice is warranted, as they only have a slightly shortened lifespan, and any disease phenotypes were not reported. This stands in contrast to other DNA repair-deficient mice that mimic premature aging and show behavioral and motor deficits. This calls into question the role of POLK in brain aging.

      Discussion statements on Polk-null mice have been added.

      Please correct the catalog number for the SCBT anti-POLK antibody to sc-166667

      The typographical error has been corrected.

      Reviewer #2 (Recommendations for the authors):

      Results:

      Figure by figure 

      (1) A progressive age-associated shift in subcellular localization of POLK The authors state that POLK has not been studied in nervous tissue before and they want to see if it is expressed, and if it changes subcellular location as a function of age. The authors argue age = stress like that seen in previous models using genotoxic agents and cancer cells. Indeed, POLK seems to convincingly change subcellular location from the nucleus to larger cytosolic puncta. 

      (2) Nuclear POLK co-localizes with DNA damage response and repair proteins This was a difficult dataset for me to decipher. To me, it appears as though POLK colocalizes with these examined proteins in the CYTOSOL, not the nucleus. Especially, in the oldest mice.

      We added in the discussion that DNA repair proteins were observed to be present in the cytoplasm and biomolecular condensates citing relevant reviews and primary references.

      (3) POLK in the cytoplasm is associated with stress granules and lysosomes in old brains LAMP1 has some issues as a lysosome marker. The authors even state it can be on endosomes. It would be nice to use a marker for mature lysosomes, some fluorescent reporter that is activated only by lysosomal proteases or pH. It is also of interest if POLK is localized to the membrane or the inside of these structures. The authors have access to an airyscan which is sufficient to examine luminal vs membrane localization on larger organelles like lysosomes.

      We thank the reviewer for pushing us to investigate the nature of cytoplasmic POLK in endo-lysosomal compartments. We have now added a full-page figure (Figure 3) with the cell biological results from six different markers, a subset of which (Cathepsin B and D) are known to be present in the lumen of endo-lysosomes. Further high-resolution analysis of membrane versus lumen localization was not pursued; this is perhaps better suited to cultured neurons than to thick fixed tissues.

      (4) Differentially altered POLK subcellular expression amongst excitatory, inhibitory, and nonneuronal cells in the cortex.

      This seems fine. I don't see anything wrong with the author's statement that there is more POLK in neurons vs non-neuronal cells. 

      (5) Microglia associated with IN and PN have significantly higher levels of cytoplasmic POLK I don't see really any convincing evidence of the author's claim here. They find a difference at early-old age, but not at old-old, or other ages. This is explained by "However, this effect is lost in late-old age (Figure 5D), likely due to the MG-mediated removal of the INs.". But no trend being observed, no experiment to show sufficiency, and no experiment to uncover a directional relationship; this is a tough claim to stand by.

      Changes have been made in the text to reflect the speculative nature of this observation.

      (6) Subcellular localization of POLK is regulated by neuronal activity

      Interesting and fairly difficult experiment. Can the authors talk more about what these values mean? I am confused as to why there is a decline in nuclear puncta at 80 min. Also, why are POLK counts in 6c similar at baseline between young and early-old? In Figures 5 and 6 I also worry about statistical analysis. Are all assumptions checked to use t-tests? Why not always use a test that has fewer assumptions?

      We have explained in the text that acute slice preparations lasting a few hours are artificial and inherently stressful, especially for old brains, and are very different from the vascular-perfused, PFA-fixed brain tissues compared between young and old ages.

      We don’t have a proper explanation for the initial dip in nuclear puncta at 80 min, which is of very similar magnitude in young and old brains. It could be a separate biological phenomenon that occurs at much shorter time scales, which would not otherwise be captured in a fixed tissue assay and needs careful investigation using live tissue fluorescence imaging that is beyond the scope of this manuscript.

      We apologize for the typographical error in the figure legend. We rechecked our R code, and all tests were two-sided, nonparametric Wilcoxon rank-sum (Mann–Whitney U) tests.

      Figure 6B & E had absurdly small p values due to large sample numbers. We therefore implemented random sampling of 100 cells, repeated 200 times, and present the distributions of p values and Cohen’s d in the supplement; the median p value and median Cohen’s d are reported in the main plot.
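
      The repeated-subsampling idea described above can be sketched as follows. This is only an illustrative Python sketch, not the authors' R code: the function names, the choice of 100 cells per group drawn without replacement, the fixed random seed, and the normal approximation (with tie correction, without continuity correction) used for the rank-sum p value are all assumptions made for the example.

```python
import math
import random
import statistics


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of two samples."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


def rank_sum_p(a, b):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) p value.

    Normal approximation with average ranks and tie correction,
    no continuity correction (an assumption of this sketch).
    """
    na, nb = len(a), len(b)
    n = na + nb
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_a - na * (na + 1) / 2.0
    mu = na * nb / 2.0
    var = na * nb / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return 1.0
    z = (u - mu) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def subsampled_stats(group_a, group_b, n_cells=100, n_repeats=200, seed=0):
    """Repeat the test on random subsamples; return medians and distributions."""
    rng = random.Random(seed)
    p_values, effect_sizes = [], []
    for _ in range(n_repeats):
        sub_a = rng.sample(group_a, n_cells)  # hypothetical: per group, no replacement
        sub_b = rng.sample(group_b, n_cells)
        p_values.append(rank_sum_p(sub_a, sub_b))
        effect_sizes.append(cohens_d(sub_a, sub_b))
    return (
        statistics.median(p_values),
        statistics.median(effect_sizes),
        p_values,
        effect_sizes,
    )
```

      Fixing the subsample size keeps the p value from shrinking simply because thousands of cells were measured, while the effect size (Cohen's d) conveys the magnitude of the difference independently of sample size.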

      (7) POLK as an endogenous "aging clock" for brain tissue

      Trainable model. What are the criteria for the model, and how does it work? The cutoffs it uses to classify each age group might be interesting in that the model may have identified a trait the researchers were unaware of. Otherwise, it is not especially useful. Maybe as an independent 'blind' analysis of the data?

      We have added a better description of the models, assumptions and how two different unsupervised approaches converge on the same set of features with high AUROCs.

      Minor questions:

      The cartoons (1a, 2a-b, 5a, 6a) help a lot. However, I still had to work a bit to understand some of the graphs (e.g., 5d, 6b-e, fig 7). Is there a simpler way to present them? Maybe simply additional labelling? I'm not sure.

      A more thorough discussion of statistical tests is warranted I think. I am not very clear why some were chosen (t-test vs nonparametric with fewer assumptions). Infinitesimally small p values also make me think maybe incorrect tests were done or no power analysis was performed beforehand. A fix for this is just discussing what went into the testing methods and why they were chosen.

      The statistical analyses for Fig2 (using Generalized Estimating Equations) and Fig6 (with random repeated subsampling; the method is explained in the text, the figure legend has been updated, and supplementary data on the distributions of p values and Cohen’s d have been added) have been revised to address the very small p values. The descriptions have been rewritten in the relevant text.

      In the absence of further mechanistic experiments, it would still be interesting to hear what the authors think is going on and what the significance of this altered subcellular location means. How do the authors think this is occurring? I think they are arguing that cytosolic localization of POLK is 100% detrimental to the neuron. ("The reduction of nuclear POLK in old brains is congruent with an increase in DNA damage markers") Do they have any idea what the 'bug' is in the POLK system then?

      Statements have been added to the discussion.

      Reviewer #3 (Recommendations for the authors):

      POLK is detected as small "speckles" inside the nucleus at a young age (1-2 months) and larger "granules" can be seen in the cytoplasm at progressively older time points (>9 months). In the nucleus, is POLK bound to DNA? In the cytoplasm, how are the POLK molecules organized: are they bound to a substrate or are they just organized as a protein condensate without DNA?

      In the human U2OS cell line, DNase I treatment leads to loss of POLK from the nucleus, as well as loss of its activity, as reported in Fig5 of Paul, S. et al. 2023 bioRxiv. While we haven’t reproduced these results in mouse primary neurons, we anticipate a similar situation, which will be tested in the future. We have addressed limited aspects of POLK in the cytoplasm in the all-new Fig3 with six endo-lysosomal markers, and added text.

      When POLK proteins accumulate in the cytoplasm in aging cells, do they also repair condensates in the cytoplasm? What is the function of cytoplasmic POLK granules? More generally, is it known if other granules or foci, such as repair foci are found in the cytoplasms in aging cells, or in cells under stress?

      Six markers for endo-lysosomes were tested to characterize the cytoplasmic granules; the results are now shown in Fig3.

      While the authors quantify the number and sizes of the POLK signal, they don't discuss their physical nature. Some membrane-less condensates exhibit liquid-like properties, such as stress granules, P-bodies, or in the nucleus some repair condensates. In some diseased tissues, some condensates lose their liquid properties and become solid-like. Is it known if POLK condensates behave like liquid condensates or they are simply formed by bound molecules on DNA? Since they are larger and fewer in the cytoplasm, is it because several small puncta fused together to form a larger one? It would be worthwhile to discuss these points.

      Discussion statements on the nature of condensates in the context of the POLK cytoplasmic signal have been added.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank the reviewer for great suggestions.

      (1) The X-axis labels in some panels in Figure 2C and Supplementary Figure 2B overlap, making them difficult to read. Adjusting the label spacing or formatting would improve clarity.

      We thank the reviewer for the comment. All panels, including Figure 2C and Supplementary Figure 2B, have now been reorganized so that the X-axis labels are easy to read.

      (2) In the scatter dot plot bar diagrams, it appears that n=3 for most of the data. Does this represent the number of mice used or the number of tissue sections per sample? This should be clarified in the figure legends for better transparency. 

      Great suggestion. In Results (page 7, lines 135-136), we have now clarified that quantification was performed on every tenth section of the brain from 3 female and 3 male mice. Additionally, the legends for the scatter dot plot bar diagrams now state that n=3 represents the number of mice used.
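
      To illustrate why n refers to mice rather than sections, the sketch below collapses section-level measurements to one value per animal before any group summary is computed. It is a hypothetical Python example: the response does not state whether section values were averaged or summed, and the function name, mouse identifiers, and numbers are invented for illustration.

```python
import statistics
from collections import defaultdict


def per_mouse_means(section_values):
    """Collapse (mouse_id, value) pairs from individual sections to one mean per mouse.

    Averaging is an assumption of this sketch; the point is only that each
    mouse, not each section, contributes a single data point (n = mice).
    """
    by_mouse = defaultdict(list)
    for mouse_id, value in section_values:
        by_mouse[mouse_id].append(value)
    return {mouse_id: statistics.mean(values) for mouse_id, values in by_mouse.items()}


# Hypothetical transcript densities from sampled sections of three mice.
example = [
    ("F1", 2.0), ("F1", 4.0),
    ("F2", 3.0), ("F2", 5.0),
    ("F3", 1.0), ("F3", 3.0),
]
means = per_mouse_means(example)
print(len(means), "mice:", means)
```

      Treating each section as an independent observation would inflate the apparent sample size, which is why the per-animal value is the unit plotted and compared.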

      (3) In Supplemental Figure 2B, the positive signals are not clearly visible. Providing higher-magnification images is recommended.

      Great suggestion. The revised Supplemental Figure 2B, as well as Figure 2A, now provides higher-magnification inset images with distinctive positive signals.

      Reviewer #2:

      We thank the reviewer for great and critical suggestions.

      (1) Introduction:

      Line 58: References should be provided for this statement as it is based on a robust field of research, not on a new concept.

      We thank the reviewer for the comment. We have now included relevant references as suggested (page 4, line 58).

      (2) Line 100-102: This sentence seems to make new, an idea that has been well-documented since the late 1970s. Posterior pituitary hormones oxytocin and vasopressin have long been known to have multiple peripheral targets, and at least a subset of vasopressin and oxytocin neurons have robust central projections. The central targets have been the focus of study for numerous labs. Reference 34 does not relate to posterior pituitary hormones and seems mis-cited.

      We have changed this sentence, excluded the reference that does not relate to posterior pituitary hormones and added 4 further references reporting other non-traditional roles of vasopressin and oxytocin (page 6, lines 100-102).

      (3) Lines 102-108: While the regulation of bone is an interesting example of an under-appreciated impact of vasopressin, the example does not build to the rationale for examining central Avp and Avpr1a expression.

      We mean no disrespect here, but we have recently reported neural brain-bone connections using the SNS-specific PRV152 virus (Ryu et al., 2024; PMID: 38963696) and have submitted a "Single Transcript Level Atlas of Oxytocin and the Oxytocin Receptor in the Mouse Brain" (doi: https://doi.org/10.1101/2024.02.15.580498). Surprisingly, we detected Avpr1a and Oxtr expression in certain brain areas (for example, PVH and MPOM) that connect to both bone and adipose tissue through the SNS, raising an important question regarding a central role of Avpr1a and Oxtr in bodily mass and fat regulation.

      (4) Line 111: Avp expression and Avpr1a expression have both been studied using in situ hybridization. Thus, the overall concept is less novel than hinted at in the text. Avp expression has been studied quite extensively. Avpr1a expression has not been studied in an exhaustive fashion. 

      We thank the reviewer for this comment and absolutely agree that brain AVP expression has been studied extensively. As for Avpr1a, we believe that the RNAscope probe design and signal amplification system employed in our study allow more specific and sensitive detection of individual RNA targets at the single-transcript level, with much lower background noise than the in situ hybridization method.

      (5) Results:

      Line 143: RNAscope is indeed a powerful method of detecting mRNA at the single transcript level. However, using that single transcript resolution only to provide transcript per brain region analysis, losing all of the nuance of the individual transcript expression, seems like a poor use of the method potential.

      This is a good point, and we did notice that Avpr1a transcript expression in several brain nuclei displayed an individual pattern of expression, versus more ubiquitous expression in most of the other brain areas. We noted this finding in Results (page 9, lines 164-168); however, because of the word limits in Discussion, we are not sure what would be dropped to make more room and whether this is truly necessary.

      (6 & 7) Line 135: Sections were coded from 3 males and 3 females. I would argue that there is not enough statistical power to make inferences regarding sex differences or regional differences. In fact, the authors did not provide any statistical analysis in the manuscript at all, even though they stated they had completed statistical tests in the methods.

      150-157: All statements regarding sex differences in expression are made without statistical analyses, which, if conducted, would be underpowered. Given the limitations of performing and analyzing RNAscope data en masse a low n is understandable, but it requires a much more precise description of the data and a more careful look at how the results can be interpreted.

      We thank the reviewer for these comments. We mean no disrespect here, but while statistical analysis for main brain regions is relevant, it is not meaningful as far as nuclei, sub-nuclei and regions are concerned. It is noteworthy to mention that RNAscope data analysis in the whole mouse brain is an extremely drawn-out process requiring almost 2 months to conduct exhaustive manual counting of single Avpr1a transcripts in a single mouse brain—authors analyzed 6 brains. That said, statistical tests have been performed and exact P values are now shown in graphs.

      (8) Line 146: I am flagging this instance, but it should be corrected everywhere it occurs. Since we cannot know the gender of a given mouse, I would recommend referring to the mouse's "sex" rather than its "gender."

      Good suggestion. We made the appropriate changes throughout the manuscript.

      (9) Line 153: The authors switch to discussing cell numbers. Why is this data relegated to the supplemental material?

      The main figures in the manuscript report Avp and Avpr1a transcript density, which has greater biological significance in terms of signal efficiency and cellular response dynamics. Because of the abundance of graphs in the main text, we included all graphs with Avp and Avpr1a transcript counts in the supplemental material.

      (10) Methods:

      Line 369: "For simplicity and clarity, exact test results and exact P values are not presented." Simplicity or clarity is not a scientific rationale not to provide accurate statistics.

      We now provide exact P values in the graphs and the sentence in line 369 has been corrected accordingly (page 18, lines 379-380).

      (11) Line 362: The description of how data were analyzed is inadequate. More detail is needed.

      Agreed. We have now included a detailed description of how the data were analyzed (page 18, lines 365-374).

      (12) Discussion:

      Line 321: "This contrasts the rudimentary attribution of a single function per brain area." While brain function is often taught in such rudimentary terms to make the information palatable to students, I do not think the scientific literature on vasopressin function published over the past 50 years would suggest that we are so naïve in interpreting the functional role of vasopressin in the brain. Clearly, vasopressin is involved in numerous brain functions that likely cross behavioral modalities.

      Agreed and we removed this sentence.

      (13) Line 322: "The approach of direct mapping of receptor expression in the brain and periphery provides the groundwork." On its face, this statement is true, but the present data build on the groundwork laid by others (multiple papers from Ostrowski et al. in the early 1990s).

      Agreed.

      (14) Figures:

      Figure 1: 1B, I do not know the purpose of creating graphs with single bars (3V, ic, pir-male, and pir-female); there are no comparisons made in the graph. In the graphs with many brain regions, very little data can be effectively represented with the scale as it is. I recommend using tables to provide the count/density data and making graphs of only the most robust areas. In addition, although there is no statistical comparison, combining males and females in the same graphs might be beneficial to make a visual comparison easier. Why were cell counts only included in the supplemental material? Is that data not relevant?

      We thank the reviewer for this comment. Now all figures are presented in a more effective and aesthetically pleasing way.

      (15) There is a real missed opportunity to highlight some of the findings. For example, cell counts and density measures are provided for regions in the hippocampus, thalamus, and cortex that are not typically reported to contain vasopressin-expressing cells. Photomicrographs of these locations showing the RNAscope staining would be far more impactful in reporting these data. Are there cells expressing Avp, or is the Avp mRNA in these areas contained in fibers projecting to these areas from hypothalamic and forebrain sources?

      Great suggestion. We now include Figure 1D, showing novel Avp transcript expression in the hippocampus, thalamus and cortex. Based on hematoxylin counterstaining, Avp mRNA transcripts were found in somata.

      (16) Figure 1C legend suggests images of the hippocampus and cortex, but all images are from the hypothalamus. Abbreviations are not defined.

      Thank you for the comment. We corrected Figure 1C legend and separately included Figure 1D showing novel Avp mRNA expression in the hippocampus and cortex.

      (17) Figure 2: The analysis of Avpr1a suffers from some of the same issues as the Avp analysis. In Figure 2A, the photomicrographs do not do a very good job of illustrating representative staining. The central canal image does not appear to have any obvious puncta, but the density of Avpr1a puncta suggests something different. The sex difference in the arcuate is also not clearly apparent in representative images. There is minimal visualization of the data for a project that depends so heavily on the appearance of puncta in tissue, coupled with the lack of clarity in the images provided, greatly diminished the overall enthusiasm for the data presentation. The figures in 2C would be more useful as tables with graphs used to highlight specific regions; as is, most of the data points are lost against the graph axis. Photomicrographs would also provide a better understanding of the data than graphs.

      Great suggestion. The revised Figure 2A and Supplemental Figure 2B now provide higher-magnification inset images with distinct positive signals. As for Figure 2C, we arranged all graphs in a more effective and aesthetically pleasing manner.

      (18) Figure 3: Given the low number of animals and, therefore, low statistical power, I do not think that illustrating the ratios of male to female is a statistically valid comparison.

      Please see response to Point 6 & Point 7.

      (19) Figure 4: Pituitary is an interesting choice to analyze. However, why was only the posterior pituitary analyzed? Were Avp transcripts contained in terminals of vasopressin neuron axons or other cells? Was Avpr1a transcript present in blood vessel cells where Avp is released? A different cell type? Why not examine the anterior pituitary, which also expresses Avp receptors (although the literature suggests largely Avpr1b)?

      Thank you for the great comment. We included only the posterior pituitary because no Avp- or Avpr1a-positive transcripts were found in the anterior pituitary. Unfortunately, we have not performed cell type-specific staining, which would have revealed how expression of AVP and its receptor varies across cell types.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript titled, "Sleep-Wake Transitions Are Impaired in the AppNL-G-F Mouse Model of Early Onset Alzheimer's Disease", is about a study of sleep/wake phenomena in a knockin mouse strain carrying "three mutations in the human App gene associated with elevated risk for early onset AD". Traditional, in-depth characterization of sleep/wake states, EEG parameters, and response to sleep loss are employed to provide evidence, "supporting the use of this strain as a model to investigate interventions that mitigate AD burden during early disease stages". The sleep/wake findings of earlier studies (especially Maezono et al., 2020, as noted by the authors) were extended by several important, genotype-related observations, including age-related hyperactivity onset that is typically associated with increased arousal, a normal response to loss of sleep and to multiple sleep latency testing, and a stronger AD-like phenotype in females. The authors conclude that the AppNL-G-F mice demonstrate many of the human AD prodromal symptoms and suggest that this strain may serve as a model for prodromal AD in humans, confirming the earlier results and conclusions of Maezono et al. Finally, based on state bout frequency and duration analyses, it is suggested that the AppNL-G-F mice may develop disruptions in mechanism(s) involved in state transition.

      Strengths:

      The study appears to have been, technically, rigorously conducted with high quality, in-depth traditional assessment of both state and EEG characteristics, with the concordant addition of activity and temperature. The major strengths of this study derive from observations that the AppNL-G-F mice: (1) are more hyperactive in association with decreased transitions between states; (2) maintain a normal response to sleep deprivation and have normal MSLT results; and (3) display a sex specific, "stronger" insomnia-like effect of the knockin in females.

      Weaknesses:

      The weaknesses stem from the study's impact being limited due to its being largely confirmatory of the Maezono et al. study, with advances of importance to a potentially more focused field. Further, the authors conclude that AppNL-G-F mice have disrupted mechanism(s) responsible for state transition; however, these were not directly examined. The rationale for this conclusion is stated by the authors as based on the observations that bouts of both W and NREM tend to be longer in duration and decreased in frequency in AppNL-G-F mice. Although altered mechanism(s) of state transition (it is not clear what mechanisms are referenced here) cannot be ruled out, other explanations might be considered. For example, increased arousal in association with hyperactivity would be expected to result in increased duration of W bouts during the active phase. This would also predictably result in greater sleep pressure that is typically associated with more consolidated NREM bouts, consistent with the observations of bout duration and frequency.

      Reviewer 1 succinctly summarizes the advances of this study beyond the ground-breaking Maezono et al. (2020) study of this “humanized” mouse model exhibiting amyloid deposition. Whereas Maezono et al. conducted sleep/wake studies on male App<sup>NL-G-F</sup> mice at 6 and 12 months of age, we had the unusual opportunity to study both sexes of homozygous App<sup>NL-G-F</sup> mice and WT littermates at 14-18 months of age and to conduct a longitudinal assessment of many of the same individuals at 18-22 months. In addition to baseline sleep/wake and EEG spectral analyses, we (1) measured subcutaneous body temperature and activity to obtain a broader picture of the physiology and behavior of this strain at advanced ages; (2) assessed baseline sleepiness in this strain using the murine version of the clinically-relevant Multiple Sleep Latency Test (MSLT); (3) evaluated the response of App<sup>NL-G-F</sup> mice and WT littermates to a perturbation of the sleep homeostat; (4) compared the sleep/wake characteristics of male vs. female App<sup>NL-G-F</sup> mice at 18-22 months; and (5), to assess the stability of the phenotypes, analyzed these data over a continuous 14-d recording rather than the conventional 24-h recordings typical of most sleep/wake studies, including Maezono et al. We found that a long wake/short sleep phenotype was characteristic of homozygous App<sup>NL-G-F</sup> mice at these advanced ages, a phenotype that is also evident in the Maezono et al. (2020) study at 12 months of age (but not at 6 months), although the authors do not comment on it and instead focus on the reduced REM sleep, which is particularly evident in female App<sup>NL-G-F</sup> mice in our study. Remarkably, we find that App<sup>NL-G-F</sup> mice, despite being awake ~20% longer per day, are no sleepier than WT mice as determined by the MSLT and that their sleep homeostat is intact when challenged by 6-h sleep deprivation.
      At both advanced ages, the long wake/short sleep phenotype is due primarily to longer Wake bouts and shorter bouts of both NREM and REM sleep during the dark phase. Moreover, hyperactivity develops in older App<sup>NL-G-F</sup> mice, particularly females, which contributes to this phenotype. We agree with Reviewer 1 that “hyperactivity would be expected to result in increased duration of W bouts during the active phase” and that this could result in more consolidated NREM bouts, and we will modify the manuscript to discuss this alternative. However, the suggestion of greater sleep pressure is not borne out by the MSLT studies, as we did not observe the shorter sleep latencies and increased sleep during the nap opportunities on the MSLT that we have observed in other mouse strains. Moreover, due to their short sleep phenotype, App<sup>NL-G-F</sup> mice would be entering the sleep deprivation study with a greater sleep debt than WT mice, yet we did not observe greater EEG Slow Wave Activity in this strain during recovery from sleep deprivation. Thus, we have suggested that App<sup>NL-G-F</sup> mice are unable to transition from Wake to sleep as readily as their WT littermates. Our observations summarized above set the stage for subsequent mechanistic studies in aged App<sup>NL-G-F</sup> mice, although realistically, mice of this age and genotype are a rare commodity.
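      To make the bout-based reasoning concrete, the sketch below shows one way bout number and mean bout duration can be derived from an epoch-scored hypnogram. The 10-s epoch length, the single-letter state labels, and the function names are illustrative assumptions, not the scoring pipeline used in the study.

```python
from itertools import groupby

EPOCH_S = 10  # assumed epoch length in seconds (hypothetical)

def bouts(hypnogram, epoch_s=EPOCH_S):
    """Collapse runs of identical epochs into (state, duration_s) bouts."""
    return [(state, sum(1 for _ in run) * epoch_s)
            for state, run in groupby(hypnogram)]

def bout_summary(hypnogram, state, epoch_s=EPOCH_S):
    """Return (number of bouts, mean bout duration in s) for one state."""
    durations = [d for s, d in bouts(hypnogram, epoch_s) if s == state]
    if not durations:
        return 0, 0.0
    return len(durations), sum(durations) / len(durations)

# Toy hypnogram: W = Wake, N = NREM, R = REM (one letter per epoch)
hyp = list("WWWWNNNRRWWNNNN")
wake_count, wake_mean = bout_summary(hyp, "W")
nrem_count, nrem_mean = bout_summary(hyp, "N")
```

      Under this kind of summary, fewer but longer Wake bouts appear as a lower bout count together with a higher mean duration, which is the pattern at issue in the exchange above.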

      Reviewer #2 (Public review):

      Summary:

      The authors have used a knock-in mouse model to explore late-in-life amyloid effects on sleep. This is an excellent model as the mutated genes are regulated by the endogenous promoter system. The sleep study techniques and statistical analyses are also first-rate.

      The group finds an age-dependent increase in motor activity in advanced age in the NLGF homozygous knock-in mice (NLGF), with a parallel age-dependent increase in body temperature; both effects predominate in the dark period. Interestingly, the sleep patterns do not quite follow the sleep changes. Wake time is increased in NLGF mice, and there is no progression in increased wake over time. NREMS and REM sleep are both reduced, and there is no progression. Sleep-wake effects, however, show a robust light:dark effect with larger effects in the dark period. These findings support distinct effects of this mutation on activity and temperature and on sleep. This is the first description of the temporal pattern of these effects. NLGF mice show wake stability (longer bout durations in the dark period, their active period) and fewer brief arousals from sleep. Sleep homeostasis across the lights-on period is normal. Wake power spectral density is unaffected in NLGF mice at either age. Only REM power spectra are affected, with NLGF mice showing less theta and more delta. There are interesting sex differences, with females showing no gene difference in wake bout number, while males show a gene effect. Similarly, gene effects on NREM bout number seem larger in males than in females. Although there was no difference in homeostatic response, there was normalization of sleep-wake activity after sleep deprivation.

      Strengths:

      Approach (model, extent of sleep phenotyping), analysis.

      Weaknesses:

      The weaknesses are summarized below and are viewed as "addressable".

      (1) The term insomnia. Insomnia is defined as a subjective dissatisfaction with sleep, which cannot be ascertained in a mouse model. The findings across baseline sleep in NLGF mice support increased wake consolidation in the active period. The predominant sleep period (lights on) is largely unaffected, and the active period (lights off) shows increased activity and increased wake with longer bouts. There is a fantastic clue where NLGF effects are consistent with increased hypocretinergic (orexinergic) neuron activity in the dark period, and/or increased drive to hypocretin neurons from PVH.

      (2) Sleep-wake transitions are impaired: This should not be termed an impairment. It could actually be beneficial to have greater state stability, especially wake stability in the dark or active period. There is reduced sleep in the model that can be normalized by short-term sleep loss. It is fascinating that recovery sleep normalized sleep in the NLGF in the immediate lights-on and light-off period. This is a key finding.

      Reviewer 2 suggests a provocative hypothesis to test. Curiously, although a recent Science paper suggests that hyperexcitable hypocretin/orexin neurons in aging mice result in greater sleep/wake fragmentation, hyperexcitability of this system could result in hyperactivity and longer wake bouts in aged App<sup>NL-G-F</sup> mice.

      Reviewer #3 (Public review):

      Summary:

      In this study, Tisdale et al. studied the sleep/wake patterns in the biological mouse model of Alzheimer's disease. The results in this study, together with the established literature on the relationship of sleep and Alzheimer's disease progression, guided the authors to propose this mouse model for the mechanistic understanding of sleep states that translates to Alzheimer's disease patients. However, the manuscript currently suffers from a disconnect between the physiological data and the mechanistic interpretations. Specifically, the claim of "impaired transitions" is logically at odds with the observed increase in wake-state stability or possible hyperactivity. Additionally, the description of the methods, the quantification, and the figure presentation could be substantially improved. I detail some of my concerns below.

      Strengths:

      The selection of the knock-in model is a notable strength as it avoids the artifacts associated with APP overexpression and more closely mimics human pathology. The study utilizes continuous 14-day EEG recordings, providing a unique dataset for assessing chronic changes in arousal states. The assessment of sex as a biological variable identifies a more severe "insomniac-like" phenotype in females, which aligns with the higher prevalence and severity of Alzheimer's disease in women.

      Weaknesses:

      The study seems to lack a clear hypothesis-driven approach and relies mostly on explorative investigations. Moreover, lack of quantitative analytical methods as well as shaky logical conclusions, possibly not supported by data in its current form, leaves room for major improvement.

      Since this paper studied sleep states, the "Methods" section is quite unclear on what specific criteria were used to classify sleep states. There is no quantitative description of classifying sleep based on clear, reproducible procedures. There are many reasonably well-characterized sleep scoring systems used in rat electrophysiological literature, which could be useful here. The authors are generally expected to describe movement speed and/or EMG and/or EEG (theta/delta/gamma) criteria used to classify these epochs. The subjective (manual) nature of this procedure provides no verifiable validation of the accuracy and interpretability of the results.

      One of the bigger claims is that "state transition mechanism(s)" are impaired. However, Figure 7 shows that model mice exhibit significantly more long wake bouts (>260s) and fewer short wake bouts (<60s). Logically, an "impaired switch" (the flip-flop model, Saper et al., 2010) results in state fragmentation. The data here show the opposite: the wake state has become too stable. This suggests the primary defect is not in the transition mechanism itself, but possibly in a pathological increase in arousal drive (hyper-arousal), likely linked to the dark-phase hyperactivity shown in Figures 4 and 5. Also, a point to note is that this finding is not new.

      Figure 3 heatmaps lack color bars and units. Spectral power must be quantitatively defined and methods well-explained in the Methods section. Without these, the reader cannot discern if the "reduced power" in females is a global suppression of signal or a frequency-specific shift. Additionally, the representative example used to claim shorter sleep bouts lacks the statistical weight required for a major physiological conclusion. How does a cooler color (not clear what range and what the interpretation is) mean shorter sleep bout in female mice? The authors should clearly mark the frequency ranges that support their claims. In this figure, there is a question mark following the theta/delta range. The authors should avoid speculation and state their claims based on facts. They should also add the theta and delta ranges in the plot, such that readers can draw their own conclusions.

      Figure 8 and the MSLT results show that model mice are "no sleepier than WT mice" and have a functional homeostatic rebound. This presents a logical flaw in the "insomnia" narrative. True insomnia in AD patients typically involves a failure of the homeostatic process or a debilitating accumulation of sleep debt. If these mice do not show increased sleepiness (shorter latency) despite ~19% less sleep, the authors might be describing a "reduced need" for sleep or a "hyper-aroused" state, possibly not a clinical insomnia phenotype.

      In Figure 9, LFP power shown and compared in percentages is problematic, as LFP power distribution is known to be skewed (follows power law). This is particularly problematic here because all the frequencies above ~20 Hz seem to be totally flattened or nonexistent, which makes this comparison of power severely limited and biased towards the relative frequency in the highly skewed portion of the LFP power spectrum, i.e., very low frequency ranges like delta, theta, and possibly beta. This ignores low, mid, and high gamma as well as ripple band frequencies. NREM sleep is known to have relatively greater ripple band (100-250 Hz) power bursts in hippocampal regions, and REM sleep is known to have synchronous theta-gamma relationships.

      We agree with the reviewer that the “Classification of arousal states” section was missing the key description of how we scored the recordings into arousal states based on EEG, EMG and locomotor activity; this was an oversight, as the corresponding text exists in all our previous sleep/wake studies published over several decades. Like Reviewer 1, the reviewer also points out the alternative interpretation that “the wake state has become too stable.” However, we think we are using different words to say the same thing: that the transition from wake to sleep is impaired, whether due to hyperarousal or to a defect in the flip-flop switch that results in greater Wake stability. We will revise Fig. 3 (Reviewer 2 suggests combining it with Fig. 14) but note that the X-axis is labelled 0-25 Hz and that this figure was intended to be descriptive -- illustrating how unusual the female App<sup>NL-G-F</sup> mice are relative to WT -- rather than a quantitative analysis of spectral power as in Fig. 14. Both Reviewers 2 and 3 suggest that we are using “insomnia” incorrectly; we have simply used it to describe less sleep per 24-h period. Reviewer 2 states that “Insomnia is defined as a subjective dissatisfaction with sleep” and Reviewer 3 suggests a narrow definition of insomnia as due only to “a failure of the homeostatic process or a debilitating accumulation of sleep debt.” In a revised manuscript, we will define “insomnia” as an operational term to succinctly mean “less sleep”. Regarding the problem of presenting spectral power in percentages, we completely agree with the reviewer. However, we intentionally presented spectral power density, a measure of relative power, as in Figure 3A and 3B of Maezono et al. (2020). At the risk of making Fig. 9 even busier, we will revise Fig. 9 to add labels for all Y-axes.
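      Because the distinction between absolute and relative power underlies this point, a minimal sketch of the normalisation may be useful. It assumes power values in 1-Hz bins over 0-24 Hz and a toy power-law spectrum; the bin width, band limits, and function names are illustrative and are not taken from the analysis code.

```python
def relative_power(psd):
    """Express each frequency bin as a percentage of total power."""
    total = sum(psd)
    if total <= 0:
        raise ValueError("total power must be positive")
    return [100.0 * p / total for p in psd]

def band_share(psd, freqs, lo, hi):
    """Percentage of total power falling in the band [lo, hi) Hz."""
    rel = relative_power(psd)
    return sum(r for r, f in zip(rel, freqs) if lo <= f < hi)

freqs = list(range(0, 25))                 # assumed 1-Hz bins, 0-24 Hz
psd = [1.0 / (f + 1) ** 2 for f in freqs]  # toy power-law spectrum

rel = relative_power(psd)
# Halving or doubling the whole signal leaves relative power unchanged,
# so a global change in amplitude is invisible in this representation.
rel_scaled = relative_power([2.0 * p for p in psd])
low_band = band_share(psd, freqs, 0, 5)    # share of power below 5 Hz
```

      The last lines illustrate why relative power cannot separate a global suppression of signal from a frequency-specific shift, which is the limitation raised by the reviewer.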

      In addition to a revised Fig. 9, in the revised manuscript, we will reformat Tables 1-3, Figs. S1 and S2 for legibility and correct an error in Fig. 7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses an important clinical challenge by proposing muscle network analysis as a tool to evaluate rehabilitation outcomes. The research direction is relevant, and the findings suggest further research. The strength of evidence supporting the claims is, however, limited: the improvements in function are not directly demonstrated, the robustness of the method is not benchmarked against already published approaches, and key terminology is not clearly defined, which reduces the clarity and impact of the work.

      Comments:

      There are several aspects of the current work that require clarification and improvement, both from a methodological and a conceptual standpoint.

      First, the actual improvements associated with the rehabilitation protocol remain unclear. While the authors report certain quantitative metrics, the study lacks more direct evidence of functional gains. Typically, rehabilitation interventions are strengthened by complementary material (e.g., videos or case examples) that clearly demonstrate improvements in activities of daily living. Including such evidence would make the findings more compelling.

      We thank the reviewer for their careful consideration of our work. We agree that direct evidence for the functional gains achieved by patients is important for establishing the efficacy of a clinical intervention and that this evidence should provide comprehensive insights for clinicians, from videos to case examples as suggested. Our aim here was to apply a novel computational framework to a cohort of patients undergoing rehabilitation and, in doing so, provide empirical support for its utility in standardised motor assessments. We have shown that our novel approach can identify distinct physiological responses to VR vs PT conditions across the post-stroke cohort (see Fig.2B and associated text). Hence, although the data contain virtual reality vs. conventional physical therapy experimental conditions, which likely hold important insights into the clinical use case of virtual reality interventions, we did not focus on such complementary evidence in this study. In future work, research groups (including our own) investigating the important question of clinical intervention efficacy will likely gain unique and useful mechanistic insights using our approach.

      Moreover, a threshold of 5 points on the FMA-UE was taken as the minimal clinically important difference (MCID) to distinguish responders from non-responders, an acknowledged and applicable measure in the clinical field. From the perspective of expert clinicians, single cases provide low-level evidence of change, raising concerns about the clinical meaningfulness of the reported results. Given this, we chose to provide stronger evidence of clinical effect (i.e. a comparison between responders and non-responders), interpreted from the perspective of muscle synergies, rather than to support our results with selected single cases, which would introduce bias when translating to the population of stroke survivors.

      Second, the claim that the proposed muscle network analysis is robust is not sufficiently substantiated. The method is introduced without adequate reference to, or comparison with, the extensive literature that has proposed alternative metrics. It is also not evident whether a simpler analysis (e.g., EMG amplitude) might produce similar results. To highlight the added value of the proposed method, it would be important to benchmark it against established approaches. This would help clarify its specific advantages and potential applications. Moreover, several studies have shown very good outcomes when using AI and latent manifold analyses in patients with neural lesions. Interpreting the latent space appears even easier than interpreting muscle networks, as the manifolds provide a simple encoding-decoding representation of what the patient can still perform and what they can no longer do.

      To address the reviewer's concerns regarding adequate evidence for the claims made about the presented framework, we have now included an application of the conventional muscle synergy analysis approach, based on non-negative matrix factorisation, to the post-stroke cohort (see Supplementary Materials Fig.5 and associated text). We made efforts to make this comparison as fair as possible by also applying the conventional approach at the population level and by clustering the activation coefficients using a similar yet more conventional approach, agglomerative clustering. Accompanying the output of this application, we have included several points on which our framework improves significantly upon conventional muscle synergy analysis:

      “Comparison with conventional approaches

      To more directly illustrate the advantages of the proposed framework, we carried out a standardised pre-processing of the EMG data in line with conventional muscle synergy analysis. This included rectification, low-pass filtering (cut-off: 20Hz) and smooth resampling of EMG waveforms to 50 timepoints. All data for each participant at each session were separately normalised by channel-wise variance, concatenated together and input into non-negative matrix factorisation (NMF) ('nnmf' Matlab function, 10 replications) to extract 11 muscle synergies (W1-11 of Supplementary Materials Fig.5(Left)) and their time-varying activations. The number of components to extract was determined in a conventional way as the number of components required to explain >75% of the data variance. The extracted muscle synergies included distinct shoulder- (e.g. W2), elbow- (e.g. W8) and forearm-level (e.g. W1) muscle covariation patterns along with more isolated muscle contributions (e.g. UT in W3, TL in W10).

      To compare the clustering results of our framework with conventional approaches, we applied agglomerative clustering to the time-varying activation coefficients of all participants, trials and tasks, separately for pre- and post-sessions, and employed the 'evalclusters' Matlab function (Ward linkage clustering, Calinski Harabasz criterion, Klist search = 2:21) for each session. We identified two clusters both at pre-session (Criterion = 1.69) and post-session (Criterion = 1.81) as optimal fits to the population data (see Supplementary Materials Fig.5(Right)). We found no associations between pre- or post-session cluster partitions and participants' FMA-UE scores. Nevertheless, we did identify significant associations between the pre-session clusterings and S_Pre (χ<sup>2</sup> = 7.08, p = 0.008) and between post-session clusterings and conventionally-defined treatment responders (χ<sup>2</sup> = 4.2, p = 0.04). These findings, along with the similar two-way clustering structure found using the NIF, highlight important commonalities between these approaches.

      To summarise the main advantages of our framework over this conventional approach:

      - Lower dimensionality and enhanced interpretability of extracted components.

      Our framework yields a lower number of population-level components that correspond more consistently to meaningful biomechanical and physiological functions.

      - Integration of pairwise muscle relationships.

      By incorporating muscle-pair level analysis, our framework captures coordinated interactions between primary and stabilising muscles—relationships that conventional NMF approaches overlook.

      - Separation of task-relevant and task-irrelevant activity.

      The NIF isolates task-relevant coordination patterns, distinguishing them from task-irrelevant interactions driven by biomechanical or task constraints. On the other hand, task-relevant and -irrelevant muscle contributions are intermixed in conventional muscle synergy analysis.

      - Ability to identify complementary functional roles.

      The NIF characterises whether muscle pairs act in similar or complementary ways, providing richer insight into motor control strategies.

      - Reduced dependence on variance-based optimisation.

      Unlike conventional methods that rely on maximising variance explained, our framework allows detection of subtle but functionally significant interactions that contribute less to total variance.

      - Improved detection of clinically relevant population structure.

      The clustering component of our framework revealed distinct post-stroke subgroups with important clinical relevance, distinguishing moderately and severely impaired cohorts and treatment responders and non-responders from pre-treatment data.”
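      The component-selection rule in the quoted passage (the smallest number of NMF components explaining more than 75% of the data variance) can be sketched as follows. This is an illustration under stated assumptions: it uses plain multiplicative-update NMF in NumPy rather than Matlab's 'nnmf', defines variance explained about the grand mean (other definitions exist), and runs on synthetic data, so it will not reproduce the supplementary results. All function names are hypothetical.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Multiplicative-update NMF: V ~ W @ H with W, H non-negative."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def variance_explained(V, W, H):
    """1 - SSE/SST, with SST taken about the grand mean (an assumption)."""
    sse = np.sum((V - W @ H) ** 2)
    sst = np.sum((V - V.mean()) ** 2)
    return 1.0 - sse / sst

def choose_rank(V, threshold=0.75, max_k=None):
    """Smallest k whose reconstruction explains more than `threshold`."""
    max_k = max_k or min(V.shape)
    for k in range(1, max_k + 1):
        W, H = nmf(V, k)
        if variance_explained(V, W, H) > threshold:
            break
    return k, W, H

# Synthetic stand-in for EMG envelopes: 8 channels x 200 samples
# generated from 3 non-negative sources.
rng = np.random.default_rng(1)
V = rng.random((8, 3)) @ rng.random((3, 200))
k, W, H = choose_rank(V)
```

      Here the columns of `W` play the role of muscle synergies and the rows of `H` their time-varying activations.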

      This supplementary analysis is referred to in the Methods section of the main text with reference to previous similar comparisons between our framework and conventional approaches:

      “Towards finding an effective approach to clustering participants in this data based on differences in impairment severity and therapeutic (non-)responsiveness, we found that conventional clustering algorithms (e.g. agglomerative, k-means etc.) could not provide substantive outputs (see Supplementary Materials Fig.5 and associated text for a direct comparison with conventional approaches), perhaps resulting from the complex interdependencies between the modular activations.”

      “To facilitate comparisons with existing approaches, we performed a conventional muscle synergy analysis on the post-stroke cohort (see Supplementary Materials Fig.5 and associated text). Further comparisons with conventional approaches can be found in our previous work (O’Reilly & Delis, 2022).”
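      For readers who do not use Matlab, the cluster-number selection described in the supplementary comparison (Ward linkage scored with the Calinski-Harabasz criterion over a list of candidate k) can be approximated as below. This is a hand-written NumPy sketch on synthetic data, not a port of 'evalclusters'; the naive Ward implementation, the candidate range, and all names are assumptions made for illustration.

```python
import numpy as np

def _labels(clusters, n):
    """Turn a list of index lists into a label vector of length n."""
    labels = np.empty(n, dtype=int)
    for c, members in enumerate(clusters):
        labels[members] = c
    return labels

def ward_partitions(X, k_list):
    """Naive Ward agglomeration; returns {k: labels} for each k in k_list."""
    clusters = [[i] for i in range(len(X))]
    out = {}
    while len(clusters) > 1:
        if len(clusters) in k_list:
            out[len(clusters)] = _labels(clusters, len(X))
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = X[clusters[a]], X[clusters[b]]
                # Ward cost: increase in within-cluster sum of squares
                cost = (len(ca) * len(cb) / (len(ca) + len(cb))
                        * np.sum((ca.mean(0) - cb.mean(0)) ** 2))
                if cost < best:
                    best, pair = cost, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)
    return out

def calinski_harabasz(X, labels):
    """Between/within dispersion ratio, each scaled by its degrees of freedom."""
    n, ids = len(X), np.unique(labels)
    grand = X.mean(0)
    between = sum(np.sum(labels == c)
                  * np.sum((X[labels == c].mean(0) - grand) ** 2) for c in ids)
    within = sum(np.sum((X[labels == c] - X[labels == c].mean(0)) ** 2)
                 for c in ids)
    return (between / (len(ids) - 1)) / (within / (n - len(ids)))

# Synthetic data: two well-separated groups of 20 points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(10, 1, (20, 2))])
parts = ward_partitions(X, range(2, 7))
scores = {k: calinski_harabasz(X, lab) for k, lab in parts.items()}
best_k = max(scores, key=scores.get)
```

      The value of k with the highest criterion is taken as the optimal partition, mirroring the selection of two clusters reported for each session.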

      Further, we have also referred to a previous analysis of this post-stroke dataset using the conventional approach in the discussion section, where we point out how our approach can identify salient features of post-stroke physiological responses that conventional approaches cannot:

      “Further, the NIF demonstrated here an enhanced capability over traditional approaches to identify these crucial patterns, as earlier work on related versions of this dataset could not identify any differentiable fractionation events across the cohort (Pregnolato et al., 2025).”

      Overall, the utility of conventional muscle synergy analysis is well recognised across the field (Hong et al., 2021). Our proposed approach builds on this conventional method by addressing key limitations to further enhance this clinical utility. We also agree that manifold learning approaches are an exciting area of research that we aim to incorporate into our framework in future research. Specifically, manifold learning methods like Laplacian eigenmaps can readily be applied to the co-membership matrix produced by our clustering algorithm, exploiting the geometry of this matrix to provide a continuous rather than discrete representation of population structure. We have highlighted this possibility in the discussion section:

      “Indeed, in future work, we aim to apply manifold learning approaches to the co-membership matrix derived from this clustering algorithm, providing a continuous representation of the population structure.”
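      As a rough indication of what such an extension could look like, the sketch below applies Laplacian eigenmaps to a toy co-membership matrix. It assumes the matrix is symmetric with entries in [0, 1] giving how often two participants are assigned to the same cluster; that interpretation, the toy values, and the function name are assumptions for illustration rather than a description of the framework's output.

```python
import numpy as np

def laplacian_eigenmap(C, n_components=2):
    """Embed items from a symmetric, non-negative affinity matrix C."""
    W = np.array(C, dtype=float)
    np.fill_diagonal(W, 0.0)                 # ignore self-affinity
    d = W.sum(axis=1)
    if np.any(d <= 0):
        raise ValueError("every item needs at least one positive affinity")
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalised Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    # Map back to solutions of L v = lambda D v and drop the trivial
    # constant solution associated with the (near-)zero eigenvalue.
    embedding = (d_inv_sqrt[:, None] * vecs)[:, 1:n_components + 1]
    return embedding, vals

# Toy co-membership matrix: six participants in two groups of three,
# high agreement within groups (0.9) and low agreement between (0.1).
C = np.full((6, 6), 0.1)
C[:3, :3] = 0.9
C[3:, 3:] = 0.9
embedding, eigenvalues = laplacian_eigenmap(C, n_components=1)
```

      In this toy case the first non-trivial coordinate places the two groups on opposite sides of zero, giving a continuous axis where the clustering gives a binary label.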

      Third, the terminology used throughout the manuscript is sometimes ambiguous. A key example is the distinction made between "functional" and "redundant" synergies. The abstract states: "Notably, we identified a shift from redundancy to synergy in muscle coordination as a hallmark of effective rehabilitation-a transformation supported by a more precise quantification of treatment outcomes."

      However, in motor control research, redundancy is not typically seen as maladaptive. Rather, it is a fundamental property of the CNS, allowing the same motor task to be achieved through different patterns of muscle activity (e.g., alternative motor unit recruitment strategies). This redundancy provides flexibility and robustness, particularly under fatiguing conditions, where new synergies often emerge. Several studies have emphasized this adaptive role of redundancy. Thus, if the authors intend to use "redundancy" differently, it is essential to define the term explicitly and justify its use to avoid misinterpretation.

      We appreciate the reviewer's concerns regarding the terminology employed in this study. Indeed, we agree that redundancy is seen in the motor control literature as a positive feature of biological systems, appearing to contradict the interpretations of the redundancy-to-synergy information conversion result we have presented. We also wish to highlight that across the motor control literature and beyond, the idea of redundancy is often conflated with the related but distinct notion of degeneracy. Traditional motor control research has also recognised this difference; for example, Latash outlined it in the seminal work on motor abundance (https://doi.org/10.1007/s00221-012-3000-4). A key reference discussing this conflation and these two concepts in an information-theoretic way is found here: https://doi.org/10.1093/cercor/bhaa148. To summarise what their arguments mean for our work:

      - System degeneracy relates to the ability of different system components to contribute towards the same task in a context-specific way.

      - System redundancy corresponds to the degree of functional overlap among system components.

      Hence, conceptually speaking, informational redundancy as employed in our study (i.e. functionally-similar muscle interactions) links with system redundancy in that it quantifies the functional overlap of system components. This definition of system redundancy implies that it is an unavoidable by-product of degenerate systems (inefficient use of degrees of freedom) which should be minimised where possible. As a result of stroke, patients in our study and in related previous work displayed increased informational redundancy. This links, for example, with the abnormal co-activations they typically experience and with previous results from traditional muscle synergy analysis showing fewer components extracted as a function of motor impairment post-stroke (i.e. higher informational redundancy) (Clark et al. 2010). Our novel contribution here is to convey how effective rehabilitation is underpinned by a redundancy-to-synergy information conversion across the muscle networks, relating in a loose conceptual sense to a reduction in system redundancy and an enhancement of system degeneracy (i.e. functionally differentiated system components contributing towards task performance).
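      For readers less familiar with the information-theoretic sense of these terms, a toy illustration may help. The sketch below is not intended to reproduce the measure used in the framework; it computes the simpler co-information between two binary "muscle" variables X, Y and a task variable T, which is positive when X and Y carry overlapping (redundant) task information and negative when their task information is complementary (synergistic). All variables and distributions are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())

def mutual_info(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def co_information(x, y, t):
    """I(X;T) + I(Y;T) - I(X,Y;T).

    Positive: redundancy-dominated. Negative: synergy-dominated.
    """
    joint_xy = list(zip(x, y))
    return mutual_info(x, t) + mutual_info(y, t) - mutual_info(joint_xy, t)

# Redundant case: both muscles simply follow the task variable.
t_red = [0, 1, 0, 1]
x_red = list(t_red)
y_red = list(t_red)
redundant = co_information(x_red, y_red, t_red)

# Synergistic case: the task variable is the XOR of the two muscles,
# so neither muscle alone is informative but the pair is.
x_syn = [0, 0, 1, 1]
y_syn = [0, 1, 0, 1]
t_syn = [a ^ b for a, b in zip(x_syn, y_syn)]
synergistic = co_information(x_syn, y_syn, t_syn)
```

      In the first case the two muscles are functionally similar with respect to the task; in the second they are functionally complementary, which is the sense of "synergy" intended here.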

      Together, and alongside the mathematical descriptions of redundant (functionally-similar) and synergistic (functionally-complementary) information in what types of functional relationships they capture, we believe the intuition behind this finding has clear links with previous research showing a) the merging of muscle synergies in response to post-stroke impairment (i.e. functional de-differentiation), b) reduction in abnormal couplings with effective rehabilitation (i.e. functional re-differentiation). To communicate this more clearly to readers, we have included the following in the corresponding discussion section:

      “Previous research has shown that functional redundancy increases post-stroke (Cheung et al., 2012; Clark et al., 2010), reflecting the characteristic loss of functional specificity (i.e. functional de-differentiation) of muscle interactions post-stroke. Enhanced synergy with treatment here thus reflects the functional re-differentiation of predominantly flexor-driven muscle networks towards different, complementary task-objectives across the seven upper-limb motor tasks performed (Kim et al., 2024b), leading to improved motor function among responders.”

      Finally, we have screened the updated manuscript for consistent use of terminology including functional/redundant/synergistic.

      References

      Clark DJ, Ting LH, Zajac FE, Neptune RR, Kautz SA. Merging of healthy motor modules predicts reduced locomotor performance and muscle coordination complexity post-stroke. Journal of neurophysiology. 2010 Feb;103(2):844-57.

      Hong YN, Ballekere AN, Fregly BJ, Roh J. Are muscle synergies useful for stroke rehabilitation?. Current Opinion in Biomedical Engineering. 2021 Sep 1;19:100315.

      Latash ML. The bliss (not the problem) of motor abundance (not redundancy). Experimental brain research. 2012 Mar;217(1):1-5.

      O'Reilly D, Delis I. Dissecting muscle synergies in the task space. Elife. 2024 Feb 26;12:RP87651.

      Sajid N, Parr T, Hope TM, Price CJ, Friston KJ. Degeneracy and redundancy in active inference. Cerebral Cortex. 2020 Nov;30(11):5750-66.

      Reviewer #2 (Public review):

      Summary:

      This study analyzes muscle interactions in post-stroke patients undergoing rehabilitation, using information-theoretic and network analysis tools applied to sEMG signals with task performance measurements. The authors identified patterns of muscle interaction that correlate well with therapeutic measures and could potentially be used to stratify patients and better evaluate the effectiveness of rehabilitation.

      However, I found that the Methods and Materials section, as it stands, lacks sufficient detail and clarity for me to fully understand and evaluate the quality of the method. Below, I outline my main points of concern, which I hope the authors will address in a revision to improve the quality of the Methods section. I would also like to note that the methods appear to be largely based on a previous paper by the authors (O'Reilly & Delis, 2024), but I was unable to resolve my questions after consulting that work.

      I understand the general procedure of the method to be: (1) defining a connectivity matrix, (2) refining that matrix using network analysis methods, and (3) applying a lower-dimensional decomposition to the refined matrix, which defines the sub-component of muscle interaction. However, there are a few steps not fully explained in the text.

      (1) The muscle network is defined as the connectivity matrix A. Is each entry in A defined by the co-information? Is this quantity estimated for each time point of the sEMG signal and task variable? Given that there are only 10 repetitions of the measurement for each task, I do not fully understand how this is sufficient for estimating a quantity involving mutual information.

      We acknowledge the confusion regarding how many datapoints were incorporated into the estimation of II. The number of datapoints included in each variable was in fact no. of timepoints x 10 repetitions. Hence, for the EMGs employed in this analysis, with a sampling rate of 2000 Hz, the length of the variables involved could easily extend beyond 20,000 datapoints each. We have clarified this more specifically in the corresponding section of the methods:

      “We carried out this application in the spatial domain (i.e. interactions between muscles across time (Ó’Reilly & Delis, 2022)) by concatenating the 10 repetitions of each task executed on a particular side (i.e. variables of length no. of timepoints x 10 trials) and quantifying II with respect to this discrete task parameter codified to describe the motor task performed at each timepoint for each trial included.”
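
      As a minimal sketch of the concatenation step described above, the snippet below stacks 10 repetitions into one long variable per muscle and computes a co-information value from discrete labels. The plug-in (histogram) estimator, the quantile binning, the array shapes and the sign convention (positive = net redundant) are all illustrative assumptions; the estimator and sign convention actually used in the study may differ.

```python
import numpy as np

def joint_entropy(*cols):
    """Plug-in Shannon entropy (bits) of the joint distribution of discrete columns."""
    joint = np.stack(cols, axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def co_information(x, y, z):
    """Co-I = I(X;Y) - I(X;Y|Z), written as an inclusion-exclusion sum of entropies.

    Assumed convention: positive = net redundant, negative = net synergistic.
    """
    H = joint_entropy
    return H(x) + H(y) + H(z) - H(x, y) - H(x, z) - H(y, z) + H(x, y, z)

def discretise(signal, n_bins=4):
    """Equal-population binning of a continuous signal into integer labels."""
    edges = np.quantile(signal, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(signal, edges)

# Hypothetical data: 10 repetitions x 2000 timepoints for two muscles.
rng = np.random.default_rng(0)
n_trials, n_time = 10, 2000
emg_x = rng.normal(size=(n_trials, n_time))
emg_y = rng.normal(size=(n_trials, n_time))
# Hypothetical discrete task parameter, one label per timepoint of every trial.
task = np.tile(np.repeat([0, 1], n_time // 2), n_trials)

# Concatenate repetitions: each variable now has n_time * n_trials samples.
x = discretise(emg_x.reshape(-1))
y = discretise(emg_y.reshape(-1))
ii = co_information(x, y, task)
```

      The point of the sketch is only that the sample count entering each estimate is the number of timepoints multiplied by the number of repetitions, not the number of repetitions alone.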

      In the previous paper (O'Reilly & Delis, 2024), the authors initially defined the co-information (Equation 1.3) but then referred to mutual information (MI) in the subsequent text, which I found confusing. In addition, while the matrix A is symmetrical, it should not be orthogonal (the authors wrote A<sup>T</sup>A = I) unless some additional constraint was imposed?

      We thank the reviewer for spotting this typo in the previous paper, which described a symmetric matrix as A<sup>T</sup>A = I, a condition that in fact defines orthogonality. To avoid this error, in the current study we have correctly described the symmetric matrix as A = A<sup>T</sup> here:

      “We carried out this application in the spatial domain (i.e. interactions between muscles across time (Ó’Reilly & Delis, 2022)) by concatenating the 10 repetitions of each task executed on a particular side (i.e. variables of length no. of timepoints x 10 trials) and quantifying II with respect to this discrete task parameter codified to describe the motor task performed at each timepoint for each trial included. This computation was performed on all unique m<sub>x</sub> and m<sub>y</sub> pairings, generating symmetric matrices (A) (i.e. A = A<sup>T</sup>) composed separately of non-negative redundant and synergistic values (Fig.5).”

      Regarding the reviewer's point about the reference to MI after Equation 1.3 of the previous paper, where co-information is defined: we referred to the task-relevant and task-irrelevant estimates analysed there collectively, in a general sense, as ‘MI estimates’ because both are derived from mutual information. The task-irrelevant estimate is the MI between two muscles conditioned on a task variable (conditional mutual information), and the task-relevant estimate is the difference between two MI values (co-I is a higher-order MI estimate). This removed the need to continuously refer to each separately throughout the paper, which may in its own way have caused some confusion. For clarity, in the results of that paper we also provided context on how each MI estimate was obtained (see the beginning of the “Task-irrelevant muscle couplings”, “Task-redundant muscle couplings” and “Task-synergistic muscle couplings” results sections), referring throughout to the Venn diagrams depicting them (see Fig.1 of the previous paper). In the present study, however, for brevity and focus we did not analyse task-irrelevant muscle interactions and so decided to focus our terminology on co-I (II), a higher-order MI estimate. We acknowledge that this may have caused some confusion but highlight the efforts made to communicate each measure throughout the previous and present studies. We have explicitly pointed out this specific focus on task-dependent muscle couplings at the end of the introduction of the updated manuscript:

      “To do so, here we focussed our analysis on quantifying task-dependent muscle couplings (collectively referred to as II), extracting functionally-similar (i.e. redundant) and -complementary (i.e. synergistic) modules…”

      (2) The authors should clarify what the following statement means: "Where a muscle interaction was determined to be net redundant/synergistic, their corresponding network edge in the other muscle network was set to zero."

      We acknowledge this sentence was unclear/misleading and have now clarified this statement in the following way:

      “This computation was performed on all unique m<sub>x</sub> and m<sub>y</sub> pairings, generating sparse symmetric matrices (A) (i.e. A = A<sup>T</sup>) composed separately of non-negative redundant and synergistic values (Fig.5).” Additionally, we have now included an additional figure (fig.5) describing this text graphically.
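
      The separation into two sparse networks can be sketched as follows, assuming a signed, symmetric co-information matrix in which positive entries are net redundant and negative entries are net synergistic (this sign convention and the function name are assumptions made for illustration only). Each muscle pairing keeps a non-zero edge in at most one of the two networks.

```python
import numpy as np

def split_networks(ii):
    """Split a signed, symmetric co-I matrix into two non-negative networks.

    Assumed convention: ii > 0 is net redundant, ii < 0 is net synergistic.
    An edge that is non-zero in one network is zero in the other.
    """
    ii = np.asarray(ii, dtype=float)
    if not np.allclose(ii, ii.T):
        raise ValueError("expected a symmetric matrix (A = A^T)")
    redundant = np.where(ii > 0, ii, 0.0)     # keep net-redundant edges
    synergistic = np.where(ii < 0, -ii, 0.0)  # keep net-synergistic edges, sign removed
    return redundant, synergistic

# Hypothetical 3-muscle example.
ii = np.array([[0.0, 0.3, -0.2],
               [0.3, 0.0, 0.0],
               [-0.2, 0.0, 0.0]])
R, S = split_networks(ii)
```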

      (3) It should be clarified what the 'm' values are in Equation 1.1. Are these the co-information values after the sparsification and applying the Louvain algorithm to the matrix 'A'? Furthermore, since each task will yield a different co-information value, how is the information from different tasks (r) being combined here?

      We thank the reviewer for their attention to detail. For clarity, at the related section of Equation 1.1, we have clarified that the input matrix is composed of co-I estimates:

      “The input matrix for PNMF consisted of the sparsified A on both affected and unaffected sides from all participants at both pre- and post-sessions concatenated in their vectorised forms. More specifically, the input matrix composed of redundant or synergistic values was configured such that the set of unique muscle pairings (1 … K) on affected and unaffected sides (m<sub>aff</sub> and m<sub>unaff</sub> respectively)…”.

      The co-I estimates in this input matrix are indeed those that survived sparsification in previous steps, however, for determining the number of modules to extract using the Louvain algorithm, this step has no direct impact or transformation on the co-I estimates and is simply employed to derive an empirical input parameter for dimensionality reduction. We refer the reviewer to the following part of this paragraph where this is described:

      “The number of muscle network modules identified in this final consensus partition was used as the input parameter for dimensionality reduction, namely projective non-negative matrix factorisation (PNMF) (Fig.1(D)) (Yang & Oja, 2010). The input matrix for PNMF consisted of the sparsified A on both affected and unaffected sides from all participants at both pre- and post-sessions concatenated together in their vectorised form.”
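
      For readers unfamiliar with PNMF, a minimal sketch is given below. It uses the Euclidean multiplicative update with spectral-norm rescaling as we understand it from Yang & Oja (2010); the matrix orientation (rows = unique muscle pairings, columns = concatenated observations), the iteration count and the random initialisation are assumptions, and this is not the implementation used in the study.

```python
import numpy as np

def pnmf(X, k, n_iter=200, eps=1e-12, seed=0):
    """Projective NMF sketch: find W >= 0 such that X is approximated by W @ W.T @ X.

    X : non-negative array, shape (n_features, n_observations)
    k : number of modules (e.g. taken from a community-detection step)
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    XXt = X @ X.T
    for _ in range(n_iter):
        XXtW = XXt @ W
        numer = 2.0 * XXtW
        denom = W @ (W.T @ XXtW) + XXtW @ (W.T @ W) + eps
        W *= numer / denom          # multiplicative update keeps W non-negative
        W /= np.linalg.norm(W, 2)   # rescale by the spectral norm for stability
    return W

# Hypothetical input: 6 muscle pairings x 40 concatenated observations.
X = np.random.default_rng(1).random((6, 40))
W = pnmf(X, k=2)
activations = W.T @ X  # one activation coefficient per module and observation
```

      In this sketch the Louvain step would only supply `k`; it does not alter the values in `X`.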

      Finally, as the reviewer has mentioned, the co-I estimates from the same muscle pairings but for different tasks, experimental sessions and participants are indeed different, reflecting their task-specific tuning, changes with rehabilitation and individual differences. To combine these representations into low-dimensional components, we employed projective non-negative matrix factorisation (PNMF). As outlined in the previous paper and earlier work on this framework (O’Reilly & Delis, 2022), application of dimensionality reduction here can generate highly generalisable motor components that effectively represent large populations of participants, tasks and sessions, while allowing the interesting individual differences mentioned by the reviewer to be buffered into the corresponding activation coefficients. For this reason, these activation coefficients are the focus of the cluster analyses used in the present study to characterise the post-stroke cohort. We have explicitly provided this reason in the methods section of the updated manuscript:

      “We focussed on $a$ here as the extraction of population-level functional modules enabled the buffering of individual differences into the space of modular activations, making them an ideal target for identifying population structure.”

      (4) In general, I recommend improving the clarity of the Methods section, particularly by being more precise in defining the quantities that are being calculated. For example, the adjacency matrix should be defined clearly using co-information at the beginning, and explain how it is changed/used throughout the rest of the section.

      We thank the reviewer for their constructive advice and have gone to lengths to improve the clarity of the methods section. Firstly, we have addressed all the reviewer's comments on various specific sections of the methods, explaining more clearly the ‘why’ and ‘how’ of what was performed. Secondly, we have now included an additional figure illustrating how co-information was quantified at the network level and separated into redundant and synergistic values (see Fig.5 of updated manuscript). Finally, we have re-structured several paragraphs of the methods section to enhance flow with additional subheadings for clarity.

      (5) In the previous paper (O'Reilly & Delis, 2024), the authors applied a tensor decomposition to the interaction matrix and extracted both the spatial and temporal factors. In the current work, the authors simply concatenated the temporal signals and only chose to extract the spatial mode instead. The authors should clarify this choice.

      The reviewer is correct in that a different dimensionality reduction approach was employed in the previous paper. In the present study, we instead chose to employ projective non-negative matrix factorisation, as was employed in a preliminary paper on this framework (O’Reilly & Delis, 2022). This decision was made simply to maintain brevity and simplicity in the analysis and presentation of results as we introduce other tools to the framework (i.e. the clustering algorithm). Indeed, we could just as easily have employed the tensor decomposition to extract both spatial and temporal components; however, we believed the main take-away points of this paper could be more easily communicated using spatial networks only. To clarify this difference for readers we have included the following in the methods section:

      “The choice of PNMF here, in contrast to the space-time tensor decomposition employed in the parent study (O’Reilly & Delis, 2024), was chosen simply to maintain brevity by focussing subsequent analyses on the spatial domain.”

      References

      Ó’Reilly D, Delis I. A network information theoretic framework to characterise muscle synergies in space and time. Journal of Neural Engineering. 2022 Feb 18;19(1):016031.

      O'Reilly D, Delis I. Dissecting muscle synergies in the task space. Elife. 2024 Feb 26;12:RP87651.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Both reviewers are concerned with the manuscript in its current form. They questioned the relevance of the current approach in providing functional or mechanistic explanations about the rehabilitation process of post-stroke patients. Our eLife Assessment would change if you include comparisons between your current method and classical ones, in addition to improving the description of your method to strengthen the evidence of its robustness.

      Reviewer #1 (Recommendations for the authors):

      There is a minor typographical error in Figure 2 ("compononents" should be corrected).

      This error has been rectified.

      Reviewer #2 (Recommendations for the authors):

      The authors should be able to address most of my concerns by providing a substantially improved version of the Methods section.

      See the above responses to the reviewer's comments regarding the methods section.

      However, I would like the authors to explain in full detail (potentially including a simulation or power analysis) the procedure for estimating the co-information quantity, and to clarify whether it is robust given the sample size used in this paper.

      We refer the reviewer to our previous responses outlining with greater clarity the number of samples included in the estimation of co-I. We would also like to mention here that our framework does not make inferences on the statistical significance of individual muscle couplings (i.e. co-I estimates). Instead, these estimates are employed collectively for the sole purpose of pattern recognition. Nevertheless, to generate reliable estimates of the muscle couplings, we have employed a substantial number of samples for each co-I estimate (>20k samples in each variable), addressing the reviewer's main concern here.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Wu et al. uses endogenous bruchpilot expression in a cell-type-specific manner to assess synaptic heterogeneity in adult Drosophila melanogaster mushroom body output neurons. The authors performed genomic on locus tagging of the presynaptic scaffold protein bruchpilot (BRP) with one part of splitGFP (GFP11) using the CRISPR/Cas9 methodology and co-expressed the other part of splitGFP (GFP1-10) using the GAL4/UAS system. Upon expression of both parts of splitGFP, fluorescent GFP is assembled at the N-terminus of BRP, exactly where BRP is endogenously expressed in active zones. For manageable analysis, a high-throughput pipeline was developed. This analysis evaluated parameters like location of BRP clusters, volume of clusters, and cluster intensity as a direct measure of the relative amount of BRP expression levels on site, using publicly available 3D analysis tools that are integrated in Fiji. Analysis was conducted for different mushroom body cell types in different mushroom body lobes using various specific GAL4 drivers. To test this new method of synapse assessment, Wu et al. performed an associative learning experiment in which an odor was paired with an aversive stimulus and found that, in a specific time frame after conditioning, the new analysis solidly revealed changes in BRP levels at specific synapses that are associated with aversive learning.

      Strengths:

      Expression of splitGFP bound to BRP enables intensity analysis of BRP expression levels as exactly one GFP molecule is expressed per BRP. This is a great tool for synapse assessment. This tool can be widely used for any synapse as long as driver lines are available to co-express the other part of splitGFP in a cell-type-specific manner. As neuropils and thus the BRP label can be extremely dense, the analysis pipeline developed here is very useful and important. The authors have chosen an exceptionally dense neuropil - the mushroom bodies - for their analysis and convincingly show that BRP assessment can be achieved with such densely packed active zones. The result that BRP levels change upon associative learning in an experiment with odor presentation paired with punishment is likewise convincing, and strongly suggests that the tool and pipeline developed here can be used in an in vivo context.

      Weaknesses:

      Although BRP is an important scaffold protein and its expression levels were associated with function and plasticity, I am still somewhat reluctant to accept that synapse structure profiling can be inferred from only assessing BRP expression levels and BRP cluster volume. Also, is it guaranteed that synaptic plasticity is not impaired by the large GFP fluorophore? Could the GFP10 construct that is tagged to BRP in all BRP-expressing cells, independent of GAL4, possibly hamper neuronal function? Is it certain that only active zones are labeled? I do see that plastic changes are made visible in this study after an associative learning experiment with BRP intensity and cluster volume as read-out, but I would be reassured by direct measurement of synaptic plasticity with splitGFP directly connected to BRP, maybe at a different synapse that is more accessible.

      We appreciate the reviewer’s comments. In the revised manuscript, we have clarified that Brp is an important, but not the only, player in the active zone. We have included new data to demonstrate that split-GFP tagging does not severely affect the localization and plasticity of Brp or the function of synapses by showing: (1) nanoscopic localization of Brp::rGFP using STED imaging; (2) colocalization between Brp::rGFP and anti-Brp signals/VGCCs; (3) activity-dependent Brp remodeling in R8 photoreceptors; (4) no defect in memory performance when labeling Brp::rGFP in KCs. These four lines of additional evidence further corroborate our approach to characterize endogenous Brp as a proxy of active zone structure.

      Reviewer #2 (Public review):

      Summary:

      The authors developed a cell-type specific fluorescence-tagging approach using a CRISPR/Cas9 induced split-GFP reconstitution system to visualize endogenous Bruchpilot (BRP) clusters as presynaptic active zones (AZ) in specific cell types of the mushroom body (MB) in the adult Drosophila brain. This AZ profiling approach was implemented in a high-throughput quantification process, allowing for the comparison of synapse profiles within single cells, cell types, MB compartments, and between different individuals. The aim is to analyse in more detail neuronal connectivity and circuits in this centre of associative learning. These are notoriously difficult to investigate due to the density of cells and structures within a cell. The authors detect and characterize cell-type-specific differences in BRP-dependent profiling of presynapses in different compartments of the MB, while intracellular AZ distribution was found to be stereotyped. Next to the descriptive part characterizing various AZ profiles in the MB, the authors apply an associative learning assay and detect consequent AZ re-organisation.

      Strengths:

      The strength of this study lies in the outstanding resolution of synapse profiling in the extremely dense compartments of the MB. This detailed analysis will be the entry point for many future analyses of synapse diversity in connection with functional specificity to uncover the molecular mechanisms underlying learning and memory formation and neuronal network logics. Therefore, this approach is of high importance for the scientific community and a valuable tool to investigate and correlate AZ architecture and synapse function in the CNS.

      Weaknesses:

      The results and conclusions presented in this study are, in many aspects, well-supported by the data presented. To further support the key findings of the manuscript, additional controls, comments, and possibly broader functional analysis would be helpful. In particular:

      (1) All experiments in the study are based on split-GFP lines (BRP:GFP11 and UAS-GFP1-10). The Materials and Methods section does not contain any cloning strategy (gRNA, primer, PCR/sequencing validation, exact position of tag insertion, etc.) and only refers to a bioRxiv publication. It might be helpful to add a Materials and Methods section (at least for the BRP:GFP11 line). Additionally, as this is an on-locus insertion in the BRP-ORF, it needs a general validation of this line, including controls (Western Blot and correlative antibody staining against BRP) showing that overall BRP expression is not compromised due to the GFP insertion and localizes as BRP in wild type flies, that flies are viable, have no defects in locomotion and learning and memory formation, and that MB morphology is not affected compared to wild type animals.

      We thank the reviewer for suggesting these important validations. We included details of the design of the construct and insertion site to the Methods section, performed several new experiments to validate the split-GFP tagging of Brp, and present the data in the revision.

      First, to examine whether the transcription of the brp gene is unaffected by the insertion of GFP<sub>11</sub>, we conducted qRT-PCR to compare the brp mRNA levels between brp::GFP<sub>11</sub>, UAS-GFP1-10 and UAS-GFP1-10 and found no difference (Figure 1 - figure supplement 1A).

      To further verify the effect of GFP<sub>11</sub> tagging at the protein level, we performed anti-Brp (nc82) immunohistochemistry of brains where GFP is reconstituted pan-neuronally. We found unaltered neuropile localization of nc82 signals (Figure 1 - figure supplement 1C). In presynaptic terminals of the mushroom body calyx, we found integration of Brp::rGFP to nc82 accumulation (Figure 1D). We performed super-resolution microscopy to verify the configuration of Brp::rGFP and confirmed the donut-shape arrangement of Brp::rGFP in the terminals of motor neurons (see Wu, Eno et al., 2025 PLOS Biology), corroborating the nanoscopic assembly of Brp::rGFP at active zones (Kittel et al., 2006 Science).

      Furthermore, co-expression of RFP-tagged voltage-gated calcium channel alpha subunit Cacophony (Cac) and Brp::rGFP in PAM-γ5 dopaminergic neurons revealed strong presynaptic colocalization of their punctate clusters (Figure 1E), suggesting that rGFP tagging of Brp did not damage key protein assembly at active zones (Kawasaki et al., 2004 J Neuroscience; Kittel et al., 2006 Science).

      These lines of evidence suggest that the localization of endogenous Brp is barely affected by the C-terminal GFP<sub>11</sub> insertion or GFP reconstitution therewith. This is in line with a large body of studies confirming that the N-terminal region and coiled-coil domains, but not the C-terminal region, of Brp are necessary and sufficient for active zone localization (Fouquet et al., 2009 J Cell Biol; Oswald et al., 2010 J Cell Biol; Mosca and Luo, 2014 eLife; Kiragasi et al., 2017 Cell Rep; Akbergenova et al., 2018 eLife; Nieratschker et al., 2009 PLoS Genet; Johnson et al., 2009 PLoS Biol; Hallermann et al., 2010 J Neurosci). We nevertheless report homozygous lethality and found decreased immunoreactive signals in flies carrying the GFP<sub>11</sub> insertion (Figure 1 - figure supplement 1B).

      For these reasons, we always use heterozygotes for all the experiments; accordingly, there is no conspicuous defect in locomotion of the kind reported in the original study (Wagh et al., 2005 Neuron). To functionally validate the heterozygotes, we measured the aversive olfactory memory performance of flies where GFP reconstitution was induced in Kenyon cells using R13F02-GAL4. We found that these transgenes did not alter mushroom body morphology (Figure 7 - figure supplement 1) or memory performance as compared to wild-type flies (Figure 7 - figure supplement 2), suggesting that the synapse function required for short-term memory formation is not affected by split-GFP tagging of Brp.

      (2) Several aspects of image acquisition and high-throughput quantification data analysis would benefit from a more detailed clarification.

      (a) For BRP cluster segmentation, it is stated in the Materials and Methods that intensity threshold and noise tolerance were "set" - this setting has a large effect on the quantification, and it should be specified and setting criteria named and justified (if set manually (how and why) or automatically (to what)). Additionally, if Python was used for "Nearest Neighbor" analysis, the code should be made available within this manuscript; otherwise, it is difficult to judge the quality of this quantification step.

      (b) To better evaluate the quality of both the imaging analysis and image presentation, it would be important to state, if presented and analysed images are deconvolved and if so, at least one proof of principle example of a comparison of original and deconvoluted file should be shown and quantified to show the impact of deconvolution on the output quality as this is central to this study.

      We thank the reviewer for suggesting these clarifications. We have added more description to the revised manuscript to clarify the setting of segmentation, which was manually adjusted to optimize the F-score (previous Figure 1D, now moved to Figure 1 - figure supplement 5). We have included the code used for analyzing nearest neighbor distance, AZ density and local Brp density in the revised manuscript (Supplementary file 1), together with a pre-processed sample data sheet (Supplementary file 2).
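
      For orientation only, a nearest-neighbor-distance and local-density calculation on segmented cluster centroids could look like the sketch below. This is not the code in Supplementary file 1; the function names, the brute-force distance matrix and the definition of local density (other centroids within a fixed radius, per unit volume) are assumptions made for illustration.

```python
import numpy as np

def pairwise_distances(coords):
    """Euclidean distance between every pair of 3D centroids (brute force)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def nearest_neighbor_distance(coords):
    """Distance from each centroid to its closest other centroid."""
    dist = pairwise_distances(coords)
    np.fill_diagonal(dist, np.inf)  # ignore self-distances
    return dist.min(axis=1)

def local_density(coords, radius):
    """Number of other centroids within `radius`, divided by the sphere volume."""
    dist = pairwise_distances(coords)
    counts = (dist <= radius).sum(axis=1) - 1  # subtract the centroid itself
    return counts / (4.0 / 3.0 * np.pi * radius ** 3)

# Hypothetical centroids (arbitrary length units).
centroids = [(0, 0, 0), (1, 0, 0), (3, 0, 0)]
nnd = nearest_neighbor_distance(centroids)
density = local_density(centroids, radius=1.5)
```

      The brute-force matrix scales quadratically with the number of clusters, so a spatial index would be preferable for whole-neuropil datasets.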

      Regarding image deconvolution, we have clarified the differential use of deconvolved and not-deconvolved images in the revised manuscript. We have also included a quantitative evaluation of Richardson-Lucy iterative deconvolution (Figure 1 - figure supplement 4). We used 20 iterations due to only marginal FWHM improvement beyond this point (Figure 1 - figure supplement 4).
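
      The kind of iteration-versus-FWHM check mentioned above can be illustrated with a one-dimensional toy example. The sketch below is a plain NumPy Richardson-Lucy loop applied to a simulated point source; the Gaussian PSF, its width, the noise-free signal and the crude sample-counting FWHM are all assumptions, and it is not the pipeline used for the figure supplement.

```python
import numpy as np

def richardson_lucy_1d(observed, psf, n_iter):
    """Basic Richardson-Lucy deconvolution of a 1D profile."""
    psf = psf / psf.sum()
    estimate = observed.astype(float).copy()  # start from the observed data
    for _ in range(n_iter):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / (blurred + 1e-12)  # small constant avoids division by zero
        estimate *= np.convolve(ratio, psf[::-1], mode="same")
    return estimate

def fwhm_samples(profile):
    """Crude FWHM: number of samples at or above half of the peak value."""
    return int((profile >= profile.max() / 2).sum())

# Hypothetical Gaussian PSF (sigma = 3 samples) and a single point source.
x = np.arange(-15, 16)
psf = np.exp(-x ** 2 / (2 * 3.0 ** 2))
point = np.zeros(101)
point[50] = 1.0
observed = np.convolve(point, psf / psf.sum(), mode="same")

# FWHM of the restored profile after different numbers of iterations.
widths = {n: fwhm_samples(richardson_lucy_1d(observed, psf, n))
          for n in (0, 5, 10, 20, 40)}
```

      Plotting `widths` against the iteration count gives the type of curve from which a stopping point can be chosen.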

      (3) The major part of this study focuses on the description and comparison of the divergent synapse parameters across cell-types in MB compartments, which is highly relevant and interesting. Yet it would be very interesting to connect this new method with functional aspects of the heterogeneous synapses. This is done in Figure 7 with an associative learning approach, which is, in part, not trivial to follow for the reader and would profit from a more comprehensive analysis.

      (a) It would be important for the understanding and validation of the learning induced changes, if not (only) a ratio (of AZ density/local intensity) would be presented, but both values on their own, especially to allow a comparison to the quoted, previous AZ remodelling analysis quantifying BRP intensities (ref. 17, 18). It should be elucidated in more detail why only the ratio was presented here.

      We thank the reviewer for the suggestion on the presentation of learning-induced Brp remodeling. The reported values in Figure 7C are the correlation coefficients of AZ density and local intensity in each compartment, not their ratio. These results suggest that subcompartment-sized clusters of AZs with high Brp accumulation (Figure 6) undergo local structural remodeling upon associative learning (Figure 7). For clarity, we have added a schematic of this correlation and an example scatter plot to Figure 6. Unlike the previous studies (refs 17 and 18), we did not observe robust learning-dependent changes in the Brp intensity, possibly due to some confounding factors such as overall expression levels and conditioning protocols as described in the previous and following points, respectively.
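
      To make the distinction between a correlation coefficient and a ratio concrete, a minimal sketch is shown below. It assumes one density value and one intensity value per AZ within a compartment and uses Pearson's r; the variable names and toy numbers are hypothetical and are not taken from the study.

```python
import numpy as np

def density_intensity_correlation(az_density, brp_intensity):
    """Pearson correlation between per-AZ local density and Brp intensity."""
    return float(np.corrcoef(az_density, brp_intensity)[0, 1])

# Hypothetical per-AZ measurements in one compartment.
az_density = np.array([1.0, 2.0, 3.0, 4.0])
brp_intensity = np.array([10.0, 20.0, 30.0, 40.0])

r = density_intensity_correlation(az_density, brp_intensity)
mean_ratio = float(np.mean(az_density / brp_intensity))
```

      Doubling every intensity value would halve `mean_ratio` but leave `r` unchanged, which is why the correlation is insensitive to overall expression level.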

      (b) The reason why a single instead of a dual odour conditioning was performed could be clarified and discussed (would that have the same effects?).

      (c) Additionally, "controls" for the unpaired values - that is, in flies receiving neither shock nor odour - it would help to evaluate the unpaired control values in the different MB compartments.

      We use single odor conditioning because it is the simplest way to examine the effect of odor-shock association by comparing the paired and unpaired group. Standard differential conditioning with two odors contains unpaired odor presentation (CS-) even in the ‘paired’ group. We now show that single-odor conditioning induces memory that lasts one day as in differential conditioning (Figure 7B; Tully and Quinn, J Comp Phys A 1985).

      (d) The temporal resolution of the effect is very interesting (Figure 7D), and at more time points, especially between 90 and 270 min, this might raise interesting results.

      The sampling time points after training were chosen at approximately logarithmic intervals, as the memory decay is roughly exponential (Figure 7B). This transient remodeling is consistent with previous studies reporting that Brp plasticity is short-lived (Zhang et al., 2018 Neuron; Turrel et al., 2022 Current Biol).

      (e) Additionally, it would be very interesting and rewarding to have at least one additional assay, relating structure and function, e.g. on a molecular level by a correlative analysis of BRP and synaptic vesicles (by staining or co-expression of SV-protein markers) or calcium activity imaging or on a functional level by additional learning assays.

      We thank the reviewer for raising this important point. We have performed calcium imaging of KC presynaptic terminals to correlate the structure and function in another study (see Figure 2 in Wu, Eno et al., 2025 PLOS Biology for more detail). The basal presynaptic calcium pattern along the γ compartments is strikingly similar to the compartmental heterogeneity of Brp accumulation (see also Figure 2 in this study). Considering colocalization of other active-zone components, such as Cac (Figure 1E), we propose that the learning-induced remodeling of local Brp clusters should transiently modulate synaptic properties.

      In response to the other reviewers’ interest, we used Brp::rGFP to measure different forms of Brp-based structural plasticity upon constant light exposure in the photoreceptors and upon silencing rab3 in KCs. Since these experiments nicely reproduced the results of previous studies (Sugie et al., Neuron 2013; Graf et al., Neuron 2009), we believe the learning-induced plasticity of Brp clustering in KCs has a transient nature.

      Reviewer #3 (Public review):

      Summary:

      The authors develop a tool for marking presynaptic active zones in Drosophila brains, dependent on the GAL4 construct used to express a fragment of GFP, which will incorporate with a genome-engineered partial GFP attached to the active zone protein bruchpilot - signal will be specific to the GAL4-expressing neuronal compartment. They then use various GAL4s to examine innervation onto the mushroom bodies to dissect compartment-specific differences in the size and intensity of active zones. After a description of these differences, they induce learning in flies with classic odour/electric shock pairing and observe changes after conditioning that are specific to the paired conditioning/learning paradigm.

      Strengths:

      The imaging and analysis appear strong. The tool is novel and exciting.

      Weaknesses:

      I feel that the tool could do with a little more characterisation. It is assumed that the puncta observed are AZs with no further definition or characterisation.

      We performed additional validation of the tool, including (1) nanoscopic localization of Brp::rGFP using STED imaging; (2) colocalization between Brp::rGFP and anti-Brp signals/VGCCs (Figure 1D-E); (3) activity-dependent active zone remodeling in R8 photoreceptors (Figure 1F). These are detailed in our point-by-point response below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors keep stating, they profile or assess synaptic structure by analyzing BRP localization, cluster volume, and intensity. However, I do not think that BRP cluster volume and intensity warrant an educated statement about presynaptic structure as a whole. I do not challenge the usefulness of BRP cluster analysis for synapse evaluation, but as there are so many more players involved in synaptic function, BRP analysis certainly cannot explain it all. This should at least be discussed.

      It is correct that Brp is not the only player in the active zone. We have included more discussion on the specific role of Brp (line 84 to 89) and other synaptic markers (line 250) and edited potentially misleading text.

      (2) I do see that changes in BRP expression were observed following associative learning, but is it certain, that synaptic plasticity is generally unaffected by the large GFP fluorophore? BRP is grabbing onto other proteins, both with its C- and N-termini. As the GFP is right before the stop codon, it should be at the N-terminus. How far could BRP function be hampered by this? Is there still enough space for other proteins to interact?

      We thank the reviewer for sharing these concerns. Here we provide three lines of evidence that the Brp assembly at active zones required for synaptic plasticity is unaffected by split-GFP tagging.

      First, we assessed olfactory memory of flies that have Brp::rGFP labeled in Kenyon cells and found the performance comparable to wild-type (Figure 7 - figure supplement 2), suggesting the Brp function required for olfactory memory (Knapek et al., J Neurosci 2011) is unaffected by split-GFP tagging.

      Second, we measured Brp remodeling in photoreceptors induced by constant light exposure (LL; Sugie et al., 2015 Neuron). Consistent with the previous study, we found that LL decreased the number of Brp::rGFP clusters in R8 terminals in the medulla, as compared to the constant dark condition (DD). This result validates the synaptic plasticity involving dynamic Brp rearrangement in the photoreceptors. We have included this result in the revised manuscript (Figure 1F).

      To further validate protein interaction of Brp::rGFP, we focused on Rab3, as it was previously shown to control Brp allocation at active zones (Graf et al., 2009 Neuron). To this end, we silenced rab3 expression in Kenyon cells using RNAi and measured the intensity of Brp::rGFP clusters in γ Kenyon cells. As previously reported in the neuromuscular junction, we found that rab3 knock-down increased Brp::rGFP accumulation at the active zones, suggesting that Brp::rGFP reflects the interaction with Rab3. We have included all the new data in the revised manuscript (Figure 1 - figure supplement 3).

      (3) It may well be that not only active-zone-associated BRP is labeled but possibly also BRP molecules elsewhere in the neuron. I would like to see more validation, e.g., the percentage of tagged endogenous BRP associated with other presynaptic proteins.

      To determine to what extent Brp::rGFP clusters represent active zones, we double-labelled Brp::rGFP and Cac::tdTomato (Cacophony, the alpha subunit of the voltage-gated calcium channels). We found that 97% of Brp::rGFP clusters showed co-localization with Cac::tdTomato in PAM-γ5 dopamine neuron terminals (Figure 1E), suggesting most Brp::rGFP clusters represent functional AZs.

      (4) Z-size is ~200 nm, while x/y pixel size is ~75 nm during acquisition. How far down does the resolution go after deconvolution?

      The Z-step was 370 nm and XY pixel size was 79 nm for image acquisition. We performed 20 iterations of Richardson-Lucy deconvolution using an empirical point spread function (PSF). We found that the full-width at half maximum (FWHM) of Brp::rGFP clusters improves only marginally beyond 20 iterations, at which point the XY FWHM is around 200 nm and the XZ FWHM is around 450 nm (Figure 1 - figure supplement 4).
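
      To illustrate how iterative deconvolution narrows the measured FWHM, the following is a minimal one-dimensional sketch. It is not the pipeline used in the study: it assumes a synthetic Gaussian PSF in place of the empirical PSF, works on a 1D intensity profile rather than a 3D image stack, and all function names (`gaussian_kernel`, `convolve_same`, `richardson_lucy`, `fwhm`) are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian, used here as a stand-in PSF (assumption)."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_same(signal, kernel):
    """Zero-padded 1D convolution returning an output as long as the input."""
    radius = len(kernel) // 2
    n = len(signal)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            src = i - (j - radius)
            if 0 <= src < n:
                acc += signal[src] * weight
        out[i] = acc
    return out


def richardson_lucy(observed, psf, iterations):
    """Multiplicative Richardson-Lucy updates starting from a flat estimate."""
    flat = max(sum(observed) / len(observed), 1e-12)
    estimate = [flat] * len(observed)
    psf_flipped = psf[::-1]
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        # Ratio of data to the current model, guarded against division by zero.
        ratio = [o / max(b, 1e-12) for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, psf_flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


def fwhm(profile, step=1.0):
    """Full width at half maximum, with linear interpolation at both crossings."""
    peak = max(profile)
    half = peak / 2.0
    left = right = profile.index(peak)
    while left > 0 and profile[left - 1] >= half:
        left -= 1
    while right < len(profile) - 1 and profile[right + 1] >= half:
        right += 1
    left_edge = float(left)
    if left > 0:
        lo, hi = profile[left - 1], profile[left]
        left_edge = (left - 1) + (half - lo) / (hi - lo)
    right_edge = float(right)
    if right < len(profile) - 1:
        hi, lo = profile[right], profile[right + 1]
        right_edge = right + (hi - half) / (hi - lo)
    return (right_edge - left_edge) * step
```

      Passing `step=79.0` to `fwhm` would report the width in nanometres for a profile sampled at the 79 nm XY pixel size; comparing `fwhm` across increasing iteration counts is one way to see where the improvement levels off.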

      (5) Figure Legend 7: What is a "cytoplasm membrane marker"? Does this mean membrane-bound tdTom is sticking into the cytoplasm?

      We apologize for the typo and have corrected it to “plasma membrane marker”.

      (6) At the end of the introduction: "characterizing multiple structural parameters..." - which were these parameters? I was under the assumption that BRP localization, cluster volume, and intensity were assessed. I do not see how these are structural parameters. Please define what exactly is meant by "structural parameters".

      We apologize for the confusion. By “structural parameters”, we indeed referred to the volume, intensity and molecular density of Brp::rGFP clusters. We have revised the sentence to “Characterizing the distinct parameters and localization of Brp::rGFP cluster.”

      (7) Next to last sentence of the introduction: "Characterizing multiple structural parameters revealed a significant synaptic heterogeneity within single neurons and AZ distribution stereotypy across individuals." What do the authors mean by "significant synaptic heterogeneity"?

      By “synaptic heterogeneity”, we refer to the intracellular variability of active zone cytomatrices reported by Brp clusters. For instance, the intensities of Brp::rGFP clusters within Kenyon cell subtypes were variable among compartments (Figure 2). Intracellular variability of the Brp concentration of individual active zones was higher in DPM and APL neurons than Kenyon cells (Figure 3). These variabilities demonstrate intracellular synaptic heterogeneity. We have revised the sentence to be more specific to the different characters of Brp clusters.

      (8) I do not understand the last sentence of the introduction. "These cell-type-specific synapse profiles suggest that AZs are organized at multiple scales, ranging from neighboring synapses to across individuals." What do the authors mean by "ranging from neighboring synapses to across individuals"? Does this mean that even neighboring synapses in the same cell can be different?

      We have revised the sentence to “These cell-type-specific synapse profiles suggest that AZs are spatially organized at multiple scales, ranging from interindividual stereotypy to neighboring synapses in the same cells.”

      By “neighboring synapses”, we refer to the nearest neighbor similarity in Brp levels in some cell-types (Figure 6A-C), and also the sub-compartmental dense AZ clusters with high Brp level in Kenyon cells (Figure 6D-H). By “across individuals”, we refer to the individually conserved active zone distribution patterns in some neurons (Figure 5).

      (9) The title talks about cell-type-specific spatial configurations. I do not understand what is meant by "spatial configurations"? Do you mean BRP cluster volume? I think the title is a little misleading.

      By “spatial configuration”, we refer to the arrangement of Brp clusters within individual mushroom body neurons. This statement is based on our findings on the intracellular synaptic heterogeneity (see also response to comment #7). We have streamlined the text description in the revised manuscript for clarity.

      Reviewer #2 (Recommendations for the authors):

      (1) For Figure 3A: exemplary two AZs are compared here, a histogram comparing more AZs would aid in making the point that in general, AZ of similar size have different BRP level (intensities) and how much variation exists.

      We have included histograms for Brp::rGFP intensity and cluster volumes to Figure 3 in the revised manuscript.

      (2) Line 52: "endogenous synapses" is a confusing term; it's probably meant that the protein levels within the synapse are endogenous and not overexpressed. 

      We apologize for the confusion and have revised the term to “endogenous synaptic proteins.”

      (3) It is not clear from the Materials and Methods section, whether and where deconvolved or not-deconvolved images were used for the quantification pipeline. Please comment on this. 

      We have now revised the Methods section to clarify how deconvolved and non-deconvolved images were used at different stages of the pipeline.

      (4) Line 664 (C) not bold.

      We have corrected the error.

      (5) 725 "Files" should be Flies.

      We have corrected the error.

      (6) 727 two times "first".

      We have corrected the error.

      (7) Figure 7. All (A) etc., not bold - there should be consistent annotation. 

      We thank the reviewer for the detailed proofreading and have corrected all the errors spotted.

      Reviewer #3 (Recommendations for the authors):

      (1) Has there been an expression of the construct in a non-neuronal cell? Astrocyte-like cell? Any glia? As some sort of control for background and activity?

      As the reviewer suggested, we verified the neuronal expression specificity of Brp::rGFP. Using R86E01-GAL4 and Amon-GAL4, we compared Brp::rGFP in astrocyte-like glia and neuropeptide-releasing neurons. We found no Brp::rGFP puncta in the neuropils with astrocyte-like glia, in contrast to neurons, suggesting Brp::rGFP is specific to neurons. We have included this new dataset in the revised manuscript (Figure 1 - figure supplement 2).

      (2) Similarly, expression of the construct co-expressed with a channelrhodopsin, and induction of a 'learning'-like regime of activity, similarly in a control type of experiment, expression of an inwardly rectifying channel (e.g. Kir2.1) to show that increases in size of the BRP puncta are truly activity dependent? The NMJ may be an optimal neuron to use to see the 'donut' structures of the AZs and their increase with activity. Also, are these truly AZs we are seeing here? Perhaps try co-expressing cacophony-dsRed? If the GFP Puncta are active zones, then they should be surrounded by cacophony.

      We would like to clarify that we did not find Brp::rGFP size increase upon learning. Instead, we demonstrated that associative training transiently remodelled sub-compartment-sized AZ “hot spots” in Kenyon cells, indicated by the correlation of local intensity and AZ density (Figure 6-7).

      To demonstrate split-GFP tagging does not affect activity-dependent plasticity associated with Brp, we measured Brp remodeling in photoreceptors induced by constant light exposure (LL; Sugie et al., 2015 Neuron). Consistent with the previous study, we found that LL decreased the numbers of Brp::rGFP clusters in R8 terminals in the medulla, as compared to constant dark condition (DD). This result validates the synaptic plasticity involving dynamic Brp rearrangement in the photoreceptors (Figure 1F).

      As the reviewer suggested, we performed STED microscopy on larval motor neurons and confirmed the donut-shaped arrangement of Brp::rGFP (Wu, Eno et al., PLOS Biol 2025).

      Also following the reviewer’s suggestion, we double-labelled Brp::rGFP and Cac::tdTomato (Cacophony, the alpha subunit of the voltage-gated calcium channels). We found that 97% of Brp::rGFP clusters showed co-localization with Cac::tdTomato in PAM-γ5 dopamine neuron terminals (Figure 1E), suggesting most Brp::rGFP clusters represent functional AZs.

      (3) In the introduction: Intro, a sentence about BRP - central organiser of the active zone, so a key regulator of activity.

      We have added a few more sentences about the role of Brp in the active zones to the revised manuscript.

      (4) Figure 1 E, line 650 'cite the resource here'. 

      We thank the reviewer for pointing out the error and we have corrected it.

      (5) Many readers may not be MB aficionados, and to make the data more accessible, perhaps use a cartoon of an MB with the cell bodies of the neurons around the MB expressing the constructs highlighted so that the reader can have a wider idea of the anatomy in relation to the MB.

      We appreciate these comments and have appended cartoons of the MB to figures to help readers understand the anatomy.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study uses creative scalp EEG decoding methods to attempt to demonstrate that two forms of learned associations in a Stroop task are dissociable, despite sharing similar temporal dynamics. However, the evidence supporting the conclusions is incomplete due to concerns with the experimental design and methodology. This paper would be of interest to researchers studying cognitive control and adaptive behavior, if the concerns raised in the reviews can be addressed satisfactorily.

      We thank the editors and the reviewers for their positive assessment of our work and for providing us with an opportunity to strengthen this manuscript. Please see below our responses to each comment raised in the reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study focuses on characterizing the EEG correlates of item-specific proportion congruency effects. In particular, two types of learned associations are characterized. One being associations between stimulus features and control states (SC), and the other being stimulus features and responses (SR). Decoding methods are used to identify SC and SR correlates and to determine whether they have similar topographies and dynamics.

      The results suggest SC and SR associations are simultaneously coactivated and have shared topographies, with the inference being that these associations may share a common generator.

      Strengths:

      Fearless, creative use of EEG decoding to test tricky hypotheses regarding latent associations. Nice idea to orthogonalize the ISPC condition (MC/MI) from stimulus features.

      Thank you for acknowledging the strength in EEG decoding and design. We have addressed all your concerns raised below point by point.

      Weaknesses:

      (1a) I'm relatively concerned that these results may be spurious. I hope to be proven wrong, but I would suggest taking another look at a few things.

      While a nice idea in principle, the ISPC manipulation seems to be quite confounded with the trial number. E.g., color-red is MI only during phase 2, and is MC primarily only during Phase 3 (since phase 1 is so sparsely represented). In my experience, EEG noise is highly structured across a session and easily exploited by decoders. Plus, behavior seems quite different between Phase 2 and Phase 3. So, it seems likely that the classes you are asking the decoder to separate are highly confounded with temporally structured noise.

      I suggest thinking of how to handle this concern in a rigorous way. A compelling way to address this would be to perform "cross-phase" decoding, however I am not sure if that is possible given the design.

      Thank you for raising this important issue. To test whether decoding might be confounded by temporally structured noise, we performed a control decoding analysis. As the reviewer correctly pointed out, cross-phase decoding is not possible due to the experimental design. Instead, to maximize temporal separation between the training and test data, we divided the EEG data in phase 2 and in phases 1 and 3 combined into first and second halves chronologically. Phases 1 and 3 were combined because they share the same MC and MI assignments. We then trained the decoders on one half and tested them on the other half. Finally, we averaged the decoding results across all possible assignments of training and test data. The similar patterns observed (Supplementary Fig. 1) indicate that the decoding results are unlikely to be driven by temporally structured noise in the EEG data. This clarification has been added to page 13 of the revised manuscript.
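
      The logic of this control can be sketched as follows. This is a simplified illustration rather than the analysis code: it assumes each trial is a `(trial_index, features, label)` tuple, substitutes a nearest-centroid classifier for the LDA decoder, and handles a single chronological split; all function names are hypothetical.

```python
def centroids(trials):
    """Per-class mean feature vector from (trial_index, features, label) tuples."""
    sums, counts = {}, {}
    for _, features, label in trials:
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(model, key=lambda label: sqdist(model[label]))


def accuracy(model, trials):
    """Fraction of trials whose predicted label matches the true label."""
    hits = sum(predict(model, features) == label for _, features, label in trials)
    return hits / len(trials)


def cross_half_accuracy(trials):
    """Train on one chronological half, test on the other, average both directions."""
    ordered = sorted(trials, key=lambda trial: trial[0])
    mid = len(ordered) // 2
    first, second = ordered[:mid], ordered[mid:]
    forward = accuracy(centroids(first), second)
    backward = accuracy(centroids(second), first)
    return 0.5 * (forward + backward)
```

      Because training and test trials never come from the same stretch of the session, a decoder that relies mainly on slowly drifting noise would be expected to generalize poorly under this scheme.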

      (1b) The time courses also seem concerning. What are we to make of the SR and SC timecourses, which have aggregate decoding dynamics that look to be <1Hz?

      As detailed in the response to your next comment, some new results using data without baseline correction show a narrower time window of above-chance decoding. We speculate that the remaining results of long-lasting above-chance decoding could be attributed to trials with slow responses (some responses were made near the response deadline of 1500 ms). Additionally, as shown in Figure 6a, the long-lasting above-chance decoding seems to be driven by color and congruency representations. Thus, it is also possible that the binding of color and congruency contributes to decoding. This interpretation has been added to page 17 of the revised manuscript.

      (1c) Some sanity checks would be one place to start. Time courses were baselined, but this is often not necessary with decoding; it can cause bias (10.1016/j.jneumeth.2021.109080), and can mask deeper issues. What do things look like when not baselined? Can variables be decoded when they should not be decoded? What does cross-temporal decoding look like - everything stable across all times, etc.?

      As the reviewer mentioned, baseline-corrected data may introduce bias to the decoding results. Thus, we cited the van Driel et al. (2021) paper in the revised manuscript to justify the use of EEG data without baseline correction in the decoding analyses (Page 27 of the revised manuscript), and re-ran all decoding analyses accordingly. The new analyses yielded largely similar results (Fig. 2, 4, 6 and 8 in the revised manuscript) with the following exceptions: a narrower time window for separable SC and SR subspaces (Fig. 4b), a narrower time window for concurrent representations of SC and SR (Fig. 6a-b), and a wider time window for the correlations of SC/SR representations with RTs (Fig. 8).

      (2) The nature of the shared features between SR and SC subspaces is unclear.

      The simulation is framed in terms of the amount of overlap, revealing the number of shared dimensions between subspaces. In reality, it seems like it's closer to 'proportion of volume shared', i.e., a small number of dominant dimensions could drive a large degree of alignment between subspaces.

      What features drive the similarity? What features drive the distinctions between SR and SC? Aside from the temporal confounds I mentioned above, is it possible that some low-dimensional feature, like EEG congruency effect (e.g., low-D ERPs associated with conflict), or RT dynamics, drives discriminability among these classes? It seems plausible to me - all one would need is non-homogeneity in the size of the congruency effect across different items (subject-level idiosyncracies could contribute: 10.1016/j.neuroimage.2013.03.039).

      Thank you for this question. To test what dimensions are shared between SC and SR subspaces, we first identified which factors can be shared across SC and SR subspaces. For SC, the eight conditions are the four colors × ISPC. Thus, the possible shared dimensions are color and ISPC. Additionally, because the four colors and words are divided into two groups (e.g., red-blue and green-yellow, counterbalanced across subjects, see Methods), the group is a third potential shared dimension. Similarly, for SR decoders, potential shared dimensions are word, ISPC and group. Note that each class in SC and SR decoders has both congruent and incongruent trials. Thus, congruency is not decodable from SC/SR decoders and hence unlikely to be a shared dimension in our analysis. To test the effect of sharing for each of the potential dimensions, we performed RSA on decoding results of the SC decoder trained on SR subspace (SR | SC) (Supplementary Fig. 4a) and the SR decoder trained on SC subspace (SC | SR) (Supplementary Fig. 4b), where decoding accuracy indicates the shared SC and SR representations. In the SC classes of SR | SC, the words red and blue were mixed within the same class, as were the words yellow and green. The similarity matrix for “Group” of SR | SC (Supplementary Fig. 4a) shows the comparison between two word groups (red & blue vs. yellow & green). The similarity matrix for “Group” of SC | SR (Supplementary Fig. 4b) shows the comparison between two color groups (red & blue vs. yellow & green).

      The RSA results revealed that the contributions of group to the SC decoder (Supplementary Fig. 5a) and the SR decoder (Supplementary Fig. 5b) were significant. Meanwhile, a wider time window showed a significant effect of color on the SC decoder (approximately 100 - 1100 ms post-stimulus onset, Supplementary Fig. 5a) and a narrower time window showed a significant effect of word on the SR decoder (approximately 100 - 500 ms post-stimulus onset, Supplementary Fig. 5b). However, we found no significant effect of ISPC on either the SC or the SR decoder. We also performed the same analyses on response-locked data from the time window -800 to 200 ms. The results showed shared representation of color in the SC decoder (Supplementary Fig. 5c) and group in both decoders (Supplementary Fig. 5c-d). Overall, the above results demonstrated that color, word and group information are shared between SC and SR subspaces.

      Lastly, we would like to stress that our main hypothesis for the cross-subspace decoding analysis is that SR and SC subspaces are not identical. This hypothesis was supported by lower decoding accuracy for cross-subspace than within-subspace decoders and enables the subsequent analyses that treated SC and SR as separate representations.

      We have added the interpretation to pages 13-14 of the revised manuscript.

      (3) The time-resolved within-trial correlation of RSA betas is a cool idea, but I am concerned it is biased. Estimating correlations among different coefficients from the same GLM design matrix is, in general, biased, i.e., when the regressors are non-orthogonal. This bias comes from the expected covariance of the betas and is discussed in detail here (10.1371/journal.pcbi.1006299). In short, correlations could be inflated due to a combination of the design matrix and the structure of the noise. The most established solution, to cross-validate across different GLM estimations, is unfortunately not available here. I would suggest that the authors think of ways to handle this issue.

      Thank you for raising this important issue. Because the bias comes from the covariance between the regressors and the same GLM was applied to all time points in our analysis, we assume that the inflation would be similar at different time points. Therefore, we calculated the correlation of SC and SR betas from -200 to 0 ms relative to stimulus onset as a baseline (i.e., no SC or SR representation is expected before the stimulus onset) and compared the post-stimulus onset correlation coefficients against this baseline. We hypothesized that if the positive within-trial correlation of SC and SR betas resulted from simultaneous representation rather than inflation, we should observe a significantly higher correlation than at baseline. To examine this hypothesis, we first performed the linear discriminant analysis (Supplementary Fig. 7a) and RSA regression (Supplementary Fig. 7b) on the -200 - 0 ms window relative to stimulus onset. We then calculated the average r<sub>baseline</sub> of SC and SR betas in that time window for each participant (group results at each time point are shown in Supplementary Fig. 7c) and computed the relative correlation at each post-stimulus onset time point as Fisher-z(r) - Fisher-z(r<sub>baseline</sub>). Finally, we performed a simple t test at the group level on the baseline-corrected correlation coefficients with Bonferroni correction. The results (Fig. 6c) showed a significantly more positive correlation from 100 - 500 ms post-stimulus onset compared with baseline, supporting our hypothesis that the positive within-trial correlation of SC and SR betas arises from simultaneous representation rather than inflation. The related interpretation was added to page 17 of the revised manuscript.
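
      The baseline correction described above can be written compactly. The sketch below is illustrative only: it assumes a one-sample t statistic against zero and a plain Bonferroni-adjusted threshold, which the response does not specify in detail, and the function names are hypothetical.

```python
import math
import statistics


def fisher_z(r):
    """Fisher z-transform of a correlation coefficient (inverse hyperbolic tangent)."""
    return math.atanh(r)


def relative_correlation(r_post, r_baseline):
    """Post-stimulus correlation expressed relative to the pre-stimulus baseline."""
    return fisher_z(r_post) - fisher_z(r_baseline)


def one_sample_t(values):
    """t statistic for the group mean against zero (assumed test form)."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests
```

      For each post-stimulus time point, `relative_correlation` would be computed per participant and the resulting values passed to `one_sample_t`, with `n_tests` set to the number of time points tested.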

      (4) Are results robust to running response-locked analyses? Especially the EEG-behavior correlation. Could this be driven by different RTs across trials & trial-types? I.e., at 400 ms poststim onset, some trials would be near or at RT/action execution, while others may not be nearly as close, and so EEG features would differ & "predict" RT.

      Thanks for this question. We now pair each of the stimulus-locked EEG analyses in the manuscript with a response-locked analysis. To control for RT variations among trial types, when using the linear mixed model (LMM) to predict RTs from trial-wise RSA results, we included a separate intercept for each of the eight trial types in SC or SR. Furthermore, at each time point, we only included trials that had not yet generated a response (for stimulus-locked analysis) or had already started (for response-locked analysis). All the results (Fig. 3, 5, 7, 9 in the revised manuscript) are in support of our hypothesis. We added these details to page 31 of the revised manuscript.
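
      The time-point-specific trial selection can be sketched as two boolean masks. This is an assumption-laden illustration: RTs are taken to be in milliseconds from stimulus onset, response-locked time is negative before the response, and whether the boundaries are strict or inclusive is a guess; the function names are hypothetical.

```python
def stimulus_locked_mask(rts_ms, t_ms):
    """Keep trials whose response has not yet occurred at t_ms after stimulus onset."""
    return [rt > t_ms for rt in rts_ms]


def response_locked_mask(rts_ms, t_ms):
    """Keep trials whose stimulus has already appeared at t_ms relative to the response.

    In response-locked time the stimulus onset of a trial sits at -rt,
    so a trial has started once t_ms is at or after -rt.
    """
    return [t_ms >= -rt for rt in rts_ms]
```

      With RTs of 400, 800 and 1200 ms, for example, the fastest trial drops out of the stimulus-locked analysis at 500 ms and out of the response-locked analysis at -600 ms.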

      (5) I suggest providing more explanation about the logic of the subspace decoding method - what trialtypes exactly constitute the different classes, why we would expect this method to capture something useful regarding ISPC, & what this something might be. I felt that the first paragraph of the results breezes by a lot of important logic.

      In general, this paper does not seem to be written for readers who are unfamiliar with this particular topic area. If authors think this is undesirable, I would suggest altering the text.

      To improve clarity, we revised the first paragraph of the SC and SR association subspace analysis to list the conditions for each of the SC and SR decoders and explain more about how the concept of being separable can be tested by cross-decoding between SC and SR subspaces. The revised paragraph now reads:

      “Prior to testing whether controlled and non-controlled associations were represented simultaneously, we first tested whether the two representations were separable in the EEG data.

      In other words, we reorganized the 16 experimental conditions into 8 conditions for SC (4 colors × MC/MI, while collapsing across SR levels) and SR (4 words × 2 possible responses per word, while collapsing across SC levels) associations separately. If SC and SR associations are not separable, it follows that they encode the same information, such that both SC and SR associations can be represented in the same subspace (i.e., by the same information encoded in both associations). For example, because (1) the word can be determined by the color and congruency and (2) the most-likely response can be determined by color and ISPC, the SR association (i.e., association between word and most-likely response) can in theory be represented using the same information as the SC association. On the other hand, if SC and SR associations are separable, they are expected to be represented in different subspaces (i.e., the information used to encode the two associations is different). Notably, if some, but not all, information is shared between SC and SR associations, they are still separable by the unique information encoded. In this case, the SC and SR subspaces will partially overlap but still differ in some dimensions. To summarize, whether SC and SR associations are separable is operationalized as whether the associations are represented in the same subspace of EEG data. To test this, we leveraged the subspace created by the LDA (see Methods). Briefly, to capture the subspace that best distinguishes our experimental conditions, we trained SC and SR decoders using their respective aforementioned 8 experimental conditions. We then projected the EEG data onto the decoding weights of the LDA for each of the SC and SR decoders to obtain its respective subspace. We hypothesized that if SC and SR subspaces are identical (i.e., not separable), SC/SR decoding accuracy should not differ by which subspace (SC or SR) the decoder is trained on. 
      For example, SC decoders trained in SC subspace should show similar decoding performance as SC decoders trained in SR subspace. On the other hand, if SC and SR association representations are in different subspaces, the SC/SR subspace will not encode all information for SR/SC associations. As a result, decoding accuracy should be higher using its own subspace (e.g., decoding SC using the SC subspace) than using the other subspace (e.g., decoding SC using the SR subspace). We used cross-validation to avoid artificially higher decoding accuracy for decoders using their own subspace (see Methods).” (Page 11-12).

      We also explicitly tested what information is shared between SC and SR representations (see response to comment #2). Lastly, to help the readers navigate the EEG results, we added a section “Overview of EEG analysis” to summarize the EEG analyses and their relations in the following manner:

      “EEG analysis overview. We started by validating that the 16 experimental conditions (8 unique stimuli × MC/MI) were represented in the EEG data. Evidence of representation was provided by above-chance decoding of the experimental conditions (Fig. 2-3). We then examined whether the SC and SR associations were separable (i.e., whether SC and SR associations were different representations of equivalent information). As our results supported separable representations of SC and SR association (Fig. 4-5), we further estimated the temporal dynamics of each representation within a trial using RSA. This analysis revealed that the temporal dynamics of SC and SR association representations overlapped (Fig. 6a-b, Fig. 7a-b). To explore the potential reason behind the temporal overlap of the two representations, we investigated whether SC and SR associations were represented simultaneously as part of the task representation, independently from each other, or competitively/exclusively (e.g., on some trials only SC association was represented, while on other trials only SR association was represented). This was done by assessing the correlation between the strength of SC and SR representations across trials (Fig. 6c, Fig. 7c). Lastly, we tested how SC and SR representations facilitated performance (Fig.8-9).” (Page 8-9).

      Minor suggestions:

      (6) I'd suggest using single-trial RSA beta coefficients, not t-values, as they can be more stable (it's a t-value based on 16 observations against 9 or so regressors.... the SE can be tiny).

      Thank you for your suggestion. To choose between using betas and t-values, we calculated the proportion of outliers (defined as values beyond mean ± 5 SD) for each predictor of the design matrix and each subject. We found that outliers were less frequent for t-values than for beta coefficients (t-values: mean = 0.07%, SD = 0.009%; beta-values: mean = 0.19%, SD = 0.033%). Thus, we decided to stay with t-values.
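
      The outlier criterion described above (values beyond mean ± 5 SD) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the use of the sample SD, and the toy data are assumptions made here for the example.

```python
import statistics

def outlier_proportion(values, n_sd=5.0):
    """Fraction of values lying beyond mean ± n_sd sample SDs."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator); an assumption
    return sum(abs(v - mean) > n_sd * sd for v in values) / len(values)

# Hypothetical single-trial estimates for one predictor in one subject:
# 999 unremarkable values plus one extreme value.
estimates = [0.0] * 999 + [100.0]
print(f"{100 * outlier_proportion(estimates):.2f}% outliers")
```

      In practice the same function would be applied separately to the beta and t-value estimates of each predictor and subject, and the resulting proportions compared.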

      (7) Instead of prewhitening the RTs before the HLM with drift terms, try putting those in the HLM itself, to avoid two-stage regression bias.

      Thank you for your suggestion. Because our current LMM included each of the eight trial types in SC or SR as separate predictors with their own intercepts (as mentioned above), adding regressors of trial number and mini blocks (1-100 blocks) introduced collinearity (as ISPC flipped during the experiment). We therefore excluded these regressors from the current LMM (Page 31).

      (8) The text says classical MDS was performed on decoding *accuracy* - is this accurate?

      We now clarify in the manuscript that it is the decoders’ probabilistic classification results (Page 28).

      (9) At a few points, it was claimed that a negative correlation between SC and SR would be expected within single trials, if the two were temporally dissociable. Wouldn't it also be possible that they are not correlated/orthogonal?

      We agree with the reviewer and revised the null hypothesis in the cross-trial correlation analysis to include no correlation as SC and SR association representations may be independent from each other (Page 17, 22).

      Reviewer #2 (Public review):

      Summary:

      In this EEG study, Huang et al. investigated the relative contribution of two accounts to the process of conflict control, namely the stimulus-control association (SC), which refers to the phenomenon that the ratio of congruent vs. incongruent trials affects the overall control demands, and the stimulus-response association (SR), stating that the frequency of stimulus-response pairings can also impact the level of control. The authors extended the Stroop task with novel manipulation of item congruencies across blocks in order to test whether both types of information are encoded and related to behaviour. Using decoding and RSA, they showed that the SC and SR representations were concurrently present in voltage signals, and they also positively co-varied. In addition, the variability in both of their strengths was predictive of reaction time. In general, the experiment has a solid design, but there are some confounding factors in the analyses that should be addressed to provide strong support for the conclusions.

      Strengths:

      (1) The authors used an interesting task design that extended the classic Stroop paradigm and is potentially effective in teasing apart the relative contribution of the two different accounts regarding item-specific proportion congruency effect, provided that some confounds are addressed.

      (2) Linking the strength of RSA scores with behavioural measures is critical to demonstrating the functional significance of the task representations in question.

      Thank you for your positive feedback. We hope our responses below address your concerns.

      Weakness:

      (1) While the use of RSA to model the decoding strength vector is a fitting choice, looking at the RDMs in Figure 7, it seems that SC, SR, ISPC, and Identity matrices are all somewhat correlated. I wouldn't be surprised if some correlations would be quite high if they were reported. Total orthogonality is, of course, impossible depending on the hypothesis, but from experience, having highly covaried predictors in a regression can lead to unexpected results, such as artificially boosting the significance of one predictor in one direction, and the other one to the opposite direction. Perhaps some efforts to address how stable the time-resolved RSA correlations for SC and SR are with and without the other highly correlated predictors will be valuable to raising confidence in the findings.

      Thank you for this important point. The proportion of variability explained, shown in Author response table 1 below, indicated relatively higher correlations of SC/SR with Color and Identity. We agree that it is impossible to fully orthogonalize them. To address the issue of collinearity, we performed a control RSA by removing predictors highly correlated with others. Specifically, we calculated the variance inflation factor (VIF) for each predictor. The Identity predictor had a high VIF of 5 and was removed from the RSA. All other predictors had VIFs < 4 and were kept in the RSA. The results (Supplementary Fig. 6) showed patterns similar to the results with the Identity predictor, suggesting that the findings are not significantly influenced by collinearity. We have added the interpretation to page 17 of the revised manuscript.
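
      For readers unfamiliar with the VIF, the sketch below shows one standard way to compute it: the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictor correlation matrix. The function name and the simulated predictors are hypothetical and are not taken from the authors' analysis.

```python
import numpy as np

def variance_inflation_factors(design):
    """VIF per column: diagonal of the inverse predictor correlation matrix."""
    corr = np.corrcoef(np.asarray(design, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Hypothetical predictors: a and b are independent, c is largely their sum,
# so all three are collinear to some degree.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + 0.5 * rng.normal(size=500)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
for name, vif in zip(["a", "b", "c"], vifs):
    print(f"{name}: VIF = {vif:.2f}")
```

      A predictor whose VIF exceeds a chosen cutoff can then be dropped and the model refit, which is the logic of the control analysis described above.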

      Author response table 1.

      Proportion of variability explained (r<sup>2</sup>) of RSA predictors.

      (2) In "task overview", SR is defined as the word-response pair; however, in the Methods, lines 495-496, the definition changed to "the pairing between word and ISPC" which is in accordance with the values in the RDMs (e.g., mccbb and mcirb have similarity of 1, but they are linked to different responses, so should they not be considered different in terms of SR?). This needs clarification as they have very different implications for the task design and interpretation of results, e.g., how correlated the SC and SR manipulations were.

      Thank you for pointing out this important issue with how our operationalization captures the concept in question. In the revised manuscript, we clarified that the stimulus-response (SR) association is the link between the word and the most-likely response (i.e., not necessarily the actual response on the current trial). This association is likely to be encoded based on statistical learning over several trials. On each trial, the association is updated based on the stimulus and the actual response. Over multiple trials, the accumulated association will be driven towards the most-common (i.e., most-likely) response. In our ISPC manipulation, a color is presented in mostly congruent/incongruent (MC/MI) trials, which will also pair a word with a most-likely response. For example, if the color blue is MC, the color blue, which leads to the response blue, will co-occur with the word blue with high frequency. In other words, the SR association here is between the word blue and the response blue. As the actual response is not part of the SR association, in the RDM two trial types with different responses may share the same SR association, as long as they share the same word and the same ISPC manipulation, which, by the logic above, will produce the same most-likely response. These clarifications have been added to pages 4 and 29 of the revised manuscript.

      In the revised manuscript (Page 17), we addressed how much the correlated SC and SR predictors in the RDM could affect the correlation analysis between SC and SR association representation strength. Specifically, we conducted the RSA using the same GLM on EEG data prior to stimulus onset (Supplementary Fig. 7a-b). As no SC and SR associations are expected to be present before stimulus onset, the correlation between SC and SR representation would serve as a baseline of inflation due to correlated predictors in the GLM (Supplementary Fig. 7c, also see comment #3 of R1). The SC-SR correlation coefficients following stimulus onset were then compared to the baseline to control for potential inflation (Fig. 6c). Significantly above-baseline correlation was still observed between ~100-500 ms post-stimulus onset, providing support for the hypothesis that SC and SR are encoded in the same task representation.

      Minor suggestions:

      (3) Overall, I find that calling SC-controlled and SR-uncontrolled representations unwarranted. How is the level controlledness defined? Both are essentially types of statistical expectation that provide contextual information for the block of tasks. Is one really more automatic and requires less conscious processing than the other? More background/justification could be provided if the authors would like to use these terms.

      Following your advice, we have added more discussion on how controlledness is conceptualized in this work and in the literature, which reads:

      “We consider SC and SR as controlled and uncontrolled respectively based on the literature investigating the mechanism of the ISPC effect. The SC account posits that the ISPC effect results from conflict and involves conflict adaptation, which requires the regulation of attention or control (Bugg & Hutchison, 2013; Bugg et al., 2011; Schmidt, 2018; Schmidt & Besner, 2008). On the other hand, the SR account argues that the ISPC effect does not require conflict adaptation but instead reflects contingency learning. That is, the response can be directly retrieved from the association between the stimulus and the most-likely response without top-down regulation of attention or control. As more empirical evidence emerged, researchers advocating the control view began to acknowledge the role of associative learning in cognitive control regarding the ISPC effect (Abrahamse et al., 2016). SC association has been thought to include both automatic processes that are fast and resource saving and controlled processes that are flexible and generalizable (Chiu, 2019). Overall, we do not intend to claim that SC is entirely controlled or SR is completely automatic. We use SC-controlled and SR-uncontrolled representations to align with the original theoretical motivation and to highlight the conceptual difference between SC and SR associations.” (Page 24-25)

      (4) Figures 3c and d: the figures could benefit from more explanation of what they try to show to the readers. Also for 3d, the dimensions were aligned with color sets and congruencies, but word identities were not linearly separable, at least for the first 3 axes. Shouldn't one expect that words can be decoded in the SR subspace if word-response pairs were decodable (e.g., Figure 3b)?

      Thank you for the insightful observation. We have now clarified that Fig. 3c and d in the original manuscript (Fig. 4c and d in the current manuscript) aim to show how each of the 8 trial types is represented in the SC and SR subspaces. The MDS approach we used for visualization tries to preserve dissimilarity between trial types when projecting data from a high-dimensional to a low-dimensional space. However, such a projection may also make patterns that are linearly separable in the high-dimensional space not linearly separable in the low-dimensional space. For example, if the word blue has two points (-1, -1) and (1, 1) and the word red has two points (-1, 1) and (1, -1), they are not linearly separable in the 2D space. Yet, if they are projected from a 3D space with coordinates of (-1, -1, -0.1), (1, 1, -0.1), (-1, 1, 0.1) and (1, -1, 0.1), the two words are linearly separable using the 3<sup>rd</sup> dimension. Thus, a better way to test whether words can be linearly separated in the SR subspace is to perform RSA on the original high-dimensional space. We performed the RSA with word (Supplementary Fig. 2) on the SR decoder trained on the SR subspace. Note that in Fig. 3c and d of the original manuscript (Fig. 4c and d in the current manuscript) there are two pairs of words that are not linearly separable: red-blue and yellow-green. Thus, we specifically tested the separability within the two pairs using one predictor for each pair, as shown in Supplementary Fig. 2. The results showed that within both word pairs individual words were represented above chance level (Supplementary Fig. 3). Considering that the decoders are linear, this finding indicates linear separability of the word pairs in the original SR subspace. The clarification has been added to page 13 (the end of the second paragraph) of the revised manuscript.
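
      The toy example in the response above can be checked numerically. The sketch below uses a plain perceptron as a stand-in linear classifier (an assumption for illustration; it is not the decoder used in the study) and asks whether it can find a separating hyperplane for the four points in 3D and for their 2D projection. Failing to converge within a fixed number of epochs is not a proof of non-separability in general, but the 2D configuration here is the classic XOR layout, which no line can separate.

```python
def linearly_separable(points, labels, max_epochs=1000):
    """Run a perceptron; return True if it finds a separating hyperplane."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False  # no separator found within max_epochs

# Coordinates from the text: word "blue" is labelled +1, word "red" is -1.
points_3d = [(-1, -1, -0.1), (1, 1, -0.1), (-1, 1, 0.1), (1, -1, 0.1)]
labels = [1, 1, -1, -1]
points_2d = [p[:2] for p in points_3d]  # projection that drops the 3rd axis

print("separable in 3D:", linearly_separable(points_3d, labels))
print("separable in 2D:", linearly_separable(points_2d, labels))
```

      The contrast between the two calls mirrors the argument in the text: separability in the original space can be lost once the informative dimension is discarded by the projection.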

      References

      Abrahamse, E., Braem, S., Notebaert, W., & Verguts, T. (2016). Grounding cognitive control in associative learning. Psychological Bulletin, 142(7), 693-728. doi:10.1037/bul0000047.

      Bugg, J. M., & Hutchison, K. A. (2013). Converging evidence for control of color-word Stroop interference at the item level. Journal of Experimental Psychology: Human Perception and Performance, 39(2), 433-449. doi:10.1037/a0029145.

      Bugg, J. M., Jacoby, L. L., & Chanani, S. (2011). Why it is too early to lose control in accounts of item-specific proportion congruency effects. Journal of Experimental Psychology: Human Perception and Performance, 37(3), 844-859. doi:10.1037/a0019957.

      Chiu, Y.-C. (2019). Automating adaptive control with item-specific learning. In Psychology of Learning and Motivation (Vol. 71, pp. 1-37).

      Schmidt, J. R. (2018). Evidence against conflict monitoring and adaptation: An updated review. Psychonomic Bulletin & Review, 26(3), 753-771. doi:10.3758/s13423-018-1520-z.

      Schmidt, J. R., & Besner, D. (2008). The Stroop effect: Why proportion congruent has nothing to do with congruency and everything to do with contingency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(3), 514-523. doi:10.1037/0278-7393.34.3.514.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      In this article, Kawanabe-Kobayashi et al., aim to examine the mechanisms by which stress can modulate pain in mice. They focus on the contribution of noradrenergic neurons (NA) of the locus coeruleus (LC). The authors use acute restraint stress as a stress paradigm and found that following one hour of restraint stress mice display mechanical hypersensitivity. They show that restraint stress causes the activation of LC NA neurons and the release of NA in the spinal cord dorsal horn (SDH). They then examine the spinal mechanisms by which LC→SDH NA produces mechanical hypersensitivity. The authors provide evidence that NA can act on alphaA1Rs expressed by a class of astrocytes defined by the expression of Hes (Hes+). Furthermore, they found that NA, presumably through astrocytic release of ATP following NA action on alphaA1Rs Hes+ astrocytes, can cause an adenosine-mediated inhibition of SDH inhibitory interneurons. They propose that this disinhibition mechanism could explain how restraint stress can cause the mechanical hypersensitivity they measured in their behavioral experiments.

      Strengths:

      (1) Significance. Stress profoundly influences pain perception; resolving the mechanisms by which stress alters nociception in rodents may explain the well-known phenomenon of stress-induced analgesia and/or facilitate the development of therapies to mitigate the negative consequences of chronic stress on chronic pain.

      (2) Novelty. The authors' findings reveal a crucial contribution of Hes+ spinal astrocytes in the modulation of pain thresholds during stress.

      (3) Techniques. This study combines multiple approaches to dissect circuit, cellular, and molecular mechanisms including optical recordings of neural and astrocytic Ca2+ activity in behaving mice, intersectional genetic strategies, cell ablation, optogenetics, chemogenetics, CRISPR-based gene knockdown, slice electrophysiology, and behavior.

      Weaknesses:

      (1) Mouse model of stress. Although chronic stress can increase sensitivity to somatosensory stimuli and contribute to hyperalgesia and anhedonia, particularly in the context of chronic pain states, acute stress is well known to produce analgesia in humans and rodents. The experimental design used by the authors consists of a single one-hour session of restraint stress followed by 30 min to one hour of habituation and measurement of cutaneous mechanical sensitivity with von Frey filaments. This acute stress behavioral paradigm corresponds to the conditions in which the clinical phenomenon of stress-induced analgesia is observed in humans, as well as in animal models. Surprisingly, however, the authors measured that this acute stressor produced hypersensitivity rather than antinociception. This discrepancy is significant and requires further investigation.

      We thank the reviewer for evaluating our work and for highlighting both its strengths and weaknesses. As stated by the reviewer, numerous studies have reported acute stress-induced antinociception. However, as shown in a new additional table (Table S1) in which we have summarized previously published data using the acute restraint stress model employed in our present study, most studies reporting antinociceptive effects of acute restraint stress assessed behavioral responses to heat stimuli or formalin. This observation is consistent with the findings from our previous study (Uchiyama et al., Mol Brain, 2022 (PMID: 34980215)). The present study also confirms that acute restraint stress reduces behavioral responses to noxious heat (see also our response to Comment #2 below). In contrast to the robust and consistent antinociceptive effects observed with thermal stimuli, some studies evaluating behavioral responses to mechanical stimuli have reported stress-induced hypersensitivity (see Table S1), which aligns with our current findings. Taken together, these data support our original notion that the effects of acute stress on pain-related behaviors depend on several factors, including the nature, duration, and intensity of the stressor, as well as the sensory modality assessed in behavioral tests. We have incorporated this discussion and Table S1 into the revised manuscript (lines 344-353). Furthermore, we have slightly modified the text including the title, replacing "pain facilitation" with "mechanical pain hypersensitivity" to more accurately reflect our research focus and the conclusion of this study that LC<sup>→SDH</sup> NAergic signaling to spinal astrocytes is required for stress-induced mechanical pain hypersensitivity. Finally, while mouse models of stress could provide valuable insights, the clinical relevance of stress-induced mechanical pain hypersensitivity remains to be elucidated and requires further investigation. 
      We hope these clarifications address your concerns.

      (2) Specifically, is the hypersensitivity to mechanical stimulation also observed in response to heat or cold on a hotplate or coldplate?

      Thank you for your important comment. We have now conducted additional behavioral experiments to assess responses to heat using the hot-plate test. We found that mice subjected to restraint stress did not exhibit behavioral hypersensitivity to heat stimuli; instead, they displayed antinociceptive responses (Figure S2; lines 95-98). These results are consistent with our previous findings (Uchiyama et al., Mol Brain, 2022 (PMID: 34980215)) as well as numerous other reports (Table S1).

      (3) Using other stress models, such as a forced swim, do the authors also observe acute stress-induced hypersensitivity instead of stress-induced antinociception?

      As suggested by the reviewer, we conducted a forced swim test. We found that mice subjected to forced swimming, which has been reported to produce analgesic effects on thermal stimuli (Contet et al., Neuropsychopharmacology, 2006 (PMID: 16237385)), did not exhibit any changes in mechanical pain hypersensitivity (Figure S2; lines 98-99). Furthermore, a previous study demonstrated that mechanical pain sensitivity is enhanced by other stress models, such as exposure to an elevated open platform for 30 min (Kawabata et al., Neuroscience, 2023 (PMID: 37211084)). However, considering our data showing that changes in mechanosensory behavior induced by restraint stress depend on the duration of exposure (Figure S1), and that restraint stress also produced an antinociceptive effect on heat stimuli (Figure S2), stress-induced modulation of pain is a complex phenomenon influenced by multiple factors, including the stress model, intensity, and duration, as well as the sensory modality used for behavioral testing (lines 100-103).

      (4) Measurement of stress hormones in blood would provide an objective measure of the stress of the animals.

      A previous study has demonstrated that plasma levels of the stress hormone corticosterone are elevated following a 1-hour exposure to restraint stress in mice (Kim et al., Sci Rep, 2018 (PMID: 30104581)), using a stress protocol similar to that employed in our current study. We have included this information and cited this paper (lines 104-105).

      (5) Results:

      (a) Optical recordings of Ca2+ activity in behaving rodents are particularly useful to investigate the relationship between Ca2+ dynamics and the behaviors displayed by rodents.

      In the optical recordings of Ca<sup>2+</sup> activity in LC neurons, we monitored mouse behavior during stress exposure. We have now included a video of this in the revised manuscript (video; lines 111-114).

      (b) The authors report an increase in Ca2+ events in LC NA neurons during restraint stress: Did mice display specific behaviors at the time these Ca2+ events were observed such as movements to escape or orofacial behaviors including head movements or whisking?

      By reanalyzing the temporal relationship between Ca<sup>2+</sup> events and mouse behavior during stress exposure, we found that the Ca<sup>2+</sup> transients and escape behaviors (struggling) occurred almost simultaneously (video). A similar temporal correlation is also observed in Ca<sup>2+</sup> responses in the bed nucleus of the stria terminalis (Luchsinger et al., Nat Commun, 2021 (PMID: 34117229)). The video file has been included in the revised manuscript (video; lines 111-113, 552-553, 573-575).

      Additionally, as described in the Methods section and shown in Figure S2 of the initial version (now Figure S3), non-specific signals or artifacts—such as those caused by head movements—were corrected (although such responses were minimal in our recordings).

      (c) Additionally, are similar increases in Ca2+ events in LC NA neurons observed during other stressful behavioral paradigms versus non-stressful paradigms?

      We appreciate the reviewer's valuable suggestion. Since the initial version of our manuscript focused on acute restraint stress, we did not measure Ca<sup>2+</sup> events in LC-NA neurons in other stress models. However, a recent study has shown an increase in Ca<sup>2+</sup> responses in LC-NA neurons induced by social defeat stress (Seiriki et al., bioRxiv, https://www.biorxiv.org/content/10.1101/2025.03.07.641347v1).

      (d) Neuronal ablation to reveal the function of a cell population.

      This method has been widely used in numerous previous studies as an effective experimental approach to investigate the role of specific neuronal populations—including SDH-projecting LC-NA neurons (Ma et al., Brain Res, 2022 (PMID: 34929182); Kawanabe et al., Mol Brain, 2021 (PMID: 33971918))—in CNS function.

      (e) The proportion of LC NA neurons and LC→SDH NA neurons expressing DTR-GFP and ablated should be quantified (Figures 1G and J) to validate the methods and permit interpretation of the behavioral data (Figures 1H and K). Importantly, the nocifensive responses and behavior of these mice in other pain assays in the absence of stress (e.g., hotplate) and a few standard assays (open field, rotarod, elevated plus maze) would help determine the consequences of cell ablation on processing of nociceptive information and general behavior.

      As suggested, we conducted additional experiments to quantitatively analyze the number of LC<sup>→SDH</sup>-NA neurons. We used WT mice injected with AAVretro-Cre into the SDH (L4 segment) and AAV-FLEx[DTR-EGFP] into the LC. In these mice, 4.4% of total LC-NA neurons [positive for tyrosine hydroxylase (TH)] expressed DTR-GFP, representing the LC<sup>→SDH</sup>-NA neuronal population (Figure S4; lines 126-127). Furthermore, treatment with DTX successfully ablated the DTR-expressing LC<sup>→SDH</sup>-NA neurons. Importantly, the neurons quantified in this analysis were specifically those projecting to the L4 segment of the SDH; therefore, the total number of SDH-projecting LC-NA neurons across all spinal segments is expected to be much higher.

      We also performed the rotarod and paw-flick tests to assess motor function and thermal sensitivity following ablation of LC<sup>→SDH</sup>-NA neurons. No significant differences were observed between the ablated and control groups (Figure S5; lines 131-134), indicating that ablation of these neurons does not produce non-specific behavioral deficits in motor function or other sensory modalities.

      (f) Confirmation of LC NA neuron function with other methods that alter neuronal excitability or neurotransmission instead of destroying the circuit investigated, such as optogenetics or chemogenetics, would greatly strengthen the findings. Optogenetics is used in Figure 1M, N but excitation of LC<sup>→SDH</sup> NA neuron terminals is tested instead of inhibition (to mimic ablation), and in naïve mice instead of stressed mice.

      We appreciate the reviewer’s comment. The optogenetic approach is useful for manipulating neuronal excitability; however, prolonged light illumination (> tens of seconds) can lead to undesirable tissue heating, ionic imbalance, and rebound spikes (Wiegert et al., Neuron, 2017 (PMID: 28772120)), making it difficult to apply in our experiments, in which mice are exposed to stress for 60 min. For this reason, we decided to employ the cell-ablation approach in stress experiments, as it is more suitable than optogenetic inhibition. In addition, as described in our response to weakness (1)-a) by Reviewer 3 (Public review), we have now demonstrated the specific expression of DTRs in NA neurons in the LC, but not in A5 or A7 (Figure S4; lines 127-128), confirming the specificity of LC<sup>→SDH</sup>-NAergic pathway targeting in our study. Chemogenetics represents another promising approach to further strengthen our findings on the role of LC<sup>→SDH</sup>-NA neurons, but this will be an important subject for future studies, as it will require extensive experiments to assess, for example, the effectiveness of chemogenetic inhibition of these neurons during 60 min of restraint stress, as well as optimization of key parameters (e.g., systemic DCZ doses).

      (g) Alpha1Ars. The authors noted that "Adra1a mRNA is also expressed in INs in the SDH".

      The expression of α<sub>1A</sub>Rs in inhibitory interneurons in the SDH is consistent with our previous findings (Uchiyama et al., Mol Brain, 2022 (PMID: 34980215)) as well as with scRNA-seq data (http://linnarssonlab.org/dorsalhorn/, Häring et al., Nat Neurosci, 2018 (PMID: 29686262)).

      (h) The authors should comprehensively indicate what other cell types present in the spinal cord and neurons projecting to the spinal cord express alpha1Ars and what is the relative expression level of alpha1Ars in these different cell types.

      According to the scRNA-seq data (https://seqseek.ninds.nih.gov/genes, Russ et al., Nat Commun, 2021 (PMID: 34588430); http://linnarssonlab.org/dorsalhorn/, Häring et al., Nat Neurosci, 2018 (PMID: 29686262)), we confirmed that α<sub>1A</sub>Rs are predominantly expressed in astrocytes and inhibitory interneurons in the spinal cord. Also, an α<sub>1A</sub>R-expressing excitatory neuron population (Glut14) expresses Tacr1, GPR83, and Tac1 mRNAs, markers that are known to be enriched in projection neurons of the SDH. This raises the possibility that α<sub>1A</sub>Rs may also be expressed in a subset of projection neurons, although further experiments are required to confirm this. In DRG neurons, α<sub>1A</sub>R expression was detected to some extent, but its level seems to be much lower than in the spinal cord (http://linnarssonlab.org/drg/, Usoskin et al., Nat Neurosci, 2015 (PMID: 25420068)). Consistent with this, primary afferent glutamatergic synaptic transmission has been shown to be unaffected by α<sub>1A</sub>R agonists (Kawasaki et al., Anesthesiology, 2003 (PMID: 12606912); Li and Eisenach, JPET, 2001 (PMID: 11714880)). This information has been incorporated into the Discussion section (lines 317-319).

      (i) The conditional KO of alpha1Ars specifically in Hes5+ astrocytes and not in other cell types expressing alpha1Ars should be quantified and validated (Figure 2H).

      We have previously shown a selective KO of α<sub>1A</sub>R in Hes5<sup>+</sup> astrocytes in the same mouse line (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)). This information has been included in the revised text (lines 166-167).

      (j) Depolarization of SDH inhibitory interneurons by NA (Figure 3). The authors' bath applied NA, which presumably activates all NA receptors present in the preparation.

      We believe that the reviewer’s concern may pertain to the possibility that NA acts on non-Vgat<sup>+</sup> neurons, thereby indirectly causing depolarization of Vgat<sup>+</sup> neurons. As described in the Methods section of the initial version, in our electrophysiological experiments, we added four antagonists for excitatory and inhibitory neurotransmitter receptors, namely CNQX (AMPA receptor), MK-801 (NMDA receptor), bicuculline (GABA<sub>A</sub> receptor), and strychnine (glycine receptor), to the artificial cerebrospinal fluid to block synaptic inputs from other neurons to the recorded Vgat<sup>+</sup> neurons. Since this method is widely used for this purpose in many previous studies (Wu et al., J Neurosci, 2004 (PMID: 15140934); Liu et al., Nat Neurosci, 2010 (PMID: 20835251)), it is reasonable to conclude that NA directly acts on the recorded SDH Vgat<sup>+</sup> interneurons to produce excitation (lines 193-196).

      (k) The authors' model (Figure 4H) implies that NA released by LC→SDH NA neurons leads to the inhibition of SDH inhibitory interneurons by NA. In other experiments (Figure 1L, Figure 2A), the authors used optogenetics to promote the release of endogenous NA in SDH by LC→SDH NA neurons. This approach would investigate the function of NA endogenously released by LC NA neurons at presynaptic terminals in the SDH and at physiological concentrations and would test the model more convincingly compared to the bath application of NA.

      We appreciate the reviewer’s valuable comment. As noted, optogenetic stimulation of LC<sup>→SDH</sup>-NA neurons would indeed be useful to test this model. However, in our case, it is technically difficult to investigate the responses of Vgat<sup>+</sup> inhibitory neurons and Hes5<sup>+</sup> astrocytes to NA endogenously released from LC<sup>→SDH</sup>-NA neurons. This would require the use of Vgat-Cre or Hes5-CreERT2 mice, but employing these lines precludes the use of NET-Cre mice, which are necessary for specific and efficient expression of ChrimsonR in LC<sup>→SDH</sup>-NA neurons. Nevertheless, all of our experimental data consistently support the proposed model, and we hope the reviewer will agree that the model is supported without these additional experiments, which are difficult to conduct because of technical limitations (lines 382-388).

      (l) As for other experiments, the proportion of Hes+ astrocytes that express hM3Dq, and the absence of expression in other cells, should be quantified and validated to interpret behavioral data.

      We thank the reviewer for raising this point. In our experiments, we used an HA-tag (fused with hM3Dq) to confirm hM3Dq expression. However, it is difficult to precisely analyze individual astrocytes because, as shown in Figure 3J, the boundaries of many HA-tag<sup>+</sup> astrocytes are indistinguishable. This seems to be due to the membrane localization of HA-tag, the complex morphology of astrocytes, and their tile-like distribution pattern (Baldwin et al., Trends Cell Biol, 2024 (PMID: 38180380)). Nevertheless, our previous study demonstrated that ~90% of astrocytes in the superficial laminae are Hes5<sup>+</sup> (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), and intra-SDH injection of AAV-hM3Dq labeled the majority of superficial astrocytes (Figure 3J). Thus, AAV-FLEx[hM3Dq] injection into Hes5-CreERT2 mice allows efficient expression of hM3Dq in Hes5<sup>+</sup> astrocytes in the SDH. Importantly, our previous studies using Hes5-CreERT2 mice have confirmed that hM3Dq is not expressed in other cell types (neurons, oligodendrocytes, or microglia) (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652); Kagiyama et al., Mol Brain, 2025 (PMID: 40289116)). This information regarding the cell-type specificity has now been briefly described in the revised version (lines 218-219).

      (m) Showing that the effect of CNO is dose-dependent would strengthen the authors' findings.

      Thank you for your comment. We have now demonstrated a dose-dependent effect of CNO on Ca<sup>2+</sup> responses in SDH astrocytes (Figure S7; lines 225-228; please see our response to Major Point (4) from Reviewer #2 (Recommendations for the authors)). In addition, we confirmed that the effect of CNO is specific, as CNO application did not alter sIPSCs in spinal cord slices prepared from mice lacking hM3Dq expression in astrocytes (Figure S7; lines 225-228).

      (n) The proportion of SG neurons for which CNO bath application resulted in a reduction in recorded sIPSCs is not clear.

      We have included individual data points in each bar graph to more clearly illustrate the effect of CNO on each neuron (Figure 3L, N).

      (o) A1Rs. The specific expression of Cas9 and guide RNAs, and the specific KD of A1Rs, in inhibitory interneurons but not in other cell types expressing A1Rs should be quantified and validated.

      In addition to the data demonstrating the specific expression of SaCas9 and sgAdora1 in Vgat<sup>+</sup> inhibitory neurons shown in Figure 3G of the initial version, we have now conducted the same experiments with a different sample and confirmed this specificity: SaCas9 (detected via HA-tag) and sgAdora1 (detected via mCherry) were expressed in PAX2<sup>+</sup> inhibitory neurons (Author response image 1). Furthermore, as shown in Figure 3H and I in the initial version, the functional reduction of A<sub>1</sub>Rs in inhibitory neurons was validated by electrophysiological recordings. Together, these results support the successful deletion of A<sub>1</sub>Rs in inhibitory neurons.

      Author response image 1.

      Expression of HA-tag and mCherry in inhibitory neurons (a different sample from Figure 3G). SaCas9 (yellow, detected by HA-tag) and mCherry (magenta) expression in the PAX2<sup>+</sup> inhibitory neurons (cyan) at 3 weeks after intra-SDH injection of AAV-FLEx[SaCas9-HA] and AAV-FLEx[mCherry]-U6-sgAdora1 in Vgat-Cre mice. Arrowheads indicate genome-editing Vgat<sup>+</sup> cells. Scale bar, 25 µm.

      (6) Methods:

      It is unclear how fiber photometry is performed using "optic cannula" during restraint stress while mice are in a 50ml falcon tube (as shown in Figure 1A).

      We apologize for the omission of this detail in the Methods section. To monitor Ca<sup>2+</sup> events in LC-NA neurons during restraint stress, we created a narrow slit on the top of the conical tube, allowing mice to undergo restraint stress while connected to the optic fiber (see video). This information has now been added to the Methods section (lines 552-553).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Scientific rigor:

      It is unclear if the normal distribution of the data was determined before selecting statistical tests.

      We apologize for omitting this description. For all statistical analyses in this study, we first assessed the normality of the data and then selected appropriate statistical tests accordingly. We have added this information to the revised manuscript (lines 711-712).

      (2) Nomenclature:

      (a) Mouse Genome Informatics (MGI) nomenclature should be used to describe mouse genotypes (i.e., gene name in italic, only first letter is capitalized, alleles in superscript).

      (b) FLEx should be used instead of flex.

      Thank you for the suggestion. We have corrected these terms (including FLEx) according to MGI nomenclature.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the role of spinal astrocytes in mediating stress-induced pain hypersensitivity, focusing on the LC (locus coeruleus)-to-SDH (spinal dorsal horn) circuit and its mechanisms. The authors aimed to delineate how LC activity contributes to spinal astrocytic activation under stress conditions, explore the role of noradrenaline (NA) signaling in this process, and identify the downstream astrocytic mechanisms that influence pain hypersensitivity.

      The authors provide strong evidence that 1-hour restraint stress-induced pain hypersensitivity involves the LC-to-SDH circuit, where NA triggers astrocytic calcium activity via alpha1a adrenoceptors (alpha1aRs). Blockade of alpha1aRs on astrocytes - but not on Vgat-positive SDH neurons - reduced stress-induced pain hypersensitivity. These findings are rigorously supported by well-established behavioral models and advanced genetic techniques, uncovering the critical role of spinal astrocytes in modulating stress-induced pain.

      However, the study's third aim - to establish a pathway from astrocyte alpha1aRs to adenosine-mediated inhibition of SDH-Vgat neurons - is less compelling. While pharmacological and behavioral evidence is intriguing, the ex vivo findings are indirect and lack a clear connection to the stress-induced pain model. Despite these limitations, the study advances our understanding of astrocyte-neuron interactions in stress-pain contexts and provides a strong foundation for future research into glial mechanisms in pain hypersensitivity.

      Strengths:

      The study is built on a robust experimental design using a validated 1-hour restraint stress model, providing a reliable framework to investigate stress-induced pain hypersensitivity. The authors utilized advanced genetic tools, including retrograde AAVs, optogenetics, chemogenetics, and subpopulation-specific knockouts, allowing precise manipulation and interrogation of the LC-SDH circuit and astrocytic roles in pain modulation. Clear evidence demonstrates that NA triggers astrocytic calcium activity via alpha1aRs, and blocking these receptors effectively reduces stress-induced pain hypersensitivity.

      Weaknesses:

      Despite its strengths, the study presents indirect evidence for the proposed NA-to-astrocyte(alpha1aRs)-to-adenosine-to-SDH-Vgat neurons pathway, as the link between astrocytic adenosine release and stress-induced pain remains unclear. The ex vivo experiments, including NA-induced depolarization of Vgat neurons and chemogenetic stimulation of astrocytes, are challenging to interpret in the stress context, with the high CNO concentration raising concerns about specificity. Additionally, the role of astrocyte-derived D-serine is tangential and lacks clarity regarding its effects on SDH Vgat neurons. The astrocyte calcium signal "dip" after LC optostimulation-induced elevation is presented without any interpretation.

      We appreciate the reviewer's careful reading of our paper. In response to the reviewer's comments, we have performed additional experiments and added discussion to the revised manuscript (please see the point-by-point responses below).

      Reviewer #2 (Recommendations for the authors):

      The astrocyte-mediated pathway of NA-to-astrocyte (alpha1aRs)-to-adenosine-to-SDH Vgat neurons (A1R) in the context of stress-induced pain hypersensitivity requires more direct evidence. While the data showing that the A1R agonist CPT inhibits stress-induced hypersensitivity and that stress combined with Aβ fiber stimulation increases pERK in the SDH are intriguing, these findings primarily support the involvement of A1R on Vgat neurons and are only behaviorally consistent with SDH-Vgat neuronal A1R knockdown. The role of astrocytes in this pathway in vivo remains indirect. The ex vivo chemogenetic Gq-DREADD stimulation of SDH astrocytes, which reduced sIPSCs in Vgat neurons in a CPT-dependent manner, needs revision with non-DREADD+CNO controls to validate specificity. Furthermore, the ex vivo bath application of NA causing depolarization in Vgat neurons, blocked by CPT, adds complexity to the data leaving me wondering how astrocytes are involved in such processes, and it does not directly connect to stress-induced pain hypersensitivity. These findings are potentially useful but require additional refinement to establish their relevance to the stress model.

      We thank the reviewer for the insightful feedback. First, regarding the role of astrocytes in this pathway in vivo, we showed in the initial version that mechanical pain hypersensitivities induced by intrathecal NA injection and by acute restraint stress were attenuated by both pharmacological blockade and Vgat<sup>+</sup> neuron-specific knockdown of A<sub>1</sub>Rs (Figure 4A, B). Given that NA- and stress-induced pain hypersensitivity is mediated by α<sub>1A</sub>R-dependent signaling in Hes5<sup>+</sup> astrocytes (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652); this study), these findings provide in vivo evidence supporting the involvement of the NA → Hes5<sup>+</sup> astrocyte (via α<sub>1A</sub>Rs) → adenosine → Vgat<sup>+</sup> neuron (via A<sub>1</sub>Rs) pathway. As noted in the reviewer’s major comment (2), in vivo monitoring of adenosine dynamics in the SDH during stress exposure would further substantiate the astrocyte-to-neuron signaling pathway. However, we did not detect clear signals, potentially due to several technical limitations (see our response below). Acknowledging this limitation, we have now added a new paragraph at the end of the Discussion section to address this issue. Second, the specificity of the effect of CNO has now been validated by additional experiments (see our response to major point (4)). Third, the reviewer’s concern regarding the action of NA on Vgat<sup>+</sup> neurons has also been addressed (see our response to major point (3) below).

      Major points:

      (1) The in vivo pharmacology using DCK to antagonize D-serine signaling from alpha1a-activated astrocytes is tangential, as there is limited evidence on how Vgat neurons (among many others) respond to D-serine. This aspect requires more focused exploration to substantiate its relevance.

      We propose that the site of action of D-serine in our neural circuit model is the NMDA receptors (NMDARs) on excitatory neurons, a notion supported by our previous findings (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652); Kagiyama et al., Mol Brain, 2025 (PMID: 40289116)). However, we cannot exclude the possibility that D-serine also acts on NMDARs expressed by Vgat<sup>+</sup> inhibitory neurons. Nevertheless, given that intrathecal injection of D-serine in naïve mice induces mechanical pain hypersensitivity (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), it appears that the pronociceptive effect of D-serine in the SDH is primarily associated with enhanced pain processing and transmission, presumably via NMDARs on excitatory neurons. We have added this point to the Discussion section in the revised manuscript (lines 325-330).

      (2) Additionally, employing GRAB-Ado sensors to monitor adenosine dynamics in SDH astrocytes during NA signaling would significantly strengthen conclusions about astrocyte-derived adenosine's role in the stress model.

      We agree with the reviewer’s comment. Following this suggestion, we attempted to visualize NA-induced adenosine (and ATP) dynamics using GRAB-ATP and GRAB-Ado sensors (Wu et al., Neuron, 2022 (PMID: 34942116); Peng et al., Science, 2020 (PMID: 32883833)) in acutely isolated spinal cord slices from mice after intra-SDH injection of AAV-hSyn-GRAB<sub>ATP1.0</sub> and AAV-hSyn-GRAB<sub>Ado1.0</sub>. We confirmed expression of these sensors in the SDH (Author response image 2a) and observed increased signals after bath application of ATP (0.1 or 1 µM) or adenosine (1 µM) (Author response image 2b, c). However, we were unable to detect clear signals following NA stimulation (Author response image 2b, c). The reason for this lack of detectable changes remains unclear. If the release of adenosine from astrocytes is a highly localized phenomenon, it may be measurable using high-resolution microscopy capable of detecting adenosine levels at the synaptic level and more sensitive sensors. Further investigation will therefore be required (lines 340-341).

      Author response image 2.

      Ex vivo imaging of GRAB-ATP and GRAB-Ado sensors. (a) Representative images of GRAB<sub>ATP1.0</sub> (left, green) or GRAB<sub>Ado1.0</sub> (right, green) expression in the SDH at 3 weeks after SDH injection of AAV-hSyn-GRAB<sub>ATP1.0</sub> or AAV-hSyn-GRAB<sub>Ado1.0</sub> in Hes5-CreERT2 mice. Scale bar, 200 µm. (b) Left: Representative fluorescence images showing GRAB<sub>ATP1.0</sub> responses before and after perfusion with NA or ATP. Right: Representative traces showing responses to ATP (0.1 and 1 µM) or NA (10 µM). (c) Left: Representative fluorescence images showing GRAB<sub>Ado1.0</sub> responses before and after perfusion with NA or adenosine (Ado). Right: Representative traces showing responses to Ado (0.01, 0.1, and 1 µM), NA (10 µM), or no application (negative control).

      (3) The interpretation of Figure 3D is challenging. The manuscript implies that 20 μM NA acts on Adra1a receptors on Vgat neurons to depolarize them, but this concentration should also activate Adra1a on astrocytes, leading to adenosine release and potential inhibition of depolarization. The observation of depolarization despite these opposing mechanisms requires explanation, as does the inhibition of depolarization by bath-applied A1R agonist. Of note, 20 μM NA is a high concentration for Adra1a activation, typically responsive at nanomolar levels. The discussion should reconcile this with prior studies indicating dose-dependent effects of NA on pain sensitivity (e.g., Reference 22).

      Like the reviewer, we also considered that bath-applied NA could activate α<sub>1A</sub>Rs expressed on Hes5<sup>+</sup> astrocytes. To clarify this point, we performed additional patch-clamp recordings and found that knockdown of A<sub>1</sub>Rs in Vgat<sup>+</sup> neurons tended to increase the proportion of Vgat<sup>+</sup> neurons with NA-induced depolarizing responses (Figure S8). Therefore, it is conceivable that NA-induced excitation of Vgat<sup>+</sup> neurons involves both a direct effect of NA activating α<sub>1A</sub>Rs in Vgat<sup>+</sup> neurons and indirect inhibitory signaling from NA-stimulated Hes5<sup>+</sup> astrocytes via adenosine (lines 298-300).

      The concentration of NA used in our ex vivo experiments is higher than that typically used in vitro with α<sub>1A</sub>R-expressing cell lines or primary culture cells, but is comparable to concentrations used in other studies employing spinal cord slices (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652); Baba et al., Anesthesiology, 2000 (PMID: 10691236); Lefton et al., Science, 2025 (PMID: 40373122)). In slice experiments, drugs must diffuse through the tissue to reach target cells, resulting in a concentration gradient. Therefore, higher drug concentrations are generally necessary in slice experiments, in contrast to cultured cell experiments, where drugs are directly applied to target cells. Importantly, we have previously shown that the pharmacological effects of 20 μM NA on Vgat<sup>+</sup> neurons and Hes5<sup>+</sup> astrocytes are abolished by loss of α<sub>1A</sub>Rs in these cells (Uchiyama et al., Mol Brain, 2022 (PMID: 34980215); Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), confirming the specificity of these NA actions.

      Regarding the dose-dependent effect of NA on pain sensitivity, NA-induced pain hypersensitivity is abolished in Hes5<sup>+</sup> astrocyte-specific α<sub>1A</sub>R-KO mice (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), indicating that this behavior is mediated by α<sub>1A</sub>Rs expressed on Hes5<sup>+</sup> astrocytes. In contrast, the suppression of pain sensitivity by high doses of NA was unaffected in the KO mice (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), suggesting that other adrenergic receptors may contribute to this phenomenon. Clarifying the responsible receptors will require future investigation.

      (4) In Figure 3K-M, the CNO concentration used (100 μM) is unusually high compared to standard doses (1 to a few μM), raising concerns about potential off-target effects. Including non-hM3Dq controls and using lower CNO concentrations are essential to validate the specificity of the observed effects. Similarly, the study should clarify whether astrocyte hM3Dq stimulation alone (without NA) would induce hyperpolarization in Vgat neurons and how this interacts with NA-induced depolarization.

      We acknowledge that the concentration of CNO used in our experiments is relatively high compared to that used in other reports. However, in our experiments, application of CNO at 1, 10, and 100 μM induced Ca<sup>2+</sup> increases in GCaMP6-expressing astrocytes in spinal cord slices in a concentration-dependent manner (Figure S7). Among these, 100 μM CNO most effectively replicated the NA-induced Ca<sup>2+</sup> signals in astrocytes. Based on these findings, we selected this concentration for use in both the current and previous studies (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)). Importantly, to rule out non-specific effects, we conducted control experiments using spinal cord slices from mice that did not express hM3Dq in astrocytes and confirmed that CNO had no effect on Ca<sup>2+</sup> responses in astrocytes or on sIPSCs in substantia gelatinosa (SG) neurons (Figure S7; lines 223-228). Thus, although the CNO concentration used is relatively high, the observed effects of CNO are not non-specific but result from the chemogenetic activation of hM3Dq-expressing astrocytes.

      In this study, we used Hes5-CreERT2 and Vgat-Cre mice to manipulate gene expression in Hes5<sup>+</sup> astrocytes and Vgat<sup>+</sup> neurons, respectively. In order to fully address the reviewer’s comment, the use of both Cre lines is necessary. However, simultaneous and independent genetic manipulation in each cell type using Cre activity alone is not feasible with the current genetic tools. We have mentioned this as a technical limitation in the Discussion section (lines 382-388).

      (5) The role of D-serine released by hM3Dq-stimulated astrocytes in (separately) modulating sub-types of neurons including excitatory neurons and Vgat positives needs more detailed discussion. If no effect of D-serine on Vgat neurons is observed, this should be explicitly stated, and the discussion should address why this might be the case.

      As mentioned in our response to Major Point (1) above, we have added a discussion of this point in the revised manuscript (lines 325-330).

      (6) Finally, the observed "dip" in astrocyte calcium signals below baseline following the large peaks with LC optostimulation should be discussed further, as understanding this phenomenon could provide valuable insights into astrocytic signaling dynamics in the context of single acute or repetitive chronic stress.

      Thank you for your comment. We found that this phenomenon was not affected by pretreatment with the α<sub>1A</sub>R-specific antagonist silodosin (Author response image 3), which effectively suppressed Ca<sup>2+</sup> elevations evoked by stimulation of LC-NA neurons (Figure 2F). This implies that the phenomenon is independent of α<sub>1A</sub>R signaling. Elucidating the detailed underlying mechanism remains an important direction for future investigation.

      Author response image 3.

      The observed "dip" in astrocyte Ca<sup>2+</sup> signals was not affected by pretreatment with the α<sub>1A</sub>R-specific antagonist silodosin. Representative traces of astrocytic GCaMP6m signals in response to optogenetic stimulation of LC<sup>→SDH</sup>-NAergic axons/terminals in a spinal cord slice. Each trace shows the GCaMP6m signal before and after optogenetic stimulation (625 nm, 1 mW, 10 Hz, 5 ms pulse duration, 10 s). Slices were pretreated with silodosin (40 nM) for 5 min prior to stimulation.

      Reviewer #3 (Public review):

      Summary:

      This is an exciting and timely study addressing the role of descending noradrenergic systems in nocifensive responses. While it is well-established that spinally released noradrenaline (aka norepinephrine) generally acts as an inhibitory factor in spinal sensory processing, this system is highly complex. Descending projections from the A6 (locus coeruleus, LC) and the A5 regions typically modulate spinal sensory processing and reduce pain behaviours, but certain subpopulations of LC neurons have been shown to mediate pronociceptive effects, such as those projecting to the prefrontal cortex (Hirshberg et al., PMID: 29027903).

      The study proposes that descending cerulean noradrenergic neurons potentiate touch sensation via alpha-1 adrenoceptors on Hes5+ spinal astrocytes, contributing to mechanical hyperalgesia. This finding is consistent with prior work from the same group (dd et al., PMID:). However, caution is needed when generalising about LC projections, as the locus coeruleus is functionally diverse, with differences in targets, neurotransmitter co-release, and behavioural effects. Specifying the subpopulations of LC neurons involved would significantly enhance the impact and interpretability of the findings.

      Strengths:

      The study employs state-of-the-art molecular, genetic, and neurophysiological methods, including precise CRISPR and optogenetic targeting, to investigate the role of Hes5+ astrocytes. This approach is elegant and highlights the often-overlooked contribution of astrocytes in spinal sensory gating. The data convincingly support the role of Hes5+ astrocytes as regulators of touch sensation, coordinated by brain-derived noradrenaline in the spinal dorsal horn, opening new avenues for research into pain and touch modulation.

      Furthermore, the data support a model in which superficial dorsal horn (SDH) Hes5+ astrocytes act as non-neuronal gating cells for brain-derived noradrenergic (NA) signalling through their interaction with substantia gelatinosa inhibitory interneurons. Locally released adenosine from NA-stimulated Hes5+ astrocytes, following acute restraint stress, may suppress the function of SDH-Vgat+ inhibitory interneurons, resulting in mechanical pain hypersensitivity. However, the spatially restricted neuron-astrocyte communication underlying this mechanism requires further investigation in future studies.

      Weaknesses

      (1) Specificity of the LC Pathway targeting

      The main concern lies with how definitively the LC pathway was targeted. Were other descending noradrenergic nuclei, such as A5 or A7, also labelled in the experiments? The authors must convincingly demonstrate that the observed effects are mediated exclusively by LC noradrenergic terminals to substantiate their claims (i.e. "we identified a circuit, the descending LC→SDH-NA neurons").

      (a) For instance, the direct vector injection into the LC likely results in unspecific effects due to the extreme heterogeneity of this nucleus and retrograde labelling of the A5 and A7 nuclei from the LC (i.e., Li et al., PMID: 26903420).

      We appreciate the reviewer's valuable comments. To address this point, we performed additional experiments and demonstrated that intra-SDH injection of AAVretro-Cre followed by intra-LC injection of AAV2/9-EF1α-FLEx[DTR-EGFP] specifically results in DTR expression in NA neurons of the LC, but not of the A5 or A7 regions (Figure S4; lines 127-128). These results confirm the specificity of targeting the LC<sup>→SDH</sup>-NAergic pathway in our study.

      (b) It is difficult to believe that the intersectional approach described in the study successfully targeted LC→SDH-NA neurons using AAVrg vectors. Previous studies (e.g., PMID: 34344259 or PMID: 36625030) demonstrated that similar strategies were ineffective for spinal-LC projections. The authors should provide detailed quantification of the efficiency of retrograde labelling and specificity of transgene expression in LC neurons projecting to the SDH.

      Thank you for your comment. As we described in our response to the weakness (5)-e) of Reviewer #1 (Public review), our additional analysis showed that, under our experimental conditions, expression of genes (for example DTR) was observed in 4.4% of NA (TH<sup>+</sup>) neurons in the LC (Figure S4; lines 126-127).

      The reason for this difference between the previous studies and our current study is unclear; however, it is likely attributable to methodological differences, including the type of viral vectors employed, species differences (mouse (PMID: 34344259, our study) vs. rat (PMID: 36625030)), the amount of AAV injected into the SDH (300 nL at three sites (PMID: 34344259), and 300 nL at a single site (our study)) and LC (500 nL at a single site (PMID: 34344259), and 300 nL at a single site (our study)), as well as the depth of AAV injection in the SDH (200–300 µm from the dorsal surface of the spinal cord (PMID: 34344259), and 120–150 µm in depth from the surface of the dorsal root entry zone (our study)).

      (c) Furthermore, it is striking that the authors observed a comparably strong phenotypical change in Figure 1K despite fewer neurons being labelled, compared to Figure 1H and 1N with substantially more neurons being targeted. Interestingly, the effect in Figure 1K appears more pronounced but shorter-lasting than in the comparable experiment shown in Figure 1H. This discrepancy requires further explanation.

      Although only a representative section of the LC was shown in the initial version, LC<sup>→SDH</sup>-NA neurons are distributed rostrocaudally throughout the LC, as previously reported (Llorca-Torralba et al., Brain, 2022 (PMID: 34373893)). Our additional experiments analyzing multiple sections of the anterior and posterior regions of the LC have now revealed that approximately sixty LC<sup>→SDH</sup>-NA neurons express DTR, and these neurons are eliminated following DTX treatment (Figure S4; lines 126-128) (it should be noted that these neurons specifically project to the L4 segment of the SDH, and the total number of LC<sup>→SDH</sup>-NA neurons is likely much higher). Considering the specificity of LC<sup>→SDH</sup>-NAergic pathway targeting demonstrated in our study (as described above), together with the fact that primary afferent sensory fibers from the plantar skin of the hindpaw predominantly project to the L4 segment of the SDH, these data suggest that the observed behavioral changes are attributable to the loss of these neurons and that ablation of even a relatively small number of NA neurons in the LC can have a significant impact on behavior. We have added this hypothesis in the Discussion section (lines 373-382).

      Regarding the data in Figures 1H and 1K, as the reviewer pointed out, a statistically significant difference was observed at 90 min in mice with ablation of LC-NA neurons, but not in those with LC<sup>→SDH</sup>-NA neuron ablation. This is likely due to a slightly higher threshold in the control group at this time point (Figure 1K), and it remains unclear whether there is a mechanistic difference between the two groups at this specific time point.

      (d) A valuable addition would be staining for noradrenergic terminals in the spinal cord for the intersectional approach (Figure 1J), as done in Figures 1F/G. LC projections terminate preferentially in the SDH, whereas A5 projections terminate in the deep dorsal horn (DDH). Staining could clarify whether circuits beyond the LC are being ablated.

      As suggested, we performed DTR immunostaining in the SDH; however, we did not detect any DTR immunofluorescence there. A similar result was also observed in the spinal terminals of DTR-expressing primary afferent fibers (our unpublished data). The reason for this is unclear, but to the best of our knowledge, no studies have clearly shown DTR expression at presynaptic terminals, which may be because the action of DTX on the neuronal cell body is necessary for cell ablation. Nevertheless, as described in our response to the weakness (5)-f) by Reviewer 1 (Public review), we have now confirmed the specific expression of DTR in the LC, but not in the A5 and A7 regions (Figure S4; lines 127-128).

      (e) Furthermore, different LC neurons often mediate opposite physiological outcomes depending on their projection targets-for example, dorsal LC neurons projecting to the prefrontal cortex PFCx are pronociceptive, while ventral LC neurons projecting to the SC are antinociceptive (PMIDs: 29027903, 34344259, 36625030). Given this functional diversity, direct injection into the LC is likely to result in nonspecific effects.

      To avoid behavioral outcomes resulting from a mixture of facilitatory and inhibitory effects caused by activating the entire population of LC-NA neurons, we employed a specific manipulation targeting LC<sup>→SDH</sup>-NA neurons using AAV vectors. The specificity of this manipulation was confirmed in our previous study (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)) and in the current study (Figure S4). Using this approach, we previously demonstrated that LC neurons can exert pronociceptive effects via astrocytes in the SDH (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)). This pronociceptive role is further supported by the current study, which uses a more selective manipulation of LC<sup>→SDH</sup>-NA neurons through a NET-Cre mouse line. In addition, intrathecal administration of relatively low doses of NA in naïve mice clearly induces mechanical pain hypersensitivity. Nevertheless, we have also acknowledged that several recent studies have reported an inhibitory role of LC<sup>→SDH</sup>-NA neurons in spinal nociceptive signaling. The reason for these differing behavioral outcomes remains unclear, but several methodological differences may underlie the discrepancy. First, the degree of LC<sup>→SDH</sup>-NA neuronal activity may play a role. Although direct comparisons between studies reporting pro- and anti-nociceptive effects are difficult, our previous studies demonstrated that intrathecal administration of high doses of NA in naïve mice does not induce mechanical pain hypersensitivity (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)). Second, the sensory modality used in behavioral testing may be a contributing factor as the pronociceptive effect of NA appears to be selectively observed in responses to mechanical, but not thermal, stimuli (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)). This sensory modality-selective effect is also evident in mice subjected to acute restraint stress (Table S1). 
      Therefore, the role of LC<sup>→SDH</sup>-NA neurons in modulating nociceptive signaling in the SDH is more complex than previously appreciated, and their contribution to pain regulation should be reconsidered in light of factors such as NA levels, sensory modality, and experimental context. In revising the manuscript, we have included some of the points described above in the Discussion (lines 282-291).

      Conclusion on Specificity: The authors are strongly encouraged to address these limitations directly, as they significantly affect the validity of the conclusions regarding the LC pathway. Providing more robust evidence, acknowledging experimental limitations, and incorporating complementary analyses would greatly strengthen the manuscript.

      We appreciate the reviewer’s comments. We fully acknowledge the limitations raised and agree that addressing them directly is important for the rigor of our conclusions on the LC pathway. To this end, we have performed additional experiments (e.g., Figure A and S4), which are now included in the revised manuscript. Furthermore, we have added a new paragraph on experimental limitations at the end of the Discussion section (lines 373-408). We believe these new data substantially strengthen the validity of our findings and have clarified these points in the Discussion section.

      (2) Discrepancies in Data

      (a) Figures 1B and 1E: The behavioural effect of stress on PWT (Figure 1E) persists for 120 minutes, whereas Ca2+ imaging changes (Figure 1B) are only observed in the first 20 minutes, with signal attenuation starting at 30 minutes. This discrepancy requires clarification, as it impacts the proposed mechanism.

      Thank you for your important comment. As pointed out by the reviewer, there is a difference between the duration of behavioral responses and Ca<sup>2+</sup> events, although the exact time point at which the PWT begins to decline remains undetermined (as behavioral testing cannot be conducted during stress exposure). A similar temporal difference was also observed following intraplantar injection of capsaicin (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)): while LC<sup>→SDH</sup>-NA neuron-mediated Ca<sup>2+</sup> responses in SDH astrocytes last for 5–10 min after injection, behavioral hypersensitivity peaks around 60 min post-injection and gradually returns to baseline over the subsequent 60–120 min. These findings raise the possibility that astrocyte-mediated pain hypersensitivity in the SDH may involve a sustained alteration in spinal neural function, such as central sensitization. We have added this hypothesis to the Discussion section of the revised manuscript (lines 399-408), as it represents an important direction for future investigation.

      (b) Figure 4E: The effect is barely visible, and the tissue resembles "Swiss cheese," suggesting poor staining quality. This is insufficient for such an important conclusion. Improved staining and/or complementary staining (e.g., cFOS) are needed. Additionally, no clear difference is observed between Stress+Ab stim. and Stress+Ab stim.+CPT, raising doubts about the robustness of the data.

      As suggested, we performed c-FOS immunostaining and obtained clearer results (Figure 4E,F; lines 243-252). We also quantitatively analyzed the number of c-FOS<sup>+</sup> cells in the superficial laminae, and the results are consistent with those obtained from the pERK experiments.

      (c) Discrepancy with Existing Evidence: The claim regarding the pronociceptive effect of LC→SDH-NAergic signalling on mechanical hypersensitivity contrasts with findings by Kucharczyk et al. (PMID: 35245374), who reported no facilitation of spinal convergent (wide-dynamic range) neuron responses to tactile mechanical stimuli, but potent inhibition to noxious mechanical von Frey stimulation. This discrepancy suggests alternative mechanisms may be at play and raises the question of why noxious stimuli were not tested.

      In our experiments, ChrimsonR expression was observed in the superficial and deeper laminae of the spinal cord (Figure S6). Due to the technical limitations of the optical fibers used for optogenetics, the light stimulation could only reach the superficial laminae; therefore, it may not have affected the activity of neurons (including WDR neurons) located in the deeper laminae. Furthermore, the study by Kucharczyk et al. (Brain, 2022 (PMID: 35245374)) employed a stimulation protocol that differed from ours, applying continuous stimulation over several minutes. Given that the levels of NA released from LC<sup>→SDH</sup>-NAergic terminals in the SDH increase with the duration of terminal stimulation (as shown in Figure 2B), longer stimulation may result in higher levels of NA in the SDH. Considering also our data indicating that the pro- and anti-nociceptive effects of NA are dose dependent (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), these differences may be related to LC<sup>→SDH</sup>-NA neuron activity, NA levels in the SDH, and the differential responses of SDH neurons in the superficial versus deeper laminae (lines 388-395).

      (3) Sole reliance on Von Frey testing

      The exclusive use of von Frey as a behavioural readout for mechanical sensitisation is a significant limitation. This assay is highly variable, and without additional supporting measures, the conclusions lack robustness. Incorporating other behavioural measures, such as the adhesive tape removal test to evaluate tactile discomfort, the needle floor walk corridor to assess sensitivity to uneven or noxious surfaces, or the kinetic weight-bearing test to measure changes in limb loading during movement, could provide complementary insights. Physiological tests, such as the Randall-Selitto test for noxious pressure thresholds or CatWalk gait analysis to evaluate changes in weight distribution and gait dynamics, would further strengthen the findings and allow for a more comprehensive assessment of mechanical sensitisation.

      Thank you for your suggestion. Based on our previous findings that Hes5<sup>+</sup> astrocytes in the SDH selectively modulate mechanosensory signaling (Kohro et al., Nat Neurosci, 2020 (PMID: 33020652)), the present study focused on behavioral responses to mechanical stimuli using von Frey filaments. As we have not previously conducted most of the behavioral tests suggested by the reviewers, and as we currently lack the necessary equipment for these tests (e.g., Randall–Selitto test, CatWalk gait analysis, and weight-bearing test), we were unable to include them in this study. However, it will be of great interest in future research to investigate whether activation of the LC<sup>→SDH</sup>-NA neuron-to-SDH Hes5<sup>+</sup> astrocyte signaling pathway similarly sensitizes behavioral responses to other types of mechanical stimuli, and to investigate the sensory modality-selective pro- and antinociceptive role of LC<sup>→SDH</sup>-NAergic signaling in the SDH (lines 396-399).

      Overall Conclusion

      This study addresses an important and complex topic with innovative methods and compelling data. However, the conclusions rely on several assumptions that require more robust evidence. Specificity of the LC pathway, experimental discrepancies, and methodological limitations (e.g., sole reliance on von Frey) must be addressed to substantiate the claims. With these issues resolved, this work could significantly advance our understanding of astrocytic and noradrenergic contributions to pain modulation.

      We have made every effort to address the reviewer’s concerns through additional experiments and analyses. Based on the new control data presented, we believe that our explanation is reasonable and acceptable. Although additional data cannot be provided on some points due to methodological constraints and limitations of the techniques currently available in our laboratory, we respectfully submit that the evidence presented sufficiently supports our conclusions.

      Reviewer #3 (Recommendations for the authors):

      A lot of beautiful and challenging-to-collect data is presented. Sincere congratulations to all the authors on this achievement!

      Notwithstanding, please carefully reconsider the conclusions regarding the LC pathway, as additional evidence is required to ensure their specificity and robustness.

      We thank the reviewer for the kind comments and for raising an important point regarding the LC pathway. The reviewer’s feedback prompted us to conduct additional investigations to further strengthen the validity of our conclusions. We have incorporated these new data and analyses into the revised manuscript, and we believe that these revisions substantially enhance the robustness and reliability of our findings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary:

      In this study, Lamberti et al. investigate how translation initiation and elongation are coordinated at the single-mRNA level in mammalian cells. The authors aim to uncover whether and how cells dynamically adjust initiation rates in response to elongation dynamics, with the overarching goal of understanding how translational homeostasis is maintained. To this end, the study combines single-molecule live-cell imaging using the SunTag system with a kinetic modeling framework grounded in the Totally Asymmetric Simple Exclusion Process (TASEP). By applying this approach to custom reporter constructs with different coding sequences, and under perturbations of the initiation/elongation factor eIF5A, the authors infer initiation and elongation rates from individual mRNAs and examine how these rates covary.

      The central finding is that initiation and elongation rates are strongly correlated across a range of coding sequences, resulting in consistently low ribosome density (≤12% of the coding sequence occupied). This coupling is preserved under partial pharmacological inhibition of eIF5A, which slows elongation but is matched by a proportional decrease in initiation, thereby maintaining ribosome density. However, a complete genetic knockout of eIF5A disrupts this coordination, leading to reduced ribosome density, potentially due to changes in ribosome stalling resolution or degradation.

      Strengths:

      A key strength of this work is its methodological innovation. The authors develop and validate a TASEP-based Hidden Markov Model (HMM) to infer translation kinetics at single-mRNA resolution. This approach provides a substantial advance over previous population-level or averaged models and enables dynamic reconstruction of ribosome behavior from experimental traces. The model is carefully benchmarked against simulated data and appropriately applied. The experimental design is also strong. The authors construct matched SunTag reporters differing only in codon composition in a defined region of the coding sequence, allowing them to isolate the effects of elongation-related features while controlling for other regulatory elements. The use of both pharmacological and genetic perturbations of eIF5A adds robustness and depth to the biological conclusions. The results are compelling: across all constructs and conditions, ribosome density remains low, and initiation and elongation appear tightly coordinated, suggesting an intrinsic feedback mechanism in translational regulation. These findings challenge the classical view of translation initiation as the sole rate-limiting step and provide new insights into how cells may dynamically maintain translation efficiency and avoid ribosome collisions.

      We thank the reviewer for their constructive assessment of our work, and for recognizing the methodological innovation and experimental rigor of our study.

      Weaknesses:

      A limitation of the study is its reliance on exogenous reporter mRNAs in HeLa cells, which may not fully capture the complexity of endogenous translation regulation. While the authors acknowledge this, it remains unclear how generalizable the observed coupling is to native mRNAs or in different cellular contexts.

      We agree that the use of exogenous reporters is a limitation inherent to the SunTag system, for which there is currently no simple alternative for single-mRNA translation imaging. However, we believe our findings are likely generalizable for several reasons.

      As discussed in our introduction and discussion, there is growing mechanistic evidence in the literature for coupling between elongation (ribosome collisions) and initiation via pathways such as the GIGYF2-4EHP axis (Amaya et al. 2018, Hickey et al. 2020, Juszkiewicz et al. 2020), which might operate on both exogenous and endogenous mRNAs.

      As already acknowledged in our limitations section, our exogenous reporters may not fully recapitulate certain aspects of endogenous translation (e.g., ER-coupled collagen processing), yet the observed initiation-elongation coupling was robust across all tested constructs and conditions.

      We have now expanded the Discussion (L393-395) to cite complementary evidence from Dufourt et al. (2021), who used a CRISPR-based approach in Drosophila embryos to measure translation of endogenous genes. We also added a reference to Choi et al. 2025, who use an ER-specific SunTag reporter to visualize translation at the ER (L395-397).

      Additionally, the model assumes homogeneous elongation rates and does not explicitly account for ribosome pausing or collisions, which could affect inference accuracy, particularly in constructs designed to induce stalling. While the model is validated under low-density assumptions, more work may be needed to understand how deviations from these assumptions affect parameter estimates in real data.

      We agree with the reviewer that the assumption of homogeneous elongation rates is a simplification, and that our work represents a first step towards rigorous single-trace analysis of translation dynamics. We have explicitly tested the robustness of our model to violations of the low-density assumption through simulations (Figure 2 - figure supplement 2). These show that while parameter inference remains accurate at low ribosome densities, accuracy slightly deteriorates at higher densities, as expected. In fact, our experimental data do provide evidence for heterogeneous elongation: the waiting times between termination events deviate significantly from an exponential distribution (Figure 3 - figure supplement 2C), indicating the presence of ribosome stalling and/or bursting, consistent with the reviewer's concern. We acknowledge in the Limitations section (L402-406) that extending the model to explicitly capture transcript-dependent elongation rates and ribosome interactions remains challenging. The TASEP is difficult to solve analytically under these conditions, but we note that simulation-based inference approaches, such as particle filters to replace HMMs, could provide a path forward for future work to capture this complexity at the single-trace level.
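
      The exponential-waiting-time argument above can be illustrated with a minimal sketch, assuming the coefficient of variation (CV) as a summary statistic: for exponentially distributed waiting times the CV equals 1, so a CV well above 1 would be suggestive of bursty or heterogeneous kinetics. The function name and the toy values below are hypothetical and are not taken from the deposited analysis code.

```python
from statistics import mean, pstdev


def waiting_time_cv(waiting_times):
    """Coefficient of variation (population std / mean) of waiting times.

    For an exponential distribution the CV is 1; values far from 1
    indicate a departure from memoryless (Poisson-like) events.
    """
    if len(waiting_times) < 2:
        raise ValueError("need at least two waiting times")
    m = mean(waiting_times)
    if m <= 0:
        raise ValueError("waiting times must have a positive mean")
    return pstdev(waiting_times) / m


# Toy, made-up waiting times in seconds (not experimental data).
regular = [20.0, 20.0, 20.0, 20.0]   # perfectly regular events
bursty = [1.0, 1.0, 1.0, 77.0]       # clustered events followed by a long gap

print(waiting_time_cv(regular))
print(waiting_time_cv(bursty))
```

      A formal comparison would use a goodness-of-fit test on the full distribution rather than a single summary statistic; the CV is shown only because it makes the expectation for the exponential case explicit.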

      Furthermore, although the study observes translation "bursting" behavior, this is not explicitly modeled. Given the growing recognition of translational bursting as a regulatory feature, incorporating or quantifying this behavior more rigorously could strengthen the work's impact.

      While we do not explicitly model the bursting dynamics in the HMM framework, we have quantified bursting behavior directly from the data. Specifically, we measure the duration of translated (ON) and untranslated (OFF) periods across all reporters and conditions (Figure 1G for control conditions and Figure 4G-H for perturbed conditions), finding that active translation typically lasts 10-15 minutes, interspersed with shorter silent periods of 5-10 minutes. This empirical characterization demonstrates that bursting is a consistent feature of translation across our experimental conditions. The average duration of silent periods is similar to what was inferred by Livingston et al. 2023 for a similar SunTag reporter, whereas the average duration of active periods is substantially shorter (~15 min instead of ~40 min), which is consistent with the shorter trace duration in our system compared to theirs (~15 min compared to ~80 min, on average). Incorporating an explicit two-state or multi-state bursting model into the TASEP-HMM framework would indeed be computationally intensive and represents an important direction for future work, as it would enable inference of switching rates alongside initiation and elongation parameters. We have added this point to the Discussion (L415-417).
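
      As a minimal sketch of how ON and OFF durations could be read off a single trace, assuming the trace has already been binarized into translating and non-translating frames: the function name, the frame interval, and the toy trace below are hypothetical, and the thresholding step that produces the binary trace is not shown.

```python
from itertools import groupby


def on_off_durations(is_translating, frame_interval_min):
    """Split a boolean trace into ON and OFF run durations (minutes).

    Runs touching the start or end of the trace are censored (their true
    duration is at least the value returned), which is one reason short
    traces tend to give shorter apparent ON periods.
    """
    on, off = [], []
    for state, run in groupby(is_translating):
        duration = sum(1 for _ in run) * frame_interval_min
        (on if state else off).append(duration)
    return on, off


# Toy binarized trace; one frame is assumed to last 0.5 min.
trace = [True, True, True, True, False, False, True, True,
         False, False, False, False]
on, off = on_off_durations(trace, 0.5)
print("ON durations (min):", on)
print("OFF durations (min):", off)
```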

      Assessment of Goals and Conclusions:

      The authors successfully achieve their stated aims: they quantify translation initiation and elongation at the single-mRNA level and show that these processes are dynamically coupled to maintain low ribosome density. The modeling framework is well suited to this task, and the conclusions are supported by multiple lines of evidence, including inferred kinetic parameters, independent ribosome counts, and consistent behavior under perturbation.

      Impact and Utility:

      This work makes a significant conceptual and technical contribution to the field of translation biology. The modeling framework developed here opens the door to more detailed and quantitative studies of ribosome dynamics on single mRNAs and could be adapted to other imaging systems or perturbations. The discovery of initiation-elongation coupling as a general feature of translation in mammalian cells will likely influence how researchers think about translational regulation under homeostatic and stress conditions.

      The data, models, and tools developed in this study will be of broad utility to the community, particularly for researchers studying translation dynamics, ribosome behavior, or the effects of codon usage and mRNA structure on protein synthesis.

      Context and Interpretation:

      This study contributes to a growing body of evidence that translation is not merely controlled at initiation but involves feedback between elongation and initiation. It supports the emerging view that ribosome collisions, stalling, and quality control pathways play active roles in regulating initiation rates in cis. The findings are consistent with recent studies in yeast and metazoans showing translation initiation repression following stalling events. However, the mechanistic details of this feedback remain incompletely understood and merit further investigation, particularly in physiological or stress contexts. 

      In summary, this is a thoughtfully executed and timely study that provides valuable insights into the dynamic regulation of translation and introduces a modeling framework with broad applicability. It will be of interest to a wide audience in molecular biology, systems biology, and quantitative imaging.

      We appreciate the reviewer's thorough and positive assessment of our work, and that they recognize both the technical innovation of our modeling framework and its potential broad utility to the translation biology community. We agree that further mechanistic investigation of initiation-elongation feedback under various physiological contexts represents an important direction for future research.

      Reviewer #2 (Public review):

      Summary:

      This manuscript uses single-molecule run-off experiments and TASEP/HMM models to estimate biophysical parameters, i.e., ribosomal initiation and elongation rates. Combining inferred initiation and elongation rates, the authors quantify ribosomal density. TASEP modeling was used to simulate the mechanistic dynamics of ribosomal translation, and the HMM is used to link ribosomal dynamics to microscope intensity measurements. The authors' main conclusions and findings are:

      (1) Ribosomal elongation rates and initiation rates are strongly coordinated.

      (2) Elongation rates were estimated between 1-4.5 aa/sec. Initiation rates were estimated between 0.5-2.5 events/min. These values agree with previously reported values.

      (3) Ribosomal density was determined below 12% for all constructs and conditions.

      (4) eIF5A-perturbations (KO and GC7 inhibition) resulted in non-significant changes in translational bursting and ribosome density.

      (5) eIF5A perturbations resulted in increases in elongation and decreases in initiation rates.

      Strengths:

      This manuscript presents an interesting scientific hypothesis to study ribosome initiation and elongation concurrently. This topic is highly relevant for the field. The manuscript presents a novel quantitative methodology to estimate ribosomal initiation rates from Harringtonine run-off assays. This is relevant because run-off assays have been used to estimate, exclusively, elongation rates.

      We thank the reviewer for their careful evaluation of our work and for recognizing the novelty of our quantitative methodology to extract both initiation and elongation rates from harringtonine run-off assays, extending beyond the traditional use of these experiments.

      Weaknesses:

      The conclusion of the strong coordination between initiation and elongation rates is interesting, but some results are unexpected, and further experimental validation is needed to ensure this coordination is valid. 

      We agree that some of our findings need further experimental investigation in future studies. However, we believe that the coordination between initiation and elongation is supported by multiple results in our current work: (1) the strong correlation observed across all reporters and conditions (Figure 3E), and (2) the consistent maintenance of low ribosome density despite varying elongation rates. While additional experimental validation would be valuable, we note that directly manipulating initiation or elongation independently in mammalian cells remains technically challenging. Nevertheless, our findings are consistent with emerging mechanistic understanding of collision-sensing pathways (GIGYF2-4EHP) that could mediate such coupling, as discussed in our manuscript.
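
      The link between the two rates and ribosome density can be made concrete with a back-of-the-envelope sketch. It assumes the low-density regime, in which the ribosome flux is set by the initiation rate and the mean spacing between ribosomes is the elongation rate divided by the initiation rate. The ribosome footprint of 10 codons is an assumed value, and the function name is hypothetical; this is not the TASEP-HMM inference itself.

```python
def ribosome_coverage(init_per_min, elong_aa_per_s, footprint_codons=10):
    """Approximate fraction of the coding sequence covered by ribosomes.

    Low-density approximation: ribosomes per codon = initiation rate /
    elongation rate; coverage = footprint * ribosomes per codon.
    footprint_codons=10 is an assumption, not a value from the manuscript.
    """
    if elong_aa_per_s <= 0:
        raise ValueError("elongation rate must be positive")
    ribosomes_per_codon = (init_per_min / 60.0) / elong_aa_per_s
    return footprint_codons * ribosomes_per_codon


# Illustrative rates chosen within the ranges quoted in the review summary.
fast = ribosome_coverage(init_per_min=2.5, elong_aa_per_s=4.5)
# Halving both rates leaves the coverage unchanged.
slow = ribosome_coverage(init_per_min=1.25, elong_aa_per_s=2.25)
print(f"coverage (fast): {fast:.3f}")
print(f"coverage (slow): {slow:.3f}")
```

      In this approximation the coverage depends only on the ratio of the two rates, so a proportional decrease in initiation and elongation leaves the density unchanged.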

      (1) eIF5a perturbations resulted in a non-significant effect on the fraction of translating mRNA, translation duration, and bursting periods. Given the central role of eIF5a, I would have expected a different outcome. I would recommend that the authors expand the discussion and review more literature to justify these findings.

      We appreciate this comment. This finding is indeed discussed in detail in our manuscript (Discussion, paragraphs 6-7). As we note there, while eIF5A plays a critical role in elongation, the maintenance of bursting dynamics and ribosome density upon perturbation can be explained by compensatory feedback mechanisms. Specifically, a coordinated decrease in initiation rates counterbalances slower elongation to maintain homeostatic ribosome density. We also discuss several factors that complicate interpretation: (1) potential RQC-mediated degradation masking stronger effects in proline-rich constructs, (2) differences between GC7 treatment and genetic knockout suggesting altered stalling resolution kinetics, and (3) the limitations of using exogenous reporters that lack ER-coupled processing, which may be critical for eIF5A function in endogenous collagen translation (as suggested by Rossi et al., 2014; Mandal et al., 2016; Barba-Aliaga et al., 2021). The mechanistic complexity and tissue-specific nature of eIF5A function in mammals, which differs substantially from the better-characterized yeast system, likely contribute to the nuanced phenotype we observe. We believe our Discussion adequately addresses these points.

      (2) The AAG construct leading to slow elongation is very surprising. It is the opposite of the field consensus, where codon-optimized gene sequences are expected to elongate faster. More information about each construct should be provided. I would recommend more bioinformatic analysis on this, for example, calculating CAI for all constructs, or predicting the structures of the proteins.

      We agree that the slow elongation of the AAG construct is counterintuitive and indeed surprising. Following the reviewer's suggestion, we have now calculated the Codon Adaptation Index (CAI) for all constructs (Renilla 0.89, Col1a1 0.78, Col1a1 mutated 0.74). It is therefore unlikely that codon bias explains the slow translation, particularly since we designed the mutated Col1a1 construct with alanine codons selected to respect human codon usage bias, thereby minimizing changes in codon optimality. As we discuss in the manuscript, we hypothesize that the proline-to-alanine substitutions disrupted co-translational folding of the collagen-derived sequence. Prolines are critical for collagen triple-helix formation (Shoulders and Raines, 2009), and their replacement with alanines likely generates misfolded intermediates that cause ribosome stalling (Barba-Aliaga et al., 2021; Komar et al., 2024). This interpretation is supported by the high frequency (>30%) of incomplete run-off traces for AAG, suggesting persistent stalling events. Our findings thus illustrate an important potential caveat: "optimizing" a sequence based solely on codon usage can be detrimental when it disrupts functionally important structural features or co-translational folding pathways.

      This highlights that elongation rates depend not only on codon optimality but also on the interplay between nascent chain properties and ribosome progression.
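
      For readers unfamiliar with the Codon Adaptation Index, the following minimal sketch shows the standard definition: each codon receives a weight equal to its usage frequency divided by that of the most used synonymous codon, and the CAI is the geometric mean of these weights over the sequence. The usage table below is a toy table with made-up frequencies for two amino acids only, and the function names are hypothetical; a real calculation would use a full reference codon usage table.

```python
import math


def relative_adaptiveness(usage_by_aa):
    """Weight each codon by freq / max freq among its synonymous codons."""
    weights = {}
    for codons in usage_by_aa.values():
        top = max(codons.values())
        for codon, freq in codons.items():
            weights[codon] = freq / top
    return weights


def cai(cds, weights):
    """Geometric mean of codon weights; codons without a weight are skipped.

    Assumes all weights are strictly positive (log of zero is undefined).
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy usage frequencies (hypothetical values, two amino acids only).
toy_usage = {
    "Ala": {"GCC": 0.4, "GCT": 0.2},
    "Pro": {"CCC": 0.3, "CCT": 0.3},
}
w = relative_adaptiveness(toy_usage)
print(cai("GCCCCC", w))  # both codons are the preferred ones
print(cai("GCTCCC", w))  # one less-preferred alanine codon lowers the index
```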

      (3) The authors should consider using their methodology to study the effects of modifying the 5'UTR, resulting in changes in initiation rate and bursting, such as previously shown in reference Livingston et al., 2023. This may be outside of the scope of this project, but the authors could add this as a future direction and discuss if this may corroborate their conclusions. 

      We thank the reviewer for this excellent suggestion. We agree that applying our methodology to 5'-UTR variants would provide a complementary test of initiation-elongation coupling, and we have now added this as a future direction in the Discussion (L417-420).

      (4) The mathematical model and parameter inference routines are central to the conclusions of this manuscript. In order to support reproducibility, the computational code should be made available and well-documented, with a requirements file indicating the dependencies and their versions. 

      We have added the GitHub link in the manuscript (https://github.com/naef-lab/suntag-analysis) and have also deposited the data (.ome.tif) on Zenodo (https://zenodo.org/records/17669332).

      Reviewer #3 (Public review):

      Disclaimer:

      My expertise is in live single-molecule imaging of RNA and transcription, as well as associated data analysis and modeling. While this aligns well with the technical aspects of the manuscript, my background in translation is more limited, and I am not best positioned to assess the novelty of the biological conclusions.

      Summary:

      This study combines live-cell imaging of nascent proteins on single mRNAs with time-series analysis to investigate the kinetics of mRNA translation.

      The authors (i) used a calibration method for estimating absolute ribosome counts, and (ii) developed a new Bayesian approach to infer ribosome counts over time from run-off experiments, enabling estimation of elongation rates and ribosome density across conditions.

      They report (i) translational bursting at the single-mRNA level, (ii) low ribosome density (~10% occupancy ± a few percent), (iii) that ribosome density is minimally affected by perturbations of elongation (using a drug and/or different coding sequences in the reporter), suggesting a homeostatic mechanism potentially involving a feedback of elongation onto initiation, although (iv) this coupling breaks down upon knockout of elongation factor eIF5A.

      Strengths:

      (1) The manuscript is well written, and the conclusions are, in general, appropriately cautious (besides the few improvements I suggest below).

      (2) The time-series inference method is interesting and promising for broader applications. 

      (3) Simulations provide convincing support for the modeling (though some improvements are possible). 

      (4) The reported homeostatic effect on ribosome density is surprising and carefully validated with multiple perturbations.

      (5) Imaging quality and corrections (e.g., flat-fielding, laser power measurements) are robust.

      (6) Mathematical modeling is clearly described and precise; a few clarifications could improve it further.

      We thank the reviewer for recognizing the novelty of the approach and its rigour, and for providing suggestions to improve it further.

      Weaknesses:

      (1) The absolute quantification of ribosome numbers (via the measurement of $i_{MP}$) should be improved. This only affects the finding that ribosome density is low, not that it appears to be under homeostatic control. However, if $i_{MP}$ turns out to be substantially overestimated (hence ribosome density underestimated), then "ribosomes queuing up to the initiation site and physically blocking initiation" could become a relevant hypothesis. In my detailed recommendations to the authors, I list points that need clarification in their quantifications and suggest an independent validation experiment (measuring the intensity of an object with a known number of GFP molecules, e.g., MS2-GFP-labeled RNAs, or individual GEMs).

      We agree with the reviewer that the estimation of the number of ribosomes is central to our finding that translation happens at low density on our reporters. This result derives from our measurement of the intensity of one mature protein (i<sub>MP</sub>), which we obtained using a SunTag reporter with an RH1 domain at the C terminus of the mature protein, allowing us to stabilise mature proteins via actin-tethering. In addition, as suggested by the reviewer, we already validated this result with an independent estimate of the mature protein intensity (Figure 5 - figure supplement 2B), which was obtained by adding the mature protein intensity directly as a free parameter of the HMM. The inferred value of mature protein intensity for each construct (10-15 a.u.) was remarkably close to the experimental calibration result (14 ± 2 a.u.). Therefore, we have confidence that our absolute quantification of ribosome numbers is accurate.

      (2) The proposed initiation-elongation coupling is plausible, but alternative explanations, such as changes in abortive elongation frequency, should be considered more carefully. The authors mention this possibility, but should test or rule it out quantitatively. 

      We thank the reviewer for the comment, but we consider that ruling out alternative explanations through new perturbation experiments is beyond the scope of the present work.

      (3) The observation of translational bursting is presented as novel, but similar findings were reported by Livingston et al. (2023) using a similar SunTag-MS2 system. This prior work should be acknowledged, and the added value of the current approach clarified.

      We did cite Livingston et al. (2023) in several places, but we recognize that a few additional citations in key places would make clear that the observation of bursting is not novel but agrees with previous results. We have now added these in the Results and Discussion sections.

      (4) It is unclear what the single-mRNA nature of the inference method is bringing since it is only used here to report _average_ ribosome elongation rate and density (averaged across mRNAs and across time during the run-off experiments - although the method, in principle, has the power to resolve these two aspects).

      While decoding individual traces, our model infers shared (population-level) rates. Inferring transcript-specific parameters would be more informative, but it is highly challenging due to the uncertainty in the initial ribosome distribution on single transcripts. Pooling multiple transcripts together allows us to use some assumptions on the initial distribution and infer average elongation and initiation-rate parameters, while revealing substantial mRNA-to-mRNA variability in the posterior decoding (e.g. Figure 3 - figure supplement 2C). Indeed, the inference still informs on the single-trace run-off time distribution (Figure 3A) and the waiting time between termination events (Figure 3 - figure supplement 2C), suggesting the presence of stalling and bursting. In addition, the transcript-to-transcript heterogeneity is likely accounted for by our model better than by previous methods (linear fit of the average run-off intensity), as suggested by their comparison (Figure 3 - figure supplement 2A). In the future, the model could be refined by introducing transcript-specific parameters, possibly in a hierarchical way, alongside shared parameters.

      (5) I did not find any statement about data availability. The data should be made available. Their absence limits the ability to fully assess and reproduce the findings.

      We have added the GitHub link in the manuscript (https://github.com/naef-lab/suntag-analysis) and have also deposited the data (.ome.tif) on Zenodo (https://zenodo.org/records/17669332).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      Major Comments:

      (1) Lack of Explicit Bursting Model

      Although translation "bursts" are observed, the current framework does not explicitly model initiation as a stochastic ON/OFF process. This limits insight into regulatory mechanisms controlling burst frequency or duration. The authors should either incorporate a two-state/more-state (bursting) model of initiation or perform statistical analysis (e.g., dwell-time distributions) to quantify bursting dynamics. They should clarify how bursting influences the interpretation of initiation rate estimates.

      We agree with the reviewer that an explicit bursting model (e.g., a two-state telegraph model) would be the ideal theoretical framework. However, integrating such a model into the TASEP-HMM inference framework is computationally intensive and complex. As a robust first step, we have opted to quantify bursting empirically based on the decoded single-mRNA traces. As shown in Figure 1G (control) and Figure 4G (perturbed conditions), we explicitly measured the duration of "ON" (translated) and "OFF" (untranslated) periods. This statistical analysis provides a quantitative description of the bursting dynamics without relying on the specific assumptions of a telegraph model. We have clarified this in the text (L123-125) and, as suggested, added a discussion (L415-417) on the potential extensions of the model to include explicit switching kinetics in the Outlook section.
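
The empirical ON/OFF quantification described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `dwell_times`, the toy trace, and the frame interval `dt` are assumptions chosen for illustration. It also assumes that each decoded trace has already been reduced to one boolean per frame (translated or not), and that runs touching either end of the trace are discarded because their true duration is unknown.

```python
from itertools import groupby


def dwell_times(states, dt):
    """Return (on_durations, off_durations) in the units of dt.

    states: sequence of booleans, one per frame (True = translated).
    Runs that touch the start or the end of the trace are censored
    (their true duration is unknown), so they are dropped here.
    """
    # Collapse the trace into (state, run_length_in_frames) pairs.
    runs = [(state, sum(1 for _ in group)) for state, group in groupby(states)]
    interior = runs[1:-1]  # drop the censored first and last runs
    on = [n * dt for state, n in interior if state]
    off = [n * dt for state, n in interior if not state]
    return on, off


# Hypothetical decoded trace; dt is an illustrative frame interval in seconds.
trace = [0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
on, off = dwell_times([bool(s) for s in trace], dt=20.0)
```

Dropping the censored runs keeps the sketch simple but biases the estimate toward short dwell times; a survival-analysis treatment would be one way to retain them.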

      (2) Assumption of Uniform Elongation Rates

      The model assumes homogeneous elongation across coding sequences, which may not hold for stalling-prone inserts (e.g., PPG). This simplification could bias inference, particularly in cases of sequence-specific pausing. The authors should add simulations or a sensitivity analysis to assess how non-uniform elongation affects the accuracy of inferred parameters. They should also explicitly discuss how ribosome stalling, collisions, or heterogeneity might skew model outputs (see point 4).

      A strong stalling sequence that affects all ribosomes equally should not deteriorate the inference of the initiation rate, provided that the low-density assumption holds. The scenario where stalling events lead to higher density, and thus increased ribosome-ribosome interactions, is comparable to the conditions explored in Figure 2E. In those simulations, we tested the inference on data generated with varying initiation and elongation rates, resulting in ribosome densities ranging from low to high. We demonstrated that the inference remains robust at low ribosome densities (<10%). At higher densities, the accuracy of the initiation rate estimate decreases, whereas the elongation rate estimate remains comparatively robust. Additionally, the model tends to overestimate ribosome density under high-density conditions, likely because it neglects ribosome interference at the initiation site (Figure 2 - figure supplement 2C). We agree that a deeper investigation into the consequences of stochastic stalling and bursting would be beneficial, and we have explicitly acknowledged this in the Limitations section.

      (3) Interpretation of eIF5A Knockout Phenotype

      The observation that eIF5A KO reduces initiation more than elongation, leading to decreased ribosome density, is biologically intriguing. However, the explanation invoking altered RQC kinetics is speculative and not directly tested. The authors should consider validating the RQC hypothesis by monitoring reporter mRNA stability, ribosome collision markers, or translation termination intermediates.

      We thank the reviewer for the comment, but we consider that ruling out alternative explanations through new experiments is beyond the scope of the present work.

      (4) To strengthen the manuscript, the authors should incorporate insights from three studies.

      - Livingston et al. (PMC10330622) found that translation occurs in bursts, influenced by mRNA features and initiation factors, supporting the coupling of initiation and elongation.

      - Madern et al. (PMID: 39892379) demonstrated that ribosome cooperativity enhances translational efficiency, highlighting coordinated ribosome behavior.

      - Dufourt et al. (PMID: 33927056) observed that high initiation rates correlate with high elongation rates, suggesting a conserved mechanism across cell cultures and organisms.

      Integrating these studies could enrich the manuscript's interpretation and stimulate new avenues of thought.

      We thank the reviewer for the valuable comment. We added citations of Livingston et al. in the context of translational bursting. We already cited Madern et al. in multiple places and, although its observations of ribosome cooperativity are very compelling, they cannot be linked with our observations of a feedback between initiation and elongation, and it would be very challenging to see a similar effect on our reporters. This is why we did not expressly discuss cooperativity. We also integrated Dufourt et al. in the Discussion about the possibility of designing genetically encoded reporters. We also added a sentence about the possibility of using an ER-specific SunTag reporter, as done recently in Choi et al., Nature (2025) (https://doi.org/10.1038/s41586-025-09718-0).

      Minor Comments:

      (1) Use consistent naming for SunTag reporters (e.g., "PPG" vs "proline-rich") throughout.

      Thank you for the comment. However, the term proline-rich always appears together with PPG, so we believe that the naming is clear and consistent.

      (2) Consider a schematic overview of the experimental design and modeling pipeline for accessibility.

      Thank you for the suggestion. We consider the experimental design and modeling to be described clearly enough in the revised manuscript that an additional schematic is not needed.

      (3) Clarify how incomplete run-off traces are handled in the HMM inference.

      Incomplete run-off traces are treated identically to complete traces in our HMM inference. This is possible because our model relies on the probability of transitions occurring per time step to infer rates. It does not require observing the final "empty" state to estimate the kinetic parameters α and λ. The loss of signal (e.g., mRNA moving out of the focal volume or photobleaching) does not invalidate the kinetic information contained in the portion of the trace that was observed. We have clarified this in the Methods section.
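
The reason truncated traces remain informative can be shown with a toy example that is much simpler than the TASEP-HMM and is not the authors' implementation. Assume a single event that occurs with a fixed probability per time step, and traces that are either observed until the event or lost beforehand. Every observed step contributes to the likelihood, so the maximum-likelihood estimate is the number of events divided by the total number of observed steps. The function name `per_step_rate_mle` and the example data are hypothetical.

```python
def per_step_rate_mle(traces):
    """Maximum-likelihood per-step event probability from censored traces.

    traces: list of (n_steps_observed, event_seen) pairs. A trace whose
    signal was lost before the event has event_seen = False; its observed
    steps still count as exposure during which nothing happened.
    """
    events = sum(1 for _, seen in traces if seen)
    exposure = sum(n for n, _ in traces)
    if exposure == 0:
        raise ValueError("no observed time steps")
    # Log-likelihood: events * log(p) + (exposure - events) * log(1 - p),
    # which is maximized at p = events / exposure.
    return events / exposure


# Two complete traces and two truncated ones (hypothetical values).
traces = [(5, True), (3, False), (8, True), (4, False)]
p_hat = per_step_rate_mle(traces)  # 2 events over 20 observed steps
```

Discarding the truncated traces instead would use only 13 steps of exposure and overestimate the rate.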

      Reviewer #2 (Recommendations for the authors):

      (1) Reproducibility:

      (1.1) The authors should use a GitHub repository with a timestamp for the release version.

      The code is available on GitHub (https://github.com/naef-lab/suntag-analysis).

      (1.2) Make raw images and data available in a figure repository like Figshare.

      The raw images (.ome.tif) are now available on Zenodo (https://zenodo.org/records/17669332).

      (2) Paper reorganization and expansion of the intensity and ribosome quantification:

      (2.1) Given the relevance of the initiation and elongation rates for the conclusions of this study, and the fact that the authors inferred these rates from the spot intensities, I recommend that the authors move Figure 1 Supplement 2 to the main text and expand the description of the process to relate spot intensity and number of ribosomes. Please also expand the figure caption for this image.

      We agree with the importance of this validation. We have expanded the description of the calibration experiment in the main text and in the figure caption.

      (2.2) I suggest the authors explicitly mention the use of HMM in the abstract.

      We have now explicitly mentioned the TASEP-based HMM in the abstract.

      (2.3) In line 492, please add the frame rate used to acquire the images for the run-off assays.

      We have added the specific frame rate (one frame every 20 seconds) to the relevant section.

      (3) Figures and captions:

      (3.1) Figure 1, Supplement 2. Please add a description of the colors used in plots B, C. 

      We have expanded the caption and added the color description.

      (3.2) In the Figure 2 caption. It is not clear what the authors mean by "traceseLife". Please ensure it is not a typo.

      Thank you for spotting this. We have corrected the typo.

      (3.3) Figure 1 A, in the cartoon N(alpha)->N-1, shouldn't the transition also depend on lambda?

      The transition probability was explicitly derived in the “Bayesian modeling of run-off traces” section (Eqs. 17-18), and does not depend on λ, but only on the initiation rate under the low-density assumption.

      (3.4) Figure 3, Supplement 2. "presence of bursting and stalling.." has a typo.

      Corrected.

      (3.5) Figure 5, panel C, the y-axis label should be "run-off time (min)."

      Corrected.

      (3.6) For most figures, add significance bars.

      (3.7) In the figure captions, please add the total number of cells used for each condition.

      We have systematically indicated the number of traces (n<sub>t</sub>) and the number of independent experiments (n<sub>e</sub>) in the captions in this format (n<sub>t</sub>, n<sub>e</sub>).

      (4) Mathematical Methods:

      We greatly thank the reviewer for their detailed attention to the mathematical notation. We have addressed all points below.

      (4.1) In lines 555, Materials and Methods, subsection, Quantification of Intensity Traces, multiple equations are not numbered. For example, after Equation (4), no numbers are provided for the rest of the equations. Please keep consistency throughout the whole document.

      We have ensured that all equations are now consistently numbered throughout the document.

      (4.2) In line 588, the authors mention "$X$ is a standard normal random variable with mean $\mu$ and standard deviation $s_0$". Please ensure this is correct. A standard normal random variable has a 0 mean and std 1. 

      Thank you for the suggestion. We have corrected the text (L678).

      (4.3) Line 546, Equation 2. The authors use mu(x,y) to describe a 2d Gaussian function. But later in line 587, the authors reuse the same variable name in equation 5 to redefine the intensity as mu = b_0 + I.

      We have renamed the 2D Gaussian function to $\mu_{2D}(x,y)$ in the spot tracking section.

      (4.4) For the complete document, it could be beneficial to the reader if the authors expand the definition of the relationship between the signal "y" and the spot intensity "I". Please note how the paragraph in lines 582-587 does not properly introduce "y".

      We have added an explicit definition of y and its relationship to the underlying spot intensity I in the text to improve readability and clarity.

      (4.5) Please ensure consistency in variable names. For example, "I" is used in line 587 for the experimental spot intensity, then line 763 redefines I(t) as the total intensity obtained from the TASEP model; please use "I_sim(t)" for simulated intensities. Please note that reusing the variable "I" for different contexts makes it hard for the reader to follow the text. 

      We agree that this was confusing. We have implemented the suggestion and now distinguish simulated intensities using the notation I<sub>S</sub>.

      (4.6) Line 555 "The prior on the total intensity I is an "uninformative" prior" I ~ half_normal(1000). Please ensure it is not "I_0 ~ half_normal(1000)."? 

      We confirm that “I” is the correct variable representing the total intensity in this context; we do not use an “I<sub>0</sub>” variable here.

      (4.7) In lines 595, equation 6. Ensure that the equation is correct. Shouldn't it be: s_0^2 = ln ( 1 + (sigma_meas^2 / ⟨y⟩^2) )? Please ensure that this is correct and it is not affecting the calculated values given in lines 598.

      Thank you for catching this typo. We have corrected the equation in the manuscript. We confirm that the calculations performed in the code used the correct formula, so the reported values remain unchanged.
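
The formula proposed by the reviewer in (4.7) is the standard moment-matching relation for a log-normal variable, and a short numerical check makes this concrete. The sketch below assumes that the measurement noise is treated as log-normal with log-scale variance s_0^2; that reading is an assumption, as are the function names and the example numbers. It is not taken from the authors' code.

```python
import math


def lognormal_from_moments(mean, sd):
    """Return (m, s2): log-scale location and variance of a log-normal
    variable with the given linear-scale mean and standard deviation."""
    s2 = math.log(1.0 + (sd / mean) ** 2)  # s_0^2 = ln(1 + sigma^2 / <y>^2)
    m = math.log(mean) - 0.5 * s2
    return m, s2


def moments_from_lognormal(m, s2):
    """Inverse map: linear-scale (mean, sd) from log-scale (m, s2)."""
    mean = math.exp(m + 0.5 * s2)
    sd = mean * math.sqrt(math.exp(s2) - 1.0)
    return mean, sd


# Round trip with illustrative values: mean 100 a.u., noise sd 30 a.u.
m, s2 = lognormal_from_moments(mean=100.0, sd=30.0)
mean_back, sd_back = moments_from_lognormal(m, s2)
```

The round trip recovers the input mean and standard deviation, which is what the corrected Equation 6 needs to guarantee.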

      (4.8) In line 597, "the mean intensity square ^2". Please ensure it is not "the square of the temporal mean intensity."

      We have corrected the text to "the square of the temporal mean intensity."

      (4.9) In lines 602-619, Bayesian modeling of run-off traces, please ensure to introduce the constant "\ell". Used to define the ribosomal footprint?

      We have added the explicit definition of 𝓁 as the ribosome footprint size (length of transcript occupied by one ribosome) in the "Bayesian modeling of run-off traces" section.

      (4.10) Line 687 has a minor typo "[...] ribosome distribution.. Then, [...]"

      We have corrected the punctuation.

      (4.11) In line 678, Equation 19 introduces the constant "L_S", Please ensure that it is defined in the text.

      We have added the explicit definition of L<sub>S</sub> (the length of the SunTag) to the text surrounding Equation 19.

      (4.12) In line 695, Equation 22, please consider using a subscript to differentiate the variance due to ribosome configuration. For example, instead of "sigma (...)^2" use something like "sigma_c ^2 (...)". Ensure that this change is correctly applied to Equation 24 and all other affected equations.

      Thank you, we have implemented the suggestions.

      (4.13) In line 696, please double-check equations 26 and 27. Specifically, the denominator ^2. Given the previous text, it is hard to follow the meaning of this variable. 

      We have revised the notation in Equations 26 and 27 to ensure the denominator is consistent with the definitions provided in the text.

      (4.14) In lines 726, the authors mention "[...], but for the purposes of this dissertation [...]", it should be "[...], but for the purposes of this study [...]"

      Thank you for spotting this. We have replaced "dissertation" with "study."

      (4.15) Equations 5, 28, 37, and the unnumbered equation between Equations 16 and 17 are similar, but in some, "y" does not explicitly depend on time. Please ensure this is correct. 

      We have verified these equations and believe they are correct.

      (4.16) Please review the complete document and ensure that variables and constants used in the equations are defined in the text. Please ensure that the same variable names are not reused for different concepts. To improve readability and flow in the text, please review the complete Materials and Methods sections and evaluate if the modeling section can be written more clearly and concisely. For example, Equation 28 is repeated in the text.

      We have performed a comprehensive review of the Materials and Methods section. To improve conciseness and flow, we have merged the subsection “Observation model and estimation of observation parameters” with the “Bayesian modeling of run-off traces” section. This allowed us to remove redundant definitions and repeated equations (such as the previous Equation 28). We have also checked that all variables and constants are defined upon first use and that variable names remain consistent throughout the manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) Data Presentation

      (1.1) In main Figures 1D and 4E, the traces appear to show frequent on-off-on transitions ("bursting"), but in supplementary figures (1-S1A and 4-S1A), this behavior is seen in only ~8 of 54 traces. Are the main figure examples truly representative?

      We acknowledge the reviewer's point. In Figure 1D, we selected some of the longest and most illustrative traces to highlight the bursting dynamics. We agree that the term "representative" might be misleading if interpreted as "average." We have updated the text to state "we show bursting traces" to more accurately reflect the selection.

      (1.2) There are 8 videos, but I could not identify which is which.

      Thank you for pointing this out. We have renamed the video files to clearly correspond to the figures and conditions they represent.

      (2) Data Availability:

      As noted above, the data should be shared. This is in accordance with eLife's policy: "Authors must make all original data used to support the claims of the paper, or that are required to reproduce them, available in the manuscript text, tables, figures or supplementary materials, or at a trusted digital repository (the latter is recommended). [...] eLife considers works to be published when they are posted as preprints, and expects preprints we review to meet the standards outlined here." Access to the time traces would have been helpful for reviewers.

      We have now added the GitHub link for the code (https://github.com/naef-lab/suntag-analysis) and deposited the raw data (.ome.tif files) on Zenodo (10.5281/zenodo.17669332).

      (3) Model Assumptions:

      (3.1) The broad range of run-off times (Figure 3A) suggests stalling, which may be incompatible with the 'low-density' assumption used on the TASEP model, which essentially assumes that ribosomes do not bump into each other. This could impact the validity of the assumptions that ribosomes behave independently, elongate at constant speed (necessary for the continuum-limit approximation), and that the rate-limiting step is the initiation. How robust are the inferences to this assumption?

      We agree that the deviation of waiting times from an exponential distribution (Figure 3 - figure supplement 2C) suggests the presence of stalling, which challenges the strict low-density assumption and constant elongation speed. We explicitly explored the robustness of our model to higher ribosome densities in simulations. As shown in Figure 2 - figure supplement 2, while the model accuracy for single parameters deteriorates at very high densities (overestimating density due to neglected interference), it remains robust for estimating global rates in the regime relevant to our data. We have expanded the discussion on the limitations of the low density and homogeneous elongation rate assumptions in the text (L404-408).

      (3.2) Since all constructs share the same SunTag region, elongation rates should be identical there and diverge only in the variable region. This would affect $\gamma (t)$ and hence possibly affect the results. A brief discussion would be helpful.

      This is a valid point. Currently, our model infers a single average elongation rate that effectively averages the behavior over the SunTag and the variable CDS regions. Modeling distinct rates for these regions would be a valuable extension but adds significant complexity. While our current "effective rate" approach might underestimate the magnitude of differences between reporters, it captures the global kinetic trend. We have added a brief discussion acknowledging this simplification (L408-412).

      (3.3) A similar point applies to the Gillespie simulations: modeling the SunTag region with a shared elongation rate would be more accurate.

      We agree. Simulating distinct rates for the SunTag and CDS would increase realism, though our current homogeneous simulations serve primarily to benchmark the inference framework itself. We have noted this as a potential future improvement (L413-414).
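
For readers unfamiliar with this kind of benchmark, a homogeneous Gillespie simulation of a TASEP with extended ribosomes can be sketched as follows. This is a generic textbook-style sketch, not the code in the authors' repository: the function name `simulate_tasep`, its parameters, and the numerical values in the usage line are assumptions for illustration. A region-specific variant, as the reviewer suggests, would replace the single rate `lam` with a position-dependent rate.

```python
import random


def simulate_tasep(alpha, lam, length, footprint, t_end, seed=0):
    """Gillespie simulation of a homogeneous TASEP with extended ribosomes.

    alpha: initiation rate; lam: per-codon elongation rate (per unit time).
    Returns (positions, n_terminated); positions are ordered from the
    3'-most ribosome to the 5'-most one, and neighbours are always at
    least `footprint` codons apart.
    """
    if alpha <= 0 or lam <= 0:
        raise ValueError("rates must be positive")
    rng = random.Random(seed)
    positions, terminated, t = [], 0, 0.0
    while True:
        # List every move allowed by steric exclusion, with its rate.
        moves = []
        if not positions or positions[-1] >= footprint:
            moves.append((alpha, None))           # initiation at codon 0
        for i, pos in enumerate(positions):
            if i == 0 or positions[i - 1] - pos > footprint:
                moves.append((lam, i))            # one-codon step
        total = sum(rate for rate, _ in moves)
        t += rng.expovariate(total)               # waiting time to next event
        if t > t_end:
            return positions, terminated
        # Pick one move with probability proportional to its rate.
        threshold = rng.random() * total
        chosen = moves[-1][1]
        cumulative = 0.0
        for rate, index in moves:
            cumulative += rate
            if threshold < cumulative:
                chosen = index
                break
        if chosen is None:
            positions.append(0)
        elif positions[chosen] == length - 1:
            positions.pop(chosen)                 # termination (front ribosome)
            terminated += 1
        else:
            positions[chosen] += 1


# Illustrative parameters only (rates per second, lengths in codons).
positions, done = simulate_tasep(alpha=0.02, lam=3.0, length=300,
                                 footprint=10, t_end=2000.0, seed=1)
```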

      (3.4) Equation (13) assumes that switching between bursting and non-bursting states is much slower than the elongation time. First, this should be made explicit. Second, this is not quite true (~5 min elongation time on Figure 3-s2A vs ~5-15min switching times on Figure 1). It would be useful to show the intensity distribution at t=0 and compare it to the expected mixture distribution (i.e., a Poisson distribution + some extra 'N=0' cells). 

      We thank the reviewer for this insightful comment. We have added a sentence to the text explicitly stating the assumption that switching dynamics are slower than the translation time. While the timescales are indeed closer than ideal (5 min vs. 5-15 min), this assumption allows for a tractable approximation of the initial conditions for the run-off inference. Comparing the intensity distribution at t=0 to a zero-inflated Poisson distribution is an excellent suggestion for validation, which we will consider for future iterations of the model.
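
The mixture distribution mentioned by the reviewer, a Poisson distribution with extra weight at zero for transcripts in the OFF state, can be written down directly. The sketch below only defines the probability mass function. The name `zip_pmf`, the parameter names, and the example values are assumptions, and fitting it to the measured intensities at t=0 would additionally require converting intensity to ribosome number.

```python
import math


def zip_pmf(n, rate, p_off):
    """Zero-inflated Poisson probability of observing n ribosomes.

    rate: mean ribosome number on actively translated transcripts.
    p_off: fraction of transcripts in the non-bursting (empty) state.
    """
    poisson = math.exp(-rate) * rate ** n / math.factorial(n)
    extra_zero = p_off if n == 0 else 0.0
    return extra_zero + (1.0 - p_off) * poisson


# Illustrative values: mean of 2 ribosomes when ON, 30% of transcripts OFF.
probabilities = [zip_pmf(n, rate=2.0, p_off=0.3) for n in range(50)]
```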

      (4) Microscopy Quantifications:

      (4.1) Figure 1-S2A shows variable scFv-GFP expression across cells. Were cells selected for uniform expression in the analysis? Or is the SunTag assumed saturated? which would then need to be demonstrated. 

      All cell lines used are monoclonal, and cells were selected via FACS for consistent average cytoplasmic GFP signal. We assume the SunTag is saturated based on the established characterization of the system by Tanenbaum et al. (2014), where the high affinity of the scFv-GFP ensures saturation at expression levels similar to ours.

      (4.2) As translation proceeds, free scFv-GFP may become limiting due to the accumulation of mature SunTag-containing proteins. This would be difficult to detect (since mature proteins stay in the cytoplasm) and could affect intensity measurements (newly synthesized SunTag proteins getting dimmer over time).

      This effect can occur with very long induction times. To mitigate this, we optimized the Doxycycline (Dox) incubation time for our harringtonine experiments to prevent excessive accumulation of mature protein. We also monitor the cytoplasmic background for granularity, which would indicate aggregation or accumulation.

      (4.3) The statements "for some traces, the mRNA signal was lost before the run-off completion" (line 195) and "we observed relatively consistent fractions of translated transcripts and trace duration distributions across reporters" (line 340) should be supported by a supplementary figure.

      The first statement is supported by Figure 2 - figure supplement 1, which shows representative run-off traces for all constructs, including incomplete ones.

      The second statement regarding consistency is supported by the quantitative data in Figure 1E and G, which summarize the fraction of translated transcripts and trace durations across conditions.

      (4.4) Measurements of single mature protein intensity $i_{MP}$:

      (4.4.1) Since puromycin is used to disassemble elongating ribosomes, calibration may be biased by incomplete translation products (likely a substantial fraction, since the Dox induction is only 20min and RNAs need several minutes to be transcribed, exported, and then fully translated).

      As mentioned in the “Live-cell imaging” paragraph, the imaging takes place 40 min after the end of Dox incubation. This provides ample time for mRNA export and full translation of the synthesized proteins. Consequently, the fraction of incomplete products generated by the final puromycin addition is negligible compared to the pool of fully synthesized mature proteins accumulated during the preceding hour.

      (4.4.2) Line 519: "The intensity of each spot is averaged over the 100 frames". Do I understand correctly that you are looking at immobile proteins? What immobilizes these proteins? Are these small aggregates? It would be surprising that these aggregates have really only 1, 2, or 3 proteins, as suggested by Figure 1-S2A.

      We are visualizing mature proteins that are specifically tethered to the actin cytoskeleton. This is achieved using a reporter where the RH1 domain is fused directly to the C-terminus of the Renilla protein (SunTag-Renilla-RH1). The RH1 domain recruits the endogenous Myosin Va motor, which anchors the protein to actin filaments, rendering it immobile. Since each Myosin Va motor interacts with one RH1 domain (and thus one mature protein), the resulting spots represent individual immobilized proteins rather than aggregates. We have now revised the text and Methods section to make this calibration strategy and the construct design clearer (L130-140).

      (4.4.3) Estimating the average intensity $i_{MP}$ of single proteins rests entirely on seeing discrete modes in the histogram of Figure 1-S2B, which is not very convincing. A complementary experiment, measuring *on the same microscope* the intensity of an object with a known number of GFP molecules (e.g., MS2-GFP labeled RNAs, or individual GEMs https://doi.org/10.1016/j.cell.2018.05.042 (only requiring a single transfection)) would be reassuring to convince the reader that we are not off by an order of magnitude.

      While a complementary calibration experiment would be valuable, we believe our current estimate is robust because it is independently validated by our model. When we inferred i<sub>MP</sub> as a free parameter in the HMM (Figure 5 - figure supplement 2B), the resulting value (10-15 a.u.) was remarkably consistent with our experimental calibration (14 ± 2 a.u.). We have clarified this independent validation in the text to strengthen the confidence in our quantification (L264-272).

      (4.4.4) Further on the histogram in Figure 1-S2B:

      - The gap between the first two modes is unexpectedly sharp. Can you double-check? It means that we have a completely empty bin between two of the most populated bins.

      We have double-checked the data; the plot is correct, though the sharp gap is likely due to the small sample size (n=29).

      - I am surprised not to see 3 modes or more, given that panel A shows three levels of intensity (the three colors of the arrows).

      As noted below, brighter foci exist but fall outside the displayed range of the histogram.

      - It is unclear what the statistical test is and what it is supposed to demonstrate.

      The Student's t-test compares the means of the two identified populations to confirm they are statistically distinct intensity groups.

      - I count n = 29, not 31. (The sample is small enough that the bars of the histogram show clear discrete heights, proportional to 1, 2, 3, 4, and 5 --adding up all the counts, I get 29). Is there a mistake somewhere? Or are some points falling outside of the displayed x-range?

      You are correct. Two brighter data points fell outside the displayed range. The total number of foci in the histogram is 29. We have corrected the figure caption and the text accordingly.

      (5) Miscellaneous Points: 

      (5.1) Panel B in Figure 2-s1 appears to be missing.

      The figure contains only one panel.

      (5.2) In Equation (7), $l$ is not defined (presumably ribosome footprint length?). Instead, $J$ is defined right after eq (7), as if it were used in this equation.

      Thank you for pointing this out, we have corrected it.

      (5.3) Line 703, did you mean to write something else than "Equation 26" (since equation 26 is defined after)?

      Yes, this was a typo. We have corrected the cross-reference.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Drosophila larval type II neuroblasts generate diverse types of neurons by sequentially expressing different temporal identity genes during development. Previous studies have shown that the transition from early temporal identity genes (such as Chinmo and Imp) to late temporal identity genes (such as Syp and Broad) depends on the activation of the expression of EcR by Seven-up (Svp) and progression through the G1/S transition of the cell cycle. In this study, Chaya and Syed examined whether the expression of Syp and EcR is regulated by cell cycle and cytokinesis by knocking down CDK1 or Pav, respectively, throughout development or at specific developmental stages. They find that knocking down CDK1 or Pav either in all type II neuroblasts throughout development or in single-type neuroblast clones after larval hatching consistently leads to failure to activate late temporal identity genes Syp and EcR. To determine whether the failure of the activation of Syp and EcR is due to impaired Svp expression, they also examined Svp expression using a Svp-lacZ reporter line. They find that Svp is expressed normally in CDK1 RNAi neuroblasts. Further, knocking down CDK1 or Pav after Svp activation still leads to loss of Syp and EcR expression. Finally, they also extended their analysis to type I neuroblasts. They find that knocking down CDK1 or Pav, either at 0 hours or at 42 hours after larval hatching, also results in loss of Syp and EcR expression in type I neuroblasts. Based on these findings, the authors conclude that the cell cycle and cytokinesis are required for the transition from early to late temporal identity genes in both types of neuroblasts. These findings add mechanistic details to our understanding of the temporal patterning of Drosophila larval neuroblasts.

      Strengths:

      The data presented in the paper are solid and largely support their conclusion. Images are of high quality. The manuscript is well-written and clear.

      We appreciate the reviewer’s detailed summary and recognition of the study’s strengths.

      Weaknesses:

      The quantifications of the expression of temporal identity genes and the interpretation of some of the data could be more rigorous.

      (1) Expression of temporal identity genes may not be just positive or negative. Therefore, it would be more rigorous to quantify the expression of Imp, Syp, and EcR based on the staining intensity rather than simply counting the number of neuroblasts that are positive for these genes, which can be very subjective. Or the authors should define clearly what qualifies as "positive" (e.g., a staining intensity at least 2x background).

      We thank the reviewer for this helpful suggestion. In the new version, we have now clarified how positive expression was defined and added more details of our quantification strategy to the Methods section (page 11, lines 380-388; lines 426-434 in tracked changes file). Fluorescence intensity for each neuroblast was normalized to the mean intensity of neighboring wild-type neuroblasts imaged in the same field. A neuroblast was considered positive for a given factor when its normalized nuclear intensity was at least 2× the local background. This scoring criterion was applied uniformly across all genotypes and time points. All quantifications were performed on the raw LSM files in Fiji prior to assembling the figure panels.

      (2) The finding that inhibiting cytokinesis without affecting nuclear divisions by knocking down Pav leads to the loss of expression of Syp and EcR does not support their conclusion that nuclear division is also essential for the early-late gene expression switch in type II NSCs (at the bottom of the left column on page 5). No experiments were done in this study to specifically block nuclear division. This conclusion should be revised.

      We blocked both cell cycle progression and cytokinesis, and both these manipulations affected temporal gene transitions, suggesting that both cell cycle and cytokinesis are essential. To our knowledge, no mechanism/tool exists that selectively blocks nuclear division while leaving cell cycle progression intact. We have added more clarification on page 4, line 123 onwards (lines 126 onwards in tracked changes file).

      (3) Knocking down CDK1 in single random neuroblast clones does not make the CDK1 knockdown neuroblast develop in the same environment (except still in the same brain) as wild-type neuroblast lineages. It does not help address the concern whether "type 2 NSCS with cell cycle arrest failed to undergo normal temporal progression is indirectly due to a lack of feedback signaling from their progeny", as discussed (from the bottom of the right column on page 9 to the top of the left column on page 10). The CDK1 knockdown neuroblasts do not divide to produce progeny and thus do not receive a feedback signal from their progeny as wild-type neuroblasts do. Therefore, it cannot be ruled out that the loss of Syp and EcR expression in CDK1 knockdown neuroblasts is due to the lack of the feedback signal from their progeny. This part of the discussion needs to be clarified.

      Thanks to the reviewer for raising this critical point. We agree and have added more clarification of our interpretations and limitations to our studies in the revised text on page 8, line 278-282 (lines 296-300 in tracked changes file)

      (4) In Figure 2I, there is a clear EcR staining signal in the clone, which contradicts the quantification data in Figure 2J that EcR is absent in Pav RNAi neuroblasts. The authors should verify that the image and quantification data are consistent and correct.

      When cytokinesis is blocked using pav-RNAi, the neuroblasts become extremely large and multinucleated. In some large pav RNAi clones, we observed a weak EcR signal near the cell membrane. However, more importantly, none of the nuclear compartments showed detectable EcR staining, where EcR is typically localized. We selected a representative nuclear image for the figure panel. To clarify this observation, we have now added an explanatory note to the discussion section on page 8, lines 283-291 (lines 301-309 in tracked changes file).

      Reviewer #2 (Public review):

      Summary:

      Neural stem cells produce a wide variety of neurons during development. The regulatory mechanisms of neural diversity are based on the spatial and temporal patterning of neural stem cells. Although the molecular basis of spatial patterning is well-understood, the temporal patterning mechanism remains unclear. In this manuscript, the authors focused on the roles of cell cycle progression and cytokinesis in temporal patterning and found that both are involved in this process.

      Strengths:

      They conducted RNAi-mediated disruption on cell cycle progression and cytokinesis. As they expected, both disruptions affected temporal patterning in NSCs.

      We appreciate the reviewer’s positive assessment of our experimental results.

      Weaknesses:

      Although the authors showed clear results, they needed to provide additional data to support their conclusion sufficiently.

      For example, they need to identify type II NSCs using molecular markers (Ase/Dpn). The authors are encouraged to provide a more detailed explanation of each experiment. The current version of the manuscript is difficult for non-expert readers to understand.

      Thanks for your feedback. We have now included a detailed description of how we identify type II NSCs in both wild-type and mutant clones. We have also added a representative Asense staining to clearly distinguish type 1 (Ase<sup>+</sup>) from type 2 (Ase<sup>-</sup>) NSCs (see Figure S1). We have also added a resources table explaining the genotypes associated with each figure, which was omitted due to an error in the previous version of the manuscript.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Chaya and Syed focuses on understanding the link between cell cycle and temporal patterning in central brain type II neural stem cells (NSCs). To investigate this, the authors perturb the progression of the cell cycle by delaying the entry into M phase and preventing cytokinesis. Their results convincingly show that temporal factor expression requires progression of the cell cycle in both Type 1 and Type 2 NSCs in the Drosophila central brain. Overall, this study establishes an important link between the two timing mechanisms of neurogenesis.

      Strengths:

      The authors provide solid experimental evidence for the coupling of cell cycle and temporal factor progression in Type 2 NSCs. The quantified phenotype shows an all-or-none effect of cell cycle block on the emergence of subsequent temporal factors in the NSCs, strongly suggesting that both nuclear division and cytokinesis are required for temporal progression. The authors also extend this phenotype to Type 1 NSCs in the central brain, providing a generalizable characterization of the relationship between cell cycle and temporal patterning.

      We thank the reviewer for recognizing the robustness of our data linking the cell cycle to temporal progression.

      Weaknesses:

      One major weakness of the study is that the authors do not explore the mechanistic relationship between the cell cycle and temporal factor expression. Although their results are quite convincing, they do not provide an explanation as to why Cdk1 depletion affects Syp and EcR expression but not the onset of svp. This result suggests that at least a part of the temporal cascade in NSCs is cell-cycle independent, which isn't addressed or sufficiently discussed.

      Thank you for bringing up this important point. We are equally interested in uncovering the mechanism by which the cell cycle regulates temporal gene transitions; however, such mechanistic exploration is beyond the scope of the present study. Interestingly, while the temporal switching factor Svp is expressed independently of the cell cycle, the subsequent temporal transitions are not. We have expanded our discussion on this intriguing finding (page 9, lines 307-315; lines 345-355 in tracked changes file). Specifically, we propose that svp activation marks a cell-cycle–independent phase, whereas EcR/Syp induction likely depends on cell-cycle–coupled mechanisms, such as mitosis-dependent chromatin remodeling or daughter-cell feedback. Our findings establish a foundation for future work aimed at identifying how developmental timekeeping is molecularly coupled to cell-cycle progression.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) Figure 1 C and D, it would be better to put a question mark to indicate that these are hypotheses to be tested. 

      We appreciate this suggestion and have added question marks in Figure 1C and 1D to clearly indicate that these panels represent hypotheses under investigation.

      (2) Figure 2A-I, Figure 4A-I, Figure 5A-I and K-S, in addition to enlarged views of single type II neuroblasts, it would be more convincing to include zoomed-out images of the entire larval brain or at least a portion of the brain to include neighboring wild-type type II neuroblasts as internal controls. Also, it would be ideal to show EcR staining from the same neuroblasts as IMP and Syp staining. 

      We thank the reviewer for this valuable input. In our imaging setup, the number of available antibody channels was limited to four (anti-Ase, anti-GFP, anti-Syp, and anti-Imp). Adding EcR in the same sample was therefore not technically possible, so we performed EcR staining separately.

      (3) The authors cited "Syed et al., 2024" (in the middle of the right column on page 5), but this reference is missing in the "References" section and should be added. 

      The missing citation has been added to the reference section.  

      (4) It would be better to include Ase staining in the relevant figure to indicate neuroblast identity as type I or type II. 

      We agree and now include representative Ase staining for both type 1 and type 2 NSC clones in Figure S1, along with corresponding text updates that describe these markers.

      Reviewer #2 (Recommendations for the authors): 

      Major comments 

      (1) The present conclusion relies on the results using Cdk1 RNAi and pav RNAi. It is still possible that Cdk1 and Pav are involved in the regulation of temporal patterning independent of the regulation of cell cycle or cytokinesis, respectively. To avoid this possibility, the authors need to inhibit cell cycle progression or cytokinesis in another alternative manner. 

      We thank the reviewer for raising this important point. We observe consistent phenotypes across several independent manipulations that slow or block the cell cycle. In addition, earlier studies using orthogonal approaches that delay G1/S (Dacapo/Rbf) or impair mitochondrial OxPhos (which lengthens G1/S; van den Ameele & Brand, 2019) produced similar temporal delays. Although we cannot completely exclude additional gene-specific, cell-cycle-independent roles for Cdk1 or Pav, these concordant phenotypes strongly support the interpretation that altered cell-cycle progression, rather than a specific role of a single gene, is the primary cause of the defect, and they make the cell-cycle disruption model the most parsimonious interpretation. We have clarified this reasoning in the discussion section on pages 8-9, lines 293-305 (lines 311-343 in tracked changes file).

      (2) To reach the present conclusion, the authors need to address the effects of acceleration of cell cycle progression or cytokinesis on temporal patterning. 

      We thank the reviewer for this insightful suggestion. To our knowledge, there are currently no established genetic tools that can specifically accelerate cell-cycle progression in Drosophila neuroblasts. However, our results demonstrate that blocking the cell cycle impairs the transition from early to late temporal gene expression. These findings suggest that proper cell-cycle progression is essential for the transition from early to late temporal identity in neuroblasts.

      Minor comments 

      (3) P3L2 (right), ... we blocked the NSC cell cycle...

      How did they do it? 

      Which fly lines were used?

      Why did they use the line? 

      These details are now included in the Materials and Methods and the Resource Table (pages 11-13). We used Wor-Gal4, Ase-Gal80 to drive UAS-Cdk1RNAi and UAS-pavRNAi in type 2 NSCs.

      (4) P5L1(left), ... we used the flip-out approach...

      Why did they conduct it? 

      Probably, the authors have reasons other than "to further ensure." 

      We have clarified in the text on page 4, lines 137-139, that the flip-out approach was used to generate random single-cell clones, enabling quantitative analysis of type 2 NSCs within an otherwise wild-type brain. 

      (5) P5L8(left), ... type 2 hits were confirmed by lack of the type 1 Asense...  The authors must examine Deadpan (Dpn) expression as well. Because there are a lot of Asense (Ase) negative cells in the brain (neurons, glial cell, and neuroepithelial cells). 

      Type II NSCs can be identified as Dpn+/Ase- cells.

      We agree that Dpn is a helpful marker. However, we reliably distinguished type II NSCs by their lack of Ase and larger cell size relative to surrounding neurons and glia, which are smaller in size and located deeper within the clone. These differences, together with established lineage patterns, allow unambiguous identification of type 2 NSCs across all genotypes. We have now added representative type I and type 2 NSC clones to the supplemental figure S1 (E-G’) with Asense stains to demonstrate how we differentiate type I from type II NSCs. 

      (6) P5L32(left), To do this, we induced... 

      This sentence should be made more concise.

      Please rephrase it. 

      The sentence has been rewritten for clarity and concision.

      (7)  P5L42(left), ...lack of EcR/Syp expression (Figure 2).  However, EcR expression is still present (Figure 2I). 

      In some large pavRNAi clones, a weak EcR signal can be observed near the cell membrane; however, none of the nuclear compartments—where EcR is typically localized—show detectable staining. We selected a representative nuclear image for the figure and addressed this observation on page 8, lines 283-291 (lines 301-309 in tracked changes file).

      (8) P7L29(left), ......had persistent Imp expression...

      Imp expression is faint compared to that in Figure 2G.

      The differences between Figures 2G and 3G should be discussed. 

      We thank the reviewer for this comment. We have added a note in the Methods section clarifying that brightness and contrast were adjusted per panel for optimal visualization; thus, apparent differences in signal intensity do not reflect biological variation. Fluorescence intensity for each neuroblast was normalized to the mean intensity of neighboring wild-type neuroblasts imaged in the same field. A neuroblast was considered Imp-positive when its normalized nuclear intensity was at least 2× the local background. This scoring criterion was applied uniformly across all genotypes and time points. All quantifications were performed on the raw LSM files in Fiji prior to assembling the figure panels.

      (9) P8 (Figure 5)

      The Imp expression is faint compared to that in Figure 5Q.

      The difference between Figure 5G and 5Q should be discussed further. 

      As mentioned above, we have clarified our image processing approach in the Methods section to explain any differences in signal appearance between these figures.

      (10) P10 Materials and Methods

      The authors did not mention the fly lines used. This is very important for the readers. 

      We thank the reviewer for bringing this oversight to our attention. The Resource Table was inadvertently omitted from the initial submission. The complete list of fly lines and reagents used in this study is now provided in the updated Resource Table.

      Reviewer #3 (Recommendations for the authors): 

      Major points 

      (1) The authors mention that the heat-shock induction at 42ALH is well after the svp temporal window and therefore the cell cycle block independently affects Syp and EcR expression. However, Figure 3 shows svp-LacZ expression at 48ALH. If svp expression is indeed transient in Type 2 NSCs, then this must be validated using an immunostaining of the svp-LacZ line with svp antibody. This is crucial as the authors claim that cell cycle block doesn't affect svp expression and is required independently.

      We thank the reviewer for bringing this important issue to our attention. As noted, Svp protein is expressed transiently and stochastically in type 2 NSCs (Syed et al., 2017), making direct antibody quantification challenging upon cell cycle block. Consistent with previous work (Syed et al., 2017), we used the svp-LacZ reporter line to visualize stabilized Svp expression, which reliably captures Svp expression in type 2 NSCs (Syed et al., 2017 https://doi.org/10.7554/eLife.26287, and Dhilon et al., 2024 https://doi.org/10.1242/dev.202504).

      (2) The authors have successfully slowed down the cell cycle and showed that it affects temporal progression. However, a converse experiment where the cell cycle is sped up in NSCs would be an important test for the direct coupling of temporal factor expression and cell cycle, wherein the expectation would be the precocious expression of late temporal factors in faster cycle NSCs. 

      We agree that such an experiment would be ideal. However, as noted above (Reviewer #2 comment 2), to our knowledge, no suitable tools currently exist to accelerate neuroblast cell-cycle progression without pleiotropic effects.

      Minor point 

      The authors must include Ray and Li (https://doi.org/10.7554/eLife.75879) in the references when describing that "...cell cycle has been shown to influence temporal patterning in some systems,...".  

      We thank the reviewer for this helpful suggestion. The cited reference (Ray and Li, eLife, 2022) has now been included and appropriately referenced in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors aim to investigate the potential improvements of ANNs when used to explain brain data using top-down feedback connections found in the neocortex. To do so, they use a retinotopic and tonotopic organization to model each subregion of the ventral visual (V1, V2, V4, and IT) and ventral auditory (A1, Belt, A4) regions using Convolutional Gated Recurrent Units. The top-down feedback connections are inspired by the apical tree of pyramidal neurons, modeled either with a multiplicative effect (change of gain of the activation function) or a composite effect (change of gain and threshold of the activation function).
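      To make the distinction between the two feedback types concrete, the following is a minimal single-unit sketch. It is not the authors' implementation (their models use Convolutional Gated Recurrent Units); the rectified activation, the sigmoid-based gain, and the `threshold_weight` parameter are all assumptions chosen only to illustrate "gain only" versus "gain and threshold" modulation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def multiplicative_unit(drive, feedback):
    """Feedback only rescales the response to feedforward drive.

    The gain lies in (0, 2) and equals 1 when feedback is zero, so
    feedback cannot produce activity without feedforward drive.
    """
    gain = 2.0 * sigmoid(feedback)
    return relu(gain * drive)


def composite_unit(drive, feedback, threshold_weight=0.5):
    """Feedback changes both the gain and the activation threshold.

    Positive feedback lowers the threshold (hypothetical choice), so it
    can push the unit above zero even with no feedforward drive.
    """
    gain = 2.0 * sigmoid(feedback)
    threshold = -threshold_weight * feedback
    return relu(gain * drive - threshold)


# With zero feedback, both variants reduce to a plain rectified unit.
baseline_mult = multiplicative_unit(1.0, 0.0)
baseline_comp = composite_unit(1.0, 0.0)

# With zero drive, only the composite unit responds to feedback.
silent_mult = multiplicative_unit(0.0, 3.0)
driven_comp = composite_unit(0.0, 3.0)
```

      In this toy setting, multiplicative feedback is purely modulatory, whereas composite feedback also carries a weak driving component through the threshold shift.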

      To assess the functional impact of the top-down connections, the authors compare three architectures: a brain-like architecture derived directly from brain data analysis, a reversed architecture where all feedforward connections become feedback connections and vice versa, and a random connectivity architecture. More specifically, in the brain-like model the visual regions provide feedforward input to all auditory areas, whereas auditory areas provide feedback to visual regions.

      First, the authors found that top-down feedback influences audiovisual processing and that the brain-like model exhibits a visual bias in multimodal visual and auditory tasks. Second, they discovered that in the brain-like model, the composite integration of top-down feedback, similar to that found in the neocortex, leads to an inductive bias toward visual stimuli, which is not observed in the feedforward-only model. Furthermore, the authors found that the brain-like model learns to utilize relevant stimuli more quickly while ignoring distractors. Finally, by analyzing the activations of all hidden layers (brain regions), they found that the feedforward and feedback connectivity of a region could determine its functional specializations during the given tasks.

      Strengths:

      The study introduces a novel methodology for designing connectivity between regions in deep learning models. The authors also employ several tasks based on audiovisual stimuli to support their conclusions. Additionally, the model utilizes backpropagation of error as a learning algorithm, making it applicable across a range of tasks, from various supervised learning scenarios to reinforcement learning agents. Conversely, the presented framework offers a valuable tool for studying top-down feedback connections in cortical models. Thus, it is a very nice study that also can give inspiration to other fields (machine learning) to start exploring new architectures.

      We thank the reviewer for their accurate summary of our work and their kind assessment of its strengths.

      Weaknesses:

      Although the study explores some novel ideas on how to study the feedback connections of the neocortex, the data presented here are not complete in order to propose a concrete theory of the role of top-down feedback inputs in such models of the brain.

      (1) The gap in the literature that the paper tries to fill is the ability of DL algorithms to predict behavior: "However, there are still significant gaps in most deep neural networks' ability to predict behavior, particularly when presented with ambiguous, challenging stimuli." and "[...] to accurately model the brain."

      It is unclear to me how the presented work addresses this gap, as the only facts provided are derived from a simple categorization task that could also be solved by the feedforward-only model (see Figures 4 and 5). In my opinion, this statement is somewhat far-fetched, and there is insufficient data throughout the manuscript to support this claim.

      We can see now that the way the introduction was initially written led to some confusion about our goal in this study. Our goal here was not to demonstrate that top-down feedback can enable superior matches to human behaviour. Rather, our goal was to determine if top-down feedback had any real implications for processing ambiguous stimuli. The sentence that the reviewer has highlighted was intended as an explanation for why top-down feedback, and its impact on ambiguous stimuli, might be something one would want to examine for deep neural networks. But, here, we simply wanted to (1) provide an overview of the code base we have created, and (2) demonstrate that top-down feedback does impact the processing of ambiguous stimuli.

      We agree with the reviewer that if our goal was to improve our ability to predict behaviour, then there was a big gap in the evidence we provided here. But, this was not our goal, and we believe that the data we provide here does convincingly show that top-down feedback has an impact on processing of ambiguous stimuli. We have updated the text in the introduction to make our goals more clear for the reader and avoid this misunderstanding of what we were trying to accomplish here. Specifically, the end of the introduction is changed to:

      “To study the effect of top-down feedback on such tasks, we built a freely available code base for creating deep neural networks with an algorithmic approximation of top-down feedback. Specifically, top-down feedback was designed to modulate ongoing activity in recurrent, convolutional neural networks. We explored different architectural configurations of connectivity, including a configuration based on the human brain, where all visual areas send feedforward inputs to, and receive top-down feedback from, the auditory areas. The human brain-based model performed well on all audiovisual tasks, but displayed a unique and persistent visual bias compared to models with only driving connectivity and models with different hierarchies. This qualitatively matches the reported visual bias of humans engaged in audio-visual tasks. Our results confirm that distinct configurations of feedforward/feedback connectivity have an important functional impact on a model's behavior. Therefore, top-down feedback captures behaviors and perceptual preferences that do not manifest reliably in feedforward-only networks. Further experiments are needed to clarify whether top-down feedback helps an ANN fit better to neural data, but the results show that top-down feedback affects the processing of stimuli and is thus a relevant feature that should be considered for deep ANN models in computational neuroscience more broadly.”

      (2) It is not clear what the advantages are between the brain-like model and a feedforward-only model in terms of performance in solving the task. Given Figures 4 and 5, it is evident that the feedforward-only model reaches almost the same performance as the brain-like model (when the latter uses the modulatory feedback with the composite function) on almost all tasks tested. The speed of learning is nearly the same: for some tested tasks the brain-like model learns faster, while for others it learns slower. Thus, it is hard to attribute a functional implication to the feedback connections given the presented figures and therefore the strong claims in the Discussion should be rephrased or toned down.

      Again, we believe that there has been a misunderstanding regarding the goals of this study, as we are not trying to claim here that there are performance advantages conferred by top-down feedback in this case. Indeed, we share the reviewer’s assessment that the feedforward only model seems to be capable of solving this task well. To reiterate: our goal here was to demonstrate that top-down feedback alters the computations in the network and, thus, has distinct effects on behaviour that need to be considered by researchers who use deep networks to model the brain. But we make no claims of “superiority” of the brain-like model.

      In-line with this, we’re not completely sure which claims in the discussion the reviewer is referring to. We note that we were quite careful in our claims. For example, in the first section of the discussion we say:

      “Altogether, our results demonstrate that the distinction between feedforward and feedback inputs has clear computational implications, and that ANN models of the brain should therefore consider top-down feedback as an important biological feature.”

      And later on:

      “In summary, our study shows that modulatory top-down feedback and the architectural diversity enabled by it can have important functional implications for computational models of the brain. We believe that future work examining brain function with deep neural networks should therefore consider incorporating top-down modulatory feedback into model architectures when appropriate.”

      If we have missed a claim in the discussion that implies superiority of the brain-like model in terms of task performance we would be happy to change it.

      (3) The Methods section lacks sufficient detail. There is no explanation provided for the choice of hyperparameters nor for the structure of the networks (number of trainable parameters, number of nodes per layer, etc). Clarifying the rationale behind these decisions would enhance understanding. Moreover, since the authors draw conclusions based on the performance of the networks on specific tasks, it is unclear whether the comparisons are fair, particularly concerning the number of trainable parameters. Furthermore, it is not clear if the visual bias observed in the brain-like model is an emerging property of the network or has been created because of the asymmetries in the visual vs. auditory pathway (size of the layer, number of layers, etc).

      We thank the reviewer for raising this issue, and want to provide some clarifications: First, the number of trainable parameters is roughly equal across models, since we were only switching the direction of connectivity (top-down versus bottom-up), not the number of connections. We confirmed that the biggest difference in size is between models with composite and multiplicative feedback; models with composite feedback have roughly 1K more parameters, and all models are within the 280K parameter range. We now state this in the methods.

      Second, because superior performance was not the goal of this study, as stated above, we conducted limited hyperparameter tuning. Given the reviewer’s comment, we wondered whether this may have impacted our results. Therefore, we explored different hyperparameters for the model during the multimodal auditory tasks, which show the clearest example of the visual dominance in the brain-like model (Figure 3).

      We explored different hidden state sizes, learning rates and processing times, and examined whether the core results were different. We found that extremely high learning rates (0.1) destabilize all models and that some models perform poorly under different processing times. But overall, the core results, i.e., the different behaviors of models with different connectivities and the visual dominance observed in the brain-like model, are evident across all hyperparameters where the models learn. We now provide these results in supplementary figures (Fig. S2, showing larger models trained with different learning rates, and Fig. S3, which shows the effect of processing time on AS task performance).

      Reviewer #2 (Public review):

      Summary:

      This work addresses the question of whether artificial deep neural network models of the brain could be improved by incorporating top-down feedback, inspired by the architecture of the neocortex.

      In line with known biological features of cortical top-down feedback, the authors model such feedback connections with both, a typical driving effect and a purely modulatory effect on the activation of units in the network.

      To assess the functional impact of these top-down connections, they compare different architectures of feedforward and feedback connections in a model that mimics the ventral visual and auditory pathways in the cortex on an audiovisual integration task.

      Notably, one architecture is inspired by human anatomical data, where higher visual and auditory layers possess modulatory top-down connections to all lower-level layers of the same modality, and visual areas provide feedforward input to auditory layers, whereas auditory areas provide modulatory feedback to visual areas.

      First, the authors find that this brain-like architecture imparts the models with a light visual bias similar to what is seen in human data, which is the opposite in a reversed architecture, where auditory areas provide a feedforward drive to the visual areas.

      Second, they find that, in their model, modulatory feedback should be complemented by a driving component to enable effective audiovisual integration, similar to what is observed in neural data.

      Last, they find that the brain-like architecture with modulatory feedback learns a bit faster in some audiovisual switching tasks compared to a feedforward-only model.

      Overall, the study shows some possible functional implications when adding feedback connections in a deep artificial neural network that mimics some functional aspects of visual perception in humans.

      Strengths:

      The study contains innovative ideas, such as incorporating an anatomically inspired architecture into a deep ANN, and comparing its impact on a relevant task to alternative architectures.

      Moreover, the simplicity of the model allows it to draw conclusions on how features of the architecture and functional aspects of the top-down feedback affect the performance of the network.

      This could be a helpful resource for future studies of the impact of top-down connections in deep artificial neural network models of the neocortex.

      We thank the reviewer for their summary and their recognition of the innovative components and helpful resources therein.

      Weaknesses:

      Overall, the study appears to be a bit premature, as several parts need to be worked out more to support the claims of the paper and to increase its impact.

      First, the functional implication of modulatory feedback is not really clear. The "only feedforward" model (is a drive-only model meant?) attains the same performance as the composite model (with modulatory feedback) on virtually all tasks tested, it just takes a bit longer to learn for some tasks, but then is also faster at others. It even reproduces the visual bias on the audiovisual switching task. Therefore, the claims "Altogether, our results demonstrate that the distinction between feedforward and feedback inputs has clear computational implications, and that ANN models of the brain should therefore consider top-down feedback as an important biological feature." and "More broadly, our work supports the conclusion that both the cellular neurophysiology and structure of feedback inputs have critical functional implications that need to be considered by computational models of brain function" are not sufficiently supported by the results of the study. Moreover, the latter points would require showing that this model describes neural data better, e.g., by comparing representations in the model with and without top-down feedback to recorded neural activity.

      To emphasize again our specific claims, we believe that our data shows that top-down feedback has functional implications for deep neural network behaviour, not increased performance or neural alignment. Indeed, our results demonstrate that top-down feedback alters the behaviour of the networks, as shown by the differences in responses to various combinations of ambiguous stimuli. We agree with the reviewer that if our goal was to claim either superior performance on these tasks, or better fit to neural data, we would need to actually provide data supporting that claim.

      Given the comments from the reviewer, we have tried to provide more clarity in the introduction and discussion regarding our claims. In particular, we now highlight that we are not trying to demonstrate that the models with top-down feedback exhibit superior performance or better fit to neural data.

      As one final note, yes, the reviewer understood correctly that the “only feedforward” model is a model with only driving inputs. We have renamed the feedforward-only models to drive only models and added additional emphasis in the text to ensure that the distinction is clear for all readers.

      Second, the analyses are not supported by supplementary material, hence it is difficult to evaluate parts of the claims. For example, it would be helpful to investigate the impact of the process time after which the output is taken for evaluation of the model. This is especially important because in recurrent and feedback models the convergence should be checked, and if the network does not converge, then it should be discussed why at which point in time the network is evaluated.

      This is an excellent point, and we thank the reviewer for raising it. We allowed the network to process the stimuli for seven time-steps, which was enough for information from any one region to be transmitted to any other. We found in some initial investigations that if we shortened the processing time some seeds would fail to solve the task. But, based on the reviewer’s comment, we have now also run additional tests with longer processing times for the auditory tasks where we see the clearest visual bias (Figure 3). We find that different process times do not change the behavioral biases observed in our models, but may introduce difficulties ignoring visual stimuli for some models. Thus, while process time is an important hyperparameter for optimal performance of the model, the central claim of the paper remains. We include this new data in a supplementary figure S3.
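
      The reasoning behind the minimum processing time can be sketched as a graph calculation: the number of time-steps needed for information from any one region to reach any other is the longest shortest path in the connectivity graph. The sketch below is illustrative only. The region names and connections in `toy_graph` are hypothetical and are not the connectivity used in the study, and `steps_to_reach_all` is a helper name introduced here.

```python
from collections import deque


def steps_to_reach_all(adjacency):
    """Return the longest shortest-path length between any ordered pair of regions.

    `adjacency` maps each region to the list of regions it projects to.
    Returns None if at least one region cannot reach every other region.
    """
    longest = 0
    for source in adjacency:
        # Breadth-first search gives the fewest steps from `source` to each region.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for target in adjacency[node]:
                if target not in dist:
                    dist[target] = dist[node] + 1
                    queue.append(target)
        if len(dist) < len(adjacency):
            return None  # some region is unreachable from `source`
        longest = max(longest, max(dist.values()))
    return longest


# Hypothetical five-region chain with reciprocal connections (assumption).
toy_graph = {
    "V1": ["V2"],
    "V2": ["V1", "IT"],
    "IT": ["V2", "A2"],
    "A2": ["IT", "A1"],
    "A1": ["A2"],
}
min_steps = steps_to_reach_all(toy_graph)
```

      Any processing time at or above the value returned for a given graph lets every region influence every other at least once; shorter processing times leave some pathways unused.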

      Third, the descriptions of the models in the methods are hard to understand, i.e., parameters are not described and equations are explained by referring to multiple other studies. Since the implications of the results heavily rely on the model, a more detailed description of the model seems necessary.

      We agree with the reviewer that the methods could have been more thorough. Therefore, we have greatly expanded the methods section. We hope the model details are now more clear.

      Lastly, the discussion and testable predictions are not very well worked out and need more details. For example, the point "This represents another testable prediction flowing from our study, which could be studied in humans by examining the optical flow (Pines et al., 2023) between auditory and visual regions during an audiovisual task" needs to be made more precise to be useful as a prediction. What did the model predict in terms of "optic flow", how can modulatory from simple driving effect be distinguished, etc.

      We see that the original wording of this prediction was ambiguous, thank you for pointing this out. In the study highlighted (Pines et al., 2023) the authors use an analysis technique for measuring information flow between brain regions, which is related to analysis of optical flow in images, but applied to fMRI scans. This is confusing given the current study, though. Therefore, we have changed this sentence to make clear that we are speaking of information flow here. 

      Reviewer #3 (Public review):

      Summary:

      This study investigates the computational role of top-down feedback in artificial neural networks (ANNs), a feature that is prevalent in the brain but largely absent in standard ANN architectures. The authors construct hierarchical recurrent ANN models that incorporate key properties of top-down feedback in the neocortex. Using these models in an audiovisual integration task, they find that hierarchical structures introduce a mild visual bias, akin to that observed in human perception, not always compromising task performance.

      Strengths:

      The study investigates a relevant and current topic of considering top-down feedback in deep neural networks. In designing their brain-like model, they use neurophysiological data, such as externopyramidisation and hierarchical connectivity. Their brain-like model exhibits a visual bias that qualitatively matches human perception.

      We thank the reviewer for their summary and evaluation of our paper’s strengths.

      Weaknesses:

      While the model is brain-inspired, it has limited bioplausibility. The model assumes a simplified and fixed hierarchy. In the brain with additional neuromodulation, the hierarchy could be more flexible and more task-dependent.

      We agree, there are still many facets of top-down feedback that we have not captured here, and the modulation of hierarchy is an interesting example. We have added some consideration of this point to the limitations section of the discussion.

      While the brain-like model showed an advantage in ignoring distracting auditory inputs, it struggled when visual information had to be ignored. This suggests that its rigid bias toward visual processing could make it less adaptive in tasks requiring flexible multimodal integration. It hence does not necessarily constitute an improvement over existing ANNs. It is unclear, whether this aspect of the model also matches human data. In general, there is no direct comparison to human data. The study does not evaluate whether the top-down feedback architecture scales well to more complex problems or larger datasets. The model is not well enough specified in the methods and some definitions are missing.

      We agree with the reviewer that we have not demonstrated anything like superior performance (since the brain-like network is quite rigid, as noted) nor have we shown better match to human data with the brain-like network. This was not our intended claim. Rather, we demonstrated here simply that top-down feedback impacts behavior of the networks in response to ambiguous stimuli. We have now added statements to the introduction and discussion to make our specific claims (which are supported by our data, we believe) clear.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I believe that the work is very nice but not so mature at this stage. Below, you can find some comments that eventually could improve your manuscript.

      (1) Intro, last sentence: "Therefore, top-down feedback is a relevant feature that should be considered for deep ANN models in computational neuroscience more broadly." I don't understand what the authors refer to with this sentence. There are numerous models (deep ANNs) that have been used to model the neural activity and are much simpler than the one proposed here which contains very complex models and connectivity. Although I do agree that the top-down connections are very important there is no data to support their importance for modeling the brain.

      Respectfully, we disagree with the reviewer that we don’t provide data to demonstrate the importance of top-down feedback for modelling. Indeed, we provided a great deal of data to show that top-down feedback in the networks has real functional implications for behaviour, e.g., it can induce a human-like visual bias. Thus, top-down feedback is a factor that one should care about when modelling the brain. But, we agree with the reviewer that more demonstration of the utility of using top-down feedback for achieving better fits to neural data would be an important next step. 

      (2) I suggest adding some extra supplementary simulations where, for example, the number of data for visual and auditory pathways is equal in size (i.e., the same number of examples), the number of layers is identical (3 per pathway), and also the number of parameters. Doing this would help strengthen the claims presented in the paper.

      In fact, all of the hyperparameters the reviewer mentions here were identical for the different networks, so the experiments the reviewer is requesting here were already part of the paper. We now clarify this in the text.

      (3) Results: I suggest adding Tables with quantifications of the presented results. For example, best performance, epochs to converge, etc. As it is now, it is very hard to follow the evidence shown in Figures.

      This is a good suggestion, we have now added this table to the start of the supplemental figures.

      (4) Figure 2e, 3e: Although VS3, and AS3 have been used only for testing, the plot shows alignments with respect to training epochs. The authors should clarify in the Methods if they tested the network with all intermediate weights during VS1/VS2 or AS1/AS2 training.

      Testing scenarios in this context meant that the model was never shown the scenario/task during training, but the models were indeed evaluated on the VS3 and AS3 after each training epoch. We have added clarifications to the figure legends.

      (5) Methods: It would be beneficial to discuss how specific hyperparameters were selected based on prior research, empirical testing, or theoretical considerations. Also, it is not clear how the alignment (visual or audio) is calculated. Do the authors use the examples that have been classified correctly for both stimuli or do they exclude those from the analysis (maybe I have missed it).

      As noted above, because superior performance was not the goal of this study, we conducted limited hyperparameter tuning. But we have extended the results with additional hyperparameter tuning in a supplementary figure, and describe the hyperparameter choices more thoroughly in the methods. As well, all data includes all model responses, regardless of whether they were correct or not. We now clarify this in the methods.
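
      To illustrate what including all model responses means for an alignment score, here is a minimal sketch. It assumes that alignment is the fraction of responses matching the label carried by one modality; the actual definition is the one given in the revised methods and is not reproduced here. The function name and the example labels are hypothetical.

```python
def alignment(predictions, modality_labels):
    """Fraction of all predictions that match the labels of one modality.

    Every response is counted, whether or not it was the correct answer
    for the task (assumed definition, for illustration only).
    """
    if len(predictions) != len(modality_labels):
        raise ValueError("predictions and labels must have the same length")
    if not predictions:
        raise ValueError("at least one response is required")
    matches = sum(p == m for p, m in zip(predictions, modality_labels))
    return matches / len(predictions)


# Hypothetical responses to four audiovisual stimuli (assumption).
predictions = [3, 3, 7, 1]
visual_labels = [3, 3, 7, 7]
auditory_labels = [5, 3, 2, 1]

visual_alignment = alignment(predictions, visual_labels)
auditory_alignment = alignment(predictions, auditory_labels)
```

      Because a response can match both modalities when their labels agree, the two scores need not sum to one.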

      (6) Code: The code repository lacks straightforward examples demonstrating how to utilize the modeling approach. Given that it is referred to as a "framework", one would expect it to facilitate easy integration into various models and tasks. Including detailed instructions or clear examples would significantly improve usability and help users effectively apply the proposed methodology.

      We agree with the reviewer, this would be beneficial. We have revised the README of the codebase to explain the model and its usage more clearly and included an interactive Jupyter notebook with example training on MNIST.

      Some minor comments are given below. Generally speaking, the Figures need to be more carefully checked for consistent labels, colors, etc.

      (1) Page 4, 1st paragraph - grammar correction: "a larger infragranular layer" or "larger infragranular layers"

      Thank you for catching this, we have fixed the text.

      (2) Page 4, 2nd para - rephrase: "In three additional control ANNs" → "In the third additional control ANN"

      In fact, we did mean three additional control ANNs, each one representing a different randomized connectivity profile. We now clarify this in the text and provide the connectivity of the two other random graphs in the supplemental figures.

      (3) Page 4, VAE acronym needs to be defined before its first use

      The variational autoencoder is introduced by its full name in the text now.

      (4) Page 4: Fig. 2c reference should be Fig. 2b, Fig. 2d should be Fig. 2c, Fig. 2b should be Fig. 2d, VS4; Fig. 2b, bottom should be VS4; Fig. 2f, Fig. 2f to Fig. 2g. Double check the Figure references in the text. Here is very confusing for the reader.

      We have now fixed this, thank you for catching it.

      (5) Page 5, 1st para: "Altogether, our results demonstrated both" → "Altogether, our results demonstrated that both"

      This has been updated.

      (6) Figure 2: In the e and g panels the x label is missing.

      This was because the x-axis was the same across the panels, but we see how this was unclear, so we have updated the figure.

      (7) Figure 3: There is no panel g (the title is missing); In panels b, c, e, and g the y label is missing, and in panels e and g the x label is missing. Also, the Feedforward model is shown in panel g but it is introduced later in the text. Please remove it from Figure 3. Also in legend: "AV Reverse graph" → "Reverse graph". Also, "Accuracy" and "Alignment" should be presented as percentages (as in Figure 2).

      This has been corrected.

      (8) Figure 4; x labels are missing.

      As with point (6), this was because the x-axis was the same across the panels, but we see how this was unclear, so we have updated the figure.

      (9) Page 7; I can’t find the cited Figure S1.

      Apologies, we have added the supplemental figure (now as S4). It shows the results of models with multiplicative feedback on the task in Fig 5 (as opposed to models with composite feedback shown in the main figure).

      Reviewer #2 (Recommendations for the authors):

      (1) Discussion Section 3.1 is only a literature review, and does not really add any value.

      Respectfully, we think it is important to relate our work to other computational work on the role of top-down feedback, and to make clear what our specific contribution is. But, we have updated the text to try to place additional emphasis on our study’s contribution, so that this section is more than just a literature review.

      “Our study adds to this previous work by incorporating modulatory top-down feedback into deep, convolutional, recurrent networks that can be matched to real brain anatomy. Importantly, using this framework we could demonstrate that the specific architecture of top-down feedback in a neural network has important computational implications, endowing networks with different inductive biases.”

      (2) Including ipython notebooks and some examples would be great to make it easier to use the code.

      We now provide a demo of how to use the code base in a Jupyter notebook.

      (3) The description of the model is hard to comprehend. Please name and describe all parameters. Also, a figure would be great to understand the different model equations.

      We have added definitions of all model terms and parameters.

      (4) The terminology is not really clear to me. For example "The results further suggest that different configurations of top-down feedback make otherwise identically connected models functionally distinct from each other and from traditional feedforward only recurrent models." The feedforward and only recurrent seem to contradict each other. Would maybe driving and modulatory be a better term here? I also saw in the code that you differentiate between three types of inputs, modulatory, threshold offset and basal (like feedforward). How about you only classify connections based on these three type? I was also confused about the feedforward only model, because I was unsure whether it is still feedback connections but with "basal" quality, or whether feedback connections between modalities and higher-to-lower level layers were omitted altogether.

      We take the reviewer’s point here. To clarify this, we have updated the text to refer to “driving only” rather than “feedforward only”, to make it obvious that what we change in these models is simply whether the connection has any modulatory impact on the activity. 

      (5) "incorporating it into ANNs can affect their behavior and help determine the solutions that the network can discover." -> Do you mean constrain? Overall, I did not really get this point.

      Yes, we mean that it constrains the solutions that the network is likely to discover.

      (6) "ignore the auditory inputs when they visual inputs were unambiguous" -> the not they

      This has been fixed. Thank you for catching it.

      (7) xlabel in Figure 4 is missing.

      This has been fixed, thank you for catching it.

      Reviewer #3 (Recommendations for the authors):

      Major:

      (1) How alignment is computed is not defined. In addition to a proper definition in the methods section, it would be nice to briefly define it when it first appears in the results section.

      We’ve added an explicit definition of how alignment is calculated in the methods and emphasized the calculation when it is first explained in the results.

      (2) A connectivity matrix for the feedforward-only model is missing and could be added.

      We have added this to Figure 1.

      (3) The connectivity matrix for each random model should also be shown.

      We’ve shown each of the random model configurations in the new supplemental figure S1.

      (4) Initial parameters are not defined, such as W, b etc. A table with all model parameters would be great.

      We have added a table to the methods listing all of the parameters.

      (5) Would be nice to show the t-sne plots (not just the NH score) for each model and each task in the appendix.

      We can provide these figures on request. They massively increase the file size of the paper PDF, as there are 49 of them for each task and each model, 980 in total. An example t-SNE plot is provided in Figure 6.

      Minor:

      (1) Page 4:

      "we refer to this as Visual-dominant Stimulus case 1, or VS1; Fig. 1a, top)." This should be Fig. 2a.

      (2) "In stimulus condition VS1, all of the models were able to learn to use the auditory clues to disambiguate the images (Fig. 2c)."

      This should be Fig. 2b.

      (3) "In comparison, in VS2, we found that the brainlike model learned to ignore distracting audio inputs quickly and consistently compared to the random models, and a bit more rapidly than the auditory information (Fig 2d)."

      This should be Fig. 2c.

      (4) "VS3; Fig. 2b, top"

      This should be Fig. 2d

      (5) "while all other models had to learn to do so further along in training (Fig. 2e)."

      It is not stated explicitly, but this suggests that the image-aligned target was considered correct, and that weight updates were happening.

      (6) "VS4; Fig. 2b, bottom"

      This should be Fig. 2f

      (7) "adept at learning (Fig. 2f)."

      This should be Fig. 2g

      (8) Figure 3:b,c,e y-labels are missing

      3f: both x and y labels are missing

      (9) Figure labeling in the text is not consistent (Fig. 1A versus Fig. 2a)

      (10) Doubled "the" in "This shows that the inductive bias towards vision in the brainlike model depended on the presence of the multiplicative component of the the feedback"

      (11) Page 9 Figure 6: The caption says b shows the latent spaces for the VS2 task, whereas the main text refers to 6b as showing the latent space for the AS2 task. Please correct which task it is.

      (12) Methods 4.1 page 13

      "which is derived from the feedback input (h_{l−1})"

      This should be h_{l+1}

      (13) r_l, u_l, u and c are not defined to which aspects of the model they refer to

      Even though this is based on a previous model, the methods section should completely describe the model.

      Equations 1,2,3: the notation [x;y] is unclear and should be defined.

      Equation 5: u should probably be u_l.

      (14) Page 14 typo: externopyrmidisation.

      (15) It is confusing to use different names for the same thing: the all-feedforward model, the all feedforward network, the feedforward network, and the feedforward-only model are probably all the same? Consistent naming would help here.

      Thank you for the detailed comments! We’ve fixed the minor errors and renamed the feedforward models to drive-only models.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Comment 1: 5-HT2A Antibody Specificity

      Was this antibody validated to be 5-HT2A receptor-specific? Can the authors reason why the discrepancy may arise, and if the axonal expression is specific to the cultured neurons?

      We performed extensive validation of the anti-5-HT2A receptor antibody (Alomone #ASR-033), which is summarized in the accompanying Author response images:

      Positive findings (Author response image 1c-e, Author response image 2a): (1) Western blot showed a single band at the expected molecular weight (~50 kDa) in neural progenitors and iPSC-derived neurons. (2) The blocking peptide (#BLP-SR033) abolished Western blot bands and markedly reduced immunofluorescence signals in neurons, confirming epitope-specific binding.

      Negative findings (Author response image 1a-b, Author response image 2a-b, Author response image 3): (1) We detected positive immunofluorescence signals in HEK293 and HeLa cells (Author response image 1a-b), which do not express 5-HT2AR. (2) Western blot also showed bands in HEK293 and HeLa cells (Author response image 2a-b). (3) Single-cell RNA-seq analysis of HEK293T cells confirmed complete absence of HTR2A expression (Author response image 3a). (4) qPCR showed no detectable HTR2A transcripts in iPSCs or HeLa cells (Ct > 36), while neural progenitors and neurons showed clear expression (Author response image 3b). (5) siRNA knockdown experiments failed to produce a corresponding decrease in immunofluorescence or Western blot signals, despite reduced HTR2A transcript levels (data not shown).

      BLAST analysis: Protein BLAST analysis of the 13-amino acid immunogenic peptide sequence identified the human 5-HT2A receptor as the top hit (9/13 amino acids overlap). However, shorter sequence similarities were also found with other proteins, including APPBP1 (6/9 amino acids), Immunoglobulin Heavy Chain (6/7 amino acids), and Interleukin-31 receptor (6/8 amino acids). While these partial homologies do not provide a definitive mechanistic explanation for the observed off-target binding, they illustrate that the epitope sequence is not entirely unique to the 5-HT2A receptor.

      Conclusion: While our validation confirmed epitope-specific binding (blocking peptide effective in neurons), the antibody clearly detects something in cells that demonstrably lack HTR2A gene expression. This indicates off-target binding to other proteins sharing the epitope sequence. We have therefore removed all antibody-based 5-HT2A receptor experiments from the revised manuscript. This includes the receptor internalization data from Figure 1. The remaining findings (BDNF upregulation, gene expression changes, morphological effects, electrophysiology) are supported by independent methods including pharmacological blockade with ketanserin.

      Comment 2: Psilocin Dose Selection

      It would be helpful to specify the dose of psilocin tested, and describe how this dose was chosen.

      We used 10 µM psilocin based on: (1) The seminal study by Ly et al. (2018), which demonstrated neuroplasticity effects at this concentration in rat cortical neurons. (2) Our own dose-response experiments (Figure S2B) showing maximal BDNF increase at 10 µM compared to lower concentrations (10 nM, 100 nM, 1 µM). We have clarified this in the revised Methods section.

      Comment 3: Dose vs. Time Dependence

      Given that only one dose is tested, it is also possible that this reflects dose dependence, with the longer time exposure leading to higher dose exposure.

      We agree that dose dependence cannot be excluded with our current experimental design. This point is now moot as we have removed the 5-HT2A receptor internalization experiments from the manuscript. Future studies in our group will address dose-dependent effects on other readouts.

      Comment 4: Control Conditions

      What is the 'control' here? A more appropriate control would be 24 hours after vehicle application.

      The control condition is indeed a vehicle (DMSO) control collected at the same time point as the experimental condition (i.e., 24 hrs post-treatment). We have clarified this in the revised figure legends and Methods section to avoid confusion.

      Comment 5: Sample Size Description

      The sample size was not clearly described. Statistical analyses should consider that neurites from the same cells are not independent.

      We have expanded the sample size descriptions in the figure legends. Analyses were performed using 5-10 microscope images per condition, with 15 ROIs per image, across at least two independent differentiations from two genetic backgrounds. Regarding independence: each neurite segment exists within a distinct microenvironment and can be considered an independent measurement unit, consistent with established practices in the field (Paul et al., 2021, CNS Neurosci Ther). We acknowledge this increases statistical power and have noted this in the Methods.

      Reviewer #2:

      Comment 1: 5-HT2A Antibody Validation

      Without validation (using for example knockdown techniques to decrease expression of 5HT2A), the experiments using this antibody should be excluded from the manuscript.

      We agree with this assessment. As detailed in our response to Reviewer 1 (Comment 1) and documented in the Author response images, our extensive validation attempts, including siRNA knockdown, could not conclusively demonstrate antibody specificity. We have removed all antibody-based 5-HT2A receptor experiments from the revised manuscript.

      Comment 2: Serotonin in Cell Media

      Did the authors evaluate whether 5-HT is present in the cell media?

      The cell culture media used in our experiments does not contain serotonin. We have explicitly stated this in the revised Methods section.

      Comment 3: Statistical Analysis of Figure S1F

      Some of the datasets are not statistically analyzed, such as Figure S1F.

      Figure S1F related to the 5-HT2A receptor experiments and has been removed from the revised manuscript along with the associated data.

      Comment 4: Translational Validity of Prolonged Exposure

      The authors continuously exposed cells to psilocin for hours or days. Since this is not the model of what occurs in vivo, the findings lack translational validity.

      We acknowledge this limitation. Most experiments (BDNF, gene expression, branching) were conducted 24–48 hrs after a brief 10-minute exposure, which better reflects the in vivo situation. Prolonged exposures (96 hrs) were used specifically for synaptogenesis experiments based on literature showing that repeated LSD administration enhances spine density (Inserra et al., 2022; De Gregorio et al., 2022). Our in vitro system lacks metabolizing enzymes and glial cells, which may introduce temporal biases. We have added a discussion of these limitations in the revised manuscript.

      Comment 5: Ketanserin Effect on BDNF

      In Figure 2E, ketanserin by itself seems to reduce BDNF density. How do the authors conclude that ketanserin blocks psi-induced effects?

      We identified that one cell line (Ctrl 1) with inherently higher BDNF density was inadvertently excluded from the ketanserin-only condition. After removing Ctrl 1 from all conditions and reanalyzing, the difference between Ctrl and Ket alone is no longer significant. The significant difference between Psi+Ket and Ket alone demonstrates that psilocin exerts effects that ketanserin can block, consistent with 5-HT2A receptor mediation. The revised figure and statistical analysis are included in the updated manuscript.

      Comment 6: mCherry Localization

      mCherry (Fig 4A) seems to be retained in the nucleus.

      The CamKII promoter drives expression of cytoplasmic mCherry, which fills the entire neuron including soma, dendrites, and axons. The apparent nuclear signal reflects mCherry accumulation in the soma, which surrounds the nucleus. The images clearly show mCherry extending into neurites, which was essential for our Sholl analysis of neuronal complexity.

      Comment 7: Reference 36

      Reference 36 is a review article that does not mention psilocin.

      Our statement refers broadly to serotonergic psychedelics increasing neurotrophic factors. Reference 36 (Colaço et al., 2020) examines ayahuasca, which contains the serotonergic psychedelic DMT. We have revised the text to clarify this point.

      Summary of Major Revisions

      (1) Removed all 5-HT2A receptor antibody-based experiments from Figure 1 and supplementary figures due to inconclusive specificity validation. An Author response image documenting our validation attempts is provided.

      (2) Clarified control conditions (vehicle controls at matched time points) in figure legends.

      (3) Expanded sample size descriptions in Methods and figure legends.

      (4) Re-analyzed ketanserin experiments with consistent cell line inclusion.

      (5) Added discussion of translational limitations.

      (6) Added new Figure S5 summarizing proposed signaling pathways.

      (7) Expanded discussion on the relevance of iPSC-derived neurons for drug development.

      Author response image 1.

      Immunostaining for 5-HT2A receptor across cell types and peptide-blocking control. (a) HEK293 cells display a positive immunofluorescent signal despite not endogenously expressing 5-HT2AR, indicating nonspecific antibody reactivity. (b) HeLa cells also exhibit a positive signal despite lacking endogenous 5-HT2AR expression, further demonstrating nonspecific antibody binding in non-expressing cell types. (c) Neural progenitor cells show clear positive 5-HT2AR staining. (d) iPSC-derived neurons exhibit robust and well-defined 5-HT2AR staining. (e) Application of the Alomone 5-HT2AR blocking peptide (#BLP-SR033) markedly reduces neuronal signal intensity, supporting epitope-specific binding.

      Author response image 2.

      Western blot analysis of 5-HT2A receptor abundance and peptide-blocking control. (a-b) In line with the immunofluorescence, a single band is detected in iPSCs, HEK cells, neural progenitors, iPSC-derived neurons and (b) HeLa cells. (a) Preincubation of the primary antibody with the corresponding blocking peptide abolishes this band across all samples, consistent with specific binding of the antibody to its intended epitope.

      Author response image 3.

      Lack of detectable 5-HT2AR expression in HEK and HeLa cells. (a) Analysis of a human-only HEK293T single-cell RNA-seq dataset (10x Genomics; https://www.10xgenomics.com/datasets/293-t-cells-1-standard-1-1-0, accessed 2025-11-25) shows no meaningful HTR2A expression, whereas other genes such as GAPDH, TP53, MYC, and ACTB are robustly detected. Consistently, evaluation of a “Barnyard” dataset - an equal mixture of human HEK293T and mouse NIH3T3 cells (10x Genomics; https://www.10xgenomics.com/datasets/20-k-1-1mixture-of-human-hek-293-t-and-mouse-nih-3-t-3-cells-3-ht-v-3-1-3-1-high-6-1-0, accessed 2025-11-25) reveals only ~4 of ~10,000 droplets with minimal HTR2A signal, confirming the absence of meaningful expression. (b) qPCR analysis further demonstrates no detectable HTR2A transcripts in iPSCs or HeLa cells (Ct > 36), while neural progenitors and iPSC-derived cortical neurons show expression when normalized to housekeeping genes GAPDH and TBP.
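
      The normalization to housekeeping genes mentioned in panel (b) can be illustrated with a short sketch. It assumes the common ΔCt approach, in which the target Ct is compared with the mean Ct of the reference genes and expression is reported as 2^(-ΔCt); the response does not state which formula was used. The Ct > 36 cutoff is taken from the text above, while the function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_references, detection_cutoff=36.0):
    """Return 2**(-delta_Ct) relative to the mean reference Ct.

    Returns None when the target Ct is missing or above the detection
    cutoff, that is, when the transcript is treated as not detected.
    """
    if ct_target is None or ct_target > detection_cutoff:
        return None
    reference_mean = sum(ct_references) / len(ct_references)
    delta_ct = ct_target - reference_mean
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values (assumption): target gene plus two reference genes.
neuron_expression = relative_expression(28.0, [20.0, 24.0])
hela_expression = relative_expression(37.5, [20.0, 24.0])
```

      Returning `None` rather than a very small number keeps undetected samples distinct from samples with low but measurable expression.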

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and the reviewers for the detailed and constructive comments. In revising the manuscript we have: (i) clarified what is new relative to prior stress tolerance work, (ii) made explicit that we observe phenotypic convergence without a shared genetic route, (iii) stated upfront that we evolved four independent lines plus two controls, and (iv) corrected figure legends, statistics, and the missing citations. Below we respond point-by-point.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents findings on the adaptation mechanisms of Saccharomyces cerevisiae under extreme stress conditions. The authors try to generalize this to adaptation to stress tolerance. A major finding is that S. cerevisiae evolves a quiescence-like state with high trehalose to adapt to freeze-thaw tolerance independent of their genetic background. The manuscript is comprehensive, and each of the conclusions is well supported by careful experiments.

      Strengths:

      This is excellent interdisciplinary work.

      Weaknesses:

      I have questions regarding the overall novelty of the proposal, which I would like the authors to explain.

      (1) Earlier papers have shown that loss of ribosomal proteins, that slow growth, leads to better stress tolerance in S. cerevisiae. Given this, isn’t it expected that any adaptation that slows down growth would, overall, increase stress tolerance? Even for other systems, it has been shown that slowing down growth (by spore formation in yeast or bacteria/or dauer formation in C. elegans) is an effective strategy to combat stress and hence is a likely route to adaptation. The authors stress this as one of the primary findings. I would like the authors to explain their position, detailing how their findings are unexpected in the context of the literature.

      We agree that the link between slower growth and higher stress tolerance has been well studied. What is distinctive here is that repeated, near-lethal freeze–thaw selected not only for a tolerant/quiescent-like state but also for a shorter lag on re-entry. In this regime of freeze–thaw–regrowth, cells that are tolerant but slow to restart would be outcompeted by naive fast growers. Our quiescence-based selection simulations reproduce exactly this constraint. We have added this explanation to the Results to make clear that the novelty is the co-evolution of a tolerant, trehalose-rich state together with rapid regrowth under an alternating regime.
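
      To make this constraint concrete, the following is a toy calculation and not the authors' quiescence-based simulation. It assumes that survivors of a freeze–thaw step wait out a lag and then grow exponentially for the remainder of a fixed regrowth window. The survival values (~2% and ~70%) echo the viabilities quoted in this review thread; the growth rate, lag times, window length, and all function and strain names are hypothetical and chosen only for illustration.

```python
import math


def fold_change_per_cycle(survival, lag_h, growth_rate_per_h, regrowth_h):
    """Net fold change in cell number over one freeze-thaw-regrowth cycle.

    Survivors sit out the lag, then grow exponentially for the time left.
    """
    growing_time = max(0.0, regrowth_h - lag_h)
    return survival * math.exp(growth_rate_per_h * growing_time)


# Hypothetical parameters: only the survival fractions echo the text.
REGROWTH_H = 24.0
GROWTH_RATE_PER_H = 0.40
strains = {
    "naive_fast": {"survival": 0.02, "lag_h": 2.0},
    "tolerant_slow": {"survival": 0.70, "lag_h": 14.0},
    "tolerant_fast": {"survival": 0.70, "lag_h": 1.0},
}

fold = {
    name: fold_change_per_cycle(p["survival"], p["lag_h"],
                                GROWTH_RATE_PER_H, REGROWTH_H)
    for name, p in strains.items()
}

# Rank strains by how much they expand per cycle (largest first).
for name in sorted(fold, key=fold.get, reverse=True):
    print(f"{name}: {fold[name]:.1f}-fold per cycle")
```

      Under these assumed values the tolerant-but-slow strain gains less per cycle than the naive fast grower, while the tolerant, short-lag strain gains the most, which is the qualitative point made in the response.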

      (2) Convergent evolution of traits: I find the results unsurprising. When selecting for a trait, if there is a major mode to adapt to that stress, most of the strains would adapt to that mode, independent of the route. According to me, finding out this major route was the objective of many of the previous reports on adaptive evolution. The surprising part in the previous papers (on adaptive evolution of bacteria or yeast) was the resampling of genes that acquired mutations in multiple replicates of an evolution experiments, providing a handle to understand the major genetic route or the molecular mechanism that guides the adaptation (for example in this case it would be - what guides the overaccumulation of trehalose). I fail to understand why the authors find the results surprising, and I would be happy to understand that from the authors. I may have missed something important.

      Our surprise was precisely that we did not see the classical pattern of “phenotypic convergence + repeated mutations in the same locus/module.” All independently evolved lines converged on a trehalose-rich, mechanically reinforced, quiescence-like phenotype, but population sequencing across lines did not reveal a single repeatedly hit gene or small shared pathway, even when we increased selection stringency (1–3 freeze–thaw cycles per round). We have now stated in the manuscript that this decoupling (strong phenotypic convergence, non-overlapping genetic routes) is the central inference: selection is acting on a physiologically defined state that multiple genotypes can reach.

      (3) Adaptive evolution would work on phenotype, as all of selective evolution is supposed to. So, given that one of the phenotypes well-known in literature to allow freeze-tolerance is trehalose accumulation, I think it is not surprising that this trait is selected. For me, this is not a case of “non-genetic” adaptation as the authors point out: it is likely because perturbation of many genes can individually result in the same outcome - up-regulation of trehalose accumulation. Thereby, although the adaptation is genetic, it is not homogeneous across the evolving lines - the end result is. Do the authors check that the trait is actually a non-genetic adaptation, i.e., if they regrow the cells for a few generations without the stress, the cells fall back to being similarly only partially fit to freeze-thaw cycles? Additionally, the inability to identify a network that is conserved in the sequencing does not mean that there is no regulatory pathway. A large number of cryptic pathways may exist to alter cellular metabolic states.

      This is a point in continuation of point #2, and I would like to understand what I have missed.

      We agree, and we have removed the wording “non-genetic adaptation.” The evolved populations retain high survival even after regrowth for ≥25 generations without freeze–thaw, so the adaptation is clearly genetically maintained. What our data show is that there is no single genetic route to the shared phenotype; different mutations can all drive cells into the same trehalose-rich, quiescence-like, mechanochemically reinforced state. We now describe this as “genetic diversification with phenotypic convergence.”

      (4) To propose the convergent nature, it would be important to check for independently evolved lines and most probably more than 2 lines. It is not clear from their results section if they have multiple lines that have evolved independently.

      We indeed evolved four independent lines and maintained two independent controls. We have added this information at the start of the Results so that the level of replication is immediately clear.

      (5) For the genomic studies, it is not clear if the authors sequenced a pool or a single colony from the evolved strains. This is an important point, since an average sequence will miss out on many mutations and only focus on the mutations inherited from a common ancestral cell. It is also not clear from the section.

      We sequenced population samples from the evolved lines. Our specific question was whether independently evolved lines would show the same high-frequency genetic solution, as is often seen in parallel evolution. Pool sequencing may under-sample rare/private variants, but it is appropriate for detecting such shared, high-frequency routes — and we do not find any. We have clarified this rationale in the Methods/Results.

      Reviewer #2 (Public review):

      Summary:

      The authors used experimental evolution, repeatedly subjecting Saccharomyces cerevisiae populations to rapid liquid-nitrogen freeze-thaw cycles while tracking survival, cellular biophysics, metabolite levels, and whole-genome sequence changes. Within 25 cycles, viability rose from ~2 % to ~70 % in all independent lines, demonstrating rapid and highly convergent adaptation despite distinct starting genotypes. Evolved cells accumulated about threefold more intracellular trehalose, adopted a quiescence-like phenotype (smaller, denser, non-budding cells), showed cytoplasmic stiffening and reduced membrane damage, and re-entered growth with shorter lag, traits that together protected them from ice-induced injury. Whole-genome sequencing indicated that multiple genetic routes can yield the same mechano-chemical survival strategy. A population model in which trehalose controls quiescence entry, growth rate, lag, and freeze-thaw survival reproduced the empirical dynamics, implicating physiological state transitions rather than specific mutations as the primary adaptive driver. The study therefore concludes that extreme-stress tolerance can evolve quickly through a convergent, trehalose-rich quiescence-like state that reinforces membrane integrity and cytoplasmic structure.

      Strengths:

      The strengths of the paper are the experimental design, data presentation and interpretation, and that it is well-written.

      (1) While the phenotyping is thorough, a few more growth curves would be quite revealing to determine the extent of cross-stress protection. For example, comparing growth rates under YPD vs. YPEG (EtOH/glycerol), and measuring growth at 37ºC or in the presence of 0.8 M KCl.

      We thank the referee for the interesting suggestions. However, growth rates alone may be difficult to interpret since WT strains also show different growth rates under these conditions. Therefore, comparing the relative fitness or survival of the evolved strains versus the WT under these stresses would be more informative. In the present study we limited growth/survival measurements to what was needed to parameterize the adaptation model in YPD under the freeze–thaw regime. We have now added a statement in the Discussion that, given the shared trehalose/mechanical mechanism, such cross-stress assays are an expected and straightforward follow-up.

      (2) Is GEMS integrated prior to evolution? Are the evolved cells transformable?

      Yes. GEMs were integrated prior to evolution, because the non-integrated evolved population showed low transformation efficiency, likely due to altered cell-wall properties.

      (3) From the table, it looks like strains either have mutations in Ras1/2 or Vac8. Given the known requirements of Ras/PKA signaling for the G1/S checkpoint (to make sure there are enough nutrients for S phase), this seems like a pathway worth mentioning and referencing. Regarding Vac8, its emerging roles in NVJ and autophagy suggest another nutrient checkpoint, perhaps through TORC1. The common theme is rewired metabolism, which is probably influencing the carbon shuttling to trehalose synthesis.

      We appreciate the reviewer’s suggestion to consider pathways like Ras/PKA (linked to Ras1/2) and autophagy/TORC1 (linked to Vac8) as potential upstream modulators. While these pathways are involved in nutrient sensing and metabolic regulation, we choose not to emphasize them specifically. This is because (i) some evolved lines lack Ras1/2 or Vac8 variants, and (ii) none of the variants lies directly in trehalose synthesis/degradation pathways. Furthermore, direct links to trehalose accumulation are not well established for these specific variants in this context, and pathways like Ras are global regulators with broad effects. Together with the strongly convergent phenotype, this supports our main inference that multiple genetic/metabolic routes can feed into the same trehalose-rich, mechanochemically reinforced, quiescence-like state. We have added a note in the discussion regarding metabolic rewiring and trehalose.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Generally, the results sections should have more details. The figures should be corrected, and the legends should be checked for correctness. The manuscript seems to have been assembled in haste?

      We have expanded the relevant Results subsections with one-sentence motivations (why each measurement was performed) and we have corrected the figure legends for ordering and consistency.

      Figure 3: It will be good to have the correct p-values on the figure itself. P-values are typically less than 1, unless there is some special method (here the values presented are , etc). Please explain how the P-values were obtained in the figure legend itself.

      Figure 3 now shows the actual p-values. The legend specifies the details and the sample sizes used.

      Figure 5: It is not clear what the error bars show in 5B, E (different evolved population/ clones/ cells?). All the figure legends are mixed up, please correct them. It is difficult to follow the paper.

      Figure 5 legends now state clearly what the error bars represent (biological replicates) and which panels are from single-cell measurements. We have checked the panel lettering and legend order for consistency with the flow of the main text.

      Reviewer #3 (Recommendations for the authors):

      Overall, the paper is outstanding, well-written, and insightful.

      A point to address is that there are missing citations on lines 60, 91.

      We have added the missing citations at both locations. We apologize for the omission, which was due to a compilation error. This error has been fixed, and the bibliography has been corrected (now containing 74 references).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Authors state, "we identified ETF dehydrogenase (ETFDH) as one of the most dispensable metabolic genes in neoplasia." Surely there are thousands of genes that are dispensable for neoplasia. Perhaps the authors can revise this sentence and similar sentiments in the text.

      We agree with the reviewer and have corrected the text accordingly. Specifically, we rephrased the sentence: “Surprisingly, we observed that in contrast to muscle, ETFDH is one of the most non-essential metabolic genes in cancer cells.” to “Surprisingly, we observed that in contrast to muscle, ETFDH is a non-essential gene in acute lymphoblastic leukemia NALM-6 cells.”

      Authors state, "These findings show that ETFDH loss elevates glutamine utilization in the CAC to support mitochondrial metabolism." While elevated glutamine to CAC flux is consistent with the statement that increased glutamine, the authors have not measured the effect of restoring glutamine utilization to baseline on mitochondrial metabolism. Thus, the causality implied by the authors can only be inferred based on the data presented. Indeed, the increased glutamine consumption may be linked to the increase in ROS, as glutamate efflux via system xCT is a major determinant of glutamine catabolism in vitro.

      Indeed. We changed the statement "These findings show that ETFDH loss elevates glutamine utilization in the CAC to support mitochondrial metabolism." to "Collectively, these data demonstrate that ETF insufficiency in cancer cells remodels mitochondrial metabolism and increases the glutamine consumption and anaplerosis."

      Authors state that the mechanism described is an example of "retrograde signaling". However, the mechanism seems to be related to a reduction in BCAA catabolism, suggesting that the observed effects may be a consequence of altered metabolic flux rather than a direct signaling pathway. The data presented do not delineate whether the observed effects stem from disrupted mitochondrial communication or from shifts in nutrient availability and metabolic regulation.

      Notwithstanding that the term “retrograde” was used to refer to signaling from mitochondria to mTORC1, rather than from mTORC1 to mitochondria [1], we have removed the term “retrograde signaling” throughout the manuscript.

      The authors should discuss which amino acids that are ETFDH substrates might affect mTORC1 activity or consider whether other ETFDH substrates might also affect mTORC1 in their discussion. Along these lines, the authors might consider discussing why amino acids that are not ETFDH substrates are increased upon ETFDH loss.

      Based on the literature, we expect that branched-chain amino acids that are ETFDH substrates (e.g., leucine) are likely to play a major role in activating mTORC1 upon ETFDH abrogation. As expected, the aforementioned amino acids are among those that are the most highly upregulated in ETFDH-deficient cells (Fig 3A). We have, however, never formally tested the role of branched-chain amino acids in activating mTORC1 in the context of ETFDH disruption. The increase in amino acids that are not metabolized via ETFDH is likely to stem from global metabolic rewiring of ETFDH-deficient cells and observed alterations in amino acid uptake (e.g., glutamine; Fig 2F). We discuss this in the revised version of the paper as follows:

      “Several metabolites can be sensed via signaling partners upstream of mTORC1, including leucine, arginine, methionine/SAM, and threonine [2]. Branched-chain amino acids (leucine, isoleucine, and valine), which are among the highest upregulated metabolites in ETFDH deficient cells (Fig 3A) serve as ETFDH substrates, and have been described to display strong activation capabilities towards mTORC1 in the literature [3,4]. Glutamine can also activate mTORC1 through Arf family of GTPases [5]. Indeed, glutamine can supplement the non-essential amino acid (NEAA) pool through transamination [6] and amino acid uptake [7]. Accordingly, the maintenance of NEAA that are non-ETFDH substrates may be supported by the global metabolic rewiring fueled by enhanced glutamine metabolism in ETFDH-deficient cells. Deciphering the mechanisms leading to accumulation of specific amino acids and their role in ETFDH-dependent mTORC1 modulation is warranted.”

      Reviewer #2 (Public review):

      The authors would strengthen the paper considerably by adding back catalytically inactive ETFDH to show that the activity of this enzyme is responsible for the increased growth phenotypes and changes in labeling that they observe.

      Based on the Reviewers’ suggestions, we performed these experiments. Herein, we took advantage of the Y304A/G306E ETFDH mutant, which impairs electron transfer from ETF and cannot substitute for the wild type (WT) gene function in ETFDH-deficient myoblasts [8]. We expressed WT and the Y304A/G306E ETFDH mutant in ETFDH KO HCT116 colorectal cancer cells and confirmed that they are expressed to a comparable level (Supplementary Figure 6C). Re-expression of WT ETFDH decreased proliferation, while suppressing mTORC1 signaling and increasing 4E-BP1 levels relative to control (vector infected) ETFDH KO EV HCT116 cells (Supplementary Figure 6D). In contrast, proliferation rates, mTORC1 signaling and 4E-BP1 levels remained largely unchanged upon Y304A/G306E ETFDH mutant expression in ETFDH KO HCT116 cells (Supplementary Figure 6D). Similarly, re-expression of WT ETFDH disrupted the bioenergetic phenotype associated with ETFDH loss, in contrast to re-expression of the Y304A/G306E ETFDH mutant, which exhibited a bioenergetic profile similar to that of the ETFDH KO control (Supplementary Figure 6E-F). Collectively, these findings argue that ETFDH activity is required for its tumor suppressive effects.

      If nucleotide pool and labeling data are available, or can be obtained readily, this would significantly strengthen the tracing data already obtained.

      We followed the Reviewer’s suggestion and measured nucleotide levels. This revealed that loss of ETFDH results in an increase in steady-state nucleotide pools (Supplementary Figure 2K), consistent with increased aspartate labelling and accelerated tumor growth.

      References

      (1) Morita, M. et al. mTORC1 controls mitochondrial activity and biogenesis through 4EBP-dependent translational regulation. Cell Metab 18, 698-711 (2013). https://doi.org/10.1016/j.cmet.2013.10.001

      (2) Valenstein, M. L. et al. Structural basis for the dynamic regulation of mTORC1 by amino acids. Nature 646, 493-500 (2025). https://doi.org/10.1038/s41586-025-09428-7

      (3) Appuhamy, J. A., Knoebel, N. A., Nayananjalie, W. A., Escobar, J., & Hanigan, M. D. Isoleucine and leucine independently regulate mTOR signaling and protein synthesis in MAC-T cells and bovine mammary tissue slices. J Nutr 142, 484-491 (2012). https://doi.org/10.3945/jn.111.152595

      (4) Herningtyas, E. H. et al. Branched-chain amino acids and arginine suppress MaFbx/atrogin-1 mRNA expression via mTOR pathway in C2C12 cell line. Biochim Biophys Acta 1780, 1115-1120 (2008). https://doi.org/10.1016/j.bbagen.2008.06.004

      (5) Jewell, J. L. et al. Metabolism. Differential regulation of mTORC1 by leucine and glutamine. Science 347, 194-198 (2015). https://doi.org/10.1126/science.1259472

      (6) Tan, H. W. S., Sim, A. Y. L. & Long, Y. C. Glutamine metabolism regulates autophagy-dependent mTORC1 reactivation during amino acid starvation. Nat Commun 8, 338 (2017). https://doi.org/10.1038/s41467-017-00369-y

      (7) Chen, R. et al. The general amino acid control pathway regulates mTOR and autophagy during serum/glutamine starvation. J Cell Biol 206, 173-182 (2014). https://doi.org/10.1083/jcb.201403009

      (8) Herrero Martin, J. C. et al. An ETFDH-driven metabolon supports OXPHOS efficiency in skeletal muscle by regulating coenzyme Q homeostasis. Nat Metab 6, 209-225 (2024). https://doi.org/10.1038/s42255-023-00956-y

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public review):

      Summary:

      Schafer et al. tested whether the hippocampus tracks social interactions as sequences of neural states within an abstract social space defined by dimensions of affiliation and power, using a task in which participants engaged in narrative-based social interactions. The findings of this study revealed that individual social relationships are represented by unique sequences of hippocampal activity patterns. These neural trajectories corresponded to the history of trial-to-trial affiliation and power dynamics between participants and each character, suggesting an extended role of the hippocampus in encoding sequences of events beyond spatial relationships.

      The current version has limited information on details in decoding and clustering analyses which can be improved in the future revision.

      Strengths:

      (1) Robust Analysis: The research combined representational similarity analysis with manifold analyses, enhancing the robustness of the findings and the interpretation of the hippocampus's role in social cognition.

      (2) Replicability: The study included two independent samples, which strengthens the generalizability and reliability of the results.

      Weaknesses:

      I appreciate the authors for utilizing contemporary machine-learning techniques to analyze neuroimaging data and examine the intricacies of human cognition. However, the manuscript would benefit from a more detailed explanation of the rationale behind the selection of each method and a thorough description of the validation procedures. Such clarifications are essential to understand the true impact of the research. Moreover, refining these areas will broaden the manuscript's accessibility to a diverse audience.

      We thank the reviewer for these comments and have addressed them in various ways.

      First, we removed the spline-based decoding and spectral clustering analyses. As we detail in our response to the recommendations, these approaches were complex and raised legitimate interpretational concerns, making it unclear how they supported our core claims. The revised manuscript now focuses on a set of representational similarity analyses to show representations consistent with social dimension similarity (affiliation vs. power decision trials) and social location similarity (trajectory/map-like coding based on participant choices).

      Second, we expanded the Methods and Results to more clearly explain the analyses, the questions they address, and associated controls and robustness tests. The dimension similarity analysis tests whether hippocampal patterns differentiate affiliation and power decisions in a way consistent with an abstract dimension representation. The location similarity RSAs test whether within-character neural pattern distances scale with Euclidean distance in social space (relationship-specific trajectories), and whether pattern distances across all characters scale with location distances when distances are globally standardized, consistent with a shared map-like coordinate system.

      Third, we emphasize new controls. For the dimension similarity RSA, we test for potential confounds such as word count, text sentiment, and reaction time differences between affiliation and power trials. For the location similarity RSA, we control for temporal distance between trials and show (in the Supplement) that the reported effects cannot be explained by temporal autocorrelation in the fMRI data or by the relationship between temporal distance and behavioral location distance.

      We believe that these changes address the reviewer’s request for clearer rationale and validation.

      Reviewer #2 (Public review):

      Summary:

      Using an innovative task design and analysis approach, the authors set out to show that the activity patterns in the hippocampus related to the development of social relationships with multiple partners in a virtual game. While I found the paper highly interesting (and would be thrilled if the claims made in the paper turned out to be true), I found many of the analyses presented either unconvincing or slightly unconnected to the claims that they were supposed to support. I very much hope the authors can alleviate these concerns in a revision of the paper.

      Strengths & Weaknesses:

      (1) The innovative task design and analyses, and the two independent samples of participants are clear strengths of the paper.

      We thank the reviewer for this comment.

      (2) The RSA analysis is not what I expected after I read the abstract and title of the result section "The hippocampus represents abstract dimensions of affiliation and power". To me, the title suggests that the hippocampus has voxel patterns, which could be read out by a downstream area to infer the affiliation and power value, independent of the exact identity of the character in the current trial. The presented RSA analysis however presents something entirely different - namely that the affiliation trials and power trials elicit different activity patterns in the area indicated in Figure 3. What is the meaning of this analysis? It is not clear to me what is being "decoded" here and alternative explanations have not been considered. How do affiliation and power trials differ in terms of the length of sentences, complexity of the statements, and reaction time? Can the subsequent decision be decoded from these areas? I hope in the revision the authors can test these ideas - and also explain how the current RSA analysis relates to a representation of the "dimensions of affiliation and power".

      We agree that this analysis needed to be better justified and explained. We have revised the text to clarify that by “represents the interaction decision trials along abstract social dimensions” we mean that hippocampal multivoxel patterns differentiate affiliation and power decisions in a way consistent with the conceptual framework of underlying latent dimensions. The analysis tests one simple prediction of this view – that on average these trial types are separable in the neural patterns. We have added details to the Methods, showing how the affiliation and power trials do not differ in word count or in sentiment, but do differ in their semantics, as assessed by a Large Language Model, as we expect from our task assumptions. Thanks to the reviewer’s comment, we also tested for and found a reaction time difference between affiliation and power trials, that we now control for.

      (3) Overall, I found that the paper was missing some more fundamental and simpler RSA analyses that would provide a necessary backdrop for the more complicated analyses that followed. Can you decode character identity from the regions in question? If you trained a simple decoder for power and affiliation values (using the LLE, but without consideration of the sequential position as used in the spline analysis), could you predict left-out trials? Are affiliation and power represented in a way that is consistent across participants - i.e. could you train a model that predicts affiliation and power from N-1 subjects and then predict the Nth subject? Even if the answer to these questions is "no", I believe that they are important to report for the reader to get a full understanding of the nature of the neural representations in these areas. If the claim is that the hippocampus represents an "abstract" relationship space, then I think it is important to show that these representations hold across relationships. Otherwise, the claim needs to be adjusted to say that it is a representation of a relationship-specific trajectory, but not an abstract social space.

      We appreciate this comment and agree on the value of clear, conceptually simple analyses. To address this concern, we have simplified our main analysis significantly by removing the spline-based analysis and substituting it with a multiple regression representational similarity analysis approach. We test whether within-character neural pattern distances scale with distance in social space (relationship-specific trajectories), and whether pattern distances across all characters scale with location distances when distances are globally standardized. We find evidence for both, consistent with a shared map-like coordinate system.

      We agree that decoding character identity and an across-participant decoding approach could be informative. However, our current task is not well designed for such analyses and as such would complicate the paper. Although we agree that these questions are interesting, they would test questions that are outside the scope of this paper. 

      (4) To determine that the location of a specific character can be decoded from the hippocampal activity patterns, the authors use a sequential analysis in a low-dimensional space (using local linear embedding). In essence, each trial is decoded by finding the pair of two temporally sequential trials that is closest to this pattern, and then interpolating the power/affiliation values linearly between these two points. The obvious problem with this analysis is that fMRI patterns will have temporal autocorrelation and the power and affiliation values have temporal autocorrelation. Successful decoding could just reflect this smoothness in both time series. The authors present a series of control analyses, but I found most of them to not be incisive or convincing and I believe that they (and their explanation of their rationale) need to be improved. For example, the circular shifting of the patterns preserves some of the autocorrelation of the time series - but not entirely. In the shifted patterns, the first and last items are considered to be neighboring and used in the evaluation, which alone could explain the poor performance. The simplest way that I can see is to also connect the first and last item in a circular fashion, even when evaluating the veridical ordering. The only really convincing control condition I found was the generation of new sequences for every character by shuffling the sequence of choices and re-creating new artificial trajectories with the same start and endpoint. This analysis performs much better than chance (circular shuffling), suggesting to me that a lot of the observed decoding accuracy is indeed simply caused by the temporal smoothness of both time series.

      We thank the reviewer for emphasizing this important concern; we agree that we did not sufficiently address this in the initial submission. This concern is one main reason we removed the spline-based analysis and now use regression-based representational similarity analyses in its place. In the revision, we report autocorrelation-related analyses in the supplement, and via controls and additional analysis show that temporal distance (or its square) cannot explain the location-like effects. This substantially improves our ability to interpret the findings.

      (5) Overall, I found the analysis of the brain-behavior correlation presented in Figure 5 unconvincing. First, the correlation is mostly driven by one individual with a large network size and a 6.5 cluster. I suspect that the exclusion of this individual would lead to the correlation losing significance. Secondly, the neural measure used for this analysis (determining the number of optimal clusters that maximize the overlap between neural clustering and behavioral clustering) is new, non-validated, and disconnected from all the analyses that had been reported previously. The authors need to forgive me for saying so, but at this point of the paper, would it not be much more obvious to use the decoding accuracy for power and affiliation from the main model used in the paper thus far? Does this correlate? Another obvious candidate would be the decoding accuracy for character identity or the size of the region that encodes affiliation and power. Given the plethora of candidate neural measures, I would appreciate if the authors reported the other neural measures that were tried (and that did not correlate). One way to address this would have been to select the method on the initial sample and then test it on the validation sample - unfortunately, the measure was not pre-registered before the validation sample was collected. It seems that the correlation was only found and reported on the validation sample?

      We agree that this analysis was too complicated and underconstrained, and thus not convincing. We think that removing this cluster-based analysis is the most conservative response to the reviewer’s concerns and have removed it from the revised paper.

      Recommendations to the authors:

      Reviewer #1 (Recommendations for the authors):

      The manuscript's description of the shuffling analysis performed during decoding is currently ambiguous, particularly concerning the control variables. This ambiguity is present only in the Figure 4 legends and requires a more detailed explanation within the methods section. It is essential to clarify whether the permutation process was conducted within each character's data set or across multiple characters' data sets. If permutations were confined to within-character data, the conclusion would be that the hippocampus encodes context-specific information rather than providing a two-dimensional common space.

      We thank the reviewer for this comment. We have now removed the spline analysis due to these and other problems and have replaced it with representational similarity analyses that are both more rigorous and easier to interpret. We think these analyses allow us to make the claim that the characters are represented in a common space. 

      In the methods, we explain the analyses (pages 23-24, lines 475-500):

      “We also expected the hippocampus to represent the different characters’ changing social locations, which are implicit in the participant’s choices. We used multiple regression searchlight RSA to test whether hippocampal pattern dissimilarity increases with social location distance, based on participant-specific trial-wise beta images where boxcar regressors spanned each trial’s reaction time.”

      “We ran two complementary regression analyses to address two related questions. First, we asked whether the hippocampus represents how a specific relationship changes over time. For this analysis, for each participant and each searchlight, we computed character-specific (i.e., only for same character trial pairs) correlation distances between trial-wise beta patterns and Euclidean distances between the social location behavioral coordinates. Distances were z-scored within character trial pairs to isolate character-specific changes. The second analysis asked whether there is a common map-like representation, where all trials, regardless of relationship, are represented in a shared coordinate system. Here, we included all trial pairs and z-scored the distances globally. For both regression analyses, we included control distances to control for possible confounds. To account for generic time-related changes, we controlled for absolute scan-time difference, as this correlated with location distance across participants (see Temporal autocorrelation of hippocampal beta patterns in the supplement). Although the square of this temporal distance did not explain any additional variance in behavioral distances, we ran a robustness analysis including both temporal distance and its square and saw qualitatively the same clusters with similar effect sizes. As such, we report the main analysis only. We included binary dimension difference (0 = trial pairs of different dimension, 1 = trial pairs of the same dimension), to ensure effects could not be explained by dimension-related effects. In the group-level model, we controlled for sample and the average reaction time between affiliation and power decisions.”
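      The two regressions quoted above can be illustrated with a minimal sketch on synthetic data. This is not the authors' pipeline: the function names (`location_rsa`, `zscore_by_group`), the toy trial counts, and the choice to z-score the temporal control in the same way as the other distances are all assumptions made for illustration. A real analysis would run this inside each searchlight and pass the coefficients to a group-level model.

```python
import numpy as np

def zscore_by_group(x, groups):
    """Z-score x separately within each group label."""
    out = np.empty(len(x), dtype=float)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = (x[mask] - x[mask].mean()) / x[mask].std()
    return out

def correlation_distance(a, b):
    """Row-wise 1 - Pearson r between two (pairs x voxels) arrays."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    r = (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return 1.0 - r

def location_rsa(betas, locations, onsets, dimension, character, within_character):
    """Regress neural distance on location distance plus control distances.

    within_character=True keeps same-character trial pairs and z-scores
    within character; False keeps all pairs and z-scores globally.
    Returns the regression coefficient for location distance.
    """
    i, j = np.triu_indices(len(betas), k=1)
    if within_character:
        keep = character[i] == character[j]
        i, j = i[keep], j[keep]
        groups = character[i]
    else:
        groups = np.zeros(len(i), dtype=int)
    neural = zscore_by_group(correlation_distance(betas[i], betas[j]), groups)
    location = zscore_by_group(np.linalg.norm(locations[i] - locations[j], axis=1), groups)
    # Controls: absolute time difference and a same-dimension indicator.
    time = zscore_by_group(np.abs(onsets[i] - onsets[j]), groups)
    same_dim = (dimension[i] == dimension[j]).astype(float)
    X = np.column_stack([np.ones(len(i)), location, time, same_dim])
    coef, *_ = np.linalg.lstsq(X, neural, rcond=None)
    return coef[1]

# Synthetic example (arbitrary sizes): patterns that linearly encode location.
rng = np.random.default_rng(0)
n_trials, n_voxels = 60, 50
character = np.tile(np.arange(6), 10)          # interleaved characters
locations = rng.normal(size=(n_trials, 2))     # toy affiliation/power coordinates
betas = locations @ rng.normal(size=(2, n_voxels)) + 0.1 * rng.normal(size=(n_trials, n_voxels))
onsets = np.sort(rng.uniform(0, 1500, n_trials))
dimension = rng.integers(0, 2, n_trials)

b_within = location_rsa(betas, locations, onsets, dimension, character, True)
b_global = location_rsa(betas, locations, onsets, dimension, character, False)
```

      In this toy setup both coefficients should come out positive, because the simulated patterns were built from the locations. The only difference between the two analyses is which trial pairs enter the regression and how the distances are standardized.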

      In the results, we describe the results and our interpretation (pages 11-12, lines 185-208):

      “We have shown that the left hippocampus represents the affiliation and power trials differently, consistent with an abstract dimensional representation. Does it also represent the changing social coordinates of each character? To test this, we used a multiple-regression RSA searchlight to test whether left hippocampus patterns represent the characters’ changing social locations across interactions (see Figure 3). We restricted the distances to those from trial pairs from the same character and standardized the distances within character (see Figure 3B-D). We controlled for temporal distance to ensure the effect was not explainable by the time between trials, and for whether the trials shared the same underlying dimension (affiliation or power; see Location similarity searchlight analyses for more details). At the group level, we controlled for sample and the average reaction time difference between affiliation and power trials. Using the same testing logic as the dimensionality similarity analysis, we first tested our hypothesis in the bilateral hippocampus and found widespread effects in both the left (peak voxel MNI x/y/z = -35/-22/-15, cluster extent = 1470 voxels) and right (peak voxel MNI x/y/z = 37/-19/-14, cluster extent = 1953 voxels) hemispheres. The whole-brain searchlight analysis revealed additional clusters in the left putamen (-27/-3/14, cluster extent = 131 voxels) and left posterior cingulate cortex (-10/-28/41, cluster extent = 304 voxels).”

      “We then asked a second, complementary question: does the hippocampus represent all interactions, across characters, within a shared map? To test for this map-like structure, we repeated the analysis but now included all trial pairs, z-scoring distances globally rather than within character (Figure 3E-F). The remainder of the procedure followed the same logic as the preceding analysis. The hippocampus analysis revealed an extensive right hippocampal cluster (27/27/-14, cluster extent = 1667 voxels). The whole-brain analysis did not show any significant clusters.”

      We also describe the results in the discussion (page 12, lines 220-226): 

      “Then, we show that the hippocampus tracks the changing social locations (affiliation and power coordinates), above and beyond the effects of dimension or time; the hippocampus seemed to reflect both the changing within-character locations, tracking their locations over time, and locations across characters, as if in a shared map. Thus, these results suggest that the hippocampus does not just encode static character-related representations but rather tracks relationship changes in terms of underlying affiliation and power.”

      The manuscript's description of the decoding analysis is unclear regarding the variability of the decoded positions. The authors appear to decode the position of a character along a spline, which raises the question of whether this position correlates with time, since characters are more likely to be located further from the center in later trials. There is a concern that the decoded position may not solely reflect the hippocampal encoding of spatial location, but could also be influenced by an inherent temporal association. Given that a character's position at time t is likely to be similar to its positions at t−1 and t+1, it is crucial that the authors clearly articulate their approach to separating spatial representation from temporal autocorrelation. While this issue may have been addressed in the construction of the test set, the manuscript does not seem to adequately explain how such biases were mitigated in the training set.

      We agree that temporal confounding needs to be better accounted for, as our claims depend on space-like signals being separable from time-like ones. We address this in several ways in the revised manuscript.

      First, we emphasize that this is a narrative-based task, where temporal structure is relevant. As such, our analyses aim to demonstrate that effects go beyond simple temporal confounds, like trial order or time elapsed.

      Despite the temporal structure of the task, the decisions for the same character are spaced in time and interleaved with other characters’ decisions, reducing the chance that a simple temporal confound could explain trajectory-related effects. We now describe the task better in the revised methods (page 16, lines 314-318):

      “All six characters’ decision trials are interleaved with one another and with narrative slides. On average, after a decision trial for a given character, participants view ~11 narrative slides and complete ~3 decisions for other characters before returning to that same character, such that each character’s choices are separated by an average of ~20 seconds (range 12 seconds to 10 min).”

      To address temporal autocorrelation in the fMRI time series, we used SPM’s FAST algorithm. Briefly, FAST models temporal autocorrelation as a weighted combination of candidate correlation functions, using the best estimate to remove autocorrelated signal.

      We also now report the temporal autocorrelation profile of the hippocampal beta series in the supplement, including (pages 29-31, lines 593-656):

      “The Social Navigation Task is a narrative-based task, where the relationships with characters evolve over time; trial pairs that are close in time may have more similar fMRI patterns for reasons unrelated to social mapping (e.g., slow drift). It is important to account for the role of time in our analyses, to ensure effects go beyond simple temporal confounds, like the time between decision trials. To aid in this, we quantified how fMRI signals change over time using a pattern autocorrelation function across decision trial lags. We defined the left and right hippocampus and the left and right intracalcarine cortex using the Harvard-Oxford atlas and thresholded them at 50% probability. We chose intracalcarine cortex as an early visual control region that largely corresponds to primary visual cortex (V1), as it is likely to be driven by the visually presented narrative. We used the same trial-wise beta images as in the location similarity RSA (boxcar regressors spanning each decision trial’s reaction time). For each participant and region-of-interest (ROI), we extracted the decision trial-by-voxel beta matrix and quantified three kinds of temporal dependence: beta autocorrelation, multivoxel pattern correlation and multivoxel pattern correlation after regressing out temporal distance.”

      “To estimate the temporal autocorrelation of the trial-wise beta values, we treated each voxel’s beta values as a time series across trials and measured how much a voxel’s response on one trial correlated (Pearson) with its response on previous trials. We averaged these voxelwise autocorrelations within each ROI. At one trial apart (lag 1), both the hippocampus and V1 showed small positive autocorrelations, indicating modest trial-to-trial carryover in response amplitude that was approximately 0 by three trials apart (see Supplemental figure 1).”

      “Because our representational similarity analyses depend on trial-by-trial pattern similarity, we also estimated how multivoxel patterns were autocorrelated over time. For each lag, we computed the Pearson correlation between each trial’s voxelwise pattern and the pattern from the trial that many trials earlier, then averaged those correlations to obtain a single autocorrelation value for that lag. At one trial apart, both regions showed positive autocorrelation, with V1 having greater autocorrelation than the hippocampus; pattern correlations between trials 3 or 4 trials apart reduced across participants, settling into low but positive values. Then, for each participant and ROI, we regressed out the effect of absolute trial onset differences from all pairwise pattern correlations, to mirror the effects of controlling for these temporal distances in regressions. After removing this temporal distance component, the short lag pattern autocorrelation dropped substantially in both regions. The similarity in autocorrelation profiles between the two regions suggests that significant similarity effects in the hippocampus are unlikely to be driven by generic temporal autocorrelation.”
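      The three kinds of temporal dependence described in these supplement excerpts can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the function names are hypothetical, and the data are a simulated first-order autoregressive beta series with evenly spaced onsets rather than real hippocampal or V1 betas.

```python
import numpy as np

def _rowwise_pearson(a, b):
    """Pearson r between matching rows of a and b."""
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

def beta_autocorrelation(betas, lag):
    """Mean over voxels of each voxel's trial series correlated with itself `lag` trials earlier (lag >= 1)."""
    return _rowwise_pearson(betas[lag:].T, betas[:-lag].T).mean()

def pattern_autocorrelation(betas, lag):
    """Mean over trials of each trial's voxel pattern correlated with the pattern `lag` trials earlier (lag >= 1)."""
    return _rowwise_pearson(betas[lag:], betas[:-lag]).mean()

def residual_pattern_autocorrelation(betas, onsets, max_lag):
    """Regress absolute onset differences out of all pairwise pattern correlations,
    then average the residuals at each trial lag from 1 to max_lag."""
    corr = np.corrcoef(betas)                      # trials x trials
    i, j = np.triu_indices(len(betas), k=1)
    r = corr[i, j]
    dt = np.abs(onsets[i] - onsets[j])
    X = np.column_stack([np.ones(len(dt)), dt])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    resid = r - X @ coef
    lags = j - i
    return np.array([resid[lags == k].mean() for k in range(1, max_lag + 1)])

# Simulated trial-by-voxel betas with slow drift (AR(1), coefficient 0.8).
rng = np.random.default_rng(1)
n_trials, n_voxels = 200, 100
betas = np.zeros((n_trials, n_voxels))
betas[0] = rng.normal(size=n_voxels)
for t in range(1, n_trials):
    betas[t] = 0.8 * betas[t - 1] + rng.normal(size=n_voxels)
onsets = 20.0 * np.arange(n_trials)                # toy onsets in seconds

raw_profile = [pattern_autocorrelation(betas, k) for k in range(1, 6)]
residual_profile = residual_pattern_autocorrelation(betas, onsets, max_lag=5)
```

      Comparing `raw_profile` with `residual_profile` mirrors the comparison in the quoted text between pattern autocorrelation before and after removing the temporal distance component.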

      “Relationship between behavioral location distance and temporal distance”

      “We also quantified how temporal distances between trials relate to their behavioral location distances, participant by participant. Our dimension similarity analysis controls for temporal distance between trials by design (see Social dimension similarity searchlight analysis), but our location similarity analysis does not. To decide on covariates to include in the analysis, we tested whether temporal distances can explain behavioral location distances. For each participant, we computed the correlations between trial pairs’ Euclidean distances in social locations and their linear temporal distances (“linear”) and the temporal distances squared (“quadratic”), to test for nonlinear effects. We then summarized the correlations using one-sample t-tests. The linear relationship was statistically significant (t<sub>49</sub> = 12.24, p < 0.001), whereas the quadratic relationship was not (t<sub>49</sub> = -0.55, p = 0.586). Similarly, in participant-specific regressions with both linear and quadratic temporal distances, the linear effect was significant (t<sub>49</sub> = 5.69, p < 0.001) whereas the quadratic effect was not (t<sub>49</sub> = 0.20, p = 0.84). Based on this, we included linear temporal distances as a covariate in our location similarity analyses (see Location similarity searchlight analyses), and verified that adding a quadratic temporal distance covariate does not alter the results. Thus, the reported location-related pattern similarity effects go beyond what can be explained by temporal distance alone.”
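      The participant-by-participant check described above might look like the sketch below. It assumes simulated participants whose social locations follow a random walk, so it will not reproduce the reported statistics. The helper names and the participant and trial counts are placeholders, and the t statistic is computed by hand rather than with a statistics package.

```python
import numpy as np

def pairwise_distances(locations, onsets):
    """Euclidean location distance and absolute time distance for all trial pairs."""
    i, j = np.triu_indices(len(onsets), k=1)
    loc = np.linalg.norm(locations[i] - locations[j], axis=1)
    dt = np.abs(onsets[i] - onsets[j])
    return loc, dt

def one_sample_t(values):
    """t statistic for the mean of `values` against zero (df = n - 1)."""
    values = np.asarray(values, dtype=float)
    return values.mean() / (values.std(ddof=1) / np.sqrt(len(values)))

rng = np.random.default_rng(2)
n_participants, n_trials = 20, 60
linear_r, quadratic_r = [], []
for _ in range(n_participants):
    # Toy trajectory: each trial moves +/-1 on one of two dimensions.
    dims = rng.integers(0, 2, size=n_trials)
    steps = np.zeros((n_trials, 2))
    steps[np.arange(n_trials), dims] = rng.choice([-1.0, 1.0], size=n_trials)
    locations = np.cumsum(steps, axis=0)
    onsets = 20.0 * np.arange(n_trials) + rng.uniform(0, 5, size=n_trials)

    loc_dist, time_dist = pairwise_distances(locations, onsets)
    linear_r.append(np.corrcoef(loc_dist, time_dist)[0, 1])
    quadratic_r.append(np.corrcoef(loc_dist, time_dist ** 2)[0, 1])

t_linear = one_sample_t(linear_r)        # tests whether the mean correlation differs from 0
t_quadratic = one_sample_t(quadratic_r)
```

      A reliably nonzero `t_linear` is the kind of result that would motivate including linear temporal distance as a covariate, as the authors did.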

      How was the free parameter of spectral clustering determined, if there is one?

      The interpretation of the number of hippocampal activity clusters is ambiguous. It is suggested that this number could fluctuate due to unique activity patterns or the fit to behaviorally defined trajectories. A lower number of clusters might indicate either a noisier or less distinct representation, raising the question of the necessity and interpretability of such a complex analysis. This concern is compounded by the potential sensitivity of the clustering to the variance in Euclidean distances of each trial's position relative to the center. If a character's position is consistently near the center, this could artificially reduce the perceived number of clusters. Furthermore, the manuscript should address whether there is any correlation between the number of clusters and behavioral performance. Specifically, what are the implications if participants are able to perform the task adequately with a smaller number of distinct hippocampal representation states?

      The rationale for conducting both cluster analysis and position decoding as separate analyses remains unclear. While cluster analysis can corroborate the findings of position decoding, it is not apparent why the authors chose to include trials across characters for cluster analysis but not for decoding analysis. An explanation of the reasoning behind this methodological divergence would help in understanding the distinct contributions of each analysis to the study's findings.

      The paper by Cohen et al. (1997), which provides the questionnaire for measuring the social network index, is not cited in the references. Upon reviewing the questionnaire that the author may have used, it appears that the term "social network size" does not refer to the actual size but to a score or index derived from the questionnaire responses. It may be more appropriate to replace the term "size" with a different term to more accurately reflect this distinction.

      Thank you for seeking these clarifications. Given the complexity of this analysis, we have decided to drop it to focus instead on our dimension and location representational similarity analysis results.

      Reviewer #2 (Recommendations for the authors):

      How did the participants' decisions on previous trials influence the future trials that the subjects saw? If the different participants were faced with different decision trials, then how did you compare their decision? If two participants made the same decisions, would they have seen exactly the same sequence of trials (see point X on how the trial sequence was randomized).

      All participants experience the same narrative, with the same decisions (i.e., the same available options); their choices (i.e., the options they select) are what implicitly shape each character’s affiliation and power locations, and thus each character’s trajectory. In other words, the narrative is fixed; what changes is the social coordinates assigned to each trial’s outcome depending on the participant’s choice of how to interact from the two narrative options. This means that we can meaningfully compare participants' neural patterns, given that every participant received the same text and images throughout.
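      As a concrete illustration of how a fixed narrative can still yield participant-specific trajectories, the sketch below turns one character's sequence of choices into affiliation and power coordinates. It assumes that each choice moves the character by +1 or -1 on the current trial's dimension and that locations are the running sum of these steps. The cumulative-sum rule and the function name are assumptions for illustration, not a description of the authors' code.

```python
import numpy as np

def social_trajectory(dimensions, choices):
    """Cumulative (affiliation, power) coordinates for one character.

    dimensions: per-trial dimension index (0 = affiliation, 1 = power)
    choices:    per-trial step chosen by the participant (+1 or -1)
    """
    dimensions = np.asarray(dimensions)
    choices = np.asarray(choices, dtype=float)
    steps = np.zeros((len(choices), 2))
    steps[np.arange(len(choices)), dimensions] = choices
    return np.cumsum(steps, axis=0)

# Same four trials (same dimensions), two participants with different choices.
dims = [0, 1, 0, 0]
participant_a = social_trajectory(dims, [+1, -1, +1, -1])
participant_b = social_trajectory(dims, [-1, -1, -1, +1])
```

      Both participants see identical trials, but their choices trace different paths through the two-dimensional space.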

      We have now added details on the narrative structure, replacing more ambiguous statements with a clearer description (page 16, lines 309-318):

      “The sequence of trials, including both narrative and decision trials, was fixed across participants; all that differs are the choices that the participants make. Narrative trials varied in duration, depending on the content (range 2-10 seconds), but were identical across participants. Decision trials always lasted 12 seconds, with two options presented until the participant made a choice, after which a blank screen was presented for the remainder of the duration. All six characters’ decision trials are interleaved with one another, and with the narrative slides. On average, after a decision trial for a given character, participants view ~11 narrative slides and complete ~3 decisions for other characters before returning to another decision with the same character, such that each character’s choices are separated by an average of ~20 seconds (ranging from 12 seconds to 10 min).”

      Figure 2B: I assume that "count" is "count of participants"? It would be good to indicate this on the axis/caption.

      Thank you for noting this. We have now removed this figure to improve the clarity of our figures. 

      "We have shown that the hippocampus represents the interaction decision trials along abstract social dimensions, but does it track each relationship's unique sequence of abstract social coordinates?" Please clarify what you mean by "represents the interaction decision trials".

      By “represents the interaction decision trials along abstract social dimensions”, we mean that when the participant makes a choice during the social interactions the hippocampal patterns represent the current social dimension of the choice (affiliation vs power). In other words, the hippocampal BOLD patterns differentiate affiliation and power decisions, consistent with our hypothesis of abstract social dimension representation in the hippocampus. We have clarified this (page 11, lines 185-187):

      “We have shown that the left hippocampus represents the affiliation and power trials differently, consistent with an abstract dimensional representation.”

      Page 8: "Hippocampal sequences are ordered like trajectories": It is not entirely clear to me what is meant by the split midpoint. Is this the midpoint of the piece-wise linear interpolation between two points, or simply the mean of all piecewise splines from one character? If the latter, is the null model the same as simply predicting the mean affiliation and power value for this character? If yes, please clarify and simplify this for the reader.

      Page 8: "Hippocampal sequences track relationship-specific paths". First, I was misled by the "relationship-specific". I first understood this to mean that you wanted to test whether two relationships (i.e. the identity of the partner) had different representations in Hippocampus, even if the power/affiliation trajectories are the same. I suggest changing the title of this section.

      The analysis in this section also breaks any temporal autocorrelation of measured patterns - so I am not sure if this is a strong analysis that should be interpreted at all. This analysis seems to not address the claim and conclusion that is drawn from it. I assume that the random trajectories have different choices and different affiliation/power values than the true trajectories. So the fact that the true trajectories can be better decoded simply shows that either choices or affiliation and power (or both) are represented in the neural code - but not necessarily anything beyond this.

      Page 9: "Neural trajectories reflect social locations, not just choices". The motivation of this analysis is not clear to me. As I understand this analysis, both social location and choices are changed from the real trajectories. How can it then show that it reflects social locations, not just the choices?

      Figure 4 caption: "on the -based approximation" Is there a missing "point"-[based] here?

      We agree with the reviewer that this analysis is hard to interpret and does not adequately address concerns regarding temporal autocorrelation, and as such we have removed it from the manuscript. We describe the new results that include controlling for temporal distance between trials (pages 11-12, lines 185-208):

      “We have shown that the left hippocampus represents the affiliation and power trials differently, consistent with an abstract dimensional representation. Does it also represent the changing social coordinates of each character? To test this, we used a multiple-regression RSA searchlight to test whether left hippocampus patterns represent the characters’ changing social locations across interactions (see Figure 3). We restricted the distances to those from trial pairs from the same character and standardized the distances within character (see Figure 3B-D). We controlled for temporal distance to ensure the effect was not explainable by the time between trials, and for whether the trials shared the same underlying dimension (affiliation or power; see Location similarity searchlight analyses for more details). At the group level, we controlled for sample and the average reaction time difference between affiliation and power trials. Using the same testing logic as the dimensionality similarity analysis, we first tested our hypothesis in the bilateral hippocampus and found widespread effects in both the left (peak voxel MNI x/y/z = -35/-22/-15, cluster extent = 1470 voxels) and right (peak voxel MNI x/y/z = 37/-19/-14, cluster extent = 1953 voxels) hemispheres. The whole-brain searchlight analysis revealed additional clusters in the left putamen (-27/-3/14, cluster extent = 131 voxels) and left posterior cingulate cortex (-10/-28/41, cluster extent = 304 voxels).”

      “We then asked a second, complementary question: does the hippocampus represent all interactions, across characters, within a shared map? To test for this map-like structure, we repeated the analysis but now included all trial pairs, z-scoring distances globally rather than within character (Figure 3E-F). The remainder of the procedure followed the same logic as the preceding analysis. The hippocampus analysis revealed an extensive right hippocampal cluster (27/27/-14, cluster extent = 1667 voxels). The whole-brain analysis did not show any significant clusters.”

      We emphasize that the results are robust to the inclusion of temporal distance squared, in the methods (pages 23-24, lines 493-496):

      “Although the square of this temporal distance did not explain any additional variance in behavioral distances, we ran a robustness analysis including both temporal distance and its square and saw qualitatively the same clusters with similar effect sizes.”

      Page 8: last paragraph: The text sounds like you have already shown that you can decode character identity from the patterns - but I do not believe you have at this point. I think this would be an interesting addition to the paper, though.

      This section has been removed, and we have been careful to not imply this in the current version of the manuscript. While we agree a character identity decoding would enrich our argument, we do not believe our task is well-suited to capture a character identity effect. Each character only has 12 decision trials, and these trials are partially clustered in time - this is one problem of temporal autocorrelation that we thank the reviewers for pushing us to consider in more detail. Dimension and location patterns, on the other hand, are more natural to analyze in our task, especially in representational similarity analyses that test whether the relevant differences scale with neural distances.

      Page 14ff: Why is "Analysis section" not part of "Materials and Methods"? I believe adding the analysis after a careful description of the methods would improve the clarity of this section.

      We agree with the reviewer and have now consolidated these two sections.

      Two or three examples of Affiliation and Power decision trials should be provided, so the reader can form a more thorough understanding of how these dimensions were operationalized. For the RSA analysis, it is important to consider other differences between these two types of trials.

      We agree that adding examples will clarify the operationalization of these dimensions. We now include example affiliation and power trials in a table (pages 17-18).

      We thank the reviewer for noting the need to rule out alternative hypotheses; we have added several such tests. Affiliation and power trials were not different in word count (page 17, lines 329-332):

      “To ensure that any observed neural or behavioral differences were not confounded by trivial features of the text, we tested for differences between the affiliation and power trials (where the two options are concatenated). There were no differences in word count (affiliation average = 26.6, power average = 25.6; t-test p = 0.56).”

      They were also not different in their sentiment, as assessed by a Large Language Model (LLM) analysis (page 17, lines 332-335): 

      “The text’s sentiment also did not differ between these trial types (t-test p = 0.72), as quantified by comparing sentiment compound scores (from most negative, −1, to most positive, +1), using a Large Language Model (LLM) specialized for sentiment analysis [26].”

      The affiliation and power trials were different in terms of semantic content, consistent with our assumptions (page 17, lines 337-347):

      “Our framework assumes that affiliation and power trials differ in their semantic content–that is, in the conceptual meaning of the text, beyond word count or sentiment. To test this assumption, we used an LLM-based semantic embedding analysis. Each decision trial was embedded into a semantic vector. We then measured the cosine similarity between pairs of trials and calculated the difference between average within-dimension similarity (affiliation-affiliation and power-power comparisons) and average between-dimension similarity (affiliation-power comparisons) and assessed its statistical significance with permutation testing (1,000 shuffles of trial labels). As expected, decision trials of the same dimension were more similar to each other than trials of different dimension, across multiple LLMs (OpenAI’s text-embedding-3-small [27]: similarity difference = 0.041, p < 0.001; all-MiniLM-L12-v2 [28]: similarity difference = 0.032, p < 0.001).”
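      The permutation logic described in the quoted passage can be sketched with NumPy alone. Random vectors stand in for the LLM embeddings here, since no embedding model is called. The one-sided p-value with a +1 correction and the function names are assumptions, as the quoted text does not specify these details.

```python
import numpy as np

def similarity_difference(embeddings, labels):
    """Mean within-label cosine similarity minus mean between-label cosine similarity."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    i, j = np.triu_indices(len(labels), k=1)       # each trial pair once
    same = labels[i] == labels[j]
    return sim[i, j][same].mean() - sim[i, j][~same].mean()

def permutation_test(embeddings, labels, n_perm=1000, seed=0):
    """Shuffle trial labels to build a null distribution for the similarity difference."""
    rng = np.random.default_rng(seed)
    observed = similarity_difference(embeddings, labels)
    null = np.array([
        similarity_difference(embeddings, rng.permutation(labels))
        for _ in range(n_perm)
    ])
    # One-sided p-value with the +1 correction (an assumption, see lead-in).
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Placeholder "embeddings": noise plus a label-dependent offset.
rng = np.random.default_rng(3)
labels = np.repeat([0, 1], 30)                     # 0 = affiliation, 1 = power
direction = rng.normal(size=32)
embeddings = rng.normal(size=(60, 32)) + np.where(labels[:, None] == 1, 1.0, -1.0) * direction

observed, p_value = permutation_test(embeddings, labels)
```

      With real data, `embeddings` would hold one vector per decision trial from whichever embedding model is used; the rest of the procedure is unchanged.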

      The affiliation and power trials were different in average reaction time. To control for this difference in the dimension RSA analysis, we added each participant’s absolute value reaction time difference between the trial types as a covariate. The results were nearly identical to what they were before. We updated the text to reflect this new control (page 23, lines 471-474):

      “However, there was a significant difference in the average reaction time between affiliation and power decisions across participants (t<sub>49</sub> = 6.92, p < 0.001; affiliation mean = 4.92 seconds (s), power mean = 4.51 s), so we controlled for this in the group-level analysis.”

      The exact implementation and timing of the behavioral tasks should be described better. How many narrative trials were intermixed with the decision trials? Which characters were they assigned to? How was the sequence of trials determined? Was it fixed across participants, or randomized?

      We agree that additional details are helpful. In the Methods, we now describe this with more detail (page 16, lines 301-318):

      “There are two types of trials: “narrative” trials where background information is provided or characters talk or take actions (a total of 154 trials), and “decision” trials where the participant makes decisions in one-on-one interactions with a character that can change the relationship with that character (a total of 63 trials). On each decision, participants used a button response box to select between the two options. The options (1 or 2, assigned to the index and middle fingers) and choice directions (+/-1 arbitrary unit on the current dimension) were counterbalanced.”

      “The sequence of trials, including both narrative and decision trials, was fixed across participants; all that differs are the choices that the participants make. Narrative trials varied in duration, depending on the content (range 2-10 seconds), but were identical across participants. Decision trials always lasted 12 seconds, with two options presented until the participant made a choice, after which a blank screen was presented for the remainder of the duration. All six characters’ decision trials are interleaved with one another, and with the narrative slides. On average, after a decision trial for a given character, participants view ~11 narrative slides and complete ~3 decisions for other characters before returning to another decision with the same character, such that each character’s choices are separated by an average of ~20 seconds (ranging from 12 seconds to 10 min).”

      What is the exact timing of trials during fMRI acquisition - i.e. how long were the trials, what was the ITI, were there long phases of rest to determine the resting baseline? These are all important factors that will determine the covariance between regressors and should be reported carefully. Ideally, I would like to see the trial-by-trial temporal auto-correlation structure across beta-weights to be reported.

      We thank the reviewer for asking for this clarification. We have added the following text to clarify the trial timing (page 16, lines 314-318):

      “All six characters’ decision trials are interleaved with one another and with narrative slides. On average, after a decision trial for a given character, participants view ~11 narrative slides and complete ~3 decisions for other characters before returning to that same character, such that each character’s choices are separated by an average of ~20 seconds (range 12 seconds to 10 min).”

      We now describe the temporal autocorrelation patterns in the supplement, including how we decided on how to control for temporal distance in representational similarity analyses (pages 29-31, lines 593-656):

      “The Social Navigation Task is a narrative-based task, where the relationships with characters evolve over time; trial pairs that are close in time may have more similar fMRI patterns for reasons unrelated to social mapping (e.g., slow drift). It is important to account for the role of time in our analyses, to ensure effects go beyond simple temporal confounds, like the time between decision trials. To aid in this, we quantified how fMRI signals change over time using a pattern autocorrelation function across decision trial lags. We defined the left and right hippocampus and the left and right intracalcarine cortex using the Harvard-Oxford atlas and thresholded them at 50% probability. We chose intracalcarine cortex as an early visual control region that largely corresponds to primary visual cortex (V1), as it is likely to be driven by the visually presented narrative. We used the same trial-wise beta images as in the location similarity RSA (boxcar regressors spanning each decision trial’s reaction time). For each participant and region-of-interest (ROI), we extracted the decision trial-by-voxel beta matrix and quantified three kinds of temporal dependence: beta autocorrelation, multivoxel pattern correlation and multivoxel pattern correlation after regressing out temporal distance.”

      “To estimate the temporal autocorrelation of the trial-wise beta values, we treated each voxel’s beta values as a time series across trials and measured how much a voxel’s response on one trial correlated (Pearson) with its response on previous trials. We averaged these voxelwise autocorrelations within each ROI. At one trial apart (lag 1), both the hippocampus and V1 showed small positive autocorrelations, indicating modest trial-to-trial carryover in response amplitude (see Supplemental figure 1); by three trials apart, the autocorrelation was approximately 0.”
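
      The voxelwise lag autocorrelation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the trials-by-voxels list layout, and the toy numbers are assumptions, and real data would normally be handled with an array library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def roi_beta_autocorrelation(betas, lag):
    """Average, over voxels, of each voxel's lag-`lag` beta autocorrelation.

    betas: list of trials, each a list of voxel betas (trials x voxels).
    lag:   positive integer number of trials separating the paired responses.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    n_voxels = len(betas[0])
    per_voxel = []
    for v in range(n_voxels):
        # Treat this voxel's betas as a time series across trials.
        series = [trial[v] for trial in betas]
        # Correlate each trial's response with the response `lag` trials earlier.
        per_voxel.append(pearson(series[lag:], series[:-lag]))
    return sum(per_voxel) / n_voxels


# Toy ROI with two voxels: one drifts steadily, one alternates in sign.
toy_betas = [[1, 1], [2, -1], [3, 1], [4, -1], [5, 1]]
lag1 = roi_beta_autocorrelation(toy_betas, 1)
lag2 = roi_beta_autocorrelation(toy_betas, 2)
```

      In the toy ROI the drifting voxel and the alternating voxel cancel at lag 1 and agree at lag 2, which shows why the ROI average is reported per lag rather than as a single number.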

      “Because our representational similarity analyses depend on trial-by-trial pattern similarity, we also estimated how multivoxel patterns were autocorrelated over time. For each lag, we computed the Pearson correlation between each trial’s voxelwise pattern and the pattern from the trial that many trials earlier, then averaged those correlations to obtain a single autocorrelation value for that lag. At one trial apart, both regions showed positive autocorrelation, with V1 having greater autocorrelation than the hippocampus; pattern correlations between trials 3 or 4 apart were reduced across participants, settling at low but positive values. Then, for each participant and ROI, we regressed out the effect of absolute trial onset differences from all pairwise pattern correlations, to mirror the effects of controlling for these temporal distances in regressions. After removing this temporal distance component, the short lag pattern autocorrelation dropped substantially in both regions. The similarity in autocorrelation profiles between the two regions suggests that significant similarity effects in the hippocampus are unlikely to be driven by generic temporal autocorrelation.”
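
      The two steps in this paragraph (lagged pattern correlation, then removal of the temporal-distance component from pairwise pattern correlations) can be sketched as below. This is an illustrative assumption about the computation, not the authors' pipeline: the function names are hypothetical, onsets are assumed to be in seconds, and the regression is a simple one-predictor least-squares fit with an intercept.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pattern_autocorrelation(betas, lag):
    """Mean correlation between each trial's pattern and the pattern `lag` trials earlier."""
    corrs = [pearson(betas[t], betas[t - lag]) for t in range(lag, len(betas))]
    return sum(corrs) / len(corrs)


def residualize(y, x):
    """Residuals of y after least-squares regression on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def time_corrected_similarity(betas, onsets):
    """Pairwise pattern correlations with absolute onset differences regressed out.

    Returns the list of trial pairs, their temporal distances, and the residual similarities.
    """
    n = len(betas)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    sims = [pearson(betas[i], betas[j]) for i, j in pairs]
    dts = [abs(onsets[i] - onsets[j]) for i, j in pairs]
    return pairs, dts, residualize(sims, dts)


# Toy data: 4 decision trials x 3 voxels, with made-up onsets in seconds.
toy_betas = [[1, 2, 3], [2, 1, 3], [3, 1, 2], [1, 3, 2]]
toy_onsets = [0, 10, 25, 45]
pairs, dts, resid = time_corrected_similarity(toy_betas, toy_onsets)
```

      By construction the residual similarities are uncorrelated with temporal distance, so any lag structure remaining in them cannot be attributed to a linear effect of time between trials.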

      “Relationship between behavioral location distance and temporal distance”

      “We also quantified how temporal distances between trials relate to their behavioral location distances, participant by participant. Our dimension similarity analysis controls for temporal distance between trials by design (see Social dimension similarity searchlight analysis), but our location similarity analysis does not. To decide on covariates to include in the analysis, we tested whether temporal distances can explain behavioral location distances. For each participant, we computed the correlations between trial pairs’ Euclidean distances in social locations and their linear temporal distances (“linear”) and the temporal distances squared (“quadratic”), to test for nonlinear effects. We then summarized the correlations using one-sample t-tests. The linear relationship was statistically significant (t<sub>49</sub> = 12.24, p < 0.001), whereas the quadratic relationship was not (t<sub>49</sub> = -0.55, p = 0.586). Similarly, in participant-specific regressions with both linear and quadratic temporal distances, the linear effect was significant (t<sub>49</sub> = 5.69, p < 0.001) whereas the quadratic effect was not (t<sub>49</sub> = 0.20, p = 0.84). Based on this, we included linear temporal distances as a covariate in our location similarity analyses (see Location similarity searchlight analyses), and verified that adding a quadratic temporal distance covariate does not alter the results. Thus, the reported location-related pattern similarity effects go beyond what can be explained by temporal distance alone.”
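
      A minimal sketch of the per-participant correlations and the group-level one-sample t statistic described above. The names, the two-dimensional location tuples, and the toy values are assumptions for illustration; whether the correlations were transformed (for example with Fisher's z) before the t-test is not stated in the response, so no transform is applied here, and the p-value is left to a statistics library.

```python
from math import dist, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def distance_time_correlations(locations, onsets):
    """Correlate pairwise location distance with linear and squared temporal distance.

    locations: one (x, y) social location per decision trial (hypothetical layout).
    onsets:    one onset time per decision trial.
    Returns (r_linear, r_quadratic) for a single participant.
    """
    n = len(locations)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    loc_d = [dist(locations[i], locations[j]) for i, j in pairs]
    linear = [abs(onsets[i] - onsets[j]) for i, j in pairs]
    quadratic = [d ** 2 for d in linear]
    return pearson(loc_d, linear), pearson(loc_d, quadratic)


def one_sample_t(values):
    """t statistic and degrees of freedom for a test of the mean against zero."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean / (sd / sqrt(n)), n - 1


# Toy participant whose location moves steadily over time.
r_lin, r_quad = distance_time_correlations([(0, 0), (1, 0), (2, 0), (3, 0)], [0, 1, 2, 3])

# Group level: one correlation per participant goes into the t-test (made-up values).
t_stat, df = one_sample_t([0.31, 0.22, 0.40, 0.18, 0.27])
```

      With 50 participants the same call would yield 49 degrees of freedom, matching the subscript on the reported t values.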

    1. Author response:

      We acknowledge the concerns raised by both reviewers and plan to address them in our revision:

      Regarding Reviewer #1's comments: We will strengthen the statistical framework and address the concerns about multiple comparison corrections. We will also expand our literature review to better motivate our hypotheses, particularly incorporating the work on lateralization patterns in MGN/LGN and the existing evidence on first-order thalamic nuclei in linguistic processing.

      Regarding Reviewer #2's comments: We acknowledge the valid concern that linguistic and non-linguistic stimuli differ beyond linguistic content, including some low-level sensory properties. We will elaborate on the creation and properties of these stimuli in the Methods section and upload stimuli examples to an online repository to provide transparency about differences. We will also add a discussion of this limitation in the Discussion section, acknowledging that disentangling effects of linguistic processing from low-level stimulus properties will require further testing in future research. Additionally, we will moderate part of our claims and reorganize the presentation of results as suggested, and clarify our contribution relative to existing literature.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Thach et al. report on the structure and function of trimethylamine N-oxide demethylase (TDM). They identify a novel complex assembly composed of multiple TDM monomers and obtain high-resolution structural information for the catalytic site, including an analysis of its metal composition, which leads them to propose a mechanism for the catalytic reaction.

      In addition, the authors describe a novel substrate channel within the TDM complex that connects the N-terminal Zn<sup>2+</sup>-dependent TMAO demethylation domain with the C-terminal tetrahydrofolate (THF)-binding domain. This continuous intramolecular tunnel appears highly optimized for shuttling formaldehyde (HCHO), based on its negative electrostatic properties and restricted width. The authors propose that this channel facilitates the safe transfer of HCHO, enabling its efficient conversion to methylenetetrahydrofolate (MTHF) at the C-terminal domain as a microbial detoxification strategy.

      Strengths:

      The authors provide convincing high-resolution cryo-EM structural evidence (up to 2 Å) revealing an intriguing complex composed of two full monomers and two half-domains. They further present evidence for the metal ion bound at the active site and articulate a plausible hypothesis for the catalytic cycle. Substantial effort is devoted to optimizing and characterizing enzyme activity, including detailed kinetic analyses across a range of pH values, temperatures, and substrate concentrations. Furthermore, the authors validate their structural insights through functional analysis of active-site point mutants.

      In addition, the authors identify a continuous channel for formaldehyde (HCHO) passage within the structure and support this interpretation through molecular dynamics simulations. These analyses suggest an exciting mechanism of specific, dynamic, and gated channeling of HCHO. This finding is particularly appealing, as it implies the existence of a unique, completely enclosed conduit that may be of broad interest, including potential applications in bioengineering.

      Weaknesses:

      Although the idea of an enclosed channel for HCHO is compelling, the experimental evidence supporting enzymatic assistance in the reaction of HCHO with THF is less convincing. The linear regression analysis shown in Figure 1C demonstrates a THF concentration-dependent decrease in HCHO, but the concentrations used for THF greatly exceed its reported KD (enzyme concentration used in this assay is not reported). It has previously been shown that HCHO and THF can couple spontaneously in a non-enzymatic manner, raising the possibility that the observed effect does not require enzymatic channeling. An additional control that can rule out this possibility would help to strengthen the evidence. For example, mutating the THF binding site to prevent THF binding to the protein complex could clarify whether the observed decrease in HCHO depends on enzyme-mediated proximity effects. A mutation which would specifically disable channeling could be even more convincing (maybe at the narrowest bottleneck).

      We agree with the reviewer that HCHO and THF can react spontaneously in a non-enzymatic manner, and our experiments were not intended to demonstrate enzymatic channeling. The linear regression analysis in Figure 1C was designed solely to confirm that HCHO reacts with THF under our assay conditions. Accordingly, THF was titrated over a broad concentration range starting from zero, and the observed THF concentration–dependent decrease in HCHO reflects this chemical reactivity.

      We do not interpret these data as evidence that the enzyme catalyzes or is required for the HCHO–THF coupling reaction. Instead, the structural observation of an enclosed channel is presented as a separate finding. We have clarified this point in the revised text to avoid overinterpretation of the biochemical data (page 2, line 16).

      Another concern is that the observed decrease in HCHO could alternatively arise from a reduced production of HCHO due to a negative allosteric effect of THF binding on the active site. From this perspective, the interpretation would be more convincing if a clear coupled effect could be demonstrated, specifically, that removal of the product (HCHO) from the reaction equilibrium leads to an increase in the catalytic efficiency of the demethylation reaction.

      We agree that, in principle, a decrease in detectable HCHO could also arise from an indirect effect of THF binding on enzyme activity. However, in our study the experiment was not designed to assess catalytic coupling or allosteric regulation. The assay in question monitors HCHO levels under defined conditions and does not distinguish between changes in HCHO production and downstream consumption.

      Additionally, we do not interpret the observed decrease in HCHO as evidence that THF binding enhances catalytic efficiency, or that removal of HCHO shifts the reaction equilibrium. Instead, the data are presented to establish that HCHO can react with THF under the assay conditions. Any potential allosteric effects of THF on the demethylation reaction, or kinetic coupling between HCHO removal and catalysis, are beyond the scope of the current study, and are not claimed.

      While the enzyme kinetics appear to have been performed thoroughly, the description of the kinetic assays in the Methods section is very brief. Important details such as reaction buffer composition, cofactor identity and concentration (Zn<sup>2+</sup>), enzyme concentration, defined temperature, and precise pH are not clearly stated. Moreover, a detailed methodological description could not be found in the cited reference (6), if I am not mistaken.

      Thank you for the suggestion. We have added reference [24] to the methodological description on page 8. The Methods section has been revised accordingly on page 8 under “TDM Activity Assay,” without altering the Zn<sup>2+</sup> concentration.

      The composition of the complex is intriguing but raises some questions. Based on SDS-PAGE analysis, the purified protein appears to be predominantly full-length TDM, and size-exclusion chromatography suggests an apparent molecular weight below 100 kDa. However, the cryo-EM structure reveals a substantially larger complex composed of two full-length monomers and two half-domains.

      We appreciate the reviewer’s careful analysis of the apparent discrepancy between the biochemical characterization and the cryo-EM structure. This issue is addressed in Figure S1, which may have been overlooked.

      As shown in Figure S1, the stability of TDM is highly dependent on protein and salt conditions. At 150 mM NaCl, SEC reveals a dominant peak eluting between 10.5 and 12 mL, corresponding to an estimated molecular weight of ~170–305 kDa (blue dot, Author response image 1). This fraction was explicitly selected for cryo-EM analysis and yields the larger complex observed in the reconstruction. At lower salt concentrations (50 mM) or higher (>150 mM NaCl), the protein either aggregates or elutes near the void volume (~8 mL).

      SDS–PAGE analysis detects full-length TDM together with smaller fragments (~40–50 kDa and ~22–25 kDa). The apparent predominance of full-length protein on SDS–PAGE likely reflects its greater staining intensity per molecule and/or a higher population, rather than the absence of truncated species.

      Author response image 1.

      Given the lack of clear evidence for proteolytic fragments on the SDS-PAGE gel, it is unclear how the observed stoichiometry arises. This raises the possibility of higher-order assemblies or alternative oligomeric states. Did the authors attempt to pick or analyze larger particles during cryo-EM processing? Additional biophysical characterization of particle size distribution - for example, using interferometric scattering microscopy (iSCAT)-could help clarify the oligomeric state of the complex in solution.

      Cryo-EM data were collected exclusively from the size-exclusion chromatography fraction eluting between 10.5 and 12 mL. This fraction was selected to isolate the dominant assembly in solution. Extensive 2D and 3D particle classification did not reveal distinct classes corresponding to smaller species or higher-order oligomeric assemblies. Instead, the vast majority of particles converged to a single, well-defined structure consistent with the 2 full-length + 2 half-domain stoichiometry.

      A minor subpopulation (~2%) exhibited increased flexibility in the N-terminal region of the two full-length subunits, but these particles did not form a separate oligomeric class, indicating conformational heterogeneity rather than alternative assembly states (Author response image 2). Together, these data support the 2+2½ architecture as the predominant and stable complex under the conditions used for cryo-EM. Additional techniques, such as iSCAT, would provide complementary information, but are not required to support the conclusions drawn from the SEC and cryo-EM analyses presented here.

      Author response image 2.

      The authors mention strict symmetry in the complex, yet C2 symmetry was enforced during refinement. While this is reasonable as an initial approach, it would strengthen the structural interpretation to relax the symmetry to C1 using the C2-refined map as a reference. This could reveal subtle asymmetries or domain-specific differences without sacrificing the overall quality of the reconstruction.

      We thank the reviewer for this thoughtful suggestion. In standard cryo-EM data processing, symmetry is typically not imposed initially to minimize potential model bias; accordingly, we first performed C1 refinement before applying C2 symmetry. The resulting C1 reconstructions revealed no detectable asymmetry or domain-specific differences relative to the C2 map. In addition, relaxing the symmetry consistently reduced overall resolution, indicating lower alignment accuracy and further supporting the presence of a predominantly symmetric assembly.

      In this context, the proposed catalytic role of Zn<sup>2+</sup> raises additional questions. Why is a 2:1 enzyme-to-metal stoichiometry observed, and how does this reconcile with previous reports? This point warrants discussion. Does this imply asymmetric catalysis within the complex? Would the stoichiometry change under Zn<sup>2+</sup>-saturating conditions, as no Zn<sup>2+</sup> appears to be added to the buffers? It would be helpful to clarify whether Zn<sup>2+</sup> occupancy is equivalent in both active sites when symmetry is not imposed, or whether partial occupancy is observed.

      The observed ~2:1 enzyme-to-Zn<sup>2+</sup> stoichiometry likely reflects the composition of the 2 full-length + 2 half-domain (2+2½) complex. In this assembly, only the core domains that are fully present in the complex contribute to metal binding. The truncated or half-domains lack the Zn<sup>2+</sup> binding domain. As a result, only two metal-binding sites are occupied per assembled complex, consistent with the measured stoichiometry.

      We note that Zn<sup>2+</sup> was not deliberately added to the buffers, so occupancy may not reflect full saturation. Based on our cryo-EM and biochemical data, both metal-binding sites in the full-length subunits appear to be occupied to an equivalent extent, and no clear evidence of asymmetric catalysis is observed under these current experimental conditions. Full Zn<sup>2+</sup> saturation could potentially increase occupancy, but was not explored in these experiments.

      The divalent ion Zn<sup>2+</sup> is suggested to activate water for the catalytic reaction. I am not sure if there is a need for a water molecule to explain this catalytic mechanism. Can you please elaborate on this more? As one aspect, it might be helpful to explain in more detail how Zn-OH and D220 are recovered in the last step before a new water molecule comes in.

      Thank you for your suggestion. We revised our text on page 2 as below.

      Based on our structural and biochemical data, we propose a structurally informed working model for TMAO turnover by TDM (Scheme 1). In this model, Zn<sup>2+</sup> plays a non-redox role by polarizing the O–H bond of the bound hydroxyl, thereby lowering its pK<sub>a</sub>. The D220 carboxylate functions as a general base, abstracting the proton to generate a hydroxide nucleophile. This hydroxide then attacks the electrophilic N-methyl carbon of TMAO, forming a tetrahedral carbinolamine (hemiaminal) intermediate. Subsequent heterolytic cleavage of the C–N bond leads to the release of HCHO. D220 then switches roles to act as a general acid, donating a proton to the departing nitrogen, which facilitates product release and regenerates the active site. This sequence allows a new water molecule to rebind Zn<sup>2+</sup>, enabling subsequent catalytic turnovers. This proposed pathway is consistent with prior mechanistic studies, in which water addition to the azomethine carbon of a cationic Schiff base generates a carbinolamine intermediate, followed by a rate-limiting breakdown to yield an amino alcohol and a carbonyl compound, in the published case, an aldehyde (Pihlaja et al., J. Chem. Soc. Perkin Trans. 2, 1983, 8, 1223–1226).

      Overall, the authors were successful in advancing our structural and functional understanding of the TDM complex. They suggest an interesting oligomeric complex composition which should be investigated with additional biophysical techniques.

      Additionally, they provide an intriguing hypothesis for a new type of substrate channeling. Additional kinetic experiments focusing on HCHO and THF turnover by enzymatic proximity effects would strengthen this potentially fundamental finding. If this channeling mechanism can be supported by stronger experimental evidence, it would substantially advance our understanding and knowledge of biologic conduits and enable future efforts in the design of artificial cascade catalysis systems with high conversion rate and efficiency, as well as detoxification pathways.

      Reviewer #2 (Public review):

      Summary:

      The manuscript reports a cryo-EM structure of TMAO demethylase from Paracoccus sp. This is an important enzyme in the metabolism of trimethylamine oxide (TMAO) and trimethylamine (TMA) in human gut microbiota, so new information about this enzyme would certainly be of interest.

      Strengths:

      The cryo-EM structure for this enzyme is new and provides new insights into the function of the different protein domains, and a channel for formaldehyde between the two domains.

      Weaknesses:

      (1) The proposed catalytic mechanism in this manuscript does not make sense. Previous mechanistic studies on the Methylocella silvestris TMAO demethylase (FEBS Journal 2016, 283, 3979-3993, reference 7) reported that, as well as a Zn<sup>2+</sup> cofactor, there was a dependence upon non-heme Fe<sup>2+</sup>, and proposed a catalytic mechanism involving deoxygenation to form TMA and an iron(IV)-oxo species, followed by oxidative demethylation to form DMA and formaldehyde.

      In this work, the authors do not mention the previously proposed mechanism, but instead say that elemental analysis "excluded iron". This is alarming, since the previous work has a key role for non-heme iron in the mechanism. The elemental analysis here gives a Zn content of about 0.5 mol/mol protein (and no Fe), whereas the Methylocella TMAO demethylase was reported to contain 0.97 mol Zn/mol protein, and 0.35-0.38 mol Fe/mol protein. It does, therefore, appear that their enzyme is depleted in Zn, and the absence of Fe impacts the mechanism, as explained below.

      The proposed catalytic mechanism in this manuscript, I am sorry to say, does not make sense to me, for several reasons:

      (i) Demethylation to form formaldehyde is not a hydrolytic process; it is an oxidative process (normally accomplished by either cytochrome P450 or non-heme iron-dependent oxygenase). The authors propose that a zinc (II) hydroxide attacks the methyl group, which is unprecedented, and even if it were possible, would generate methanol, not formaldehyde.

      (ii) The amine oxide is then proposed to deoxygenate, with hydroxide appearing on the Zn - unfortunately, amine oxide deoxygenation is a reductive process, for which a reducing agent is needed, and Zn<sup>2+</sup> is not a redox-active metal ion;

      (iii) The authors say "forming a tetrahedral intermediate, as described for metalloproteinase", but zinc metalloproteases attack an amide carbonyl to form an oxyanion intermediate, whereas in this mechanism, there is no carbonyl to attack, so this statement is just wrong.

      So on several counts, the proposed mechanism cannot be correct. Some redox cofactor is needed in order to carry out amine oxide deoxygenation, and Zn<sup>2+</sup> cannot fulfil that role. Fe<sup>2+</sup> could do, which is why the previously proposed mechanism involving an iron(IV)-oxo intermediate is feasible. But the authors claim that their enzyme has no Fe. If so, then there must be some other redox cofactor present. Therefore, the authors need to re-analyse their enzyme carefully and look either for Fe or for some other redox-active metal ion, and then provide convincing experimental evidence for a feasible catalytic mechanism. As it stands, the proposed catalytic mechanism is unacceptable.

      We thank the reviewer for the detailed and thoughtful mechanistic critique. We fully agree that Zn<sup>2+</sup> is not redox-active, and cannot directly mediate oxidative demethylation or amine oxide deoxygenation. We acknowledge that the oxidative step required for the conversion of TMAO to HCHO is not explicitly resolved in the present study. Accordingly, we have revised the manuscript to remove any implication of Zn<sup>2+</sup>-mediated redox chemistry, and have eliminated the previously imprecise analogy to zinc metalloproteases.

      We recognize and now discuss prior biochemical work on TMAO demethylase from Methylocella silvestris (MsTDM), which proposed an iron-dependent oxidative mechanism (Zhu et al., FEBS 2016, 3979–3993). That study reported approximately one Zn<sup>2+</sup> and one non-heme Fe<sup>2+</sup> per active enzyme, implicated iron in catalysis through homology modeling and mutagenesis, and used crossover experiments suggesting a trimethylamine-like intermediate and oxygen transfer from TMAO, consistent with an Fe-dependent redox process. However, that system lacked experimental structural information, and did not define discrete metal-binding sites.

      In contrast,

      (1) Our high-resolution cryo-EM structures and metal analyses of TDM consistently reveal only a single, well-defined Zn<sup>2+</sup>-binding site, with no structural evidence for an additional iron-binding site as in the previous report (Zhu et al., FEBS 2016, 3979–3993).

      (2) To investigate the potential involvement of iron, we expressed TDM in LB medium supplemented with Fe(NH<sub>4</sub>)<sub>2</sub>SO<sub>4</sub> and determined its cryo-EM structure. This structure is identical to the original one, and no EM density corresponding to a second iron ion was observed. Moreover, the previously proposed Fe<sup>2+</sup>-binding residues are spatially distant (Figure S6).

      (3) ICP-MS analysis detected zinc, whereas iron was undetectable (Figure S5).

      (4) The enzyme kinetics of our iron-free TDM are comparable to those of MsTDM (Figure 1A). We propose that the differences in Km and Vmax are due to differences in the overall sequences of the enzymes. Please also see the comment at the end on a newly published paper on MsTDM.

      While we cannot comment on the MsTDM results, our ‘experimental’ results do not support the presence of an iron-binding site. Our data indicate that this chemistry is unlikely to be mediated by a canonical non-heme iron center as proposed for MsTDM. We therefore revised our model as a structural framework that rationalizes substrate binding, metal coordination, and product stabilization, while clearly delineating the limits of mechanistic inference supported by the current data.

      Scheme 1 and the proposed mechanism section were revised on page 4. Figure S6 was added.

      (2) Given the metal content reported here, it is important to be able to compare the specific activity of the enzyme reported here with earlier preparations. The authors do quote a Vmax of 16.52 µM/min/mg; however, these are incorrect units for Vmax, they should be µmol/min/mg. There is a further inconsistency between the text saying µM/min/mg and the Figure saying µM/min/µg.

      Thank you for the correction. We converted the V<sub>max</sub> unit to nmol/min/mg and revised the text on page 2. We also added a comparison with the previously reported value for the TDM enzyme to the text on page 2. See also the note on a newly published manuscript and its comparison.

      (3) The consumption of formaldehyde to form methylene-THF is potentially interesting, but the authors say "HCHO levels decreased in the presence of THF", which could potentially be due to enzyme inhibition by THF. Is there evidence that this is a time-dependent and protein-dependent reaction? Also in Figure 1C, HCHO reduction (%) is not very helpful, because we don't know what concentration of formaldehyde is formed under these conditions; it would be better to quote in units of concentration, rather than %.

      We appreciate this important point. We have revised Figure 1C to present HCHO levels in absolute concentration units. While the current data demonstrate reduced detectable HCHO in the presence of THF, we agree that distinguishing between HCHO consumption and potential THF-mediated enzyme inhibition would require dedicated time-course and protein-dependence experiments. We have therefore revised the description to avoid overinterpretation and limit our conclusions to the observed changes in HCHO concentration (page 2, lines 18-19).

      (4) Has this particular TMAO demethylase been reported before? It's not clear which Paracoccus strain the enzyme is from; the Experimental Section just says "Paracoccus sp.", which is not very precise. There has been published work on the Paracoccus PS1 enzyme; is that the strain used? Details about the strain are needed, and the accession for the protein sequence.

      Thank you for this comment. We now indicate that the enzyme is derived from Paracoccus sp. DMF and provide the accession number for the protein sequence (WP_263566861) in the Experimental Section (page 8, line 4).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The ITC experiment requires a ligand-into-buffer titration as an additional control. Also, maybe I misunderstood the molar ratio or the concentrations you used, but if you indeed added a total of 4.75 μL of 20 μM THF into 250 μL of 5 μM TDM, it is not clear to me how this leads to a final molar ratio of 3.

      We thank the reviewer for this suggestion. A ligand-into-buffer control ITC experiment was performed and is now included in Figure S8C, which shows no appreciable signal.

      Regarding the molar ratio, this was our mistake. The experiment used 2.45 μL injections of 80 μM THF into 250 μL of 5 μM TDM. This corresponds to a final ligand concentration of ~12.8 μM, giving a ligand-to-protein molar ratio of ~2.6. We revised our text on page 9, ITC section.
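
      The arithmetic behind these corrected figures can be checked with a short sketch. The function names are hypothetical, and the second function assumes a simple dilution model (all injected volume stays in the cell, nothing is displaced), which is an assumption and not something stated in the response; the number of injections is also not stated, so it is not assumed here.

```python
def molar_ratio(ligand_um, protein_um):
    """Ligand-to-protein molar ratio from two concentrations in the same units."""
    return ligand_um / protein_um


def implied_injected_volume(syringe_um, final_um, cell_ul):
    """Total injected volume (uL) that gives `final_um` under simple dilution.

    Solves final = syringe * V / (cell + V) for V. Assumes no volume is
    displaced from the cell, which real instruments may not satisfy.
    """
    return final_um * cell_ul / (syringe_um - final_um)


# Values quoted in the response: 80 uM THF in the syringe, 250 uL of 5 uM TDM
# in the cell, and a final ligand concentration of about 12.8 uM.
ratio = molar_ratio(12.8, 5.0)                        # about 2.56, quoted as ~2.6
total_ul = implied_injected_volume(80.0, 12.8, 250.0)  # about 47.6 uL
```

      The quoted ratio of ~2.6 corresponds to dividing the final ligand concentration by the initial 5 μM protein concentration, that is, without correcting the protein for dilution.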

      (2) Characterization/quality check of all mutant enzymes should be performed by NanoDSF, CD spectroscopy or similar techniques to confirm that proteins are properly folded and fit for kinetic testing.

      We appreciate the reviewer’s suggestion. All mutant proteins, including D220A, D367A, and F327A, were purified with yields similar to the wild-type enzyme. Additionally, cryo-EM maps of the mutants show well-defined density and overall structural integrity consistent with the wild-type. These findings indicate that the introduced mutations do not significantly affect protein folding, supporting their use for kinetic analysis. While NanoDSF might reveal differences in thermal stability due to mutations, it does not provide structural information. Our conclusions are not based on minor differences in thermostability. Our cryo-EM structures of the mutants offer much more reliable structural data than CD spectroscopy.

      (3) Best practice would suggest overlapping pH ranges between different buffer systems in the pH-dependence experiments to rule out buffer-specific effects independent of pH.

      We thank the reviewer for this helpful suggestion. We agree that overlapping pH ranges between different buffer systems can be valuable for excluding buffer-specific effects. In this study, the pH-dependence experiments were intended to provide a qualitative assessment of pH sensitivity rather than a detailed analysis of buffer-independent pKa values. While we cannot fully exclude minor buffer-specific contributions, the overall trends observed were reproducible and sufficient to support the conclusions drawn. We have added a clarifying statement to the revised manuscript to reflect this consideration, page 2, line 12.

      (4) "Structural comparison revealed high similarity to a THF-binding protein, with superposition onto a T protein.": It would be nice to show this as an additional figure, as resolution and occupancy for THF are low.

      We thank the reviewer for this suggestion. To address this point, we have revised Figure S6 by adding an additional panel (C, now Figure S7C) showing the structural superposition of TDM with the THF-binding T protein. This comparison is included to better illustrate the structural similarity, despite the limited resolution and partial occupancy of THF density in our map.

      (5) Editing could have been done more thoroughly. Some spelling mistakes, e.g. "RESEULTS", "redius", "complec"; kinetic rate constants should be written in italic (not uniform between text and figures); Prism version is missing; Vmax of 16.52 µM/min/mg - doublecheck units; Figure S1B: The "arrow on the right" might have gone missing.

      We corrected the spelling on page 2 (~line 10), page 5 (~line 34), and page 6 (~line 40). The Prism version was added. The arrow was added to Figure S1B. The Vmax unit was corrected to nmol/min/mg.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors must re-examine the metal content of their purified enzyme, looking in particular for Fe or another redox-active metal ion, which could be involved in a reasonable catalytic mechanism.

      We thank the reviewer for this suggestion and have carefully re-examined the metal content of TDM. Elemental analyses by EDX and ICP-MS consistently detected Zn<sup>2+</sup> in purified TDM (Zn:protein ≈ 1:2), whereas Fe was below the detection limit across multiple independent preparations (Fig. S5A,B). To assess whether iron could be incorporated or play a functional role, we expressed TDM in E. coli grown in LB medium supplemented with Fe(NH<sub>4</sub>SO<sub>4</sub>)<sub>2</sub> and performed activity assays in the presence of exogenous Fe<sup>2+</sup>. Neither condition resulted in enhanced enzymatic activity.

      Consistent with these biochemical data, all cryo-EM structures reveal a single, well-defined metal-binding site coordinated by three conserved cysteine residues and occupied by Zn<sup>2+</sup>, with no evidence for an additional iron species or other redox-active metal site.

      (2) The specific activity of the enzyme should be quoted in the same units as other literature papers, so that the enzyme activity can be compared. It could be, for example, that the content of Fe (or other redox-active metal) is low, and that could then give rise to a low specific activity.

      Thank you for the suggestion. We now quote the enzyme activity in units consistent with previous reports and have revised the text on page 2.

      Since the submission of our paper, a new report on MsTDM has been published (Cappa et al., Protein Science 33(11), e70364). It further supports our findings. First, the reported kinetic parameters obtained using ITC (Vmax = 0.309 μmol/s, approximately 240 nmol/min/mg; Km = 0.866 mM) are comparable to our observed values (156 nmol/min/mg and 1.33 mM, respectively) in the absence of exogenous iron. Second, the optimal pH for enzymatic activity is similar to that observed in our paraTDM. Third, the reported two-state unfolding behavior is consistent with our cryo-EM structural observations, in which the more dynamic subunits appear to destabilize prior to unfolding of the core domains. Based on these findings, we now propose that Zn<sup>2+</sup> functions primarily as an organizational cofactor at the core catalytic domain (revised Scheme 1).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript describes critical intermediate reaction steps of a HA synthase at the molecular level; specifically, it examines the 2nd step, polymerization, adding GlcA to GlcNAc to form the initial disaccharide of the repeating HA structure. Unlike the vast majority of known glycosyltransferases, the viral HAS (a convenient proxy extrapolated to resemble the vertebrate forms) uses a single pocket to catalyze both monosaccharide transfer steps. The authors' work illustrates the interactions needed to bind & proof-read the UDP-GlcA using direct and '2nd layer' amino acid residues. This step also allows the HAS to distinguish the two UDP-sugars; this is very important as the enzymes are not known or observed to make homopolymers of only GlcA or GlcNAc, but only make the HA disaccharide repeats GlcNAc-GlcA.

      Strengths:

      Overall, the strengths of this paper lie in its techniques & analysis.

      The authors make significant leaps forward towards understanding this process using a variety of tools and comparisons of wild-type & mutant enzymes. The work is well presented overall with respect to the text and illustrations (especially the 3D representations), and the robustness of the analyses & statistics is also noteworthy.

      Furthermore, the authors make some strides towards creating novel sugar polymers using alternative primers & work with detergent binding to the HAS. The authors tested a wide variety of monosaccharides and several disaccharides for primer activity and observed that GlcA could be added to cellobiose and chitobiose, which are moderately close structural analogs to HA disaccharides. Did the authors also test the readily available HA tetramer (HA4, [GlcA-GlcNAc]2) as a primer in their system? This is a highly recommended experiment; if it works, then this molecule may also be useful for cryo-EM studies of CvHAS as well.

      The reviewer requested testing whether an HA tetrasaccharide could also serve as a glycosyl transfer acceptor for HAS. The commercially available HA tetrasaccharide (HA4) is terminated at its non-reducing end by GlcA; we therefore measured its effect on UDP-GlcNAc turnover kinetics. Titration of HA4 failed to elicit any detectable change in the UDP-GlcNAc turnover rate, indicating no priming. This is now mentioned in the main text and the data are shown in Fig. S9.

      Weaknesses:

      In the past, another report described the failed attempt to elongate short primers (HA4 & chitin oligosaccharides larger than the cello- or chitobiose that have activity in this report) with a vertebrate HAS, XlHAS1, an enzyme that seems to behave like the CvHAS (https://pubmed.ncbi.nlm.nih.gov/10473619/); this work should probably be cited and briefly discussed. It may be that the longer primers in the 1999 paper and/or the different construct or isolation specifics (detergent extract vs crude) were not conducive to the extension reaction, as the authors extracted recombinant enzyme.

      We apologize for the oversight. This reference is now cited (ref. 18) together with the description of the failed elongation of HA4 by CvHAS.

      There are a few areas that should be addressed for clarity and correctness, especially defining the class of HAS studied here (Class I-NR) as the results may (Class I-R) or may not (Class II) align (see comment (a) below), but overall, a very nicely done body of work that will significantly enhance understanding in the field.

      Done as requested

      Reviewer #2 (Public review):

      Summary:

      The paper by Stephens and co-workers provides important mechanistic insight into how hyaluronan synthase (HAS) coordinates alternating GlcNAc and GlcA incorporation using a single Type-I catalytic centre. Through cryo-EM structures capturing both "proofreading" and fully "inserted" binding poses of UDP-GlcA, combined with detailed biochemical analysis, the authors show how the enzyme selectively recognizes the GlcA carboxylate, stabilizes substrates through conformational gating, and requires a priming GlcNAc for productive turnover.

      These findings clarify how one active site can manage two chemically distinct donor sugars while simultaneously coupling catalysis to polymer translocation.

      The work also reports a DDM-bound, detergent-inhibited conformation that possibly illuminates features of the acceptor pocket, although this appears to be a purification artefact (it is indeed inhibitory) rather than a relevant biological state.

      Overall, the study convincingly establishes a unified catalytic mechanism for Type-I HAS enzymes and represents a significant advance in understanding HA biosynthesis at the molecular level.

      Strengths:

      There are many strengths.

      This is a multi-disciplinary study with very high-quality cryo-EM and enzyme kinetics (backed up with orthogonal methods of product analysis) to justify the conclusions discussed above.

      Weaknesses:

      There are few weaknesses.

      The abstract and introduction assume a lot of detailed prior knowledge about hyaluronan synthases, and in doing so, risk lessening the readership pool.

      A lot of discussion focuses on detergents (whose presence is totally inhibitory) and transfer to non-biological acceptors (at high concentrations). This risks weakening the manuscript.

      The abstract and parts of the introduction have been revised to address the reviewer’s concerns.

      Reviewer #1 (Recommendations for the authors):

      (1) As noted above, please state in title, abstract & introduction that this work is focused on a "Class I-NR HAS" (as described in Ref. #4), and NOT all HAS families...this is truly essential to note as someone working with the Pasteurella HAS version (Class II) would be totally misled & at this point, no one knows the Streptococcus HAS (Class I-R) mechanistic details which could be different due to its inverse molecular directionality of elongation compared to the CvHAS Class I-NR enzyme.

      Done as requested.

      (2) Page 6 - for the usefulness of the HAS mutants as being folded correctly, it was stated these mutants are suitable since they all 'purify' similarly...the use of the more proper term should probably be 'chromatograph', similarly suggesting similar hydrodynamic radii without massive folding issues.

      This has been revised to state that they all exhibited comparable size exclusion chromatography profiles.

      “All mutants share similar size exclusion chromatography profiles with the WT enzyme, suggesting that the substitutions do not cause a folding defect (Fig. S3).”

      (3) Page 7 - please check these sentences (& rest of paragraph?) as the meaning is not clear. "First, UDP-GlcNAc was titrated in the presence of excess UDP-GlcA, resulting in a response similar to the acceptor-free condition (Fig. 2C). However, the maximum reaction velocity at 20 mM UDP-GlcNAc was approximately 25% lower than that measured in the presence of UDP-GlcNAc only (Fig. 2C)."

      The paragraph has been revised to avoid confusion.

      (4) In Methods, please use an italicized 'g' for the centrifugation steps globally.

      Changed as requested

      (5) Please note the source/vendor for the HA standards on gels.

      Done

      (6) Page 35 - TLC section.

      (a) 'n-butanol' (with italic n) is the most widespread chemical name (not butan-1-ol).

      Done

      (b) Also, for all of the TLC images, the origin and the solvent front should be marked.

      Changed as suggested.

      Reviewer #2 (Recommendations for the authors):

      A number of minor issues should be addressed.

      (1) Abstract

      Two comments on the Abstract, which I found surprisingly weak given the quality of the work, and lacking a key detail.

      A major conceptual contribution of this work is the demonstration of how a single Type-I catalytic centre discriminates, positions, and transfers two chemically distinct substrates in an alternating pattern. This distinguishes HAS from dual-active-site (Type-II) glycosyltransferases and is important for understanding HA polymerization.

      However, this central point is not clearly articulated in the abstract. I suggest explicitly stating that HAS performs both GlcNAc and GlcA transfer reactions within a single catalytic site, and that the proofreading/inserted poses illuminate how this multifunctionality is achieved.

      The abstract currently ends with the observation of a DDM-bound, detergent-inhibited state. While this is interesting, it absolutely does not represent the central conceptual advance of the study and gives the abstract an artefactual ending.

      I strongly recommend revising the final sentences to emphasize the broader mechanistic insight and not an "artefact" (indeed, the enzyme is inactive in the presence of this detergent; it is thus a very unusual way to conclude an abstract).

      That is, finish with the wider implications of how HAS coordinates alternating substrate use, proofreading, and polymer translocation. Ending on the main mechanistic or biological significance would make the abstract considerably stronger and more aligned with the main message of the paper.

      The abstract has been revised thoroughly to reflect the important insights gained on CvHAS’ catalytic function and HA biogenesis in general.

      (2) Introduction

      The distinction between single active-centre enzymes, which transfer both sugars alternately, and twin catalytic domain enzymes that each perform one addition is surely central to the whole paper. But it is not discussed. Surely this has to be covered. There is a lot of work in this space, including, but not limited to:

      https://doi.org/10.1093/glycob/cwg085

      https://doi.org/10.1093/glycob/10.9.883

      https://doi.org/10.1093/glycob/cwad075 (includes this author team)

      Originally back to https://doi.org/10.1021/bi990270y

      If the authors instead assume such a level of knowledge for the reader, then surely they are writing for a specialist audience, not consistent with the wider readership ambitions of eLife?

      The Introduction has been revised as suggested by the reviewer, providing necessary background to frame our description of the Chlorella virus HAS. We made a deliberate effort to put new insights into a broader context.

      (3) Results and Discussion

      DDM "was observed for >50% of the analysed particles". I struggled with this. I couldn't understand how the authors selected particles that did or did not contain DDM. The main body text states: "To our surprise, careful sorting of the UDP-GlcA supplemented cryo EM dataset revealed a CvHAS subpopulation that was not bound to the substrate, but, instead, a DDM molecule near the active site (Fig 3A and S7). This was observed for >50% of the analyzed particles."

      That reads like there is one sample with two populations. But the figures and the methods section suggest differently: they suggest two samples with different data-collection regimes. That does not match the main text. Could this be clarified?

      Yes, that wasn’t explained well. We clarified the text to stress that the DDM-bound sample came from a dataset that was intended to resolve a UDP-GlcA-bound state, but instead revealed the inhibition by DDM.

      Also in this space, in the modern world, "nominal magnification" has no real meaning, and calibrated pixel size would be more appropriate. Can this be given, please?

      The relevant Methods section now states: “imaging of … was performed at a calibrated pixel size of 0.652 Å”.

      The discovery of DDM in the active site is surprising. But it is an inhibitory artefact. Is this section pushed a little too hard? Also, "The coordination of DDM's maltoside moiety, an α-linked glucose disaccharide, is consistent with priming by cellobiose and chitobiose." I'm not sure why an α-linked maltose is consistent with the binding of a β-linked cellobiose. That makes no sense. There will be no other enzymes where starch and cellulose oligos are mutually accepted. Consider rewriting.

      We would like to stress the DDM coordination because it could lead to the development of compounds that can really function as inhibitors, either for HAS or other related enzymes. In the observed DDM binding pose, the alpha-linkage is not recognized. Instead, the reducing end glucosyl unit stacks against Trp342 while the non-reducing unit extends into the catalytic pocket. Hence, a similar binding pose is conceivable for cellobiose and potentially also for chitobiose. The relevant section has been reworded.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Review:

      Reviewer #1 (Public review):

      Ewing sarcoma is an aggressive pediatric cancer driven by the EWS-FLI oncogene. Ewing sarcoma cells are addicted to this chimeric transcription factor, which represents a strong therapeutic vulnerability. Unfortunately, targeting EWS-FLI has proven to be very difficult and better understanding how this chimeric transcription factor works is critical to achieving this goal. Towards this perspective, the group had previously identified a DBD-𝛼4 helix (DBD) in FLI that appears to be necessary to mediate EWS-FLI transcriptomic activity. Here, the authors used multi-omic approaches, including CUT&tag, RNAseq, and MicroC to investigate the impact of this DBD domain. Importantly, these experiments were performed in the A673 Ewing sarcoma model where endogenous EWS-FLI was silenced, and EWS-FLI-DBD proficient or deficient isoforms were re-expressed (isogenic context). They found that the DBD domain is key to mediate EWS-FLI cis activity (at msat) and to generate the formation of specific TADs. Furthermore, cells expressing DBD deficient EWS-FLI display very poor colony forming capacity, highlighting that targeting this domain may lead to therapeutic perspectives.

      This new version of the study comprises as requested new data from an additional cell line. The new data has strengthened the manuscript. Nevertheless, some of the arguments of the authors pertaining to the limitations of immunoblots to assess stability of the DBD constructs or the poor reproducibility of the Micro C data remain problematic. While the effort to repeat MicroC in a different cell line is appreciated, the data are as heterogeneous as those in A673 and no real conclusion can be drawn. The authors should tone down their conclusions. If DBD has a strong effect on chromatin organization, it should be reproducible and detectable. The transcriptomic and cut and tag data are more consistent and provide robust evidence for their findings at these levels. 

      We agree that the Micro-C data have more apparent heterogeneity within and across cell lines as compared to other analyses such as our included CUT&Tag and RNA-seq. We addressed the possible limitations of the technique as well as inherent biology that might be driving these findings in our previous responses. Despite the poor clustering on the PCA plots, our analyses of differential interacting regions, TADs and loops remain consistent across both cell lines. We are confident that these findings reflect the context of transcriptional regulation by the constructs, and therefore the role of the alpha-helix in modulating chromatin organization. To address the concerns raised by the editors and reviewers about the strength of the conclusions we drew from the Micro-C findings, we have changed the language used to describe them throughout the manuscript. These changes are outlined below.

      • On lines 70-71, "is required to restructure" was changed to "is implicated in restructuring of"

      • On line 91, "is required for" was changed to "participates in"

      • On line 98, "is required for" was changed to "is potentially required for"

      • On lines 360-361, "is required for restructuring" was changed to "participates in restructuring"

      Concerning the issue of stability of the DBD and DBD+ constructs, a simple protein half-life assay (e.g. cycloheximide chase assay) could rule out any bias here and satisfactorily address the issue.

      While we generally agree that a cycloheximide assay is a relatively simple approach to look at protein half-life, as we discussed last time, the assays included in this paper are performed at equilibrium and rely on the concentration of protein at the time of the assay. This is particularly true for assays involving crosslinking, like Micro-C. As discussed in our prior response, western blots are semi-quantitative at best, even when normalized to a housekeeping protein. In analyzing the relative protein concentration of DBD vs. DBD+, with relative protein intensities first normalized to tubulin and using the wildtype EWSR1::FLI1 rescue as a reference point, we find that there is no statistical difference in the samples used for Micro-C here (Author response image 1A) or across all of the samples that we have used for publication (Author response image 1B). This does show that DBD generally has more variable expression levels relative to wildtype EWSR1::FLI1, and this is consistent with our experience in the lab.

      Nonetheless, we did attempt to perform the requested cycloheximide chase experiment to determine protein stability. Unfortunately, despite an extensive number of troubleshooting attempts, we have not been able to get good expression of DBD for these experiments. The first author who performed this work has left the lab and we have moved to a new lab space since the benchwork was performed. We continue to try to troubleshoot to get this experimental system for DBD and DBD+ to work again. When we tried to look at stability of DBD+ following cycloheximide treatment, there did appear to be some difference in protein stability (Author response image 2). However, these conditions are not the same conditions as those we published, they do not meet our quality control standards for publication, and we are concerned about being close to the limit of detection for DBD throughout the later timepoints. Additional studies will be needed with more comparable expression levels between DBD and DBD+ to satisfactorily address the reviewer concerns.

      Author response image 1.

      Expression Levels of DBD and DBD+ Across Experiments. Expression levels of DBD and DBD+ protein based on western blot band intensity normalized by tubulin band intensity. Expression levels are relative to wildtype EWSR1::FLI1 rescue levels and are calculated for (A) A673 samples used for micro-C and (B) all published studies of DBD and DBD+. P-values were calculated with an unpaired t-test.
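
      The normalization described in this legend can be sketched as follows. This is a minimal illustration only, not the authors' analysis code: the function names and the example numbers in the tests are hypothetical, and the sketch assumes a classic pooled-variance (Student) unpaired t-test, which the legend does not specify.

```python
from math import sqrt
from statistics import mean, stdev


def normalized_expression(target, tubulin, wt_target, wt_tubulin):
    """Band intensity normalized to tubulin, relative to the wildtype rescue lane.

    All four arguments are raw densitometry values (hypothetical inputs).
    """
    return (target / tubulin) / (wt_target / wt_tubulin)


def unpaired_t_statistic(a, b):
    """Student's t statistic for two independent samples (pooled variance).

    Assumes equal variances; each sample needs at least two values.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
```

      Converting the statistic to a P-value requires the t-distribution with na + nb - 2 degrees of freedom; a statistics package such as SciPy (`scipy.stats.ttest_ind`) would normally be used for that step.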

      Author response image 2.

      CHX chase assay to determine the stability of DBD and DBD+. (A) Knock-down of endogenous EWSR1::FLI1 detected with FLI1 ab and rescue with DBD and DBD+ detected with FLAG ab. (B) CHX chase assay to determine the stability of DBD and DBD+ in A-673 cells with quantification of the protein levels (n=3). Error bars represent standard deviation. The half-lives (t1/2) of DBD and DBD+ are listed in the table.
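
      For readers unfamiliar with how a half-life is commonly derived from a chase time course, the sketch below shows one standard approach. It assumes first-order (single-exponential) decay and fits a straight line to the log-transformed fraction of protein remaining; the response does not state which fitting method was used, and the function name and inputs are hypothetical.

```python
from math import log


def half_life(times_h, fraction_remaining):
    """Estimate t1/2 from a chase time course, assuming first-order decay.

    times_h: time points after cycloheximide addition (hours).
    fraction_remaining: protein level at each time point relative to time 0
    (all values must be > 0).
    """
    ys = [log(f) for f in fraction_remaining]
    n = len(times_h)
    t_bar = sum(times_h) / n
    y_bar = sum(ys) / n
    # Ordinary least-squares slope of ln(fraction) versus time.
    slope = sum((t - t_bar) * (y - y_bar) for t, y in zip(times_h, ys)) / sum(
        (t - t_bar) ** 2 for t in times_h
    )
    if slope >= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return log(2) / -slope
```

      Signals near the limit of detection at late time points, as noted in the response above, would distort a log-linear fit, so such points are usually excluded or down-weighted.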

      Suggestions:

      The Reviewing Editor and a referee have considered the revised version and the responses of the referees. While the additional data included in the new version has consolidated many conclusions of the study, the MicroC data in the new cell line are also heterogeneous and as the authors argue, this may be an inherent limitation of the technique. In this situation, the best would be for the authors to avoid drawing robust conclusions from this data and to acknowledge its current limitations.

      As discussed above, we have changed the language regarding our conclusions from micro-C data to soften the conclusions we draw per the Editor’s suggestion.

      The referee and Reviewing Editor also felt that the arguments of the authors concerning a lack of firm conclusions on the stability of EWS-FLI1 under +/-DBD conditions could be better addressed. We would urge the authors to perform a cycloheximide chase type assay to assess protein half-life. These types of experiments are relatively simple to perform and should address this issue in a satisfactory manner.

      As discussed above, we do not feel that differences in protein stability would affect the results here because the assays performed required similar levels of protein at equilibrium. Our additional analyses in this response show that there are no significant differences between DBD and DBD+ levels in samples that pass quality control and are used in published studies. However, we attempted to address the reviewer and editor comments with a cycloheximide chase assay and were unable to get samples that would have passed our internal quality control standards. These data may suggest differences in protein stability, but it is unclear that these conditions accurately reflect the conditions of the published experiments, or that this would matter with equal protein levels at equilibrium.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study by Howe and colleagues investigates the role of the posterolateral cortical amygdala (plCoA) in mediating innate responses to odors, specifically attraction and aversion. By combining optogenetic stimulation, single-cell RNA sequencing, and spatial analysis, the authors identify a topographically organized circuit within plCoA that governs these behaviors. They show that specific glutamatergic neurons in the anterior and posterior regions of plCoA are responsible for driving attraction and avoidance, respectively, and that these neurons project to distinct downstream regions, including the medial amygdala and nucleus accumbens, to control these responses.

      Strengths:

      The major strength of the study is the thoroughness of the experimental approach, which combines advanced techniques in neural manipulation and mapping with high-resolution molecular profiling. The identification of a topographically organized circuit in plCoA and the connection between molecularly defined populations and distinct behaviors is a notable contribution to understanding the neural basis of innate motivational responses. Additionally, the use of functional manipulations adds depth to the findings, offering valuable insights into the functionality of specific neuronal populations.

      Weaknesses:

      There are some weaknesses in the study's methods and interpretation. The lack of clarity regarding the behavior of the mice during head-fixed imaging experiments raises the possibility that restricted behavior could explain the absence of valence encoding at the population level.

      We agree with the idea that head-fixation may alter the state of the animal and the neural encoding of odor. To address this, we have provided further analysis of walking behavior during the imaging sessions, which is provided in Figure S2. Overall, we could not identify any clear patterns in locomotor behavior that are odor-specific. Moreover, when neural activity was sorted by behavioral state (walking, pausing or fleeing), we did not observe any apparent patterns in odor-evoked neural activity. This is now discussed in the Results and Limitations sections of the manuscript.

      Furthermore, while the authors employ chemogenetic inhibition of specific pathways, the rationale for this choice over optogenetic inhibition is not fully addressed, and this could potentially affect the interpretation of the results.

      The rationale was logistical. First, inhibition over a timescale of minutes is problematic because of heat generation during prolonged optical stimulation. Second, our behavioral apparatus has a narrow height between the ceiling and floor, making tethering difficult. This is now explained in the Results section. The trade-off of using chemogenetics is that we are silencing neurons and not specific projections. However, because we find that NAc- and MeA-projecting neurons have little shared collateralization, we believe the conclusion of divergent pathways still stands. This is now discussed in the Limitations section.

      Additionally, the choice of the mplCoA for manipulation, rather than the more directly implicated anterior and posterior subregions, is not well-explained, which could undermine the conclusions drawn about the topographic organization of plCoA.

      We targeted the middle region of plCoA because it contains a mixture of cell types found in both the anterior and posterior plCoA, allowing us to test the hypothesis that cell types, not intra-plCoA location, elicit different responses. Had we targeted the anterior or posterior regions, we would expect to simply recapitulate the result from activation of random cells in each region. As a result, we think stimulation in the middle plCoA is a better test for the contribution of cell types. We have now clarified this in the text.

      Despite these concerns, the work provides significant insights into the neural circuits underlying innate behaviors and opens new avenues for further research. The findings are particularly relevant for understanding the neural basis of motivational behaviors in response to sensory stimuli, and the methods used could be valuable for researchers studying similar circuits in other brain regions. If the authors address the methodological issues raised, this work could have a substantial impact on the field, contributing to both basic neuroscience and translational research on the neural control of behavior.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by the Root laboratory and colleagues describes how the posterolateral cortical amygdala (plCoA) generates valenced behaviors. Using a suite of methods, the authors demonstrate that valence encoding is mediated by several factors, including spatial localization of neurons within the plCoA, glutamatergic markers, and projection. The manuscript shows convincingly that multiple features (spatial, genetic, and projection) contribute to overall population encoding of valence. Overall, the authors conduct many challenging experiments, each of which contains the relevant controls, and the results are interpreted within the framework of their experiments.

      Strengths:

      - For a first submission the manuscript is well constructed, containing lots of data sets and clearly presented, in spite of the abundance of experimental results.

      - The authors should be commended for their rigorous anatomical characterizations and posthoc analysis. In the field of circuit neuroscience, this is rarely done so carefully, and when it is, often new insights are gleaned as is the case in the current manuscript.

      - The combination of molecular markers, behavioral readouts and projection mapping together substantially strengthen the results.

      - The focus on this relatively understudied brain region in the context of valence is well appreciated, exciting and novel.

      Weaknesses:

      - Interpretation of calcium imaging data is very limited and requires additional analysis and behavioral responses specific to odors should be considered. If there are neural responses behavioral epochs and responses to those neuronal responses should be displayed and analyzed.

      We have now considered this, see response above.

      - The effect of odor habituation is not considered.

      We considered this, but we did not find any apparent differences in valence encoding as measured by the proportion of neurons with significant valence scores across trials (see Figure 1J).

      - Optogenetic data in the two subregions relies on very careful viral spread and fiber placement. The current anatomy results provided should be clear about the spread of virus in A-P, and D-V axis, providing coordinates for this, to ensure readers the specificity of each sub-zone is real.

      We were careful to exclude animals for improper targeting. The spread of virus is detailed in Figures S3, S8 & S9.

      - The choice of behavioral assays across the two regions doesn't seem balanced and would benefit from more congruency.

      The 4-quadrant assay was chosen because this study builds on our prior experiments that demonstrate a role for the plCoA in innate behavior. It is noteworthy that the responses to odor seen in this assay are generally in agreement with other olfactory behavioral assays, so one wouldn’t predict a different result. Moreover, the approach and avoidance responses measured in this assay are precisely the behaviors we wish to understand. We did examine other non-olfactory behavioral readouts (Figures S3, S8), and didn’t observe any effect of manipulation of these pathways.

      - Rationale for some of the choices of photo-stimulation experiment parameters isn't well defined.

      The parameters for photo-stimulation were based on those used in our past work (Root et al., 2014). We used a gradient of frequency from 1-10 Hz based on the idea that odor likely exists in a gradient and this was meant to mimic a potential gradient, though we don’t know if it exists. The range in stimulation frequencies appears to align with the actual rate of firing of plCoA neurons (Iurilli et al., 2017).

      Reviewer #3 (Public review):

      Summary:

      Combining electrophysiological recording, circuit tracing, single cell RNAseq, and optogenetic and chemogenetic manipulation, Howe and colleagues have identified a graded division between anterior and posterior plCoA and determined the molecular characteristics that distinguish the neurons in this part of the amygdala. They demonstrate that the expression of slc17a6 is mostly restricted to the anterior plCoA whereas slc17a7 is more broadly expressed. Through both anterograde and retrograde tracing experiments, they demonstrate that the anterior plCoA neurons preferentially projected to the MEA whereas those in the posterior plCoA preferentially innervated the nucleus accumbens. Interestingly, optogenetic activation of the aplCoA drives avoidance in a spatial preference assay whereas activating the pplCoA leads to preference. The data support a model that spatially segregated and molecularly defined populations of neurons and their projection targets carry valence specific information for the odors. The discoveries represent a conceptual advance in understanding plCoA function and innate valence coding in the olfactory system.

      Strengths:

      The strongest evidence supporting the model comes from single cell RNASeq, genetically facilitated anterograde and retrograde circuit tracing, and optogenetic stimulation. The evidence clearly demonstrates two molecularly defined cell populations with differential projection targets. Stimulating the two populations produced opposite behavioral responses.

      Weaknesses:

      There are a couple of inconsistencies that may be addressed by additional experiments and careful interpretation of the data.

      Stimulating aplCoA or slc17a6 neurons results in spatial avoidance, and stimulating pplCoA or slc17a7 neurons drives approach behaviors. On the other hand, the authors and others in the field also show that there is no apparent spatial bias in odor-driven responses associated with odor valence. This discrepancy may be addressed better. A possibility is that odor-evoked responses are recorded from populations outside of those defined by slc17a6/a7. This may be addressed by marking activated cells and identifying their molecular markers. A second possibility is that optogenetic stimulation activates a broad set of neurons and does not recapitulate the sparseness of odor responses. It is not known whether sparse activation by optogenetic stimulation can still drive approach or avoidance behaviors.

      We agree that marking specific genetically defined or projection-defined neurons could help to clarify whether some neurons have more selective valence responses. However, we are not able to perform these experiments at the moment. We have included new data demonstrating that sparser optogenetic activation evokes behaviors similar in magnitude to those evoked by broader activation (see Figure S4).

      The authors show that inhibiting slc17a7 neurons blocks approach behaviors toward 2-PE. Consistent with this result, inhibiting NAc projection neurons also inhibits approach responses. However, inhibiting aplCoA or slc17a6 neurons does not reduce the aversive response to TMT, but blocking MEA projection neurons does. The latter two pieces of evidence are not consistent with each other. One possibility is that the MEA-projecting neurons may not be expressing slc17a6. It is not clear from the retrograde labeling experiments what percentage of MEA- and NAc-projecting neurons express slc17a6 and slc17a7. It is possible that neurons expressing neither VGluT1 nor VGluT2 could drive aversive or appetitive responses. This possibility may also explain why silencing slc17a6 neurons does not block avoidance.

      We have now performed RNAscope staining on retrograde tracing to better define this relationship. Although the VGluT2 and VGluT1 neurons have biased projections to the MeA and NAc, respectively, there is some nuance detailed in Figure S10. Generally, MeA-projecting neurons are predominantly VGluT2+, whereas about 20% of NAc-projecting neurons express both. Some (less than 35%) retrogradely labeled neurons were not detected as VGluT1 or VGluT2 positive, suggesting that other populations could also contribute. We agree that the discrepancy between MeA-projection and VGluT2 silencing is likely due to incomplete targeting of the MeA-projecting population with the VGluT2-cre line. This is included in the Discussion section.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Main:

      (1) For the head-fixed imaging experiments, what is the behavior of the mice during odor exposure? Could the weak reliability of individual neurons be due to a lack of approach or avoidance behavior? Could restricted behavior also explain the lack of valence encoding at the population level?

      We agree that this is a limitation of head-fixed recordings. In the revised manuscript we did attempt to characterize their behavioral response, and look for correlations in odor representation. Although we did find different patterns of odor-evoked walking behavior, these patterns were not reliable or specific to particular odors (Figure S2). For example, one might expect aversive odors to pause walking or elicit a fast fleeing-like response, but we did not observe any apparent differences for locomotion between odors as all odors evoked a mixture of responses (Figure S2A-D, text lines 208-232). We then examined responses to odor depending on the behavioral state (walking, pausing or fleeing) and didn’t observe any apparent patterns in odor responses (Figure S2E,F). Lastly, we acknowledge in the text that the lack of valence encoding may be an artifact of head-fixation (see lines 849-857).

      (2) For the optogenetic manipulations of Vglut1 and Vglut2 neurons, why was the injection and fiber targeted to the medial portion of the plCoA, if the hypothesis was that these glutamatergic neuron populations in different regions (anterior or posterior) are responsible for approach and avoidance? 

      We targeted the middle region of plCoA because it contains a mixture of cell types found in both the anterior and posterior plCoA, allowing us to test the hypothesis that cell types, not intra-plCoA location, elicit different responses. Had we targeted the anterior or posterior regions, we would expect to simply recapitulate the result from activation of random cells in each region. As a result, we think stimulation in the middle plCoA is a better test for the contribution of cell types. We have clarified this in the text (Lines 417-419).

      Could this explain the lack of necessity with the DREADD experiments? 

      For the loss of function experiments, a larger volume of virus was injected to cover a larger area and we did confirm targeting of the appropriate areas. Though, it is always possible that the lack of necessity is due to incomplete silencing.

      Further, why was an optogenetic inhibition approach not utilized? 

      Although optogenetic inhibition could have plausibly been used instead, we chose chemogenetic inhibition for two reasons: First, for minutes-long periods of inhibition, optical illumination poses the risk of introducing heat-related effects (Owen et al., 2019). In fact, we first tried optical inhibition but controls exhibited unusually large variance. Second, chemogenetics is more feasible in our assay, whose narrow height between the floor and lid complicates tethering to an optic fiber. Past experiments overcame this with a motorized fiber retraction system (Root et al., 2014), but this is highly variable with user-dependent effects, so we found chemogenetics to be a more practical strategy. We have added a sentence to explain the rationale (see lines 561-563).

      (3) The specific subregion of the nucleus accumbens that was targeted should be named, as distinct parts of the nucleus accumbens can have very different functions. 

      We attempted to define specific subregions of the nucleus accumbens and found that the plCoA projection is not specific to the shell or core, anterior or posterior; rather, it broadly innervates the entire structure. We have added a note about this in the manuscript (see lines 470-471). Given that we did not find notable subregion-specific outputs within the NAc, targeting was directed to the middle region of NAc, with coordinates stated in the methods.

      (4) Why was an intersectional DREADD approach used to inhibit the projection pathways, as opposed to optogenetic inhibition? The DREADD approach could potentially affect all projection targets, and the authors might want to address how this could influence the interpretation of the results.

      This is partly addressed above in point 2. As for interpretation, we acknowledge that the intersectional approach silences the neurons projecting to a given target and not the specific projection, and we have been careful with the wording. Although this may complicate the conclusion, we did map the collaterals for NAc- and MeA-projecting neurons and found that neurons do not appreciably project to both targets and have minimal projections to other targets. We have now taken care to state that we silence the neurons projecting to a structure, not the projection itself, and we acknowledge this caveat. However, since the MeA- and NAc-projecting neurons appear to be distinct from each other (largely not collateralizing to each other), the conclusion that these divergent pathways are required still stands. We have added discussion of this in the Limitations section (see lines 859-863).

      Minor:

      (1) Line 402 needs a reference.

      We have added the missing reference (now line 441).

      (2) The Supplemental Figure labeling in the main text should be checked carefully.

      Thank you for pointing this out. We have fixed the prior errors.

      (3) Panel letter D is missing from Figure 2.

      This has been fixed.

      Reviewer #2 (Recommendations for the authors):

      Major Concerns, additional experiments:

      - In the calcium imaging experiments mice were presented with the same odor many times. Overall responses to odor presentations were quite variable and appear to habituate dramatically (Figure S1F). The general conclusion from these experiments is a lack of consistent valence-specific responses of individual neurons, but I wonder if this conclusion is slightly premature. A few potential explanatory factors that may need additional attention are: First, despite recording video of the mouse's face during experiments, no behavioral response to any odor is described. Is it possible these odors when presented in head-fixed conditions do not have the same valence?

      Yes, we agree that this is a possibility. We have added a discussion in the Limitations section (see lines 849-857). We have also added additional behavioral analysis discussed below.

      On trials with neural responses are there behavioral responses that could be quantified? 

      We have now added data in which we attempt to characterize their behavioral response, to look for correlations in odor representation (see lines 208-228). Although we did observe different patterns of odor-evoked walking behavior, these patterns were not reliable or specific to particular odors (Figure S2). One might expect aversive odors to pause walking or elicit a fast fleeing-like response, but we did not observe any apparent differences for locomotion between odors (Figure S2A-D). Next, we examined responses to odor depending on the behavioral state (walking, pausing or fleeing) and didn’t observe any meaningful differences in odor responses (Figure S2E,F). Lastly, we acknowledge that the odor representation may be different in freely moving animals that exhibit dynamic responses to odor (see lines 849-857).

      - Habituation seems to play a prominent role in the neural signals, is there a larger contribution of valence if you look only at the first delivery (or some subset of the 20 presentations) of an odor type for a given trial? 

      Indeed, we considered this, but we did not find any apparent differences in valence encoding as measured by the proportion of neurons with significant valence scores across trials (see Figure 1J).

      - Is it reasonable to exclude valence encoding as a possibility when largely neurons were unresponsive to the positive valence odors (2PE and peanut) chosen when looking at the average cluster response (Figure 1F)? 

      It is true that we see fewer neurons responding to the appetitive odors (Figure 1H) and smaller average responses within the cluster, but some neurons do respond robustly. If these were valence responses, we would predict that neural responses should be similarly selective, but we do not observe any such selectivity. The sparseness of responses to appetitive odors does cause the average cluster analysis (Figure 1F) to show muted responses to these odors, consistent with the decreased responsivity to appetitive odors. Moreover, single neuron response analysis reveals that a given neuron is not more likely to respond to appetitive or aversive odors with any selectivity greater than chance. For these reasons, we think it is reasonable to conclude an absence of valence responses, which is consistent with the conclusion from another report (Iurilli et al., 2017).

      - While the preference and aversion assay with 4 corners is an interesting set-up and provides a lot of data for this particular manuscript. It would be helpful to test additional behaviors to determine whether these circuits are more conserved. As it stands the current manuscript relies on very broad claims using a single behavioral readout. Some attempts to use head-fixed approaches with more defined odor delivery timelines and/or additional valenced behavioral readouts is warranted.

      We appreciate the suggestion, but are not able to perform these experiments at the moment. We chose the 4-quadrant assay because it builds on our prior experiments that demonstrate a role for the plCoA in innate behavior. It is noteworthy that the responses to odor seen in this assay are generally in agreement with other olfactory behavioral assays, so one wouldn’t predict a different result. The approach and avoidance responses measured in this assay are precisely the behaviors we wish to understand. Moreover, we did examine other non-olfactory behavioral readouts (Figures S3, S8), and didn’t observe any effect of manipulation of these pathways. Lastly, we have tried to define parameters for head-fixed behavior that would permit correlation of neural responses with behavior, including longer stimulations and closed-loop locomotion control of odor concentration, but were unsuccessful at establishing parameters that generated reliable behavioral responses. We acknowledge that one limitation of the study is that behavior was tested with only two odors, so it remains unknown whether the circuits are more broadly necessary for other odors.

      Minor comments:

      • Please define PID in the Results when it is first introduced.

      Done (see line 154)

      • Line 412 Figure S5C-N should be Figure S6C-N.

      Fixed. Now Figure S8C-N due to additional figures (see line 451).

      • Throughout the Discussion it would be helpful if the authors referred to specific Figure panels that support their statements (e.g. lines 654-656 "[...] which is supported by other findings presented here showing that both VGluT2+ and VGluT1+ neurons project to MeA, while the projection to NAc is almost entirely composed of VGluT1+ neurons").

      Thank you for the suggestion. We have added figure references in the discussion.

      • Line 778 "producing" should be "produce".

      Corrected (see line 840)

      • The figures are very busy, especially all the manipulations. The authors are commended for including each data point, but they might consider a more subtle design (translucent lines only for each animal, and one mean dot for the SEM), just to reduce the overall clutter of an already overwhelming figure set. But this is ultimately left to the authors to resolve and style to their liking. 

      Thank you for the suggestion. We have tried some different styles but like the original best.

      Reviewer #3 (Recommendations for the authors):

      If within reach, I suggest that the authors determine the percentage of neurons retrogradely labeled from NAc or MEA that express VGluT1 and VGluT2.

      We have done this for the middle region of plCoA, which has the greatest mixture of cell types (see Figure S10, lines 504-517). We find that the MeA-projecting neurons are mostly VGluT2+ with a minority that express both VGluT1 and VGluT2. NAc-projecting neurons are primarily VGluT1+ with about 20% expressing VGluT2 as well.

      It would also be nice to sparsely label aplCoA and pplCoA using ChR2 to see if sparse activation drives approach or avoidance.

      We agree that it would be useful to vary the sparseness of the ChR2 expression, to see if it produces similar results. We examined this using sparsely labeled odor ensembles, as previously done (Root et al., 2014). Briefly, we used the Arc-CreER mouse to label TMT-responsive neurons with a cre-dependent ChR2 AAV vector targeted to the anterior or posterior regions, while previously we had broadly targeted the entirety of plCoA. We had established that this labeling method captures about half of the active cells detected by Arc expression, which is on the order of hundreds of neurons rather than the thousands labeled by broad cre-independent expression. Remarkably, we get effects similar in magnitude that are not significantly different from those with broader activation of the anterior or posterior domains (see new Figure S4, lines 267-288). It still remains possible that there is a threshold number of neurons that are necessary to elicit behavior, but that is beyond the scope of the current study. However, these data indicate that the effect of activating anterior and posterior domains is not an artifact of broad stimulation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      We appreciate the positive assessment. We recognize that since all of the work in this manuscript was done in vitro, there are reasonable concerns about the translatability of these data to clinical settings. These results should not directly inform malaria policy, but we hope that these data bring new considerations to the approach for choosing strategic antimalarial combinations. We have modified the manuscript to clarify this distinction.

      Public Reviews

      Reviewer #1 (Public Review):

      We thank the reviewer for their thoughtful summary of this manuscript. It is important to note that DHA-PPQ did show antagonism in RSAs. In this modified RSA, 200 nM PPQ alone inhibited growth of PPQ-sensitive parasites approximately 20%. If DHA and PPQ were additive, then we would expect that addition of 200 nM PPQ would shift the DHA dose response curve to the left and result in a lower DHA IC50. Please refer to Figure 4a and b as examples of additive relationships in dose-response assays. We observed no significant shift in IC50 values between DHA alone and DHA + PPQ. This suggests antagonism, albeit not to the extent seen with CQ. We have modified the manuscript to emphasize this point. As the reviewer pointed out, it is fortunate that despite being antagonistic, clinically used artemisinin-4-aminoquinoline combinations are effective, provided that parasites are sensitive to the 4-aminoquinoline. It is possible that superantagonism is required to observe a noticeable effect on treatment efficacy (Sutherland et al. 2003 and Kofoed et al. 2003), but that classical antagonism may still have silent consequences. For example, if PPQ blocks some DHA activation, this might result in DHA-PPQ acting more like a pseudo-monotherapy. However, as the reviewer pointed out, while our data suggest that DHA-PPQ and AS-ADQ are “non-optimal” combinations, the clinical consequences of these interactions are unclear. We have modified the manuscript to emphasize the latter point.

      While the Ac-H-FluNox and ubiquitin data point to a likely mechanism for DHA-quinoline antagonism, we agree that there are other possible mechanisms to explain this interaction. We have addressed this limitation in the discussion section. Though we tried to measure DHA activation in parasites directly, these attempts were unsuccessful. We acknowledge that the chemistry of DHA and Ac-H-FluNox activation is not identical and that caution should be taken when interpreting these data. Nevertheless, we believe that Ac-H-FluNox is the best currently available tool to measure “active heme” in live parasites and is the best available proxy to assess DHA activation in live parasites. These points are now addressed in the discussion section. Both in vitro and in-parasite studies point to a role for CQ in modulating heme, though an exact mechanism will require further examination. Similar to the reviewer, we were perplexed by the differences observed between in vitro and in-parasite assays with PPQ and MFQ. We proposed possible hypotheses to explain these discrepancies in the discussion section. Interestingly, our data correlate well with hemozoin inhibition assays in which all three antimalarials inhibit hemozoin formation in solution, but only CQ and PPQ inhibit hemozoin formation in parasites. In both assays, in-parasite experiments are likely to be more informative for mechanistic assessment.

      It remains unclear why K13 genotype influences RSA values, but not early ring DHA IC50 values. In K13<sup>WT</sup> parasites, both RSA values and DHA IC50 values were increased 3-5 fold upon addition of CQ. This suggests that CQ-mediated resistance is more robust than that conferred by K13 genotype. However, this does not necessarily suggest a different resistance mechanism. We acknowledge that in addition to modulating heme, it is possible that CQ may enhance DHA survival by promoting parasite stress responses. Future studies will be needed to test this alternative hypothesis. This limitation has been acknowledged in the manuscript. We have also addressed the reviewer’s point that other factors, including poor pharmacokinetic exposure, contributed to OZ439-PPQ treatment failure.

      Reviewer #2 (Public Review):

      We appreciate the positive feedback. We agree that there have been previous studies, many of which we cited, assessing interactions of these antimalarials. We also acknowledge that previous work, including our own, has shown that parasite genetics can alter drug-drug interactions. We have included the author’s recommended citations to the list of references that we cited. Importantly, our work was unique not only for utilizing a pulsing format, but also for revealing a superantagonistic phenotype, assessing interactions in an RSA format, and investigating a mechanism to explain these interactions. We agree with the reviewer that implications from this in vitro work should be cautious, but hope that this work contributes another dimension to critical thinking about drug-drug interactions for future combination therapies. We have modified the manuscript to temper any unintended recommendations or implications.

      The reviewer notes that we conclude “artemisinins are predominantly activated in the cytoplasm”. We recognize that the site of artemisinin activation is contentious. We were very clear to state that our data combined with others suggest that artemisinins can be activated in the parasite cytoplasm. We did not state that this is the primary site of activation. We were clear to point out that technical limitations may prevent Ac-H-FluNox signal in the digestive vacuole, but determined that low pH alone could not explain the absence of a digestive vacuole signal.

      With regard to the “reproducibility” and “mechanistic definition” of superantagonism, we observed what we defined as a one-sided superantagonistic relationship for three different parasites (Dd2, Dd2 PfCRT<sup>Dd2</sup>, and Dd2 K13<sup>R539T</sup>) for a total of nine independent replicates. In the text, we define that these isoboles are unique in that they had mean ΣFIC50 values > 2.4 and peak ΣFIC50 values >4 with points extending upward instead of curving back to the axis. As further evidence of the reproducibility of this relationship, we show that CQ has a significant rescuing effect on parasite survival to DHA as assessed by RSAs and IC50 values in early rings.
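
      The numeric part of the criteria above can be illustrated with a minimal sketch. It assumes the conventional definition of ΣFIC50 (the IC50 of each drug in combination divided by its IC50 alone, summed over both drugs) and uses the cutoffs quoted in this response (mean ΣFIC50 > 2.4, peak ΣFIC50 > 4). The function names and the example values are hypothetical and are not taken from the manuscript; the isobole-shape criterion described in the response is not captured here.

```python
from statistics import mean


def sum_fic50(ic50_a_combo, ic50_a_alone, ic50_b_combo, ic50_b_alone):
    """Return the sum of fractional IC50s for one fixed-ratio point."""
    return ic50_a_combo / ic50_a_alone + ic50_b_combo / ic50_b_alone


def summarize_isobole(points, ic50_a_alone, ic50_b_alone):
    """Return (mean, peak) ΣFIC50 over all fixed-ratio points.

    `points` is a list of (IC50 of drug A in combination,
    IC50 of drug B in combination) pairs, one per drug ratio.
    """
    values = [sum_fic50(a, ic50_a_alone, b, ic50_b_alone) for a, b in points]
    return mean(values), max(values)


def meets_superantagonism_thresholds(mean_value, peak_value,
                                     mean_cutoff=2.4, peak_cutoff=4.0):
    """Check only the numeric cutoffs quoted in the response.

    Isobole shape (points extending upward rather than curving back
    to the axis) must still be judged separately.
    """
    return mean_value > mean_cutoff and peak_value > peak_cutoff


# Hypothetical values, in units of each drug's own IC50 (alone = 1.0).
example_points = [(3.0, 0.5), (2.0, 2.5), (1.0, 1.5)]
example_mean, example_peak = summarize_isobole(example_points, 1.0, 1.0)
example_flag = meets_superantagonism_thresholds(example_mean, example_peak)
```

      Under these assumptions, a purely additive pair would give ΣFIC50 values near 1 at every ratio and would not pass either cutoff.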

      Reviewer #3 (Public Review):

      We thank the reviewer for their positive feedback. We acknowledge that no combinations tested in this manuscript were synergistic. However, two combinations, DHA-MFQ and DHA-LM, were additive, which provides context for interpreting antagonistic relationships. We have previously reported synergistic and additive isobolograms for peroxide-proteasome inhibitor combinations using this same pulsing format (Rosenthal and Ng 2021). These published results are now cited in the manuscript.

      We believe that these findings are specific to 4-aminoquinoline-peroxide combinations, and that these findings cannot be generalized to antimalarials with different mechanisms of action. Note that the aryl amino alcohols, MFQ and LM, were additive with DHA. Since the mechanism of action of MFQ and LM are poorly understood, it is difficult to speculate on a mechanism underlying these interactions.

      We agree with the reviewer that while the heme probe may provide some mechanistic insight to explain DHA-quinoline interactions, there is much more to learn about CQ-heme chemistry, particularly within parasites.

      The focus of this manuscript was to add a new dimension to considerations about pairings for combination therapies. It is outside the scope of this manuscript to suggest alternative combinations. However, we agree that synergistic combinations would likely be more strategic clinically.

      An in vitro setup allows us to eliminate many confounding variables in order to directly assess the impact of partner drugs on DHA activity. However, we agree that in vivo conditions are far more complex, and explicitly state this.

      We agree that in the future, modeling studies could provide insight into how antagonism may contribute to real-world efficacy. This is outside the scope of our studies.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      The key weaknesses identified in this manuscript are described in the 'weaknesses' section of the public review. The major one is the inconsistency around the H-FluNox response in the chemical vs biological experiments. I can't think of a simple experiment to resolve this issue, but it is good that this data is openly provided in the manuscript. I believe there could be more discussion to clarify this limitation with the current study, and the conclusions, and particularly the title, should be softened regarding the mechanism of antagonism being based on heme reactivity.

      We have softened the title and conclusions to take into account the limitations of our studies.

      (1) Please double-check the definitions for isobologram interpretation. In most antimicrobial interaction studies, I see the threshold for antagonism at sumFIC50 of 1.5, or even 2. 1.25 is often interpreted as additive in many studies.

      We acknowledge that different studies use various cutoff values. Our interpretations for additive versus antagonistic versus superantagonistic were based not only on mean ΣFIC50 values, but also isobologram shape. For example, the flat isoboles for MFQ-DHA were clearly distinct from the curved isoboles of PPQ-DHA. It is unclear what cutoff value(s) would be most clinically relevant.

      (2) For the MFQ-PPQ interaction study, please make it clear that these drugs have very long half-lives (weeks), so the 4 h pulse assay isn't really relevant to their overall activity. It probably shows a slower onset of action, but there is plenty of drug remaining for many days in the clinical scenario, so perhaps the data from the traditional 48h assay is more relevant. The same consideration applies to OZ439, which may impact the interpretation of that data.

      We have now included the half-lives of these compounds in the discussion section. Our intent was to use a pulsing format to make these isobolograms comparable with the other assays. It is important to note that pulses can reveal stronger phenotypes that might be missed with traditional methods. Thus, while 48 h assays may better mimic in vivo conditions, they could also mask important phenotypes.

      Reviewer #3 (Recommendations for the Authors):

      I have included most of my concerns in the public review. Below are some additional specific points for consideration:

      (1) It is expected to include a synergistic combination as a control (e.g., artemisinin + lumefantrine) to contextualize the degree of antagonism observed. The experimental design should show some synergistic profiles in comparison. Adding a few experiments by including a synergistic control is needed.

      Both MFQ-DHA and LM-DHA combinations were additive, which provides context for antagonistic combinations. This is now stated in the results section pertaining to Figure 1. We have also included a reference to our previous publication in which we demonstrated that proteasome inhibitor-peroxide combinations are synergistic to additive using this same pulsing format.

      (2) Consider in vivo validation or pharmacokinetic/pharmacodynamic modeling to strengthen the translational relevance of the findings when it comes to doses and the IC50 correlations.

      We agree that this would be useful to do in future, but it is outside the scope of the current study.

      (3) It would be beneficial to include a discussion section on how the findings are generalizable to different Plasmodium falciparum genotypes (3D7, Dd2, MRA-1284) and their relevance.

      Findings were consistent across three parasite backgrounds depending on PfCRT genotype. This point has been included in the discussion section. The background of these parasites is also provided in Table 1.

      (4) Potential evaluation criteria to understand where certain combinations should be reconsidered can be included as a suggestion for the wider audience.

      Our in vitro studies suggest that pulsing isobolograms would be a useful assay to include when evaluating combination therapies. While we believe that synergistic combinations would be more strategic than antagonistic combinations, we cannot provide evaluation criteria or make recommendations for reconsidering currently used combinations.

      (5) Further elaborate on the mechanistic basis of heme inactivation by quinolines. If data are available, please include more data on the specificity of the process.

      Despite our best efforts, we were unable to evaluate quinoline-heme interactions in parasites. Even in vitro, this interaction has remained elusive for decades. We agree that this would be an important future step towards supporting a specific mechanism for quinoline-DHA antagonism.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study generated 3D cell constructs from endometrial cell mixtures that were seeded in the Matrigel scaffold. The cell assemblies were treated with hormones to induce a "window of implantation" (WOI) state. Although many bioinformatic analyses point in this direction, there are major concerns that must be addressed.

      Strengths:

      The addition of 3 hormones to enhance the WOI state (although not clearly supported in comparison to the secretory state).

      Comments on revisions:

      The authors did their best to revise their study according to the Reviewers' comments. However, the study remains unconvincing, incomplete and at the same time still too dense and not focused enough.

      Reviewer #2 (Public review):

      Zhang et al. have developed an advanced three-dimensional culture system of human endometrial cells, termed a receptive endometrial assembloid, that models the uterine lining during the crucial window of implantation (WOI). During this mid-secretory phase of the menstrual cycle, the endometrium becomes receptive to an embryo, undergoing distinctive changes. In this work, endometrial cells (epithelial glands, stromal cells, and immune cells from patient samples) were grown into spheroid assembloids and treated with a sequence of hormones to mimic the natural cycle. Notably, the authors added pregnancy-related factors (such as hCG and placental lactogen) on top of estrogen and progesterone, pushing the tissue construct into a highly differentiated, receptive state. The resulting WOI assembloid closely resembles a natural receptive endometrium in both structure and function. The cultures form characteristic surface structures like pinopodes and exhibit abundant motile cilia on the epithelial cells, both known hallmarks of the mid-secretory phase. The assembloids also show signs of stromal cell decidualization and an epithelial-mesenchymal transition-like process at the implantation interface, reflecting how real endometrial cells prepare for possible embryo invasion.

      Although the WOI assembloid represents an important step forward, it still has limitations: the supportive stromal and immune cell populations decrease over time in culture, so only earlypassage assembloids retain full complexity. Additionally, the differences between the WOI assembloid and a conventional secretory-phase organoid are more quantitative than absolute; both respond to hormones and develop secretory features, but the WOI assembloid achieves a higher degree of differentiation due to the addition of "pregnancy" signals. Overall, while it's a reinforced model (not an exact replica of the natural endometrium), it provides a valuable in vitro system for implantation studies and testing potential interventions, with opportunities to improve its long-term stability and biological fidelity in the future.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      This study generated 3D cell constructs (i.e., assembloids) that were treated with hormones to induce a 'window of implantation' (WOI) state. While the authors have made large efforts to address the reviewers' feedback, the study's findings remain unconvincing and incomplete.

      (1) The authors have appropriately revised the terminology from 'organoids' to 'assembloids' in several parts of the manuscript. However, this revision remains incomplete, as the main title, figure legends, and figure titles still contain the incorrect term. A thorough review of the entire manuscript is recommended to ensure consistent and accurate use of terminology.

      Thank you for your meticulous review. We have now conducted a full check and confirmed that terminology is used consistently and accurately throughout the text.

      (1) Previous comments raised concerns about the feasibility of robustly passaging assembloid structures - comprising epithelial, stromal and immune cells - under epithelial growth conditions. The authors responded by stating that they optimized the expansion medium with a stromal cell-promoting factor. Additionally, rather than conducting scRNA-seq on both early and late passages (P6-P10) as suggested, they performed immunofluorescence staining, which confirmed the persistence of stromal cells at passage 6. However, the presence of immune cells was not addressed. Confirmation of their presence is essential for all further claims. Moreover, a more zoomed-out view of the immunostaining would help clarify the overall cellular composition across the entire well and facilitate comparison with corresponding brightfield images.

      Whole-mount immunofluorescence of the 6th-generation assembloids revealed that CD45<sup>+</sup> immune cells surrounded FOXA2<sup>+</sup> glands, with a more zoomed-out view provided.

      Author response image 1.

      Whole-mount immunofluorescence showed that CD45<sup>+</sup> cells (immune cells) were arranged around the glandular spheres that were FOXA2<sup>+</sup>. Scale bar = 50 μm (left) and 30 μm (right).

      In their response, the authors mention using the first three passages to ensure optimal cell diversity and viability. However, the manuscript states that 'assembloids derived from the first generation are used for experiments' (line 106). This discrepancy must be clarified.

      Thank you for your suggestion. We have revised the relevant content to “The assembloids derived from the first three generation are used for experiments” (Line 90-91).

      (2) The authors have made a commendable effort to bring more focus to the manuscript, which has improved readability.

      We thank you for your insightful suggestions, which have greatly improved the quality of our manuscript.

      (3) The "embryo implantation" part remains very unconvincing. How did authors define "the blastoids could grow within the endometrial assembloids and interact with them"? What did they mean with "grow"? Did blastoids further differentiate? Normally, blastoids cannot further "grow". "Survival rates of blastoids" is not equal to "growth". It is not clear how the survival rate was quantified. Besides, regarding the "interaction rates", how did authors define and quantify it? Actually, blastoids are able to attach to Matrigel efficiently (even without any endometrial cells), so authors cannot simply define the "interaction" as the co-localization of blastoids and assembloids via brightfield images. In addition, for the assembloids as the 3D structures grow in the Matrigel, the epithelial parts are normally apical-in, while the blastoids attach to the apical (lumen) side of the epithelial cells, so physiologically, blastoids should interact with the apical part of the epithelial cells instead of the outside of the assembloids.

      (1) What did they mean with "grow"? Did blastoids further differentiate?

      On the one hand, volume and morphology undergo continuous dynamic changes; on the other hand, only the inner cell mass and trophectoderm exist at the blastocyst stage, with the ICM further differentiating into OCT4<sup>+</sup> epiblast and GATA6<sup>+</sup> hypoblast.

      (2) "Survival rates of blastoids" is not equal to "growth". It is not clear how the survival rate was quantified.

      The definition of "survival rate" is as follows: morphologically, the blastocoel remains non-collapsed and the cell boundaries are distinct (with no obvious cell detachment); molecularly, the markers of epiblast, hypoblast and trophectoderm are expressed. The survival rate is calculated as the ratio of viable embryoids to the total number of embryoids.

      (3) Besides, regarding the "interaction rates", how did authors define and quantify it? Actually, blastoids are able to attach to Matrigel efficiently (even without any endometrial cells), so authors cannot simply define the "interaction" as the co-localization of blastoids and assembloids via brightfield images.

      The criteria for determining interaction include not only attachment between the blastoids and assembloids observed via brightfield images, but also their sustained tight adhesion against external mechanical perturbations (e.g., medium replacement, immunostaining procedures).

      (4) In addition, for the assembloids as the 3D structures grow in the Matrigel, the epithelial parts are normally apical-in, while the blastoids attach to the apical (lumen) side of the epithelial cells, so physiologically, blastoids should interact with the apical part of the epithelial cells instead of the outside of the assembloids.

      You are absolutely correct. In vivo, the embryo indeed makes initial contact with the apical side of the epithelial cells. The introduction of the blastoid co-culture model herein is intended to demonstrate that these receptive endometrial assembloids can better support blastoid growth and development.

      (4) Previous comments highlighted the absence of distinct shifts in gene expression profiles between SEC assembloids and WOI assembloids, which contrasts with findings from primary endometrial tissue reported by Wang et al. (2020). While the authors have expanded their analysis using the Mfuzz algorithm and identified changes in mitochondria- and cilia-associated genes, the manuscript still lacks evidence of significant transcriptional changes in key WOI marker genes, as described in Wang et al. This discrepancy must be addressed and discussed in greater depth to clarify the biological relevance of their model.

      The endometrium in vivo involves complex crosstalk among multiple cell types and is tightly regulated by the hypothalamic-pituitary-ovarian (HPO) axis, thus exhibiting distinct shifts in gene expression during the peri-implantation period.

      In our in vitro model, alterations in mitochondria- and cilia-related genes were observed, which to a certain extent demonstrates that these window of implantation (WOI) assembloids possess receptive-phase characteristics and can be employed to investigate WOI-associated scientific questions or conduct in vitro drug screening.

      However, substantial efforts are still required to optimize the current model for fully recapitulating the dynamic changes in endometrial gene expression across different phases in vivo, and this aspect is further addressed in the Limitations section of our discussion (Line 342-353).

      “However, our WOI endometrial assembloids also exhibit some limitations. It is undeniable that the assembloids cannot perfectly replicate the in vivo endometrium, which comprises functional and basal layers with a greater abundance of cell subtypes, under superior regulation by hypothalamic-pituitary-ovarian (HPO) axis. Specifically, stromal and immune cells are challenging to stably passage, and their proportion is lower than in the in vivo endometrium. While the in vivo peri-implantation period exhibits intricate gene expression dynamics driven by systemic regulation, our models only partially recapitulate these changes, primarily in mitochondria- and cilia-associated genes. Nevertheless, to some extent, these WOI assembloids possess receptivity characteristics and can be utilized for investigating receptivity-related scientific questions or conducting in vitro drug screening. Further refinements are required to fully simulate the dynamic endometrial gene expression patterns across all menstrual cycle stages. We are looking forward to integrating stem cell induction, 3D printing, and microfluidic systems to modify the culture environment.”

      (5) In the authors' response document, they present data integrating their results with those of Garcia Alonso et al. (2021). However, these integrated analyses are not included in the revised manuscript (which should be, if answering a major concern).

      Thanks for your valuable suggestions. We have now integrated the findings of Garcia Alonso et al. (2021) into the revised manuscript (Line 132) and Figure S2E–F.

      (8) Fig 2D: The authors have clarified that CD45+ staining is used. However, they have not yet adapted the typo in the figure legend of the right picture.

      Thanks for your thorough review. The left panel of Figure 2D is stained with CD45 to label immune cells, while the right panel is stained with CD44. These details have been clearly indicated in both the manuscript and the figure legend.  

      (9) All quantification analyses (as described in the authors' response document) should be clearly described in the Materials & Methods section.  

      Thanks for your valuable suggestions. All quantification analyses have now been added to the Supporting Materials and Methods section (Line 94-104, Line 110-111, Line 241-244).

      (10) The authors have provided clarification regarding their method for quantifying immunofluorescence staining (e.g., OLFM4 expression in Fig. 3C) in their response document. However, these methodological details are not included in the revised manuscript. It is important that such information is incorporated into the manuscript itself to ensure transparency and reproducibility for others.

      Thanks for your valuable suggestions. All quantification analyses have now been added to the Supporting Materials and Methods section (Line 94-104).

      (13) It is needed to include the author's response to the comment about literature showing the opposite of increased number of cilia during the WOI into the discussion part of the paper.

      We appreciate your suggestions. The relevant content has now been added to the Discussion section (Lines 319–323).

      (14) In the authors' response, they explain the difference between pinopodes and microvilli. They should include this explanation briefly in the manuscript. Moreover, Fig. 3F lacks a picture of cilia structure in CTRL condition. In addition, the structures that are indicated as cilia with an orange arrow seem to not be attached to the endometrial cells (anymore). It would be useful to show another more representative picture for the cilia.

      (1) Thank you for your valuable suggestions. The distinction between pinopodes and microvilli has now been added to the Supporting Materials and Methods section (Line 230-236).

      (2) You are probably referring to Figure 2F—we did not observe ciliary structures in the CTRL group.

      (3) The cilia structure was visualized via transmission electron microscopy (TEM), which requires ultrathin sectioning. Thus, the cilia shown in the image correspond to a single cross-section of the captured assembloids. Owing to technical limitations, three-dimensional visualization of cilia on the cells cannot be achieved.

      (17) The results on co-culturing blastoids with the WOI assembloids is not convincing. The blastoids are exposed to the basolateral side of the endometrial epithelial cells, while in vivo, blastocysts interact with the apical side of the endometrial epithelial cells first (apposition and attachment), followed by invasion into the endometrium. This means that the interaction shown here is not physiological. Therefore, it is not justified to say that this platform holds promise to investigate maternal-fetal interactions.

      We agree with your perspective that discrepancies exist between this model and the physiological processes in vivo. However, such differences do not negate the scientific value of the model.

      The core merit of this study lies in the successful establishment of co-culture systems for blastoids and WOI assembloids. Notably, genuine cross-talk occurs between the two components, thereby providing a practical and operational tool for subsequent research.

      Although the current contact orientation differs from that observed in vivo, future optimization of the cell culture protocol (via modulation of cell polarity) will enable the model to better recapitulate physiological conditions. Therefore, the innovation and operability of this model within specific research contexts still render it a robust platform for investigating maternal-fetal interactions.

      Overall, it is highly recommended that the authors carefully review the manuscript for grammatical errors, inconsistencies and issues with scientific phrasing. The language throughout the text requires substantial editing to improve clarity, readability and precision. 

      We appreciate your suggestions. A full manuscript check was performed to rectify grammatical errors, inconsistencies, and inappropriate scientific phrasing, with further language refinement by a native English-speaking specialist.

      Fig 1A: This overview is unclear. How many days do the assembloids grow before being stimulated with hormones? Are CTRL assembloids only kept in culture until day 2 and SEC and WOI assembloids until day 8? This is also not clear from the Materials and Methods section. Should be clarified.

      Thanks for your valuable suggestions. We have now updated the overview (Figure 1A) and Materials and Methods section (Line 370-371, Line 379-381).

      “Hormonal treatment was initiated following the assembly of the endometrial assembloids (about 7-day growth period).”

      “The CTRL group was cultured in ExM without hormone supplementation and subjected to parallel culture for 8 days along with the two aforementioned groups.”

      Fig 1B: From these brightfield images, it appears that the size of the assembloids remains relatively consistent from Day 0 to Day 3 and up to Day 11 (especially in CTRL). However, in Fig S1A, the assembloids on Day 11 appear significantly larger compared to those on Day 2 (or Day 4). Authors should clarify this discrepancy (since both of the figures are shown as "brightfield of endometrial assembloids").

      You are probably referring to the observation that the assembloids at Day 11 in Fig. S1A are smaller in size than those at Day 2 (or Day 4) in Fig. 1B. This discrepancy arises because the time points in Fig. 1B are calculated starting from the initiation of hormone treatment for the SEC and WOI groups, rather than from the beginning of the overall culture as in Fig. S1A. In addition, assembloids exhibit size variability during the same culture period due to individual heterogeneity.

      To eliminate ambiguity, we have now labeled “Hormone Day 0, Day 2, Day 8” in Fig. 1B and revised the corresponding figure legend to read: “Endometrial assembloids from the CTRL, SEC, and WOI groups, which were subjected to hormone treatment on Days 0, 2, and 8, exhibited comparable growth patterns throughout the culture period.”

      Fig 2G: authors still used the description "organoids" here instead of "assembloids".

      We appreciate your careful review. Corrections have been made accordingly.

      Fig. 3C: For the OLFM4 staining quantification, in the Y-axis authors wrote "proportion of OLFM4 (+) cells (OLFM4 (+)/total", but in the rebuttal letter they mention "its fluorescence intensity (quantified as mean grey value) was significantly stronger in both the SEC and WOI groups compared to the CTRL group". This is confounding and should be clarified.

      We apologize for incorrectly writing "fluorescence intensity" in the rebuttal letter; the correct term should be the "proportion of OLFM4 (+) cells (OLFM4 (+)/total)" as shown in Fig. 3C.

      Fig 5D: Acetyl-α-tubulin is the marker of ciliated cells and should be expressed in the cilia instead of the whole cells. It is very strange to quantify as "mean fluorescence intensity (acetyl-α-tubulin/DAPI)" to assess the cilia. Please clarify.

      Thank you for your insightful comment. To clarify, the ratio "mean fluorescence intensity (acetyl-α-tubulin/DAPI)" was calculated within individual acetyl-α-tubulin<sup>+</sup> ciliated cells. Acetyl-α-tubulin fluorescence was normalized to the DAPI signal of the same cell nucleus, not the whole-cell population. This corrected for variations in cell number and staining efficiency to ensure data accuracy.
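
      As a minimal sketch of the per-cell normalization described above, the ratio can be computed cell by cell and then averaged. The function name `per_cell_ratio` and the intensity values are hypothetical and are not taken from the study's analysis; they only illustrate the arithmetic.

```python
# Illustrative sketch only; the function name and the values are hypothetical.
from statistics import mean


def per_cell_ratio(cells):
    """Return the acetyl-alpha-tubulin/DAPI ratio for each ciliated cell.

    cells: list of (acetyl_tubulin_intensity, dapi_intensity) pairs,
    one pair per acetyl-alpha-tubulin-positive cell.
    Cells with a zero DAPI signal are skipped to avoid division by zero.
    """
    return [tubulin / dapi for tubulin, dapi in cells if dapi > 0]


# Made-up mean grey values for three ciliated cells.
cells = [(120.0, 60.0), (90.0, 45.0), (150.0, 50.0)]
ratios = per_cell_ratio(cells)

# One ratio per cell, then the group-level mean of those ratios.
print(ratios)
print(mean(ratios))
```

      Normalizing within each cell, rather than dividing total signal by total DAPI, keeps the measure independent of how many cells fall in the field of view.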

      Fig 5F: it is very bizarre that unciliated epithelium was transformed from ciliated epithelium, and CTRL was transformed from SEC and WOI. Should be clarified and discussed.

      Pseudotime analysis sorts discrete cells along a "pseudotime axis" based on similarities and differences in cellular gene expression, thereby simulating cell state transitions.

      Ciliated epithelium → unciliated epithelium: During the menstrual cycle, ciliated and unciliated epithelia undergo mutual transformation from the secretory phase (or mid-secretory phase) to the menstrual phase, and then to the proliferative phase. Here, we demonstrate the transition of ciliated cells to unciliated cells from the SEC and WOI stages to the CTRL stage.

      Notably, the two cell types coexist, and what is presented here merely reflects a transformation trend. Relevant content has been incorporated into the Discussion section (Line 319-321).

      “Throughout the menstrual cycle, ciliated and unciliated epithelia undergo mutual transformation from the secretory phase (or mid-secretory phase) to the menstrual phase, and then to the proliferative phase.”

      Fig 5H: To show "enhanced invasion ability", authors must provide some quantification and statistical analysis. It is very hard to see the difference between the CTRL and SEC regarding ROR2Wnt5A.

      We appreciate your suggestion. Quantification and statistical analysis have been added to Figure 5H.

      Fig 6A: please elaborate the "mIVC1" and "mIVC2" in the figure legends.

      Additions have been made to the figure legends accordingly, as follows: "mIVC1: modified In Vitro Culture Medium 1; mIVC2: modified In Vitro Culture Medium 2."

      Fig S1D: Is the PAS staining also done in CTRL assembloids? In addition, it is stated that the assembloids secrete glycogen because of a positive PAS staining, while it could also be neutral mucins, glycoproteins, etc, which are all detected by PAS staining. So, the authors should be more careful in stating that it is glycogen, or a PAS staining with diastase digestion should be done.

      The PAS staining results for the CTRL group are presented in Fig. S1I. In addition, results of PAS staining with diastase digestion are included in Figure S1.

      Line 120: references?

      The reference has been added accordingly.

      Line 178: The term 'Endometrial Receptivity Test (ERT)' is used. Do the authors mean Endometrial Receptivity Analysis (ERA) test? ERA is the commonly used abbreviation for this test. Moreover, the authors describe ERA as 'a kind of gene analysis-based test.' This should be rephrased more scientifically correct.

      Thank you for your valuable suggestion. We have revised the term to ERA, and modified the phrase "a kind of gene analysis-based test" to "gene expression profiling-based diagnostic assay" (Lines 160–163).

      “We performed Endometrial Receptivity Analysis (ERA), a gene expression profiling-based diagnostic assay that integrates high-throughput sequencing and machine learning to quantify the expression of endometrial receptivity-associated genes.”

      Line 83: assemblies → assembloids

      We appreciate your suggestion. The text has been updated to “the endometrial assembloids progressed from epithelial organoids, to assemblies of epithelial and stromal cells and then to stem cell-laden 3D artificial endometrium”.

      The Materials and Methods section currently lacks the needed details. Authors should substantially expand this section to clearly describe all experimental and analytical procedures, including, among others, immunofluorescence staining, quantification methods, bioinformatics analyses and statistical approaches. Providing comprehensive methodological information is essential.

      A detailed description of these methods is provided in the Supporting Materials and Methods section.

      Reviewer #2 (Recommendations for the authors): 

      The revised manuscript is much improved in clarity, focus, and experimental support. The authors have thoughtfully addressed the major concerns from the previous review. In particular, the logic and flow of the paper are clearer: it now guides the reader through the rationale (constructing a WOI model), the comparative analysis against in vivo tissue and simpler organoids, and the key features that distinguish the WOI assembloid. The added functional validation (especially the blastoid co-culture experiment) significantly strengthens the work by showing a tangible outcome of "receptivity" beyond molecular profiling. The distinction between the standard secretory-phase organoid and the WOI assembloid is now more convincing, as the authors highlight several specific differences in morphology (more cilia, pinopodes), metabolism, and implantation success that favor the WOI model. The manuscript also reads cleaner with the bioinformatic sections condensed to the most important findings (excess detail was trimmed or moved to supplements) and the rationale for gene/pathway selection explicitly stated.

      The manuscript has been significantly strengthened through the addition of functional assays (like the blastoid co-culture), clearer transcriptomic and proteomic data, and detailed analyses of hormone treatments, cilia biology, and stromal and immune cell behavior in early passages. These updates confirm that the WOI assembloid supports embryo attachment and outperforms standard secretory organoids, while integrating external references and clarifications on terminology. Minor suggestions remain, such as clarifying statistical significance and adding functional interpretations for certain observations, but overall, the manuscript is now more robust and biologically convincing.

      Remaining points for clarification: There are a few minor points that still merit attention:

      - Use of the Endometrial Receptivity Test (ERT): As previously mentioned, if the authors have ERT data for the SEC organoid group, including that information would further support the claim that the WOI assembloid is uniquely receptive. If not, it would be helpful to add a statement clarifying that the ERT was employed specifically as a confirmatory test for the WOI assembloids, rather than as a comparative measure across all groups.

      Thank you for your valuable suggestion. We have now supplemented the description in the Supporting Materials and Methods section (Lines 160–162) as follows: “ERA was employed specifically as a confirmatory test for the WOI assembloids, rather than as a comparative measure across all groups.”

      - Because the assembloids are created from primary tissue samples, it would be helpful to briefly comment on how consistent the findings were across different patient-derived samples. For example, did all biological replicates show similar expression of receptivity markers and comparable capacity to support blastoid attachment? Although this seems implied, including a sentence in the Methods or Results sections that specifies the number of donor lines tested would help readers assess the model's variability and reproducibility.

      We appreciate your advice. The relevant statement has been added to the Supporting Materials and Methods section (Line 312-313).

      “All biological replicates (fourteen individuals) of endometrial assembloids show similar expression of receptivity markers and comparable capacity to support blastoid attachment.”

      - The authors mention promising future directions, such as integrating 3D printing and microfluidics to further enhance the model, which is an excellent forward-looking statement. It would also be valuable to suggest the inclusion of additional cell types, like more robust immune cell populations or endothelial components, as future improvements to create an even more comprehensive model of the endometrial lining.

      Thank you for your valuable suggestion. 3D printing and microfluidics serve as approaches for introducing multiple cell types. We have supplemented the following statement in the manuscript: “We are looking forward to integrating stem cell induction, 3D printing, and microfluidic systems to modify the culture environment.” (Line 352-353).

      We are grateful for your valuable feedback and constructive criticism, which have helped us improve the quality of our work in terms of content and presentation. We have diligently revised the manuscript and made the necessary changes. Here, we have attached the revised manuscript, figures, and all supplementary materials for your re-evaluation. Thank you again for your continued support; we look forward to your favorable decision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper presents maRQup, a Python pipeline for automating the quantitative analysis of preclinical cancer immunotherapy experiments using bioluminescent imaging in mice. maRQup processes images to quantify tumor burden over time and across anatomical regions, enabling large-scale analysis of over 1,000 mice. The study uses this tool to compare different CAR-T cell constructs and doses, identifying differences in initial tumor control and relapse rates, particularly noting that CD19.CD28 CAR-T cells show faster initial killing but higher relapse compared to CD19.4-1BB CAR-T cells. Furthermore, maRQup facilitates the spatiotemporal analysis of tumor dynamics, revealing differences in growth patterns based on anatomical location, such as the snout exhibiting more resistance to treatment than bone marrow.

      Strengths:

      (1) The maRQup pipeline enables the automatic processing of a large dataset of over 1,000 mice, providing investigators with a rapid and efficient method for analyzing extensive bioluminescent tumor image data.

      (2) Through image processing steps like tail removal and vertical scaling, maRQup normalizes mouse dimensions to facilitate the alignment of anatomical regions across images. This process enables the reliable demarcation of nine distinct anatomical regions within each mouse image, serving as a basis for spatiotemporal analysis of tumor burden within these consistent regions by quantifying average radiance per pixel.

      Weaknesses:

      (1) While the pipeline aims to standardize images for regional assessment, the reliance on scaling primarily along the vertical axis after tail removal may introduce limitations to the quantitative robustness of the anatomically defined regions. This approach does not account for potential non-linear growth across dimensions in animals of different ages or sizes, which could result in relative stretching or shrinking of subjects compared to an average reference.

      Our answer to this comment is included in the Supplemental Methods. The standard deviation of the mouse pixels was calculated to ensure that the image processing steps did not alter the shape or size of the mice. Such consistency is particularly striking because our dataset was accrued by nine lab members over the last five years, before we conceived and carried out our analysis (cf. answer to point #2). In fact, it is the very consistency of this IVIS measurement that led us to conceive our pipeline. As seen from Supplemental Figure 4G, there is minimal difference in the shape or size of the mice across 7,534 images. A total of 99 images were removed either due to being too slanted (91/7633, 1.2%) or due to processing errors (8/7633, 0.1%). Also, the vertical scaling was conducted while keeping the aspect ratio unchanged to prevent any non-anatomical scaling. Hence, we did not record any nonlinear growth of the mice that would warrant more convoluted alignment and/or batch correction for our images.
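
      The image counts and the aspect-ratio-preserving scaling mentioned above can be illustrated with a short sketch. This is not the maRQup source code: the function name `scale_preserving_aspect`, the target height, and the example crop dimensions are hypothetical, and only the image counts (7,534 retained, 91 slanted, 8 processing errors) come from the response.

```python
# Illustrative sketch only; not the maRQup implementation.


def scale_preserving_aspect(height, width, target_height):
    """Scale an image size to target_height using one factor for both axes.

    Applying the same factor vertically and horizontally leaves the
    aspect ratio unchanged. The function name is hypothetical.
    """
    factor = target_height / height
    return target_height, round(width * factor)


# Image counts reported in the response.
retained = 7534
removed_slanted = 91
removed_errors = 8
total = retained + removed_slanted + removed_errors

# Share of all images removed for each reason, in percent.
pct_slanted = 100 * removed_slanted / total
pct_errors = 100 * removed_errors / total
print(total, round(pct_slanted, 1), round(pct_errors, 1))

# Hypothetical 400 x 120 pixel mouse crop scaled to a 500-pixel height.
print(scale_preserving_aspect(400, 120, 500))
```

      Using a single factor for both axes is what distinguishes this from a purely vertical stretch, which would distort the relative width of anatomical regions.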

      (2) Furthermore, despite excluding severely slanted images, the pipeline does not fully normalize for variations in animal pose during image acquisition (e.g., tucked body, leaning). This pose variability not only impacts the precise relative positioning of internal anatomical regions, potentially making their definition based on relative image coordinates more qualitative than truly quantitative for precise regional analysis, but it also means that the bioluminescent light signal from the tumor will not propagate equally to the camera, as photons will travel differentially through the tissue. This differing light path through tissues due to variable positioning can introduce large variability in the measured radiance that was not accounted for in the analysis algorithm. Achieving more robust anatomical and quantitative normalization might require methods that control animal posture using a rigid structure during imaging.

      Reviewer #1 is correct that different mouse postures would be an issue when aligning the images and normalizing for size. However, all experiments are conducted for luminescence measurements in the IVIS system (i.e., this requires anesthesia and long integration time for imaging). In our experience and in our 1000+ mouse dataset, we noticed that all experiments (n=37) did place the anesthetized mice in a stretched/elongated position. Of note, these experiments were conducted by nine different researchers who were not instructed on how to place the mice on the machine for ideal image processing, thus showing that the standard protocol of imaging mice on IVIS does not introduce large variations in animal pose during image acquisition. We think the issue raised by Reviewer #1 is moot in the context of classical settings for mouse luminescence imaging.

      Reviewer #2 (Public review):

      Summary:

      The authors developed a method that automatically processes bioluminescent tumor images for quantitative analysis and used it to describe the spatiotemporal distribution of tumor cells in response to CD19-targeting CAR-T cells, comprising CD28 or 4-1BB costimulatory domains. The conclusion highlights the dependence of tumor decay and relapse on the number of injected cells, the type of cells, and the initial growth rate of tumors (where initial is intended from the first day of therapy). The authors also determined the spatiotemporal analysis of tumor response to CAR T therapy in different regions of the mouse body in a model of acute lymphoblastic leukemia (ALL).

      Strengths:

      The analysis is based on a large number of images and accounts for many variables. The results of the analysis largely support their claims that the kinetics of tumor decay and relapse are dependent on the CAR T co-stimulatory domain and number of cells injected and tumor growth rates. 

      Weaknesses:

      The study does not specify how a) differences in mouse positioning (and whether they excluded not-aligned mice) and b) tumor spread at the start of therapy influenced their data. The study does not take into account the potential heterogeneity of CAR T cells in terms of CAR T expression or T cell immunophenotype (differentiation, exhaustion, fitness...).

      See answer #2 to Reviewer #1.

      Author response image 1.

      Author response image 1 shows the average tumor radiance on day zero (when CAR-T cell therapy was administered) for all mice. While there is some spread, most mice had tumor localized to the liver or bone marrow.

      Reviewer #3 (Public review):

      Summary:

      The paper "The 1000+ mouse project: large-scale spatiotemporal parametrization and modeling of preclinical cancer immunotherapies" is focused on developing a novel methodology for automatic processing of bioluminescence imaging data. It provides quantitative and statistically robust insights into preclinical experiments that will contribute to optimizing cell-based therapies. There is an enormous demand for such methods and approaches that enable the spatiotemporal evaluation of cell monitoring in large cohorts of experimental animals.

      Strengths:

      The manuscript is generally well written, and the experiments are scientifically sound. The conclusions reflect the soundness of experimental data. This approach seems to be quite innovative and promising to improve the statistical accuracy of BLI data quantification. 

      This methodology can be used as a universal quantification tool for BLI data for in vivo assessment of adoptively transferred cells due to the versatility of the technology.

      Weaknesses: 

      No weaknesses were identified by this Reviewer. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In this paper, the authors propose a significant advancement in optical image data analysis by employing automation. They effectively demonstrate the valuable insights that can be gained from analyzing extensive datasets with a more unbiased methodology. At present, I do not have any specific suggestions for improvement.

      However, it is important to note that this work is limited in its operational scope. Specifically, it relies on predefined ROIs rather than aligning the signal site with anatomical systems. The scaling model and image cropping are simplistic, animal pose is not taken into account, and the data output needs to be called semi-quantitative or qualitative, and would have been stronger utilizing an AI agent. Nevertheless, this work underscores the potential of automated systems in preclinical image analysis, which is a crucial step towards developing more sophisticated approaches to optical image data analysis.

      While our analysis used predefined ROIs, the maRQup pipeline allows users to manually draw ROIs on the mouse image.

      Reviewer #2 (Recommendations for the authors):

      The writing and presentation of data are clear and accurate, but some additional information should be added regarding the imaging protocol used to acquire the original data. 

      The authors mention fluorescence in Figure 1. I expected all the data to be generated from bioluminescent NALM-6 tumors, since bioluminescence is indeed measured in average radiance and can be per pixel (p/sec/cm2/sr/pixel). Fluorescence should be measured using radiance efficiency (p/sec/cm2/sr)/(µW/cm2), a unit that compensates for non-uniform excitation light pattern in the instrument. Would the author find different results if fluorescence data were analyzed separately?

      Reviewer #2 is correct that the unit for fluorescence would be radiance efficiency. The word “fluorescent” was included in the label of Figure 1a to highlight that our workflow could be applied to other types of light-generating methods (i.e., fluorescence vs. bioluminescence). However, in this study, only measurements of bioluminescent tumors were analyzed. If fluorescence measurements are to be analyzed, our methods of image acquisition and processing would be directly applicable.

      Did the author ever check the signal of the snout in mice with no tumor?

      In mice with no tumor, there is no detectable signal in the snout (or anywhere else, for that matter).

      The urine of mice contains phosphor, and might give a background signal, especially if longer exposure is used at the end of the study.

      For the mice with no tumor injection, the luminescence signal was below background (<10<sup>2</sup> p/sec/cm<sup>2</sup>/sr/pixel). In particular, we do not detect any signal in the bladder/urine. Additionally, as described in the Supplemental Methods and Figure 1b, only pixels that were on the mouse as determined from the brightfield image were used to calculate the tumor burden from the radiance of the luminescent image. This method ensures that any background signal (e.g., from phosphor in mouse urine) would be excluded in the radiance quantification and not bias the results.

      Additionally, as described in the Methods, the exposure time was held constant at 30 seconds for each IVIS measurement across all 37 experiments.
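
      The masking step described in this response (keeping only pixels that lie on the mouse before quantifying radiance) can be illustrated with a minimal sketch. How the on-mouse mask is derived from the brightfield image is not specified here, so the simple intensity threshold below, the function name `mean_radiance_on_mouse`, and all pixel values are hypothetical and are not taken from the maRQup pipeline.

```python
def mean_radiance_on_mouse(radiance, brightfield, threshold):
    """Average radiance over pixels flagged as 'on the mouse'.

    Assumption: a pixel is on the mouse when its brightfield intensity
    exceeds `threshold`; the real pipeline may define the mask differently.
    """
    total = 0.0
    count = 0
    for rad_row, bf_row in zip(radiance, brightfield):
        for rad, bf in zip(rad_row, bf_row):
            if bf > threshold:  # pixel belongs to the mouse silhouette
                total += rad
                count += 1
    if count == 0:
        raise ValueError("no on-mouse pixels found")
    return total / count


# Toy 2x3 images (hypothetical values). The bright 900.0 pixel sits off
# the mouse (brightfield 0.0), so it does not contribute to the average.
radiance = [[5.0, 200.0, 300.0],
            [900.0, 400.0, 100.0]]
brightfield = [[0.1, 0.8, 0.9],
               [0.0, 0.7, 0.6]]

print(mean_radiance_on_mouse(radiance, brightfield, threshold=0.5))
```

      In this toy case, a stray off-mouse signal (for example, a urine spot on the stage) is dropped before averaging, which is the behavior the response describes.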

      The data using more than 2 million cells comes from only 10 mice, and maybe the biological relevance of this group is limited since it will not be achievable and translatable in humans (PMID: 33653113).

      We appreciate Reviewer #2’s attention to this issue. The effect observed in our study is large enough to reach statistical significance despite the small number of mice. Note that the dosing regimen used was optimized for the murine NSG model and would require appropriate scaling before clinical application. Nonetheless, NSG mice remain the gold standard for pre‑clinical in vivo evaluation and their use is generally required by regulatory agencies, such as the FDA, for assessing novel CAR‑T cell therapies; thus these findings are relevant for advancing such treatments.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review): 

      Strengths:

      (1) The use of chronic two-photon Ca<sup>2+</sup> imaging in awake, behaving mice represents a major technical strength, minimizing confounds introduced by anesthesia. The development of a Pf4Cre:GCaMP6s reporter line, combined with high-resolution intravital imaging, enables long-term and subcellular analysis of macrophage Ca<sup>2+</sup> dynamics in the meninges.

      (2) The comparison between perivascular and non-perivascular macrophages reveals clear niche-dependent differences in Ca<sup>2+</sup> signaling properties. The identification of macrophage Ca<sup>2+</sup> activity temporally coupled to dural vasomotion is particularly intriguing and highlights a potential macrophage-vascular functional unit in the dura.

      (3) By linking macrophage Ca<sup>2+</sup> responses to CSD and implicating CGRP/RAMP1 signaling in a subset of these responses, the study connects meningeal macrophage activity to clinically relevant neuroimmune pathways involved in migraine and other neurological disorders.

      Thank you for recognizing the strengths in our work.

      Weaknesses: 

      (1) The manuscript relies heavily on Pf4Cre-driven GCaMP6s expression to selectively image meningeal macrophages. Although prior studies are cited to support Pf4 specificity, Pf4 is not an exclusively macrophage-restricted marker, and developmental recombination cannot be excluded. The authors should provide direct validation of reporter specificity in the adult meninges (e.g., co-labeling with established macrophage markers and exclusion of other Pf4-expressing lineages). At minimum, the limitations of Pf4Cre-based labeling should be discussed more explicitly, particularly regarding how off-target expression might affect Ca<sup>2+</sup> signal interpretation.

      We acknowledge that PF4 is not an exclusively macrophage-restricted marker. Yet, among meningeal immunocytes, it is almost exclusively expressed in macrophages (1, 2). Furthermore, in the adult mouse meninges, Pf4<sup>Cre</sup>-based reporter lines label nearly all dural and leptomeningeal macrophages and almost no other cells (3, 4). This Cre line has also been used to target border-associated macrophages (2, 4). Moreover, a recent study suggests that the bacterial artificial chromosome used to generate the Pf4<sup>Cre</sup> line does not affect meningeal macrophage activity (4). Nonetheless, while we already discussed PF4 expression in meningeal megakaryocytes, in a revised version, we plan to discuss the possibility that a very small population of other meningeal immune cells may also be labeled.

      (2) The manuscript offers an extensive characterization of Ca<sup>2+</sup> event features (frequency spectra, propagation patterns, synchrony), but the biological significance of these signals is largely speculative. There is no direct link established between Ca<sup>2+</sup> activity patterns and macrophage function (e.g., activation state, motility, cytokine release, or interaction with other meningeal components). The discussion frequently implies functional specialization based on Ca<sup>2+</sup> dynamics without experimental validation. To strengthen the conceptual impact, a clearer framing of the study as a foundational descriptive resource, rather than a functional dissection, would improve alignment between data and conclusions.

      In our discussion, we indicated that “the exact link between the distinct Ca<sup>2+</sup> signal properties of meningeal macrophage subsets observed herein and their homeostatic function remains to be established”. In a revised version, we plan to further acknowledge that this is primarily a descriptive study that provides a foundational landscape of Ca<sup>2+</sup> dynamics in meningeal macrophages.

      (3) The GLM analysis revealing coupling between dural perivascular macrophage Ca<sup>2+</sup> activity and vasomotion is technically sophisticated and intriguing. However, the directionality of this relationship remains unresolved. The current data do not distinguish whether macrophages actively regulate vasomotion, respond to mechanical or hemodynamic changes, or are co-modulated by neural activity. Statements suggesting that macrophages may "mediate" vasomotion are therefore premature. The authors should reframe these conclusions more cautiously, emphasizing correlation rather than causation, and expand the discussion to explicitly outline experimental strategies required to establish causality (e.g., macrophage-specific Ca<sup>2+</sup> manipulation). 

      In the results section, we indicated that our data suggest that dural perivascular macrophages are functionally coupled to locomotion-driven dural vasomotion, either responding to it or mediating it. Furthermore, in our discussion, we discussed the possibilities that 1) macrophages sense vascular-related mechanical changes and 2) macrophage Ca<sup>2+</sup> signaling may regulate dural vasomotion. Moreover, we explicitly state that studying causality will require an experimental approach that has yet to be developed, enabling selective manipulation of dural perivascular macrophages.

      (4) The authors conclude that synchronous Ca<sup>2+</sup> events across macrophages are driven by extrinsic signals rather than intercellular communication, based primarily on distance-time analyses. This conclusion is not sufficiently supported, as spatial independence alone does not exclude paracrine signaling, vascular cues, or network-level coordination. No perturbation experiments are presented to test alternative mechanisms. The authors can either provide additional experimental evidence or rephrase the conclusion to acknowledge that the source of synchrony remains unresolved. 

      Thank you for this suggestion. In the revision, we will indicate that the source of synchrony remains unresolved.

      (5) A major and potentially important finding is that the dominant macrophage response to CSD is a persistent decrease in Ca<sup>2+</sup> activity, which is independent of CGRP/RAMP1 signaling. However, this phenomenon is not mechanistically explored. It remains unclear whether Ca<sup>2+</sup> suppression reflects macrophage inhibition, altered viability, homeostatic resetting, or an anti-inflammatory program. Minimally, the discussion should be more deeply engaged with possible interpretations and implications of this finding. 

      While we propose that the decrease in macrophage calcium signaling following CSD could indicate that a hyperexcitable cortex dampens meningeal immunity, in the revised version, we plan to elaborate on the possible implications of this finding.

      (6) The pharmacological blockade of RAMP1 supports a role for CGRP signaling in persistent Ca<sup>2+</sup> increases after CSD, but the experiments are based on a relatively small number of cells and animals. The limited sample size constrains confidence in the generality of the conclusions. Pharmacological inhibition alone does not establish cell-autonomous effects in macrophages. The authors should acknowledge these limitations more explicitly and avoid overextension of the conclusions. 

      We plan to acknowledge these limitations.

      Reviewer #2 (Public review): 

      Using chronic intravital two-photon imaging of calcium dynamics in meningeal macrophages in Pf4Cre:TIGRE2.0-GCaMP6 mice, the study identified heterogeneous features of perivascular and non-perivascular meningeal macrophages at steady state and in response to cortical spreading depolarization (CSD). Analyses of calcium dynamics and blood vessels revealed a subpopulation of perivascular meningeal macrophages whose activity is coupled to behaviorally driven diameter fluctuations of their associated vessels. The analyses also investigated synchrony between different macrophage populations and revealed a role for CGRP/RAMP1 signaling in the CSD-induced increase, but not the decrease, in calcium transients.

      This is a timely study at both the technical and conceptual levels, examining calcium dynamics of meningeal macrophages in vivo. The conclusions are well supported by the findings and will provide an important foundation for future research on immune cell dynamics within the meninges in vivo. The paper is well written and clearly presented.

      Thank you.

      I have only minor comments. 

      (1) Please indicate the formal definition of perivascular versus non-perivascular macrophages in terms of distance from the blood vessel. This information is not provided in the main text or the Methods. In addition, please explain how the meningeal vasculature was imaged in the main text. 

      We did not measure the exact distance of the perivascular macrophages from the blood vessels, but defined them as such based on previous data showing that these cells reside along the abluminal surface and maintain tight interactions with mural cells (5). We plan to provide this information in the revised manuscript.

      (2) Similarly, the method used to induce acute CSD (pin prick) is not described in the main text and is only mentioned in the figure legends and Methods. Additional background on the neurobiology of acute CSD, as well as the resulting brain activity and neuroinflammatory responses, could be helpful.

      We plan to add the method for inducing CSD (i.e., a pinprick in the frontal cortex) to the Results section and provide more background in the Introduction section.

      Reviewer #3 (Public review):

      Strengths: 

      Sophisticated in vivo imaging of meningeal immune cells is employed in the study, which has not been performed previously. A detailed analysis of the distinct calcium dynamics in various subtypes of meningeal macrophages is provided. Functional relevance of the responses is also noted in relation to CSD events.

      Thank you for recognizing the strengths of our paper.

      Weaknesses:

      (1) The specificity of the methods used to target both meningeal macrophages and RAMP1 is limited. Additional discussion points on the functional relevance of the two subtypes of meningeal macrophages and their calcium responses are warranted. A section on potential pitfalls should be included. 

      We plan to address these issues in the revision.

      References

      (1) H. Van Hove et al., A single-cell atlas of mouse brain macrophages reveals unique transcriptional identities shaped by ontogeny and tissue environment. Nat Neurosci 22, 1021-1035 (2019).

      (2) F. A. Pinho-Ribeiro et al., Bacteria hijack a meningeal neuroimmune axis to facilitate brain invasion. Nature 615, 472-481 (2023).

      (3) G. L. McKinsey et al., A new genetic strategy for targeting microglia in development and disease. Elife 9,  (2020).

      (4) H. J. Barr et al., The circadian clock regulates scavenging of fluid-borne substrates by brain border-associated macrophages. bioRxiv,  (2025).

      (5) H. Min et al., Mural cells interact with macrophages in the dura mater to regulate CNS immune surveillance. J Exp Med 221,  (2024).

    1. Author response:

      Public reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The assessment of liver and adipose tissue responses to DHH7 loss is insufficient to support claims that it alters systemic lipolysis. In this new mouse model, liver histology is necessary, especially given the cholesterol increase in the KO. As this is a newly established mouse line, common assessments of the liver during HFD feeding would be important for interpreting the phenotype.

      We will add liver histology data in the revised version.

      (2) The data show DHH7 loss causes adipose tissue dysfunction and alterations in lipid metabolism. Beyond that, I suggest not stating more regarding the phenotype of the DHH7 mice for this work. A thorough analysis would be needed to determine which factor drives the obesity and changes in energy balance in the mice. For example, the KO mice had lower oxygen consumption (but no change in CO2 production, which is also usually similarly altered), suggesting a CNS component could drive obesity. However, since the data are not normalized for lean mass and there is no information about locomotor activity, this analysis is incomplete. RER may be informative if available. A broad conservative description of the KO phenotype would be more accurate since Pgr4 has many paracrine targets and likely has autocrine signaling in the liver.

      We will add the CO2 production, locomotor activity, and RER data in the revised version.

      (3) Most references to lipolysis or lipolysis flux systemically would be inaccurate. To suggest a suppression of lipolysis, serum NEFA would need to be measured, and in vivo or in vitro lipolysis assays performed to test the effect of DHH7 loss or the specificity of PGR4 action on adipocytes in vivo. To demonstrate adipose tissue dysfunction, analysis of lipogenesis markers, canonical markers for insulin sensitivity, and mitochondrial dysfunction should be performed/measured.

      We will measure serum NEFA to test the effect of DHHC7. We will also analyze lipogenesis markers, canonical markers of insulin sensitivity, and markers of mitochondrial dysfunction.

      (4) Line 179: The experiment was performed in brown adipocytes to show that Prg4 does not affect p-CREB Figure S8 under the heading: "DHHC7 controls hepatic PKA-CREB activity through Gαi palmitoylation to regulate Prg4 transcription." Unless repeated using liver lysate, the conclusions stated in the text throughout the paper should be revised.

      Figure S8 demonstrates that Prg4 has no impact on forskolin-induced CREB phosphorylation at Ser133, which provides evidence that Prg4 acts upstream of adenylyl cyclase. We will revise the description.

      (5) It appears that the serum and liver proteomics were only assessed for factors that increased in KO mice? Were proteins that were significantly decreased analyzed?

      We are analyzing the decreased proteins in a follow-up project.

      (6) The beige adipocyte culture method is unclear. The methods do not describe the fat pad used, and the protocol suggests the cells would be differentiated into mature white adipocytes. If they are beige cells, a reference for the method, gene expression, and cell images could support that claim.

      We will add a reference for the method, gene expression data, and cell images.

      (7) The use of tamoxifen can confound adipocyte studies, as it increases beigeing and weight gain even after a brief initiation period. Both groups were treated with Tam, but another way to induce Cre would be ideal.

      We will use a doxycycline-inducible system in the future.

      (8) Evidence for the lack of the glucose phenotype is incomplete. One reason could be due to the IP route of glucose administration, which has a large impact on glucose handling during a GTT. To confirm the absence of a glucose tolerance phenotype, an OGTT should be performed, as it is more physiological. In addition, the mice should be fed for 16 weeks. Prg4 affects immune cells, changing how adipose tissue expands, and 12 weeks of HFD feeding is often not long enough to see the effects of adipose tissue inflammation spilling over into the system.

      We will perform the OGTT and feed the mice for 16 weeks in the future.

      (9) There may be liver-adipose tissue crosstalk in KO mice, but this was not fully assessed in this study and would be difficult to determine in any setting, given the diverse cell types that are targets of Pdg4. The crosstalk claim is unnecessary to share the basic premises; there is the DHH7 mechanism/phenotype and the Pgr4 mechanism/phenotype, and while there is no Pgr4 adipose direct mechanism, the paper can be successfully reframed.

      We will reframe the paper.

      (10) Although the DHH7 loss on the chow diet did not result in a phenotype, did the Pgr4 increase in the KO mice on chow? This would determine whether either i) the expression of Pgr4 is dependent on HFD/obesity, or ii) circulating Pgr4 has effects only in an HFD condition. The receptors may also change on HFD, especially in adipocytes.

      We will measure Prg4 in the KO mice on a chow diet.

      Reviewer #2 (Public review):

      (1) Figures: All data should be presented in dot-boxplot format so the reader knows how many samples were analyzed for each assay and group. n=3 for some assays/experiments is incredibly low, particularly when considering the heterogeneity in responsiveness to HFD, food intake, etc.

      We will present the data in dot-boxplot format.

      (2) Figure 1E-F: It is unclear when the food intake measure was performed. Mice can alter their feeding behavior based on a myriad of environmental and biological cues. It would also be interesting to show food intake data normalized to body mass over time. Mice can counterregulate anorexigenic cues by altering neuropeptide production over time. It is not clear if this is occurring in these mice, but the timing of measuring food intake is important. Additionally, the VO2 measure appears to be presented as being normalized to total body mass, when in fact, it would probably be more accurate to normalize this to lean body mass. Normalizing to total body mass provides a denominator effect due to excessive adiposity, but white fat is not as metabolically active as other high-glucose-consuming tissues. If my memory serves me right, several reports have discussed appropriate normalizations in circumstances such as this.

      We will determine a more accurate way to normalize these data.

      (3) Figure 1J-N: It is not all that surprising that fasting glucose and/or TGs were found to be similar between groups. It is well-established that mice have an incredible ability to become hyperinsulinemic in an effort to maintain euglycemia and lipid metabolism dynamics. A few relatively easy assays can be performed to glean better insights into the metabolic status of the authors' model. First, fasting insulin concentrations will be incredibly helpful. Secondly, if the authors want to tease out which adipose depot is most adversely affected by ablation, they could take an additional set of CON and KO mice, fast them for 5-6 hours, provide a bolus injection of insulin (similar to that provided during an insulin tolerance test), and then quickly harvest the animals ~15 minutes after insulin injections; followed by evaluating AKT phosphorylation. This will really tell them if these issues have impairments in insulin signaling. The gold-standard approach would be to perform a hyperinsulinemic-euglycemic clamp in the CON and KO mice. I now see GTT and ITT data, but the aforementioned assays could help provide insight.

      We have AKT phosphorylation data and will add them in the revised version.

      (4) Figure 3A: This looks overexposed to me.

      We will replace it with a shorter exposure.

      (5) Figures 3-4: It appears that several of these assays could be complemented with culture-based models, which would almost certainly be cleaner. The conditioned media could then be used from hepatocyte cultures to treat differentiated adipocytes.

      We will perform the cell culture experiments for Figures 3-4.

      (6) Figure 4: It is unclear how to interpret the phospho-HSL data because the fasting state can affect this readout. It needs to be made clear how the harvest was done. Moreover, insulin and glucagon were never measured, and these hormones have a significant influence over HSL activity. I suspect the KO mice have established hyperinsulinemia, which would likely affect HSL activity. This provides an example of why performing some of these experiments in a dish would make for cleaner outcomes that are easier to interpret.

      We will perform some of these experiments in cell culture.

      Reviewer #3 (Public review):

      Weaknesses:

      (1) Lack of a causal-effect study to generate evidence directly linking hepatocyte DHH7 and PRG4 in driving adipose expansion and obesity upon HFD feeding.

      We will perform a cause-and-effect study to test the hypothesis.

      (2) Lack of direct evidence to support that PRG4 inhibits adipocyte lipolysis via GPR146. A functional assay demonstrating adipocyte lipolysis is required.

      We will add the direct evidence in the revised version.

      (3) The conclusion is largely based on the correlation evidence.

      We will perform a cause-and-effect study to strengthen the conclusion.

  2. Feb 2026
    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Lin et al. presents a timely, technically strong study that builds patient-specific midbrain-like organoids (MLOs) from hiPSCs carrying clinically relevant GBA1 mutations (L444P/P415R and L444P/RecNcil). The authors comprehensively characterize nGD phenotypes (GCase deficiency, GluCer/GluSph accumulation, altered transcriptome, impaired dopaminergic differentiation), perform CRISPR correction to produce an isogenic line, and test three therapeutic modalities (SapC-DOPS-fGCase nanoparticles, AAV9-GBA1, and SRT with GZ452). The model and multi-arm therapeutic evaluation are important advances with clear translational value.

      My overall recommendation is that the work undergo a major revision to address the experimental and interpretive gaps listed below.

      Strengths:

      (1) Human, patient-specific midbrain model: Use of clinically relevant compound heterozygous GBA1 alleles (L444P/P415R and L444P/RecNcil) makes the model highly relevant to human nGD and captures patient genetic context that mouse models often miss.

      (2) Robust multi-level phenotyping: Biochemical (GCase activity), lipidomic (GluCer/GluSph by UHPLC-MS/MS), molecular (bulk RNA-seq), and histological (TH/FOXA2, LAMP1, LC3) characterization are thorough and complementary.

      (3) Use of isogenic CRISPR correction: Generating an isogenic line (WT/P415R) and demonstrating partial rescue strengthens causal inference that the GBA1 mutation drives many observed phenotypes.

      (4) Parallel therapeutic testing in the same human platform: Comparing enzyme delivery (SapC-DOPS-fGCase), gene therapy (AAV9-GBA1), and substrate reduction (GZ452) within the same MLO system is an elegant demonstration of the platform's utility for preclinical evaluation.

      (5) Good methodological transparency: Detailed protocols for MLO generation, editing, lipidomics, and assays allow reproducibility.

      Weaknesses:

      (1) Limited genetic and biological replication

      (a) Single primary disease line for core mechanistic claims. Most mechanistic data derive from GD2-1260 (L444P/P415R); GD2-10-257 (L444P/RecNcil) appears mainly in therapeutic experiments. Relying primarily on one patient line risks conflating patient-specific variation with general nGD mechanisms.

      We thank the reviewer for highlighting the importance of genetic and biological replication. The manuscript includes an additional patient-derived iPSC line, so our study covers two independent nGD patient-derived iPSC lines, GD2-1260 (GBA1<sup>L444P/P415R</sup>) and GD2-10-257 (GBA1<sup>L444P/RecNcil</sup>), both of which carry severe mutations associated with nGD. These two lines represent distinct genetic backgrounds and were used to demonstrate the consistency of key disease phenotypes (reduced GCase activity, elevated substrate levels, impaired dopaminergic neuron differentiation, etc.) across MLOs from different patients. Major experiments (e.g., GCase activity assays, substrate measurements, immunoblotting for the DA marker TH, and therapeutic testing with SapC-DOPS-fGCase and AAV9-GBA1) were performed using both patient lines, with results showing consistent phenotypes and therapeutic responses (see Figs. 2-6 and Supplementary Figs. 4-5). To ensure clarity and transparency, a new Supplementary Table 2 summarizes the characterization of both the GD2-1260 and GD2-10-257 lines.

      (b) Unclear biological replicate strategy. It is not always explicit how many independent differentiations and organoid batches were used (biological replicates vs. technical fields of view).

      Biological replication was ensured in our study by conducting experiments in at least 3 independent differentiations per line, and technical replicates (multiple organoids/fields per batch) were averaged accordingly. We have clarified biological replicates and differentiation in the figure legends. 
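
      The replicate strategy described above (averaging technical replicates within each differentiation, then treating each differentiation as one biological replicate) can be sketched as follows. This is an illustrative assumption about the bookkeeping, not the authors' analysis code; the function name `summarize_by_differentiation`, the batch labels, and all values are hypothetical.

```python
from statistics import mean, stdev


def summarize_by_differentiation(measurements):
    """Collapse technical replicates, then summarize biological replicates.

    `measurements` maps a differentiation (batch) label to the list of
    technical-replicate values (e.g., organoids or fields) from that batch.
    """
    # Step 1: one value per independent differentiation.
    batch_means = [mean(values) for values in measurements.values()]
    # Step 2: statistics are computed across differentiations only,
    # so n counts biological replicates, not organoids or fields.
    return {
        "n": len(batch_means),
        "mean": mean(batch_means),
        "sd": stdev(batch_means) if len(batch_means) > 1 else 0.0,
    }


# Hypothetical readout from three independent differentiations.
example = {
    "diff_1": [10.0, 12.0],
    "diff_2": [20.0, 22.0, 24.0],
    "diff_3": [15.0],
}
summary = summarize_by_differentiation(example)
print(summary["n"], summary["mean"])
```

      Collapsing to one value per differentiation keeps the reported n equal to the number of independent batches, so organoids or fields from the same batch are not counted as separate biological replicates.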

      (c) A significant disadvantage of employing brain organoids is the heterogeneity during induction and potential low reproducibility. In this study, it is unclear how many independent differentiation batches were evaluated and, for each test (for example, immunofluorescent stain and bulk RNA-seq), how many organoids from each group were used. Please add a statement accordingly and show replicates to verify consistency in the supplementary data.

      In the revision, we have clarified the biological replicates and differentiations in the figure legends for Fig. 1E; Fig. 2B, 2G; Fig. 3F, 3G; Fig. 4B-C, E, H-J, M-N; Fig. 6D; and Fig. 7A-C, I.

      (d) Isogenic correction is partial. The corrected line is WT/P415R (single-allele correction); residual P415R complicates the interpretation of "full" rescue and leaves open whether the remaining pathology is due to incomplete correction or clonal/epigenetic effects.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was not feasible because GBA1 overlaps with a highly homologous pseudogene (PGBA), which makes precise editing technically challenging. Consequently, only the L444P mutation was successfully corrected, and the resulting isogenic line retains the P415R mutation in a heterozygous state. Because Gaucher disease is an autosomal recessive disorder, individuals carrying a single GBA1 mutation (heterozygous carriers) do not develop clinical symptoms. Therefore, the partially corrected isogenic line, which retains only the P415R allele, represents a clinically relevant carrier model. Consistent with this, our results show that GCase activity was restored to approximately 50% of wild-type levels (Fig.4B-C), supporting the expected heterozygous state. These findings also make it unlikely that the remaining differences observed are due to clonal variation or epigenetic effects.

      (e) The authors tested week 3, 4, 8, 15, and 28 old organoids in different settings. However, systematic markers of maturation should be analyzed, and different maturation stages should be compared, for example, comparing week 8 organoids to week 28 organoids, with immunofluorescent marker staining and bulk RNAseq.

      We agree that a systematic analysis of maturation stages is essential for validating the MLO model. Our data integrate a longitudinal comparison across multiple developmental windows (Weeks 3 to 28) to characterize the transition from progenitor to mature, functional states for nGD phenotyping and evaluation of therapeutic modalities: 1) DA differentiation (Wks 3 and 8 in Fig. 3): qPCR analysis demonstrated the progression of DA-specific programs. We observed a steady increase in the mature DA neuron marker TH and in ASCL1. This was accompanied by a gradual decrease in the early floor plate/progenitor markers FOXA2 and PLZF, indicating a successful differentiation path from progenitors to differentiated/mature DA neurons. 2) Glycosphingolipid substrate accumulation (Wks 15 and 28 in Fig. 2): To assess late-stage nGD phenotypes, we compared GluCer and GluSph at Week 15 and Week 28. This comparison highlights the progressive accumulation of substrates in nGD MLOs, reflecting the metabolic consequences of the disease at different stages of maturation. 3) Organoid growth dynamics (Wks 4, 8, and 15 in new Fig. 4): The new Fig. 4 tracks physical maturation through organoid size and growth rates across three key time points, providing a macro-scale verification of consistent development between WT and nGD groups. By comparing these early (Wk 3-8) and late (Wk 15-28) stages, we confirmed that our MLOs transition from a proliferative state to a post-mitotic, specialized neuronal state, satisfying the requirement for comparing distinct maturation stages.

      (f) The manuscript frequently refers to Wnt signaling dysregulation as a major finding. However, experimental validation is limited to transcriptomic data. Functional tests, such as the use of Wnt agonist/inhibitor, are needed to support this claim (see below).

      We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work.

      (g) Suggested fixes / experiments

      Add at least one more independent disease hiPSC line (or show expanded analysis from GD2-10-257) for key mechanistic endpoints (lipid accumulation, transcriptomics, DA markers).

      MLOs derived from an additional iPSC line, GD2-10-257, were included in the manuscript. This was addressed above [see response to Weaknesses (1)-a].

      Generate and analyze a fully corrected isogenic WT/WT clone (or a P415R-only line) if feasible; at minimum, acknowledge this limitation more explicitly and soften claims.

      We attempted to generate an isogenic iPSC line by correcting both GBA1 mutations (L444P and P415R). However, this was unsuccessful because GBA1 has a pseudogene (PGBA), located 16 kb downstream, that shares 96-98% sequence similarity with GBA1 (Refs. #1, #2), which complicates precise editing. GBA1 is shorter (~5.7 kb) than PGBA (~7.6 kb). The primary exonic difference between GBA1 and PGBA is a 55-bp deletion in exon 9 of the pseudogene. As a result, the isogenic line we obtained carries only the P415R mutation, with L444P corrected to the normal sequence. We have included this limitation in the Methods as “This gene editing strategy is expected to also target the GBA1 pseudogene due to the identical target sequence, which limits the gene correction on certain mutations (e.g., P415R)”.

      References:

      (1) Horowitz M., Wilder S., Horowitz Z., Reiner O., Gelbart T., Beutler E. The human glucocerebrosidase gene and pseudogene: structure and evolution. Genomics (1989). 4, 87–96. doi:10.1016/0888-7543(89)90319-4

      (2) Woo EG, Tayebi N, Sidransky E. Next-Generation Sequencing Analysis of GBA1: The Challenge of Detecting Complex Recombinant Alleles. Front Genet. (2021). 12:684067. doi:10.3389/fgene.2021.684067. PMCID: PMC8255797.

      Report and increase independent differentiations (N = biological replicates) and present per-differentiation summary statistics.

      This was addressed above [see response to Weaknesses (1)-b, (1)-c]. 

      (2) Mechanistic validation is insufficient

      (a) RNA-seq pathways (Wnt, mTOR, lysosome) are not functionally probed. The manuscript shows pathway enrichment and some protein markers (p-4E-BP1) but lacks perturbation/rescue experiments to link these pathways causally to the DA phenotype.

      (b) Autophagy analysis lacks flux assays. LC3-II and LAMP1 are informative, but without flux assays (e.g., bafilomycin A1 or chloroquine), one cannot distinguish increased autophagosome formation from decreased clearance.

      (c) Dopaminergic dysfunction is superficially assessed. Dopamine in the medium and TH protein are shown, but no neuronal electrophysiology, synaptic marker co-localization, or viability measures are provided to demonstrate functional recovery after therapy.

      (d) Suggested fixes/experiments

      Perform targeted functional assays:

      (i) Wnt reporter assays (TOP/FOP flash) and/or treat organoids with Wnt agonists/antagonists to test whether Wnt modulation rescues DA differentiation.

      (ii) Test mTOR pathway causality using mTOR inhibitors (e.g., rapamycin) or 4E-BP1 perturbation and assay effects on DA markers and autophagy.

      Include autophagy flux assessment (LC3 turnover with bafilomycin), and measure cathepsin activity where relevant.

      Add at least one functional neuronal readout: calcium imaging, MEA recordings, or synaptic marker quantification (e.g., SYN1, PSD95) together with TH colocalization.

      We thank the reviewer for these valuable suggestions. We agree that the suggested experiments could provide additional mechanistic insights into this study and will consider them in future work. Importantly, the primary conclusions of our manuscript, that GBA1 mutations in nGD MLOs resulted in nGD pathologies such as diminished enzymatic function, accumulation of lipid substrates, widespread transcriptomic changes, and impaired dopaminergic neuron differentiation, which can be corrected by several therapeutic strategies in this study, are supported by the evidence presented. The suggested experiments represent an important direction for future research using brain organoids.

      (3) Therapeutic evaluation needs greater depth and standardization

      (a) Short windows and limited durability data. SapC-DOPS and AAV9 experiments range from 48 hours to 3 weeks; longer follow-up is needed to assess durability and whether biochemical rescue translates into restored neuronal function.

      We agree with the reviewer. Because this is a proof-of-principle study, the treatment was designed within a short time window. Long-term studies with more comprehensive outcome assessments will be conducted in future work.

      (b) Dose-response and biodistribution are under-characterized. AAV injection sites/volumes are described, but transduction efficiency, vg copies per organoid, cell-type tropism quantification, and SapC-DOPS penetration/distribution are not rigorously quantified.

      We appreciate the reviewer’s concerns. This study was intended to demonstrate the feasibility and initial response of MLOs to AAV therapy. A comprehensive evaluation of AAV biodistribution will be considered in future studies.

      The penetration and distribution of SapC-DOPS have been extensively characterized in prior studies. The in vivo biodistribution of SapC-DOPS coupled to CellVue Maroon, a fluorescent cargo, was examined in mice bearing human tumor xenografts using real-time fluorescence imaging; CellVue Maroon fluorescence persisted in tumors for 48 hours (Ref. #3: Fig. 4B, mouse 1), 100 hours (Ref. #4: Fig. 5), and up to 216 hours (Ref. #5: Fig. 3). Uptake kinetics were also demonstrated in cells: flow cytometry quantification showed that SapC-DOPS nanovesicles coupled to fluorescent cargo were incorporated into human brain tumor cell membranes within minutes and remained stably incorporated for up to one hour (Ref. #6: Fig. 1a and Fig. 1b). Building on these findings, the present study focuses on evaluating the restoration of GCase function rather than reexamining biodistribution and uptake kinetics.

      References:

      (3) X. Qi, Z. Chu, Y.Y. Mahller, K.F. Stringer, D.P. Witte, T.P. Cripe. Cancer-selective targeting and cytotoxicity by liposomal-coupled lysosomal saposin C protein. Clin. Cancer Res. (2009) 15, 5840-5851. PMID: 19737950.

      (4) Z. Chu, S. Abu-Baker, M.B. Palascak, S.A. Ahmad, R.S. Franco, and X. Qi. Targeting and cytotoxicity of SapC-DOPS nanovesicles in pancreatic cancer. PLOS ONE (2013) 8, e75507. PMID: 24124494.

      (5) Z. Chu, K. LaSance, V.M. Blanco, C.-H. Kwon, B. Kaur, M. Frederick, S. Thornton, L. Lemen, and X. Qi. Multi-angle rotational optical imaging of brain tumors and arthritis using fluorescent SapC-DOPS nanovesicles. J. Vis. Exp. (2014) 87, e51187, 17. PMID: 24837630.

      (6) J. Wojton, Z. Chu, C-H. Kwon, L.M.L. Chow, M. Palascak, R. Franco, T. Bourdeau, S. Thornton, B. Kaur, and X. Qi. Systemic delivery of SapC-DOPS has antiangiogenic and antitumor effects against glioblastoma. Mol. Ther. (2013) 21, 1517-1525. PMID: 23732993.

      (c) Specificity controls are missing. For SapC-DOPS, inclusion of a non-functional enzyme control (or heat-inactivated fGCase) would rule out non-specific nanoparticle effects. For AAV, assessment of off-target expression and potential cytotoxicity is needed.

      Including inactive fGCase would confound the assessment of fGCase in MLOs by immunoblot and immunofluorescence; therefore, saposin C–DOPS was used as the control instead. 

      We agree that assessment of off-target expression and potential cytotoxicity for AAV is important; this will be included in future studies.

      (d) Comparative efficacy lacking. It remains unclear which modality is most effective in the long term and in which cellular compartments.

      To address this comment, we have added a new table (Supplementary Table 2) comparing the four therapeutic modalities and summarizing their respective outcomes. While this study focused on short-term responses as a proof-of-principle, future work will explore long-term therapeutic effects. 

      (e) Suggested fixes/experiments

      Extend follow-up (e.g., 6+ weeks) after AAV/SapC dosing and evaluate DA markers, electrophysiology, and lipid levels over time.

      We appreciate the reviewer’s suggestions. The therapeutic testing in patient-derived MLOs was designed as a proof-of-principle study to demonstrate feasibility and the primary response (rescue of GCase function) to the treatment. A comprehensive, long-term therapeutic evaluation of AAV and SapC-DOPS-fGCase is indeed important for a complete assessment; however, this represents a separate therapeutic study and is beyond the scope of the current work.

      Quantify AAV transduction by qPCR for vector genomes and by cell-type quantification of GFP+ cells (neurons vs astrocytes vs progenitors).

      For the AAV-treated experiments, we agree that measuring AAV copy number and GFP expression would provide additional information. However, the primary goal of this study was to demonstrate the key therapeutic outcome, rescue of GCase function by AAV-delivered normal GCase, which is directly relevant to the treatment objective.

      Include SapC-DOPS control nanoparticles loaded with an inert protein and/or fluorescent cargo quantitation to show distribution and uptake kinetics.

      As noted above [see response to Weakness (3)-c], using inert GCase would confound the assessment of fGCase uptake in MLOs; therefore, it was not suitable for this study. See response above for the distribution and uptake kinetics of SapC-DOPS [see response to Weaknesses (3)-b].

      Provide head-to-head comparative graphs (activity, lipid clearance, DA restoration, and durability) with statistical tests.

      We have added a new table (Supplementary Table 2) providing a head-to-head comparison of the treatment effects. 

      (4) Model limitations not fully accounted for in interpretation

      (a) Absence of microglia and vasculature limits recapitulation of neuroinflammatory responses and drug penetration, both of which are important in nGD. These absences could explain incomplete phenotypic rescues and must be emphasized when drawing conclusions about therapeutic translation.

      We agree that the absence of microglia and vasculature in midbrain-like organoids represents a limitation, as we have discussed in the manuscript. In this revision, we highlighted this limitation in the Discussion section and clarified that it may contribute to incomplete phenotyping and phenotypic rescue observed in our therapeutic experiments. Additionally, we have outlined future directions to incorporate microglia and vascularization into the organoid system to better recapitulate the in vivo environment and improve translational relevance (see 7th paragraph in the Discussion).

      (b) Developmental vs degenerative phenotype conflation. Many phenotypes appear during differentiation (patterning defects). The manuscript sometimes interprets these as degenerative mechanisms; the distinction must be clarified.

      We appreciate the reviewer’s comments. In the revised manuscript, we have clarified that certain abnormalities, such as patterning defects observed during early differentiation, likely reflect developmental consequences of GBA1 mutations rather than degenerative processes. Conversely, phenotypes such as substrate accumulation, lysosomal dysfunction, and impaired dopaminergic maturation at later stages are interpreted as degenerative features. We have updated the Results and Discussion sections to avoid conflating developmental defects with neurodegenerative mechanisms.

      (c) Suggested fixes

      Tone down the language throughout (Abstract/Results/Discussion) to avoid overstatement that MLOs fully recapitulate nGD neuropathology.

      The manuscript has been revised to avoid overstatements.

      Add plans or pilot data (if available) for microglia incorporation or vascularization to indicate how future work will address these gaps.

      The manuscript now includes further plans to address the incorporation of microglia and vascularization, described in the last two paragraphs of the Discussion. A pilot study of microglia incorporation will be reported once it is completed.

      (5) Statistical and presentation issues

      (a) Missing or unclear sample sizes (n). For organoid-level assays, report the number of organoids and the number of independent differentiations.

      We have clarified biological replicates and differentiation in the figure legend [see response to Weaknesses (1)-b, (1)-c]. 

      (b) Statistical assumptions not justified. Tests assume normality; where sample sizes are small, consider non-parametric tests and report exact p-values.

      We have updated Statistical analysis in the methods as described below:

      “For comparisons between two groups, data were analyzed using unpaired two-tailed Student’s t-tests when the sample size was ≥6 per group and normality was confirmed by the Shapiro-Wilk test. When the normality assumption was not met or when sample sizes were small (n < 6), the non-parametric Mann-Whitney U test was used instead. For comparisons involving three or more groups, one-way ANOVA followed by Tukey’s multiple comparison test was applied when data were normally distributed; otherwise, the nonparametric Dunn’s multiple comparison test was used. Exclusion of outliers was made based on cut-offs of the mean ±2 standard deviations. All statistical analyses were performed using GraphPad Prism 10 software. Exact p-values are reported throughout the manuscript and figures where feasible. A p-value < 0.05 was considered statistically significant.”
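The decision rules quoted above can be sketched as a small selection function. This is only an illustrative sketch, not the authors' analysis (which was run in GraphPad Prism): the function names `choose_test` and `within_two_sd` are hypothetical, the normality result is assumed to be supplied by the caller rather than computed here, and the use of the sample standard deviation for the outlier cut-off is an assumption.

```python
from statistics import mean, stdev


def choose_test(group_sizes, normal):
    """Return the name of the test the quoted rules would select.

    group_sizes: list with one sample size per group.
    normal: True if all groups passed a normality check (e.g. Shapiro-Wilk);
            this sketch does not run the check itself.
    """
    if len(group_sizes) < 2:
        raise ValueError("need at least two groups")
    if len(group_sizes) == 2:
        # Two groups: t-test only when n >= 6 per group and data are normal.
        if normal and min(group_sizes) >= 6:
            return "unpaired two-tailed Student's t-test"
        return "Mann-Whitney U test"
    # Three or more groups.
    if normal:
        return "one-way ANOVA with Tukey's multiple comparison test"
    return "Dunn's multiple comparison test"


def within_two_sd(values):
    """Keep values inside mean +/- 2 SD (sample SD assumed)."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= 2 * s]
```

For example, two groups of four organoids each would be routed to the Mann-Whitney U test regardless of the normality result, because the sample size falls below the stated threshold of six.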

      (c) Quantification scope. Many image quantifications appear to be from selected fields of view, which are then averaged across organoids and differentiations.

      In this work, quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM. This multilevel averaging approach minimizes bias from regional heterogeneity within organoids and accounts for variability across differentiations. Representative confocal images shown in the figures were selected to accurately reflect the quantified data. We believe this standardized quantification strategy ensures robust and reproducible results while appropriately representing the 3D architecture of the organoids.

      In the revision, we have clarified the method used for image analysis of sectioned MLOs as below:

      “Quantitative immunofluorescence analyses (e.g., cell counts for FOXP1+, FOXG1+, SOX2+ and Ki67+ cells, as well as marker colocalization) were performed using ImageJ (NIH) on at least 3–5 randomly selected non-overlapping fields of view (FOVs) per organoid section, with a minimum of 3 organoids per differentiation batch. Each FOV was imaged at consistent magnification (60x) and z-stack depth to ensure comparable sampling across conditions. Data from individual FOVs were first averaged within each organoid to obtain an organoid-level mean, and then biological replicates (independent differentiations, n ≥ 3) were averaged to generate the final group mean ± SEM.”
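The multilevel averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' ImageJ/Prism workflow: the function name `group_mean_sem` and the nested-dictionary layout are hypothetical, and averaging organoid-level means within each differentiation before computing the group mean is our reading of the quoted description.

```python
from math import sqrt
from statistics import mean, stdev


def group_mean_sem(data):
    """Collapse FOV-level values into a group mean and SEM.

    data: {differentiation_id: {organoid_id: [fov_value, ...]}}
    Independent differentiations are treated as the biological n.
    """
    differentiation_means = []
    for organoids in data.values():
        # Step 1: average fields of view within each organoid.
        organoid_means = [mean(fovs) for fovs in organoids.values()]
        # Step 2 (assumed): average organoids within a differentiation.
        differentiation_means.append(mean(organoid_means))
    n = len(differentiation_means)
    if n < 2:
        raise ValueError("SEM needs at least two differentiations")
    # Step 3: group mean and SEM across differentiations.
    return mean(differentiation_means), stdev(differentiation_means) / sqrt(n)
```

Treating the differentiation, rather than the individual field of view, as the unit of replication keeps the SEM from being narrowed artificially by technical replicates.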

      (d) RNA-seq QC and deposition. Provide mapping rates, batch correction details, and ensure the GEO accession is active. Include these in Methods/Supplement.

      RNA-seq data are from the same batch. The mapping rate is >90%. GEO accession will be active upon publication. These were included in the Methods.

      (e) Suggested fixes

      Add a table summarizing biological replicates, technical replicates, and statistical tests used for each figure panel.

      We have revised the figure legends to include the replicates and statistical tests for each figure [see response to Weaknesses (1)-b, (1)-c].

      Recompute statistics where appropriate (non-parametric if N is small) and report effect sizes and confidence intervals.

      Statistical analysis method is provided in the revision [see response in Weaknesses (5)-b].

      (6) Minor comments and clarifications

      (a) The authors should validate midbrain identity further with additional regional markers (EN1, OTX2) and show absence/low expression of forebrain markers (FOXG1) across replicates.

      We validated MLO identity using two markers: 1) FOXG1 and 2) EN1. FOXG1 was barely detectable in Wk8 75.1_MLO but highly expressed in the ‘age-matched’ cerebral organoid (CO), suggesting that our culture method is midbrain-oriented. In nGD MLOs, FOXG1 expression was significantly higher than in 75.1_MLO, indicating aberrant anterior-posterior brain specification, consistent with the transcriptomic dysregulation observed in our RNA-seq data.

      To further confirm midbrain identity, we examined the expression of EN1, an established midbrain-specific marker. Quantitative RT-PCR analysis demonstrated that EN1 expression increased progressively during differentiation in both WT-75.1 and nGD2-1260 MLOs at weeks 3 and 8 (Author response image 1). EN1 reached 34-fold and 373-fold higher levels than in WT-75.1 iPSCs at weeks 3 and 8, respectively, in WT-75.1 MLOs. In nGD MLOs, although EN1 expression showed a modest reduction at week 8, the levels were not significantly different from those observed in age-matched WT-75.1 MLOs (p > 0.05, ns).

      Author response image 1.

      qRT-PCR quantification of midbrain progenitor marker EN1 expression in WT-75.1 and GD2-1260 MLOs at Wk3 and Wk8. Data were normalized to WT-75.1 hiPSCs and presented as mean ± SEM (n = 3-4 MLOs per group). ns, not significant.
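Fold changes relative to a calibrator sample, such as those reported above relative to WT-75.1 iPSCs, are commonly computed with the 2^-ΔΔCt method. The response does not state which quantification method was used, so the sketch below is an assumption for illustration only: the function name `fold_change`, its parameters, and the choice of a single reference gene with 100% amplification efficiency are all hypothetical.

```python
def fold_change(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^-ddCt method (assumes 100% efficiency).

    ct_target, ct_reference: Ct values in the sample of interest.
    ct_target_cal, ct_reference_cal: Ct values in the calibrator sample.
    """
    delta_sample = ct_target - ct_reference          # dCt in the sample
    delta_calibrator = ct_target_cal - ct_reference_cal  # dCt in the calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)
```

With these hypothetical inputs, a target gene that crosses threshold five cycles earlier in the sample than in the calibrator, at equal reference-gene Ct, gives a 32-fold increase.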

      (b) Extracellular dopamine ELISA should be complemented with intracellular dopamine or TH+ neuron counts normalized per organoid or per total neurons.

      We quantified TH expression at both the mRNA level (Fig. 3F) and the protein level (Fig. 3G/H) from whole-organoid lysates, which provides a more consistent and integrative measure across samples. These TH expression levels correlated well with the corresponding extracellular (medium) dopamine concentrations for each genotype. In contrast, TH⁺ neuron counts may not reliably reflect total cellular dopamine levels because the number of cells captured on each organoid section varies substantially, making normalization difficult. Measuring intracellular dopamine is an alternative approach that will be considered in future studies.

      (c) For CRISPR editing: the authors should report off-target analysis (GUIDE-seq or targeted sequencing of predicted off-targets) or at least in-silico off-target score and sequencing coverage of the edited locus.

      Off-target effects were analyzed during gene editing, and the likelihood of editing unintended sites is low because the predicted off-target sites had low scores in the MIT Specificity Score analysis. The related method was also updated as stated below:

      “The chance to target other Off-targets is low due to low Off-target scores ranked based on the MIT Specificity Score analysis (Hsu, P., Scott, D., Weinstein, J. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827–832 (2013). https://doi.org/10.1038/nbt.2647).”

      (d) It should be clarified as to whether lipidomics normalization is to total protein per organoid or per cell, and include representative LC-MS chromatograms or method QC.

      The normalization was to the protein of the organoid lysate. This was clarified in the Methods section in the revision as stated below:

      “The GluCer and GluSph levels in MLO were normalized to total MLO protein (mg) that were used for glycosphingolipid analyses. Protein mass was determined by BCA assay and glycosphingolipid was expressed as pmol/mg protein. Additionally, GluSph levels in the culture medium were quantified and normalized to the medium volume (pmol/mL).”
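The two normalizations described above reduce to simple ratios, sketched here for clarity. The function names and the example values in the note below are hypothetical and are not taken from the study.

```python
def lipid_per_mg_protein(lipid_pmol, protein_mg):
    """Glycosphingolipid content in pmol per mg of total MLO protein."""
    if protein_mg <= 0:
        raise ValueError("protein mass must be positive")
    return lipid_pmol / protein_mg


def lipid_per_ml_medium(lipid_pmol, volume_ml):
    """Glycosphingolipid released into the medium, in pmol per mL."""
    if volume_ml <= 0:
        raise ValueError("medium volume must be positive")
    return lipid_pmol / volume_ml
```

For instance, 150 pmol of GluCer measured in a lysate containing 0.5 mg of protein would be reported as 300 pmol/mg protein.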

      Representative LC-MS chromatograms for both normal and GD MLOs have been included in a new figure, Supplementary Figure 2.

      (e) Figure legends should be improved in order to state the number of organoids, the number of differentiations, and the exact statistical tests used (including multiple-comparison corrections).

      This was addressed above [see response to Weaknesses (1)-b and (5)-b].

      (f) In the title, the authors state "reveal disease mechanisms", but the studies mainly exhibit functional changes. They should consider toning down the statement.

      The title was revised to: Patient-Specific Midbrain Organoids with CRISPR Correction Recapitulate Neuronopathic Gaucher Disease Phenotypes and Enable Evaluation of Novel Therapies

      (7) Recommendations

      This reviewer recommends a major revision. The manuscript presents substantial novelty and strong potential impact but requires additional experimental validation and clearer, more conservative interpretation. Key items to address are:

      (a) Strengthening genetic and biological replication (additional lines or replicate differentiations).

      This was addressed above [see response to Weaknesses (1)-a, (1)-b, (1)-c].

      (b) Adding functional mechanistic validation for major pathways (Wnt/mTOR/autophagy) and providing autophagy flux data.

      (c) Including at least one neuronal functional readout (calcium imaging/MEA/patch) to demonstrate functional rescue.

      As addressed above [see response to Weaknesses (2)], the suggested experiments in b) and c) would provide additional insights into this study and we will consider them in future work. 

      (d) Deepening therapeutic characterization (dose, biodistribution, durability) and including specificity controls.

      This was addressed above [see response to Weaknesses (3)-a to e].

      (e) Improving statistical reporting and explicitly stating biological replicate structure.

      This was addressed above [see response to Weaknesses (1)-b, (5)-b].

      Reviewer #2 (Public review):

      Sun et al. have developed a midbrain-like organoid (MLO) model for neuronopathic Gaucher disease (nGD). The MLOs recapitulate several features of nGD molecular pathology, including reduced GCase activity, sphingolipid accumulation, and impaired dopaminergic neuron development. They also characterize the transcriptome in the MLO nGD model. CRISPR correction of one of the GBA1 mutant alleles rescues most of the nGD molecular phenotypes. The MLO model was further deployed in proof-of-principle studies of investigational nGD therapies, including SapC-DOPS nanovesicles, AAV9-mediated GBA1 gene delivery, and substrate-reduction therapy (GZ452). This patient-specific 3D model provides a new platform for studying nGD mechanisms and accelerating therapy development. Overall, only modest weaknesses are noted.

      We thank the reviewer for the supportive remarks.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors describe modeling of neuronopathic Gaucher disease (nGD) using midbrain-like organoids (MLOs) derived from hiPSCs carrying GBA1 L444P/P415R or L444P/RecNciI variants. These MLOs recapitulate several disease features, including GCase deficiency, reduced enzymatic activity, lipid substrate accumulation, and impaired dopaminergic neuron differentiation. Correction of the GBA1 L444P variant restored GCase activity, normalized lipid metabolism, and rescued dopaminergic neuronal defects, confirming its pathogenic role in the MLO model. The authors further leveraged this system to evaluate therapeutic strategies, including: (i) SapC-DOPS nanovesicles for GCase delivery, (ii) AAV9-mediated GBA1 gene therapy, and (iii) GZ452, a glucosylceramide synthase inhibitor. These treatments reduced lipid accumulation and ameliorated autophagic, lysosomal, and neurodevelopmental abnormalities.

      Strengths:

      This manuscript demonstrates that nGD patient-derived MLOs can serve as an additional platform for investigating nGD mechanisms and advancing therapeutic development.

      Comments:

      (1) It is interesting that GBA1 L444P/P415R MLOs show defects in midbrain patterning and dopaminergic neuron differentiation (Figure 3). One might wonder whether these abnormalities are specific to the combination of L444P and P415R variants or represent a general consequence of GBA1 loss. Do GBA1 L444P/RecNciI (GD2-10-257) MLOs also exhibit similar defects?

      We observed reduced dopaminergic neuron marker TH expression in GBA1 L444P/RecNciI (GD2-10-257) MLOs, suggesting that this line also exhibits defects in dopaminergic neuron differentiation. These data are provided in a new Supplementary Fig. 4E, and are summarized in new Supplementary Table 2 in the revision.

      (2) In Supplementary Figure 3, the authors examined GCase localization in SapC-DOPS-fGCase-treated nGD MLOs. These data indicate that GCase is delivered to TH⁺ neurons, GFAP⁺ glia, and various other unidentified cell types. In fruit flies, the GBA1 ortholog, Gba1b, is only expressed in glia (PMID: 35857503; 35961319). Neuronally produced GluCer is transferred to glia for GBA1-mediated degradation. These findings raise an important question: in wild-type MLOs, which cell type(s) normally express GBA1? Are they dopaminergic neurons, astrocytes, or other cell types?

      All cell types in wild-type MLOs are expected to express GBA1, as it is a housekeeping gene broadly expressed across neurons, astrocytes, and other brain cell types. Its lysosomal function is essential for cellular homeostasis and is therefore not restricted to any specific lineage (https://www.proteinatlas.org/ENSG00000177628GBA1/brain/midbrain).

      (3) The authors may consider switching Figures 2 and 3 so that the differentiation defects observed in nGD MLOs (Figure 3) are presented before the analysis of other phenotypic abnormalities, including the various transcriptional changes (Figure 2).

      We appreciate the reviewer’s suggestion; however, we respectfully prefer to retain the current order of Figures 2 and 3, as we believe this structure provides the clearest narrative flow. Figure 2 establishes the core biochemical hallmarks: reduced GCase activity, substrate accumulation, and global transcriptomic dysregulation (1,429 DEGs enriched in neural development, WNT signaling, and lysosomal pathways), which together provide essential molecular context for studying the specific cellular differentiation defects presented in Figure 3. Presenting the broader disease landscape first creates a coherent mechanistic link to the subsequent analyses of midbrain patterning and dopaminergic neuron impairment.

      To enhance readability, we have added a brief transitional sentence at the start of the Figure 3 paragraph: “Building on the molecular and transcriptomic hallmarks of GCase deficiency observed in nGD MLOs (Figure 2), we next investigated the impact on midbrain patterning and dopaminergic neuron differentiation (Figure 3).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Joint Public reviews:

      (1) Stable annual dynamics vs. episodic outbreaks

      We agree that RVF is classically described as producing periodic epidemics interspersed with long inter-epidemic periods, often linked to extreme rainfall events. Our model predicts more regular seasonal dynamics, which reflects the endemic transmission patterns we have observed in The Gambia through serological surveys. In this revision, we have:

      - clarified that while epidemics occur in other parts of sub-Saharan Africa, our results are consistent with the epidemiological narrative of RVF in The Gambia, characterised by sustained, moderate transmission without resulting in substantial outbreaks (hyperendemicity).

      - discussed how model assumptions (e.g. seasonality, homogenous mixing) may bias our results toward an endemic quasi-equilibrium dynamic.

      - highlighted the implications of this for interpretation and for public health decision-making.

      (2) Use of network analysis

      We acknowledge the reviewer’s concern. The network analysis was conducted descriptively to characterize cattle movement patterns and the structure of herd connections, but it was not formally incorporated into the model. In this revision we have:

      - clarified this distinction in the manuscript to avoid overinterpretation.

      - emphasized the need for future modelling work using finer-scale movement data, which could support more realistic herd metapopulation dynamics and better capture heterogeneity in transmission.

      (3) RVFV reproductive impacts

      While RVF outbreaks are known to cause substantial abortions and neonatal deaths, these events occur during sporadic epidemics. In the Gambian context, where we’re not observing large outbreaks but rather low-level circulation, the annual impact of RVF infection on births is likely modest compared to baseline herd turnover. Moreover, cattle demography is partly managed, with replacement and movement buffering birth rates against short-term losses.

      Our model includes birth as a constant demographic process. Because we are not explicitly modelling outbreak-scale reproductive losses, it is reasonable to assume a stable population. This approach is consistent with other RVF transmission models that adopt a similar simplifying assumption. However, we have acknowledged this simplification as a limitation in the revised manuscript.

      (4) Missing ODEs for M herds in the dry season

      We thank the reviewer for identifying this omission. The ODEs for the M subpopulation in the dry season were not included in the appendix due to an oversight, though demographic turnover was implemented in the model code. We have now added the missing equations to the appendix.

      (5) Role of immunity loss and model structure (SIR vs. SIRS)

      We acknowledge that the decline of detectable antibodies over time (seropositivity decay) is an important consideration in RVFV serology; however, whether this decline reflects a true loss of protective immunity following natural infection remains unknown. Available evidence suggests that infected cattle likely develop long-lasting immunity, and findings in humans further support this assumption, although longitudinal field data regarding RVFV-specific antibody durability in animals are not available to the best of our knowledge. From a modelling perspective, our objective was to estimate FOI and use it to predict an age-seroprevalence curve consistent with the observed cross-sectional age-seroprevalence patterns. We therefore adopted a parsimonious SIR framework, interpreting loss of seropositivity as a potential explanation for discrepancies between observed and predicted age-seroprevalence rather than explicitly modelling waning immunity. We have now:

      - clarified this rationale, emphasising that there is no direct evidence for waning immunity following natural RVFV infection in cattle, although evidence of seropositivity decay has been suggested in humans.

      - highlighted that while an SEIS/SIRS framework could theoretically generate different long-term dynamics, evaluating this approach requires stronger evidence for true immunity loss.
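
As a rough illustration of the reasoning in point (5), the sketch below compares the age-seroprevalence curve expected under a constant force of infection with lifelong seropositivity against one in which detectable antibodies decay. It is a textbook simplification, not the authors' mechanistic transmission model; the function name, the constant FOI `foi`, and the seroreversion rate `rho` are illustrative assumptions.

```python
import math


def seroprevalence(age, foi, rho=0.0):
    """Expected proportion seropositive at a given age (years).

    Assumes a constant force of infection `foi` (per year) and an
    optional seroreversion rate `rho` (per year). Both are hypothetical
    illustration parameters, not estimates from the study.
    """
    total = foi + rho
    if total == 0.0:
        return 0.0
    # Solution of dP/da = foi * (1 - P) - rho * P with P(0) = 0.
    return (foi / total) * (1.0 - math.exp(-total * age))


if __name__ == "__main__":
    for age in (1, 3, 5, 10):
        lifelong = seroprevalence(age, foi=0.1)
        waning = seroprevalence(age, foi=0.1, rho=0.05)
        print(f"age {age:>2}: lifelong={lifelong:.3f}  with decay={waning:.3f}")
```

Under these assumptions the curve with antibody decay always lies below the lifelong-seropositivity curve, which is the kind of gap between observed and predicted seroprevalence that the response attributes to possible loss of seropositivity.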

      (6) RVFV induced mortality in serocatalytic model

      We thank the reviewer for this comment and for raising an important conceptual point. However, the force of infection in our study is not estimated using a serocatalytic framework. Instead, FOI is estimated mechanistically within the transmission model as a function of the number of infectious cattle, rather than from age-stratified seroprevalence data.

      RVF-induced mortality is accounted for through its effect on the infectious compartment, where increased mortality reduces the number and duration of infectious cattle and therefore indirectly reduces FOI. Consequently, RVF-related cattle death does not need to be explicitly incorporated into the FOI expression itself. Seroreversion similarly does not influence FOI estimation under this modelling framework. We have clarified this distinction in the Methods section to avoid confusion between mechanistic transmission models and serocatalytic approaches.
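
The distinction drawn in point (6) can be sketched with a minimal SIR model in which the force of infection is computed from the number of infectious animals, and disease-induced mortality acts only on the infectious compartment. This is a hedged toy example, not the authors' model: the function name, the frequency-dependent FOI form, the Euler integration, and all parameter values are assumptions made for illustration.

```python
def simulate_foi(beta, gamma, mu, alpha, n0=1000.0, i0=1.0, days=5000.0, dt=0.1):
    """Euler-integrate an SIR model and return the final force of infection.

    Hypothetical parameters (all per day): beta = transmission rate,
    gamma = recovery rate, mu = natural death rate, alpha = additional
    disease-induced death rate of infectious animals. Births are constant
    (mu * n0), mirroring a constant demographic process.
    """
    s, i, r = n0 - i0, i0, 0.0
    births = mu * n0
    for _ in range(int(round(days / dt))):
        n = s + i + r
        foi = beta * i / n  # FOI is a function of the infectious animals
        ds = births - foi * s - mu * s
        di = foi * s - (gamma + mu + alpha) * i  # alpha removes infectious animals
        dr = gamma * i - mu * r
        s, i, r = s + ds * dt, i + di * dt, r + dr * dt
    return beta * i / (s + i + r)


if __name__ == "__main__":
    print(simulate_foi(beta=0.5, gamma=0.1, mu=0.01, alpha=0.0))
    print(simulate_foi(beta=0.5, gamma=0.1, mu=0.01, alpha=0.05))
```

In this sketch, `alpha` never appears in the FOI expression itself, yet raising it lowers the long-run FOI because infectious animals are removed sooner.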

      (7) Clarifying previous vs. current study components

      We have revised the Methods and Appendix to make clearer distinctions between our previous work (e.g. household survey data collection, seroprevalence estimates) and the analyses undertaken for this manuscript (e.g. model development and fitting).

      (8) Limitations paragraph

      We have expanded the limitations section to identify the sparse household movement data as contributing most to uncertainty. We have outlined how these limitations may have implications for our conclusions, and may lead to under- or over-estimation of periods of heightened transmission risk.

      (9) Movement ban simulations & suitability of model for vaccination interventions

      We appreciate the reviewer’s concerns regarding the movement ban simulation. On reassessment, we agree that our model structure might not ideally be suited to exploring a movement ban. In this revised manuscript, we have removed this analysis. We are currently developing separate work focused on RVF vaccination strategies in cattle, where this model structure might be more directly applicable, and will reserve a deeper investigation of vaccination interventions for that forthcoming publication.

      Reviewer #1 (Recommendations for the authors):

      We thank the reviewer for the recommendations regarding the Introduction, Methods, Results, and Supplementary Figures. We have addressed these points below and revised the manuscript accordingly.

      (1) Introduction: Should avoid describing as "inaccessible" the regions that are inhabited by nomadic and transhumant pastoralists.

      We have revised the wording to “hard-to-reach” regions.

      (2) Methods: Can the authors state what share of the animals included in the household survey data were cattle as opposed to other small ruminants? It would be helpful to understand what share of the data is "excluded"

      We have now included the total number of cattle sampled, providing clarity on the proportion of data used in the analyses.

      (3) Methods: When introducing the deterministic model, it seems unnecessary to mention the initialization conditions (i.e., introduction of a single infected individual at time 0) when this is later repeated in the Estimation of model parameters section, where it seems simulations were first conducted.

      We have removed the redundant description.

      (4) Results: Could the negative correlation between geographic distance of connected herds and mean seroprevalence simply indicate proximal exposure rather than common risk factors?

      We acknowledge that both mechanisms are plausible. RVFV transmission is strongly influenced by shared environmental factors that shape mosquito dynamics; however, direct transmission between proximal cattle herds may also occur through close contact with infectious tissues, bodily fluids, or contaminated materials. We have clarified this interpretation in the Results section.

      (5) Figure S5: inconsistent notation for the scaling factor parameter (tau), which is expressed in equations and tables as psi.

      We thank the reviewer for identifying this issue and have corrected all instances to ensure consistent use of tau throughout the manuscript.

      (6) Figure S6: Why a density plot, isn't the number of temporary extinctions (x-axis) discrete?

      We have replaced the density plot with a bar plot in Figure S6.

    1. Author response:

      eLife Assessment

      This useful study examines whether the sugar trehalose coordinates energy supply with the gene programs that build muscle in the cotton bollworm (Helicoverpa armigera). The evidence for this currently is incomplete. The central claim - that trehalose specifically regulates an E2F/Dp-driven myogenic program - is not supported by the specificity of the data: perturbations and sequencing are systemic, alternative explanations such as general energy or amino-acid scarcity remain plausible, and mechanistic anchors are also limited. The work will interest researchers in insect metabolism and development; focused, tissue-resolved measurements together with stronger mechanistic controls would substantially strengthen the conclusions.

      We thank the reviewer for the thoughtful and constructive evaluation of our work and for recognizing its potential relevance to researchers working on insect metabolism and development. We fully agree that our current evidence is preliminary and that the mechanistic link between trehalose and the E2F/Dp‑driven myogenic program needs to be strengthened.

      Our intention was to present trehalose-E2F/Dp coupling as a working model emerging from our data, rather than as a fully established pathway. We agree that systemic manipulations of trehalose and whole‑larval RNA‑seq cannot fully differentiate global metabolic stress from specific effects on myogenic programs. In the revision, we plan to include additional metabolic readouts (e.g., ATP/AMP ratio, key amino acids where available) to better discuss the overall energetic and nutritional state. We will reanalyze our RNA‑seq data to more clearly distinguish broad stress/metabolic signatures from cell‑cycle/myogenic signatures. Furthermore, we will reframe our discussion to explicitly state that we cannot completely rule out a contribution of general energy or amino‑acid scarcity at this stage.

      We acknowledge that, with our current experiments, the specificity for an E2F/Dp‑driven program is inferred mainly from the enrichment of E2F targets among differentially expressed genes and from expression changes in canonical E2F partners and downstream cell‑cycle/myogenic regulators. To address this more rigorously, we are performing targeted qRT-PCR for a panel of well‑characterized E2F/Dp target genes and myogenic markers in larval muscle versus non‑muscle tissues following trehalose perturbation. Where technically feasible, we will also test whether partial knockdown of HaE2F or HaDp modifies the effect of trehalose manipulation on selected myogenic markers. These data, even if limited, will help to provide a more direct functional link, and we will include them in the manuscript if completed in time. In parallel, we will soften statements that imply a fully established, trehalose‑specific regulation of E2F/Dp and instead present this as a strong candidate pathway suggested by the current data.

      We fully agree that tissue‑resolved analyses are essential to move from systemic correlations to causality in muscle. We are standardizing larval muscle dissections and isolating thoracic/abdominal body wall muscle for trehalose, glycogen, and expression assays. We are also comparing the expression of key metabolic and myogenic genes in muscle versus fat body and midgut under trehalose manipulation. These tissue‑resolved data will directly address whether the transcriptional changes we report are preferentially localized to muscle.

      We are grateful for the reviewer’s critical but encouraging comments. We will moderate our central claims and explicitly consider and discuss alternative explanations. Further, we will add tissue‑resolved and more focused mechanistic data as far as possible within the current revision. We believe these changes will substantially strengthen the manuscript and better align our conclusions with the evidence we presently have.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this work by Mohite et al., they have used transcriptomic and metabolic profiling of H. armigera, muscle development, and S. frugiperda to link energy trehalose metabolism and muscle development. They further used several different bioinformatics tools for network analysis to converge upon transcriptional control as a potential mechanism of metabolite-regulated transcriptional programming for muscle development. The authors have also done rescue experiments where trehalose was provided externally by feeding, which rescues the phenotype. Though the study is exciting, there are several concerns and gaps that lead to the current results as purely speculative. It is difficult to perform any genetic experiments in non-model insects; the authors seem to suggest a similar mechanism could also be applicable in systems like Drosophila; it might be possible to perform experiments to fill some missing mechanistic details.

      A few specific comments below:

      The authors used N-(phenylthio) phthalimide (NPP), a trehalose-6-phosphate phosphatase (TPP) inhibitor. They also find several genes, including enzymes of trehalose metabolism, that change. Further, several myogenic genes are downregulated in bulk RNA sequencing. The major caveat of this experiment is that the NPP treatment leads to reduced muscle development, and so the proportion of the samples from the muscles in bulk RNA sequencing will be relatively lower, which might have led to the results. So, a confirmatory experiment has to be performed where the muscle tissues are dissected and sequenced, or some of the interesting targets could be validated by qRT-PCR. Further to overcome the off-target effects of NPP, trehalose rescue experiments could be useful.

      Thank you for this valuable comment. We will validate the gene expression data using qRT-PCR on muscle tissue samples from both treated and control groups. This will help determine whether the gene expression patterns observed in the RNA-seq data are muscle-specific or systemic.
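
For readers unfamiliar with how such qRT-PCR comparisons are commonly quantified, the following is a minimal sketch of the widely used 2^-ΔΔCt calculation. The response does not state which quantification method will be used, so this method, the function name, and the Ct values in the usage lines are all assumptions for illustration only.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene (treated vs control) by the 2^-ddCt method.

    Assumes roughly 100% amplification efficiency for both primer pairs
    and a reference gene whose expression is unaffected by treatment.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


if __name__ == "__main__":
    # Hypothetical Ct values for a myogenic gene in dissected muscle.
    fold = relative_expression(26.0, 18.0, 24.0, 18.0)
    print(f"fold change (treated / control): {fold}")
```

Running the same calculation separately on dissected muscle and on non-muscle tissue is one way to check whether a change seen in whole-larva RNA-seq reflects altered expression within muscle rather than a smaller share of muscle in the sample.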

      Even the reduction in the levels of ADP, NAD, NADH, and NMN, all of which are essential for efficient energy production and utilization, could be due to the loss of muscles, which perform predominantly metabolic functions due to their mitochondria-rich environment. So it becomes difficult to judge if the levels of these energy molecules' reduction are due to a cause or effect.

      We thank the reviewer for this thoughtful comment and agree that reduced levels of ADP, NAD, NADH, and NMN could arise either from a disturbance of energy metabolism or from loss of mitochondria‑rich muscles. Our current data cannot fully separate these two possibilities. Still, several studies support the interpretation that perturbing trehalose metabolism causes a primary systemic energy deficit that is coupled to mitochondrial function, not merely a passive consequence of tissue loss.

      For example:

      (1) Our previous study in H. armigera showed that chemical inhibition of trehalose synthesis results in depletion of trehalose, glucose, glucose‑6‑phosphate, and suppression of the TCA cycle, indicating reduced energy levels and dysregulated fatty‑acid oxidation (Tellis et al., 2023).

      (2) Chang et al. (2022) showed that trehalose catabolism and mitochondrial ATP production are mechanistically linked. HaTreh1 localizes to mitochondria and physically interacts with ATP synthase subunit α. 20‑hydroxyecdysone increases HaTreh1 expression, enhances its binding to ATP synthase, and elevates ATP content, while knockdown of HaTreh1 or HaATPs‑α reduces ATP levels.

      (3) Similarly, our previous study showed that inhibition of Treh activity in H. armigera generates an “energy‑deficient condition” characterized by deregulation of carbohydrate, protein, fatty‑acid, and mitochondria‑related pathways, and a concomitant reduction in key energy metabolites (Tellis et al., 2024).

      (4) The starvation study in H. armigera has shown that reduced hemolymph trehalose is associated with respiratory depression and large‑scale reprogramming of glycolysis and fatty‑acid metabolism (Jiang et al., 2019).

      These findings support a direct coupling between trehalose availability and systemic energy/redox state. Therefore, the coordinated decrease in ADP, NAD, NADH, and NMN following TPS/TPP silencing is consistent with a primary disturbance of systemic energy and mitochondrial metabolism rather than exclusively a secondary consequence of muscle loss. We agree, however, that the present whole‑larva metabolite measurements do not allow a quantitative partitioning between changes due to altered muscle mass and those due to intrinsic metabolic impairment at the cellular level. Thus, tissue-specific quantification of these metabolites would allow us to directly test whether altered energy metabolites are a cause or consequence of muscle loss.

      References:

      (1) Tellis, M. B., Mohite, S. D., Nair, V. S., Chaudhari, B. Y., Ahmed, S., Kotkar, H. M., & Joshi, R. S. (2024). Inhibition of Trehalose Synthesis in Lepidoptera Reduces Larval Fitness. Advanced Biology, 8(2), 2300404.

      (2) Chang, Y., Zhang, B., Du, M., Geng, Z., Wei, J., Guan, R., An, S. and Zhao, W., 2022. The vital hormone 20-hydroxyecdysone controls ATP production by upregulating the binding of trehalase 1 with ATP synthase subunit α in Helicoverpa armigera. Journal of Biological Chemistry, 298(2).

      (3) Tellis, M., Mohite, S. and Joshi, R., 2024. Trehalase inhibition in Helicoverpa armigera activates machinery for alternate energy acquisition. Journal of Biosciences, 49(3), p.74.

      (4) Jiang, T., Ma, L., Liu, X.Y., Xiao, H.J. and Zhang, W.N., 2019. Effects of starvation on respiratory metabolism and energy metabolism in the cotton bollworm Helicoverpa armigera (Hübner)(Lepidoptera: Noctuidae). Journal of Insect Physiology, 119, p.103951.

      The authors have used this transcriptomic data for pathway enrichment analysis, which led to the E2F family of transcription factors and a reduction in the level of when trehalose metabolism is perturbed. EMSA experiments, though, confirm a possibility of the E2F interaction with the HaTPS/TPP promoter, but it lacks proper controls and competition to test the actual specificity of this interaction. Several transcription factors have DNA-binding domains and could bind any given DNA weakly, and the specificity is ideally known only from competitive and non-competitive inhibition studies.

      We thank the reviewer for this important comment and fully agree that EMSA alone, without appropriate competition and control reactions, cannot establish the specificity or functional relevance of a transcription factor-DNA interaction. In our study, GRN analysis of the RNA-seq data obtained upon HaTPS/TPP silencing identified the E2F family, suggesting a potential regulatory connection. We then predicted E2F binding sites in the HaTPS/TPP promoter. The EMSA experiments were intended as preliminary evidence that E2F can associate with the HaTPS/TPP promoter in vitro. We will clarify this in the manuscript by softening our conclusion to indicate that our data support a “possible E2F-HaTPS/TPP interaction”. We will also perform EMSA with specific and non‑specific competitors to confirm E2F binding to the HaTPS/TPP promoter.

      The work seems to have connected the trehalose metabolism with gene expression changes, though this is an interesting idea, there are no experiments that are conclusive in the current version of the manuscript. If the authors can search for domains in the E2F family of transcription factors that can bind to the metabolite, then, if not, a chip-seq is essential to conclusively suggest the role of E2F in regulating gene expression tuned by the metabolites.

      A previous study in D. melanogaster by Zappia and Frolov (2016) showed that E2F function in skeletal muscle is vital and required for animal viability. They showed that Dp knockdown resulted in reduced expression of genes encoding structural and contractile proteins, such as Myosin heavy chain (Mhc), fln, Tropomyosin 1 (Tm1), Tropomyosin 2 (Tm2), Myosin light chain 2 (Mlc2), sarcomere length short (sals) and Act88F, and myogenic regulators, such as held out wings (how), Limpet (Lmpt), Myocyte enhancer factor 2 (Mef2) and spalt major (salm). In addition, ChIP-qRT-PCR showed that upstream regions of myogenic genes, such as how, fln, Lmpt, sals, Tm1 and Mef2, were specifically enriched with E2f1, E2f2, and Dp antibodies in comparison with a nonspecific antibody. Further, Zappia et al. (2019) reported a ChIP-seq dataset suggesting that E2F/Dp directly activates the expression of glycolytic and mitochondrial genes during muscle development. Zappia et al. (2023) showed the regulation of one of the glycolytic genes, Phosphoglycerate kinase (Pgk), by E2F during Drosophila development.

      However, the regulation of trehalose metabolic genes by E2F/Dp and vice versa was not studied previously. So here in our study, we tried to understand the correlation of trehalose metabolism and E2F/Dp in the muscle development of H. armigera.

      References:

      (1) Zappia, M.P. and Frolov, M.V., 2016. E2F function in muscle growth is necessary and sufficient for viability in Drosophila. Nature Communications, 7(1), p.10509.

      (2) Zappia, M.P., Rogers, A., Islam, A.B. and Frolov, M.V., 2019. Rbf activates the myogenic transcriptional program to promote skeletal muscle differentiation. Cell reports, 26(3), pp.702-719.

      (3) Zappia, M. P., Kwon, Y.-J., Westacott, A., Liseth, I., Lee, H. M., Islam, A. B., Kim, J., & Frolov, M. V. (2023a). E2F regulation of the Phosphoglycerate kinase gene is functionally important in Drosophila development. Proceedings of the National Academy of Sciences, 120(15), e2220770120.

      Some of the above concerns are partially addressed in experiments where silencing of E2F/Dp shows similar phenotypes as with NPP and dsRNA. It is also notable that silencing any key transcription factor can have several indirect effects, and delayed pupation and lethality could not be definitely linked to trehalose-dependent regulation.

      Yes. It’s true that silencing of any key transcription factor can have several indirect effects. Our intention was not to argue that delayed pupation and lethality are exclusively due to trehalose-dependent regulation, but that E2F/Dp and HaTPS/TPP silencing showed a consistent set of phenotypes and molecular changes, such as (i) transcriptomic enrichment of E2F targets upon trehalose perturbation, (ii) reduced HaTPS/TPP expression following E2F/Dp silencing, (iii) reduced myogenic gene expression that parallels the phenotypes observed with HaTPS/TPP silencing and (iv) restoration of E2F and Dp expression in E2F/Dp‑silenced insects upon trehalose feeding in the rescue assay. Together, these findings support a functional association between E2F/Dp and trehalose homeostasis. At the same time, we fully acknowledge that these results do not exclude additional, trehalose‑independent roles of E2F/Dp in development.

      Trehalose rescue experiments that rescue phenotype and gene expression are interesting. But is it possible that the fed trehalose is metabolized in the gut and might not reach the target tissue? In which case, the role of trehalose in directly regulating transcription factors becomes questionable. So, a confirmatory experiment is needed to demonstrate that the fed trehalose reaches the target tissues. This could possibly be done by measuring the trehalose levels in muscles post-rescue feeding. Also, rescue experiments need to be done with appropriate control sugars.

      Yes, it is possible that some of the fed trehalose is metabolized in the gut. Although trehalase is present in the insect gut, some trehalose will be absorbed via trehalose transporters in the gut lining. No rescue was observed in insects fed the control diet (empty vector and dsHaTPP), which contains chickpea powder and therefore ample amino acids and carbohydrates. Only insects fed the trehalose-containing diet were rescued, not those fed the control diet containing other carbohydrates. We agree that direct measurement of trehalose in target tissues will provide important confirmation. In the revised manuscript, we will measure trehalose levels in muscle, gut, and haemolymph after trehalose feeding.

      No experiments are performed with non-target control dsRNA. All the experiments are done with an empty vector. But an appropriate control should be a non-target control.

      Yes, no experiment included non-target dsRNA. We previously optimized a protocol for dsRNA delivery and verified its effectiveness in target knockdown (optimizing concentration and time), and we have published several research articles using a similar protocol:

      (1) Chaudhari, B.Y., Nichit, V.J., Barvkar, V.T. and Joshi, R.S., 2025. Mechanistic insights in the role of trehalose transporter in metabolic homeostasis in response to dietary trehalose. G3: Genes, Genomes, Genetics, p. jkaf303.

      (2) Barbole, R.S., Sharma, S., Patil, Y., Giri, A.P. and Joshi, R.S., 2024. Chitinase inhibition induces transcriptional dysregulation altering ecdysteroid-mediated control of Spodoptera frugiperda development. Iscience, 27(3).

      (3) Patil, Y.P., Wagh, D.S., Barvkar, V.T., Gawari, S.K., Pisalwar, P.D., Ahmed, S. and Joshi, R.S., 2025. Altered Octopamine synthesis impairs tyrosine metabolism affecting Helicoverpa armigera vitality. Pesticide Biochemistry and Physiology, 208, p.106323.

      (4) Tellis, M.B., Chaudhari, B.Y., Deshpande, S.V., Nikam, S.V., Barvkar, V.T., Kotkar, H.M. and Joshi, R.S., 2023. Trehalose transporter-like gene diversity and dynamics enhances stress response and recovery in Helicoverpa armigera. Gene, 862, p.147259.

      (5) Joshi, K.S., Barvkar, V.T., Hadapad, A.B., Hire, R.S. and Joshi, R.S., 2025. LDH-dsRNA nanocarrier-mediated spray-induced silencing of juvenile hormone degradation pathway genes for targeted control of Helicoverpa armigera. International Journal of Biological Macromolecules, p.148673.

      The same vector backbone and preparation procedures were used for both control and experimental constructs, allowing us to specifically compare the effects of the target dsRNA. The phenotypes and gene expression changes we observed were specific to the target genes and were not seen in the empty vector controls, suggesting that the effects are not due to nonspecific responses to dsRNA delivery or vector components. We acknowledge the suggestion, and in future studies we will include non-target dsRNA as a control in silencing assays.

      Reviewer #2 (Public review):

      Summary:

      This study shows that the knockdown of the effects of TPS/TPP in Helicoverpa armigera and Spodoptera frugiperda can be rescued by trehalose treatment. This suggests that trehalose metabolism is necessary for development in the tissues that NPP and dsRNA can reach.

      Strengths:

      This study examines an important metabolic process beyond model organisms, providing a new perspective on our understanding of species-specific metabolism equilibria, whether conserved or divergent.

      Weaknesses:

      While the effects observed may be truly conserved across Lepidopterans and may be muscle-specific, the study largely relies on one species and perturbation methods that are not muscle-specific. The technical limitations arising from investigations outside model systems, where solid methods are available, limit the specificity of inferences that may be drawn from the data.

      Thank you for pointing out this experimental weakness. We will validate the gene expression data using qRT-PCR on muscle tissue samples from both treated and control groups. We will also perform metabolite analysis with muscle samples. This will help to determine whether the observed gene expression patterns and metabolite changes are muscle-specific or systemic.

      Reviewer #3 (Public review):

      The hypothesis is that Trehalose metabolism regulates transcriptional control of muscle development in lepidopteran insects.

      The manuscript investigates the role of Trehalose metabolism in muscle development. Through sequencing and subsequent bioinformatics analysis of insects with perturbed trehalose metabolism (knockdown of TPS/TPP), the authors have identified transcription factor E2F, which was validated through RT-PCR. Their hypothesis is that trehalose metabolism regulates E2F, which then controls the myogenic genes. Counterintuitive to this hypothesis, the investigators perform EMSAs with the E2F protein and promoter of the TPP gene and show binding. Their knockdown experiments with Dp, the binding partner of E2F, show direct effect on several trehalose metabolism genes. Similar results are demonstrated in the trehalose feeding experiment, where feeding trehalose leads to partial rescue of the phenotype observed as a result of Dp knockdown. This seems contradictory to their hypothesis. Even more intriguing is a similar observation between paramyosin, a structural muscle protein, and E2F/Dp - they show that paramyosin regulates E2F/Dp and E2F/Dp regulated paramyosin. The only plausible way to explain the results is the existence of a feed-forward loop between TPP-E2F/Dp and paramyosin-E2F/Dp. But the authors have mentioned nothing in this line. Additionally, I think trehalose metabolism impacts amino acid content in insects, and that will have a direct bearing on muscle development. The sequencing analysis and follow-up GSEA studies have demonstrated enrichment of several amino acid biosynthetic genes. Yet authors make no efforts to measure amino acid levels or correlate them with muscle development. Any study aiming to link trehalose metabolism and muscle development and not considering the above points will be incomplete.

      We appreciate the reviewer’s careful evaluation of this manuscript and the constructive comments. Based on our data and earlier reports, we find it difficult to support a linear pathway (“trehalose → E2F → muscle”); instead, the data point to a regulatory module in which trehalose metabolism and E2F/Dp form an interdependent circuit controlling myogenic genes. E2F/Dp binds and activates trehalose metabolism genes (TPS/TPP, Treh1) and myogenic structural genes, consistent with the EMSA (TPS/TPP-E2F) and with predicted E2F binding sites on metabolic genes (Treh1, Pgk) and myogenic genes such as Act88F, Prm, Tm1, Fln, etc. At the same time, perturbing trehalose synthesis reduces E2F/Dp expression and myogenic gene expression, and trehalose feeding partially restores all three. This bidirectional influence is similar to E2F‑dependent control of carbohydrate metabolism and systemic sugar homeostasis described in D. melanogaster, where E2F/Dp both regulates metabolic genes and is itself constrained by metabolic state (Zappia et al., 2023a; Zappia et al., 2021).

      The reciprocal regulation between Prm and E2F/Dp is indeed intriguing. Rather than a paradox, we interpret this as evidence that E2F/Dp couples metabolic genes and structural muscle genes within a shared module, and that key sarcomeric components (such as paramyosin) feed back on this transcriptional program. Similar cross‑talk between E2F‑controlled metabolic programs and tissue function has been documented in D. melanogaster muscle and fat body, where E2F loss in one tissue elicits systemic changes in the other (Zappia et al., 2021). For further confirmation of E2F-regulated Prm, we will perform EMSA on the Prm promoter with appropriate controls.

      We fully agree that amino‑acid metabolism is a critical missing piece. In the revised manuscript, we will quantify amino acid levels and include the results: “Amino acid levels changed differentially: cysteine, leucine, histidine, valine, and proline showed significant reductions, while isoleucine and lysine showed non-significant reductions upon trehalose metabolism perturbation. These results are consistent with previous reports by Tellis et al. (2024) and Shi et al. (2016)”. We will reframe our conclusions more cautiously as supporting a trehalose-E2F/Dp-muscle development link, while stating that “definitive causal links via amino‑acid metabolism remain to be demonstrated”.

      References:

      (1) Zappia, M. P., Kwon, Y.-J., Westacott, A., Liseth, I., Lee, H. M., Islam, A. B., Kim, J., & Frolov, M. V. (2023a). E2F regulation of the Phosphoglycerate kinase gene is functionally important in Drosophila development. Proceedings of the National Academy of Sciences, 120(15), e2220770120.

      (2) Zappia, M.P., Guarner, A., Kellie-Smith, N., Rogers, A., Morris, R., Nicolay, B., Boukhali, M., Haas, W., Dyson, N.J. and Frolov, M.V., 2021. E2F/Dp inactivation in fat body cells triggers systemic metabolic changes. elife, 10, p.e67753.

      (3) Tellis, M., Mohite, S. and Joshi, R., 2024. Trehalase inhibition in Helicoverpa armigera activates machinery for alternate energy acquisition. Journal of Biosciences, 49(3), p.74.

      (4) Shi, J.F., Xu, Q.Y., Sun, Q.K., Meng, Q.W., Mu, L.L., Guo, W.C. and Li, G.Q., 2016. Physiological roles of trehalose in Leptinotarsa larvae revealed by RNA interference of trehalose-6-phosphate synthase and trehalase genes. Insect Biochemistry and Molecular Biology, 77, pp.52-68.

      Author response image 1.

      The result section of the manuscript is quite concise, to my understanding (especially the initial few sections), which misses out on mentioning details that would help readers understand the paper better. While technical details of the methods should be in the Materials and Methods section, the overall experimental strategy for the experiments performed should be explained in adequate detail in the results section itself or in figure legends. I would request authors to include more details in the results section. As an extension of the comment above, many times, abbreviations have been used without introducing them. A thorough check of the manuscript is required regarding this.

      Thank you very much for pointing out this issue. We will revise the manuscript content according to these suggestions.

      The Spodoptera experiments appear ad hoc and are insufficient to support conservation beyond Helicoverpa. To substantiate this claim, please add a coherent, minimal set of Spodoptera experiments and present them in a dedicated subsection. Alternatively, consider removing these data and limiting the conclusions (and title) to H. armigera.

      We thank the reviewer for this helpful comment. We agree that, in this current version of the manuscript, the S. frugiperda experiments are not sufficiently systematic to support strong claims about conservation beyond H. armigera. Our primary focus in this study is indeed on H. armigera, and the addition of the S. frugiperda data was intended only as preliminary, supportive evidence rather than a central component of our conclusions. To avoid over‑interpretation and to keep the manuscript focused and coherent, we will remove all S. frugiperda data from the revised version, including the corresponding text and figures. We will also adjust the title, abstract, and conclusion to clearly state that our findings are limited to H. armigera.

      In order to check the effects of E2F/Dp, a dsRNA-mediated knockdown of Dp was performed. Why was the E2F protein, a primary target of the study, not chosen as a candidate? The authors should either provide justification for this or perform the suggested experiments to come to a conclusion. I would like to point out that such experiments were performed in Drosophila.

      Thank you for this thoughtful comment and the specific suggestion. We agree that directly targeting E2F would, in principle, be an informative complementary approach. In our study, however, we prioritized Dp knockdown for two main reasons. First, E2F is a large family, and E2F-Dp functions as an obligate heterodimer. Previous work in D. melanogaster has shown that depletion of Dp is sufficient to disrupt E2F-dependent transcription broadly, often with more efficient loss of complex activity than targeting individual E2F isoforms (Zappia et al., 2021; Zappia and Frolov, 2016). Second, in our preliminary trials, we performed a dsRNA feeding assay with dsHaE2F, dsHaDp, and combined dsHaE2F plus dsHaDp. In that assay, feeding dsRNA targeting HaE2F (dsHaE2F) alone did not silence HaE2F. Because E2F is a large family, other E2F isoforms may compensate for silencing of the targeted HaE2F. However, HaE2F showed significantly reduced expression upon dsHaDp and combined dsHaE2F plus dsHaDp feeding (Figure A), whereas HaDp showed a significant reduction in its expression in all three conditions (Figure B). As we observed reduced expression of both HaE2F and HaDp upon combined feeding of dsHaE2F and dsHaDp, we further performed a rescue assay by exogenous feeding of trehalose. We observed significant upregulation of HaE2F, HaDp, trehalose metabolic genes (HaTPS/TPP and HaTreh1), and myogenic genes (HaPrm and HaTm2) (Figure C). For these reasons, we focused on Dp silencing as a more reliable way to impair E2F/Dp complex function in H. armigera.

      Author response image 2.

      References:

      (1) Zappia, M.P. and Frolov, M.V., 2016. E2F function in muscle growth is necessary and sufficient for viability in Drosophila. Nature Communications, 7(1), p.10509.

      (2) Zappia, M.P., Guarner, A., Kellie-Smith, N., Rogers, A., Morris, R., Nicolay, B., Boukhali, M., Haas, W., Dyson, N.J. and Frolov, M.V., 2021. E2F/Dp inactivation in fat body cells triggers systemic metabolic changes. elife, 10, p.e67753.

      Silencing of HaDp resulted in a significant decrease in HaE2F expression. I find this observation intriguing. DP is the cofactor of E2F, and they both heterodimerise and sit on the promoter of target genes to regulate them. I would request authors to revisit this result, as it contradicts the general understanding of how E2F/Dp functions in other organisms. If Dp indeed controls E2F expression, then further experiments should be conducted to come to a conclusion convincingly. Additionally, these results would need thorough discussion with citations of similar results observed for other transcription factor-cofactor complexes.

      Thank you for highlighting this point and for prompting us to examine these data more carefully. The reduction in HaE2F mRNA upon HaDp silencing is indeed unexpected if one only considers the canonical view of E2F/Dp as a heterodimer whose subunits co-occupy target promoters without strongly regulating each other’s expression. However, several lines of work suggest that transcription factor-cofactor networks frequently include feedback loops in which cofactors influence the expression of their partner TFs. First, in multiple systems, transcription factors and their cofactors are known to regulate each other’s transcription, forming positive or negative feedback loops. For example, in hematopoietic cells, the transcription factor Foxp3 controls the expression of many of its own cofactors, and some of these cofactors in turn facilitate or stabilize Foxp3 expression, forming an interconnected regulatory network rather than a simple one‑way interaction (Rudra et al., 2012). Second, E2F/Dp complexes exhibit non‑canonical regulatory mechanisms and can regulate broad sets of targets, including other transcriptional regulators. Several studies show that E2F/Dp proteins not only control classical cell‑cycle genes but also participate in diverse processes such as DNA damage signaling, mitochondrial function, and differentiation (Guarner et al., 2017; Ambrus et al., 2013; Sánchez-Camargo et al., 2021). In D. melanogaster, complete loss of dDP alters the expression of direct E2F/DP targets, including dATM (Guarner et al., 2017).

      All these reports indicate that the E2F-Dp complex sits at the top of multi‑layer regulatory hierarchies. Such architectures make it plausible that Dp silencing in H. armigera could modulate HaE2F expression in a non-canonical way.

      References:

      (1) Rudra, D., DeRoos, P., Chaudhry, A., Niec, R.E., Arvey, A., Samstein, R.M., Leslie, C., Shaffer, S.A., Goodlett, D.R. and Rudensky, A.Y., 2012. Transcription factor Foxp3 and its protein partners form a complex regulatory network. Nature immunology, 13(10), pp.1010-1019.

      (2) Guarner, A., Morris, R., Korenjak, M., Boukhali, M., Zappia, M.P., Van Rechem, C., Whetstine, J.R., Ramaswamy, S., Zou, L., Frolov, M.V. and Haas, W., 2017. E2F/DP prevents cell-cycle progression in endocycling fat body cells by suppressing dATM expression. Developmental cell, 43(6), pp.689-703.

      (3) Ambrus, A.M., Islam, A.B., Holmes, K.B., Moon, N.S., Lopez-Bigas, N., Benevolenskaya, E.V. and Frolov, M.V., 2013. Loss of dE2F compromises mitochondrial function. Developmental cell, 27(4), pp.438-451.

      (4) Sánchez-Camargo, V.A., Romero-Rodríguez, S. and Vázquez-Ramos, J.M., 2021. Non-canonical functions of the E2F/DP pathway with emphasis in plants. Phyton, 90(2), p.307.

      I consider the overall bioinformatics analysis to remain very poorly described. What is specifically lacking is a clear statement of why particular dry-lab experiments were conducted.

      We again thank the reviewer for advising us to provide the biological context and motivation for every bioinformatics analysis performed. The bioinformatics analyses devised here aim to explain the systems-level perturbations caused by HaTPS/TPP silencing, to account for the observed phenotype, and to discover transcription factors potentially modulating the gene regulatory changes induced by HaTPS/TPP silencing.

      (1) Gene set enrichment analyses:

      Differential gene expression analyses of the bulk RNA sequencing data, followed by qRT-PCR, confirmed the transcriptional changes in myogenic genes and the gene expression alterations in metabolic and cell cycle-related genes. These perturbations merely confirmed the effect of HaTPS/TPP silencing on the expected genes. We wanted to see whether an “unbiased” system-level statistical analysis such as gene set enrichment analysis (GSEA) could reveal both expected and novel biological processes that underlie HaTPS/TPP silencing. GSEA results revealed large-scale transcriptional changes in 11 enriched processes, including amino acid metabolism, energy metabolism, developmental regulatory processes, and motor protein activity. GSEA not only revealed the overall transcriptionally enriched pathways but also identified the genes undergoing synchronized pathway-level transcriptional change upon HaTPS/TPP silencing.

      (2) Gene regulatory network analysis:

      Although GSEA uncovered potential pathway-level changes, we were also interested in identifying the gene regulatory network associated with such large-scale process-level transcriptional perturbations. Interestingly, the perturbed biological processes were also heterogeneous (e.g., motor protein activity, energy metabolism, amino acid metabolism, etc.). We hypothesized that inferring a causal gene regulatory network from the genes in the GSEA-enriched biological processes should predict core/master transcription factors that might synchronously regulate the metabolic and non-metabolic processes related to HaTPS/TPP silencing, thereby providing a broad understanding of the perturbed phenotype. The gene regulatory network analysis statistically inferred an “active” gene regulatory network corresponding to the GSEA-enriched KEGG gene sets. When the transcription factors (TFs) were ranked by their number of outgoing connections (outdegree centrality) within the active gene regulatory network, E2F family TFs were the top-ranking, most highly connected transcription factors associated with the transcriptionally enriched processes. This suggests that E2F family TFs are central to controlling the flow of regulatory information within this network. Intriguingly, E2F has been previously implicated in muscle development in insects (Zappia and Frolov, 2016). Further extracting the regulated targets of E2F family TFs within this network revealed the mechanistic connection with the 11 enriched processes. This GRN analysis was crucial in discovering and prioritizing E2F TFs as central transcription factors mediating HaTPS/TPP silencing effects, which was not apparent from simpler analyses such as differential gene expression analysis.
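      As an illustration of the ranking step described in point (2), the following is a minimal sketch of ordering regulators by outdegree in a directed network. The edge list, the regulator names other than E2F, and the function name are hypothetical and chosen only for illustration; they are not taken from the network inferred in the study, and the actual inference method is not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy edges (regulator -> target); NOT the study's inferred network.
TOY_EDGES = [
    ("E2F", "gene1"),
    ("E2F", "gene2"),
    ("E2F", "gene3"),
    ("E2F", "gene4"),
    ("TF_A", "gene1"),
    ("TF_A", "gene5"),
    ("TF_B", "gene2"),
]


def rank_by_outdegree(edges):
    """Return (regulator, outdegree) pairs, most distinct targets first."""
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)  # a set ignores duplicated edges
    counts = {reg: len(tgts) for reg, tgts in targets.items()}
    # Sort by descending outdegree, then by name so ties have a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_outdegree(TOY_EDGES)
for regulator, outdegree in ranking:
    print(regulator, outdegree)
```

      In this toy example the regulator with the most distinct targets appears first. Outdegree is only one of several centrality measures, and a high rank indicates connectivity within the inferred network rather than experimentally confirmed regulation.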

      As per the reviewer’s suggestions, we will add these outlined points in the text of the manuscript (Results section) to further give context and clarity to the bioinformatics analyses conducted in this study.
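      To make the enrichment step in point (1) more concrete, here is a simplified sketch of a running-sum enrichment score. It is an unweighted, Kolmogorov-Smirnov-like variant written for illustration only: published GSEA implementations weight hits by each gene's ranking metric and assess significance by permutation, neither of which is shown. The gene names and the function name are hypothetical, and the ranked list is assumed to contain no duplicates.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    ranked_genes: genes ordered from most up- to most down-regulated.
    gene_set: collection of genes belonging to the pathway of interest.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0  # the score is undefined here; report no enrichment
    running = 0.0
    best = 0.0
    for gene in ranked_genes:
        if gene in members:
            running += 1.0 / n_hit   # step up on a gene-set member
        else:
            running -= 1.0 / n_miss  # step down on a non-member
        if abs(running) > abs(best):
            best = running           # keep the largest deviation from zero
    return best


# Hypothetical ranked list and gene sets for illustration.
ranked = ["a", "b", "c", "d", "e", "f"]
top_heavy = enrichment_score(ranked, {"a", "b"})
bottom_heavy = enrichment_score(ranked, {"e", "f"})
```

      A positive score indicates that the set's genes cluster toward the top of the ranked list, and a negative score indicates clustering toward the bottom.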

      In my judgement, the EMSA analysis presented is technically poor in quality. It lacks positive and negative controls, does not show mutation analysis or super shifts. Also, it lacks any competition assays that are important to prove the binding beyond doubt. I am not sure why protein is not detected at all in lower concentrations. Overall, the EMSA assays need to be redone; I find the current results to be unacceptable.

      Thank you for pointing out this issue. We will repeat the EMSA analysis with appropriate controls. Although the gel image was not clear, a faint protein band (indicated by the white square) was observed in well No. 8, where we used 8 μg of E2F protein and 75 ng of HaTPS/TPP promoter, when the gel was stained with SYPRO Ruby protein stain, suggesting weak HaTPS/TPP-E2F complex formation.

      GSEA studies clearly indicate enrichment of the amino acid synthesis gene in TPP knockdown samples. This supports the plausible theory that a lack of trehalose means a lack of enough nutrients, therefore less of that is converted to amino acids, and therefore muscle development is compromised. Yet the authors make no effort to measure amino acid levels. While nutrients can be sensed through signalling pathways leading to shutdown of myogenic genes, a simple and direct correlation between less raw material and deformed muscle might also be possible.

      We quantified amino acid levels as per the suggestion, and we observed differential levels of amino acids upon trehalose metabolism perturbation.

      However, we observed that insects were not rescued when fed a control chickpea-based artificial diet containing the nutrients required for normal growth and development. Based on this observation, we conclude that trehalose deficiency is the only possible cause for the defect in muscle development.

      The authors are encouraged to stick to one color palette while demonstrating sequencing results. Choosing a different color palette for representing results from the same sequencing analysis confuses readers.

      Thank you for the comment. We will revise the color palette as per the suggestion.

      Expression of genes, as understood from sequencing analysis in Figure 1D, Figure 2F, and Figure 3D, appears to be binary in nature. This result is extremely surprising given that the qRT-PCR of these genes have revealed a checker and graded expression.

      Thank you for pointing out this issue. We will revise the scale range for these figures to get more insights about gene expression levels and include figures as per the suggestion.

      In several graphs, non-significant results have been interpreted as significant in the results section. In a few other cases, the reported changes are minimal, and the statistical support is unclear; please recheck the analyses and include exact statistics. In the results section, fold changes observed should be discussed, as well as the statistical significance of the observed change.

      We will revise the analyses and include exact statistics as per the suggestion.

      Finally, I would add that trehalose metabolism regulates cell cycle genes, and muscle development genes establish correlation and causation. The authors should ensure that any comments they make are backed by evidence.

      We thank the reviewer for this insightful comment. Although direct evidence in insects is currently lacking, multiple independent studies in yeast, plants and mammalian systems support a regulatory link between trehalose metabolism and the cell cycle. In budding yeast Saccharomyces cerevisiae, neutral trehalase (Nth1) is directly phosphorylated and activated by the major cyclin‑dependent kinase Cdk1 at G1/S, routing stored trehalose into glycolysis to fuel DNA replication and mitosis (Ewald et al., 2016). CDK‑dependent regulation of trehalase activity has also been reported in plants, where CDC28‑mediated phosphorylation channels glucose into biosynthetic pathways necessary for cell proliferation (Lara-Núñez et al., 2025). Furthermore, budding yeast cells accumulate trehalose and glycogen upon entry into quiescence and subsequently mobilize these stores to generate a metabolic “finishing kick” that supports re‑entry into the cell cycle (Silljé et al., 1999; Shi et al., 2010). Exogenous trehalose that perturbs the trehalose cycle impairs glycolysis, reduces ATP, and delays cell cycle progression in S. cerevisiae, highlighting a dose‑ and context‑dependent control of growth versus arrest (Zhang, Zhang and Li, 2020). In mammalian systems, trehalose similarly modulates proliferation-differentiation decisions. In rat airway smooth muscle cells, low trehalose concentrations promote autophagy, whereas higher doses induce S/G2–M arrest, downregulate Cyclin A1/B1, and trigger apoptosis, indicating a shift from controlled growth to cell elimination at higher exposure (Xiao et al., 2021). In human iPSC‑derived neural stem/progenitor cells, low‑dose trehalose enhances neuronal differentiation and VEGF secretion, while higher doses are cytotoxic, again highlighting a tunable impact on cell‑fate outcomes (Roose et al., 2025).
      In wheat, exogenous trehalose under heat stress reduces growth, lowers auxin, gibberellin, abscisic acid and cytokinin levels, and represses CycD2 and CDC2 expression, suggesting that trehalose signalling integrates with hormone pathways and core cell‑cycle regulators to restrain proliferation during stress (Luo, Liu, and Li, 2021). Together, these studies show the importance of trehalose metabolism in cell‑cycle regulation, which determines whether cells and tissues proliferate, differentiate, or remain quiescent.

      With respect to muscle development, previous work has implicated glycolytic metabolism in myogenesis and muscle growth. Tixier et al. (2013) showed that loss of key glycolytic genes results in abnormally thin muscles, while Bawa et al. (2020) demonstrated that loss of TRIM32 decreases glycolytic flux and reduces muscle tissue size. These findings indicate that carbohydrate and energy metabolism pathways are important determinants of muscle structure and growth. However, there are no previous studies about the role of trehalose metabolism in muscle development, other than as an energy source, so here we specifically set out to establish the involvement of trehalose metabolism in muscle development.

      References:

      (1) Ewald, J.C. et al. (2016) “The yeast cyclin-dependent kinase routes carbon fluxes to fuel cell cycle progression,” Molecular cell, 62(4), pp. 532–545.

      (2) Lara-núñez, A. et al. (2025) “The Cyclin-Dependent Kinase activity modulates the central carbon metabolism in maize during germination,” (January), pp. 1–16.

      (3) Silljé, H.H.W. et al. (1999) “Function of trehalose and glycogen in cell cycle progression and cell viability in Saccharomyces cerevisiae,” Journal of bacteriology, 181(2), pp. 396–400.

      (4) Shi, L. et al. (2010) “Trehalose Is a Key Determinant of the Quiescent Metabolic State That Fuels Cell Cycle Progression upon Return to Growth,” 21, pp. 1982–1990.

      (5) Zhang, X., Zhang, Y. and Li, H. (2020) “Regulation of trehalose, a typical stress protectant, on central metabolisms, cell growth and division of Saccharomyces cerevisiae CEN. PK113-7D,” Food Microbiology, 89, p. 103459.

      (6) Xiao, B. et al. (2021) “Trehalose inhibits proliferation while activates apoptosis and autophagy in rat airway smooth muscle cells,” Acta Histochemica, 123(8), p. 151810.

      (7) Roose, S.K. et al. (2025) “Trehalose enhances neuronal differentiation with VEGF secretion in human iPSC-derived neural stem / progenitor cells,” Regenerative Therapy, 30, pp. 268–277.

      (8) Luo, Y., Liu, X. and Li, W. (2021) “Exogenously-supplied trehalose inhibits the growth of wheat seedlings under high temperature by affecting plant hormone levels and cell cycle processes,” Plant Signaling & Behavior, 16(6).

      (9) Tixier, V., Bataillé, L., Etard, C., Jagla, T., Weger, M., DaPonte, J.P., Strähle, U., Dickmeis, T. and Jagla, K., 2013. Glycolysis supports embryonic muscle growth by promoting myoblast fusion. Proceedings of the National Academy of Sciences, 110(47), pp.18982-18987.

      (10) Bawa, S., Brooks, D.S., Neville, K.E., Tipping, M., Sagar, M.A., Kollhoff, J.A., Chawla, G., Geisbrecht, B.V., Tennessen, J.M., Eliceiri, K.W. and Geisbrecht, E.R., 2020. Drosophila TRIM32 cooperates with glycolytic enzymes to promote cell growth. elife, 9, p.e52358.

      Finally, we appreciate the meticulous review of this manuscript and constructive comments. We will perform the recommended experiments, data analysis, and revise the manuscript accordingly.

    1. Author response:

      We would like to thank the reviewers for their detailed reading of our manuscript and for the constructive comments they have provided.

      We plan to make structural changes to the introduction and the discussion. Reviewer #1 describes the “disconnect between the abstract/introduction and the discussion”. We agree that “the study's aims are not clearly or explicitly defined”. We will edit the introduction to state our aim of investigating the factors that affect using “crispants” in mouse functional genomics. In the discussion, we described how our findings inform sgRNA choice to ensure biallelic gene disruption in founders and how our extensive genotyping methods enabled us to determine the molecular basis for the observed phenotype (explaining why some founders showed the expected recessive trait and why it was partial or absent in others). We also concluded from our attempts of multiplexing that this had too great an impact on viability to be useful. We will edit the discussion to better address our aim and to elaborate on several points raised by the reviewers (discussed in more detail below). Specifically, we will provide examples of screening situations where generating crispant mice may be useful, e.g. preliminary in vivo studies to follow up candidates identified in large-scale cellular screens. We will also provide more context about our assumptions underlying our statement that the use of crispants will “dramatically reduce time, resources, and animal numbers” compared to ENU mutagenesis (where recessive traits require breeding of G2 females with G1 males to achieve homozygosity of de novo mutations in G3 offspring) and the work needed to validate this. We will more clearly acknowledge that our proof-of-principle study used visible phenotypes that can be assessed in individual animals and then discuss how the use of crispants could be extended to the investigation of quantitative or late-onset traits using cohorts of crispants (discussed further below). 
      We will also discuss the assessment of non-null alleles to dissect protein function, building on our unexpected finding that a single round of CRISPR/Cas9-mediated mutagenesis can generate an allelic series.

      Reviewer #1 asked us to address “how to interpret wild-type appearing founders”. We have discussed the mechanisms underlying the wild-type appearing founders generated in this study. This is linked with concerns in the field that incomplete editing, transcripts escaping nonsense-mediated decay, and/or the presence of in-frame mutations that don’t disrupt protein function may lead to founders that appear wild-type or have a partial phenotype. We have shown that our electroporation protocol results in very high levels of editing, but that this must always be assessed during genotyping. We found that using an sgRNA that targets a critical protein domain ensures that short in-frame indels also disrupt protein function. In future studies that determine how strain background modifies a phenotype that has been established on one strain (e.g. C57BL/6J), wild-type appearing founders would suggest that the new strain background rescues the null phenotype. In future studies that determine the consequence of targeting a second gene on a mutant background, wild-type appearing founders would indicate that the second mutation suppresses the phenotype associated with the mutant background. We will add this to the discussion section where we describe possible screening situations in which crispant mice would be useful.

      Reviewer #3 states that “the relationship between the sgRNA/Cas9 concentrations delivered to the zygotes and the resulting editing efficiencies are not explicitly investigated.” Members of The Centre for Phenogenomics (TCP) Transgenic Production Core who co-author this study (Lauryl Nutter, Marina Gertsenstein and Lauri Lintott) have published detailed protocols on mouse model production, which we cite in this paper (PMID: 30040228; PMID: 33524495; PMID: 39999224). In PMID: 33524495, they tested a two-fold difference in Cas9 RNP concentrations for generating knock-out alleles. Using their optimised protocols for electroporation of one cell zygotes with RNPs, we achieved an extremely high editing rate. We did not vary the sgRNA/Cas9 concentrations as part of this study as our goal was to assess the ability to generate “complete” null animals. We do note, however, that by targeting two genes simultaneously whilst keeping the total RNP concentration constant (to avoid reagent toxicity), we halved the amount of each sgRNA and this did not lead to a decrease in editing efficiency. We will highlight this in the results/discussion section (as appropriate).

      Reviewer #1 asks whether the use of crispants is applicable for “quantitative, late-onset, or more subtle phenotypes, including behavioral ones”. We are hopeful that this is possible and it is a priority for future studies. Crucially, cohorts of crispants can be generated in a single round of mutagenesis. Starting an experiment with ten donor females will produce ~100 zygotes, resulting in ~40 crispants. Power calculations must be performed to determine the size of the cohort required for the effect size and variability of the phenotype being studied, but many neurobehavioural studies use ~10 mutants vs ~10 controls. We note that sex and/or background genotype may mean that only some of the ~40 crispants produced can be used for phenotypic testing. This reviewer also raises the question of whether wild-type animals or mock-edited animals serve as the best controls. From work carried out by Lauryl Nutter and her colleagues from the IMPC (PMID: 37301944), we know that “wild-type” controls should ideally be from the same embryo pool as the crispants to avoid differences due to genetic drift within inbred colonies. This study also found that possible off-target mutations from CRISPR/Cas9-mediated mutagenesis are not an issue (despite a lot of attention in the literature). The suggestion of using mock-edited controls, resulting from zygotes that have gone through electroporation without RNP, addresses a possible need to control for the stress of undergoing the electroporation process. Our study shows that additional stress is caused by inducing and repairing a break in a neutral locus (EGFP). Controlling for these stressors may be particularly important when assessing behavioural phenotypes in crispants vs controls.
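      The cohort arithmetic and the power calculation mentioned above can be sketched as follows. This is a rough illustration under stated assumptions, not the calculation used in the study: the per-female zygote count and the yield fraction are back-calculated from the approximate figures quoted in the response (~100 zygotes from ten donors, ~40 crispants), the function names are hypothetical, and the sample-size formula is the standard normal approximation for a two-sided, two-group comparison, which slightly underestimates the number required by a t-test.

```python
import math
from statistics import NormalDist


def expected_crispants(donor_females, zygotes_per_female=10, yield_fraction=0.4):
    """Expected cohort size from the rough figures quoted in the response.

    zygotes_per_female and yield_fraction are assumptions derived from
    "ten donor females -> ~100 zygotes -> ~40 crispants".
    """
    return donor_females * zygotes_per_female * yield_fraction


def n_per_group(effect_size_d, alpha=0.05, power=0.8):
    """Animals per group for a two-sided two-sample comparison.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (difference in means / SD).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


cohort = expected_crispants(10)
needed_large_effect = n_per_group(1.3)
needed_moderate_effect = n_per_group(1.0)
```

      Under this approximation, a design with about 10 animals per group corresponds to a standardized effect size of roughly 1.3 at 80% power, so smaller effects would require larger cohorts or more than one round of mutagenesis.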

      Reviewer #2 states that “there could have been some discussion regarding how this approach would be impacted if mutations are dominant or embryonic lethal (for the latter, for example, F0 can be examined as embryos).” Our manuscript discusses how crispants could help with the study of genes that may be essential. Specifically, we stated that when CRISPR/Cas9-mediated mutagenesis fails to produce live pups, phenotypic assessment of crispant embryos could reveal whether targeting the gene impacts embryogenesis. Crispants can only be used to screen for recessive traits since both alleles are edited. The assessment of dominant traits is not addressed in our study and remains a challenge in the field. We note that CRISPRi screens in cultured cells reveal candidates that when partially downregulated lead to the desired phenotype. One possibility is to employ this set up in vivo using dCas9-KRAB transgenic mice (JAX stock #030000). We could add this point to the discussion section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) First, the concept of training or trained immunity refers to long-term epigenetic reprogramming in innate immune cells, resulting in a modified response upon exposure to a heterologous challenge. The investigations presented demonstrate phenotypic alterations in AMs seven days after ATP exposure; however, they do not assess whether persistent epigenetic remodeling occurs with lasting functional consequences. Therefore, a more cautious and semantically precise interpretation of the findings would be appropriate.

      In response, we have performed epigenetic analysis (ATAC-seq) as requested (Supp Fig. 1).

      (2) Furthermore, the in vivo data should be strengthened by additional analyses to support the authors' conclusions. The authors claim that susceptibility to Pseudomonas aeruginosa infection differs depending on the ATP-induced training effect. Statistical analyses should be provided for the survival curves, as well as additional weight curves or clinical assessments. Moreover, it would be appropriate to complement this clinical characterization with additional measurements, such as immune cell infiltration analysis (by flow cytometry), and quantification of pro-inflammatory cytokines in bronchoalveolar lavage fluid and/or lung homogenates.

      We have added the statistical analyses provided for the survival curves (new Fig. 1D), immune cell infiltration analysis, and quantification of pro-inflammatory cytokines in the lung (new Figs. 1, 2).

      (3) Moreover, the authors attribute the differences in resistance to P. aeruginosa infection to the ATP-induced training effect on AMs, based on a correlation between in vivo survival curves and differences in bacterial killing capacity measured in vitro. These are correlative findings that do not establish a causal role for AMs in the in vivo phenotype. ATP-mediated effects on other (i.e., non-AM) cell populations are omitted, and the possibility that other cells could be affected should be, at least, discussed. Adoptive transfer experiments using AMs would be a suitable approach to directly address this question.

      We have performed additional experiments and found that the numbers of lung macrophages were not significantly altered before and after ATP training (new Fig. 2), indicating the training effects are focused on lung resident macrophages.

      Reviewer #2 (Public review):

      (1) Missing details from methods/reported data: Substantial sections of key methods have not been disclosed (including anything about animal infection models, RNA-sequencing, and western blotting), and the statistical methods, as written, only address two-way comparisons, which would mean analysis was improperly performed. In addition, there is a general lack of transparency - the methods state that only representative data is included in the manuscript, and individual data points are not shown for assays.

      We have revised the methods and statistical analysis.

      (2) Poor experimental design including missing controls: Particularly problematic are the Seahorse assay data (requires normalization to cell numbers to interpret this bulk assay - differences in cell growth/loss between conditions would confound data interpretation) and bacterial killing assays (as written, this method would be heavily biased by bacterial initial binding/phagocytosis which would confound assessment of killing). Controls need to be included for subcellular fractionating to confirm pure fractions and for dye microscopy to show a negative background. Conclusions from these assays may be incorrect, and in some cases, the whole experiment may be uninterpretable.

      Seahorse assay methodology was updated to confirm the order of cell counting, time at seeding and cell counts. Methods were also updated to address the distinction between bacterial killing (Fig. 1B) and overall decrease in bacterial load.

      (3) The conclusions overstate what was tested in the experiments: Conceptually, there are multiple places where the authors draw conclusions or frame arguments in ways that do not match the experiments used. Particularly:

      (a) The authors discuss their findings in the context of importance for AM biology during respiratory infection but in vitro work uses cells that are well-established to be poor mimics of resident AMs (BMDM, RAW), particularly in terms of glycolytic metabolism.

      We have adjusted the text to reflect that the metabolic assay was performed on BMDMs. AMs are fragile for certain manipulations in vitro. We expect that the metabolic change is similar across several macrophage systems as well as the bacterial load reduction.

      (b) In vivo work does not address whether immune cell recruitment is triggered during training.

      We have performed immune cell infiltration analysis (new Fig. 2).

      (c) Figure 3 is used to draw conclusions about K+ in response to bacterial engulfment, but actually assesses fungal zymosan particles.

      We have corrected this in the manuscript.

      (d) Figure 5 is framed in bacterial susceptibility post-viral infection, but the model used is bacterial post-bacterial.

      We have corrected this in the manuscript.

      (e) In their discussion, the authors propose to have shown TWIK2-mediated inflammasome activation. They link these separately to ATP, but their studies do not test if loss of TWIK2 prevents inflammasome activation in response to ATP (Figure 4E does not use TWIK2 KO).

      We have now added the TWIK2 KO results (new Fig. 5E).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As noted in the public review, it would be advisable to further characterize the in vivo phenotype in order to strengthen the conclusions. Specifically, it would be useful to quantify the bacterial load in the bronchoalveolar lavage fluid and lung homogenates, as well as to measure cytokine levels both in the respiratory compartment and systemically. Additionally, a broader characterization of the immune response in the presence or absence of ATP-induced training would be valuable. In the absence of direct evidence demonstrating that trained AMs mediate the observed phenotype, the authors should adopt a more cautious interpretation of their results. Moreover, careful attention to semantic accuracy is recommended. The concept of trained immunity refers specifically to long-term epigenetic reprogramming that leads to an altered response of target cells upon a secondary challenge, distant from the initial stress. The data presented do not fully demonstrate this phenomenon, and the interpretations should remain aligned with the evidence provided.

      Bacterial load has been quantified (see the Methods section for details). We also measured immune cell infiltration and pro-inflammatory cytokines in the lung (new Figs. 1, 2), and performed an epigenetic evaluation of vehicle- and ATP-treated cells (Supp. Fig. 1).

      Reviewer #2 (Recommendations for the authors):

      (1) It cannot be overstated how lacking the methods are. This includes no discussion of IACUC approval for animal procedures, which must be included as part of research ethics. It also needs to be made clear where raw data is being archived. This notably includes an accession for deposited RNA-sequencing data, although unmanipulated microscopy and western blot images should also be shown. Methods should discuss any pre-processing that occurred with images.

      We have revised the methods in the manuscript.

      (2) Per statistics, in addition to generally providing more detail and adjusting analyses if they have not been correctly performed, please disclose if SD or SEM is shown. Reporting aggregate data versus representative data would provide more rigor. Perhaps replicate experiments could be included in the supplemental if they cannot, for some reason, be aggregated. Detailed statistical methods for RNA-seq analysis also need to be included.

      More details have been provided in the methods section.

      (3) It is unclear whether bacterial killing assays were correctly designed and can be interpreted. What does cells collected mean? If the assay was focused on intracellular macrophage bacterial load, it is critical to assess and report phagocytosis since different input loads would confound the assessment of killing. A rigorous wash or an antibiotic to eliminate extracellular bacteria should also have been performed and be described in this case. If the total bacterial burden was assessed, that would use cells+media and also needs to be clear and described. With the information provided, it is unclear whether the assays performed are sufficiently rigorous to assess bacterial killing. In addition, Figure 1B reports using an MOI of 50-100, but all data is compiled in one graph - data from different levels of infection should be separated. Figure 5A shows a model with E.coli followed by PA, but that does not appear to be how the assay was structured in B or C. This also does not match how the experiment is written in the results section, which references S. aureus. It is unclear what tissue (or cells) were assessed in Figure 5. Whole lung? BAL? As written, no data provided regarding bacterial killing is of sufficient quality to be considered valid.

      We have re-written the bacterial killing assay in the manuscript. The methodology was corrected to distinguish bacterial killing from a decrease in bacterial load and to describe the procedure more accurately.

      (4) The in vitro data provide reasonable evidence that BMDM/RAW macrophage training can occur in response to ATP exposure. However, it is unclear whether training is an important mechanism for resident AM in vivo, or whether, in vivo, a broader inflammatory response is generated, recruiting additional immune cells that persist and change infection susceptibility. The authors argue for resident AM immune training, but do not provide sufficient evidence to counter the latter possibility (resident AM are never themselves directly assessed, and the presence of other immune cells in vivo is not excluded). See Iliakis et al 2023 (PMID 37640788) for discussion of how this issue continues to drive uncertainty in the field. For this study, at least providing flow cytometry data quantifying myeloid and lymphoid immune populations in BALF before and after various treatments would help address this caveat. Without knowing this, it also confounds the interpretation of Figure 1B; if BAL is not pure AM after training, perhaps 1B could be repeated with ex vivo training or resident AM could be purified?

      We have performed immune cell infiltration analysis in the lung (both to BALF and in-tissue, new Fig. 2).

      (5) Figure 3A appears to show that fewer than 50% of cells express GFP. Is it expected that only a fraction of RAW cells express TWIK2-GFP? How was this addressed in the analyses for Figure 3? Were cells not appearing to express any significant GFP, included in phagosomal-negative or excluded from analysis? Please include in the methods.

      The RAW cells were transfected with TWIK2-GFP and variable GFP expression was expected. These cells were expressing a non-integrated transgene, which has been added to the methods as well as the consideration of cells for the analysis. Cells without visible GFP expression were excluded.

      (6) Why are many data points in Figure 3D negative? This suggests that settings were not optimized for microscopy - perhaps there is a very high background signal and the ION stain is barely above it. This is concerning for the quality of data. Further, is it expected that only some cells are positive for ION K+? The images shown clearly differentiate phagosomal K with ATP versus the absence of K without, but it is surprising that some cells appear not to contain any ION K+ signal (not completely clear given lack of brightfield or other cell staining) - this may again point to issues with imaging settings that confound data interpretation. This analysis should be carefully assessed.

      This has been updated in the methodology. In old Fig. 3D (new Fig. 4D), the presented data are the net intensity of the phagosome, obtained by subtracting the average cytoplasmic MFI from the MFI of the area corresponding to an engulfed zymosan-af594 bead. Thus, a negative value indicates a cytoplasmic IonK signal higher than that of the phagosome.
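      As an illustration only, the subtraction described in this response can be sketched in a few lines of Python. The function name and the use of plain lists of pixel intensities are assumptions made for the example; the response does not specify the software or the region-selection procedure that was actually used.

```python
from statistics import fmean


def net_phagosome_intensity(phagosome_pixels, cytoplasm_pixels):
    """Return mean phagosome intensity minus mean cytoplasmic intensity.

    Both arguments are hypothetical lists of pixel intensities taken from
    the phagosome region and from the surrounding cytoplasm of one cell.
    """
    if not phagosome_pixels or not cytoplasm_pixels:
        raise ValueError("both regions must contain at least one pixel")
    # A negative result means the cytoplasm is brighter than the phagosome.
    return fmean(phagosome_pixels) - fmean(cytoplasm_pixels)
```

      Under this definition, a cell whose cytoplasm is brighter than its phagosome yields a negative value, which is consistent with the negative data points discussed above.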

      (7) The Discussion states that it will be interesting to test whether ATP-TWIK2 is a common mechanism of training and specifically references LPS as an ATP-generating signal. However, Figure 2D data show that LPS induces only transient TWIK2 translocation; the authors have data suggesting that, in the context of LPS, TWIK2 'training' will not be engaged. This line of discussion shows incomplete consideration of the data.

      We have further limited this language in the text such that this may require differential sensitivity/damage sustained by macrophages as compared to that of epi/endothelial cells in response to bacterial endotoxin.

      (8) For RNA-sequencing, plots of the actual genes changed for the mitochondrial pathways of interest would be helpful information for readers, as would a heat map showing sample purity between groups for macrophage markers versus possible contaminant cells, which can also be generated from precursors in BMDM cultures. In general, information in Methods regarding how the analyses in Figure 4B were run is necessary, per cutoffs used to determine DEGs, number of samples in each group, sex of samples used, etc. Greater transparency of data would be appreciated, so plots that show variation between replicates, such as heat maps, would be ideal. Supplemental tables would also be nice.

      We have added to the methodology of the RNA sequencing analysis.

      (9) The use of alternate DAMPs is a positive addition to the experimental design, but no data is given regarding the concentrations used. Ideally, positive controls showing histones/NAD are used at acutely activating concentrations could be included but at least references supporting the doses chosen or information about how doses were selected should be given. It is easy to find substantial literature on histones as a DAMP, but it was unclear why/how NAD was selected.

      We have added these concentrations and corresponding references.

      (10) The E.coli CFU reported in Figure 5B are extraordinarily low. In addition, CFU are generally shown on a log scale, but this appears to be linear. Please confirm that these data are correct. Perhaps improved methods might explain why? Is the second hit a low dose?

      These have been corrected in the new Fig. 6B.

      (11) Given that loss of either TWIK2 or Nlrp3 ablates bacterial protection, a link should be tested - experiments should test whether loss of TWIK2 prevents inflammasome activation in response to ATP (TWIK2 KO in 4E) and if loss of Nlrp3 changes TWIK2 translocation (Nlrp3 KO in at least some experiments of Figures 2/3).

      We have now added the TWIK2 KO results (new Fig. 5E).

      (12) One of the most striking data pieces is Figure 1D. It would, therefore, strengthen the paper to repeat those experiments (even just with the high-dose ATP) using TWIK2/P2X7/NLRP3 KO mice and really show the importance of these pathways in vivo. This is conceptually Figure 5, but the survival data of Figure 1 is far more convincing than the relatively weak bacterial load data of Figure 5.

      Unfortunately, our previous laboratory has been closed and we have had trouble acquiring enough mice for additional survival data during the transition period. However, the bacterial load data have been normalized to bacterial counts per 5 mg of lung tissue instead of per individual sample, giving a more contextual interpretation of the data.
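      For readers unfamiliar with this kind of adjustment, the arithmetic of expressing a count per fixed tissue mass can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, argument names, and example values are hypothetical, and the response does not describe the exact calculation the authors performed.

```python
def cfu_per_reference_mass(cfu_count, tissue_mg, reference_mg=5.0):
    """Scale a colony count from a sample of tissue_mg to reference_mg of tissue.

    All names are illustrative; reference_mg=5.0 mirrors the "per 5 mg lung
    tissue" wording in the response.
    """
    if tissue_mg <= 0:
        raise ValueError("tissue mass must be positive")
    return cfu_count / tissue_mg * reference_mg


# Hypothetical example: 200 colonies recovered from a 20 mg sample.
normalized = cfu_per_reference_mass(200, 20.0)
```

      Normalizing in this way makes samples of different mass comparable, which is the stated purpose of the adjustment.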

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      (1) The absence of replicate paired-end datasets limits confidence in peak localization.

      The reviewer was under the impression that we did not perform biological replicates of our ChIP-seq experiments. All ChIP-seq (and ATAC-seq) experiments were performed with biological replicates and the Pearson’s correlations (all >0.9) between replicates were provided in Supplementary Table 1. We had indicated this in the text and methods but will try to make this even clearer.
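      As a reminder of what a replicate correlation of this kind measures, the sketch below computes Pearson’s r between two signal vectors. It is illustrative only: how the signal was binned or summarized before correlation is not stated in this response, so the inputs here are simply assumed to be two equal-length lists of per-region signal values, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

      Values close to 1 indicate that the two replicates rank and scale regions similarly; the threshold of 0.9 quoted above is the authors’ reported result, not something this sketch derives.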

      (2) The analyses are primarily correlative, making it difficult to fully assess robustness or to support strong mechanistic conclusions.

      Histone modifications are difficult to alter genetically because of the high copy number of histone genes and inhibition of HATs/HDACs in general leads to alterations in other histone modifications. It is an inherent challenge in establishing causality of histone modifications, especially histone acetylation marks.

      (3) Some claims (e.g., specificity for CpG islands, "dynamic" regulation during differentiation) are not fully supported by the analyses as presented.

      We have modified the text in response to this point. The new text reads: “Non-CGI promoters have lower overall levels of transcription compared to CGI promoters, and for this promoter class H3K115ac enrichment detected by ChIP is only really seen for the highest quartile of transcription (4SU) quartile of expression (Figure 1G). CGI promoters on the other hand, exhibit significant levels of detected H3K115ac even for the lowest quartile of expression. These results suggest a special link between CGI promoters and H3K115ac”.

      (4) Overall, the study introduces an intriguing new angle on globular PTMs, but additional rigor and mechanistic evidence are needed to substantiate the conclusions.

      We agree that the paper does not provide mechanistic details or solid causality of H3K115ac. We have only emphasized the potential role of H3K115ac in nucleosome fragility based on our in vivo data and previously published in vitro experiments (Manohar et al., 2009; Chatterjee et al., 2015). We do provide evidence that H3K115ac is enriched on subnucleosomal particles via sucrose gradient sedimentation of MNase-digested chromatin (Figure 3C-D).

      Reviewer #2 (Public review):

      (1) I am not fully convinced about the specificity of the antibody. Although the experiment in Figure S1A shows a specific binding to H3K115ac-modified peptides compared to unmodified peptides, the authors do not show any experiment that shows that the antibody does not bind to unrelated proteins. Thus, a Western of a nuclear extract or the chromatin fraction would be critical to show. Also, peptide competition using the H3K115ac peptide to block the antibody may be good to further support the specificity of the antibody. Also, I don't understand the experiment in Figure S1B. What does it tell us when the H3K115ac histone mark itself is missing? The KLF4 promoter does not appear to be a suitable positive control, given that hundreds of proteins/histone modifications are likely present at this region. It is important to clearly demonstrate that the antibody exclusively recognizes H3K115ac, given that the conclusion of the manuscript strongly depends on the reliability of the obtained ChIP-Seq data.

      ChIP-qPCR in S1B includes competition from native chromatin and shows high specificity to its target. We have provided antibody validation in three ways:

      - Western blot with dot-blot of synthetic peptides (Figure S1A).

      - Western blots with Whole cell extracts (Figure 4D).

      - ChIP-qPCR on native chromatin spiked with a cocktail of synthetic mono-nucleosomes, each carrying a single acetylation and a specific barcode (SNAP-ChIP K-AcylStat Panel).

      We could not include H3K115ac-marked nucleosomes as they are not available in the panel. Figure S1B shows that the H3K115ac antibody exhibits negligible binding to known K-acyl marks, comparable to an unmodified nucleosome. Because a barcoded H3K115ac-modified nucleosome is not available, we used the KLF4 promoter from mESCs as a positive control. In agreement with the ChIP-seq signal shown in the genome browser profile (Figure 1E), the KLF4 promoter shows a significantly higher signal than the gene body.

      (2) The association of H3K115ac with fragile nucleosomes is based on MNase-sensitivity and fragment length, which are indirect methods and can have technical bias. Experiments that support that the H3K115ac modified nucleosomes are indeed more fragile are missing.

      We have performed ChIP-seq on MNase digested mESC chromatin fractionated on sucrose gradients and this shows that H3K115ac is enriched in fractions containing sub-nucleosomal and fragile nucleosomes but depleted in fractions containing stable nucleosomes (Figure 3D).

      (3) The comparison of H3K115ac with H3K122ac and H3K64ac relies on publicly available datasets. Since the authors argue that these marks are distinct, data generated under identical experimental conditions would be more convincing. At a minimum, the limitations of using external datasets should be discussed.

      H3K64ac and H3K122ac datasets were generated by us in a previous publication (Pradeepa et al., 2016) using the same native MNase ChIP protocol as used here. The ChIP-seq datasets for H3K122ac and H3K27ac are processed in an identical manner, with the same computational pipelines, to the H3K115ac datasets generated in this paper.

      (4) The enrichment of H3K115ac at enhancers and CTCF binding sites is notable but remains descriptive. It would be interesting to clarify whether H3K115ac actively influences transcription factor/CTCF binding or is a downstream correlate.

      We agree with the reviewer’s comment, but we have not claimed causality.

      (5) No information is provided about how H3K115ac may be deposited/removed. Without this information, it is difficult to place this modification into established chromatin regulatory pathways.

      Due to broad target specificity, redundancies and crosstalk among different classes of HATs and HDACs, it is not tractable to answer this question in the current manuscript.

      Reviewer #3 (Public reviews):

      Reviewer 3 is mistaken in thinking our ChIP experiments are performed under cross-linked conditions. As clearly stated in the main text and methods, all our ChIP-seq for histone modifications is done on native MNase-digested chromatin – with no cross-linking. This includes the spike-in experiment shown in Fig S1B to test H3K115ac antibody specificity against the bar-coded SNAP-ChIP® K-AcylStat Panel from Epicypher. We could not include H3K115ac bar-coded nucleosomes in that experiment since they are not available in the panel.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I have two primary concerns that resound through the entire paper:

      (a) Overall, the manuscript is making strong claims based on entirely correlative datasets. No quantitative analyses are performed to demonstrate co-occupancy/localization. Please see more detailed descriptions below.

      Our responses to specific points are provided against each comment below.

      (b) Lack of paired-end replicates for H3K115ac ChIP-seq. While the reviewer token for the deposited data was not made accessible to me, looking at Supplementary Table 1, it appears there are two H3K115ac ChIP-seq datasets. One is paired-end and is single-read. So are peaks called with only one replicate of PE? Or are inaccurate peaks called with SR datasets? Either way, this is not a rigorous way to evaluate H3K115ac localization.

      We are sorry that this reviewer was not able to access the data – the token for the GEO accession was provided for reviewers at the journal’s request. All ChIP-seq (and ATAC-seq) experiments (paired and single-end) were performed with two biological replicates and the Pearson’s correlations (all >0.9) between replicates were provided in Supplementary Table 1. This was indicated in both the main text and in the methods. In the revised manuscript we have tried to make this even clearer and have put the relevant Pearson’s coefficient (r) into the text at the appropriate places. For the reviewer’s information, here is the complete list of data samples in the GEO Accession:

      Author response image 1.

      While I agree that H3K115ac occupancy is high at +CGIs, the authors downplay that H3K122ac and H3K27ac is also more highly enriched at these locations (page 7, last sentence of first paragraph). I imagine this is all due to the more highly transcribed nature of these genes. Sub-stratifying the K27ac and K122ac by transcription (as in Figure 1G) would help to demonstrate a unique nature of H3K115ac. But even better would be to do an analysis that plots H3K115ac enrichment vs transcription for every individual gene rather than aggregate analyses that are biased by single locations. For example, make an XY scatterplot of RNAPII occupancy or 4SU-seq signal vs H3K115ac level, where each point represents a single gene. Because the interpretation that it is CGI-based and not transcription is confounded with the fact that -CGI are more lowly transcribed. So, looking at Figure 1G, even the -CGI occupancy of H3K115ac is correlated with transcription, but it is just more lowly transcribed.

      We thank the reviewer for these suggestions but point out that Figure 1G shows H3K115ac signal for CGI+ and CGI– TSS that are matched for expression levels (quartiles of 4SU-seq). Fig 1F shows that H3K115ac is much more of a discriminator between CGI+ and – than H3K27ac or H3K122ac.

      (2) H3K115ac, H3K27ac, and H3K122ac are all more enriched (in aggregate) at +CGI locations (Fig 1F); so do these locations just have more positioned nucleosomes? More H3.3? So that these PTMs are just more enriched due to the opportunity?

      Positioned nucleosomes are generally found downstream of the TSS of active CpG island promoters, so what the reviewer suggests may well account for the relative enrichment of H3K27ac and H3K122ac at CGI+ vs CGI- promoters in Fig. 1F. But H3K115ac localisation is distinct, with the peak at the nucleosome-depleted region, not the +1 nucleosome. This is also confirmed by the contour plots in Fig 3. Our observation is also not explained by an enrichment of H3.3 at CGI promoters, since we show that H3K115ac is not specific to H3.3 (Fig 4D).

      (3) The authors note in paragraph 2 of page 7 that "H3K115ac does not scale linearly with gene expression..." but the authors never show a quantification of this; stratification in four clusters is not able to make a linear correlation. Furthermore, in the second line of page 7, the authors state that the levels do generally correlate with transcription. To claim it is a specific CGI link and not transcription is tricky, but I encourage the authors to consider more quantifiable ways, rather than correlations, to demonstrate this point, if it is observed.

      We thank the reviewer for this comment, and taking it into consideration, we have decided to re-phrase this paragraph. The new text reads: “Non-CGI promoters have lower overall levels of transcription compared to CGI promoters, and for this promoter class H3K115ac enrichment detected by ChIP is only really seen for the highest quartile of transcription (4SU) quartile of expression (Figure 1G). CGI promoters on the other hand, exhibit significant levels of detected H3K115ac even for the lowest quartile of expression. These results suggest a special link between CGI promoters and H3K115ac”.

      (4) The authors claim on page 7 that "on average, transcription increased from TSS that also gained H3K115ac but to a modest extent, compared with the more substantial loss of H3K115ac from downregulated TSS". However, both upregulated and downregulated are significant; the difference in magnitude could simply be due to more highly or more lowly transcribed locations, meaning that fold change could be more robustly detected. I caution the authors to substantiate claims like this rather than stating a correlation.

      We thank the reviewer for this comment, which relates to the data in Fig 2A. Fig. 2B shows that the association of H3K115ac loss with downregulation is statistically stronger than that of H3K115ac gain with upregulation, but only for CGI promoters. With regard to the text on the original pg 7 that is referred to, we have now reworded this to read “Average levels of transcription increased from TSS that also gained H3K115ac, and there was loss of H3K115ac from downregulated TSS (Figure 2A).”

      (5) For Figure 2C, the authors argue that H3K115ac correlate with bivalent locations. So this is all qualitative and aggregate localization; please quantitatively demonstrate this claim.

      Figure S2D provides statistics for this (observed/expected and Fishers exact test).
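      For readers who want to see what these two statistics compute, the sketch below works through an observed/expected ratio and a one-sided Fisher’s exact test on a 2×2 overlap table. It is a minimal illustration under stated assumptions: the function names and cell labels are hypothetical, and the response does not say whether the authors’ test was one- or two-sided or how the categories were defined.

```python
from math import comb


def observed_over_expected(both, only_a, only_b, neither):
    """Ratio of the observed overlap to the overlap expected by chance."""
    total = both + only_a + only_b + neither
    expected = (both + only_a) * (both + only_b) / total
    return both / expected


def fisher_exact_greater(both, only_a, only_b, neither):
    """One-sided Fisher's exact p-value for enrichment of the overlap.

    Sums hypergeometric probabilities of seeing an overlap at least as
    large as the observed one, with the table margins held fixed.
    """
    total = both + only_a + only_b + neither
    in_a = both + only_a
    in_b = both + only_b
    largest_overlap = min(in_a, in_b)
    favourable = sum(
        comb(in_a, k) * comb(total - in_a, in_b - k)
        for k in range(both, largest_overlap + 1)
    )
    return favourable / comb(total, in_b)
```

      In a real analysis a library routine such as `scipy.stats.fisher_exact` would normally be used instead; the hand-written version is shown only to make the calculation explicit.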

      (6) The authors claim in Figure 2 that H3K115ac is dynamic during differentiation (title of Figure 2). However, there are locations that gain and lose, or maintain H3K115ac. In fact, the most discussed locations are H3K115ac with no change (2C); which means it is NOT dynamic during differentiation. So what is the message for the role during differentiation? From Supplemental Table 1, it appears there is a single ChIP experiment for H3K115ac in NPC, and it is a single read. So this is also a difficult claim with one replicate. Related to this, in S2A, the authors show K115ac where there is no change in transcription; so what is the role of H3K115ac at TSSs relevant to differentiation - it is at both locations changed and unchanged in transcription, but H3K115ac levels themselves do not change at these subsets. So, how is this dynamic? This is very confusing, and clearer analyses and descriptions are necessary to deconvolute these data.

      We apologise for the misleading title for Figure 2. This has now been amended to “Changes in H3K115ac during differentiation”. The message of this figure is that whilst changes in H3K115ac at TSS are small (panels A-C), at enhancers the changes are much more dramatic (panel D). The reviewer is incorrect about the number of replicates for NPCs – there are two biological replicates (see response to point 1b).

      (7) The authors go on to examine H3K115ac enrichment on fragile nucleosomes through sucrose gradient sedimentation. A control for H3K27ac or H3K122ac would be nice for comparison.

      We do not have the material available to perform these experiments.

      (8) When discussing Figures 3 and SF3, the authors mention performing a different MNase for a second ChIP. Showing the MNase distribution for both the more highly digested and the lowly digested would be nice. a) Related to the above, the authors show input in SF3E to argue that the difference in H3K115ac vs H3K27ac is not due to the library, but they do not show the MNase digestion patterns, which is more important for this argument.

      Input libraries (first two graphs of Fig. S3E) are the MNase-digested chromatin. Comparison of nucleotide frequencies from millions of reads is a more robust method than comparison of fragment length patterns.

      (9) The authors move on to examine H3K115ac at enhancers. Just out of curiosity, given what was found at promoters, is H3K115ac enriched at +CGI enhancers? And what is the correlation with enhancer transcription?

      This is an interesting point, but the number of enhancers associated with CGI is not very high and so we did not focus on this. We have not analysed a correlation with eRNAs in this paper.

      (10) The authors state on page 14 that the most frequent changes in H3K115ac during differentiation are at these enhancers. So do these changes connect with differentiation-specific genes, and/or genes that have altered transcription during differentiation? Just trying to understand the functional role.

      Given the challenges of connecting enhancers with target genes, we have not addressed this question quantitatively. However, we draw the reviewer’s attention to the Genome Browser shots in Figures 2D and S2C, which show clear gain of H3K115ac (and ATAC-seq peaks) at intra and intergenic regions close to genes whose transcription is activated during the differentiation to NPCs.

      (11) Related, at the end of page 14, the authors state that the changes in H3K115ac correlate with changes in ATAC-seq; I imagine this dynamic is not unique for H3K115ac and this is observed for other PTMs (H3K27ac), so assessing and clarifying this, to again get to the specific interest of H3K115ac, would be ideal.

      We have not claimed that chromatin accessibility is unique to H3K115ac. It is the location of H3K115ac which is found inside the ATAC-seq peak region while H3K27ac is found only upstream/downstream of the ATAC peak that is so striking. This is apparent in Fig 4C.

      (12) The authors examine levels of H3K115ac in H3.3 KO cell lines via western blot (Figure 4D), but no replicates and/or quantification are shown.

      We now provide a biological replicate for the Western blot (new Fig. S4H) together with an image of the whole gel for the data in Fig 4D.

      (13) In Figure S4 and at the end of page 17, the authors are arguing that there is a link to pioneer TF complexes, based on Oct4 binding. First, while Oct4 has pioneering activity, not all Oct4 sites (or motifs) are pioneering; this has been established. So if you want to use Oct4, substratifying by pioneer vs no pioneer is necessary. Second, demonstrating this is unique to pioneer and not to non-pioneer TFs would be an important control.

      In response to the reviewer’s comment, we have removed the term “pioneer” from the manuscript.

      (14) Minor point: Figure 4 A and B, there are some formatting issues with the scale bars.

      We thank the reviewer for pointing this out, and the errors have been corrected in the revised figure.

      (15) Minor point is that it should be clear when single replicates of data are used and when PE/SR sequences are combined or which one is used in each analysis, as this was hard to discern when reading the paper and figure legends.

      We have clearly stated in the text that, after Figure 2, we repeated all experiments in paired-end mode. All processing steps are defined separately for single-end and paired-end datasets in the methods section. Details of biological replicates are provided in Sup. Table 1. These concerns are also addressed in our response to the Reviewer’s public comment 1.

      (16) Minor point: it is surprising that different MNase and different units were used in the ChIP vs sucrose sedimentation. Could the authors clarify why?

      Chromatin prep for sucrose gradients were done on a much larger scale than for ChIP-seq and required different setups to obtain the right level of MNase digestion.

      (17) The authors note that fragile nucleosomes contain H2A.Z and H3.3, but they never perform an analysis of available data to demonstrate a correlation (or better a quantifiable correlation) between H3K115ac occupancy and these marks at the locations they identify H3K115ac.

      Since we have shown (Fig. 4) that depletion of H3.3 does not affect overall levels of H3K115ac, we do not think there is value in further quantitative correlative analyses of H3K115ac and variant histones.

      (18) Minor point: What is the overlap in peaks for H3K115ac, H3K122ac, and H3K27ac (Figure 1C)?

      Nearly all H3K115ac peaks overlap with H3K122ac and/or H3K27ac. Its most distinct properties are its association with CGI promoters, fragile nucleosomes and its unique localisation within the NDRs, three points that the manuscript is focussed on.

      Reviewer #3 (Recommendations for the authors):

      (1) The western blot results in Figure 4D probing for H3, H3.3, and H3K115ac use Ponceau S staining, presumably of an area of the membrane where histones might be expected to migrate, as a measure of loading. However, the Ponceau S bands appear uniformly weaker in the H3.3KO lanes, yet despite this, blotting with H3.3 antibody detects a band in H3.3 knockout ESCs, suggesting that the antibody does not have a high degree of specificity. Again, a blocking experiment with appropriate peptides would instill more confidence in the specificity of these reagents, and/or the authors could provide independent validation of the knockout model to differentiate between a partial knockout or antibody cross-reactivity (e.g., by Sanger sequencing).

      In a revised Fig. S4H we now show the whole gel corresponding to this blot but including co-staining with an antibody for H4 to provide a better loading control. We also provide a biological replicate of this Western blot in the lower panel of Fig. S4H.

      (2) The manuscript would benefit from in vitro follow-up and validation, but if the authors intend to keep the manuscript primarily in silico, I suggest dedicating a few lines in each section to explain the plots, their axes, and their purpose, as well as to assist with interpretation, rather than directly discussing the results. This would make the manuscript more accessible and understandable for a broader audience in the field of epigenetics.

      In the revised version, we have tried to improve the text to make the data more accessible to a broad audience.

    1. eLife Assessment

      This potentially important study explores the specificity of olfactory perceptual learning. In keeping with previous work, the authors found that learning to discriminate between two enantiomers does not generalize across the nostrils or to unrelated enantiomers, whereas learning to discriminate odor mixtures does generalize across the nostrils and to other odor mixtures, with this learning effect persisting over at least two weeks. While the evidence presented to support these findings is convincing, it remains unclear why the results differ for enantiomers and why training on odor mixtures generalizes to other odor mixtures.

      Discrimination of odor enantiomers ultimately relies on the enantioselectivity of olfactory receptors, whereas mixture discrimination likely depends on relative differences in perceived configural odor notes. These processes probably engage plasticity at different stages of the olfactory pathway. The revised Discussion (p.16-18) now elaborates on this distinction and the potential underlying mechanisms. Please also refer to our responses to Reviewer 1’s Point 1 and Reviewer 2’s Points 2 and 3 below.

      Reviewer #1 (Public Review):

      This study extends a previous study by the same group on the generalization of odor discrimination from one nostril to the other. In their earlier study, the group showed that learning to discriminate between two enantiomers does not generalize across nostrils. This was surprising given the Mainland & Sobel 2001 study that found that detecting androstenone in people who do not detect it can generalize across the two nostrils. In this study, they confirmed their previous results and reported that, unlike enantiomers, learning to discriminate odor mixtures generalizes across nostrils, generalizes to other odor mixtures, and is persistent over at least two weeks.

      This interesting and important result extends our knowledge of this phenomenon and will likely steer more research. It may also help develop new training protocols for people with impairments in their sense of smell.

      We thank the reviewer for the encouraging remarks.

      The main weakness of this study is its scope, as it does not provide substantial insight into why the results differ for enantiomers and why training on odor mixtures generalizes to other odor mixtures.

      We thank the reviewer for this insightful comment. While the present study does not directly identify the neural mechanisms underlying these differences, it provides behavioral constraints on where specificity and generalization may arise within the olfactory system. Further neuroimaging and neurophysiological work will be needed to fully elucidate the underlying mechanisms.

      Reviewer #2 (Public Review):

      The manuscript from Chang et al. taps on an important issue in olfactory perceptual plasticity, named the generalization of perceptual learning effect by training using odors. They employed a discrimination training/learning task with either binary odor mixture or odor enantiomers, and tested for post-training effect at several time intervals. Their results showed contrasting patterns of specificity (enantiomers) and transfer (odor mixtures), and the learning effect persisted at 2 weeks post-training. They demonstrated that the effect was independent of task difficulty, olfactory adaptation and gender.

      Overall this was a well-controlled study and shows novel results. The strength of the study includes the consideration of odor structure and perceptual (dis)similarity and the control training condition.

      We appreciate the reviewer’s positive assessment of our work.

      I have two minor issues that I hope the authors could address in the next version of the manuscript.

      (1). The author used a binary odor mixture with a ratio 7:9 or 9:11, why is this ratio chosen and used for the experiment?

      This ratio was selected based on pilot testing and practical constraints. During piloting, we evaluated several mixing ratios to identify those that met two key criteria: (1) Baseline indiscriminability: Most participants were unable to reliably discriminate between the two binary mixtures in a:b and b:a ratios at baseline. (2) Trainability: With 1–5 weeks of training, participants could acquire the ability to discriminate between them.

      The a:b ratios of 7:9 and 9:11 were the ratios that met both criteria in our pilot testing, making them suitable for assessing training‑induced improvements in mixture discrimination. This clarification has been added to the revised Olfactory Stimuli subsection of the Materials and Methods (p.19-20 of the revised manuscript).

      (2) Over the course of training, has the valence of odor (odor mixture) changed, it would be helpful to include these results in the supplements. As the author indicated in the discussion, the potential site underlying the transfer effect is the OFC, which has been found to represent odor valence previously (Anderson, Christoff et al. 2003). It would be nice to see the author replicate the results with odor/odor mixture valence (change) controlled.

      Anderson, A. K., K. Christoff, I. Stappen, D. Panitz, D. G. Ghahremani, G. Glover, J. D. Gabrieli and N. Sobel (2003). "Dissociated neural representations of intensity and valence in human olfaction." Nat Neurosci 6(2): 196-202.

      Odor valence ratings were not collected in Experiments 1 and 2. However, we have since conducted a new experiment examining concentration discrimination learning (see our response to Reviewer 1, Point 1), using the constituents of the mixtures from Experiment 2 as stimuli (i.e., concentration pairs of acetophenone, 2-octanone, methyl salicylate, and isoamyl butyrate). In this new experiment (now incorporated as Experiment 3 in the revised manuscript), unilateral odor valence ratings were collected at baseline (Day 0) and at the post-training test and retests on Days N, N+1, N+3, N+7, and N+14.

      For all odor pairs (training and controls), there was no significant change in perceived valence from baseline to Day N, regardless of nostril (ps > 0.05 for the main effects of session and nostril, as well as their interaction; Figure S5D). Moreover, odor valence ratings remained stable across the five post-training test sessions (ps ≥ 0.29 for the main and interaction effects involving session), showing the same pattern as at baseline (Figure S5D, F). Thus, training appeared to have no measurable influence on odor valence perception. These results have been incorporated into the revised manuscript on p.14-15.

    1. Author response:

      Reviewer #1 (Public Review):

      The authors tested the hypothesis that at high elevations avian eggs will be adapted to prevent desiccation that might arise from loss of water to surrounding drier air. They used a combination of gas diffusion experiments and scanning electron microscopy to examine water vapour conductance rates and eggshell structure, including thickness, pore size, and pore density among 197 bird species distributed along an elevational gradient in the Andes. While there was a correlation between water vapour conductance and elevation among species, a decrease in water vapour conductance with elevation was not associated with eggshell thickness, pore size, and pore density, suggesting the variation in the structure of the eggshells is unlikely to do with among species differences in water loss along elevational gradients. This study is very interesting and timely, especially with increasing water vapour pressure due to climate warming. It is a very well-written study and easy to read. However, I have some concerns about the conclusions drawn from the results.

      There are more than twice as many species in low and medium-elevation sites compared to high-elevation sites, so the amount of variation in low and medium-elevation should be expected to be higher by default. The argument for a wider range of variation in low-elevation species will be stronger if the comparison was a similar sample size. Moreover, the pattern clearly breaks down within families. Note also that for Low and medium elevation there is no difference in the amount of variation in conductance residuals possibly because the sample sizes are similar. The seemingly strong positive correlation between eggshell conductance and egg mass may be driven by the five high and two medium-elevation species with large eggs. There seem to be hardly any high-elevation species with egg mass greater than 12g whereas species in low elevation egg size seem to be as high as 80g (Figure 2a). Since larger eggs (and thus eggs of larger birds) lose more water compared to smaller eggs, the correlation between water vapour conductance and elevation may be more strongly associated with body size distribution along elevational gradients rather than egg structure and function.

      We thank the reviewer for this thoughtful observation. As noted in our response to comment 3, we recognize that the higher number of species at low and mid-elevations reflects the natural turnover in species richness along elevational gradients, and we are transparent about this caveat in our revised Discussion section. Nevertheless, to address this specific concern, we conducted additional analyses excluding the species with large eggs (i.e., egg mass >12g, which are only present at low and mid-elevations in our dataset). These analyses are now included in the Supplementary Figure 1, and the main pattern of lower water vapor conductance at high elevations holds even when larger eggs are excluded.

      We agree that the well-known scaling relationship between egg mass and conductance (recognized since the 1970s) may partially explain the observed trends across the elevational gradient. Our aim was to explore whether the known relationship between egg size and conductance varies when incorporating environmental variables such as elevation, which brings with it changes in humidity and oxygen availability. While we acknowledge the possible confounding effect of body size distributions along the gradient, our results, even after controlling for egg size (residual analysis), still suggest a decrease in conductance at higher elevations, consistent with predictions based on environmental conditions.
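
As an illustration of what controlling for egg size through residuals involves, the sketch below regresses log conductance on log egg mass and then compares mean residuals between elevation bands. It is a simplified stand-in and not the analysis reported in the manuscript: the log10 transform, the ordinary least-squares fit, the function names and all numerical values are assumptions made for the example, and no phylogenetic correction is included.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def conductance_residuals(egg_mass_g, conductance):
    """Residuals of log10(conductance) after regressing on log10(egg mass)."""
    log_mass = [math.log10(m) for m in egg_mass_g]
    log_cond = [math.log10(c) for c in conductance]
    intercept, slope = fit_line(log_mass, log_cond)
    return [y - (intercept + slope * x) for x, y in zip(log_mass, log_cond)]


# Made-up values for illustration only (not measurements from the study).
egg_mass_g = [2.0, 4.0, 8.0, 3.0, 6.0, 10.0]
conductance = [0.60, 1.00, 1.70, 0.65, 1.10, 1.60]
elevation = ["low", "low", "low", "high", "high", "high"]

residuals = conductance_residuals(egg_mass_g, conductance)

# Mean residual per elevation band: negative means lower conductance
# than expected for that egg mass.
mean_by_band = {
    band: sum(r for r, e in zip(residuals, elevation) if e == band)
    / elevation.count(band)
    for band in sorted(set(elevation))
}
print(mean_by_band)
```

Restricting the input lists to eggs of 12 g or less before fitting would mimic the sensitivity analysis described above.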

      We have clarified these points in the revised Discussion, including the acknowledgment that disentangling the relative contributions of body size and elevation to conductance patterns remains challenging and warrants further study.

      Authors argue that the observed variation in the relationship between water vapour conductance and elevation among and within bird families suggests potential differences in the adaptive response to common selective pressures in terms of eggshell thickness and pore density, and size. The evidence for this is generally weak from the data analyses because the decrease in water vapour conductance with elevation was not consistent across taxonomic groups nor were differences associated with specific patterns in eggshell thickness and pore density, and size.

      We appreciate the reviewer’s comments on the observed variation in water vapor conductance across taxonomic groups. As mentioned in response to comment 7, we have removed the explicit analyses and figures showing within-family comparisons, as these were exploratory and not directly tied to a specific hypothesis. We have also toned down our speculations regarding the potential adaptive drivers of the observed variation. In the revised Discussion, we emphasize the need for further research to explore these patterns and acknowledge the limitations of our current dataset in making strong conclusions about the adaptive responses to selective pressures.

      It is not clear how the authors expected the relationship between water vapour conductance and elevation to differ among taxonomic groups and there was no attempt to explain the biological implication of these differences among taxonomic groups based on the specific traits of the species or their families. This missing piece of information is crucial to justify the argument that differences among taxonomic groups may be due to differences in adaptive response.

      We appreciate the reviewer’s point. To clarify, we were not expecting the relationship between water vapor conductance and elevation to differ among taxonomic groups. Rather, our primary hypothesis was that water vapor conductance would decrease with elevation due to the drier conditions in highland habitats, and we sought to link this pattern with structural characteristics of the eggshell. The suggestion of potential differences among taxonomic groups arose from the lack of a consistent pattern across families, which prompted us to consider possible adaptive variation. We now address this more clearly in the Discussion section, acknowledging the need for further exploration into the potential selective pressures driving this variation among taxonomic groups.

      Reviewer #2 (Public Review):

      This paper represents a strong advance for two main reasons. First, it provides evidence that egg physiology varies with elevation as predicted by the hypothesis that eggs are physiologically adapted to certain climatic conditions. This means egg physiological adaptation is a factor that could influence species' elevational ranges. Second, it is a proof-of-concept study that shows it is possible to measure eggshell physiology for a large number of species in the field in order to test hypotheses. As such, it should inspire many further tests that examine adaptation in egg physiology in the context of species' distributions along environmental gradients.

      There are two caveats that readers should be aware of. First, measuring these traits is difficult, and there remain questions about the efficacy of different methods. For example, the authors note that quantifying eggshell structures is very difficult, with several unresolved questions about their method of using scanning electron microscopy images to measure eggshell pores. Similarly, the authors mention that temperature variation may partially influence their main result that high-elevation eggs lose water at slower rates than low-elevation eggs (temperatures were colder for experiments at high elevations than for low elevations). Second, I regard the analyses of eggshell traits for specific families as exploratory. There are no a priori expectations for how different families might be expected to differ in their patterns. These analyses are fruitful in that they generate additional hypotheses that future work can test. However, it does mean that the statistical significance of eggshell trait relationships with elevation for specific families should be interpreted with caution.

      We thank Reviewer 2 for these insightful comments. As mentioned earlier, measuring these traits is indeed very challenging, and we acknowledge the limitations of our methods, particularly when it comes to using scanning electron microscopy to quantify eggshell structures. We are aware of the unresolved questions around these techniques, and we plan to continue refining these methods in future studies. Regarding the influence of temperature variation on water loss, we recognize that colder temperatures at high elevations may have influenced our results, and we address this potential confounding factor in the Discussion section, Line 257.

      We also agree with the reviewer’s point regarding the exploratory nature of the family-specific analyses. These analyses were not guided by specific hypotheses, other than the expectation of replicating the overall pattern, and we recognize that they should be interpreted with caution. They serve primarily to generate additional hypotheses for future studies. In the revised manuscript, we have toned down the emphasis on the statistical significance of eggshell trait relationships with elevation for specific families, and we emphasize the need for further research to confirm these patterns.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors assess the effectiveness of electroporating mRNA into male germ cells to rescue the expression of proteins required for spermatogenesis progression in individuals where these proteins are mutated or depleted. To set up the methodology, they first evaluated the expression of reporter proteins in wild-type mice, which showed expression in germ cells for over two weeks. Then, they attempted to recover fertility in a model of late spermatogenesis arrest that produces immotile sperm. By electroporating the mutated protein, the authors recovered the motility of ~5% of the sperm; although the sperm regenerated was not able to produce offspring using IVF, the embryos reached the 2-cell state (in contrast to controls that did not progress past the zygote state).

      This is a comprehensive evaluation of the mRNA methodology with multiple strengths. First, the authors show that naked synthetic RNA, purchased from a commercial source or generated in the laboratory with simple methods, is enough to express exogenous proteins in testicular germ cells. The authors compared RNA to DNA electroporation and found that germ cells are efficiently electroporated with RNA, but not DNA. The differences between these constructs were evaluated using in vivo imaging to track the reporter signal in individual animals through time. To understand how the reporter proteins affect the results of the experiments, the authors used different reporters: two fluorescent (eGFP and mCherry) and one bioluminescent (Luciferase). Although they observed differences among reporters, in every case expression lasted for at least two weeks. The authors used a relevant system to study the therapeutic potential of RNA electroporation. The ARMC2-deficient animals have impaired sperm motility phenotype that affects only the later stages of spermatogenesis. The authors showed that sperm motility was recovered to ~5%, which is remarkable due to the small fraction of germ cells electroporated with RNA with the current protocol. The sperm motility parameters were thoroughly assessed by CASA. The 3D reconstruction of an electroporated testis using state-of-the-art methods to show the electroporated regions is compelling.

      The main weakness of the manuscript is that although the authors manage to recover motility in a small fraction of the sperm population, it is unclear whether the increased sperm quality is substantial to improve assisted reproduction outcomes. The authors found that the rescued sperm could be used to obtain 2-cell embryos via IVF, but no evidence for more advanced stages of embryo differentiation was provided. The motile rescued sperm was also successfully used to generate blastocyst by ICSI, but the statistical significance of the rate of blastocyst production compared to non-rescued sperm remains unclear. The title is thus an overstatement since fertility was never restored for IVF, and the mutant sperm was already able to produce blastocysts without the electroporation intervention.

      Overall, the authors clearly show that electroporating mRNA can improve spermatogenesis as demonstrated by the generation of motile sperm in the ARMC2 KO mouse model.

      We thank the reviewer for this thoughtful and constructive comment. We agree that our study demonstrates a partial functional recovery of spermatogenesis rather than a complete restoration of fertility. Our main objective was to establish and validate a proof-of-concept approach showing that mRNA electroporation can rescue the expression of a missing or mutated protein in post-meiotic germ cells and result in the production of motile sperm.

      To address the reviewer’s concern, we have revised the title and Discussion to more accurately reflect the scope of our findings. The new title reads:

      “Sperm motility in mice with oligo-astheno-teratozoospermia restored by in vivo injection and electroporation of naked mRNA”

      In the manuscript, we now emphasize that while motility recovery was significant, complete fertility restoration was not achieved. We have also clarified that:

      The 5% recovery in motile sperm represents a substantial improvement considering the small population of germ cells reached by the current electroporation method.

      The 2-cell embryo formation observed after IVF serves as a strong indication of partial functional recovery.

      Finally, we now explicitly state in the Discussion that this approach should be considered a therapeutic proof-of-concept, demonstrating feasibility and potential, rather than a fully curative intervention.

      Reviewer #2 (Public review):

      The authors inject, into the rete testes, mRNA and plasmids encoding mRNAs for GFP and then ARMC2 (into infertile Armc2 KO mice) in a gene therapy approach to express exogenous proteins in male germ cells. They do show GFP epifluorescence and ARMC2 protein in KO tissues, although the evidence presented is weak. Overall, the data do not necessarily make sense given the biology of spermatogenesis and more rigorous testing of this model is required to fully support the conclusions, that gene therapy can be used to rescue male infertility.

      In this revision, the authors attempt to respond to the critiques from the first round of reviews. While they did address many of the minor concerns, there are still a number to be addressed. With that said, the data still do not support the conclusions of the manuscript.

      We thank the reviewer for their careful and detailed assessment of our manuscript. We appreciate the concerns raised regarding mRNA stability, GFP localization, and the interpretation of spermatogenesis stages, and we have addressed these points in the manuscript and in the responses below.

      (1) The authors have not satisfactorily provided an explanation for how a naked mRNA can persist and direct expression of GFP or luciferase for ~3 weeks. The most stable mRNAs in mammalian cells have half-lives of ~24-60 hours. The stability of the injected mRNAs should be evaluated and reported using cell lines. GFP protein's half-life is ~26 hours, and luciferase protein's half-life is ~2 hours.

      We thank the reviewer for this important comment. The stability of mRNA-GFP was assessed by RT-QPCR in HEK cells and seminiferous tubule cells (Fig. 5). mRNA-GFP was detected for up to 60 hours in HEK cells and for up to two weeks in seminiferous tubule cells (Fig. 5A). Together, these results suggest that the long-lasting fluorescence observed in our experiments reflects a combination of transcript stability, efficient translation within germ cells and the slow protein turnover that is typical of the spermatogenic lineage.

      (2) There is no convincing data shown in Figs. 1-8 that the GFP is even expressed in germ cells, which is obviously a prerequisite for the Armc2 KO rescue experiment shown in the later figures! In fact, to this reviewer the GFP appears to be in Sertoli cell cytoplasm, which spans the epithelium and surrounds germ cells - thus, it can be oft-confused with germ cells. In addition, if it is in germ cells, then the authors should be able to show, on subsequent days, that it is present in clones of germ cells that are maturing. Due to intracellular bridges, a molecule like GFP has been shown to diffuse readily and rapidly (in a matter of minutes) between adjacent germ cells. To clarify, the authors must generate single cell suspensions and immunostain for GFP using any of a number of excellent commercially-available antibodies to verify it is present in germ cells. It should also be present in sperm, if it is indeed in the germline.

      We thank the reviewer for this insightful comment. To directly address the concern, we performed additional experiments to assess GFP expression in germ cells following in vivo mRNA delivery. GFP-encoding mRNA was injected and electroporated into the testes on day 0. On day 1, testes were collected, enzymatically dissociated, and the resulting seminiferous tubule cell suspensions were cultured for 12 hours. Live cells were then analyzed by fluorescence microscopy (Fig. 10).

      We observed GFP expression in various germ cell types, including pachytene spermatocytes (53.4%) (Fig 10 A-), round spermatids (25%) (Fig 10B-E) and elongated spermatids (11.4%) (Fig 10 C-E). The identification of these cell types was based on DAPI nuclear staining patterns, cell size (Fig 10 F), non-adherent characteristics, and the use of an enzymatic dissociation protocol.

      Fluorescence imaging revealed strong cytoplasmic GFP signals in each of these populations, confirming that the in vivo injection and electroporation protocol delivers mRNA that is efficiently translated in germ cells at multiple stages of spermatogenesis. Together, these data validate the germ cell-specific nature of the GFP signal, supporting the Armc2 KO rescue experiments.

      As mentioned previously, we assessed the stability of mRNA-GFP using RT-QPCR in HEK cells and seminiferous tubule cells (see Fig. 5). mRNA-GFP was detected for up to 60 hours in HEK cells and for up to two weeks in seminiferous tubule cells. Together, these results suggest that the long-lasting fluorescence observed in our experiments reflects a combination of transcript stability and local translation within germ cells, as well as the slow protein turnover typical of the spermatogenic lineage.

      Other comments:

      70-1 This is an incorrect interpretation of the findings from Ref 5 - that review stated there were ~2,000 testis-enriched genes, but that does not mean "the whole process involves around two thousand of genes"

      We thank the reviewer for this helpful comment. We agree that our previous phrasing was imprecise. We have revised the sentence to clarify that approximately 2,000 genes show testis-enriched expression, rather than implying that the entire spermatogenic process is limited to these genes. The corrected sentence now reads:

      “Spermatogenesis involves the coordinated expression of a large number of genes, with approximately 2,000 showing testis-enriched expression, about 60% of which are expressed exclusively in the testes”

      74 would specify 'male':

      We have now specified it as you suggested.

      79-84 Are the concerns with ICSI due to the procedure itself, or the fact that it's often used when there is likely to be a genetic issue with the male whose sperm was used? This should be clarified if possible, using references from the literature, as this reviewer imagines this could be a rather contentious issue with clinicians who routinely use this procedure, even in cases where IVF would very likely have worked:

      We thank the reviewer for this important comment. Concerns about ICSI outcomes indeed reflect two partly overlapping causes: the procedure itself (direct sperm injection and associated laboratory manipulations) and the clinical/genetic background of couples undergoing ICSI (especially men with severe male-factor infertility). Large reviews and meta-analyses report a small increase in some perinatal and congenital risks after ART/ICSI, but these studies conclude that it is difficult to fully disentangle procedural effects from parental factors. Importantly, genetic or epigenetic abnormalities in the male (which motivate use of ICSI) likely contribute to adverse outcomes in offspring, while some studies also suggest that ICSI-specific manipulations may alter epigenetic marks in embryos. For these reasons, professional bodies recommend reserving ICSI for appropriate male-factor indications rather than using it as routine insemination for non-male-factor cases.

      We have revised the text accordingly to clarify this distinction:

      “ICSI can efficiently overcome the problems faced. Nevertheless, concerns persist regarding the potential risks associated with this technique, including blastogenesis defect, cardiovascular defect, gastrointestinal defect, musculoskeletal defect, orofacial defect, leukemia, central nervous system tumors, and solid tumors [1-4]. Statistical analyses of birth records have demonstrated an elevated risk of birth defects, with a 30-40 % increased likelihood in cases involving ICSI [1-4], and a prevalence of birth defects between 1 % and 4 % [3]. It is important to note, however, that the origin of these risks remains debated. Several large epidemiological and mechanistic studies indicate that both the procedure itself (direct microinjection and in vitro manipulation) and the underlying genetic or epigenetic abnormalities often present in men requiring ICSI contribute to the observed outcomes [1, 3] [5, 6]. To overcome these drawbacks, a number of experimental strategies have been proposed to bypass ARTs and restore spermatogenesis and fertility, including gene therapy [7-10].”

      199 Codon optimization improvement of mRNA stability needs a reference;

      We have added the references accordingly: [11-15]

      In one study using yeast transcripts, optimization improved RNA stability on the order of minutes (e.g., from ~5 minutes to ~17 minutes); is there some evidence that it could be increased dramatically to days or weeks?

      We agree with the reviewer that codon optimization can enhance mRNA stability, but available evidence indicates that this effect is moderate. In Saccharomyces cerevisiae, Presnyak et al. (2015) [16] showed that codon optimization increased mRNA half-life from approximately 5 minutes to ~17 minutes, representing a several-fold improvement rather than a shift to days or weeks. Similar codon-dependent stabilization has been observed in mammalian systems, where transcripts enriched in optimal codons exhibit longer half-lives and enhanced translation efficiency ([11]; [17]). However, these studies consistently report effects on the scale of minutes to hours. In mammalian cells, the prolonged stability of therapeutic or vaccine mRNAs, lasting for days, is primarily achieved through additional features such as optimized untranslated regions, chemical nucleotide modifications (e.g., N¹-methylpseudouridine), and protective delivery systems, rather than codon usage alone ([18]; [19]).

      Other molecular optimizations that improve in vivo mRNA stability and translation include a poly(A) tail, which binds poly(A)-binding proteins to protect the transcript from 3′ exonuclease degradation and promotes ribosome recycling, and a CleanCap structure at the 5′ end, which mimics the natural Cap 1 configuration, protects against 5′ exonuclease attack, and enhances translational initiation [11-15]. Together, these modifications act synergistically to stabilize the transcript and support efficient translation.
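
      To make the scale of the half-life figures quoted above concrete, the following is a minimal sketch that assumes simple first-order (single-exponential) decay of the transcript. That decay model and the helper names are illustrative assumptions, not part of the authors' analysis; only the ~5 minute and ~17 minute half-lives come from the response.

```python
import math


def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of transcript left after t_min minutes (assumed first-order decay)."""
    return 0.5 ** (t_min / half_life_min)


def time_to_fraction(fraction: float, half_life_min: float) -> float:
    """Minutes until only `fraction` of the starting transcript remains."""
    return half_life_min * math.log2(1.0 / fraction)


# Half-lives quoted in the response for the yeast study (approximate values).
BASE_HALF_LIFE_MIN = 5.0
OPTIMIZED_HALF_LIFE_MIN = 17.0

# Fold improvement in half-life attributed to codon optimization.
fold_change = OPTIMIZED_HALF_LIFE_MIN / BASE_HALF_LIFE_MIN

# Time for the optimized transcript to fall to 1% of its starting level.
minutes_to_1pct = time_to_fraction(0.01, OPTIMIZED_HALF_LIFE_MIN)

print(f"fold change in half-life: {fold_change:.1f}")
print(f"time to 1% remaining: {minutes_to_1pct:.0f} min")
```

      Under this assumed model, a 17-minute half-life leaves about 1% of the transcript after roughly two hours, which illustrates why codon optimization alone would not be expected to extend persistence to days or weeks.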

      472-3 The reported half-life of EGFP is ~36 hours - so, if the mRNA is unstable (and not measured, but certainly could be estimated by qRT-PCR detection of the transcript on subsequent days after injection) and EGFP is comparatively more stable (but still hours), how does EGFP persist for 21 days after injection of naked mRNA??

      We thank the reviewer for this important comment. The stability of mRNA-GFP was assessed by RT-qPCR in HEK cells and seminiferous tubule cells (Fig. 5). mRNA-GFP was detected for up to 60 hours in HEK cells and for up to two weeks in seminiferous tubule cells (Fig. 5). Together, these results suggest that the long-lasting fluorescence observed in our experiments reflects a combination of transcript stability, efficient translation within germ cells, and the slow protein turnover that is typical of the spermatogenic lineage.

      Curious why the authors were unable to get anti-GFP to work in immunostaining?

      We appreciate the reviewer’s question. We attempted to detect GFP using several commercially available anti-GFP antibodies under various standard immunostaining conditions. However, in our hands, these antibodies consistently produced either no signal or high background staining, making the results unreliable. We therefore relied on direct detection of GFP fluorescence, which provides a more accurate and specific readout of protein expression in our system.

      In Fig. 3-4, the GFP signals are unremarkable, in that they cannot be fairly attributed to any structure or cell type - they just look like blobs; and why, in Fig. 4D-E, does the GFP signal appear stronger at 21 days than 15 days? And why is it completely gone by 28 days? This data is unconvincing.

      We would like to thank the reviewer for their comments. Figures 3–4 provide a global overview of GFP expression on the surface of the testis. The entire testis was imaged using an inverted epifluorescence microscope, and the GFP signal represents a composite of multiple seminiferous tubules across the tissue surface. Because of this whole-organ imaging approach, it is not possible to resolve individual structures such as the basement membrane or lumen, which is why the signals may appear as diffuse “blobs.”

      Regarding the time-course in Figure 4D–E, the apparent increase in GFP signal at 21 days compared with 15 days likely reflects accumulation and translation of the delivered mRNA in germ cells over time, whereas the absence of signal at 28 days corresponds to the natural turnover and degradation of GFP protein and mRNA in the tissue. We hope this explanation clarifies the observed patterns of fluorescence.

      If the authors did a single cell suspension, what types or percentage of cells would be GFP+? Since germ cells are not adherent in culture, a simple experiment could be done whereby a single cell suspension could be made, cultured for 4-6 hours, and non-adherent cells "shaken off" and imaged vs adherent cells. Cells could also be fixed and immunostained for GFP, which has worked in many other labs using anti-GFP.

      We thank the reviewer for this insightful comment. To directly address the concern, we performed additional experiments to assess GFP expression in germ cells following in vivo mRNA delivery. GFP-encoding mRNA was injected and electroporated into the testes on day 0. On day 1, testes were collected, enzymatically dissociated, and the resulting seminiferous tubule cell suspensions were cultured for 12 hours. Live cells were then analyzed by fluorescence microscopy (Fig. 10).

      We observed GFP expression in various germ cell types, including pachytene spermatocytes (53.4%) (Fig 10 A-), round spermatids (25%) (Fig 10B-E), and elongated spermatids (11.4%) (Fig 10 C-E). The identification of these cell types was based on DAPI nuclear staining patterns, cell size (Fig 10F), non-adherent characteristics, and the use of an enzymatic dissociation protocol.

      Fluorescence imaging revealed strong cytoplasmic GFP signals in each of these populations, confirming efficient transfection and translation of the delivered mRNA. These results demonstrate that the in vivo injection and electroporation protocol enables effective mRNA transfection across multiple stages of spermatogenesis.

      These results confirm that the injected mRNA is efficiently translated in germ cells at various stages of spermatogenesis. Together, these data validate the germ cell-specific nature of the GFP signal, supporting the Armc2 KO rescue experiments.

      As mentioned previously, we assessed the stability of mRNA-GFP using RT-qPCR in HEK cells and seminiferous tubule cells (see Fig. 5). mRNA-GFP was detected for up to 60 hours in HEK cells and for up to two weeks in seminiferous tubule cells. Together, these results suggest that the long-lasting fluorescence observed in our experiments reflects a combination of transcript stability and local translation within germ cells, as well as the slow protein turnover typical of the spermatogenic lineage.

      In Fig. 5, what is the half-life of luciferase? From this reviewer's search of the literature, it appears to be ~2-3 h in mammalian cells. With this said, how do the authors envision detectable protein for up to 20 days from a naked mRNA? The stability of the injected mRNAs should be shown in a mammalian cell line - perhaps this mRNA has an incredibly long half-life, which might help explain these results. However, even the most stable endogenous mRNAs (e.g., globin) are ~24-60 hrs.

      We did not directly assess the stability of luciferase mRNA, but we evaluated the persistence of GFP mRNA, which was synthesized and optimized using the same sequence optimization and chemical modification strategy as the luciferase mRNA. In these experiments, mRNA-GFP was detectable in seminiferous tubule cells for up to two weeks after injection. We therefore expect a similar stability profile for the luciferase mRNA. These findings suggest that the prolonged fluorescence or bioluminescence observed in our study likely reflects a combination of factors, including enhanced transcript stability, local translation within germ cells, and the inherently slow protein turnover characteristic of the spermatogenic lineage.

      527-8 The Sertoli cell cytoplasm is not just present along the basement membrane as stated, but also projects all the way to the lumina:

      We clarified the sentence: "Sertoli cells have an oval to elongated nucleus and the cytoplasm presents a complex shape (“tombstone” pattern) along the basement membrane, with long projections that extend toward the lumen."

      529-30 This is incorrect, as round spermatids are never "localized between the spermatocytes and elongated spermatids" - if elongated spermatids are present, rounds are not - they are never coincident in the same testis section:

      We thank the reviewer for this important comment and for drawing attention to the detailed staging of the seminiferous epithelium. We agree that the spatial organization of germ cells varies depending on the stage of spermatogenesis. While round spermatids (steps 1–8) and elongated spermatids (steps 9–16) are typically associated with distinct stages, transitional stages of the seminiferous epithelium can contain both cell types in close proximity, reflecting the continuous and overlapping nature of spermatid differentiation (Meistrich, 2013, Methods Mol. Biol. 927:299–307). We have revised the text to clarify this point, indicating that the relative positioning of germ cell types depends on the stage of the seminiferous cycle rather than implying their constant coexistence within the same tubule section.

      Fig. 7. To this reviewer, all of the GFP appears to be in Sertoli cell cytoplasm. In Figs 1-8 there is no convincing evidence presented that GFP is expressed in germ cells! In fact, it appears to be in Sertoli cells.

      We thank the reviewer for their observation. As previously mentioned, we have included an additional experiment specifically demonstrating GFP expression in germ cells (Fig. 10). These new data provide clear evidence that the GFP signal is not restricted to Sertoli cells and confirm successful uptake and translation of GFP mRNA in germ cells.

      Fig. 9 - alpha-tubuline?

      We corrected the figure.

      Fig. 11 - how was sperm morphology/motility not rescued on "days 3, 6, 10, 15, or 28 after surgery", but it was in some at 21 and 35? How does this make sense, given the known kinetics of male germ cell development??

      We note the reviewer’s concern regarding the timing of motile sperm appearance. Variability among treated mice is expected because transfection efficiency differed between spermatogonia and spermatids. Full spermiogenesis requires ~15 days, and epididymal transit adds ~8 days, consistent with motile sperm appearing around 21 days post-injection in some mice.

      And at least one of the sperm in the KO in Fig. B5 looks relatively normal, and the flagellum may be out-of-focus in the image? With only a few sperm for reviewers to see, how can we know these represent the population?

      We thank the reviewer for their comment. Upon closer examination of the image, the flagellum of the spermatozoon in question is clearly abnormally short and this is not due to being out of focus. Furthermore, the supplementary figure shows that the KO consistently lacks normal spermatozoa. These defects are consistent with previous findings from our laboratory [22], confirming that the observed phenotype is representative of the KO population rather than an isolated occurrence.

      Reviewer #3 (Public review):

      Summary:

      The authors used a novel technique to treat male infertility. In a proof-of-concept study, the authors were able to rescue the phenotype of a knockout mouse model with immotile sperm using this technique. This could also be a promising treatment option for infertile men.

      Strengths:

      In their proof-of-concept study, the authors were able to show that the novel technique rescues the infertility phenotype of Armc2 knockout spermatozoa. In the current version of the manuscript, the authors have added data on in vitro fertilisation experiments with Armc2 mRNA-rescued sperm. The authors show that Armc2 mRNA-rescued sperm can successfully fertilise oocytes that develop to the blastocyst stage. This adds another level of reliability to the data.

      Weaknesses:

      Some minor weaknesses identified in my previous report have already been fixed. The technique is new and may not yet be fully established for all issues. Nevertheless, the data presented in this manuscript opens the way for several approaches to immotile spermatozoa to ensure successful fertilisation of oocytes and subsequent appropriate embryo development.

      [Editors' note: The images in Figure 12 do not support the authors' interpretation that 2-cell embryos resulted from in vitro fertilization. Instead, the cells shown appear to be fragmented, unfertilized eggs. Combined with the lack of further development, it seems highly unlikely that fertilization was successful.]

      We thank the reviewer for their careful evaluation and constructive feedback. We appreciate the acknowledgment of the strengths of our study, particularly the proof-of-concept demonstration that Armc2-mRNA electroporation can rescue sperm motility in Armc2 knockout mice.

      Regarding the concern raised by the editor about Figure 12, we would like to clarify two technical points. First, the IVF experiments were performed using CD1 oocytes and B6D2 sperm. Due to strain-specific incompatibilities, fertilization of CD1 oocytes by B6D2 sperm typically does not progress beyond the two-cell stage (Fernández-González et al., 2008, Biology of Reproduction [23]). Therefore, the observation of two-cell embryos represents the expected limit of development in this cross and serves as a strong indication of successful fertilization, even though further development is not possible. Second, the oocytes used in these experiments were treated with collagenase to remove cumulus cells. This enzymatic treatment can sometimes affect the morphology of early embryos, which may explain why the two-cell embryos in Figure 12 appear less regular or somewhat fragmented. We also included a control showing embryos from B6D2 sperm with the same collagenase treatment on CD1 oocytes, which yielded similar appearances (Fig. 14 A4).

      To provide additional functional evidence, we complemented the IVF experiments with ICSI using rescued Armc2–/– sperm and B6D2 oocytes, which allowed embryos to develop to the blastocyst stage. In these experiments, 25% of injected oocytes reached the blastocyst stage with rescued sperm compared to 13% for untreated Armc2–/– sperm (Supplementary Fig. 9). These results support the functional competence of rescued sperm and demonstrate partial recovery of fertilization ability following Armc2 mRNA electroporation.

      We have clarified these points in the revised Results and Discussion sections to emphasize that the IVF data indicate partial functional recovery of rescued sperm rather than full fertility restoration. These clarifications address the editor’s concern while accurately representing the technical limitations of the strain combination used in our experiments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Fig 12 and Supplementary Fig 9 are mislabeled in the text and rebuttal.

      We thank the reviewer for pointing this out. We have carefully checked the manuscript and the rebuttal text, and corrected all references to Figure 12 and Supplementary Figure 9 to ensure they are accurately labeled and consistent throughout the text.

      Reviewer #3 (Recommendations for the authors):

      The contribution of the newly added authors should be clarified. All other aspects of inadequacy raised in my previous report have been adequately addressed.

      No further comments.

      We thank the reviewer for noting this. The contributions of the newly added authors have been clarified in the Author Contributions section of the revised manuscript. All other points raised in the previous review have been addressed as indicated.

      References

      (1) Hansen, M., et al., Assisted reproductive technologies and the risk of birth defects--a systematic review. Hum Reprod, 2005. 20(2): p. 328-38.

      (2) Halliday, J.L., et al., Increased risk of blastogenesis birth defects, arising in the first 4 weeks of pregnancy, after assisted reproductive technologies. Hum Reprod, 2010. 25(1): p. 59-65.

      (3) Davies, M.J., et al., Reproductive technologies and the risk of birth defects. N Engl J Med, 2012. 366(19): p. 1803-13.

      (4) Kurinczuk, J.J., M. Hansen, and C. Bower, The risk of birth defects in children born after assisted reproductive technologies. Curr Opin Obstet Gynecol, 2004. 16(3): p. 201-9.

      (5) Graham, M.E., et al., Assisted reproductive technology: Short- and long-term outcomes. Dev Med Child Neurol, 2023. 65(1): p. 38-49.

      (6) Palermo, G.D., et al., Intracytoplasmic sperm injection: state of the art in humans. Reproduction, 2017. 154(6): p. F93-F110.

      (7) Usmani, A., et al., A non-surgical approach for male germ cell mediated gene transmission through transgenesis. Sci Rep, 2013. 3: p. 3430.

      (8) Raina, A., et al., Testis mediated gene transfer: in vitro transfection in goat testis by electroporation. Gene, 2015. 554(1): p. 96-100.

      (9) Michaelis, M., A. Sobczak, and J.M. Weitzel, In vivo microinjection and electroporation of mouse testis. J Vis Exp, 2014(90).

      (10) Wang, L., et al., Testis electroporation coupled with autophagy inhibitor to treat non-obstructive azoospermia. Mol Ther Nucleic Acids, 2022. 30: p. 451-464.

      (11) Wu, Q., et al., Translation affects mRNA stability in a codon-dependent manner in human cells. eLife, 2019. 8: p. e45396.

      (12) Gallie, D.R., The cap and poly(A) tail function synergistically to regulate mRNA translational efficiency. Genes & Development, 1991. 5(11): p. 2108-2116.

      (13) Henderson, J.M., et al., Cap 1 messenger RNA synthesis with co-transcriptional CleanCap® analog improves protein expression in mammalian cells. Nucleic Acids Research, 2021. 49(8): p. e42.

      (14) Stepinski, J., et al., Synthesis and properties of mRNAs containing novel “anti-reverse” cap analogs. RNA, 2001. 7(10): p. 1486-1495.

      (15) Sachs, A.B., P. Sarnow, and M.W. Hentze, Starting at the beginning, middle, and end: translation initiation in eukaryotes. Cell, 1997. 89(6): p. 831-838.

      (16) Presnyak, V., et al., Codon optimality is a major determinant of mRNA stability. Cell, 2015. 160(6): p. 1111-24.

      (17) Cao, D., et al., Unlock the sustained therapeutic efficacy of mRNA. J Control Release, 2025. 383: p. 113837.

      (18) Karikó, K., et al., Incorporation of pseudouridine into mRNA yields superior nonimmunogenic vector with increased translational capacity and biological stability. Mol Ther, 2008. 16(11): p. 1833-40.

      (19) Pardi, N., et al., mRNA vaccines — a new era in vaccinology. Nature Reviews Drug Discovery, 2018. 17(4): p. 261-279.

      (20) Meistrich, M.L. and R.A. Hess, Assessment of Spermatogenesis Through Staging of Seminiferous Tubules, in Spermatogenesis: Methods and Protocols, D.T. Carrell and K.I. Aston, Editors. 2013, Humana Press: Totowa, NJ. p. 299-307.

      (21) Mäkelä, J.-A., et al., JoVE, 2020(164): p. e61800.

      (22) Coutton, C., et al., Bi-allelic Mutations in ARMC2 Lead to Severe Astheno-Teratozoospermia Due to Sperm Flagellum Malformations in Humans and Mice. Am J Hum Genet, 2019. 104(2): p. 331-340.

      (23) Fernández-González, R., et al., Long-term effects of mouse intracytoplasmic sperm injection with DNA-fragmented sperm on health and behavior of adult offspring. Biol Reprod, 2008. 78(4): p. 761-72.

    1. Author response:

      Reviewer #1 (Public Review):

      The heterogeneity within the neutrophil population is becoming clear. However, it was not clear if neutrophil progenitors are also heterogenous. Because neutrophils are short-lived, it is technically challenging to tackle the question. This study used a system to isolate and expand clonal neutrophil progenitors (granulocyte-monocyte progenitors; GMPs) to achieve molecular and functional profiling. In the study, transcriptional profiling was performed by RNAseq and ATACseq. Functional assays were performed ex vivo to examine phagocytosis, ROS production, NET formation, and neutrophil swarming using Candida albicans, as well as C. glabrata and C. auris. The strengths of this study include the use of the neutrophil clone system to track GMPs developing into neutrophils. The clone-based approach made it possible to evaluate the functions of multiple neutrophil subpopulations. Limitations of this study include the dependency on ex vivo approaches and the modest degree of heterogeneity within presented neutrophils. Nevertheless, the finding - the heterogeneity of neutrophils can be traced back to the GMP stage - is significant.

      Reviewer #2 (Public Review):

      The stated goal of the authors is to establish and characterize an experimental system to study neutrophil heterogeneity in a manner that allows for functional outcomes to be probed. To do so, they start with murine GMPs that are conditionally immortalized by ER-HoxB8 expression and make single-cell clonal populations to ask whether those GMPs or neutrophils derived by differentiating such clonal GMPs harbor heterogeneity. At a conceptual level, this is an innovative approach that could shed light on mechanisms of neutrophil heterogeneity that have been described in both health and disease. They perform bulk multi-omics and functional analyses of both the clonal GMPs and neutrophil-like cells, including transcriptional and epigenetic profiling. However, the major weakness of the study is that the authors do not provide rigorous or convincing data that the cells they derive are truly mature neutrophils. To the contrary, the neutrophil-like cells lack Ly6G expression and so the authors fall back on using CD11b as the primary marker for delineating neutrophils; however, CD11b is expressed by both myeloid progenitors and some premature and mature myeloid lineages that are not neutrophils. They acknowledge this shortcoming, but they make an assumption that Ly6G expression is the only way in which the cells they derive are different from primary neutrophils without presenting any evidence indicating such. The authors use only SCF during the maturation of ER-HoxB8 GMPs into leukocytes, rather than including other cytokines such as G-CSF (or use in vivo maturation) that could have better-induced differentiation and maturation into granulocytes/neutrophils.

      Thank you. Of note, reviewer #1 also commented on the question of including other cytokines during the neutrophil differentiation process. We have included our response to reviewer #1 below, which includes the use of GM-CSF and IL-4.

      “We have now demonstrated enhanced Ly6G expression with GM-CSF and IL-4 treatment in a new Supplementary Figure 1.

      GMPs were washed out of estradiol-containing media and placed in fresh media containing 10 ng/ml GM-CSF and/or 1 ng/ml IL-4 for four days. Cells were collected and stained with CD117 (APC), F4/80 (AlexaFluor 488), Ly6G (PE), and CD11b (BV421). Neutrophil clones were run in biological triplicates, and undifferentiated GMPs were included as a negative control.

      GMPs stain as CD117POS / F4/80NEG / Ly6GNEG / CD11bNEG, indicating they are immature. The clones removed from estradiol differentiate and lose their CD117 expression. The mature cells remain F4/80NEG, as expected for mature neutrophils.

      The addition of GM-CSF to the media led to a significant increase in the expression of Ly6G. The addition of both GM-CSF + IL-4 did not further increase the proportion of Ly6G+ cells, and we have altered our statement slightly in the main text to reflect this finding (line 139).”

      The authors did not use their transcriptional analyses to further establish that the cells they derive from ER-HoxB8 GMPs are similar/different from primary murine neutrophils. Unfortunately, this shortcoming means that all of the analyses of neutrophil-like cells derived from clonal GMPs may or may not represent the transcriptional, epigenetic, etc. profile of a true mature neutrophil.

      Thank you. The ER-Hoxb8 system has been well characterized by many authors at both the functional and the transcriptional level, confirming that the cells closely reflect the gene expression pattern of mature neutrophils. This was recently reviewed by Lail et al. (Traffic, 2022, PMID: 36117140). In our analysis, we used transcriptional profiling to examine heterogeneity between different single-cell clones, not to re-validate the similarity with primary neutrophils.

      It is also not rigorously addressed whether what they call PMNs derived from clonal GMPs are a transcriptionally uniform population or if they harbor heterogeneity within the bulk population.

      Thank you. The reviewer poses an interesting, albeit challenging, question of whether even a single GMP clone can differentiate and result in mature neutrophil heterogeneity. Addressing this would require single-cell sequencing of the resulting cells, which we did not perform. We relied on single-cell subcloning of the immature granulocyte-monocyte progenitors to ensure a genetically identical clonal population. This was then additionally confirmed by the retroviral insertional analysis. These analyses confirmed the clonal nature of our starting population, from which we asked whether the neutrophils derived from these clonal GMPs resulted in mature cells with consistent functional heterogeneity, which was indeed the case.

      Overall, while conceptually intriguing and in pursuit of an experimental system that would be impactful for the field, the study as performed has critical flaws.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      In their study, the authors investigated the F. graminearum homologue of the Drosophila Misato-Like Protein DML1 for a function in secondary metabolism and sensitivity to fungicides.

      Strengths:

      Generally, the topic of the study is interesting and timely, and the manuscript is well written, albeit in some cases, details on methods or controls are missing.

      Weaknesses:

      However, a major problem I see is with the core result of the study, the decrease in the DON content associated with the deletion of FgDML1. Although some growth data are shown in Figure 6, indicating a severe growth defect, the DON production presented in Figure 3 is not related to biomass. Also, the method and conditions for measuring DON are not described. Consequently, it could well be concluded that the decreased amount of DON detected is simply due to decreased growth, and the specific DON production of the mutant remains more or less the same.

      To alleviate this concern, it is crucial to show the details on the DON measurement and growth conditions and to relate the biomass formation under the same conditions to the DON amount detected. Only then can a conclusion as to an altered production in the mutant strains be drawn.

      We greatly appreciate the time you spent on our paper and your helpful suggestions. We have revised the manuscript accordingly, and our point-by-point responses to the reviewer’s comments are listed below. Our method for DON quantification was based on the amount per unit of mycelium. After obtaining the absorbance value from the ELISA reaction, the concentration of DON was calculated according to a standard curve and a formula, then divided by the dry weight of the mycelium to obtain the DON content per unit of mycelium, with the results finally expressed in µg/g.
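
      The normalization described above can be sketched as follows. This is a minimal illustration only: the log-linear form of the standard curve, the extract volume and dilution parameters, and every numeric value are hypothetical assumptions and are not taken from the manuscript or from any ELISA kit protocol.

```python
import math


def fit_standard_curve(concs_ug_per_ml, absorbances):
    """Least-squares fit of absorbance = intercept + slope * log10(concentration).

    The log-linear curve shape is an assumption made for this sketch.
    """
    xs = [math.log10(c) for c in concs_ug_per_ml]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(absorbances) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, absorbances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def conc_from_absorbance(absorbance, slope, intercept):
    """Invert the fitted curve to get a concentration in ug/mL."""
    return 10 ** ((absorbance - intercept) / slope)


def don_per_gram(absorbance, slope, intercept,
                 extract_volume_ml, dilution, dry_weight_g):
    """DON content per unit of dry mycelium (ug/g)."""
    conc = conc_from_absorbance(absorbance, slope, intercept) * dilution
    total_ug = conc * extract_volume_ml
    return total_ug / dry_weight_g


# Hypothetical standards (signal falls as concentration rises).
slope, intercept = fit_standard_curve([0.1, 1.0, 10.0], [1.5, 1.0, 0.5])

# Hypothetical samples: the mutant yields less biomass AND less DON per gram.
wild_type = don_per_gram(1.0, slope, intercept,
                         extract_volume_ml=10.0, dilution=1.0, dry_weight_g=0.5)
mutant = don_per_gram(1.25, slope, intercept,
                      extract_volume_ml=10.0, dilution=1.0, dry_weight_g=0.2)

print(f"wild type: {wild_type:.2f} ug/g, mutant: {mutant:.2f} ug/g")
```

      Dividing by dry weight is what allows a growth defect to be separated from a change in specific production: two strains with very different biomass can only be compared after this step.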

      (1) Line 139f

      ... FgDML1 is a critical positive regulator of virulence ....

      Clearly, the deletion of FgDML1 impacts virulence, but it is too much of a general effect to say it is a regulator. DML1 acts high up in the cascade, impacting numerous processes, one of which is virulence. Generally, it has to be considered that deletion of DML1 causes a severe growth defect, which in turn is likely to lead to a plethora of effects. Besides discussing this fact, please also revise the manuscript to avoid references to "direct effects" or "regulator".

      Thank you very much for your advice. Our method for determining DON is based on the amount per unit of mycelium. After obtaining the absorbance value from the ELISA reaction, we calculate the DON concentration from the established standard curve and formula. We then divide it by the dry weight of the mycelium to obtain the DON content per unit of mycelium, and present the results in µg/g. In summary, we conclude that the decrease in DON production by ΔFgDML1 is not due to slower hyphal growth, but rather to a reduced capacity of the hyphae to produce DON per unit of biomass compared with the wild type. Given the decrease in DON synthesis caused by FgDML1 deficiency, we believe that using the term "regulator" is reasonable.

      (2) Line 143

      Please define "toxin-producing conditions".

      Thank you very much for your advice. We have defined the toxin-producing conditions in the manuscript: 'toxin-inducing conditions (28°C, 145 ×g, 7 days incubation)' (in L163-164).

      (3) Line 149

      A brief intro on toxisomes should be provided in the introduction to better integrate this into the manuscript's results.

      Thank you very much for your advice. We have added corresponding content about toxisomes to the Introduction: 'The biosynthesis of DON entails a reorganization of the endoplasmic reticulum into a specialized compartment termed the "toxisome" (Tang et al., 2018). The assembly of the toxisome coincides with the aggregation of key biosynthetic enzymes, which in turn enhances the efficiency of DON production. Concurrently, this compartmentalization serves as a self-defense mechanism, protecting the fungus from the autotoxicity of TRI pathway intermediates (Boenisch et al., 2017). The proteins TRI1, TRI4, TRI14, and Hmr1 are confirmed constituents of this structure (Kistler and Broz, 2015; Menke et al., 2013).' (in L86-93)

      (4) Line 153

      DON production decreases by about 80 %, but not to 0. Consequently, DML1 is important, but NOT essential for DON production.

      Thank you very much for your advice. We have made changes to the wording of the corresponding sections based on your suggestions: 'FgDML1 is essential for the biosynthesis of the DON toxin.' (in L161)

      (5) Line 168ff

      Please provide a reference for FgDnm1 being critical for mitochondrial fission and state whether such an interaction has been shown in other organisms.

      Thank you very much for your advice. We have made changes to the wording of the corresponding sections based on your suggestions: 'FgDnm1 is a key dynamin-related protein mediating mitochondrial fission (Griffin et al., 2005; Kang et al., 2023), suggesting that FgDML1 may form a complex with FgDnm1 to regulate mitochondrial fission and fusion processes. To our knowledge, this is the first report documenting an interaction between DML1 and Dnm in any fungal species, including model organisms such as S. cerevisiae. This novel finding provides new insights into the molecular mechanisms underlying mitochondrial dynamics in filamentous fungi.' (in L277-283)

      (6) Line 178

      Please specify whether Complex III activity was related to biomass and provide a p-value or standard deviation for the value.

      Thank you very much for your question. Complex III activity was determined using a complex III enzyme activity kit (Solarbio, Beijing, China) (Li et al., 2022; Wang et al., 2022). A standardized 0.1 g of mycelium was used as the sample for each experiment. Because the same mass of homogenized mycelium was used for every sample, we believe that the measured complex III activity is not dependent on biomass. We have also refined the specific measurement steps in the article: 'Briefly, 0.1 g of mycelia was homogenized with 1 mL of extraction buffer in an ice bath. The homogenate was centrifuged at 600 ×g for 10 min at 4°C. The resulting supernatant was then subjected to a second centrifugation at 11,100 ×g for 10 min at 4°C. The pellet was resuspended in 200 μL of extraction buffer and disrupted by ultrasonication (200 W, 5 s pulses with 10 s intervals, 15 cycles). Complex III enzyme activity was finally measured by adding the working solution as per the manufacturer's protocol. Each treatment group contains three biological replicates and three technical replicates.' (in L511-517)

      Li C, et al. Amino acid catabolism regulates hematopoietic stem cell proteostasis via a GCN2-eIF2 axis. Cell Stem Cell. 2022 Jul 7; 29(7):1119-1134.e7. doi: 10.1016/j.stem.2022.06.004. PMID: 35803229.

      Wang K, et al. Locally organised and activated Fth1hi neutrophils aggravate inflammation of acute lung injury in an IL-10-dependent manner. Nat Commun. 2022 Dec 13;13(1):7703. doi: 10.1038/s41467-022-35492-y. PMID: 36513690; PMCID: PMC9745290

      (7) Line 185

      Albeit this headline is a reasonable hypothesis, you actually did not show that the conformation is altered. Please reword accordingly.

      Please also add references for cyazofamid acting on the QI site versus other fungicides acting on the QO site.

      Thank you very much for your advice. We have reworded the corresponding sections based on your suggestions. 'Overexpression of FgQCR2, FgQCR8, and FgQCR9 may alter the conformation of the QI site, resulting in reduced sensitivity to cyazofamid.' (in L212-213). For fungicides targeting the Qi and Qo sites, we have added descriptions and references in the respective sections. 'Numerous fungicides have been developed to inhibit the Qo site (e.g., pyraclostrobin, azoxystrobin) (Nuwamanya et al., 2022; Peng et al., 2022) and the Qi site (e.g., cyazofamid) (Mitani et al., 2001) of the cytochrome bc1 complex.' (in L327-329)

      (8) Line 200

      This section on growth should be moved up right after introducing the mutant strain.

      Thank you very much for your advice. We have moved the sections on vegetative growth and on sexual and asexual development ahead of the DON toxin results to improve readability. We originally arranged the sections in the previous order to emphasize the new link between mitochondria and DON toxin. We found a significant decrease in DON toxin in ΔFgDML1, defects in the formation of toxin-producing bodies, and downregulation of FgTRis at both the gene and protein levels. In summary, we believe that the absence of FgDML1 does lead to a decrease in DON toxin content, and that FgDML1 plays a regulatory role in DON toxin synthesis. In addition, our measurements of DON toxin, acetyl-CoA, ATP, and other indicators are all expressed as the amount per unit of hyphae, which excludes differences caused by hyphal biomass or growth. We have further refined the Materials and Methods to make this clearer.

      (9) Line 203

      "... significantly reduced growth rates ..."

      This is not what was measured here. Figure 6A shows a plate assay that can be used to assess hyphal extension. In the figure, it is also visible that the mycelium of the deletion mutant is much denser, maybe due to increased hyphal branching. Please reword.

      Additionally, it is important to include a biomass measurement here under the conditions used for DON assessment. Hyphal extension measurements cannot be used instead of biomass.

      Thank you very much for your advice. We have made changes to the wording of the corresponding sections based on your suggestions. 'The ΔFgDML1 strain displayed a distinct growth phenotype characterized by retardation in radial growth and the formation of more compact, denser hyphal networks on all tested media compared to the PH-1 and ΔFgDML-C strains. '(in L136-138).

      (10) Line 217

      Please include information on how long the cultures were monitored. Given the very slow growth of the mutant, perithecia formation may be considerably delayed beyond 14 days.

      Thank you very much for your advice. Based on your suggestion, we extended the incubation time for sexual reproduction to 21 days to evaluate sexual reproduction ability more accurately. Our results show that even after 21 days, ΔFgDML1 still cannot produce ascospores, which indicates that the absence of FgDML1 does cause sexual reproduction defects in F. graminearum.

      Author response image 1.

      Discussion

      (11) Please mention your summary Figure 8 early on in the discussion, and explain conclusions with this figure in mind. Please avoid repetition of the results section as much as possible.

      Also, please state clearly what was already known from previous research and is in agreement with your results, and what is new (in fungi or generally).

      Thank you very much for your advice. Based on your suggestion, we now refer to Fig. 8 in the first half of the discussion and use it to frame the text that follows. We also discuss our results more comprehensively by comparing them with previous studies. 'Our study defines a novel mechanism through which FgDML1 governs mitochondrial homeostasis. We demonstrate that FgDML1 directly interacts with the key mitochondrial fission regulator FgDnm1 and positively modulates cellular bioenergetic metabolism, as evidenced by elevated ATP and acetyl-CoA levels (Fig. 8).' (in L250-253). 'The Misato/DML1 protein family is evolutionarily conserved from yeast to humans and plays a critical role in mitochondrial regulation. In S. cerevisiae, DML1 is an essential gene; its deletion is lethal, while its overexpression results in fragmented mitochondrial networks and aberrant cellular morphology, underscoring its necessity for normal mitochondrial function (Gurvitz et al., 2002). Similarly, in Homo sapiens, the homolog Misato localizes to the mitochondrial outer membrane, and both its depletion and overexpression are sufficient to disrupt mitochondrial morphology and distribution (Kimura and Okano, 2007).' (in L241-244).

      (12) Line 262ff

      Please specify if this interaction was shown previously in other organisms and provide references.

      Thank you very much for your advice. We now state clearly in the corresponding section that the interaction between FgDML1 and FgDnm1 is reported here for the first time; to our knowledge, no such interaction has been reported in other species so far. 'Notably, FgDML1 was found to interact with FgDnm1 (Fig. 5E). FgDnm1 is a key dynamin-related protein mediating mitochondrial fission (Griffin et al., 2005; Kang et al., 2023), suggesting that FgDML1 may form a complex with FgDnm1 to regulate mitochondrial fission and fusion processes. To our knowledge, this is the first report documenting an interaction between DML1 and Dnm in any fungal species, including model organisms such as S. cerevisiae. This novel finding provides new insights into the molecular mechanisms underlying mitochondrial dynamics in filamentous fungi.' (in L276-283)

      (13) Line 287ff

      There is no result that would justify this speculation. Please remove.

      Thank you very much for your advice. We have modified the corresponding wording in the corresponding section. 'In conclusion, our findings suggest that the overexpression of assembly factors FgQCR2, FgQCR7, and FgQCR8 in ΔFgDML1 potentially modifies the conformation of the Qi site, which specifically modulates the sensitivity of F. graminearum to cyazofamid. '(in L352-355)

      Materials and methods

      (14) A table with all primer sequences used in the study and their purpose is missing. For every experiment, the number of technical and biological replicates needs to be stated.

      Thank you very much for your advice. We have presented all the primers used in this study in Supplementary Table 1 (in Table S1). We added the number of technical and biological replicates to the Materials and Methods description of each experiment. 'For each sample, a total of 200 conidia were counted. The experiment included three biological replicates with three technical replicates each.' (in L434-436). 'Each treatment group contains three biological replicates.' (in L444-445). 'Each treatment group contains three biological replicates and three technical replicates.' (in L463-464). 'Each treatment group contains three biological replicates and three technical replicates.' (in L474-475). 'Each treatment group contains three biological replicates.' (in L483). 'Each treatment group contains three biological replicates and three technical replicates.' (in L501-502). 'Each treatment group contains three biological replicates and three technical replicates.' (in L516-517). 'The experiment was independently repeated three times.' (in L533-534).

      (15) Line 369ff

      Please provide final concentrations used for assays here.

      Thank you very much for your advice. The final concentrations are shown in the figures (in Fig. 6A, B) (in Fig. S3). We have also provided Supplementary Table 2 to present the concentrations more clearly (in Table S2).

      (16) Line 383

      Please provide a reference or data on the use of F2du for transformant selection and explain the abbreviation.

      Thank you very much for your advice. Based on your suggestion, we have provided the full name and references of F2du. 'Transformants were selected on PDA plates containing either 100 μg/mL Hygromycin B (Yeasen, Shanghai, China) or 0.2 μmol/mL 5-Fluorouracil 2'-deoxyriboside (F2du) (Solarbio, Beijing, China)(Zhao et al., 2022). '(in L405-407).

      (17) Line 407

      Please provide a reference for the method and at least a brief description.

      Thank you very much for your advice. Based on your suggestion, we have added references and a brief description of the method. 'As previously described (Tang et al., 2020; Wang et al., 2025), coleoptiles were inoculated with conidial suspensions and incubated for 14 days, while leaves were inoculated with fresh mycelial plugs and incubated for 5 days, followed by observation and quantification of disease symptoms. DON toxin was measured using a Wise Science ELISA-based kit (Wise Science, Jiangsu, China) (Li et al., 2019; Zheng et al., 2018).' (in L466-471)

      (18) Line 414ff

      Also, here, the amount of biomass has to be considered for the measurement to be able to distinguish if actually less of the compounds were produced or if the effect seen was merely due to an altered amount of biomass present.

      Thank you very much for your advice. We did not treat biomass as a separate measurement indicator, because all values were measured and calculated per unit of hyphae. This normalization rules out experimental bias caused by a decrease in biomass.
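
      The per-unit normalization described in this response can be sketched as follows. This is only a minimal illustration, not the authors' analysis script: the function name `per_unit_mass` and all numbers are hypothetical, and it assumes that "per unit hyphae" means dividing each measured quantity by the mass of mycelium in the sample.

```python
def per_unit_mass(total_amount: float, mycelium_mass_g: float) -> float:
    """Normalize a measured quantity to mycelial mass (amount per gram).

    Assumption: "per unit hyphae" is taken to mean "per gram of mycelium".
    """
    if mycelium_mass_g <= 0:
        raise ValueError("mycelium mass must be positive")
    return total_amount / mycelium_mass_g


# Hypothetical readings: total metabolite in a sample and the sample's mass.
wild_type = per_unit_mass(total_amount=12.0, mycelium_mass_g=0.5)
mutant = per_unit_mass(total_amount=3.0, mycelium_mass_g=0.25)
```

      With these made-up numbers the mutant sample is half the mass of the wild-type sample, yet its normalized value is still lower, which is the kind of difference that cannot be explained by biomass alone.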

      RNA and RT-qPCR

      (19) Line 461

      When the strains were transferred to AEA medium, was the biomass measured, at least wet weight, and in which culture volume was it done? It makes a big difference if the amount of (wet) biomass dilutes a small amount of fungicide-containing culture or if biomass is added in at least roughly equal amounts in sufficient growth medium to ensure equal conditions.

      Thank you very much for your question. We controlled the wet weight of the samples before dosing, so that each sample contained 0.2 g (wet weight) of mycelium. The mycelium was obtained from a 100 mL AEA culture. This ensured that the initial biomass was consistent between groups before dosing and that the drug concentration was accurate.

      (20) Line 466

      Please provide the name and supplier of the kit.

      Thank you very much for your advice. We have added corresponding content in the corresponding location. 'Mycelium was collected and total RNA was extracted following the instructions provided by the Total RNA Extraction Kit (Tiangen, Beijing, China).' (in L523-524).

      (21) All primer sequences must be provided in a table.

      Thank you very much for your advice. We have presented all the primers used in this study in Supplementary Table 1. (in Table S1).

      (22) For RT qPCR it is essential to check the RNA quality to be sure that the obtained results are not artifacts due to varying quality, which may exceed differences. Please state how quality control was done and which threshold was applied for high-quality RNA to be used in RTqPCR (like RIN factor, etc).

      Thank you very much for your question. We performed stringent quality control on the extracted total RNA. First, a micro-spectrophotometer was used to measure RNA concentration and purity, confirming that the A260/A280 ratio was between 1.8 and 2.0 and the A260/A230 ratio was greater than 2.0, indicating good RNA purity without significant protein or organic solvent contamination. Subsequently, verification by agarose gel electrophoresis revealed distinct 28S and 18S rRNA bands, demonstrating good RNA integrity and absence of degradation.
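
      The purity thresholds stated in this response can be written as a simple check. This is a hedged sketch rather than the authors' procedure: the function name `passes_purity_check` is hypothetical, and only the two spectrophotometric ratios are covered, since the gel-based integrity check is a visual assessment.

```python
def passes_purity_check(a260_a280: float, a260_a230: float) -> bool:
    """Apply the purity thresholds quoted in the response.

    A260/A280 must lie between 1.8 and 2.0 (inclusive bounds are an assumption),
    and A260/A230 must be strictly greater than 2.0.
    """
    return 1.8 <= a260_a280 <= 2.0 and a260_a230 > 2.0


# Hypothetical readings for two RNA samples.
sample_ok = passes_purity_check(a260_a280=1.9, a260_a230=2.1)
sample_low_ratio = passes_purity_check(a260_a280=1.7, a260_a230=2.2)
```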

      Author response image 2.

      (B): Minor Comments:

      (1) Please increase the font size of the labels and annotations of the figures; it is hard to read as it is now.

      Thank you very much for your advice. We have increased the size of annotations or numerical labels in the corresponding images for better reading.

      (2) Throughout the manuscript: Please check that all abbreviations are explained at first use.

      Thank you very much for your advice. We have checked the entire text to ensure that abbreviations have their full names when they first appear.

      (3) I do hope that the authors can clarify all concerns and provide an amended manuscript of this interesting story.

      Thank you very much. We sincerely appreciate your suggestions and questions, which have been very helpful to us.

      Reviewer #2:

      The manuscript entitled "Mitochondrial Protein FgDML1 Regulates DON Toxin Biosynthesis and Cyazofamid Sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" identified the regulatory effect of FgDML1 in DON toxin biosynthesis and sensitivity of Fusarium graminearum to cyazofamid. The manuscript provides a theoretical framework for understanding the regulatory mechanisms of DON toxin biosynthesis in F. graminearum and identifies potential molecular targets for Fusarium head blight control. The paper is innovative, but there are issues in the writing that need to be addressed and corrected.

      We greatly appreciate the time you spent on our paper and your helpful suggestions, and we have done our best to revise the manuscript accordingly. Revisions are marked in red in the manuscript, and in our responses the positions of the revised parts are indicated by red line numbers. Our point-by-point responses to the reviewer’s comments are listed below.

      Weaknesses:

      (1) The authors speculate that cyazofamid treatment caused upregulation of the assembly factors, leading to a change in the conformation of the Qi protein, thus restoring the enzyme activity of complex III. But no speculation was given in the discussion as to why this would lead to the upregulation of assembly factors, and how the upregulation of assembly factors would change the protein conformation, and is there any literature reporting a similar phenomenon? I would suggest adding this to the discussion.

      Thank you very much for your advice. Based on your suggestion, we have added content related to the assembly factor of complex III in the discussion section and made modifications to the corresponding wording. 'Previous studies have reported that mutations in the Complex III assembly factors TTC19, UQCC2, and UQCC3 impair the assembly and activity of Complex III (Feichtinger et al., 2017; Wanschers et al., 2014). '(in L345-347). 'In conclusion, our findings suggest that the overexpression of assembly factors FgQCR2, FgQCR7, and FgQCR8 in ΔFgDML1 potentially modifies the conformation of the Qi site, which specifically modulates the sensitivity of F. graminearum to cyazofamid. '(in L352-355).

      (2) Would increased sensitivity of the mutant to cell wall stress be responsible for the excessive curvature of the mycelium?

      Thank you very much for your question. We believe that the sensitivity of ΔFgDML1 to osmotic stress is reduced and that this is probably unrelated to hyphal bending, as shown in Author response image 3. At the conidial stage, ΔFgDML1 cannot germinate in YEPD, whereas adding 1 M sorbitol promotes its germination. The underlying mechanism remains unknown and will be a focus of our future research.

      Author response image 3.

      (3) The vertical coordinates of Figure 7B need to be modified with positive inhibition rates for the mutants.

      Thank you very much for your advice. Figure 7B shows the inhibition rates as measured. In the ΔFgDML1 mutant, the inhibition rate under osmotic stress is negative, which means that the colony grew more than the untreated control (CK). We have therefore kept the negative inhibition rates in Figure 7B.
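
      To illustrate why an inhibition rate can be negative, the calculation can be sketched as below. This is a minimal example under stated assumptions: it uses the common definition of inhibition as the relative reduction in colony diameter compared with the untreated control, which may differ in detail from the formula used in the manuscript (for example, some protocols subtract the plug diameter). The function name and the diameters are hypothetical.

```python
def inhibition_rate(control_diameter_mm: float, treated_diameter_mm: float) -> float:
    """Percent growth inhibition relative to the untreated control (CK).

    Assumed formula: (control - treated) / control * 100.
    A negative value means the treated colony grew more than the control.
    """
    if control_diameter_mm <= 0:
        raise ValueError("control diameter must be positive")
    return (control_diameter_mm - treated_diameter_mm) / control_diameter_mm * 100


# Hypothetical colony diameters in millimetres.
inhibited = inhibition_rate(control_diameter_mm=40, treated_diameter_mm=20)
stimulated = inhibition_rate(control_diameter_mm=40, treated_diameter_mm=50)
```

      Here `inhibited` is positive because the treated colony is smaller than the control, while `stimulated` is negative because the treated colony is larger, matching the situation described for the mutant under osmotic stress.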

      (1) In Figure 1B, Figure 3C, and Figure 6C, the scale below the picture is not clear. In Figure 5D, the histogram is unclear, and it is recommended to redraw the graph.

      Thank you very much for your advice. The issue with the above images may be due to Word compression. We have changed the settings and enlarged the images as much as possible to better display them.

      (2) The full Latin name of the strain should be used in the title of figures and tables.

      Thank you very much for your advice. Based on your suggestion, we have used the full names of the strains appearing in the title of figures and tables.

      (3) Proteins in line 117 should be abbreviated.

      Thank you very much for your advice. Based on your suggestion, we have abbreviated the corresponding positions. 'The DML1 protein from S. cerevisiae was used as a query for a BLAST search against the Fusarium genome database, resulting in the identification of the putative DML1 gene FgDML1 (FGSG_05390) in F. graminearum. '(in L118-120).

      (4) The sentence in lines 187-189, which is supposed to introduce why the test is sensitive to the three drugs, is currently illogical.

      Thank you very much for your advice. Based on your suggestion, we have made modifications to the corresponding sections. 'Since Complex III is involved in the action of both cyazofamid (targeting the QI site) and pyraclostrobin (targeting the QO site), the sensitivity of ΔFgDML1 to cyazofamid and pyraclostrobin was investigated. ' (in L214-216).

      (5) The expression of FgQCR2, FgQCR7, and FgQCR8 was significantly upregulated in ΔFgDML1 at transcription levels. Do FgQCR2, FgQCR8, and FgQCR9 show upregulated expression at the protein level?

      Thank you very much for your question. Based on your suggestion, we evaluated the protein expression levels of FgQCR2, FgQCR7, and FgQCR8 in PH-1 and ΔFgDML1, and we found that the protein expression levels of FgQCR2, FgQCR7, and FgQCR8 in ΔFgDML1 were higher than those in PH-1. (in Fig. 6F).

      (6) In Figure 7B, it is recommended to adjust the position of the horizontal axis labels in the histogram.

      Thank you very much for your advice. Based on your suggestion, we have made modifications to the corresponding sections.(in Fig. 7B)

      (7) There are numerous errors in the writing of gene names in the text. Please check the full text and change the writing of gene names and mutant names to italic.

      Thank you very much for your advice. We have checked the entire text to ensure that all genes have been italicized.

      (8) All acronyms should be spelled out in figure and table captions. e.g., F. graminearum.

      Thank you very much for your advice. Based on your suggestion, we have used the full names of the strains appearing in the title of figures and tables.

      (9) In line 492, P should be lowercase and italic.

      Thank you very much for your advice. Based on your suggestion, we have made adjustments to the corresponding content.

      Reviewer #3:

      Summary:

      The manuscript "Mitochondrial protein FgDML1 regulates DON toxin biosynthesis and cyazofamid sensitivity in Fusarium graminearum by affecting mitochondrial homeostasis" describes the construction of a null mutant for the FgDML1 gene in F. graminearum and assays characterising the effects of this mutation on the pathogen's infection process and lifecycle. While FgDML1 remains underexplored with an unclear role in the biology of filamentous fungi, and although the authors performed several experiments, there are fundamental issues with the experimental design and execution, and interpretation of the results.

      Strengths:

      FgDML1 is an interesting target, and there are novel aspects in this manuscript. Studies in other organisms have shown that this protein plays important roles in mitochondrial DNA (mtDNA) inheritance, mitochondrial compartmentalisation, chromosome segregation, mitochondrial distribution, mitochondrial fusion, and overall mitochondrial dynamics. Indeed, in Saccharomyces cerevisiae, the mutation is lethal. The authors have carried out multi-faceted experiments to characterise the mutants.

      Weaknesses:

      However, I have concerns about how the study was conceived. Given the fundamental importance of mitochondrial function in eukaryotic cells and how the absence of this protein impacts these processes, it is unsurprising that deletion of this gene in F. graminearum profoundly affects fungal biology. Therefore, it is misleading to claim a direct link between FgDML1 and DON toxin biosynthesis (and virulence), as the observed effects are likely indirect consequences of compromised mitochondrial function. In fact, it is reasonable to assume that the production of all secondary metabolites is affected to some extent in the mutant strains and that such a strain would not be competitive at all under non-laboratory conditions. The order in which the authors present the results can be misleading, too. The results on vegetative growth rate appeared much later in the manuscript, which should have come first, as the FgDML1 mutant exhibited significant growth defects, and subsequent results should be discussed in that context. Moreover, the methodologies are not described properly, making the manuscript hard to follow and difficult to replicate.

      We greatly appreciate the time you spent on our paper and your helpful suggestions, and we have done our best to revise the manuscript accordingly. Revisions are marked in red in the manuscript, and in our responses the positions of the revised parts are indicated by red line numbers. Our point-by-point responses to the reviewer’s comments are listed below.

      Regarding the weaknesses, we originally arranged the results in this order to emphasize the novel link between mitochondria and DON toxin. We found a significant decrease in DON toxin in ΔFgDML1, defects in the formation of toxin-producing bodies, and downregulation of FgTRis at both the gene and protein levels. In summary, we believe that the absence of FgDML1 does lead to a decrease in DON toxin content, and that FgDML1 plays a regulatory role in DON toxin synthesis. In addition, our measurements of DON toxin, acetyl-CoA, ATP, and other indicators are all expressed as the amount per unit of hyphae, which excludes differences caused by hyphal biomass or growth. We have further refined the Materials and Methods to make this clearer.

      (1) Lines 37-39: The disease itself does not produce toxins; it is the fungus that causes the disease that produces toxins. Moreover, the disease symptoms observed are likely caused by the toxins produced by the fungus.

      Thank you very much for your advice. We have made modifications to the wording of the corresponding sections. 'Studies have shown that increased DON levels are positively correlated with the pathogenicity rate of F. graminearum.'(in L36-37).

      (2) Lines 82-87: While it is challenging to summarise the role of ATP in just a few words, this section needs improvement for clarity and accuracy. Additionally, I do not believe that drawing a direct link between mitochondrial defects and toxin production is an appropriate strategy in this case.

      Thank you very much for your advice. Based on your suggestion, we have added corresponding descriptions in the corresponding positions to provide more information on the relationship between ATP and toxins, in order to better prepare for the following text. 'Pathogen-intrinsic ATP homeostasis is recognized as a critical, rate-limiting determinant for toxin biosynthesis. Previous studies indicate that dual-target inhibition of ATP synthase (AtpA) and adenine deaminase (Ade) by a specific small-molecule probe effectively depletes intracellular ATP, consequently suppressing the synthesis of key virulence factors TcdA and TcdB transcriptionally and translationally(Marreddy et al., 2024). The systemic toxicity of Anthrax Edema Toxin (ET) is primarily attributed to its catalytic activity, which depletes the host cell's ATP reservoir, thereby triggering a bioenergetic collapse that culminates in cell lysis and death(Liu et al., 2025). '(in L78-86).

      (3) Lines 125-126: The manuscript does not clearly describe how subcellular localisation was determined. This methodology needs to be properly detailed.

      Thank you very much for your advice. The subcellular localization was validated through co-localization analysis with MitoTracker Red CMXRos, a mitochondrial-specific dye. The observed overlap between the FgDML1-GFP signal and the mitochondrial marker confirmed mitochondrial localization. Based on these results, we determined that FgDML1 is localized to the mitochondria. We have incorporated this description in the appropriate section of the manuscript. 'Furthermore, subcellular localization studies confirmed that FgDML1 localizes to mitochondria, as demonstrated by colocalization with a mitochondria-specific dye MitoTracker Red CMXRos (Fig. 1B).' (in L125-127).

      (4) Regarding the organisation of the Results section, it needs to be revised. While I understand the authors' intention to emphasise the impact on virulence, the results showing how FgDML1 deletion affects vegetative growth, asexual and sexual reproduction, and sensitivity to stressors should be presented before the virulence assays and effects on DON production. Additionally, the authors do not provide any clear evidence that FgDML1 directly interacts with proteins involved in asexual or sexual reproduction, stress responses, or virulence. Therefore, it is misleading to suggest that FgDML1 directly regulates these processes. The observed phenotypes are, rather, a consequence of severely impaired mitochondrial function. Without functional mitochondria, the cell cannot operate properly, leading to widespread physiological defects. In this regard, statements such as those in lines 139-140 and 343-344 are misleading.

      Thank you very much for your advice. Following your suggestion, we have reordered the figures so that the characterization of ΔFgDML1 in vegetative growth, sexual reproduction, and related traits precedes the DON toxin results. We have also revised the corresponding statements. 'These findings demonstrate that FgDML1 is a positive regulator of virulence in F. graminearum.' (in L140-141).

      (5) Lines 185-186: The authors do not provide sufficient evidence to support the claim that FgQCR2, FgQCR8, and FgQCR9 overexpression is the main cause of reduced cyazofamid sensitivity. Although expression of these genes is altered, reduced sensitivity may result from changes in other proteins or pathways. To strengthen this claim, overexpression of FgQCR2, 8, and 9 in the wild-type background, followed by assessment of cyazofamid resistance, would be necessary. As it stands, there is no support for the claim presented in lines 329-332.

      Thank you very much for your advice. To establish a causal link between the overexpression of FgQCR2, FgQCR7, and FgQCR8 and the observed reduction in cyazofamid sensitivity, we first quantified the protein levels of these assembly factors. Western blot analysis confirmed their elevated expression in the ΔFgDML1 mutant compared to the wild-type PH-1. We further generated individual overexpression strains for FgQCR2, FgQCR7, and FgQCR8 in the wild-type PH-1 background. Fungicide sensitivity assays revealed that all three overexpression mutants displayed significantly reduced sensitivity to cyazofamid compared to the parental strain. These overexpression experiments confirm that upregulation of FgQCR2, FgQCR7, and FgQCR8 is sufficient to confer reduced cyazofamid sensitivity. We have incorporated these explanations and provided supporting images in the appropriate section of the manuscript. 'To further clarify whether the upregulated expression of FgQCR2, FgQCR7, and FgQCR8 genes affects their protein expression levels, we measured the protein levels. The results showed that the protein expression levels of FgQCR2, FgQCR7, and FgQCR8 in ΔFgDML1 were higher than those in PH-1 (Fig. 6F). Subsequently, we overexpressed FgQCR2, FgQCR7, and FgQCR8 in the wild-type background, and the corresponding overexpression mutants exhibited reduced sensitivity to cyazofamid (Fig. 6E).' (in L205-211) (in Fig. 6E, F)

      (6) Lines 187-190: This segment is confusing and difficult to follow. It requires rewriting for clarity.

      Thank you very much for your advice. Based on your suggestion, we have rewritten this segment. 'Since Complex III is involved in the action of both cyazofamid (targeting the QI site) and pyraclostrobin (targeting the QO site), the sensitivity of ΔFgDML1 to cyazofamid and pyraclostrobin was investigated.' (in L214-216)

      (7) Lines 345-346: The authors state that in this study, FgDML1 is localised in mitochondria, which implies that in other studies, its localisation was different. Is this accurate? Clarification is needed.

      Thank you very much for your question. In previous studies, whether in yeast or in Drosophila melanogaster, the localization of this protein was not clearly defined; only its functional relationship to mitochondria was emphasized (Miklos et al., 1997; Gurvitz et al., 2002).

      Miklos GLG, Yamamoto M-T, Burns RG, Maleszka R. 1997. An essential cell division gene of Drosophila, absent from Saccharomyces, encodes an unusual protein with tubulin-like and myosin-like peptide motifs. Proc Natl Acad Sci 94:5189–5194. doi:10.1073/pnas.94.10.5189

      Gurvitz A, Hartig A, Ruis H, Hamilton B, de Couet HG. 2002. Preliminary characterisation of DML1, an essential Saccharomyces cerevisiae gene related to misato of Drosophila melanogaster. FEMS Yeast Res 2:123–135. doi:10.1016/S1567-1356(02)00083-1

      Material and Methods Section

      (8) In general, the methods require more detailed descriptions, including the brands and catalog numbers of reagents and kits used. Simply stating that procedures were performed according to manufacturers' instructions is insufficient, particularly when the specific brand or kit is not identified.

      Thank you very much for your advice. We have added corresponding content based on your suggestion to more comprehensively display the reagent brand and complete product name. 'Transformants were selected on PDA plates containing either 100 μg/mL Hygromycin B (Yeasen, Shanghai, China) or 0.2 μmol/mL 5-Fluorouracil 2'-deoxyriboside (F2du) (Solarbio, Beijing, China)(Zhao et al., 2022). ' (in L405-407). 'DON toxin was measured using a Wise Science ELISA-based kit (Wise Science, Jiangsu, China) (Li et al., 2019; Zheng et al., 2018) '. (in L469-471)

      (9) Line 364: What do CM and MM stand for? Please define.

      Thank you very much for your advice. Based on your suggestion, we have made modifications in the corresponding locations. 'To evaluate vegetative growth, complete medium (CM), minimal medium (MM), and V8 Juice Agar (V8) media were prepared as described previously(Tang et al., 2020). '(in L385-387)

      Generation of Deletion and Complemented Mutants:

      (10) This section lacks detail. For example, were PCR products used directly for PEG-mediated transformation, or were the fragments cloned into a plasmid?

      Thank you very much for your question. After confirmation by sequencing, we used the fused fragments directly for protoplast transformation, and we now state the form of the fragment used for transformation at the corresponding location. 'The resulting fusion fragment was transformed into the wild-type F. graminearum PH-1 strain via polyethylene glycol (PEG)-mediated protoplast transformation.' (in L403-405).

      (11) PCR and Southern blot validation results should be included as supplementary material, along with clear interpretations of these results.

      Thank you very much for your advice. Supplementary Figure 2 in the supplementary material we submitted already includes the PCR and Southern blot validation results (Fig. S2).

      (12) There is almost no description of how the mutants mentioned in lines 388-390 were generated.

      Thank you very much for your advice. Based on your suggestions, we have added relevant content in the appropriate sections to more comprehensively and clearly reflect the experimental process. 'Specifically, FgDML1, including its native promoter region and open reading frame (ORF) (excluding the stop codon), was amplified. The PCR product was then fused with the XhoI-digested pYF11 vector. After transformation into E. coli and sequence verification, the plasmid was extracted and subsequently introduced into PH-1 protoplasts. For FgDnm1-3×Flag, the 3×Flag tag was added to the C-terminus of FgDnm1 by PCR, fused with the hygromycin resistance gene and the FgDnm1 downstream arm, and then introduced into PH-1 protoplasts. The overexpression mutant was constructed according to a previously described method. Specifically, the ORF of FgDML1 was amplified and the PCR product was ligated into the SacII-digested pSXS overexpression vector. The resulting plasmid was then transformed into PH-1 protoplasts (Shi et al., 2023). For the construction of PH-1::FgTri1+GFP and ΔFgDML1::FgTri1+GFP, the ORF of FgTri1 was amplified and ligated into the XhoI-digested pYF11 vector as described above. The resulting vectors were then transformed into protoplasts of PH-1 or ΔFgDML1, respectively.' (in L413-426).

      Vegetative Growth and Conidiation Assays:

      (13) There is no information about how long the plates were incubated before photos were taken. Judging by the images, it appears that different incubation times may have been used.

      Thank you very much for your advice. Because ΔFgDML1 grows more slowly, we used different incubation periods and have added this information to the corresponding section. 'All strains were incubated at 25°C in darkness; however, due to the slower growth of ΔFgDML1, the mutant required a 5-day incubation period compared with the 3 days used for PH-1 and ΔFgDML1-C.' (in L490-493).

      (14) There is no description of the MBL medium.

      Thank you very much for your advice. Following your suggestion, we have added a description of the medium at the corresponding location. 'Mung bean liquid (MBL) medium was used for conidial production, while carrot agar (CA) medium was utilized to assess sexual reproduction (Wang et al., 2011).' (in L387-389).

      DON Production and Pathogenicity Assays:

      (15) Were DON levels normalised to mycelial biomass? The vegetative growth assays show that FgDML1 null mutants exhibit reduced growth on all tested media. If mutant and wild-type strains were incubated for the same period under the same conditions, it is reasonable to assume that the mutants accumulated significantly less biomass. Therefore, results related to DON production, as well as acetyl-CoA and ATP levels, must be normalised to biomass.

      Thank you very much for your question. We have taken into account the differences in mycelial biomass. Therefore, when measuring DON, acetyl-CoA, and ATP levels, all data were normalized to mycelial mass and calculated as amounts per unit of mycelium, thereby avoiding discrepancies arising from variations in biomass.

      Sensitivity Assays:

      (16) While the authors mention that gradient concentrations were used, the specific concentrations and ranges are not provided. Importantly, have the plates shown in Figure 5 been grown for different periods or lengths? Given the significantly reduced growth rate shown in Figure 6A, the mutants should not have grown to the same size as the WT (PH-1) as shown in Figures 5A and 5B unless the pictures have been taken on different days. This needs to be explained.

      Thank you very much for your question. Because ΔFgDML1 grows more slowly, we used different incubation periods and have added this information to the corresponding section. 'All strains were incubated at 25°C in darkness; however, due to the slower growth of ΔFgDML1, the mutant required a 5-day incubation period compared with the 3 days used for PH-1 and ΔFgDML1-C.' (in L490-493).

      (17) Additionally, was inhibition measured similarly for both stress agents and fungicides? This should be clarified.

      Thank you very much for your question. We have added the specific concentration gradients of the fungicides. 'The concentration gradients for each fungicide in the sensitivity assays were set up according to Supplementary Table S2.' (in L493-494; Table S2).

      Complex III Enzyme Activity:

      (18) A more detailed description of how this assay was performed is needed.

      Thank you very much for your advice. We have provided further detailed descriptions of the corresponding sections. 'Briefly, 0.1 g of mycelia was homogenized with 1 mL of extraction buffer in an ice bath. The homogenate was centrifuged at 600 ×g for 10 min at 4°C. The resulting supernatant was then subjected to a second centrifugation at 11,000 ×g for 10 min at 4°C. The pellet was resuspended in 200 μL of extraction buffer and disrupted by ultrasonication (200 W, 5 s pulses with 10 s intervals, 15 cycles). Complex III enzyme activity was finally measured by adding the working solution as per the manufacturer's protocol. '(in L511-517)

      (19) Were protein concentrations standardised prior to the assay?

      Thank you very much for your question. Protein concentrations for all Western blot samples were quantified using a BCA assay kit to ensure equal loading.

      (20) Line 448: Are ΔFgDML1::Tri1+GFP and ΔFgDML1+GFP the same strain? ΔFgDML1::Tri1+GFP has not been previously described.

      Thank you very much for your question. These two strains are not the same strain, and we have supplemented their construction process in the corresponding section. 'For the construction of PH-1::FgTri1+GFP and ΔFgDML1::FgTri1+GFP, the ORF of FgTri1 was amplified and ligated into the XhoI-digested pYF11 vector as described above. The resulting vectors were then transformed into protoplasts of PH-1 or ΔFgDML1, respectively. '(in L423-426)

      (21) Lines 460 and 468: Please adopt a consistent nomenclature, either RT-qPCR or qRT-PCR.

      Thank you very much for your advice. We have standardized the term to RT-qPCR and revised the corresponding sections. 'Reverse Transcription Quantitative Polymerase Chain Reaction (RT-qPCR) was carried out using the QuantStudio 6 Flex real-time PCR system (Thermo Fisher Scientific, USA) to assess the relative expression of three subunits of Complex III (FgCytb, FgCytc1, FgISP), five assembly factors (FgQCR2, FgQCR6, FgQCR7, FgQCR8, FgQCR9), and DON biosynthesis-related genes (FgTri5 and FgTri6).' (in L526-531)

      (22) Lines 472-473: Why was FgCox1 used as a reference for FgCytb? Clarification is needed.

      Thank you very much for your question. FgCytb (cytochrome b) and FgCOX1 (cytochrome c oxidase subunit I) are both encoded by the mitochondrial genome and serve as core components of the oxidative phosphorylation system (Complex III and Complex IV, respectively). Their transcription is co-regulated by mitochondrial-specific mechanisms in response to cellular energy status. Consequently, under experimental conditions that perturb energy homeostasis, FgCOX1 expression remains relatively stable with respect to FgCytb, or at least co-varies with it in the same direction, making it a more suitable reference for normalizing FgCytb expression. In contrast, FgGapdh operates within a distinct genetic and regulatory system. Using FgCOX1 ensures that both reference and target genes reside within the same mitochondrial compartment and functional module, thereby preventing normalization artifacts arising from independent variation across disparate pathways.

      (23) Lines 476-477: This step requires a clearer and more detailed explanation.

      Thank you very much for your advice. We provided detailed descriptions of them in their respective positions. 'For FgDnm1-3×Flag, the 3×Flag tag was added to the C-terminus of FgDnm1 by PCR, fused with the hygromycin resistance gene and the FgDnm1 downstream arm, and then introduced into PH-1 protoplasts. '(in L417-419). 'The FgDnm1-3×Flag fragment was introduced into PH-1 and FgDML1+GFP protoplasts, respectively, to obtain single-tagged and double-tagged strains. '(in L541-543)

      Western blotting:

      (24) Uncropped Western blot images should be provided as supplementary material.

      Thank you very much for your advice. All Western blot images will be submitted to the supplementary material package.

      (25) Lines 485-489: A more thorough description of the antibodies used (including source, catalogue number, and dilution) is necessary.

      Thank you very much for your advice. The antibodies used are clearly stated in terms of brand, catalog number, and dilution. We have added the dilution ratio. 'All antibodies were diluted as follows: primary antibodies at 1:1000 and secondary antibodies at 1:10000. '(in L550-551)

      (26) The Western blot shown in Figure 3D appears problematic, particularly the anti-GAPDH band for FgDML1::FgTri1+GFP. Are both anti-GAPDH bands derived from the same gel?

      Thank you very much for your advice. We are unequivocally certain that these data derive from the same gel. Therefore, we are providing the original image for your inspection.

      Author response image 4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) I have to admit that it took a few hours of intense work to understand this paper and to even figure out where the authors were coming from. The problem setting, nomenclature, and simulation methods presented in this paper do not conform to the notation common in the field, are often contradictory, and are usually hard to understand. Most importantly, the problem that the paper is trying to solve seems to me to be quite specific to the particular memory study in question, and is very different from the normal setting of model-comparative RSA that I (and I think other readers) may be more familiar with.

      We have revised the paper for clarity at all levels: motivation, application, and parameterization. We clarify that there is a large unmet need for using RSA in a trial-wise manner, and that this approach offers benefits to any team interested in decoding trial-wise representational information linked to behavioral responses; as such, it is not a problem specific to a single memory study.

      (2) The definition of "classical RSA" that the authors are using is very narrow. The group around Niko Kriegeskorte has developed RSA over the last 10 years, addressing many of the perceived limitations of the technique. For example, cross-validated distance measures (Walther et al. 2016; Nili et al. 2014; Diedrichsen et al. 2021) effectively deal with an uneven number of trials per condition and unequal amounts of measurement noise across trials. Different RDM comparators (Diedrichsen et al. 2021) and statistical methods for generalization across stimuli (Schütt et al. 2023) have been developed, addressing shortcomings in sensitivity. Finally, both a Bayesian variant of RSA (Pattern component modelling, (Diedrichsen, Yokoi, and Arbuckle 2018) and an encoding model (Naselaris et al. 2011) can effectively deal with continuous variables or features across time points or trials in a framework that is very related to RSA (Diedrichsen and Kriegeskorte 2017). The author may not consider these newer developments to be classical, but they are in common use and certainly provide the solution to the problems raised in this paper in the setting of model-comparative RSA in which there is more than one repetition per stimulus.

      We appreciate the summary of relevant literature and have revised the Introduction to address this bounty of relevant work. While much is owed to these authors, new developments from a diverse array of researchers outside of a single group can aid in new research questions, and should always have a place in our research landscape. We owe much to the work of Kriegeskorte’s group, and in fact, Schütt et al., 2023 served as a very relevant touchpoint in the Discussion and helped to highlight specific needs not addressed by the assessment of the “representational geometry” of an entire presented stimulus set. Principal amongst these needs is the application of trial-wise representational information that can be related to trial-wise behavioral responses and thus used to address specific questions on brain-behavior relationships. We invite the Reviewer to consider the utility of this shift with the following revisions to the Introduction.

      Page 3. “Recently, methodological advancements have addressed many known limitations in cRSA. For example, cross-validated distance measures (e.g., Euclidean distance) have improved the reliability of representational dissimilarities in the presence of noise and trial imbalance (Walther et al., 2016; Nili et al., 2014; Diedrichsen et al., 2021). Bayesian approaches such as pattern component modeling (Diedrichsen, Yokoi, & Arbuckle, 2018) have extended representational approaches to accommodate continuous stimulus features or temporal variation. Further, model comparison RSA strategies (Diedrichsen et al., 2021) and generalization techniques across stimuli (Schütt et al., 2023) have improved sensitivity and inference. Nevertheless, a common feature shared across most of these improvements is that they require stimulus repetition to examine the representational structure. This requirement limits their ability to probe brain-behavior questions at the level of individual events”.

      Page 8. “While several extensions of RSA have addressed key limitations in noise sensitivity, stimulus variance, and modeling (e.g., Diedrichsen et al., 2021; Schütt et al., 2023), our tRSA approach introduces a new methodological step by estimating representational strength at the trial level. This accounts for the multi-level variance structure in the data, affords generalizability beyond the fixed stimulus set, and allows one to test stimulus- or trial-level modulations of neural representations in a straightforward way”.

      Page 44. “Despite such prevalent appreciation for the neurocognitive relevance of stimulus properties, cRSA often does not account for the fact that the same stimulus (e.g., “basketball”) is seen by multiple subjects and produces statistically dependent data, an issue addressed by Schütt et al., 2023, who developed cross validation and bootstrap methods that explicitly model dependence across both subjects and stimulus conditions”.

      (3) The stated problem of the paper is to estimate "representational strength" in different regions or conditions. By this, the authors mean the correlation of the brain RDM with a model RDM. This metric conflates a number of factors, namely the variances of the stimulus-specific patterns, the variance of the noise, the true differences between different dissimilarities, and the match between the assumed model and the data-generating model. It took me a long time to figure out that the authors are trying to solve a quite different problem in a quite different setting from the model-comparative approach to RSA that I would consider "classical" (Diedrichsen et al. 2021; Diedrichsen and Kriegeskorte 2017). In this approach, one is trying to test whether local activity patterns are better explained by representation model A or model B, and to estimate the degree to which the representation can be fully explained. In this framework, it is common practice to measure each stimulus at least 2 times, to be able to estimate the variance of noise patterns and the variance of signal patterns directly. Using this setting, I would define "representational strength" very differently from the authors. Assume (using LaTeX notation) that the activity patterns $y_{j,n}$ for stimulus j, measurement n, are composed of a true stimulus-related pattern ($u_j$) and a trial-specific noise pattern ($e_{j,n}$). As a measure of the strength of representation (or pattern), I would use an unbiased estimate of the variance of the true stimulus-specific patterns across voxels and stimuli ($\sigma^2_{u}$). This estimator can be obtained by correlating patterns of the same stimuli across repeated measures, or equivalently, by averaging the cross-validated Euclidean distances (or with spatial prewhitening, Mahalanobis distances) across all stimulus pairs.

      In contrast, the current paper addresses a specific problem in a quite specific experimental design in which there is only one repetition per stimulus. This means that the authors have no direct way of distinguishing true stimulus patterns from noise processes. The trick that the authors apply here is to assume that the brain data comes from the assumed model RDM (a somewhat sketchy assumption IMO) and that everything that reduces this correlation must be measurement noise. I can now see why tRSA does make some sense for this particular question in this memory study. However, in the more common model-comparative RSA setting, having only one repetition per stimulus in the experiment would be quite a fatal design flaw. Thus, the paper would do better if the authors could spell out the specific problem addressed by their method right in the beginning, rather than trying to set up tRSA as a general alternative to "classical RSA".
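
      The estimator described in this comment can be written out as a short derivation. This is only a sketch under added assumptions that are not stated in the review: signal and noise are zero-mean and mutually independent, noise is independent across the two measurements, and the symbols $P$ (number of voxels), $J$ (number of stimuli), and $\sigma^2_{e}$ (noise variance) are introduced here purely for illustration.

```latex
\begin{align}
  y_{j,n} &= u_j + e_{j,n}, \qquad n \in \{1, 2\}, \\
  \mathbb{E}\!\left[ y_{j,1}^{\top} y_{j,2} \right]
    &= \mathbb{E}\!\left[ u_j^{\top} u_j \right]
     + \mathbb{E}\!\left[ u_j^{\top} e_{j,2} \right]
     + \mathbb{E}\!\left[ e_{j,1}^{\top} u_j \right]
     + \mathbb{E}\!\left[ e_{j,1}^{\top} e_{j,2} \right]
     = P \sigma^2_{u}, \\
  \hat{\sigma}^2_{u} &= \frac{1}{J P} \sum_{j=1}^{J} y_{j,1}^{\top} y_{j,2}, \\
  \mathbb{E}\!\left[ y_{j,1}^{\top} y_{j,1} \right]
    &= P \left( \sigma^2_{u} + \sigma^2_{e} \right).
\end{align}
```

      The last line shows why repetition matters for this definition: with a single measurement per stimulus, only the sum $\sigma^2_{u} + \sigma^2_{e}$ is observable, so signal and noise variance cannot be separated directly.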

      At a general level, our approach rests on the premise that there is meaningful information present in a single presentation of a given stimulus. This assumption may have less utility when the research goals are more focused on estimating the fidelity of signal patterns for RSA, as in designs with multiple repetitions. But it is an exaggeration to state that such a trial-wise approach cannot address the difference between “true” stimulus patterns and noise. This trial-wise approach has explicit utility in relating trial-wise brain information to trial-wise behavior, across multiple cognitive domains (not only memory studies, as applied here). We have added substantial text to the Introduction distinguishing cRSA, which is widely employed, often in cases with a single repetition per stimulus, and model comparative methods that employ multiple repetitions. We clarify that we do not consider tRSA an alternative to the model comparative approach, and discuss that operational definitions of representational strength are constrained by the study design.

      Page 3. “In this paper, we present an advancement termed trial-level RSA, or tRSA, which addresses these limitations in cRSA (not model comparison approaches) and may be utilized in paradigms with or without repeated stimuli”.

      Page 4. “Representational geometry usually refers to the structure of similarities among repeated presentations of the same stimulus in the neural data (as captured in the brain RSM) and is often estimated utilizing a model comparison approach, whereas representational strength is a derived measure that quantifies how strongly this geometry aligns with a hypothesized model RSM. In other words, geometry characterizes the pattern space itself, while representational strength reflects the degree of correspondence between that space and the theoretical model under test”.

      Finally, we clarified that in our simulation methods we assume a true underlying activity pattern and a random error pattern. The model RSM is computed based on the true pattern, whereas the brain RSM comes from the noisy pattern, not the model RSM itself.

      Page 9. “Then, we generated two sets of noise patterns, which were controlled by parameters $\sigma_A$ and $\sigma_B$, respectively, one for each condition”.
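
      The simulation logic described in this response (a true pattern, condition-specific noise, a model RSM from the true pattern, and a brain RSM from the noisy pattern) can be sketched in a few lines. This is a minimal illustration and not the authors' R code: the function names, the default parameter values, the use of Pearson correlation for the RSM cells, and the assignment of the first half of the stimuli to condition A are all assumptions made here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rsm(patterns):
    """Stimulus-by-stimulus similarity matrix of pattern correlations."""
    return [[pearson(p, q) for q in patterns] for p in patterns]


def simulate(n_per_condition=10, n_voxels=50, sigma_a=0.5, sigma_b=2.0, seed=0):
    """Return (model_rsm, brain_rsm) for one hypothetical simulated subject."""
    rng = random.Random(seed)
    n_stim = 2 * n_per_condition
    # True patterns: i.i.d. standard normal voxel values for every stimulus.
    true = [[rng.gauss(0, 1) for _ in range(n_voxels)] for _ in range(n_stim)]
    # Assumed layout: first half of the stimuli is condition A, second half B.
    sigmas = [sigma_a] * n_per_condition + [sigma_b] * n_per_condition
    # Measured patterns: true pattern plus condition-specific Gaussian noise.
    noisy = [[v + rng.gauss(0, s) for v in row] for row, s in zip(true, sigmas)]
    # Model RSM comes from the true patterns, brain RSM from the noisy ones.
    return rsm(true), rsm(noisy)


model_rsm, brain_rsm = simulate()
```

      Because the brain RSM is derived from the noisy patterns rather than from the model RSM, any correspondence between the two matrices reflects how much of the true pattern survives the added noise.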

      (4) The notation in the paper is often conflicting and should be clarified. The actual true and measured activity patterns should receive a unique notation that is distinct from the variances of these patterns across voxels. I assume that $\sigma_{ijk}$ denotes the noise variances (not standard deviations)? Normally, variances are denoted with $\sigma^2$. Also, if these are variances, they cannot come from a normal distribution as indicated on page 10. Finally, multi-level models are usually defined at the level of means (i.e., patterns) rather than at the level of variances (as seems to be done here).

      We have added notation for the true and measured activity patterns to differentiate them from our notation for variance. We agree that multilevel models are usually defined at the level of means rather than at the level of variances, and we include a figure (Fig 1D) that describes the model in terms of the means. We clarify that the σ ($\sigma$) terms used in the manuscript were not variances or standard deviations themselves; rather, they denoted components of the actual (multilevel) variance parameter. Each component was sampled from a normal distribution, and the components summed to give the final variance parameter for each trial. We have changed our notation for each component to the lowercase letter s to minimize confusion. We have also made our R code publicly available on our lab GitHub, which should provide more clarity on the exact simulation process.

      (5) In the first set of simulations, the authors sampled both model and brain RSM by drawing each cell (similarity) of the matrix from an independent bivariate normal distribution. As the authors note themselves, this way of producing RSMs violates the constraint that correlation matrices need to be positive semi-definite. Likely more seriously, it also ignores the fact that the different elements of the upper triangular part of a correlation matrix are not independent from each other (Diedrichsen et al. 2021). Therefore, it is not clear that this simulation is close enough to reality to provide any valuable insight and should be removed from the paper, along with the extensive discussion about why this simulation setting is plainly wrong (page 21). This would shorten and clarify the paper.

      We have added a justification of the mixed-effects model given the potential assumption violations. We caution readers to investigate the robustness of their models, and to employ permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the Appendix. Finally, we agree that the first simulation setting lacks several properties of realistic RDMs/RSMs; however, we believe that there is utility in understanding the mathematical properties of correlations, an essential component of RSA, in a straightforward simulation where the ground truth is known. We have therefore moved this simulation to Appendix 1.

      (6) If I understand the second simulation setting correctly, the true pattern for each stimulus was generated as an NxP matrix of i.i.d. standard normal variables. Thus, there is no condition-specific pattern at all, only condition-specific noise/signal variances. It is not clear how the tRSA would be biased if there were a condition-specific pattern (which, in reality, there usually is). Because of the i.i.d. assumption of the true signal, the correlations between all stimulus pairs within conditions are close to zero (and only differ from it by the fact that you are using a finite number of voxels). If you added a condition-specific pattern, the across-condition RSA would lead to much higher "representational strength" estimates than a within-condition RSA, with obvious problems and biases.

      The Reviewer is correct that the voxel values in the true pattern are drawn from i.i.d. standard normal distributions. We take the Reviewer’s suggestion of a “condition-specific pattern” to mean that there could be a condition-voxel interaction in two non-mutually exclusive ways. The first is additive: a common underlying multi-voxel pattern such as [6, 34, -52, …, 8] for all condition A trials, a different such pattern for all condition B trials, and so on. The second is multiplicative: a vector of scaling factors such as [×1.5, ×0.5, ×0.8, …, ×2.7] for all condition A trials, a different such vector for all condition B trials, and so on. Both possibilities could indeed affect tRSA as much as they would cRSA.

      Importantly, if such a strong condition-specific pattern is expected, one can build a condition-specific model RDM using one-hot coding of conditions (see example figure; src: https://www.newbi4fmri.com/tutorial-9-mvpa-rsa), either to capture this interesting phenomenon or to remove it as a confounding factor. This practice has been applied in multiple regression cRSA approaches (e.g., Cichy et al., 2013) and can also be applied to tRSA.
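
      A condition-coded model RDM of the kind mentioned here can be sketched as follows. This is an illustrative construction only: each trial's condition is encoded as an indicator (one-hot) vector, and the function name and the 0/1 dissimilarity convention are assumptions made for this example rather than details taken from the manuscript.

```python
def condition_model_rdm(labels):
    """Binary model RDM: 0 if two trials share a condition, 1 otherwise."""
    conditions = sorted(set(labels))
    # Indicator (one-hot) vector for each trial's condition.
    one_hot = [[1 if lab == c else 0 for c in conditions] for lab in labels]
    # The dot product of two indicator vectors is 1 for matching conditions
    # and 0 otherwise, so 1 - dot product gives the dissimilarity.
    return [
        [1 - sum(a * b for a, b in zip(u, v)) for v in one_hot]
        for u in one_hot
    ]


labels = ["A", "A", "B", "B"]
rdm = condition_model_rdm(labels)
# rdm == [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
```

      Such a matrix could then be entered alongside the model RDM of interest, either as a predictor in its own right or as a nuisance term.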

      (7) The trial-level Spearman correlations between the brain RDM and the model were analyzed using a mixed-effects model. However, given the symmetry of the RDM, the correlations coming from different rows of the matrix are not independent, although independence is an assumption of the mixed-effects model. This does not seem to induce an increase in Type I errors in the conditions studied, but the procedure still needs a clear justification.

      We appreciate this important warning, and now caution readers to investigate the robustness of their models, and consider employing permutation testing that does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the supplement.

      Page 46. “While linear mixed-effects modeling offers a powerful framework for analyzing representational similarity data, it is critical that researchers carefully construct and validate their models. The multilevel structure of RSA data introduces potential dependencies across subjects, stimuli, and trials, which can violate assumptions of independence if not properly modeled. In the present study, we used a model that included random intercepts for both subjects and stimuli, which accounts for variance at these levels and improves the generalizability of fixed-effect estimates. Still, there is a potential for systematic dependence across trials within a subject. To ensure that the model assumptions were satisfied, we conducted a series of diagnostic checks on an exemplar ROI (right LOC; middle occipital gyrus) in the Object Perception dataset, including visual inspection of residual distributions and autocorrelation (Appendix 3, Figure 13). These diagnostics supported the assumptions of normality, homoscedasticity, and conditional independence of residuals. In addition, we conducted permutation-based inference, similar to prior improvements to cRSA (Nili et al. 2014), using a nested model comparison to test whether the mean similarity in this ROI was significantly greater than zero. The observed likelihood ratio test statistic fell in the extreme tail of the null distribution (Appendix 3, Figure 14), providing strong nonparametric evidence for the reliability of the observed effect. We emphasize that this type of model checking and permutation testing is not merely confirmatory but can help validate key assumptions in RSA modeling, especially when applying mixed-effects models to neural similarity data. Researchers are encouraged to adopt similar procedures to ensure the robustness and interpretability of their findings”.

      Exemplar Permutation Testing

      To test whether the mean representational strength in the ROI right LOC (middle occipital gyrus) was significantly greater than zero, we used a permutation-based likelihood ratio test implemented via the permlmer function. This test compares two nested linear mixed-effects models fit using the lmer function from the lme4 package, both including random intercepts for Participant and Stimulus ID to account for between-subject and between-item variability.

      The null model excluded a fixed intercept term, effectively constraining the mean similarity to zero after accounting for random effects:

      ROI ~ 0 + (1 | Participant) + (1 | Stimulus)

      The full model included the same random effects structure but allowed the intercept to be freely estimated:

      ROI ~ 1 + (1 | Participant) + (1 | Stimulus)

      By comparing the fit of these two models, we directly tested whether the average similarity in this ROI was significantly different from zero. Permutation testing (1,000 permutations) was used to generate a nonparametric p-value, providing inference without relying on normality assumptions. The full model, which estimated a nonzero mean similarity in the right LOC (middle occipital gyrus), showed a significantly better fit to the data than the null model that fixed the mean at zero (χ²(1) = 17.60, p = 2.72 × 10⁻⁵). The permutation-based p-value obtained from permlmer confirmed this effect as statistically significant (p = 0.0099), indicating that the mean similarity in this ROI was reliably greater than zero. These results support the conclusion that the right LOC contains representational structure consistent with the HMAXc2 RSM. A density plot of the permuted likelihood ratio tests is plotted along with the observed likelihood ratio test in Appendix 3 Figure 14.
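
      The general idea of permutation-based inference on a mean can be illustrated with a much simpler stand-in. The sketch below is not the nested mixed-model likelihood ratio test described above: it is a one-sample sign-flip permutation test that ignores the Participant and Stimulus random effects, treats the trial values as exchangeable, and assumes symmetry around zero under the null hypothesis. The function name and defaults are hypothetical.

```python
import random


def sign_flip_p_value(values, n_perm=1000, seed=0):
    """One-sided p-value for H0: mean == 0, using random sign flips."""
    rng = random.Random(seed)
    observed = sum(values) / len(values)
    exceed = 0
    for _ in range(n_perm):
        # Under H0 each value is equally likely to be positive or negative.
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if sum(flipped) / len(flipped) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical trial-level similarity values, for illustration only.
trial_values = [0.10 + 0.01 * i for i in range(20)]
p_value = sign_flip_p_value(trial_values)
```

      The mixed-model version replaces the sample mean with a likelihood ratio statistic, which is what allows it to respect the subject and stimulus structure of the data.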

      (8) For the empirical data, it is not clear to me to what degree the "representational strength" of cRSA and tRSA is actually comparable. In cRSA, the Spearman correlation assesses whether the distances in the data RSM are ranked in the same order as in the model. For tRSA, the comparison is made for every row of the RSM, which introduces a larger degree of flexibility (possibly explaining the higher correlations in the first simulation). Thus, could the gains presented in Figure 7D not simply arise from the fact that you are testing different questions? A clearer theoretical analysis of the difference between the average row-wise Spearman correlation and the matrix-wise Spearman correlation is urgently needed. The behavior will likely vary with the structure of the true model RDM/RSM.

      We agree that a comparison between the mean row-wise Spearman correlation and the matrix-wise Spearman correlation is needed. We believe that simulations are the best approach for this comparison, since they are more robust than the empirical dataset and have the advantage that the true pattern and noise levels are known. We expand on our comparison of mean tRSA values and matrix-wise Spearman correlations on page 42.

      Page 42. “Although tRSA and cRSA both aim to quantify representational strength, they differ in how they operationalize this concept. cRSA summarizes the correspondence between RSMs as a single measure, such as the matrix-wise Spearman correlation. In contrast, tRSA computes such correspondence for each trial, enabling estimates at the level of individual observations. This flexibility allows trial-level variability to be modeled directly, but also introduces subtle differences in what is being measured. Nonetheless, our simulations showed that, although numerical differences occasionally emerged—particularly when comparing between-condition tRSA estimates to within-condition cRSA estimates—the magnitude of divergence was small and did not affect the outcome of downstream statistical tests”.
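
      The difference between the two summaries can be made concrete with a toy example. The sketch below assumes that the matrix-wise measure is a single Spearman correlation over the upper triangle and that the row-wise measure is one Spearman correlation per row with the diagonal excluded; the helper names and the two 4 × 4 matrices are invented for illustration and do not come from the manuscript.

```python
import math


def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(
        sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    )


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def matrix_wise(brain, model):
    """One Spearman correlation over the upper triangle (cRSA-style)."""
    n = len(brain)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return spearman([brain[i][j] for i, j in pairs],
                    [model[i][j] for i, j in pairs])


def row_wise(brain, model):
    """One Spearman correlation per row, diagonal excluded (tRSA-style)."""
    n = len(brain)
    return [
        spearman([brain[i][j] for j in range(n) if j != i],
                 [model[i][j] for j in range(n) if j != i])
        for i in range(n)
    ]


model = [[1.0, 0.8, 0.3, 0.1],
         [0.8, 1.0, 0.4, 0.2],
         [0.3, 0.4, 1.0, 0.7],
         [0.1, 0.2, 0.7, 1.0]]
# Same ordering within every row, but a different ordering across rows.
brain = [[1.0, 0.8, 0.3, 0.1],
         [0.8, 1.0, 0.5, 0.4],
         [0.3, 0.5, 1.0, 0.9],
         [0.1, 0.4, 0.9, 1.0]]

row_values = row_wise(brain, model)      # every row correlates at 1.0
matrix_value = matrix_wise(brain, model)  # about 0.886
```

      In this constructed case every row agrees perfectly with the model while the matrix-wise correlation is lower, because the row-wise measure only constrains the ordering within each row. How large such gaps are in practice depends on the structure of the model RSM, which is why the comparison is best examined by simulation.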

      (9) For the real data, there are a number of additional sources of bias that need to be considered for the analysis. What if there are not only condition-specific differences in noise variance, but also a condition-specific pattern? Given that the stimuli were measured in 3 different imaging runs, you cannot assume that all measurement noise is i.i.d. - stimuli from the same run will likely have a higher correlation with each other.

      We recognize the potential for condition-specific patterns and chose to constrain the analyses to those most comparable with cRSA. However, depending on their hypotheses, researchers may consider testing condition RSMs and utilizing a model comparison approach, or employing the z-scored approach used in the simulations above. Regarding the potential run confounds, this is always the case in RSA and is why we exclude within-run comparisons. We have also added to the Discussion the suggestion to include run as a covariate in mixed-effects models. However, we do not employ this covariate here, as we preferred the most parsimonious model to compare with cRSA.

      Page 46 - 47. “Further, while analyses here were largely employed to be comparable with cRSA, researchers should consider taking advantage of the flexibility of the mixed-effects models and include covariates of non-interest (run, trial order etc.)”.

      (10) The discussion should be rewritten in light of the fact that the setting considered here is very different from the model-comparative RSA in which one usually has multiple measurements per stimulus per subject. In this setting, existing approaches such as RSA or PCM do indeed allow for the full modelling of differences in the "representational strength" - i.e., pattern variance across subjects, conditions, and stimuli.

      We agree that studies advancing designs with multiple repetitions of a given stimulus image are useful in estimating the reliability of concept representations. We would argue however that model comparison in RSA is not restricted to such data. Many extant studies do not in fact have multiple repetitions per stimulus per subject (Wang et al., 2018 https://doi.org/10.1088/1741-2552/abecc3, Gao et al, 2022 https://doi.org/10.1093/cercor/bhac058, Li et al, 2022 https://doi.org/10.1002/hbm.26195, Staples & Graves, 2020 https://doi.org/10.1162/nol_a_00018) that allow for that type of model-comparative approach. While beneficial in terms of noise estimation, having multiple presentations was not a requirement for implementing cRSA (Kriegeskorte, 2008 https://doi.org/10.3389/neuro.06.004.2008). The aim of this manuscript is to introduce the tRSA approach to the broad community of researchers whose research questions and datasets could vary vastly, including but not limited to the number of repeated presentations and the balance of trial counts across conditions.

      (11) Cross-validated distances provide a powerful tool to control for differences in measurement noise variances and possible covariances in measurement noise across trials, which has many distinct advantages and is conceptually very different from the approach taken here.

      We have added language on the value of cross-validation approaches to RSA in the Discussion:

      Page 47. “Additionally, we note that while our proposed tRSA framework provides a flexible and statistically principled approach for modeling trial-level representational strength, we acknowledge that there are alternative methods for addressing trial-level variability in RSA. In particular, the use of cross-validated distance metrics (e.g., crossnobis distance) has become increasingly popular for controlling differences in measurement noise variance and accounting for possible covariance structures across trials (Walther et al., 2016). These metrics offer several advantages, including unbiased estimation of representational dissimilarities under Gaussian noise assumptions and improved generalization to unseen data. However, cross-validated distances are conceptually distinct from the approach taken here: whereas cross-validation aims to correct for noise-related biases in representational dissimilarity matrices, our trial-level RSA method focuses on estimating and modeling the variability in representation strength across individual trials using mixed-effects modeling. Rather than proposing a replacement for cross-validated RSA, tRSA adds a complementary tool to the methodological toolkit—one that supports hypothesis-driven inference about condition effects and trial-level covariates, while leveraging the full structure of the data”.
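      As an illustration of why cross-validation removes the noise-related bias mentioned above, here is a minimal sketch under stated assumptions: it uses a cross-validated squared Euclidean distance without noise whitening (so it is a simplified relative of the crossnobis distance, not the estimator of Walther et al., 2016), it requires each condition to be measured in several runs, and the function name and simulation sizes are hypothetical.

```python
import numpy as np
from itertools import combinations

def crossvalidated_sq_distance(patterns_a, patterns_b):
    """Cross-validated squared Euclidean distance between two conditions.

    patterns_a, patterns_b: arrays of shape (n_runs, n_voxels), one pattern
    estimate per run. Difference vectors from *different* runs are multiplied,
    so independent noise does not inflate the estimate.
    """
    diffs = patterns_a - patterns_b
    n_runs, n_voxels = diffs.shape
    products = [diffs[i] @ diffs[j] for i, j in combinations(range(n_runs), 2)]
    return float(np.mean(products)) / n_voxels

# Null simulation: both conditions share the same (zero) true pattern.
rng = np.random.default_rng(2)
n_runs, n_voxels, n_sims = 4, 100, 200
cv_estimates, plain_estimates = [], []
for _ in range(n_sims):
    a = rng.standard_normal((n_runs, n_voxels))
    b = rng.standard_normal((n_runs, n_voxels))
    cv_estimates.append(crossvalidated_sq_distance(a, b))
    mean_diff = (a - b).mean(axis=0)  # plain distance on run-averaged patterns
    plain_estimates.append(mean_diff @ mean_diff / n_voxels)

mean_cv = float(np.mean(cv_estimates))        # centered on zero
mean_plain = float(np.mean(plain_estimates))  # positively biased by noise
```

      In this null case the plain distance is always positive, whereas the cross-validated estimate averages out near zero, at the cost of higher variance for any single estimate.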

      (12) One of the main limitations of tRSA is the assumption that the model RDM is actually the true brain RDM, which may not be the case. Thus, in theory, there could be a different model RDM, in which representational strength measures would be very different. These differences should be explained more fully, hopefully leading to a more accessible paper.

      Indeed, the chosen model RSM may not be the true RSM, but as the noise level increases the correlation between RSMs practically becomes zero. In our simulations we assume this to be true as a straightforward way to manipulate the correspondence between the brain data and the model. However, just like cRSA, tRSA is constrained by the model selections the researchers employ. We encourage researchers to have carefully considered theoretically-motivated models and, if their research questions require, consider multiple and potentially competing models. Furthermore, the trial-wise estimates produced by tRSA encourage testing competing models within the multiple regression framework. We have added this language to the Discussion.

      Page 46. “…choose their model RSMs carefully. In our simulations, we designed our model RSM to be the “true” RSM for demonstration purposes. However, researchers should consider if their models and model alternatives”.

      Pages 45-46. “While a number of studies have addressed the validity of measuring representational geometry using designs with multiple repetitions, a conceptual benefit of the tRSA approach is the reliance on a regression framework that engenders the testing of competing conceptual models of stimulus representation (e.g., taxonomic vs. encyclopedic semantic features, as in Davis et al., 2021)”.

      Reviewer #2 (Public review):

      (1) While I generally welcome the contribution, I take some issue with the accusatory tone of the manuscript in the Introduction. The text there (using words such as 'ignored variances', 'erroneous inferences', 'one must', 'not well-suited', 'misleading') appears aimed at turning cRSA into a 'straw man' with many limitations that other researchers have not recognized but that the new proposed method supposedly resolves. This can be written in a more nuanced, constructive manner without accusing the numerous users of this popular method of ignorance.

      We apologize for the unintended accusatory tone. We have clarified the many robust approaches to RSA and have made our Introduction and Discussion more nuanced throughout (see also 3, 11 and 16).

      (2) The described limitations are also not entirely correct, in my view: for example, statistical inference in cRSA is not always done using classic parametric statistics such as t-tests (cf Figure 1): the rsatoolbox paper by Nili et al. (2014) outlines non-parametric alternatives based on permutation tests, bootstrapping and sign tests, which are commonly used in the field. Nor has RSA ever been conducted at the row/column level (here referred to by the authors as 'trial level'; cf King et al., 2018).

      We agree that there are numerous methods that go beyond cRSA to address these limitations, and we have added discussion of them to our manuscript, as well as an example analysis implementing permutation tests on tRSA data (see response to 7). We thank the reviewer for bringing King et al., 2014 and their temporal generalization method to our attention; we have added a reference acknowledging their decoding-based temporal generalization approach.

      Page 8. “It is also important to note that some prior work has examined similarly fine-grained representations in time-resolved neuroimaging data, such as the temporal generalization method introduced by King et al. (see King & Dehaene, 2014). Their approach trains classifiers at each time point and tests them across all others, resulting in a temporal generalization matrix that reflects decoding accuracy over time. While such matrices share some structural similarity with RSMs, they do not involve correlating trial-level pattern vectors with model RSMs nor do their second-level models include trial-wise, subject-wise, and item-wise variability simultaneously”.
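      For readers unfamiliar with temporal generalization, the train-at-one-time, test-at-all-times logic described above can be sketched as follows. This is only an illustration under stated assumptions: it swaps in a simple nearest-centroid classifier rather than the classifiers used by King & Dehaene (2014), the data are simulated, and all names and sizes are hypothetical.

```python
import numpy as np

def temporal_generalization(train_x, train_y, test_x, test_y):
    """Return an (n_times, n_times) accuracy matrix.

    x arrays have shape (n_trials, n_channels, n_times); y holds labels 0/1.
    Rows index the training time point, columns the testing time point.
    """
    n_times = train_x.shape[2]
    accuracy = np.empty((n_times, n_times))
    for t_train in range(n_times):
        train_slice = train_x[:, :, t_train]
        centroid0 = train_slice[train_y == 0].mean(axis=0)
        centroid1 = train_slice[train_y == 1].mean(axis=0)
        for t_test in range(n_times):
            samples = test_x[:, :, t_test]
            d0 = np.linalg.norm(samples - centroid0, axis=1)
            d1 = np.linalg.norm(samples - centroid1, axis=1)
            predicted = (d1 < d0).astype(int)  # assign to the nearer centroid
            accuracy[t_train, t_test] = np.mean(predicted == test_y)
    return accuracy

rng = np.random.default_rng(3)
n_channels, n_times = 20, 5

def simulate(n_trials):
    """Simulated trials with a class-specific signal at time points 2 and 3 only."""
    y = np.arange(n_trials) % 2
    x = rng.standard_normal((n_trials, n_channels, n_times))
    x[y == 1, :, 2:4] += 1.0
    return x, y

train_x, train_y = simulate(100)
test_x, test_y = simulate(100)
accuracy = temporal_generalization(train_x, train_y, test_x, test_y)
```

      Each cell of `accuracy` is a decoding accuracy, not a correlation between a trial-level pattern and a model RSM, which is the structural difference noted in the quoted passage.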

      (3) One of the advantages of cRSA is its simplicity. Adding linear mixed effects modeling to RSA introduces a host of additional 'analysis parameters' pertaining to the choice of the model setup (random effects, fixed effects, interactions, what error terms to use) - how should future users of tRSA navigate this?

      We appreciate the opportunity to offer more specific prescriptions for those employing a tRSA technique, and have added them to the Discussion:

      Page 46. “While linear mixed-effects modeling offers a powerful framework for analyzing representational similarity data, it is critical that researchers carefully construct and validate their models and choose their model RSMs carefully. In our simulations, we designed our model RSM to be the “true” RSM for demonstration purposes. However, researchers should consider if their models and model alternatives. However, researchers should always consider if their models match the goals of their analysis, including 1) constructing the random effects structure that will converge in their dataset and 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020) and 3) considering which effects should be considered random or fixed depending on their research question”.

      (4) Here, only a single real fMRI dataset is used with a quite complicated experimental design for the memory part; it's not clear if there is any benefit of using tRSA on a simpler real dataset. What's the benefit of tRSA in classic RSA datasets (e.g., Kriegeskorte et al., 2008), with fixed stimulus conditions and no behavior?

      To clarify, our empirical approach uses two different tasks: an Object Perception task more akin to the classic RSA datasets employing passive viewing, and a Conceptual Retrieval task that more directly addresses the benefits of the trialwise approach. We felt that our Object Perception dataset is a simpler empirical fMRI dataset without explicit task conditions or a dichotomous behavioral outcome, whereas the Retrieval dataset is more involved (though old/new recognition is the most common form of memory retrieval testing) and dependent on behavioral outcomes. However, we recognize the utility of replication from other research groups and do invite researchers to utilize tRSA on their datasets.

      (5) The cells of an RDM/RSM reflect pairwise comparisons between response patterns (typically a brain but can be any system; cf Sucholutsky et al., 2023). Because the response patterns are repeatedly compared, the cells of this matrix are not independent of one another. Does this raise issues with the validity of the linear mixed effects model? Does it assume the observations are linearly independent?

      We recognize the potential danger of not meeting model assumptions. Though our simulation results and model checks suggest this is not a fatal flaw in the model design, we caution readers to investigate the robustness of their models and to consider employing permutation testing, which does not make independence assumptions. We have also added checks of the model residuals and an example of permutation testing in the Appendix. See response to R1.
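      One way such a permutation test could be set up is sketched below. This is a hypothetical illustration, not the procedure in our Appendix: it shuffles the stimulus labels of the model RSM (rows and columns together) to build a null distribution, uses Pearson rather than Spearman correlation within rows to keep the sketch short, and relies on simulated data and illustrative function names.

```python
import numpy as np

def mean_rowwise_similarity(brain_rsm, model_rsm):
    """Mean over rows of the brain-model correlation (diagonal excluded)."""
    n = brain_rsm.shape[0]
    keep = ~np.eye(n, dtype=bool)
    rows = [np.corrcoef(brain_rsm[i, keep[i]], model_rsm[i, keep[i]])[0, 1]
            for i in range(n)]
    return float(np.mean(rows))

def permutation_p_value(brain_rsm, model_rsm, n_permutations=500, seed=0):
    """One-sided p-value from shuffling the stimulus labels of the model RSM."""
    rng = np.random.default_rng(seed)
    observed = mean_rowwise_similarity(brain_rsm, model_rsm)
    null = np.empty(n_permutations)
    for k in range(n_permutations):
        order = rng.permutation(model_rsm.shape[0])
        shuffled = model_rsm[np.ix_(order, order)]  # permute rows and columns jointly
        null[k] = mean_rowwise_similarity(brain_rsm, shuffled)
    # Add one to numerator and denominator so the p-value is never exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, float(p_value)

# Simulated data in which the brain patterns do carry the model structure.
rng = np.random.default_rng(4)
model_patterns = rng.standard_normal((20, 50))
brain_patterns = model_patterns + rng.standard_normal((20, 50))
observed, p_value = permutation_p_value(np.corrcoef(brain_patterns),
                                        np.corrcoef(model_patterns))
```

      Permuting rows and columns jointly preserves the dependence among cells of the RSM, so the null distribution does not rely on treating cells as independent observations.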

      (6) The manuscript assumes the reader is familiar with technical statistical terms such as Type I/II error, sensitivity, specificity, homoscedasticity assumptions, as well as linear mixed models (fixed effects, random effects, etc). I am concerned that this jargon makes the paper difficult to understand for a broad readership or even researchers currently using cRSA that might be interested in trying tRSA.

      We agree that this jargon may make the paper difficult to understand. We have expanded or added definitions of these terms throughout the Methods and Results sections.

      Page 12. “Given data generated with 𝑠<sub>𝑐𝑜𝑛𝑑,𝐴</sub> = 𝑠<sub>𝑐𝑜𝑛𝑑,B</sub>, the correct inference should be a failure to reject the null hypothesis of b<sub>B-𝐴</sub> = 0; any significant (𝑝 < 0.05) result in either direction was considered a false positive (spurious effect, or Type I error). Given data generated with 𝑠<sub>𝑐𝑜𝑛𝑑,𝐴</sub> < 𝑠<sub>𝑐𝑜𝑛𝑑,B</sub>, the inference was considered correct if it rejected the null hypothesis of b<sub>B-𝐴</sub> = 0 and yielded the expected sign of the estimated contrast (b<sub>B-𝐴</sub> < 0). A significant result with the reverse sign of the estimated contrast (b<sub>B-𝐴</sub> > 0) was considered a Type I error, and a nonsignificant (𝑝 ≥ 0.05) result was considered a false negative (failure to detect a true effect, or Type II error)”.

      Page 2. “Compared to cRSA, the multi-level framework of tRSA was both more theoretically appropriate and significantly more sensitive to (better able to detect) true effects”.

      Page 25. “The performance of cRSA and tRSA was quantified with their specificity (better avoids false positives, 1 - Type I error rate) and sensitivity (better avoids false negatives, 1 - Type II error rate)”.
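      As a small worked illustration of these two definitions, the sketch below computes both rates from arrays of simulated p-values. It is a simplification under stated assumptions: the function names and example p-values are hypothetical, and it ignores the sign of the estimated contrast, whereas the manuscript additionally counts a significant result with the reversed sign as a Type I error.

```python
import numpy as np

def specificity(p_values_null, alpha=0.05):
    """1 - Type I error rate, from simulations generated with no true effect."""
    false_positive_rate = np.mean(np.asarray(p_values_null) < alpha)
    return float(1.0 - false_positive_rate)

def sensitivity(p_values_effect, alpha=0.05):
    """1 - Type II error rate, from simulations generated with a true effect."""
    false_negative_rate = np.mean(np.asarray(p_values_effect) >= alpha)
    return float(1.0 - false_negative_rate)

# Hypothetical p-values from four simulated datasets per scenario.
null_spec = specificity([0.01, 0.20, 0.50, 0.80])    # one false positive in four
effect_sens = sensitivity([0.01, 0.03, 0.20, 0.04])  # one miss in four
```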

      Page 6. “One of the fundamental assumptions of general linear models (step 4 of cRSA; see Figure 1D) is homoscedasticity or homogeneity of variance — that is, all residuals should have equal variance”.

      Page 11. “Specifically, a linear mixed-effects model with a fixed effect of condition (which estimates the average effect across the entire sample, capturing the overall effect of interest) and random effects of both subjects and stimuli (which model variation in responses due to differences between individual subjects and items, allowing generalization beyond the sample) was fitted to tRSA estimates via the `lme4 1.1-35.3` package in R (Bates et al., 2015), and p-values were estimated using Satterthwaite’s method via the `lmerTest 3.1-3` package (Kuznetsova et al., 2017)”.

      (7) I could not find any statement on data availability or code availability. Given that the manuscript reuses prior data and proposes a new method, making data and code/tutorials openly available would greatly enhance the potential impact and utility for the community.

      We thank the reviewer for raising our oversight here. We have added our code and data availability statements.

      Page 9. “Data are available upon request to the corresponding author, and our simulations and example tRSA code are available at https://github.com/electricdinolab”.

      Reviewer #1 (Recommendations for the authors):

      (13) Page 4: The limitations of cRSA seem to be based on the assumption that within each different experimental condition, there are different stimuli, which get combined into the condition. The framework of RSA, however, does not dictate whether you calculate a condition x condition RDM or a larger and more complete stimulus x stimulus RDM. Indeed, in practice we often do the latter? Or are you assuming that each stimulus is only shown once overall? It would be useful at this point to spell out these implicit assumptions.

      We agree that stimulus x stimulus RDMs can be constructed and are often used. However, as we mentioned in the Introduction, researchers are often interested in the difference between two (or more) conditions, such as “remembered” vs. “forgotten” (Davis et al., https://doi.org/10.1093/cercor/bhaa269) or “high cognitive load” vs. “low cognitive load” (Beynel et al., https://doi.org/10.1523/JNEUROSCI.0531-20.2020). In those cases, the most common practice with cRSA is to construct condition-specific RDMs, compute cRSA scores separately for each condition, and then compare the scores at the group level. The number of times each stimulus gets presented does not prevent one from creating a model RDM that has the same rows and columns as the brain RDM, either in the same condition (“high load”) or across different conditions.

      (14) Page 5: The difference between condition-level and stimulus-level is not clear. Indeed, this definition seems to be a function of the exact experimental design and is certainly up for interpretation. For example, if I conduct a study looking at the activity patterns for 4 different hand actions, each repeated multiple times, are these actions considered stimuli or conditions?

      We have added clarifying language about what is considered stimuli vs conditions. Indeed, this will depend on the specific research questions being employed and will affect how researchers construct their models. In this specific example, one would most likely consider each different hand action a condition, treating them as fixed effects rather than random effects, given their very limited number and the lack of need to generalize findings to the broader “hand actions” category.

      Page 5. “Critically, the distinction between condition-level and stimulus level is not always clear as researchers may manipulate stimulus-level features themselves. In these cases, what researchers ultimately consider condition-level and stimulus-level will depend on their specific research questions. For example, researchers intending to study generalized object representation may consider object category a stimulus-level feature, while researchers interested in if/how object representation varies by category may consider the same category variable condition-level”.

      (15) Page 5: The fact that different numbers of trials / different levels of measurement noise / noise-covariance of different conditions biases non-cross-validated distances is well known and repeatedly expressed in the literature. We have shown that cross-validation of distances effectively removes such biases - of course, it does not remove the increased estimation variability of these distances (for a formal analysis of estimation noise on condition patterns and variance of the cross-nobis estimator, see (Diedrichsen et al. 2021)).

      We thank the reviewer for drawing our attention to this literature and have added discussions of these methods.

      (16) Page 5: "Most studies present subjects with a fixed set of stimuli, which are supposedly samples representative of some broader category". This may be the case for a certain type of RSA experiments in the visual domain, but it would be unfair to say that this is a feature of RSA studies in general. In most studies I have been involved in, we use a "stimulus" x "stimulus" RDM.

      We have edited this sentence to avoid the “most” characterization. We have also added substantial text to the Introduction and Discussion distinguishing cRSA, which is nonetheless widely employed, especially in cases with a single repetition per stimulus (Macklin et al., 2023, Liu et al, 2024), from the model-comparative method, and explicitly stating that we do not consider tRSA an alternative to the model-comparative approach.

      (17) Page 5: I agree that "stimuli" should ideally be considered a random effect if "stimuli" can be thought of as sampled from a larger population and one wants to make inferences about that larger population. Sometimes stimuli/conditions are more appropriately considered a fixed effect (for example, when studying the response to stimulation of the 5 fingers of the right hand). Techniques to consider stimuli/conditions as a random effect have been published by the group of Niko Kriegeskorte (Schütt et al. 2023).

      Indeed, in some cases what may be thought of as “stimuli” would be more appropriately entered into the model as a fixed effect; such questions are increasingly relevant given the focus on item-wise stimulus properties (Bainbridge et al., Westfall & Yarkoni). We have added text on this issue to the Discussion and caution researchers to employ models that most directly answer their research questions.

      Page 46. “However, researchers should always consider if their models match the goals of their analysis, including 1) constructing the random effects structure that will converge in their dataset and 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020) and 3) considering which effects should be considered random or fixed depending on their research question. An effect is fixed when the levels represent the specific conditions of theoretical interest (e.g., task condition) and the goal is to estimate and interpret those differences directly. In contrast, an effect is random when the levels are sampled from a broader population (e.g., subjects) and the goal is to account for their variability while generalizing beyond the sample tested. Note that the same variable (e.g., stimuli) may be considered fixed or random depending on the research questions”.

      (18) Page 6: It is correct that the "classical" RSA depends on a categorical assignment of different trials to different stimuli/conditions, such that a stimulus x stimulus RDM can be computed. However, both Pattern Component Modelling (PCM) and Encoding models are ideally set up to deal with variables that vary continuously on a trial-by-trial or moment-by-moment basis. tRSA should be compared to these approaches, or - as it should be clarified - that the problem setting is actually quite a different one.

      We agree that PCM and encoding models offer a flexible approach and handle continuous trial-by-trial variables. We have clarified on page 6 that the problem setting of cRSA is distinct, and we have added discussion of the robustness of encoding models and their limitations to the Discussion.

      Page 6. “While other approaches such as Pattern Component Modeling (PCM) (Diedrichsen et al., 2018) and encoding models (Naselaris et al., 2011) are well-suited to analyzing variables that vary continuously on a trial-by-trial or moment-by-moment basis, these frameworks address different inferential goals. Specifically, PCM and encoding models focus on estimating variance components or predicting activation from features, while cRSA is designed to evaluate representational geometry. Thus, cRSA as well as our proposed approach address a problem setting distinct from PCM and encoding models”.

      (19) Page 8: "Then, we generated two noise patterns, which were controlled by parameters 𝜎<sub>𝐴</sub> and 𝜎<sub>𝐵</sub>, respectively, one for each condition." This makes little sense to me. The noise patterns should be unique to each trial - you should generate n_a + n_b noise patterns, no?

      We clarify that the “noise patterns” here are n_voxel x n_trial in size; in other words, all trial-level noise patterns are generated together and each trial has their own unique noise pattern. We have revised our description as “two sets of noise patterns” for clarity starting on page 9.
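      The shape of these noise sets can be illustrated with a short sketch. It is a simplified illustration rather than our simulation code: it omits the hierarchical subject and stimulus components, and the sizes and noise levels (`s_a`, `s_b`) are hypothetical values chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_voxels, n_trials_a, n_trials_b = 100, 30, 20  # illustrative sizes
s_a, s_b = 1.0, 2.0                             # hypothetical noise SDs per condition

# True (noise-free) patterns, one column per trial.
signal_a = rng.standard_normal((n_voxels, n_trials_a))
signal_b = rng.standard_normal((n_voxels, n_trials_b))

# Two *sets* of noise patterns, each n_voxel x n_trial:
# every trial (column) receives its own independent noise pattern.
noise_a = s_a * rng.standard_normal((n_voxels, n_trials_a))
noise_b = s_b * rng.standard_normal((n_voxels, n_trials_b))

# Observed data for all n_a + n_b trials.
data = np.hstack([signal_a + noise_a, signal_b + noise_b])
```

      In total, `n_trials_a + n_trials_b` unique noise patterns are generated, and the two conditions differ only in the standard deviation used to scale them.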

      (20) Page 9: First, I assume if this is supposed to be a hierarchical level model, the "noise parameters" here correspond to variances? Or do these \sigma values mean to signify standard deviations? The latter would make little sense. Or is it the noise pattern itself?

      As clarified in 4., the σ values are meant to denote hierarchical components of the composite standard deviation; we have updated our notation to use lower case letter s instead for clarity.

      (21) Page 10: your formula states "𝜎<sub>𝑠𝑢𝑏𝑗</sub>~ 𝙽(0, 0.5^2)". This conflicts with your previous mention that \sigmas are noise "levels"; are they the noise patterns themselves now? Variances cannot be normally distributed, as they cannot be negative.

      As clarified in 4., the σ values are meant to denote hierarchical components of the composite standard deviation; we have updated our notation to use lower case letter s instead for clarity.

      (22) Page 13: What was the task of the subject in the Memory retrieval task? Old/new judgements relative to encoding of object perception?

      We apologize for the lack of clarity about the Memory Retrieval task and have added that information and clarified that the old/new judgements were relative to a separate encoding phase, the brain data for which has been reported elsewhere.

      Page 14. “Memory Retrieval took place one day after Memory Encoding and involved testing participants’ memory of the objects seen in the Encoding phase. Neural data during the Encoding phase has been reported elsewhere. In the main Memory Retrieval task, participants were presented with 144 labels of real-world objects, of which 114 were labels for previously seen objects and 30 were unrelated novel distractors. Participants performed old/new judgements, as well as their confidence in those judgements on a four-point scale (1 = Definitely New, 2 = Probably New, 3 = Probably Old, 4 = Definitely Old)”.

      (23) Page 13: If "Memory Retrieval consisted of three scanning runs", then some of the stimulus x stimulus correlations for the RSM must have been calculated within a run and some between runs, correct? Given that all within-run estimates share a common baseline, they share some dependence. Was there a systematic difference between the within-run and the between-run correlations?

      We have clarified in this portion of the methods that within run comparisons were excluded from our analyses. We also double-checked that the within-run exclusion was included in the description of the Neural RSMs.

      Page 14. “Retrieval consisted of three scanning runs, each with 38 trials, lasting approximately 9 minutes and 12 seconds (within-run comparisons were later excluded from RSA analyses)”.

      Page 18. “This was done by vectorizing the voxel-level activation values within each region and calculating their correlations using Pearson’s r, excluding all within-run comparisons.”
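      The within-run exclusion can be sketched as follows. This is a minimal illustration with simulated data, not our analysis code: the function name and the number of voxels are hypothetical, while the three runs of 38 trials follow the design described above.

```python
import numpy as np

def neural_rsm_between_runs(patterns, run_labels):
    """Pearson-correlation RSM with all within-run comparisons set to NaN.

    patterns: (n_trials, n_voxels) vectorized activation values for one region.
    run_labels: (n_trials,) run index of each trial.
    """
    rsm = np.corrcoef(patterns)  # trial x trial Pearson's r
    run_labels = np.asarray(run_labels)
    same_run = run_labels[:, None] == run_labels[None, :]
    rsm[same_run] = np.nan       # also removes the diagonal
    return rsm

rng = np.random.default_rng(5)
n_runs, trials_per_run, n_voxels = 3, 38, 60  # n_voxels is an arbitrary choice
run_labels = np.repeat(np.arange(n_runs), trials_per_run)
patterns = rng.standard_normal((n_runs * trials_per_run, n_voxels))
rsm = neural_rsm_between_runs(patterns, run_labels)
```

      Each row of `rsm` then retains 76 between-run entries, and any downstream correlation with a model RSM would need to skip the NaN cells.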

      (24) Page 20: It is not clear why the mean estimate of "representational strength" (i.e., model-brain RSM correlations) is important at all. This comes back to Major point #2, namely that you are trying to solve a very different problem from model-comparative RSA.

      We have clarified that our approach is not an alternative to model-comparative RSA, and that depending on the task constraints researchers may choose to compare models with tRSA or other approaches requiring stimulus repetition (see 3).

      (25) Page 21: I believe the problems of simulating correlation matrices directly in the way that the authors in their first simulation did should be well known and should be moved to an appendix at best. Better yet, the authors could start with the correct simulation right away.

      We agree the paper is more concise with these simulations being moved to the appendix and more briefly discussed. We have implemented these changes (Appendix 1). However, we are not certain that this problem is unknown, and have several anecdotes of researchers inquiring about this “alternative” approach in talks with colleagues, thus we do still discuss the issues with this method.

      (26) Page 26: Is the "underlying continuous noise variable 𝜎<sub>𝑡𝑟𝑖𝑎𝑙</sub> that was measured by 𝑣<sub>𝑚𝑒𝑎𝑠𝑢𝑟𝑒𝑑</sub>" the variance of the noise pattern or the noise pattern itself? What does it mean it was "measured" - how?

      𝜎<sub>𝑡𝑟𝑖𝑎𝑙</sub> is a vector of standard deviations for different trials, and its i-th element, 𝜎<sub>𝑡𝑟𝑖𝑎𝑙,i</sub>, would be used to generate the noise pattern for trial i. 𝑣<sub>𝑚𝑒𝑎𝑠𝑢𝑟𝑒𝑑</sub> is a hypothetical measurement of trial-level variability, such as “memorability” or “heartbeat variability”. We have revised our description to clarify our methods.

      Reviewer #2 (Recommendations for the authors):

      (8) It would be helpful to provide more clarity earlier on in the manuscript on what is a 'trial': in my experience, a row or column of the RDM is usually referred to as 'stimulus condition', which is typically estimated on multiple trials (instances or repeats) of that stimulus condition (or exemplars from that stimulus class) being presented to the subject. Here, a 'trial' is both one measurement (i.e., single, individual presentation of a stimulus) and also an entry in the RDM, but is this the most typical scenario for cRSA? There is a section in the Discussion that discusses repetitions, but I would welcome more clarity on this from the get-go.

      We have added discussion of stimulus repetition methods and datasets to the Introduction and clarified our use of the terms.

      Page 8. “Critically, in single-presentation designs, a “trial” refers to one stimulus presentation, and corresponds to a row or column in the RSM. In studies with repeated stimuli, these rows are often called “conditions” and may reflect aggregated patterns across trials. tRSA is compatible with both cases: whether rows represent individual trials or averaged trials that create “conditions”, tRSA estimates are computed at the row level”.

      (9) The quality of the results figures can be improved. For example, axes labels are hard to read in Figure 3A/B, panels 3C/D are hard to read in general. In Figure 7E, it's not possible to identify the 'dark red' brain regions in addition to the light red ones.

      We thank the reviewer for raising these and have edited the figures to be more readable in the manner suggested.

      (10) I would be interested to see a comparison between tRSA and cRSA in other fMRI (or other modality) datasets that have been extensively reported in the literature. These could be the original Kriegeskorte 96 stimulus monkey/fMRI datasets, commonly used open datasets in visual perception (e.g., THINGS, NSD), or the above-mentioned King et al. dataset, which has been analyzed in various papers.

      We recognize the great utility of replication from other research groups and do invite researchers to utilize tRSA on their datasets.

      (11) On P39, the authors suggest 'researchers can confidently replace their existing cRSA analysis with tRSA': Please discuss/comment on how researchers should navigate the choice of modeling parameters in tRSA's linear mixed effects setting.

      We have added discussion of the mixed-effects parameters and encourage researchers to follow best practices for their model selection.

      Page 46. “However, researchers should always consider if their models match the goals of their analysis, including 1) constructing the random effects structure that will converge in their dataset and 2) testing their model fits against alternative structures (Meteyard & Davies, 2020; Park et al., 2020) and 3) considering which effects should be considered random or fixed depending on their research question”.

      (12) The final part of the Results section, demonstrating the tRSA results for the continuous memorability factor in the real fMRI data, could benefit from some substantiation/elaboration. It wasn't clear to me, for example, to what extent the observed significant association between representational strength and item memorability in this dataset is to be 'believed'; the Discussion section (p38). Was there any evidence in the original paper for this association? Or do we just assume this is likely true in the brain, based on prior literature by e.g. Bainbridge et al (who probably did not use tRSA but rather classic methods)?

      Indeed, memorability effects have been replicated in the literature, but not using the tRSA method. We have expanded our discussion to clarify the relationship of our findings and the relevant literature and methods it has employed.

      Page 38. “Critically, memorability is a robust stimulus property that is consistent across participants and paradigms (Bainbridge, 2022). Moreover, object memorability effects have been replicated using a variety of methods aside from tRSA, including univariate analyses and representational analyses of neural activity patterns where trial-level neural activity pattern estimates are correlated directly with object memorability (Slayton et al, 2025).”

      (13) The abstract could benefit from more nuance; I'm not sure if RSA can indeed be said to be 'the principal method', and whether it's about assessing 'quality' of representations (more commonly, the term 'geometry' or 'structure' is used).

      We have edited the abstract to reflect the true nuance in the current approaches.

      Abstract. Neural representation refers to the brain activity that stands in for one’s cognitive experience, and in cognitive neuroscience, a prominent method of studying neural representations is representational similarity analysis (RSA). While there are several recent advances in RSA, the classic RSA (cRSA) approach examines the structure of representations across numerous items by assessing the correspondence between two representational similarity matrices (RSMs): usually one based on a theoretical model of stimulus similarity and the other based on similarity in measured neural data.

      (14) RSA is also not necessarily about models vs. neural data; it can also be between two neural systems (e.g., monkey vs. human as in Kriegeskorte et al., 2008) or model systems (see Sucholutsky et al., 2023). This statement is also repeated in the Introduction paragraph 1 (later on, it is correctly stated that comparing brain vs. model is most likely the 'most common' approach).

      We have added these examples in our introduction to RSA.

      Page 3. “One of the central approaches for evaluating information represented in the brain is representational similarity analysis (RSA), an analytical approach that queries the representational geometry of the brain in terms of its alignment with the representational geometry of some cognitive model (Kriegeskorte et al., 2008; Kriegeskorte & Kievit, 2013), or, in some cases, compares the representational geometry of two neural systems (e.g., Kriegeskorte et al., 2008) or two model systems (Sucholutsky et al., 2023)”.

      (15) 'theoretically appropriate' is an ambiguous statement, appropriate for what theory?

      We apologize for the ambiguous wording, and have corrected the text:

      Page 11. “Critically, tRSA estimates were submitted to a mixed-effects model which is statistically appropriate for modeling the hierarchical structure of the data, where observations are nested within both subjects and stimuli (Baayen et al., 2008; Chen et al., 2021)”.

      (16) I found the statement that cRSA "cannot model representation at the level of individual trials" confusing, as it made me think, what prohibits one from creating an RDM based on single-trial responses? Later on, I understood that what the authors are trying to say here (I think) is that cRSA cannot weigh the contributions of individual rows/columns to the overall representational strength differently.

      We thank the reviewer for their clarifying language and have added it to this section of the manuscript.

      “Abstract. However, because cRSA cannot weigh the contributions of individual trials (RSM rows/columns), it is fundamentally limited in its ability to assess subject-, stimulus-, and trial-level variances that all influence representation”.

      (17) Why use "RSM" instead of "RDM"? If the pairwise comparison metric is distance-based (e.g., 1-correlation as described by the authors), RDM is more appropriate.

      We apologize for the error, and have clarified the Methods text:

      Page 3-4. First, brain activity responses to a series of N trials are compared against each other (typically using Pearson’s r) to form an N×N representational similarity matrix.
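      As a minimal sketch of the construction described in this passage, the snippet below builds an N×N similarity matrix from pairwise Pearson correlations and derives the corresponding dissimilarity matrix (1 - r) raised by the reviewer. The data are random noise, and the function names, the trial count, and the voxel count are illustrative assumptions, not the authors' pipeline.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def similarity_matrix(patterns):
    """N x N matrix of Pearson correlations between trial-level patterns."""
    n = len(patterns)
    rsm = [[1.0] * n for _ in range(n)]  # each trial correlates perfectly with itself
    for i in range(n):
        for j in range(i + 1, n):
            rsm[i][j] = rsm[j][i] = pearson_r(patterns[i], patterns[j])
    return rsm


# Hypothetical data: 5 trials x 20 voxels of Gaussian noise.
rng = random.Random(0)
patterns = [[rng.gauss(0, 1) for _ in range(20)] for _ in range(5)]

rsm = similarity_matrix(patterns)
# A dissimilarity matrix (RDM) is the same information expressed as 1 - r.
rdm = [[1.0 - r for r in row] for row in rsm]
```

      Whether one reports the RSM or the RDM is a labeling choice here: the two matrices carry identical information and differ only in sign convention.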

      (18) Figure 2: please write 'Correlation estimate' in the y-axis label rather than 'Estimate'.

      We have edited the label in Figure 2.

      (19) Page 6 'leaving uncertain the directionality of any findings' - I do not follow this argument. Obviously one can generate an RDM or RSM from vector v or vector -v. How does that invalidate drawing conclusions where one e.g., partials out the (dis)similarity in e.g., pleasantness ratings out of another RDM/RSM of interest?

      We agree such an approach does not invalidate the partial method; we have clarified what we mean by “directionality”.

      Page 8. “For instance, even though a univariate random variable, such as pleasantness ratings, can be conveniently converted to an RSM using pairwise distance metrics (Weaverdyck et al., 2020), the very same RSM would also be derived from the opposite random variable, leaving uncertain the directionality (i.e., whether representation is strongest for pleasant or unpleasant items) of any findings with the RSM (see also Bainbridge & Rissman, 2018)”.
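      The directionality point can be illustrated with a small sketch: a pairwise-distance matrix built from a variable and one built from its sign-reversed copy are identical, so the matrix alone cannot say which end of the scale drives an effect. The ratings below are made-up values, and absolute difference is assumed as the distance metric purely for illustration.

```python
def distance_matrix(values):
    """Pairwise absolute-difference matrix for a one-dimensional variable."""
    return [[abs(a - b) for b in values] for a in values]


ratings = [1.0, 4.0, 2.5, 5.0]    # hypothetical pleasantness ratings
flipped = [-r for r in ratings]   # the sign-reversed ("opposite") variable

# Both variables yield exactly the same matrix.
same = distance_matrix(ratings) == distance_matrix(flipped)
print(same)  # prints True
```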

      (20) P7 'sampled 19900 pairs of values from a bi-variate normal distribution', but the rows/columns in an RDM are not independent samples - shouldn't this be included in the simulation? I.e., shouldn't you simulate first the n=200 vectors, and then draw samples from those, as in the next analysis?

      This section has been moved to Appendix 1 (see responses to Reviewer 1.13).

      (21) Under data acquisition, please state explicitly that the paper is re-using data from prior experiments, rather than collecting data anew for validating tRSA.

      We have clarified this in the data acquisition section.

      Page 13. “A pre-existing dataset was analyzed to evaluate tRSA. Main study findings have been reported elsewhere (S. Huang, Bogdan, et al., 2024)”.

      (22) Figure 4 could benefit from some more explanation in-text. It wasn't clear to me, for example, how to interpret the asterisks depicted in the right part of the figure.

      We clarified the meaning of the asterisks in the main text in addition to the existing text in the figure caption.

      Page 26. “see Figure 4, off-diagonal cells in blue; asterisks indicate where tRSA was statistically more sensitive than cRSA)”.

      (23) Page 38 "the outcome of tRSA's improved characterization can be seen in multiple empirical outcomes:" it seems there is one mention of 'outcomes' too many here.

      We have revised this sentence.

      Page 41. “tRSA's improved characterization can be seen in multiple empirical outcomes”.

      (24) Page 38 "model fits became the strongest" it's not clear what aspect of the reported results in the paragraph before this is referring to - the Appendix?

      Yes, the model fits are in the Appendix; we have added this in-text citation.

      Moreover, model-fits became the strongest when the models also incorporated trial-level variables such as fMRI run and reaction time (Appendix 3, Table 6).

      References

      Diedrichsen, J., Berlot, E., Mur, M., Schütt, H. H., Shahbazi, M., & Kriegeskorte, N. (2021). Comparing representational geometries using whitened unbiased-distance-matrix similarity. Neurons, Behavior, Data and Theory, 5(3). https://arxiv.org/abs/2007.02789

      Diedrichsen, J., & Kriegeskorte, N. (2017). Representational models: A common framework for understanding encoding, pattern-component, and representational-similarity analysis. PLoS Computational Biology, 13(4), e1005508.

      Diedrichsen, J., Yokoi, A., & Arbuckle, S. A. (2018). Pattern component modeling: A flexible approach for understanding the representational structure of brain activity patterns. NeuroImage, 180, 119-133.

      Naselaris, T., Kay, K. N., Nishimoto, S., & Gallant, J. L. (2011). Encoding and decoding in fMRI. NeuroImage, 56(2), 400-410.

      Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS Computational Biology, 10(4), e1003553.

      Schütt, H. H., Kipnis, A. D., Diedrichsen, J., & Kriegeskorte, N. (2023). Statistical inference on representational geometries. ELife, 12. https://doi.org/10.7554/eLife.82566

      Walther, A., Nili, H., Ejaz, N., Alink, A., Kriegeskorte, N., & Diedrichsen, J. (2016). Reliability of dissimilarity measures for multi-voxel pattern analysis. NeuroImage, 137, 188-200.

      King, M. L., Groen, I. I., Steel, A., Kravitz, D. J., & Baker, C. I. (2019). Similarity judgments and cortical visual responses reflect different properties of object and scene categories in naturalistic images. NeuroImage, 197, 368-382.

      Kriegeskorte, N., Mur, M., Ruff, D. A., Kiani, R., Bodurka, J., Esteky, H., ... & Bandettini, P. A. (2008). Matching categorical object representations in inferior temporal cortex of man and monkey. Neuron, 60(6), 1126-1141.

      Sucholutsky, I., Muttenthaler, L., Weller, A., Peng, A., Bobu, A., Kim, B., ... & Griffiths, T. L. (2023). Getting aligned on representational alignment. arXiv preprint arXiv:2310.13018.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this manuscript, Dillard and colleagues integrate cross-species genomic data with a systems approach to identify potential driver genes underlying human GWAS loci and establish the cell type(s) within which these genes act and potentially drive disease. Specifically, they utilize a large single-cell RNA-seq (scRNA-seq) dataset from an osteogenic cell culture model - bone marrow-derived stromal cells cultured under osteogenic conditions (BMSC-OBs) - from a genetically diverse outbred mouse population called the Diversity Outbred (DO) stock to discover network driver genes that likely underlie human bone mineral density (BMD) GWAS loci. The DO mice segregate over 40M single nucleotide variants, many of which affect gene expression levels, therefore making this an ideal population for systems genetic and co-expression analyses. The current study builds on previously published work from the same group that used co-expression analysis to identify co-expressed "modules" of genes that were enriched for BMD GWAS associations. In this study, the authors utilize a much larger scRNA-seq dataset from 80 DO BMSC-OBs, infer co-expression-based and Bayesian networks for each identified mesenchymal cell type, focused on networks with dynamic expression trajectories that are most likely driving differentiation of BMSC-OBs, and then prioritized genes ("differentiation driver genes" or DDGs) in these osteogenic differentiation networks that had known expression or splicing QTLs (eQTL/sQTLs) in any GTEx tissue that colocalized with human BMD GWAS loci. The systems analysis is impressive, the experimental methods are described in detail, and the experiments appear to be carefully done. The computational analysis of the single-cell data is comprehensive and thorough, and the evidence presented in support of the identified DDGs, including Tpx2 and Fgfrl1, is for the most part convincing. 
      Some limitations in the data resources and methods hamper enthusiasm somewhat and are discussed below. Overall, while this study will no doubt be valuable to the BMD community, the cross-species data integration and analytical framework may be more valuable and generally applicable to the study of other diseases, especially for diseases with robust human GWAS data but for which robust human genomic data in relevant cell types is lacking.

      Specific strengths of the study include the large scRNA-seq dataset on BMSC-OBs from 80 DO mice, the clustering analysis to identify specific cell types and sub-types, the comparison of cell type frequencies across the DO mice, and the CELLECT analysis to prioritize cell clusters that are enriched for BMD heritability (Figure 1). The network analysis pipeline outlined in Figure 2 is also a strength, as is the pseudotime trajectory analysis (results in Figure 3). One weakness involves the focus on genes that were previously identified as having an eQTL or sQTL in any GTEx tissue. The authors rightly point out that the GTEx database does not contain data for bone tissue, but reason that eQTLs can be shared across many tissues. This assumption is valid for many cis-eQTLs, but it could also exclude many genes as potential DDGs with effects that are specific to bone/osteoblasts. Indeed, the authors show that important BMD driver genes have cell-type-specific eQTLs. Furthermore, the mesenchymal cell type-specific co-expression analysis by iterative WGCNA identified an average of 76 co-expression modules per cell cluster (range 26-153). Based on the limited number of genes that are detected as expressed in a given cell due to sparse per-cell read depth (400-6200 reads/cell) and dropouts, it's hard to believe that as many as 153 co-expression modules could be distinguished within any cell cluster. I would suspect some degree of model overfitting here and would expect that many/most of these identified modules have very few gene members, but the methods list a minimum module size of 20 genes. How do the numbers of modules identified in this study compare to other published scRNA-seq studies that use iterative WGCNA?

      In the section "Identification of differentiation driver genes (DDGs)", the authors identified 408 significant DDGs and found that 49 (12%) were reported by the International Mouse Knockout [sic] Consortium (IMPC) as having a significant effect on whole-body BMD when knocked out in mice. Is this enrichment significant? E.g., what is the background percentage of IMPC gene knockouts that show an effect on whole-body BMD? Similarly, they found that 21 of the 408 DDGs were genes that have BMD GWAS associations that colocalize with GTEx eQTLs/sQTLs. Given that there are > 1,000 BMD GWAS associations, is this enrichment (21/408) significant? Recommend performing a hypergeometric test to provide statistical context to the reported overlaps here. 
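      The hypergeometric test recommended here can be sketched as follows. Only the counts 408 (DDGs) and 49 (DDGs with an IMPC BMD effect) come from the text; the background size and the number of background genes with a BMD effect are hypothetical placeholders that would need to be taken from the IMPC data, and the function name is illustrative.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when sampling `draws` genes without replacement from a
    population containing `successes` genes with the property of interest."""
    total = comb(population, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, min(successes, draws) + 1)
    )
    return favorable / total


# From the text: 408 DDGs, 49 of which affect whole-body BMD when knocked out.
n_ddgs = 408
n_overlap = 49
# HYPOTHETICAL background values, for illustration only.
n_background_genes = 8000   # assumed number of genes phenotyped for BMD
n_background_hits = 500     # assumed number of those with a significant BMD effect

p_value = hypergeom_upper_tail(n_overlap, n_background_genes, n_background_hits, n_ddgs)
```

      The result depends strongly on the background that is chosen, so the background set should be restricted to genes that could have been both tested by the IMPC and detected in the scRNA-seq data.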

      We thank the reviewer for their constructive feedback and thoughtful questions. With regard to iterativeWGCNA, a larger number of modules is sometimes an outcome of the analysis, as reported in the iterativeWGCNA preprint (Greenfest-Allen et al., 2017). While we did not make a comparison to other works leveraging this tool for scRNA-seq, it has been used broadly across other published studies, such as PMID: 39640571, 40075303, 33677398, 33653874. While model overfitting, as you mention, may be a cause for more modules, the Bayesian network analysis we perform after iterativeWGCNA highlights smaller aspects of coexpression modules, as opposed to focusing on the entirety of any given module.

      We did not perform enrichment or statistical tests as our goal was to simply highlight attributes or unique features of these genes for additional context.

      Reviewer #2 (Public review): 

      Summary: 

      In this manuscript, Farber and colleagues have performed single-cell RNAseq analysis on bone marrow-derived stem cells from DO Mice. By performing network analysis, they look for driver genes that are associated with bone mineral density GWAS associations. They identify two genes as potential candidates to showcase the utility of this approach. 

      Strengths: 

      The study is very thorough and the approach is innovative and exciting. The manuscript contains some interesting data relating to how cell differentiation is occurring and the effects of genetics on this process. The section looking for genes with eQTLs that differ across the differentiation trajectory (Figure 4) was particularly exciting. 

      Weaknesses: 

      The manuscript is in parts hard to read due to the use of acronyms and there are some questions about data analysis that need to be addressed. 

      We thank the reviewer for their feedback and shared enthusiasm for our work. We tried to minimize the use of technical acronyms as much as we could without compromising readability. Additionally, we addressed questions regarding aspects of data analysis. 

      Reviewer #1 (Recommendations for the authors):

      (1) For increased transparency and to allow reproducibility, it would be necessary for the scripts used in the analysis to be shared along with the publication of the preprint. Also, where feasible, sharing the processed data in addition to the raw data would allow the community greater access to the results and be highly beneficial. 

      Thank you for this suggestion. The raw data will be available via GEO accession codes listed in the data availability statement. We will make available scripts for some analyses on our GitHub (https://github.com/Farber-Lab/DO80_project) and processed scRNA-seq data in a Seurat object (.rds) on Zenodo (https://zenodo.org/records/15299631).

      (2) Lines 55-76: I think the summary of previous work here is too long. I understand that they would like to cover what has been done previously, but this seems like overkill. 

      Good suggestion. We have streamlined some of the summary of our previous work.

      (3) Did the authors try to map QTL for cell-type proportion differences in their BMSC-OBs? While 80 samples certainly limit mapping power, the data shown in Figs 4C/D suggest that you might identify a large-effect modifier of LMP/OB1 proportions. 

      We did try to map QTL for cell type proportion differences, but no significant associations were identified. 

      (4) Methods question: Does the read alignment method used in your analysis account for SNPs/indels that segregate among the DO/CC founder strains? If not, the authors may wish to include this in their discussion of study limitations and speculate on how unmapped reads could affect expression results. 

      The read alignment method we used does not account for SNPs/indels from the DO founder strains that fall in RNA transcripts captured in the scRNA-seq data. We have included this as a limitation in our discussion (line 422-424). 

      (5) Much of the discussion reads as an overview of the methods, while a discussion of the results and their context to the existing BMD literature is relatively lacking in comparison.

      We have added additional explanation of the results and context to the discussion (line 381-382, 396-407). 

      (6) Figure 1E and lines 146-149: Adjusted p values should be reported in the figure and accompanying text instead of switching between unadjusted and adjusted p values. 

      We updated Figure 1e to portray adjusted p-values, listed the adjusted p-values in legend of Figure 1e, and listed them in the main text (line 153-154).

      (7) Why do the authors bring the IMPC KO gene list into the analysis so late? This seems like a highly relevant data resource (moreso than the GTEx eQTLs/sQTLs) that could have been used much earlier to help identify DDGs. 

      Given that our scRNA-seq data is also from mice, we did choose to integrate information from the IMPC to highlight supplemental features of genes in networks (i.e., genes that have an experimentally-tested and significant effect on BMD in mice). However, our primary goal was to inform human GWAS and leverage our previous work in which we identified colocalizations between human BMD GWAS and eQTL/sQTL in a human GTEx tissue, which is why this information was used to guide our network analysis.

      (8) Does Fgfrl1 and/or Tpx2 have a cis-eQTL in your BMSC-OB scRNA-seq dataset? 

      We did not identify cis-eQTL effects for Fgfrl1 and Tpx2.

      (9) Figure 4B-C: These eQTLs may be real, but based on the diplotype patterns in Figure 4C, I suspect they are artifacts of low mapping power that are driven by rare genotype classes with one or two samples having outlier expression results. For example, if you look at the results in Fig 4C for S100a1 expression, the genotype classes with the highest/lowest expression have lower sample numbers. In the case of Pkm eQTL showing a PWK-low effect, the PWK genome has many SNPs that differ from the reference genome in the 3' UTR of this gene, and I wonder if reads overlapping these SNPs are not aligning correctly (see point 4 above) and resulting (falsely) in lower expression values for samples with a PWK haplotype. 

      As mentioned above, our alignment method did not consider DO founder genetic variation that is specifically located in the 3’ end of RNA transcripts in the scRNA-seq data. We have included this as a limitation in our discussion (line 422-424).

      In future studies, we intend to include larger populations of mice to potentially overcome, as you mention, any artifacts that may be attributable to low statistical power, rare genotype classes, or outlier expression.

      Reviewer #2 (Recommendations for the authors):

      Major Points 

      (1) The authors hypothesize "that many genes impacting BMD do so by influencing osteogenic differentiation or possibly bone marrow adipogenic differentiation". However, cell type itself does not correlate with any bone trait. Does this indicate that the hypothesis is not entirely correct, as genes that drive these phenotypes would not be enriched in one particular cell type? The authors have previously identified "high-priority target genes". So, are there any cell types that are enriched for these target genes? If not, this would indicate that all these genes are more ubiquitously expressed and this is probably why they would have a greater effect on the overall bone traits. Furthermore, are the 73 eGenes (so genes with eQTLs in a particular cell type that change around cell type boundaries) or the DDGs (Table 1) enriched for these high-priority target genes? 

      The bone traits measured in the DO mice are complex and impacted by many factors, including the differentiation propensity and abundance of certain cell types, both within and outside of bone. Though we did not identify correlations between cell type abundance and the bone traits we measured, we tailored our investigations to focus on cellular differentiation using the scRNA-seq data. However, future studies would need to be performed to investigate any connections between cellular differentiation, cell type abundance, and bone traits.

      We did not perform enrichment analyses of either the target genes identified from our other work or eGenes identified here, but instead used the target gene list to center our network analysis and the eGenes to showcase the utility of the DO mouse population.

      (2) The readability of the paper could be improved by minimising the use of acronyms and there are several instances of confusing wording throughout the paper. In many cases, this can be solved by re-organising sentences and adding a bit more detail. For example, it was unclear how you arrived at Fgfrl1 or Tpx2.

      One of the goals of our study was to identify genes that have (to our knowledge) little to no known connection to BMD. We chose to highlight Fgfrl1 and Tpx2 because there is minimal literature characterizing these genes in the context of bone, which we speak to in the results (line 296-297). Additionally, we prioritized these genes in our previous work, and they were identified in this study through our network analyses of the scRNA-seq data, which we mention in the results (line 276-279).

      (3) Technical aspects of the assay. In Figure 1d you show that the cell populations vary considerably between different DO mice. It would be useful to give some sense of the technical variance of this assay given that the assay involves culturing the cells in an exogenous environment. This could take the form of tests between mice within the same inbred strain, or even between different legs of the same DO mice to show that results are technically very consistent. It might also be prudent to identify that this is a potential limitation of the approach as in vitro culturing has the potential to substantially change the cell populations that are present. 

      We agree that in vitro culturing, in addition to the preparation of single cells for scRNA-seq, are unavoidable sources of technical variation in this study. However, the total number of cells contributed by each of the 80 DO mice after data processing does not appear to be skewed and the distribution appears normal (see added figures, now included as Supplemental Figure 3). Therefore, technical variation is at least consistent across all samples. Nevertheless, we have mentioned the potential for technical variation artifacts in our study in the discussion (line 414-416).

      (4) Need for permutation testing. "We identified 563 genes regulated by a significant eQTL in specific cell types. In total, 73 genes with eQTLs were also tradeSeq-identified genes in one or more cell type boundaries". These types of statements are fine but they need to be backed up with permutation testing to show that this level of enrichment is greater than one would expect by chance. 
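      A permutation test of the kind requested here could look like the sketch below. The counts 563 (eGenes) and 73 (overlap with tradeSeq-identified genes) come from the text; the size of the gene universe and the number of tradeSeq-identified genes are hypothetical placeholders, and the function name and defaults are illustrative.

```python
import random


def permutation_p_value(universe_size, set_a_size, set_b_size,
                        observed_overlap, n_perm=2000, seed=0):
    """Estimate P(overlap >= observed) when set A is drawn at random
    from the gene universe and set B is held fixed."""
    rng = random.Random(seed)
    # Under the null, which genes belong to set B is arbitrary, so use the first ones.
    set_b = set(range(set_b_size))
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(range(universe_size), set_a_size)
        overlap = sum(1 for gene in sample if gene in set_b)
        if overlap >= observed_overlap:
            hits += 1
    # The add-one correction avoids reporting an empirical p-value of exactly zero.
    return (hits + 1) / (n_perm + 1)


# From the text: 563 eGenes, 73 of which were also tradeSeq-identified.
n_egenes = 563
n_overlap = 73
# HYPOTHETICAL values, for illustration only.
n_expressed_genes = 15000   # assumed size of the tested gene universe
n_tradeseq_genes = 1500     # assumed number of tradeSeq-identified genes

p_value = permutation_p_value(n_expressed_genes, n_egenes, n_tradeseq_genes, n_overlap)
```

      Drawing genes uniformly at random is the simplest null. A stricter version would match the random draws to the eGenes on expression level, since both eQTL detection and tradeSeq calls are more likely for highly expressed genes.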

      We did not perform enrichment tests, as our only goals were to (1) determine whether eQTL could be resolved in the DO mouse population using our scRNA-seq data and (2) predict in which cell type the eQTL and its associated eGene may have an effect.

      (5) The main novelty of the paper seems to be that you have used single-cell RNA seq (given that you appear to have already detailed the candidates at the end). I don't think this makes the paper less interesting, but I think you need to reframe the paper more about the approach, and not the specific results. How you landed on these candidates is also not clear. So the paper might be improved by more robustly establishing the workflow and providing guidelines for how studies like this should be conducted in the future. 

      We sought to not only devise a rigorous approach to analyze our single cell data, but also showcase the utility of the approach in practice by highlighting targets for future research (i.e., Fgfrl1 and Tpx2).

      Our goal was to identify novel genes and we landed on these candidate genes (Fgfrl1 and Tpx2) because they had substantial data supporting their causality and they have yet to be fully characterized in the context of bone and BMD (line 295-297).

      With regard to establishing the workflow, we have included rationale for specific aspects of our approach throughout the paper. For example, Figure 2 itemizes each step of our network analysis, and we explain why each step is utilized throughout various parts of the results (e.g., lines 168-170, 179-181, 191-193, 202-203, 257-260, 276-277).

      We have added a statement advocating for large-scale scRNA-seq from genetically diverse samples and network analyses for future studies (line 436-438).

      Minor Points 

      (1) In the summary you use the word "trajectory". Trajectories for what? I assume the transition between cell types, but this is not clear. 

      We added text to clarify the use of trajectory in the summary (line 34).

      (2) This sentence: "By identifying networks enriched for genes implicated in GWAS we predicted putatively causal genes for hundreds of BMD associations based on their membership in enriched modules." is also not clear. Do you mean: "we predicted putatively causal genes by identifying clusters of co-expressed genes that were enriched for GWAS genes"? It is not clear how you identify the causal gene in the network. Is this just based on the hub gene?

      The aforementioned sentence has since been removed to streamline the introduction, as suggested by Reviewer 1.

      With regard to causal gene identification, it is not based on whether the gene is a hub gene. We prioritized a DDG (and its associated network) if it was a causal gene that we identified in our previous work as having an eQTL/sQTL in a GTEx tissue that colocalizes with human BMD GWAS.

      (3) Figure 3C. This is good but the labels are quite small. Would be good to make all the font sizes larger. 

      We have enlarged Figure 3C.

      (4) Line 341 in the Discussion should be "pseudotemporal". 

      We have edited “temporal” to “pseudotemporal”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this fMRI study, the authors wished to assess neural mechanisms supporting flexible "temporal construals". For this, human participants learned a story consisting of fifteen events. During fMRI, events were shown to them, and they were instructed to consider the event from "an internal" or from "an external" perspective. The authors found opposite patterns of brain activity in the posterior parietal cortex and the anterior hippocampus for the internal and the external viewpoint. They conclude that allocentric sequences are stored in the hippocampus, whereas egocentric sequences are used in the parietal cortex. The claims align with previous fMRI work addressing this question.

      We appreciate the reviewer's concise summary of our research. We would like to offer two clarifications to prevent any potential misunderstandings.

      First, the activity patterns in the parietal cortex and hippocampus are not entirely opposite across internal and external perspectives. Specifically, the activation level in the posterior parietal cortex shows a positive correlation with sequential distance during external-perspective tasks, but a negative correlation during internal-perspective tasks. In contrast, the activation level in the anterior hippocampus positively correlates with sequential distance, irrespective of the observer's perspective. Therefore, our results suggest that the parietal cortex, with its perspective-dependent activity, supports egocentric representation; the hippocampus, with its consistent activity across perspectives, supports allocentric representation.

      Second, while some of our findings align with previous fMRI studies, to our knowledge, no prior research has explicitly investigated how the neural representation of time may vary depending on the observer's viewpoint. This gap in the literature is the primary motivation for our current study.

      Strengths:

      The research topic is fascinating, and very few labs in the world are asking the question of how time is represented in the human brain. Working hypotheses have been recently formulated, and this work seems to want to tackle some of them.

      We appreciate the reviewer's acknowledgment of the theoretical significance of our study.

      Weaknesses:

      The current writing is fuzzy both conceptually and experimentally. I cannot provide a sufficiently well-informed assessment of the quality of the experimental work because there is a paucity of details provided in the report. Any future revisions will likely improve transparency.

      (1) Improving writing and presentation:

      The abstract and the introduction make use of loaded terms such as "construals", "mental timeline", "panoramic views" in very metaphoric and unexplained ways. The authors do not provide a comprehensive and scholarly overview of these terms, which results in verbiage and keywords/name-dropping without a clear general framework being presented. Some of these terms are not metaphors. They do refer to computational concepts that the authors should didactically explain to their readership. This is all the more important given that some statements in the Introduction are misattributed or factually incorrect; some statements lack attributions (uncited published work). Once the theory, the question, and the working hypothesis are clarified, the authors should carefully explain the task.

      We appreciate the reviewer's critique.

      The formulation of the scientific question in the introduction is grounded in the spatial construals of time hypothesis and conceptual metaphor theory (e.g., Traugott, 1978; Lakoff & Johnson, 1980; see recent reviews by Núñez & Cooperrider, 2013; Bender & Beller, 2014). These frameworks were originally developed through analyses of how spatial metaphors are used to describe temporal concepts in natural language. Consequently, it is theoretically motivated and largely unavoidable to introduce the two primary temporal construals—mental time travel and mental time watching— using metaphorical expressions.

      However, we do agree with the reviewer that the introduction in the original manuscript was overly long and that the working hypothesis was not clearly stated. In the revised manuscript, we have streamlined the introduction and substantially revised the following two paragraphs to clarify the formulation of our working hypothesis (Pages 5-6):

      “Recent studies have already begun to investigate the neural representation of the memorized event sequence (e.g., Deuker et al., 2016; Thavabalasingam et al., 2018; Bellmund et al., 2019, 2022; see reviews by Cohn-Sheehy & Ranganath, 2017; Bellmund et al., 2020). Yet, the neural mechanisms that enable the brain to construct distinct construals of an event sequence remain largely unknown. Valuable insights may be drawn from research in the spatial domain, which differentiates the neural representation in allocentric and egocentric reference frames. According to an influential neurocomputational model (Byrne et al., 2007; Bicanski & Burgess, 2018; Bicanski & Burgess, 2020), allocentric and egocentric spatial representations are dissociable in the brain—they are respectively implemented in the medial temporal lobe (MTL)—including the hippocampus—and the parietal cortex. Various egocentric representations in the parietal cortex derived from different viewpoints can be transformed and integrated into a unified allocentric representation and stored in the MTL (i.e., bottom-up process). Conversely, the allocentric representation in the MTL can serve as a template for reconstructing diverse egocentric representations across different viewpoints in the parietal cortex (i.e., top-down process).”

      “In line with the spatial construals of time hypothesis, several authors have recently suggested that such mutually engaged egocentric and allocentric reference frames (in the parietal cortex and the medial temporal lobe, respectively) proposed in the spatial domain might also apply to the temporal one (e.g., Gauthier & van Wassenhove, 2016ab; Gauthier et al., 2019, 2020; Bottini & Doeller, 2020). If this hypothesis holds, it could explain how the brain flexibly generates diverse construals of the same event sequence. Specifically, the hippocampus may encode a consistent representation of an event sequence that is independent of whether an individual adopts an internal or external perspective, reflecting an allocentric representation of time. In contrast, parietal cortical representations are expected to vary flexibly with the adopted perspective that is shaped by task demands, reflecting an egocentric representation of time.”

      In the revised manuscript, we also corrected statements in the Introduction that may have been misattributed (see Reviewer 2, comment 4(ii)) and added several relevant and important publications.

      (2) The experimental approach lacks sufficient details to be comprehensible to a general audience. In my opinion, the results are thus currently uninterpretable. I highlight only a couple of specific points (out of many). I recommend revision and clarification.

      (a) No explanation of the narrative is being provided. The authors report a distribution of durations with no clear description of the actual sequence of events. The authors should provide the text that was used, how they controlled for low-level and high-level linguistic confounds.

      We thank the reviewer for the suggestions. The event sequence for the odd-numbered participants is shown in the original Figure 1. In the revised manuscript, we added Figure 1—figure supplement 1 to illustrate the actual sequence of events for both odd- and even-numbered participants. We also added the narratives used in the reading phase of the learning procedure for both groups of participants (Figure 1—source data 1).

      To control for low-level linguistic confounds, we included the number of syllables as a covariate in the first-level general linear model in the fMRI analysis. To address high-level linguistic confounds, such as semantic information (which is difficult to quantify), we randomly assigned event labels to the 15 events twice, creating two counterbalanced versions for participants with even and odd numbers (see Comment 2b below).

      (b) The authors state, "we randomly assigned 15 phrases to the events twice". It is impossible to comprehend what this means. Were these considered stimuli? Controls? It is also not clear which event or stimulus is part of the "learning set" and whether these were indicated to be such to participants.

      We apologize for any confusion in the Results section and the legend of Figure 1. Our motivation was explained in the "Stimuli" section of the Methods. In the revised manuscript, we have clarified this by adding an explanation to the legend of Figure 1 and including the supplementary Figure 1: "To minimize potential confounds between the semantic content of the event phrases and the temporal structure of the events, we randomly assigned the phrases to the events, creating two versions for participants with even and odd ID numbers. Both versions can be seen in Figure 1—figure supplement 1 and Figure 1—source data 1."

      (c) The left/right counterbalancing is not being clearly explained. The authors state that there is counterbalancing, but do not sufficiently explain what it means concretely in the experiment. If a weak correlation exists between sequential position and distance, it also means that the position and the distance have not been equated within. How do the authors control for these?

      We thank the reviewer for highlighting this point and apologize for the lack of clarity in the original manuscript. In the current version (Page 40), we have provided further clarification: “We carefully selected two sets of 20 event pairs from the 210 possible combinations, assigning them to the odd and even runs of the fMRI experiment. Using a brute-force search, we identified 20 pairs in which sequential distance showed only weak correlations with positional information for both reference and target events (ranging from 1 to 15), as well as with behavioral responses (Same vs. Different or Future vs. Past, coded as 0 and 1), with all correlation coefficients below 0.2. At the same time, we balanced the proportion of correct responses across conditions: for the external-perspective task, Same/Different = 11/9 and 12/8; for the internal-perspective task, Future/Past = 12/8 and 8/12. Under these constraints, the sequential distances in both sets ranged from 1 to 5. To further mitigate spatial response biases, we pseudorandomized the left/right on-screen positions of the two response options within each task block, while ensuring an equal number of correct responses mapped to the left and right buttons (i.e., 10 per block).”
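
      For illustration only, the kind of constraint check described above can be sketched as follows. This is not the code used in the study: it replaces the exhaustive brute-force search with repeated random sampling, it checks only three of the stated correlations (sequential distance against reference position, target position, and the Future/Past answer), and it omits the Same/Different coding and the response-balancing constraints because the day-part boundaries are not given here. The function names, the number of tries, and the random seed are hypothetical.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def find_pair_set(n_events=15, n_pairs=20, max_distance=5, max_abs_r=0.2,
                  n_tries=20000, seed=0):
    """Return the first sampled set of (reference, target) pairs in which
    sequential distance is only weakly correlated with reference position,
    target position, and the Future/Past answer; None if no set is found."""
    rng = random.Random(seed)
    # All ordered pairs of distinct events: 15 * 14 = 210 combinations.
    all_pairs = list(itertools.permutations(range(1, n_events + 1), 2))
    # Assumption: restrict candidates to sequential distances of 1 to 5.
    candidates = [p for p in all_pairs if abs(p[1] - p[0]) <= max_distance]
    for _ in range(n_tries):
        pairs = rng.sample(candidates, n_pairs)
        ref = [r for r, _ in pairs]
        tgt = [t for _, t in pairs]
        dist = [abs(t - r) for r, t in pairs]
        past = [1 if t < r else 0 for r, t in pairs]  # Future = 0, Past = 1
        if all(abs(pearson(dist, v)) < max_abs_r for v in (ref, tgt, past)):
            return pairs
    return None


selected = find_pair_set()
print(selected)
```

      Random sampling is used here only because it keeps the sketch short; any search strategy that enforces the same thresholds would serve the same purpose.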

      The event pairs we selected already represent the best possible choice given all the criteria we aimed to satisfy. It is impossible to completely eliminate all potential correlations. For instance, if the target event occurs near the beginning of the day, it will tend to fall in the past, whereas if it occurs near the end of the day, it is more likely to fall in the future. To further ensure that the significant results were not driven by these weak confounding factors, we constructed another GLM that included three additional parametric modulators: the sequence position of the target event (ranging from 1 to 15) and the behavioral responses (Future vs. Past in the internal-perspective task; Same vs. Different in the external-perspective task, coded as 0 and 1). The significant findings were unaffected.

      (d) The authors used two tasks. In the "external perspective" one, the authors asked participants to report whether events were part of the same or a different part of the day. In the "internal perspective one", the authors asked participants to project themselves to the reference event and to determine whether the target event occurred before or after the projected viewpoint. The first task is a same/different recognition task. The second task is a temporal order task (e.g., Arzy et al. 2009). These two tasks are radically different and do not require the same operationalization. The authors should minimally provide a comprehensive comparison of task requirements, their operationalization, and, more importantly, assess the behavioral biases inherent to each of these tasks that may confound brain activity observed with fMRI.

      We understand the reviewer’s concern. We agree that there is a substantial difference between the two tasks. However, the primary goal of this study was not to directly compare these tasks to isolate a specific cognitive component. Rather, the neural correlates of temporal distance were first identified as brain regions showing a significant correlation between neural activity and temporal distance using the parametric modulation analysis. We then compared these neural correlates between the two tasks. Therefore, any general differences between the tasks should not be a confound for our main results. Our aim was to examine whether the hippocampal representation of temporal distance remains consistent across different perspectives, and whether the parietal representation of temporal distance varies as a function of the perspective adopted.

      Therefore, the main aim of our task manipulation was to ensure that participants adopted either an external or an internal perspective on the event sequence, depending on the task condition. In the Introduction (Pages 6–7), we clarify this manipulation as follows: “In the external-perspective task, participants localized events with respect to external temporal boundaries, judging whether the target event occurred in the same or a different part of the day as the reference event. In the internal-perspective task, participants were instructed to mentally project themselves into the reference event and localize the target event relative to their own temporal point, judging whether the target event happened in the future or the past of the reference event (see Methods for details of the scanning procedure).”

      We believe this task manipulation was successful. Behaviorally, the two tasks showed opposite correlations between reaction time and temporal distance, resembling the symbolic distance versus mental scanning effect. Neurally, contrasting the internal- and external-perspective tasks revealed activation of the default mode network, which is known to play a central role in self-projection (Buckner et al., 2017).

      (e) The authors systematically report interpreted results, not factual data. For instance, while not showing the results on behavioral outcomes, the authors directly interpret them as symbolic distance effects.

      Thank you for this comment. In the original paper, we reported the relevant statistics before our interpretation: “Sequential Distance was correlated positively with RT in the external-perspective task (z = 3.80, p < 0.001) but negatively in the internal-perspective task (z = -3.71, p < 0.001).” Because these statistics may have been easy to overlook, we have added a figure for the RT analysis to the revised manuscript.

      Crucially, the authors do not comment on the obvious differences in task difficulty in these two tasks, which demonstrates a substantial lack of control in the experimental design. The same/different task (task 1 called "external perspective") comes with known biases in psychophysics that are not present in the temporal order task (task 2 called "internal perspective"). The authors also did not discuss or try to match the performance level in these two tasks. Accordingly, the authors claim that participants had greater accuracy in the external (same/different) task than in the internal task, although no data are shown and provided to support this report. Further, the behavioral effect is trivialized by the report of a performance accuracy trade off that further illustrates that there is a difference in the task requirements, preventing accurate comparison of the two tasks.

      As noted in Question 2d, we acknowledge the substantial difference between the two tasks. However, the primary goal of this study was not to directly compare these tasks to isolate a specific cognitive component. Instead, we first identified the neural correlates of temporal distance as brain regions showing a significant correlation between neural activity and temporal distance, independent of task demands. We then compared these neural correlates across the two task conditions, which were designed to engage different temporal perspectives. Therefore, any general differences between the tasks should not be a confound for our main findings and interpretation.

      Our aim was to investigate whether the hippocampal representation of temporal distance remains consistent across different perspectives and whether the parietal representation of temporal distance varies as a function of the perspective adopted. We do not see how this double-dissociation pattern could be explained by differences in task difficulty.

      While we do not consider the overall difference in task difficulty between the two tasks to be a confounding factor, we acknowledge the potential confound posed by variations in task difficulty across temporal distances (1 to 5). This concern arises from the similarity between the activity patterns in the posterior parietal cortex and reaction time across temporal distances. To address this, we conducted control analyses to test this hypothesis (see the second and third points from Reviewer 2 for details).

      On page 8, we present the behavioral accuracy data: “Participants showed significantly higher accuracy in the external-perspective task than in the internal-perspective task (external-perspective task: M = 93.5%, SD = 4.7%; internal-perspective task: M = 89.5%, SD = 8.1%; paired t(31) = 3.33, p = 0.002).”

      All fMRI contrasts are also confounded by this experimental shortcoming, seeing as they are all reported at the interaction level across a task. For instance, in Figure 4, the authors report a significant beta difference between internal and external tasks. It is impossible to disentangle whether this effect is simply due to task difference or to an actual processing of the duration that differs across tasks, or to the nature of the representation (the most difficult to tackle, and the one chosen by the authors).

      We thank the reviewer for pointing out this important issue. Like temporal distance, the neural correlates of duration were not derived from a direct contrast between the two tasks. Instead, they were identified by detecting brain regions showing a significant correlation between neural activity and the implied duration of each event using the parametric modulation analysis. Therefore, what is shown in Figure 4 reflects the significant differences in these neural correlations with duration between the two tasks.

      The observed difference in the neural representation of duration between the two tasks was unexpected. In the original manuscript, we provided a post hoc explanation: “Since the external-perspective task in the current study encouraged the participants to compare the event sequence with the external parallel temporal landmarks, duration representation in the hippocampus may be dampened.”

      However, we agree that this difference might also arise from other factors distinguishing the two tasks. In the revised manuscript, we have clarified this possibility as follows: “The difference in duration representation between the two tasks remains open to interpretation. One possible explanation is that the hippocampus is preferentially involved in memory for durations embedded within event sequences (see review by Lee et al., 2020). In the internal-perspective task, participants indeed localized events within the event sequence itself. In contrast, the external-perspective task encouraged participants to compare the event sequence with external temporal landmarks, which may have attenuated the hippocampal representation of duration.”

      Conclusion:

      In conclusion, the current experimental work is confounded and lacks controls. Any behavioral or fMRI contrasts between the two proposed tasks can be parsimoniously accounted for by difficulty or attentional differences, not the claim of representational differences being argued for here.

      We hope that our explanations and clarifications above adequately address the reviewer’s concerns. We would like to reiterate that we did not directly compare the two tasks. Rather, we first identified the neural representations of sequential distance and duration, and then examined how these representations differed across tasks. It is unclear to us how the overall difference in task difficulty or attentional demands could lead to the observed pattern of results.

      By determining where the neural representations were consistent and where they diverged, we were able to differentiate brain regions that encode temporal information allocentrically from those that represent temporal information in a perspective-dependent manner, modulated by task demands.

      Reviewer #2 (Public review):

      Summary:

      Xu et al. used fMRI to examine the neural correlates associated with retrieving temporal information from an external compared to internal perspective ('mental time watching' vs. 'mental time travel'). Participants first learned a fictional religious ritual composed of 15 sequential events of varying durations. They were then scanned while they either (1) judged whether a target event happened in the same part of the day as a reference event (external condition); or (2) imagined themselves carrying out the reference event and judged whether the target event occurred in the past or will occur in the future (internal condition). Behavioural data suggested that the perspective manipulation was successful: RT was positively correlated with sequential distance in the external perspective task, while a negative correlation was observed between RT and sequential distance for the internal perspective task. Neurally, the two tasks activated different regions, with the external task associated with greater activity in the supplementary motor area and supramarginal gyrus, and the internal condition with greater activity in default mode network regions. Of particular interest, only a cluster in the posterior parietal cortex demonstrated a significant interaction between perspective and sequential distance, with increased activity in this region for longer sequential distances in the external task, but increased activity for shorter sequential distances in the internal task. Only a main effect of sequential distance was observed in the hippocampus head, with activity being positively correlated with sequential distance in both tasks. No regions exhibited a significant interaction between perspective and duration, although there was a main effect of duration in the hippocampus body with greater activity for longer durations, which appeared to be driven by the internal perspective condition. 
      On the basis of these findings, the authors suggest that the hippocampus may represent event sequences allocentrically, whereas the posterior parietal cortex may process event sequences egocentrically.

      We sincerely thank the reviewer for providing an accurate, comprehensive, and objective summary of our study.

      Strengths:

      The topic of egocentric vs. allocentric processing has been relatively under-investigated with respect to time, having traditionally been studied in the domain of space. As such, the current study is timely and has the potential to be important for our understanding of how time is represented in the brain in the service of memory. The study is well thought out, and the behavioural paradigm is, in my opinion, a creative approach to tackling the authors' research question. A particular strength is the implementation of an imagination phase for the participants while learning the fictional religious ritual. This moves the paradigm beyond semantic/schema learning and is probably the best approach besides asking the participants to arduously enact and learn the different events with their exact timings in person. Importantly, the behavioural data point towards successful manipulation of internal vs. external perspective in participants, which is critical for the interpretation of the fMRI data. The use of syllable length as a sanity check for RT analyses, as well as neuroimaging analyses, is also much appreciated.

      We thank the reviewer for the positive and encouraging comments.

      Weaknesses/Suggestions:

      Although the design and analysis choices are generally solid, there are a few finer details/nuances that merit further clarification or consideration in order to strengthen the readers' confidence in the authors' interpretation of their data.

      (1) Given the known behavioural and neural effects of boundaries in sequence memory, I was wondering whether the number of traversed context boundaries (i.e., between morning-afternoon, and afternoon-evening) was controlled for across sequential length in the internal perspective condition? Or, was it the case that reference-target event pairs with higher sequential numbers were more likely to span across two parts of the day compared to lower sequential numbers? Similarly, did the authors examine any potential differences, whether behaviourally or neurally, for day part same vs. day part different external task trials?

      We thank the reviewer for the thoughtful comments. When we designed the experiment, we minimized the correlation between the sequential distance between the target and reference events and whether the reference and target events occurred within the same or different parts of the day (coded as Same = 0, Different = 1). The point-biserial correlation coefficient between these two variables across all trials within the same run was kept below 0.2.
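
      As a reminder of what this coefficient measures, the point-biserial correlation is Pearson's r computed between a binary variable and a continuous one. A minimal sketch is given below; the function name and the four example trials are hypothetical and are not taken from the experiment.

```python
import math


def point_biserial(binary, values):
    """Point-biserial r between a 0/1 variable and a numeric variable.

    Uses r = (M1 - M0) / s_n * sqrt(n1 * n0 / n**2), where s_n is the
    population standard deviation of `values`. This equals Pearson's r
    computed directly on the two variables.
    """
    g1 = [v for b, v in zip(binary, values) if b == 1]
    g0 = [v for b, v in zip(binary, values) if b == 0]
    n, n1, n0 = len(values), len(g1), len(g0)
    mean = sum(values) / n
    s_n = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    m1, m0 = sum(g1) / n1, sum(g0) / n0
    return (m1 - m0) / s_n * math.sqrt(n1 * n0 / n ** 2)


# Hypothetical trials: Same = 0, Different = 1, with sequential distances.
same_or_different = [0, 0, 1, 1]
sequential_distance = [1, 2, 3, 4]
print(point_biserial(same_or_different, sequential_distance))
```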

      To investigate the effect of day-part boundaries on behavior, as well as the contribution of other factors, we fitted a new linear mixed-effects model incorporating four additional predictors: whether the target and reference events fell within the same or different parts of the day (Same vs. Different), whether the target event lay in the future or the past of the reference event (Future vs. Past), and the interactions of these two factors with Task Type (internal- vs. external-perspective task).

      The results are largely the same as those in the original table: there was a significant main effect of Syllable Length, and the interactions between Task Type and Sequential Distance and between Task Type and Duration remained significant. In addition, we found a significant interaction between Task Type and Same vs. Different.

      As shown in Figure 2—figure supplement 1, this Same vs. Different effect was in line with the effect of Sequential Distance, with event pairs in the same and different parts of the day behaving like short and long sequential distances, respectively. Because Sequential Distance was already included in the model, the day-part effect should reflect a boundary effect across day parts or a chunking effect within day parts; that is, sequential distances spanning different parts of the day were perceived as longer, whereas those within the same part of the day were perceived as shorter. We have incorporated these findings into the manuscript.

      Neurally, to further verify that the significant effects of sequential distance were not driven by its weak correlation with the Same/Different judgment or other potential confounding factors, we constructed another GLM that incorporated three additional parametric modulators: the sequence position of the target event (ranging from 1 to 15) and the behavioral responses (Future vs. Past in the internal-perspective task; Same vs. Different in the external-perspective task, coded as 0 and 1). The significant findings were unaffected.

      (2) I would appreciate further insight into the authors' decision to model their task trials as stick functions with duration 0 in their GLMs, as opposed to boxcar functions with varying durations, given the potential benefits of the latter (e.g., Grinband et al., 2008). I concur that in certain paradigms, RT is considered a potential confound and is taken into account as a nuisance covariate (as the authors have done here). However, given that RTs appear to be critical to the authors' interpretation of participant behavioural performance, it would imply that variations in RT actually reflect variations in cognitive processes of interest, and hence, it may be worth modelling trials as boxcar functions with varying durations.

      We appreciate the reviewer’s insightful comment on this important issue. Whether to control for RT’s influence on fMRI activation is indeed a long-standing paradox. On the one hand, RT reflects underlying cognitive processes and therefore should not be fully controlled for. On the other hand, RT can independently influence neural activity, as several brain networks vary with RT irrespective of the specific cognitive process involved—a domain-general effect. For example, regions within the multiple-demand network are often positively correlated with RT across different cognitive domains.

      Our strategy in the manuscript is to first present the results without including RT as a control variable and then examine whether the effects are preserved after controlling for RT. In the revised manuscript, we have clarified this approach (Page 13): “Here, changes in activity levels within the PPC were found to align with RT. Whether to control for RT’s influence on fMRI activation represents a well-known paradox. On the one hand, RT reflects underlying cognitive processes and therefore should not be fully controlled for. On the other hand, RT can independently influence neural activity, as several brain networks vary with RT irrespective of the specific cognitive process involved—a domain-general effect. For instance, regions within the multiple-demand network are often positively correlated with RT and task difficulty across diverse cognitive domains (e.g., Fedorenko et al., 2013; Mumford et al., 2024). To evaluate the second possibility, we conducted an additional control analysis by including trial-by-trial RT as a parametric modulator in the first-level model (see Methods). Notably, the same PPC region remained the only area in the entire brain showing a significant interaction between Task Type and Sequential Distance (voxel-level p < 0.001, cluster-level FWE-corrected p < 0.05). This finding indicates that PPC activity cannot be fully attributed to RT. Furthermore, we do not interpret the effect as reflecting a domain-general RT influence, as regions within the multiple-demand system—typically sensitive to RT and task difficulty—did not exhibit significant activation in our data.”

      The reason we did not use boxcar functions with varying durations in our original manuscript is that we also applied parametric modulation in the same model. In the parametric modulation, all parametric modulators inherit the onsets and durations of the events being modulated. Consequently, the modulators would also take the form of boxcar functions rather than stick functions—the height of each boxcar reflecting the parameter value and its length reflecting the RT. We were uncertain whether this approach would be appropriate, as we have not encountered other studies implementing parametric modulation in this manner.
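
      To make the distinction concrete, the sketch below builds the pre-convolution time course of a parametric modulator under both event models. It is an illustration under stated assumptions, not the analysis pipeline: the modulator is mean-centered (a common convention), time is discretized in 0.1 s bins, convolution with the hemodynamic response function is omitted, and the function name and the three example trials are hypothetical.

```python
def modulated_regressor(onsets, durations, values, n_samples, dt=0.1):
    """Pre-convolution time course of a parametric modulator.

    Each event contributes its mean-centered parameter value at its onset
    (duration 0: a one-bin stick) or over its whole duration (a boxcar).
    """
    mean_value = sum(values) / len(values)
    series = [0.0] * n_samples
    for onset, duration, value in zip(onsets, durations, values):
        start = int(round(onset / dt))
        n_bins = max(1, int(round(duration / dt)))  # a stick occupies one bin
        for i in range(start, min(start + n_bins, n_samples)):
            series[i] += value - mean_value
    return series


# Hypothetical trials: onsets (s), RTs (s), and sequential distances.
onsets = [2.0, 10.0, 18.0]
rts = [1.5, 2.0, 1.0]
distances = [1, 3, 5]

# Stick model: the modulator has height (distance - mean) for one bin.
stick = modulated_regressor(onsets, [0.0] * 3, distances, n_samples=300)
# Boxcar model: the same height is held for the length of the trial's RT.
boxcar = modulated_regressor(onsets, rts, distances, n_samples=300)
```

      In the boxcar version, the area under each event scales with both the parameter value and the RT, which is why the two quantities become harder to separate in that model.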

      For exploratory purposes, we also conducted a first-level analysis using boxcar functions with variable durations. The same PPC region remained the region showing the strongest interaction between Task Type and Sequential Distance across the entire brain. However, the cluster was slightly smaller and fell just short of cluster-level significance (voxel-level p < 0.001, cluster-level FWE-corrected p = 0.0610; see Author response image 1 below). The cross indicates the MNI coordinates at [38, –69, 35], identical to those shown in the main results (Figure 4A).

      Author response image 1.

      (3) The activity pattern across tasks and sequential distance in the posterior parietal cortex appears to parallel the RT data. Have the authors examined potential relationships between the two (e.g., individual participant slopes for RT across sequential distance vs. activity betas in the posterior parietal cortex)?

      We thank the reviewer for this helpful suggestion. As shown in Author response image 2, the interaction between Task Type and Sequential Distance was a stronger predictor of PPC activation than of RT. Because PPC activation and RT are measured on different scales, we compared their standardized slopes (standardized β), which express the change in a dependent variable, in standard deviations, for a one-standard-deviation increase in an independent variable. The standardized β for the Task Type × Sequential Distance interaction was −0.30 (95% CI [−0.42, −0.19]) for PPC activation and −0.21 (95% CI [−0.30, −0.13]) for RT. The larger standardized effect for PPC activation indicates that the Task Type × Sequential Distance interaction was a stronger predictor of neural activation than of behavioral RT.
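
      The idea behind a standardized slope can be illustrated with the simplest case of a single predictor, where the raw slope b is rescaled by the ratio of standard deviations. The values reported above come from mixed-effects models with interaction terms, so the sketch below only conveys the principle; the function name and the example data are hypothetical.

```python
import statistics


def standardized_slope(x, y):
    """Standardized slope for a single-predictor least-squares fit.

    The raw slope b = Sxy / Sxx is multiplied by sd(x) / sd(y), so the
    result is expressed in standard deviations of y per standard
    deviation of x and does not depend on the units of either variable.
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    raw_slope = sxy / sxx
    return raw_slope * statistics.stdev(x) / statistics.stdev(y)


# The same relationship measured in two different units gives one value.
x = [1, 2, 3, 4]
y_small_units = [1, 3, 2, 4]
y_large_units = [1000 * v for v in y_small_units]
print(standardized_slope(x, y_small_units))
print(standardized_slope(x, y_large_units))
```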

      Author response image 2.

      A more relevant question is whether PPC activation can be explained by temporal information (i.e., the sequential distance) independently of RT. To test this, we included both Sequential Distance and RT in the same linear mixed-effects model predicting PPC Activation Level. As shown in Author response table 1, although RT independently influenced PPC activation (F(1, 288) = 4.687, p = 0.031), the interaction between Task Type and Sequential Distance was a much stronger independent predictor (F(1, 290) = 19.319, p < 0.001).

      Author response table 1.

      PPC Activation Level Predicted by Sequential Distance and RT

      (3) Linear Mixed Model Formula: PPC Activation Level ~ 1 + Task Type * (Sequential Distance + RT) + (1 | Participant)
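
      Written out term by term, the formula above corresponds to the following model. The symbols are illustrative notation rather than notation from the manuscript: y_{ij} is the PPC activation level for observation i of participant j, T is Task Type (its numeric coding is not specified here), D is Sequential Distance, and u_j is the random intercept for participant j.

```latex
y_{ij} = \beta_0 + \beta_1 T_{ij} + \beta_2 D_{ij} + \beta_3 \mathrm{RT}_{ij}
       + \beta_4 \, T_{ij} D_{ij} + \beta_5 \, T_{ij} \mathrm{RT}_{ij}
       + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

      In this notation, the Task Type × Sequential Distance interaction discussed above is the coefficient \beta_4, and the independent contribution of RT is \beta_3.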

      (4) There were a few places in the manuscript where the writing/discussion of the wider literature could perhaps be tightened or expanded. For instance:

      (i) On page 16, the authors state 'The negative correlation between the activation level in the right PPC and sequential distance has already been observed in a previous fMRI study (Gauthier & van Wassenhove, 2016b). The authors found a similar region (the reported MNI coordinate of the peak voxel was 42, -70, 40, and the MNI coordinate of the peak voxel in the present study was 39, -70, 35), of which the activation level went up when the target event got closer to the self-positioned event. This finding aligns with the evidence suggesting that the posterior parietal cortex implements egocentric representations.' Without providing a little more detail here about the Gauthier & van Wassenhove study and what participants were required to do (i.e., mentally position themselves at a temporal location and make 'occurred before' vs. 'occurred after' judgements of a target event), it could be a little tricky for readers to follow why this convergence in finding supports a role for the posterior parietal cortex in egocentric representations.

      We appreciate the reviewer’s comments. In the revised manuscript, we have provided a more detailed explanation of Gauthier and van Wassenhove’s study (Page 17): “The negative correlation between the activation level in the right PPC and sequential distance has already been observed in a previous fMRI study by Gauthier & van Wassenhove (2016b). In their study, the participants were instructed to mentally position themselves at a specific time point and judge whether a target event occurred before or after that time point. The authors identified a similar brain region (reported MNI coordinates of the peak voxel: 42, −70, 40), closely matching the activation observed in the present study (MNI coordinates of the peak voxel: 39, −70, 35). In both studies, activation in this region increased as the target event approached the self-positioned time point, which aligns with the evidence suggesting that the posterior parietal cortex implements egocentric representations.”

      (ii) Although the authors discuss the Lee et al. (2020) review and related studies with respect to retrospective memory, it is critical to note that this work has also often used prospective paradigms, pointing towards sequential processing being the critical determinant of hippocampal involvement, rather than the distinction between retrospective vs. prospective processing.

      We sincerely thank the reviewer for highlighting these important points. In response, we have revised the section of the Introduction discussing the neural underpinnings of duration (Pages 3-4): “Neurocognitive evidence suggests that the neural representation of duration engages distinct brain systems. The motor system—particularly the supplementary motor area—has been associated with prospective timing (e.g., Protopapa et al., 2019; Nani et al., 2019; De Kock et al., 2021; Robbe, 2023), whereas the hippocampus is considered to support the representation of duration embedded within an event sequence (e.g., Barnett et al., 2014; Thavabalasingam et al., 2018; see also the comprehensive review by Lee et al., 2020).”

      (iii) The authors make an interesting suggestion with respect to hippocampal longitudinal differences in the representation of event sequences, and may wish to relate this to Montagrin et al. (2024), who make an argument for the representation of distant goals in the anterior hippocampus and immediate goals in the posterior hippocampus.

      We thank the reviewer for bringing this intriguing and relevant study to our attention. We have incorporated it into the Discussion of the revised manuscript (Page 21): “Evidence from the spatial domain has suggested that the anterior hippocampus (or the ventral rodent hippocampus) implements global and gist-like representations (e.g., larger receptive fields), whereas the posterior hippocampus (or the dorsal rodent hippocampus) implements local and detailed ones (e.g., finer receptive fields) (e.g., Jung et al., 1994; Kjelstrup et al., 2008; Collin et al., 2015; see reviews by Poppenk et al., 2013; Robin & Moscovitch, 2017; see Strange et al., 2014 for a different opinion). Recent evidence further shows that the organizational principle observed along the hippocampal long axis may also extend to the temporal domain (Montagrin et al., 2024). In that study, the anterior hippocampus showed greater activation for remote goals, whereas the posterior hippocampus was more strongly engaged for current goals, which are presumed to be represented in finer detail.”

      Reviewing Editor Comments:

      While both reviewers acknowledged the significance of the topic, they raised several important concerns. We believe that providing conceptual clarification, adding important methodological details, as well as addressing potential confounds will further strengthen this paper.

      We thank the editor for the suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Please, provide the actual ethical approval #.

      We have added the ethical approval number in the revised manuscript (P 36): “The ethical committee of the University of Trento approved the experimental protocol (Approval Number 2019-018),”

      (2) Thirty-two participants were tested. Please report how you estimated the sample size was sufficient to test your working hypothesis.

      We thank the editor for pointing out this omission. In the revised manuscript, we have added an explanation for our choice of sample size (p. 36): “The sample size was chosen to align with the upper range of participant numbers reported in previous fMRI studies that successfully detected sequence or distance effects in the hippocampus (N = 15–34; e.g., Morgan et al., 2011; Howard et al., 2014; Deuker et al., 2016; Garvert et al., 2017; Theves et al., 2019; Park et al., 2021; Cristoforetti et al., 2022).”

      (3) All MRI figures: please orient the reader; left/right should be stated.

      In the revised manuscript, we have added labels to all MRI figures to indicate the left and right hemispheres.

      (4) In Figure 3A-B, the clear lateralization of the activation is not discussed in the Results or in the Discussion. Was it predicted?

      We thank the editors for highlighting this important point regarding hemispheric lateralization. The right-lateralization observed in our findings is indeed consistent with previous literature. In the revised manuscript, we have expanded our discussion to emphasize this aspect more clearly.

      For the parietal cortex, we now note (Page 17-18): “The negative correlation between activation in the right posterior parietal cortex (PPC) and sequential distance has previously been reported in an fMRI study by Gauthier and van Wassenhove (2016b). In their paradigm, participants were instructed to mentally position themselves at a specific time point and judge whether a target event occurred before or after that point. The authors identified a similar region (peak voxel MNI coordinates: 42, −70, 40), closely corresponding to the activation observed in the present study (peak voxel MNI coordinates: 39, −70, 35). In both studies, activation in this region increased as the target event approached the self-positioned time point, consistent with evidence suggesting that the posterior parietal cortex supports egocentric representations. Neuropsychological studies have further shown that patients with lesions in the bilateral or right PPC exhibit ‘egocentric disorientation’ (Aguirre & D’Esposito, 1999), characterized by an inability to localize objects relative to themselves (e.g., Case 2: Levine et al., 1985; Patient DW: Stark, 1996; Patients MU: Wilson et al., 1997, 2005).”

      For the hippocampus, we have added (Page 19): “Previous research has shown that hippocampal activation correlates with distance (e.g., Morgan et al., 2011; Howard et al., 2014; Garvert et al., 2017; Theves et al., 2019; Viganò et al., 2023), and that distributed hippocampal activity encodes distance information (e.g., Deuker et al., 2016; Park et al., 2021). Most studies have reported hippocampal effects either bilaterally or predominantly in the right hemisphere, whereas only one study (Morgan et al., 2011) found the effect localized to the left hippocampus.”

    1. Author response:

      We thank the editors and reviewers for their thoughtful, constructive, and fair evaluation of our manuscript. We appreciate the recognition of the value of an end-to-end proteogenomics framework integrating long-read transcriptomics with deep proteomic analysis, and we are grateful for the specific guidance on how to strengthen clarity, generality, and impact for a broad scientific readership. We outline below the key revisions we plan to undertake in response to the public reviews.

      Reviewer #1

      We thank the reviewer for their positive assessment of the relevance of this work to Ewing sarcoma and cancer proteogenomics.

      Scope and generality.

      We agree that analysis of a single cell line limits generalization. In the revised manuscript, we will extend the ProteomeGenerator3 workflow to additional tumor specimens, including Ewing sarcoma tumors, to assess reproducibility and biological relevance beyond a single test cancer cell line.

      Definitions and analytical clarity.

      We will clarify definitions of non-canonical transcripts, alternative splice isoforms, and neogenes, and explicitly distinguish these categories throughout the manuscript. We will add a summary flow diagram that tracks transcripts through classification, ORF prediction, and proteoform detection, clarifying how Figures 4B and 4D relate.

      Proteoform filtering and confidence.

      To improve transparency, we will add a step-wise schematic summarizing how candidate non-canonical proteoforms are filtered to a high-confidence subset, including SwissProt comparison, BLASTp filtering, peptide uniqueness, and competitive database searches.

      Validation.

      We agree that orthogonal validation is important. We will include additional analyses of non-canonical proteoforms detected recurrently in additional tumor specimens to provide an empirical estimate of reliably detectable non-canonical proteoforms.

      Supplementary Figure 5.

      We will revise the presentation and explanation of this figure to avoid misinterpretation, including analyses focused specifically on non-canonical sequence segments and inclusion of tumor samples for direct comparison.

      Reviewer #2

      We thank the reviewer for placing this work in context with our prior ProteomeGenerator publications and for their guidance on framing the manuscript for a broad audience.

      Emphasizing the central conceptual advance.

      We agree that the primary innovation is the use of long-read transcriptomics to generate sample-specific proteogenomic databases. In the revised manuscript, we will directly compare long-read-derived and short-read-derived databases applied to the same samples and proteomic data, explicitly demonstrating where long-read sequencing enables discovery inaccessible to short-read approaches.

      Manuscript reorganization.

      We will substantially revise the manuscript to foreground the biological and conceptual consequences of long-read-enabled proteogenomics, using focused examples. Detailed descriptions of protease selection, fractionation, and acquisition optimization will be moved to supplementary methods, while retaining key conclusions about their impact on discovery.

      Positioning of technical advances.

      We will frame multi-protease and acquisition strategies as general principles required for unbiased proteoform discovery, rather than as static technical prescriptions, emphasizing their relevance across evolving proteomics platforms.

      Overall Significance

      In the revised manuscript, we will more clearly articulate that this work establishes long-read-informed, sample-specific proteogenomics as a discovery-grade framework, revealing cancer-specific proteoforms that are systematically invisible to reference-based and short-read-driven approaches, with broad implications for cancer biology and biomarker discovery.

      We thank the editors and reviewers again for their constructive feedback, which we believe will substantially strengthen the clarity and broad impact of this work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a well-structured and interesting manuscript that investigates how herbivorous insects, specifically whiteflies and planthoppers, utilize salivary effectors to overcome plant immunity by targeting the RLP4 receptor.

      Strengths:

      The authors present a strong case for the independent evolution of these effectors and provide compelling evidence for their functional roles.

      Weaknesses:

      Western blot evidence for effector secretion is weak. The possibility of contamination from insect tissues during the sample preparation should be avoided.

      Below are some specific comments and suggestions to strengthen the manuscript.

      Thank you very much for your comments. We have carefully revised the MS following your valuable suggestions and comments.

      (1) Western blot evidence for effector secretion:

      The western blot evidence in Figure 1, which aims to show that the insect protein is secreted into plants, is not fully convincing. The band of the expected size (~30 kDa) in the infested tissues is very weak. Furthermore, the high and low molecular weight bands that appear in the infested tissues do not match the size of the protein in the insects themselves, and a high molecular weight band also appears in the uninfested control tissues. It is difficult to draw a definitive conclusion that this protein is secreted into the plants based on this evidence. The authors should also address the possibility of contamination from insect tissues during the sample preparation and explain how they have excluded this possibility.

      Thank you for pointing this out. One or two bands between 25 and 35 kDa were specifically detected in B. tabaci-infested plants but not in non-infested plants, and the smaller, high-intensity band is the same size as BtRDP in salivary glands. This experiment has been repeated six times. In the current version, we repeated the experiment and included a salivary gland sample as a positive control, which showed the same molecular weight as the specific band in the infested sample. Notably, in the experiment shown in the current version, only the smaller, high-intensity band appeared, whereas the low-intensity band did not. The detection of a protein within infested plant tissue is a key criterion for validating the secretion of salivary effectors, an approach supported by numerous studies in this field. Furthermore, our previous LC-MS/MS analysis of B. tabaci watery saliva identified six unique peptides matching BtRDP, providing independent evidence for its presence in saliva. Therefore, as we now state in the manuscript, “the detection of BtRDP in infested plants (Fig. 1a) and in watery saliva (Fig. S1) collectively indicates that BtRDP is a salivary protein”.

      Regarding the higher molecular weight band that is present in both infested and non-infested samples, we agree that it most likely represents a non-specific band, which is a common occurrence in Western blot assays. Such bands are sometimes used to indicate comparable sample loading. To address the possibility of contamination by insect tissues, we wish to clarify that all insects and deposited eggs were carefully removed from the infested leaves prior to sample processing. Moreover, BtRDP is undetectable at the egg stage, so no BtRDP-associated band would be detected even if egg contamination occurred. We have revised the Methods section to explicitly state this procedure:

      “After feeding, the eggs deposited on the infested tobacco leaves were removed. The leaves showing no visible insect contamination were immediately frozen in liquid nitrogen and ground to a fine powder.”

      (2) Inconsistent conclusion (Line 156 and Figure 3c):

      The statement in line 156 is inconsistent with the data presented in Figure 3c. The figure clearly shows that the LRR domain of the protein is the one responsible for the interaction with BtRDP, not the region mentioned in the text. This is a critical misrepresentation of the experimental findings and must be corrected. The conclusion in the text should accurately reflect the data from the figure.

      We apologize for any confusion caused by the original phrasing. In our previous manuscript, the description “NtRLP4 without signal peptides and transmembrane domains” referred specifically to the truncated construct NtRLP4<sub>(23-541)</sub> used in the experiment. To prevent any misunderstanding, we have revised the sentence in the updated version to state explicitly: “Point-to-point Y2H assays reveal that NtRLP4<sub>(23-541)</sub> (a truncated version lacking the signal peptide and transmembrane domains) interacts with BtRDP<sup>-sp</sup>”.

      (3) Role of SOBIR1 in the RLP4/SOBIR1 Complex:

      The authors demonstrate that the salivary effectors destabilize the RLP4 receptor, leading to a decrease in its protein levels and a reduction in the RLP4/SOBIR1 complex. A key question remains regarding the fate of SOBIR1 within this complex. The authors should clarify what happens to the SOBIR1 protein after the destabilization of RLP4. Does SOBIR1 become unbound, targeted for degradation itself, or does it simply lose its function without RLP4? This would provide further insight into the mechanism of action of the effectors.

      Thank you for the suggestion. In the current version, we assessed the impact of BtRDP on NtSOBIR1 following NtRLP4 destabilization. The results showed that while NtRLP4-myc accumulation was markedly reduced, NtSOBIR1-flag levels remained unchanged, suggesting that destabilization of NtRLP4 did not affect NtSOBIR1 accumulation.

      (4) Clarification on specificity and evolutionary claims:

      The paper's most significant claim is that the effectors from both whiteflies and planthoppers "independently evolved" to target RLP4. While the functional data is compelling, this evolutionary claim would be more convincing with stronger evidence. Showing that two different effector proteins target the same host protein is a fascinating finding but without a robust phylogenetic analysis, the claim of independent evolution is not fully supported. It would be valuable to provide a more detailed evolutionary analysis, such as a phylogenetic tree of the effector proteins, showing their relationship to other known insect proteins, to definitively rule out a shared, but highly divergent, common ancestor.

      We appreciate the reviewer’s valuable suggestion to investigate a potential evolutionary link between BtRDP and NlSP104. Our initial analysis already indicated no detectable sequence similarity. To address this point more thoroughly, we attempted a phylogenetic analysis. However, we were unable to generate a meaningful alignment due to a complete lack of conserved amino acid sequences. Therefore, we conducted a comparative genomics analysis by blasting both proteins against the genomic or transcriptomic data of 30 diverse insect species. This analysis revealed that RDP is exclusively present in Aleyrodidae species, and SP104 is exclusively present in Delphacidae species (Table S1). Based on the absence of sequence similarity, their distinct protein structures, and their lineage-specific distributions, we conclude that BtRDP and NlSP104 are highly unlikely to be homologous and thus did not originate from a common ancestor.

      (5) Role of SOBIR1 in the interaction:

      The results suggest that the effectors disrupt the RLP4/SOBIR1 complex. It is not entirely clear if the effectors are specifically targeting RLP4, SOBIR1, or both. Further experiments, such as a co-immunoprecipitation assay with just RLP4 and the effector, could clarify if the effector can bind to RLP4 in the absence of SOBIR1. This would help to definitively place RLP4 as the primary target.

      We appreciate the reviewer’s insightful comments regarding whether the effector preferentially targets RLP4, SOBIR1, or both. In our study, we conducted reciprocal co-immunoprecipitation assays using RLP4 and BtRDP as controls. These assays showed that BtRDP interacts with RLP4 but does not interact with SOBIR1, supporting the conclusion that SOBIR1 is unlikely to be a direct target of BtRDP. We fully agree that testing the interaction between RLP4 and BtRDP in the absence of SOBIR1 would further strengthen the conclusion. However, we were unable to obtain N. tabacum SOBIR1 knockout mutants, and therefore could not experimentally assess whether the RLP4–BtRDP interaction persists in planta without SOBIR1. Nevertheless, our yeast two-hybrid assays demonstrate that RLP4 and BtRDP can directly interact, indicating that their association does not strictly depend on SOBIR1. Together, these results support the interpretation that RLP4 is the primary target of BtRDP, while SOBIR1 is not directly engaged by the effector.

      (6) Transcriptome analysis (Lines 130-143):

      The transcriptome analysis section feels disconnected from the rest of the manuscript. The findings, or lack thereof, from this analysis do not seem to be directly linked to the other major conclusions of the paper. This section could be removed to improve the manuscript's overall focus and flow. If the authors believe this data is critical, they should more clearly and explicitly connect the conclusions of the transcriptome analysis to the core findings about the effector-RLP4 interaction.

      Thank you for the suggestion. As you and Reviewer #2 pointed out, the transcriptomic analysis was not closely linked to the major conclusions of the paper and yielded little information. Therefore, we have removed these analyses to improve the manuscript’s overall focus and flow.

      (7) Signal peptide experiments (Lines 145 and beyond):

      The experiments conducted with the signal peptide (SP) are questionable. The SP is typically cleaved before the protein reaches its final destination. As such, conducting experiments with the SP attached to the protein may have produced biased observations and could lead to unjustified conclusions about the protein's function within the plant cell. We suggest the authors remove the experiments that include the signal peptide.

      Thank you for pointing this out. The SP was retained to direct the target proteins to the extracellular space of plant cells. Theoretically, the SP is cleaved in the mature protein. This methodology is widely used in effector biology. For example, the SP directs Meloidogyne graminicola Mg01965 to the apoplast, where it functions in immune suppression, whereas Mg01965 without the SP fails to exert this function (DOI: 10.1111/mpp.12759). In our study, the SP of BtRDP was expected to guide the target protein to the extracellular space, facilitating its interaction with RLP4. Moreover, the observed protein sizes of BtRDP with and without the SP in transgenic plants were identical, suggesting successful SP cleavage. Therefore, we have retained the experiments involving the SP in the current version.

      (8) Overly strong conclusion and unclear evidence (Line 176):

      The use of the word "must" on line 176 is very strong and presents a definitive conclusion without sufficient evidence. The authors state that the proteins must interact with SOBIR1, but they do not provide a clear justification for this claim. Is SOBIR1 the only interaction partner for NtRLP4? The authors should provide a specific reason for focusing on SOBIR1 instead of demonstrating an interaction with NtRLP4 first. Additionally, do BtRDP or NlSP694 also interact with SOBIR1 directly? The authors should either tone down their language to reflect the evidence or provide a clearer justification for this strong claim.

      Thank you for pointing this out. In the current version, the word “must” has been toned down to “may” due to insufficient supporting evidence. In this study, SOBIR1 was chosen because it has been widely reported to be required for the function of several RLPs involved in innate immunity. However, it remains unclear whether SOBIR1 is the only interaction partner of NtRLP4. In the current version, we have clarified the rationale for focusing on SOBIR1 prior to the experiments: “The receptor-like kinase SOBIR1, which contains a kinase domain, has been widely reported to be required for the function of RLPs involved in innate immunity (Gust & Felix, 2014)”. We have also discussed that “Although NtRLP4 interacts with SOBIR1, this alone does not confirm that it operates strictly through this canonical module. Evidence from other RLPs shows that co-receptor usage can be flexible, and some RLPs function partly or conditionally independent of SOBIR1. Therefore, a more definitive assessment of NtRLP4 signaling will therefore require genetic dissection of its co-receptor dependencies, including but not limited to SOBIR1.” In addition, the direct interaction between BtRDP and SOBIR1 was experimentally tested, and the results showed that BtRDP failed to interact with SOBIR1.

      Minor Comments

      (9) The statement in the abstract, "However, it remains unclear how these invaders are able to overcome receptor perception and disable the plant signaling pathways," is not entirely accurate. The fields of effector biology and host-pathogen interactions have provided significant insight into how pathogens and pests manipulate both Pattern-Triggered Immunity (PTI) and Effector-Triggered Immunity (ETI). While the specific mechanism described in this paper is novel, the broader claim that the field is unclear on these processes weakens the initial hook of the paper. A more precise framing of the problem would be beneficial, perhaps by stating that the specific mechanisms used by these particular herbivores to target RLP4 were previously unknown.

      Thank you for this insightful comment. We agree that the original statement in the abstract overstated the lack of understanding in the field. In the current version, we have refined the sentence to more accurately reflect the current state of knowledge, emphasizing that while microbial suppression of plant immunity has been extensively studied, the strategies used by herbivorous insects to overcome receptor-mediated defenses remain less understood. The revised sentence now reads as follows: “Although the mechanisms used by microbial pathogens to suppress plant immunity are well studied, how herbivorous insects overcome receptor-mediated defenses remains unclear”.

      (10) The introduction is heavily focused on Pattern Recognition Receptors (PRRs), which, while central to the paper's findings, gives a somewhat narrow view of the plant's defense against herbivores. It would be beneficial to briefly acknowledge the broader context of plant defenses, such as physical barriers, direct chemical toxicity, and indirect defenses, before narrowing the focus to the specific molecular interactions of PRRs that are the core of this study. This would provide a more complete picture of the "arms race" between plants and herbivores.

      Thank you for this valuable suggestion. We agree that the original introduction focused too narrowly on pattern-recognition receptors (PRRs). In the current version, we have expanded the introductory section to provide a broader overview of plant defense mechanisms. Specifically, we now acknowledge the multiple layers of plant defenses, including physical barriers (e.g., cuticle and cell wall), chemical defenses (e.g., toxic secondary metabolites and anti-nutritive compounds), and indirect defenses mediated by herbivore-induced volatiles. This addition provides a more complete context for understanding the molecular interactions discussed in this study. The revised paragraph now reads as follows: “Plants have evolved sophisticated defense systems to survive constant attacks from pathogens and herbivorous insects. These defenses operate at multiple levels, including physical barriers such as the cuticle and cell wall, chemical defenses involving toxic secondary metabolites and anti-nutritive compounds, and indirect defenses that attract natural enemies of herbivores through the emission of herbivore-induced volatiles. Beyond these general strategies, plants also rely on highly specialized molecular immune responses that allow them to detect and respond rapidly to invaders.”

      (11) The figure legends are generally clear, but some could be more detailed. For instance, in Figure 2, it would be helpful to explicitly state what each bar represents in the graph and to include the statistical test used. Please ensure all panels in all figures have clear labels.

      Thank you for this helpful suggestion. We have revised the legend of Fig. 2 and other figures to provide more detailed information for each panel. Specifically, we now explicitly describe what each bar represents in the graphs and specify the statistical test used. In addition, we ensured that all panels are clearly labeled. These changes improve clarity and allow readers to better interpret the data.

      (12) The methods section is comprehensive, but it would be helpful to include more specifics on the statistical analyses used. For example, the type of statistical test (e.g., t-test, ANOVA) and the software used should be mentioned for each experiment.

      Thank you for your suggestion. We have revised the Methods section (Statistical analysis) to provide more detailed information on the statistical analysis used for each experiment.

      (13) The manuscript's overall impact is weakened by the inclusion of unnecessary words and a few grammatical issues. A focused revision to tighten the language would make the major findings stand out more clearly. For example, on page 2, line 18, "in whitefly Bemisia tabaci, BtRDP is an Aleyrod..." seems to have an incomplete sentence. A thorough proofreading for typos and grammatical errors is highly recommended to improve the overall readability.

      Thank you for your suggestion. We have carefully revised the abstract and the manuscript to improve clarity, readability, and grammatical correctness. In addition, we sought the assistance of a professional English editor to thoroughly proofread and polish the manuscript, ensuring that the language meets high academic standards.

      (14) The discussion section is strong, but it could benefit from a more explicit connection between the findings and the broader ecological implications. For instance, how might the independent evolution of these effectors in different insect species impact plant-insect co-evolutionary dynamics?

      We thank the reviewer for the valuable suggestion. In the current version, we have added a paragraph in the Discussion section highlighting the broader ecological and evolutionary implications of our findings. Specifically, we discuss how the independent evolution of RLP4-targeting effectors in different insect lineages may drive plant-insect co-evolution, influence selection pressures on both plants and herbivores, and potentially shape defense diversification across plant communities. This addition helps to link our molecular findings to ecological outcomes and co-evolutionary dynamics.

      (15) The sentence on line 98, which reads " A few salivary proteins have been reported to attach to salivary sheath after secretion" seems to serve an unclear purpose in the introduction. It would be helpful for the authors to clarify its relevance to the surrounding context or to the paper's overall argument. Its inclusion currently disrupts the flow of the introduction and makes it difficult for the reader to understand its intended purpose.

      We thank the reviewer for the comment. We have revised the paragraph to clarify the relevance of salivary sheath localization to the study. Specifically, we now introduce the role of the salivary sheath as a potential scaffold for effector delivery and explicitly link previous reports of sheath-associated salivary proteins to our observation that BtRDP localizes to the salivary sheath after secretion.

      (16) The writing in lines 104-106 is both grammatically inconsistent and overly wordy. The authors switch between present and past tense ("is" and "was"), and the sentences could be made more concise to improve the clarity and flow of the text. Also check entire paper.

      We thank the reviewer for pointing this out. We have revised the sentence to improve grammatical consistency and clarity, splitting it into two concise statements. In addition, we have thoroughly checked the entire manuscript for similar tense inconsistencies and overly wordy sentences, and have made revisions throughout to ensure consistent past tense usage and improved readability.

      (16) The sentences on lines 111-113 are quite wordy. The core conclusion, which is that the protein affects the insect's feeding probe, could be expressed more simply and directly to improve clarity and flow. I suggest rephrasing this section to be more concise and to highlight the primary finding without the added language.

      We thank the reviewer for the helpful suggestion. We have revised the sentences to make them more concise and to emphasize the main finding that BtRDP influences the whitefly’s feeding behavior, as follows: “Compared with the dsGFP control, dsBtRDP-treated B. tabaci showed a marked reduction in phloem ingestion and a longer pathway duration, indicating that BtRDP is required for efficient feeding (Fig. 2c).”

      (17) On line 118, the authors mention "subcellular location." It is not clear where the protein is localized. The authors should explicitly state the specific subcellular compartment of the protein, as this is crucial for understanding its function and interaction with other proteins.

      We thank the reviewer for this valuable comment. To clarify the subcellular localization of BtRDP, we have revised the manuscript accordingly. In the transgenic line overexpressing full-length BtRDP including the signal peptide (oeBtRDP), the protein is expected to localize to the apoplast (extracellular space), whereas in the line expressing BtRDP without the signal peptide (oeBtRDP<sup>-sp</sup>), it is likely retained in the cytoplasm.

      (18) Lines 121-128, the description of the fecundity and choice assays in this section is overly wordy. The authors should present the main conclusion of these experiments more directly and concisely. The key finding is that the protein affects feeding behavior; this central point is somewhat lost in the detailed, and sometimes repetitive, phrasing.

      We thank the reviewer for this suggestion. In the revised manuscript, we have simplified the description of the fecundity and two-choice assays to highlight the main conclusion, as follows: “Fecundity and two-choice assays showed that BtRDP, whether localized in the apoplast (oeBtRDP) or cytoplasm (oeBtRDP<sup>-sp</sup>), enhanced whitefly settling and oviposition compared with EV controls (Fig. 2d-i; Fig. S10), indicating that BtRDP promotes whitefly feeding behavior regardless of its subcellular location.”

      (19) Line 148, the manuscript mentions experiments involving transformation, but the transformation efficiency is not provided. Please include the transformation efficiency for all transformation experiments, as this is crucial for the reproducibility of the results.

      We thank the reviewer for raising this point. We would like to clarify that no transformation experiments were performed in this section. The experiments described involved Y2H screening using BtRDP<sup>-sp</sup> as bait to identify interacting proteins from an N. benthamiana cDNA library. Therefore, there is no transformation efficiency to report.

      (20) Line 159, the manuscript refers to a sequence similarity around line 159 but does not provide the specific data. It is important to show the actual sequence similarity, perhaps in a supplementary figure or table, to support the claims being made.

      We thank the reviewer for this suggestion. To support our statement regarding sequence similarity, we have added the corresponding alignment figure as Fig. S11.

      (21) Line 159, the manuscript refers to "three randomly selected salivary proteins." It is unclear from where these proteins were selected. The authors should clarify the source of this selection (e.g., a specific database or a previous study) to ensure the methodology is transparent and the results are reproducible.

      We thank the reviewer for raising this point. These proteins were selected based on previous reports (10.1093/molbev/msad221; 10.1111/1744-7917.12856). In the current version, we provide the accession numbers of these proteins in the MS.

      (22) Line 160, the description "NtcCf9 without signal peptide and transmembrane domains" is difficult to understand. It would be clearer and more consistent to use a term like "truncated NtcCf9" and then specify which domains were removed, as this is a standard practice in molecular biology for describing protein constructs.

      We thank the reviewer for this suggestion. We have revised the manuscript to describe the construct as “truncated NtCf9” and specified that the signal peptide and transmembrane domains were removed.

      (23) The phrase "incubated with anti-flag beads" on line 172 is a detail of a routine method. Such details are more appropriate for the Methods section rather than the main text, which should focus on the results and their implications. Please remove such descriptions from the main text to improve readability and flow.

      We thank the reviewer for this suggestion. We have removed the methodological detail from the main text to improve readability. We have also checked for similar details throughout the MS.

      I am excited about the potential of this work and look forward to seeing the current version.

      We sincerely thank the reviewer for the positive feedback and encouragement. We appreciate your time and thoughtful comments.

      Reviewer #2 (Public review):

      Summary:

      The authors tested an interesting hypothesis that white flies and planthoppers independently evolved salivary proteins to dampen plant immunity by targeting a receptor-like protein.

      Strengths:

      The authors used a wide range of methods to dissect the function of the white fly protein BtRDP and identify its host target NtRLP4.

      Thank you very much for your comments. We have carefully revised the MS following your valuable suggestions and comments.

      Weaknesses:

      (1) Serious concerns about protein work.

      I did not find the indicated protein bands for anti-BtRDP in Figures 1a and 1b in the original blot pictures shown in Figure S30. In Figure 1a, I can't get the point of showing an unspecific protein band with a size of ~190 kD as a loading control for a protein of ~ 30 kD.

      The data discrepancy led me to check other Western blot pictures. Similarly, Figures 2d, 3b, 3d, and S15b (anti-Myc) do not correspond to the original blots shown. In addition, the anti-Myc blot in Figure 4i, all blot pictures in Figures 5b, 5h, and S19a appeared to be compressed vertically. These data raised concerns about the quality of the manuscript.

      Blots shown in Figure 3d, 4f, 4g, and 4h appeared to be done at a different exposure rate compared to the complete blot shown in Figure S30. The undesirable connection between Western blot pictures shown in the figures and the original data might be due to the reduced quality of compressed figures during submission. Nevertheless, clarification will be necessary to support the strength of the data provided.

      We sincerely thank the reviewer for carefully examining our Western blot data and for pointing out these inconsistencies. The discrepancy between the figures in the main text and the original blots (Figure S30) resulted from an oversight during manuscript revision. This manuscript had undergone multiple rounds of revision after submission to another journal. During this process, the main figures and supplementary figures were updated separately, and we mistakenly failed to replace the original blot files with the corresponding current versions.

      Regarding the different exposure, the blots shown in the main text were adjusted for overall contrast and brightness to enhance band visibility and presentation clarity, whereas the original images in Figure S30 were raw, unprocessed scans directly from the imaging system. For example, in Author response image 1 below, the output figure was adjusted for overall contrast and brightness to visualize the loading of the input sample. Such adjustments, applied across the entire image, are acceptable image processing (https://www.nature.com/nature-portfolio/editorial-policies/image-integrity).

      Author response image 1.

      The same figure with brightness and contrast changes across the entire image.

      Regarding the vertical compression, in the previous version some images were vertically compressed for layout purposes to make the composite figures appear more visually balanced. However, after consulting relevant publication guidelines, we realized that such one-dimensional compression is discouraged by certain journals because it may alter the original aspect ratio of the image. Therefore, in the revised manuscript, we have avoided any non-proportional scaling and retained the original aspect ratio of all images.

      We have now carefully rechecked all Western blot data, replaced the outdated raw blot images with the correct corresponding ones, avoided vertical compression, and ensured that the processed figures in the main text match their original data. The revised supplementary figures now accurately reflect the raw experimental results.

      (2) Misinterpretation of data.

      I am afraid the authors misunderstood pattern-triggered immunity through receptor-like proteins. It is true that several LRR-type RLPs constitutively associate with SOBIR1, and further recruit BAK1 or other SERKs upon ligand binding. One should not take it for granted that every RLP works this way. To test the hypothesis that NtRLP4 confers resistance to B.tabaci infestation, the author compared transcriptional profiles between an EV plant line and an RLP4 overexpression line. If I understood the methods and figure legends correctly, this was done without B. tabaci treatment. This experimental design is seriously flawed. To provide convincing genetic evidence, independent mutant lines (optionally independent overexpression lines) in combination with different treatments will be necessary. Otherwise, one can only conclude that overexpressing the RLP4 protein generated a nervous plant. In addition, ROS burst, but not H2O2 accumulation, is a common immune response in pattern-triggered immunity.

      We agree with the reviewer that not every RLP functions through the same mechanism as the canonical SOBIR1–BAK1 pathway. In the current version, we further examined the interaction between the whitefly salivary protein and SOBIR1, and found that they do not interact. However, our interaction assays clearly demonstrated that NtRLP4 does interact with SOBIR1. Whether NtRLP4 functions through, or exclusively through, SOBIR1 remains uncertain, and we have emphasized this limitation in the Discussion section as follows: “Although NtRLP4 interacts with SOBIR1, this alone does not confirm that it operates strictly through this canonical module. Evidence from other RLPs shows that co-receptor usage can be flexible, and some RLPs function partly or conditionally independent of SOBIR1 [39]. Therefore, a more definitive assessment of NtRLP4 signaling will require genetic dissection of its co-receptor dependencies, including but not limited to SOBIR1.”

      Regarding the transcriptome analysis, our original aim was to explore why B. tabaci showed such a pronounced preference among tobacco plants. As this preference was assessed using uninfested plants, we also performed transcriptome sequencing using plants without B. tabaci treatment. The enrichment analysis demonstrated that the majority of up-regulated DEGs were associated with plant–pathogen interaction, environmental adaptation, MAPK signaling, and signal transduction pathways, while down-regulated DEGs were enriched in glutathione, carbohydrate, and amino acid metabolism. Notably, many DEGs were annotated as RLK/RLPs or WRKY transcription factors, most of which were upregulated, suggesting an enhanced defense state in the NtRLP4-overexpressing plants. The altered expression of JA- and SA-related genes (e.g., upregulation of FAD7 and downregulation of PAL and NPR1) further supported this enhanced defense and hormonal crosstalk. We agree that combining overexpression or knockout lines with insect infestation treatments would provide more direct genetic evidence for NtRLP4-mediated resistance, and we have acknowledged this as an important future direction. Nevertheless, our current data are consistent with the conclusion that NtRLP4 overexpression confers increased resistance to B. tabaci infestation.

      Finally, DAB staining for H<sub>2</sub>O<sub>2</sub> accumulation is also a well-established indicator of PTI responses, and many studies have shown that overexpression of salivary elicitors can trigger such accumulation.

      (3) Lack of logic coherence.

      The written language needs substantial improvement. This impeded the readability of the work. More importantly, the logic throughout the manuscript appeared scattered. The choice of testing protein domains for protein-protein interactions, using plants overexpressing an insect protein to study its subcellular localization, switching back and forth between using proteins with signal peptides and without signal peptides, among others, lacks a clear explanation.

      We appreciate the reviewer’s careful reading and valuable comments regarding the logical coherence of our manuscript.

      (1) To improve the English quality, the entire manuscript has been professionally edited by a certified language-editing service.

      (2) Regarding the rationale for testing protein domains in the protein–protein interaction assays: NtRLP4 is a membrane-anchored receptor-like protein composed of extracellular, transmembrane, and short intracellular domains. We aimed to determine which region of NtRLP4 is responsible for interacting with the salivary protein, as this would help infer the likely site of interaction in planta. In addition, not all RLPs contain a malectin-like domain, and we sought to verify whether the BtRDP–NtRLP4 interaction depends on this domain. To enhance the logical flow, we introduced a brief statement explaining the experimental purpose before presenting the interaction assays in the current version, as follows: “These findings raised the question of which domain of NtRLP4 is responsible for binding BtRDP, as identifying the interacting domain could help infer where the salivary protein contacts the receptor in planta. We therefore dissected the NtRLP4 domains accordingly.”

      (3) With respect to using plants overexpressing an insect protein to examine subcellular localization: since both the brown planthopper and the whitefly are non-model species for which stable genetic transformation is technically unfeasible, many previous studies have used Agrobacterium-mediated transient expression or transgenic plant systems to investigate the subcellular localization of insect salivary proteins within host cells. Following these precedents, our study also employed plant systems to determine the localization of the insect protein and to assess how different localizations affect plant defense responses.

      (4) As for switching between constructs with or without signal peptides: the subcellular localization of effectors can influence their biological activity and interactions. Previous studies have used the presence or absence of signal peptides, or replacement with a PR1 signal peptide, to direct protein targeting (for example, Frontiers in Plant Science, 2022, 13:813181). Because salivary sheaths are generally considered to localize in the apoplastic space, we generated two transgenic N. tabacum lines overexpressing BtRDP: one carrying the full-length coding sequence including the signal peptide (oeBtRDP), expected to be secreted into the apoplast, and another lacking the signal peptide (oeBtRDP<sup>-sp</sup>), likely retained in the cytoplasm. In the current version, we clarified this rationale and added references to similar studies to improve the manuscript’s logic and readability. Details are as follows: “To investigate the role of BtRDP at different subcellular locations in host plants, we constructed two transgenic N. tabacum lines overexpressing BtRDP: one carrying the full-length coding sequence including the signal peptide (oeBtRDP), which is expected to be secreted into the apoplast (extracellular space), and the other lacking the signal peptide (oeBtRDP<sup>-sp</sup>), which is likely retained in the cytoplasm.”

      Reviewer #3 (Public review):

      Summary:

      In this study, Wang et al. investigate how herbivorous insects overcome plant receptor-mediated immunity by targeting plant receptor-like proteins. The authors identify two independently evolved salivary effectors, BtRDP in whiteflies and NlSP694 in brown planthoppers, that promote the degradation of plant RLP4 through the ubiquitin-dependent proteasome pathway. NtRLP4 from tobacco and OsRLP4 from rice are shown to confer resistance against herbivores by activating defense signaling, while BtRDP and NlSP694 suppress these defenses by destabilizing RLP4 proteins.

      Strengths:

      This work highlights a convergent evolutionary strategy in distinct insect lineages and advances our understanding of insect-plant coevolution at the molecular level.

      Thank you very much for your comments. We have carefully revised the MS following your valuable suggestions and comments.

      Weaknesses:

      (1) I found the naming of BtRDP and NlSP694 somewhat confusing. The authors defined BtRDP as "B. tabaci RLP-degrading protein," whereas NlSP694 appears to have been named after the last three digits of its GenBank accession number (MF278694, presumably). Is there a standard convention for naming newly identified proteins, for example, based on functional motifs or sequence characteristics? As it stands, the inconsistency makes it difficult for readers to clearly distinguish these proteins from those reported in other studies.

      Thank you for your comment. These are species-specific salivary proteins that have not been reported or annotated in previous studies. Because no homologous genes could be identified in other species, there are no existing names or annotations for these proteins. For such lineage-specific salivary proteins, it is common in recent studies to name them according to their experimentally identified functions. For example, a recently reported salivary protein was named SR45-interacting salivary protein (SISP) based on its function (10.1111/nph.70668). Following this convention, we adopted a similar functional naming strategy in this study. We acknowledge that there may not yet be a standardized rule for naming such proteins, and we would be glad to follow a more authoritative naming guideline if possible.

      (2) Figure 2 and other figures. Transgenic experiments require at least two independent lines, because results from a single line may be confounded by position effects or unintended genomic alterations, and multiple lines provide stronger evidence for reproducibility and reliability.

      We appreciate the reviewer’s suggestion. In our study, two independent transgenic lines were used to ensure the reproducibility and reliability of the results. One representative line was presented in the main figures, while data from the second independent line were included in the supplementary figures. To make this clearer, we have emphasized in the manuscript that bioassays were conducted using two independent transgenic lines.

      (3) Figure 3e. Quantitative analysis of NtRLP4 was required. Additionally, since only one band was observed in oeRLP, were any tags included in the construct?

      Thank you for your comment. In the current version, quantitative analysis of NtRLP4 expression has been performed and is now presented in Figure 3. For the oeRLP plants, no tag was fused to NtRLP4; thus, anti-RLP serum was used to detect the target bands. In contrast, oeBtRDP and oeBtRDP<sup>-sp</sup> were fused with C-terminal FLAG tags, and their detection was carried out using anti-FLAG serum. This information has been clarified in the revised Methods section as follows: “The oeBtRDP and oeBtRDP<sup>-sp</sup> were fused with C-terminal FLAG tags, while no tag was fused to oeNtRLP4.”

      (4) Figure 4a. The RNAi effect appears to be well rescued in Line 1 but poorly in Line 2. Could the authors clarify the reason for this difference?

      Thank you for pointing this out. We also noticed that the RNAi effect appeared to be better rescued in Line 2 than in Line 1. Based on our measurements, the silencing efficiency of NtRLP4 in RNAi-RLP4 Line 1 was markedly weaker than in Line 2, which likely explains the difference in rescue efficiency. In the current version, we have clarified this point as follows: “Both RNAi-RLP lines showed reduced NtRLP4 levels compared with EV plants, with RNAi-RLP#2 exhibiting a stronger silencing effect (Fig. S19a).” “The differential rescue effect between the two RNAi lines likely resulted from their different NtRLP4 silencing efficiencies, with the lower NtRLP4 level in RNAi-RLP#2 leading to a more complete rescue phenotype.”

      (5) ROS accumulation is shown for only a single leaf. A quantitative analysis of ROS accumulation across multiple samples would be necessary to support the conclusion. The same applies to Figure 16f.

      Thank you for pointing this out. The H<sub>2</sub>O<sub>2</sub> accumulation experiments in Figure 4 and Figure S16f were repeated five times. In the current version, we state in the figure legends that “the experiment is repeated five times with similar results”.

      (6) Figure 4f: NtRLP4 abundance was significantly reduced in oeBtRDP plants but not in oeBtRDP-SP. Although coexpression analysis suggests that BtRDP promotes NtRLP4 degradation in an ubiquitin-dependent manner, the reduced NtRLP4 levels may not result from a direct interaction between BtRDP and NtRLP4. It is possible that BtRDP influences other factors that indirectly affect NtRLP4 abundance. The authors should discuss this possibility.

      Thank you for your valuable suggestion. We agree that the reduced NtRLP4 abundance may not necessarily result from a direct interaction between BtRDP and NtRLP4. In the manuscript, we have further discussed this possibility as follows: “Notably, BtRDP and NlSP694 share no sequence or structural similarity and lack resemblance to known eukaryotic ubiquitin-ligase domains. Their interaction with RLP4s occurs in the extracellular space (Fig. 3d; Fig. 5c), whereas the ubiquitin-proteasome system primarily functions in the cytosol and nucleus [46]. Furthermore, NtRLP4 reduction is observed only in oeBtRDP transgenic plants, not in oeBtRDP<sup>-sp</sup> plants (Fig. 4f), suggesting that BtRDP exerts its influence on NtRLP4 in the extracellular space. These observations collectively argue against the possibility that BtRDP or NlSP694 possesses intrinsic E3 ligase activity capable of directly ubiquitinating RLP4s within plant cells. Importantly, the reduced NtRLP4 levels may not result from a direct physical interaction between BtRDP and NtRLP4. Instead, BtRDP may indirectly affect RLP4 post-translational modification, thereby accelerating its degradation, which warrants further investigation.”

      (7) The statement in lines 335-336 that 'Overexpression of NtRLP4 or NtSOBIR1 enhances insect feeding, while silencing of either gene exerts the opposite effect' is not supported by the results shown in Figures S16-S19. The authors should revise this description to accurately reflect the data.

      Thank you for pointing this out. We agree that our original statement was not precise, as we measured the insect settling preference and oviposition on transgenic plants, but did not directly assess the feeding behavior of B. tabaci. Therefore, we have revised the description in the manuscript to more accurately reflect our data as follows: “Overexpression of NtRLP4 or NtSOBIR1 in N. tabacum is attractive to B. tabaci and promotes insect reproduction, whereas silencing of either gene exerts the opposite effect.”

      (8) BtRDP is reported to attach to the salivary sheath. Does the planthopper NlSP694 exhibit a similar secretion localization (e.g., attachment to the salivary sheath)? The authors should supplement this information or discuss the potential implications of any differences in secretion localization between BtRDP and NlSP694 for their respective modes of action.

      Thank you for your insightful suggestion. We agree that determining the secretion localization of NlSP694 would provide valuable information for understanding its potential mode of action. Immunohistochemical (IHC) staining is indeed a critical approach for such analysis. However, in this study, we were unable to express NlSP694 in Escherichia coli, and the antibody generated using a synthesized peptide did not show sufficient specificity or sensitivity for IHC detection. Consequently, we were unable to determine whether NlSP694 is attached to the salivary sheath. Whether BtRDP and NlSP694 act through different modes therefore requires further investigation.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1e. The BtRDP-labeled fluorescent signal is difficult to discern. An enlarged view of the target region would be helpful for clarity.

      Thank you for your suggestion. In the current version, an enlarged view of the target region is provided below the figure.

      (2) The finding that BtRDP accumulates in the salivary sheath secreted by Bemisia tabaci is important for understanding the subcellular localization of this protein during actual insect feeding. I suggest moving Figure S5 to the main text.

      Thank you for your suggestion. Figure S5 has been moved to Fig. 1f in the current version.

      (3) Please carefully cross-check the figure numbering to ensure that all in-text citations correspond to the correct figures and panels. i.e., lines 136,188,192, and 194.

      Thank you for pointing this out. We have corrected them in the current version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript titled "The distinct role of human PIT in attention control" by Huang et al. investigates the role of the human posterior inferotemporal cortex (hPIT) in spatial attention. Using fMRI experiments and resting-state connectivity analyses, the authors present compelling evidence that hPIT is not merely an object-processing area, but also functions as an attentional priority map, integrating both top-down and bottom-up attentional processes. This challenges the traditional view that attentional control is localized primarily in frontoparietal networks.

      The manuscript is strong and of high potential interest to the cognitive neuroscience community. Below, I raise questions and suggestions to help with the reliability, methodology, and interpretation of the findings.

      Thank you for a nice summary of the key points of our study. Below you will find our reply to your questions.

      (1) The authors argue that hPIT satisfies the criteria for a priority map, but a clearer justification would strengthen this claim. For example, how does hPIT meet all four widely recognized criteria, such as spatial selectivity, attentional modulation, feature invariance, and input integration, when compared to classical regions such as LIP or FEF? A more systematic summary of how hPIT meets these benchmarks would be helpful. Additionally, to what extent are the observed attentional modulations in hPIT independent of general task difficulty or behavioral performance?

      Great suggestions! For the first suggestion, we have included a clearer justification in the Discussion section of the manuscript (lines 405-406). For the second one, all participants received task practice prior to scanning, and task accuracy exceeded 90%, suggesting the tasks were not overly demanding. Although ceiling effects limit the interpretability of behavioral-performance correlations, we argue that higher task demands would likely require greater attentional effort, leading to stronger modulation in hPIT, which aligns with our findings.

      (2) The authors report that hPIT modulation is invariant to stimulus category, but there appear to be subtle category-related effects in the data. Were the face, scene, and scrambled images matched not only in terms of luminance and spatial frequency, but also in terms of factors such as semantic familiarity and emotional salience? This may influence attentional engagement and bias interpretation.

      The response of hPIT is not sensitive to stimulus category, but attentional modulation in hPIT is slightly stronger for faces than for scenes and scrambled images. Although the faces used in the task had neutral expressions and the scene pictures were also neutral, we acknowledge that we cannot entirely rule out the possibility that semantic familiarity or emotional salience contributed to the subtle category-related effects in the results of Experiment 3. This limitation has been noted in the Discussion section of the manuscript (lines 440-442).

      (3) The result that attentional load modulates hPIT is important and adds depth to the main conclusions. However, some clarifications would help with the interpretation. For example, were there observable individual differences in the strength of attentional modulation? How consistent were these effects across participants?

      Yes, individual differences exist. In the manuscript, we have included individual subject data points in Figure 6B. No data exceeded three standard deviations from the group mean, suggesting that the attentional modulation effects were generally consistent across participants.
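
      As a minimal illustration of the three-standard-deviation criterion described above, the check can be sketched as follows. This is a sketch with hypothetical values, not the analysis code used in the study: the function name `flag_outliers`, the use of the sample standard deviation, and the 15 modulation values are all assumptions made for illustration.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=3.0):
    """Return indices of values more than `threshold` sample SDs from the mean."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        return []  # all values identical, so nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - m) > threshold * sd]


# Hypothetical attentional-modulation indices for 15 participants (illustrative only).
modulation = [0.42, 0.35, 0.51, 0.29, 0.47, 0.38, 0.44, 0.33,
              0.55, 0.40, 0.36, 0.49, 0.31, 0.45, 0.39]

# An empty list means no participant lies beyond three SDs of the group mean.
print(flag_outliers(modulation))
```

      Note that with small samples a single extreme value inflates the standard deviation itself, so this criterion is conservative; it indicates the absence of extreme outliers rather than quantifying consistency directly.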

      (4) The resting-state data reveal strong connections between hPIT and both dorsal and ventral attention networks. However, the analysis is correlational. Are there any complementary insights from task-based functional connectivity or latency analyses that support a directional flow of information involving hPIT? In addition, do the authors interpret hPIT primarily as a convergence hub receiving input from both DAN and VAN, or as a potential control node capable of influencing activity in these networks? Also, were there any notable differences between hemispheres in either the connectivity patterns or attentional modulation?

      Although it is hard to infer the directional flow of information from fMRI because of its low temporal resolution, we agree that, beyond resting-state connectivity, task-based functional connectivity analyses could provide additional information about whether hPIT serves as a convergence node or a control hub. We have conducted task-based functional connectivity analyses, specifically PPI, using data from Experiment 2, which revealed task-modulated right hPIT connectivity with FFA, LOp, and TPJ, suggesting hPIT may allocate attentional resources to object-processing regions following priority map generation (lines 378-383). Given the limited number of significant PPI results and the inherent constraints of fMRI in capturing fast or transient attention-related interactions, the present data do not allow us to determine the role of hPIT. Future studies combining effective connectivity or causal perturbation methods (e.g., DCM, TMS-fMRI) would be ideal to test whether hPIT acts as a control node influencing activity within DAN and VAN.

      We also observed modest hemispheric asymmetries in connectivity; for instance, both left and right hPIT showed stronger connectivity with right-hemisphere attention nodes. This has been described in the Results section of the manuscript (lines 373-377).

      (5) A few additional questions arise regarding the anatomical characteristics of hPIT: How consistent were its location and size across participants? Were there any cases where hPIT could not be reliably defined? Given the proximity of hPIT to FFA and LOp, how was overlap avoided in ROI definition? Were the functional boundaries confirmed using independent contrasts?

      The size and location of hPIT were relatively consistent across subjects, as shown in Supplementary Figure 1, where the voxel size and location for individual subjects are reported. This consistency is also demonstrated by Figure 4C.

      We avoided overlap with the FFA and LOp by manually delineating the hPIT, which was defined by conjunction maps across three tasks, and by excluding overlapping voxels. The FFA was defined using an independent contrast (Exp3 contrast [face-scene]), and the LOp location was defined by anatomical parcellation (Glasser et al., 2016).

      Reviewer #2 (Public review):

      Summary

      This study investigates the role of the human posterior inferotemporal cortex (hPIT) in attentional control, proposing that hPIT serves as an attentional priority map that integrates both top-down (endogenous) and bottom-up (exogenous) attentional processes. The authors conducted three types of fMRI experiments and collected resting-state data from 15 participants. In Experiment 1, using three different spatial attention tasks, they identified the hPIT region and demonstrated that this area is modulated by attention across tasks. In Experiment 2, by manipulating the presence or absence of visual stimuli, they showed that hPIT exhibits strong attentional modulation in both conditions, suggesting its involvement in both bottom-up and top-down attention. Experiment 3 examined the sensitivity of hPIT to stimulus features and attentional load, revealing that hPIT is insensitive to stimulus category but responsive to task load - further supporting its role as an attentional priority map. Finally, resting-state functional connectivity analyses showed that hPIT is connected to both dorsal and ventral attention networks, suggesting its potential role as a bridge between the two systems. These findings extend prior work on monkey PITd and provide new insights into the integration of endogenous and exogenous attention.

      Strengths

      (1) The study is innovative in its use of specially designed spatial attention tasks to localize and validate hPIT, and in exploring the region's role in integrating both endogenous and exogenous attention, as prior works focus primarily on its involvement in endogenous attention.

      (2) The authors provided very comprehensive experiment designs with clear figures and detailed descriptions.

      (3) A broad range of analyses was conducted to support the hypothesis that hPIT functions as an attentional priority map -- including experiments of attentional modulation under both top-down and bottom-up conditions, sensitivity to stimulus features and task load, and resting-state functional connectivity. These analyses showed consistent results.

      (4) Multiple appropriate statistical analyses - including t-tests, ANOVAs, and post-hoc tests - were conducted, and the results are clearly reported.

      Thank you for a nice summary of the key points and strengths of our study.

      Weaknesses

      (1) The sample size is relatively small (n = 15), and inter-subject variability is big in Figures 5 and 6, as seen in the spread of individual data points and error bars. The analysis of attention-modulated voxel map intersections appears to be influenced by multiple outliers.

      We agree that the sample size (n = 15) is not ideal, and we acknowledge that some data points in Figures 5 and 6 appear to be potential outliers. However, according to conventional outlier detection criteria, all data points fell within three standard deviations of the group mean and were therefore retained for analysis.
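
      The three-standard-deviation criterion mentioned above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `flag_outliers`, the use of the sample standard deviation, and the example values are all assumptions made for the sketch.

```python
from statistics import mean, stdev

def flag_outliers(values, n_sd=3.0):
    """Return indices of values lying more than n_sd sample SDs from the mean."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        return []  # all values identical, so nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - m) > n_sd * sd]

# Hypothetical modulation values for 15 participants (NOT data from the study).
example = [0.2, 0.4, 0.1, 0.5, 0.3, 0.6, 0.2, 0.4,
           0.3, 0.5, 0.1, 0.7, 0.3, 0.4, 1.6]
flagged = flag_outliers(example)  # indices of points beyond 3 SD, if any
```

      Under this rule a data point is retained whenever its index is absent from the returned list.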

      Moreover, the attention-modulated voxel intersection map shown in Figure 4C is insensitive to outliers, because the plotted intersection is based on the number of subjects.
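
      One way to see why a subject-count map is insensitive to extreme values is sketched below. It assumes that each subject contributes a binary mask of significantly modulated voxels; the helper names, the cutoff, and the numbers are hypothetical and are not taken from the manuscript.

```python
def threshold(effects, cutoff):
    """Binarize one subject's voxel-wise effects at a fixed cutoff."""
    return [e > cutoff for e in effects]

def subject_count_map(masks):
    """Count, per voxel, how many subjects show significant modulation."""
    n_voxels = len(masks[0])
    if any(len(m) != n_voxels for m in masks):
        raise ValueError("all subject masks must have the same length")
    return [sum(1 for m in masks if m[v]) for v in range(n_voxels)]

# Hypothetical effects for 3 subjects x 3 voxels; subject 3 is extreme at voxel 1.
with_extreme = [[0.1, 2.5, 3.0], [0.2, 2.8, 0.5], [0.0, 40.0, 2.9]]
without_extreme = [[0.1, 2.5, 3.0], [0.2, 2.8, 0.5], [0.0, 2.6, 2.9]]

counts_a = subject_count_map([threshold(s, 2.0) for s in with_extreme])
counts_b = subject_count_map([threshold(s, 2.0) for s in without_extreme])
```

      Because each subject adds at most one count per voxel, the two maps are identical regardless of how large the extreme value is.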

      (2) The authors acknowledge important limitations, including the lack of exploration of feature-based attention and the temporal constraints inherent to fMRI.

      Yes, we have mentioned these limitations in the discussion.

      (3) Prior research has established that regions such as the prefrontal cortex (PFC) and posterior parietal cortex (PPC) are involved in both endogenous and exogenous attention and have been proposed as attentional priority maps. It remains unclear what is uniquely contributed by hPIT, how it functionally interacts with these classical attentional hubs, and whether its role is complementary or redundant. The study would benefit from more direct comparisons with these regions.

      In this study, we defined the ROI based on the intersection across three different types of spatial attention tasks, which is a stricter criterion. Under this criterion, the results did not reveal spatial attentional modulation across tasks in any region besides PITd. This could be due to the lack of lateralized responses in PFC/PPC. To evaluate whether a region qualifies as a priority map, we applied four widely accepted criteria (as mentioned in the Introduction). While dorsal and ventral attention network (DAN and VAN) regions can be considered supportive components of the priority map system, our findings suggest that among the regions tested, only hPIT fully meets all criteria. In Experiment 2, we included regions such as VFC (as part of PFC) and IPS (as part of PPC), and our findings suggest these areas are more involved in top-down attention. In the revision, we have performed additional analyses on PPC (IPS) and PFC (FEF, VFC), shown in Figure S2.

      (4) The functional connectivity analysis is only performed on resting-state data, and this approach does not capture context-dependent interactions. Task-based data analysis can provide stronger evidence.

      We acknowledge that resting-state FC is limited in assessing task-specific communication. To further investigate the role of hPIT, we have conducted a PPI analysis, which revealed task-modulated right hPIT connectivity in attention allocation (lines 378-383).

      (5) The study does not report whether attentional modulation in hPIT is consistent across the two hemispheres. A comparison of hemispheric effects could provide important insight into lateralization and inter-individual variability, especially given the bilateral localization of hPIT.

      We thank the reviewer for this suggestion. hPIT was localized bilaterally using the same intersection-based method in Experiment 1. We have now performed additional analyses and found hemispheric differences in hPIT attentional modulation (Experiment 2). In Experiment 3, by contrast, the difference in load modulation (averaged across stimulus categories) between left and right hPIT was not significant. These results are reported in the Results section of the manuscript (lines 347-351).

    1. Author response:

      eLife Assessment

      This study provides a valuable contribution to understanding how negative affect influences food-choice decision making in bulimia nervosa, using a mechanistic approach with a drift diffusion model (DDM) to examine the weighting of tastiness and healthiness attributes. The solid evidence is supported by a robust crossover design and rigorous statistical methods, although concerns about the interpretation of group differences across neutral and negative conditions limit the interpretability of the results.

      We are grateful for this improved assessment. Below, we provide detailed responses that we believe address the noted concerns about interpreting group differences across conditions. If these clarifications resolve the interpretability concerns, we would be grateful if the editors would consider updating the eLife assessment accordingly.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the Drift and Diffusion Model (DDM) introduced by Ratcliff and McKoon in 2008, the article by Shevlin and colleagues investigates whether there are differences between neutral and negative emotional states in:

      (1) The timings of the integration in food choices of the perceived healthiness and tastiness of food options in individuals with bulimia nervosa (BN) and healthy participants

      (2) The weighting of the perceived healthiness and tastiness of these options.

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has potential to improve the understanding of pathological food choices.

      Weaknesses:

      I thank the authors for revising their manuscript.

      However, I still have major concerns.

      The authors say that they removed any causal claims in their revised version of the manuscript. The sentence before the last one of the abstract still says "bias for high-fat foods predicted more frequent subjective binge episodes over three months". This is a causal claim that I already highlighted in my previous review, specifically for that sentence (see my second sentence of my major point 2 of my previous review).

      We appreciate the Reviewer's continued attention to causal language. We acknowledge that our use of the term 'predicted', though intended to refer to statistical prediction in a regression model, could be misinterpreted as implying causation. We have therefore revised this sentence to read: 'bias for high-fat foods was associated with more frequent subjective binge episodes over three months'.

      I also noticed that a comment that I added was not sent to the authors. In this comment I was highlighting that in Figure 2 of Galibri et al., I was uncertain about a difference between neutral and negative inductions of the average negative rating after the induction in the BN group (i.e. comparing the negative rating after negative induction in BN to the negative rating after neutral induction in BN). Figure 2 of Galibri et al. looks to me that:

      (1) The BN participants were more negative before the induction when they came to the neutral session than when they came to the negative session.

      (2) The BN participants looked almost negatively similar (taking into account the error bars reported) after the induction in both sessions

      These observations are of high importance because they may support the fact that BN patients were likely in a similar negative state to run the food decision task in both conditions (negative and neutral). Therefore, the lack of difference in food choices in BN patients is unsurprising and nothing could be concluded from the DDM analyses. Moreover, the strong negative ratings of BN patients in the neutral condition as compared to healthy participants together with almost similar negative ratings after the two inductions contradict the authors' last sentence of their abstract.

      I appreciate that the authors reproduced an analysis of their initial paper regarding the negative ratings (i.e. Table S1). It partly answers my aforementioned point but does not address the fact that BN may have been in a similar negative state in both conditions (neutral and negative) when running the food decision task: if BN patients were similarly negative after both inductions (neutral and negative), nothing can be concluded from the differences in their results obtained from the DDM. As the authors put it, "not all loss-of-control eating occurs in the context of negative state", I add that far from all negative states lead to a loss-of-control eating in BN patients. This grounds all my aforementioned remarks and my remarks of my first review.

      A solution for that is to run a paired t-test in BN patients only comparing the score after the induction in the two conditions (neutral and negative) reported in Figure 2 of their initial article.
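
      The paired comparison proposed here can be sketched as follows. This is only an illustration of the test statistic: the function name `paired_t` and the ratings are hypothetical (they are not values from the study), and the p-value is left out because it requires the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]  # within-participant differences
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / sqrt(n)), n - 1

# Hypothetical post-induction ratings from the same 5 participants
# in the negative and neutral sessions (illustrative numbers only).
post_negative = [60, 55, 70, 65, 50]
post_neutral = [50, 50, 55, 60, 45]
t_stat, df = paired_t(post_negative, post_neutral)
```

      Because the test works on within-participant differences, stable between-participant differences in baseline affect cancel out.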

      We appreciate the reviewer’s concern. We understand how the visual representation in Figure 2, which displays between-subject error bars, might suggest similar post-induction affect levels. However, the within-subject paired comparison (which appropriately accounts for individual differences in baseline affect) reveals a significant difference, which we detail below.

      While BN participants did report higher baseline negative affect than the HC group prior to the mood inductions, this does not negate the effectiveness of the manipulation. The critical comparison is the within-subject change from pre- to post-induction (detailed below) which shows that negative affect was significantly higher after the negative induction than the neutral induction.

      As we reported in the Supplementary Information (Table S1), our initial analyses of self-reported affect ratings used a linear mixed-effects model with group (HC = 0, BN = 1), condition (Neutral = 0, Negative = 1), and time (pre-induction = 0, post-induction = 1) as fixed effects, including all interactions, and random intercepts for participants. This approach accounts for individual differences in baseline affect.

      However, to address the reviewer's concerns, we conducted two simple effects analyses using estimated marginal means. As the reviewer suggested, we directly compared post-induction affect between conditions within the BN group (described in the second analysis below). In the first analysis, we examined the diagnosis × time interaction within each condition separately. In the Negative condition, individuals with BN demonstrated a substantial increase in negative affect from pre- to post-induction (mean difference = 20.36, t = 4.84, p < 0.0001, Cohen’s d = 0.97). In the second analysis, we examined the condition × time interaction within each group separately. Among the BN group, we found that reported affect was significantly higher following the negative mood induction than after the neutral affect induction (mean difference = -17.40, t = -4.13, p = 0.0003, Cohen’s d = 0.83). This difference in post-induction negative affect between conditions within the BN group represents a meaningful and statistically robust difference in affective states. These within-group effects confirm that the negative mood induction was (1) effective in the BN group and (2) produced significantly greater negative affect than the neutral mood induction.

      These findings confirm that participants completed the food decision task under meaningfully different affective states, supporting the interpretability of the subsequent DDM analyses. We now report these analyses in the Supplementary Information.

      I appreciate the analysis that the authors added with the restrictive subscale of the EDE-Q.

      That this analysis does not show any association with the parameters of interest does not show that there is a difference in the link between self reported restrictions and self reported binges. Only such a difference would allow us to claim that the results the authors report may be related to binges.

      We thank the reviewer for raising this important point about specificity. To address this concern, we examined the correlation between self-reported binge frequency (both subjective binge episodes and objective binge episodes over the past three months) and EDE-Q Restraint subscale in our BN sample.

      The correlations between these measures were modest and non-significant (subjective binge frequency: Spearman’s ρ = 0.21, p = 0.306; objective binge frequency: Spearman’s ρ = 0.05, p = 0.806), indicating that both binge frequency measures and dietary restraint were relatively independent dimensions of eating pathology in our sample. This dissociation supports the specificity of our findings: the fact that our DDM parameters were associated with binge frequency but not with dietary restraint suggests that the affect-induced changes in decision-making we observed are specifically related to binge-eating behavior rather than reflecting a correlate of dietary restraint. We now report this analysis in the Supplementary Information.
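
      For readers unfamiliar with the statistic, Spearman's rank correlation is the Pearson correlation computed on ranks. The sketch below is a standard-library illustration under that definition; the function names and example scores are hypothetical and do not reproduce the authors' analysis.

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical binge counts and restraint scores (illustrative only).
rho = spearman_rho([2, 5, 1, 8, 3], [3.0, 1.5, 4.0, 2.0, 2.5])
```

      A rank-based measure is a natural choice for count data such as binge frequency, since it does not assume a linear relationship or normally distributed scores.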

      I appreciate the wording of the answer of the authors to my third point: "the results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic, but the results do not allow us to determine whether this reactivity causes the symptoms". This sentence is crystal clear and sums very well the limits of the associations the authors report with binge eating frequency. However, I do not see this sentence in the manuscript. I think the manuscript would benefit substantially from adding it.

      We thank the reviewer for the suggestion. We have added the following sentences that convey this information to the end of the third paragraph of the discussion:

      “These results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic. However, our correlational design does not allow us to determine whether this reactivity causes the symptoms.”

      Statistical analyses:

      If I understood well the mixed models performed, analyses of supplementary tables S1 and S27 to S32 are considering all measures as independent which means that the considered score of each condition (neutral vs negative) and each time (before vs after induction) which have been rated by the same participants are independent. Such type of analyses does not take into account the potential correlation between the 4 scores of a given participant. As a consequence, results may lead to false positives that a linear mixed model does not address. The appropriate analysis would be to run adapted statistical tests pairing the data without running any mixed model.

      We appreciate the reviewer's attention to the statistical approach. However, we respectfully note that mixed-effects models do account for within-subject correlations, contrary to the reviewer’s interpretation.

      The linear mixed-effects model we employed explicitly accounts for the correlation among repeated measures from the same participant through the random intercept term. This random effect structure models the non-independence of observations within participants, allowing for correlated errors within individuals while assuming independence between individuals. This is a standard and appropriate approach for analyzing repeated-measures data (Bates et al., 2015).
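
      Written out, a random-intercept model of the kind described above takes the following form. The notation is an assumption introduced for illustration ($G_i$ for group, $C_c$ for condition, $T_t$ for time, $u_i$ for the participant intercept); the normality assumptions are the conventional ones and are not quoted from the manuscript.

```latex
% i: participant, c: condition, t: time point (symbols are illustrative)
\begin{align}
y_{ict} &= \beta_0 + \beta_1 G_i + \beta_2 C_c + \beta_3 T_t
          + \beta_4 G_i C_c + \beta_5 G_i T_t + \beta_6 C_c T_t
          + \beta_7 G_i C_c T_t + u_i + \varepsilon_{ict}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
\varepsilon_{ict} \sim \mathcal{N}(0, \sigma_\varepsilon^2), \\
\operatorname{Corr}(y_{ict}, y_{ic't'}) &=
  \frac{\sigma_u^2}{\sigma_u^2 + \sigma_\varepsilon^2}
  \quad \text{for } (c,t) \neq (c',t').
\end{align}
```

      The last line shows that any two ratings from the same participant share the term $u_i$ and are therefore correlated, whereas ratings from different participants are modeled as independent.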

      The mixed-effects model is, in fact, more appropriate than separate paired t-tests for our design because it:

      (1) Simultaneously models all fixed effects (group, condition, time) and their interactions in a single unified framework;

      (2) Properly partitions variance into within-subject and between-subject components;

      (3) Provides greater statistical power and more precise estimates by using all available data simultaneously; and

      (4) Allows for direct testing of three-way interactions that cannot be assessed through pairwise comparisons alone.

      Paired tests (e.g., t-tests), as the reviewer suggests, would require multiple separate analyses and would not allow us to test our primary hypotheses about group × condition × time interactions. The mixed-effects approach provides a more comprehensive and statistically rigorous analysis of our repeated-measures design. To clarify this even further in the manuscript, we have added the following in our methods when describing our model, “participant-level random intercepts were included to account for within-subject correlations across repeated measurements.”

      Notes:

      It is not because specific methods like correlating self reported measures over long periods with almost instantaneous behaviors (like tasks) have been used extensively in studies that these methods are adapted to answer a given scientific question. Measures aggregated over long periods miss the variations in instantaneous behaviors over these periods.

      We acknowledge the reviewer’s concern about the temporal mismatch between our session-level task measures and the 3-month aggregated symptom reports. This is a valid limitation of cross-sectional designs, and we agree that examining how task performance fluctuates in relation to real-time symptom variation would provide richer insights into the potential dynamics of these relationships.

      We agree that we cannot capture how daily changes in task performance relate to momentary symptom occurrence. In response to previous rounds of helpful reviews, we added this limitation to the Discussion section, noting that future research employing ecological momentary assessment (EMA) or daily diary methods could examine whether the decision-making processes we identified also fluctuate in relation to real-time symptom occurrence.

      We note that our finding that affect-induced changes in decision-making parameters were associated with subjective binge frequency suggests that this laboratory-measured reactivity may reflect a stable individual difference that manifests across contexts and time periods. While our current study provides initial evidence that individual differences in affect-related decision-making are associated with symptom severity, we acknowledge that longitudinal designs with repeated assessments would strengthen causal and temporal inferences.

      Reviewer #2 (Public review):

      Summary:

      Binge eating is often preceded by heightened negative affect, but the specific processes underlying this link are not well-understood. The purpose of this manuscript was to examine whether affect state (neutral or negative mood) impacts food choice decision-making processes that may increase the likelihood of binge eating in individuals with bulimia nervosa (BN). The researchers used a randomized crossover design in women with BN (n=25) and controls (n=21), in which participants underwent a negative or neutral mood induction prior to completing a food-choice task. The researchers found that despite no differences in food choices in the negative and neutral conditions, women with BN demonstrated a stronger bias toward considering the 'tastiness' before the 'healthiness' of the food after the negative mood induction.

      Strengths:

      The topic is important and clinically relevant, and the methods are sound. The use of computational modeling to understand nuances in decision-making processes and how that might relate to eating disorder symptom severity is a strength of the study.

      Weaknesses:

      Sample size was relatively small, and participants were all women with BN, which limits generalizability of findings to the larger population of individuals who engage in binge eating. It is likely that the negative affect manipulation was weak and may not have been potent enough to change behavior. These limitations are adequately noted in the discussion.

      We are grateful to Reviewer #2 for their careful and supportive review of our manuscript. We appreciate their recognition that computational modeling can reveal nuanced alterations in decision-making processes that may not be apparent in overt behavioral choices. Their balanced assessment of both the strengths and limitations of our work has been helpful in contextualizing our findings appropriately. We have carefully considered their comments regarding sample size and the potential limitations of our mood induction procedure, both of which we discuss in detail in the manuscript's limitations section.

      Reviewer #3 (Public review):

      Summary:

      The study uses the food choice task, a well-established method in eating disorder research, particularly in anorexia nervosa. However, it introduces a novel analytical approach, the diffusion decision model, to deconstruct food choices and assess the influence of negative affect on how and when tastiness and healthiness are considered in decision-making among individuals with bulimia nervosa and healthy controls.

      Strengths:

      The introduction provides a comprehensive review of the literature, and the study design appears robust. It incorporates separate sessions for neutral and negative affect conditions and counterbalances tastiness and healthiness ratings. The statistical methods are rigorous, employing multiple testing corrections.

      A key finding, that negative affect induction biases individuals with bulimia nervosa toward prioritizing tastiness over healthiness, offers an intriguing perspective on how negative affect may drive binge eating behaviors.

      Weaknesses:

      A notable limitation is the absence of a sample size calculation, which, combined with the relatively small sample, may have contributed to null findings. Additionally, while the affect induction method is validated, it is less effective than alternatives such as image or film-based stimuli (Dana et al., 2020), potentially influencing the results.

      We are grateful to Reviewer #3 for their thoughtful evaluation of our work. We appreciate their recognition that the diffusion decision model provides a novel analytical lens for understanding how negative affect influences the dynamics of food-related decision-making in bulimia nervosa. Their balanced assessment of both the methodological strengths of our design (counterbalancing, rigorous statistical corrections) and its limitations (sample size, mood induction efficacy) has been valuable in ensuring we appropriately contextualize our findings and their implications. Specifically, we have taken their comments regarding sample size and the relative efficacy of different mood induction methods seriously, and we address these important methodological considerations in our discussion of the study's limitations.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The authors have addressed my previous comments, and I do not have any additional suggestions for improvement.

      We thank the reviewer for their time, effort, and insightful feedback.

      Reviewer #3 (Recommendations for the authors):

      The authors have adequately addressed my feedback. I have no further comments.

      We thank the reviewer for their time, effort, and insightful feedback.

    1. Author response:

      eLife Assessment

      Hoverflies are known for their sexually dimorphic visual systems and exquisite flight behaviors. This valuable study reports how two types of visual descending neurons differ between males and females in their motion- and speed-dependent responses, yet surprisingly, the behavior they control lacks any sexual dimorphism. The results convincingly support these findings, which will be of interest for studies of visuomotor transformations and network-level brain organization.

      This statement perfectly recapitulates our findings.

      Public Reviews:

      Reviewer #1 (Public review):  

      Summary: 

      Hoverflies are known for a striking sexual dimorphism in eye morphology and early visual system physiology. Surprisingly, the male and female flight behaviors show only subtle differences. Nicholas et al. investigate the sensori-motor transformation of sexually dimorphic visual information to flight steering commands via descending neurons. The authors combined intra- and extracellular recordings, neuroanatomy, and behavioral analysis. They convincingly demonstrate that descending neurons show sexual dimorphisms - in particular at high optic flow velocities - while wing steering responses seem relatively monomorphic. The study highlights a very interesting discrepancy between neuronal and behavioral response properties.

      Thank you for this summary. Most of the statement perfectly recapitulates the main findings of our paper. However, we want to emphasize that some hoverfly flight behaviors are strongly sexually dimorphic, especially those related to courtship and mating. Indeed, only male hoverflies pursue targets at high speed, chase away territorial intruders, and pursue females for mating. However, other flight behaviours, such as those related to optomotor responses and flights between flowers when feeding, are not sexually dimorphic. We will amend the Introduction to make the difference between flight behaviors clear.

      More specifically, the authors focused on two types of descending neurons that receive inputs from well-characterized wide-field sensitive tangential cells: OFS DN1, which receives inputs from so-called HS cells, and OFS DN2, which receives input from a set of VS cells. Their likely counterparts in Drosophila connect to the neck, wing, and haltere neuropils. The authors characterized the visual response properties of these two neuronal classes in both male and female hoverflies and identified several interesting differences. They then presented the same set of stimuli, tracked wing beat amplitude, and analyzed the sum and the difference of right and left wing beat amplitude as a readout of lift or thrust, and yaw turning, respectively. Behavioral responses showed little to no sexual dimorphism, despite the observed neuronal differences.
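
      The two behavioral readouts described in this summary can be sketched as follows. The function name `wba_readouts`, the sign convention (right minus left), and the amplitude values are assumptions made for illustration; the study's actual conventions and units may differ.

```python
def wba_readouts(left, right):
    """Per-frame sum and difference of left/right wing beat amplitudes.

    The sum is used as a proxy for lift or thrust, the difference as a
    proxy for yaw turning. Right minus left is an assumed sign convention.
    """
    if len(left) != len(right):
        raise ValueError("left and right traces must have the same length")
    wba_sum = [l + r for l, r in zip(left, right)]
    wba_diff = [r - l for l, r in zip(left, right)]
    return wba_sum, wba_diff

# Hypothetical amplitudes in degrees for two video frames (illustrative only).
wba_sum, wba_diff = wba_readouts([60, 62], [60, 58])
```

      In the second frame the summed amplitude is unchanged while the difference becomes nonzero, which is the pattern one would expect for a pure yaw response under these assumptions.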

      Thank you for this very nice summary of our work. We want to clarify that LPTC input to DN1 and DN2 has not been shown directly in hoverflies using e.g. dye coupling, or dual recordings. Instead, the presumed HS and VS input is inferred from morphological and physiological DN evidence, and comparisons to similar data in Drosophila and blowflies. We will amend the Introduction to clarify this. The rest of the paragraph perfectly recapitulates the main findings of our paper.

      Strengths:

      I find the question very interesting and the results both convincing and intriguing. A fundamental goal in neuroscience is to link neuronal responses and behavior. The current study highlights that the transformations - even at the level of descending neurons to motoneurons - are complex and less straightforward than one might expect.

      Thank you.

      Weaknesses:

      The authors investigated two types of descending neurons, but it was not clear to me how many other descending neurons are thought to be involved in wing steering responses to wide-field motion. I would suggest providing a more in-depth overview of what is known about hoverflies and Drosophila, since the conclusions drawn from the study would be different if these two types were the only descending neurons involved, as opposed to representing a subset of the neurons conveying visual information to the wing neuropil.

      This is a great point. There are around 1000 fly DNs, of which many could respond to widefield motion, without being specifically tuned to widefield motion. For example, many looming sensitive neurons also respond to widefield motion, and could therefore be involved in the WBA movements that we measured here. In addition, there are many multimodal neurons that could be involved in optomotor responses in free flight, but these may not have been stimulated when we only provided visual input. Furthermore, many visual neurons are modulated by proprioceptive feedback, which is lacking in immobilized physiology preps. Finally, in blowflies, up to 5 optic flow sensitive DNs have been identified morphologically, and in Drosophila 3 have been identified morphologically and physiologically. In summary, it is more than likely that other neurons project visual widefield motion information to the wing neuropil. We will amend our Introduction and Discussion to make this important point clear to the readers.

      Both neuronal classes have counterparts in Drosophila that also innervate neck motor regions. The authors filled the hoverfly DNs in intracellular recordings to characterize their arborization in the ventral nerve cord. In my opinion, these anatomical data could be further exploited and discussed a bit more: is the innervation in hoverflies also consistent with connecting to the neck and haltere motor regions? Are there any obvious differences and similarities to the Drosophila neurons mentioned by the authors? If the arborization also supports a role in neck movements, the authors could discuss whether they would expect any sexual dimorphism in head movements.

      These are all great points. We did not see any clear arborizations to the frontal nerve, where we would expect to find the neck motor neurons (NMNs). In addition, while we did see fine arborizations throughout the length of the thoracic ganglion, we saw no strong outputs projecting directly to the haltere nerve (HN). In the revised version of the MS we will modify figure 4 (morphological characterization) to clarify.

      There are important differences between the morphology of DN1 and DN2 in hoverflies and DNHS1 and DNOVS2 in Drosophila, in terms of their projections in the thoracic ganglion. For example, in Drosophila DNOVS2, there are several fine branches along the length of the neuron in the thoracic ganglia. Similarly, we found fine branches in Eristalis tenax DN2; in addition, however, we found a wide branch projecting to the area of the thoracic ganglion where the prothoracic and pterothoracic nerves likely get their inputs (Figure 4), suggesting that the neuron could contribute to controlling the wings and/or the forelegs (which is why we quantified the WBA). In Drosophila DNHS1, there is a similar fat branch to the prothoracic and pterothoracic nerves, which we also found in Eristalis tenax OFS DN1 (Figure 4). Indeed, while Drosophila DNHS1 and DNOVS2 have quite strikingly different morphology, DN1 and DN2 in Eristalis looked quite similar. We will modify the Results section to make this clear.

      In addition, to investigate this further, in the revised version of the MS we will include analysis of the movement of different body parts (including the head) to investigate the presence of any potential sexual dimorphism. Unfortunately, however, this will not include the halteres, as they cannot be seen well in the videos.

      Reviewer #2 (Public review):

      Summary:

      Many fly species exhibit male-specific visual behaviors during courtship, while little is known about the circuit underlying the dimorphic visuomotor transformations. Nicholas et al focus on two types of visual descending neurons (DNs) in hoverflies, a species in which only males exhibit high-speed pursuit of conspecifics. They combined electrophysiology and behavior analysis to identify these DNs and characterize their response to a variety of visual stimuli in both male and female flies. The results show that the neurons in both sexes have similar receptive fields but exhibit speed-dependent dimorphic responses to different optic flow stimuli.

      This statement perfectly recapitulates the main findings of our paper. However, as mentioned above, while hoverfly flight behaviors related to courtship and mating are strongly sexually dimorphic, other flight behaviours, such as those related to optomotor responses and flights between flowers when feeding, are not. We will amend the Introduction to make the difference between flight behaviors clear.

      Strengths:

      Hoverflies, though not a common model system, show very interesting dimorphic behaviors and provide a unique and valuable entry point to explore the brain organization behind sexual dimorphism. The findings here are not only interesting in their own right but will also likely inspire those working in other systems, particularly Drosophila.

      Thank you.

      The authors employed rigorous morphology, electrophysiology, and behavior methods to deliver a comprehensive characterization of the neurons in question. The precision of the measurements allowed for identifying a subtle and nuanced neuronal dimorphism and set a standard for future work in this area.

      Thank you.

      Weaknesses:

      Cell-typing using receptive field preferred directions (RFPDs): if I understood correctly, this classification method mostly relies on the LPDs near the center of the receptive field (median within the contour in Fig.1). I have two concerns here. First, this method is great if we are certain there are only two types of visual DNs as described in the manuscript. But how certain is this? Given the importance of vision in flight control, I would expect many DNs that transmit optic flow information to the motor center. I'd also like to point out that there are other lobula plate tangential cells (LPTCs) than HS and VS cells, which are much less studied and could potentially contribute to dimorphic behaviors.

      This is very true, and an important point. As mentioned above, in blowflies, up to 5 optic flow sensitive DNs have been identified morphologically; however, whether these correspond to 5 different physiological types remains unclear. In both blowflies and Drosophila 3 have been identified morphologically and physiologically (DNHS1, DNOVS1, DNOVS2). Importantly, in both blowflies and fruitflies DNOVS1 gives graded responses, and no action potentials, meaning that we would not be able to record from it using extracellular electrophysiology.

      We previously used clustering techniques to show that in Eristalis, we can reliably distinguish two types of optic flow sensitive DNs from extracellular electrophysiological data, based on a range of receptive field parameters, and we think that these correspond to DNHS1 and DNOVS2 in Drosophila (Nicholas et al, J Comp Physiol A, 2020, cited in paper). As mentioned above in response to Reviewer 1, this does not mean that there are no other neurons that could respond to widefield optic flow, and which might be involved in the WBA we recorded in the paper. However, the point of this paper was not to conclusively show that there are only two optic flow sensitive descending neurons. The point was to say that there are two quite distinct optic flow sensitive neurons that have similar receptive fields in males and females, while the responses to widefield motion show differences between males and females.

      We will modify the Introduction and Discussion to make these important points clear to the Reader, including the discussion of the 45-60 LPTCs that exist in the lobula plate, and what their role might be.

      Second, this method feels somewhat impoverished given the richness of the data. The authors have nicely mapped out the directional tuning for almost the entire visual field. Instead of reducing this measurement to 2 values (center and direction), I was wondering if there is a better method to fully utilize the data at hand to get a better characterization of these DNs. As the authors are aware, local features alone can be ambiguous in characterizing optic flows. What's more, taking into account more global features can be useful for discovering potentially new cell types.

      This is a great point, and we did an extensive analysis of other receptive field properties in this study (shown in supp fig 1). In addition, and as mentioned above, we have published a clustering analysis across receptive field properties of these neurons (Nicholas et al, J Comp Physiol A, 2020, cited in paper). The point that we attempted to make in this paper was that by using two strikingly simple metrics, we can reliably distinguish which of the two neuron types we are recording from (if we accept that there are two main types that we are likely to record from) simply based on location and overall directional preference. This makes automated analysis very easy and straightforward. Indeed, we now use this routinely to ID what neuron we are recording from, rather than making a human-based assumption.

      However, we agree that further in-depth analysis is warranted. Therefore, to address this, we will provide additional receptive field analysis and clustering in the revised version of the MS. In addition, we want to highlight that all data are uploaded to DataDryad for anyone interested in doing additional in-depth analyses.

      Line 131, it wasn't clear to me why full-screen stimuli were used for comparison here, instead of the full receptive field maps. Male flies exhibit sexually dimorphic behaviors only during courtship, which would suggest that small-sized visual stimuli (mimicking an intruder or female conspecific) would be better suited to elicit dimorphic neuronal responses. A similar comment applies to the later results as well. Based on the receptive field mapping in Figure 1, I'm under the impression that these 2 DN types are more suited to detect wide-field optic flows, those induced by self-motion as mentioned in the manuscript. The results are still very interesting, but it's good to make this point clear early on to help set appropriate expectations. Conversely, this would also suggest that there are other visual DN types that are responsible for the courtship-related sexually dimorphic behaviors.

      Thank you for mentioning these important points. Our reasoning for using full-screen stimuli for the analysis on line 131 was that since we used the small sinusoidal gratings for mapping the receptive fields, and to subsequently classify the neurons, it would be unfair to use the same data to investigate potential sexual dimorphism. I.e., we selected neurons that fulfilled certain criteria, and then we cannot rightfully use the same criteria to determine differences. This was not explicitly mentioned in the paper, so we will modify the text to make this clear to the Reader.

      However, in Supp Figure 1d/e we show that there are no striking receptive field differences between males and females in terms of receptive field center or directional preference. In Supp Figure 1f we show that there is no difference between male and female receptive field height and width. We will modify the text to draw the Reader’s attention to this figure, and also mention the additional analysis done in response to the comment above.

      As a side note, I personally expected at least DNHS1 to have a smaller receptive field in males, as the hoverfly HSN is strikingly sexually dimorphic (Nordström et al, Curr Biol 2008), and also very sensitive to small objects. However, while optic flow sensitive DNs do respond to small objects (see e.g. the J Comp Physiol paper mentioned above) we did not detect any obvious sexual dimorphism in receptive field properties. Indeed, we think that a different subset of DNs control target pursuit behavior (target selective DNs (TSDNs)). This will be addressed in the modified version of the paper.

    1. Author response:

      [Note: The final version has been published in Brain, Behavior, and Immunity: https://doi.org/10.1016/j.bbi.2026.106473]

      eLife Assessment

      This useful study raises interesting questions but provides inadequate evidence of an association between atovaquone-proguanil use (as well as toxoplasmosis seropositivity) and reduced Alzheimer's dementia risk. The findings are intriguing but they are correlative and hypothesis-generating with the strong possibility of residual confounding.

      We thank the editors and reviewers for characterizing our work as useful and for the opportunity to publish a Reviewed Preprint with a corresponding response. However, the statements in the Assessment characterizing the evidence as ‘inadequate’ and asserting a ‘strong possibility of residual confounding’ are factually incorrect as applied to our data and incompatible with the empirical findings presented in the manuscript. We have notified the editors of this factual inaccuracy. As the Assessment will be published as originally written, we provide clarification here to ensure an accurate scientific record for readers of the Reviewed Preprint.

      Our study shows that the association between atovaquone–proguanil (A/P) exposure and reduced dementia risk, first identified in a rigorously matched national cohort in Israel, is robustly reproduced across three independently constructed age-stratified cohorts in the U.S. TriNetX network (with exposure at ages 50–59, 60–69, and 70–79). In each cohort, individuals exposed to A/P were compared with rigorously matched individuals who received another medication at the same age and were then followed over a decade for incident dementia. Cases and controls were matched on all major established dementia risk factors: age, sex, race/ethnicity, diabetes, hypertension, obesity, and smoking status.

      Across all three strata, each containing more than 10,000 exposed individuals with an equal number of matched controls, we observed substantial and consistent reductions in cumulative dementia incidence (HR 0.34–0.51), extremely low P-values (10^-16 to 10^-40), and continuously widening divergence of Kaplan–Meier curves over the follow-up period. To more rigorously exclude the possibility of unmeasured baseline differences in health status, we additionally performed, for the purpose of this response, comparative analyses of key indicators of frailty and clinical utilization, including emergency and inpatient encounters, as well as the prevalence of mild cognitive impairment prior to medication exposure (values provided below in response to Reviewer #2, Weakness 1). These analyses provide clear evidence showing no pattern suggestive of exposed individuals being medically or cognitively healthier at baseline.

      Taken together, these findings constitute a rigorously matched and independently replicated association across two national health systems, using TriNetX, the most widely cited real-world evidence platform in published cohort studies. Replication across three age strata, each with >10,000 exposed individuals, followed for a decade, and matched on all major known risk factors for dementia, meets the accepted epidemiologic definition of strong and reproducible evidence.

      Although we disagree with elements of the editorial Assessment that appear inconsistent with the empirical findings, we will proceed with publication of the current manuscript as a Reviewed Preprint in order to ensure timely dissemination of findings with meaningful implications for public health and dementia prevention. In this initial public version, the point-by-point responses below provide concise explanations addressing the critiques underlying the Assessment. A revised manuscript, incorporating expanded baseline comparisons across each TriNetX age stratum, additional stringent exclusions, and an expanded discussion that will address the remarks presented in this review, will be submitted shortly.

      Reviewer #1 (Public review):

      Summary:

      This useful study provides incomplete evidence of an association between atovaquone-proguanil use (as well as toxoplasmosis seropositivity) and reduced Alzheimer's dementia risk. The study reinforces findings that VZ vaccine lowers AD risk and suggests that this vaccine may be an effect modifier of A-P's protective effect. Strengths of the study include two extremely large cohorts, including a massive validation cohort in the US. Statistical analyses are sound, and the effect sizes are significant and meaningful. The CI curves are certainly impressive.

      Weaknesses include the inability to control for potentially important confounding variables. In my view, the findings are intriguing but remain correlative / hypothesis generating rather than causative. Significant mechanistic work needs to be done to link interventions which limit the impact of Toxoplasmosis and VZV reactivation on AD.

      We thank the reviewer for describing our study as useful and for highlighting several of its strengths, including the very large cohorts, sound statistical analyses, meaningful effect sizes, and the impressive CI curves. We also appreciate the reviewer’s recognition that our findings reinforce prior evidence linking VZV vaccination to reduced AD risk.

      Regarding the statement that the evidence remains incomplete due to “inability to control for potentially important confounding variables,” we refer to our introductory explanation above. As noted there, our analyses meet the accepted criteria for reproducible epidemiological evidence, and the assumption of uncontrolled confounding is contradicted by rigorous matching and by additional baseline evaluations. We fully agree that mechanistic work is warranted, and our epidemiologic findings strongly motivate such efforts.

      We address the reviewer’s specific comments in detail below.

      (1) Most of the individuals in the study received A-P for malaria prophylaxis as it is not first line for Toxo treatment. Many (probably most) of these individuals were likely to be Toxo negative (~15% seropositive in the US), thereby eliminating a potential benefit of the drug in most people in the cohort. Finally, A-P is not a first line treatment for Toxo because of lower efficacy.

      We agree that individuals in our cohort received Atovaquone-Proguanil (A-P) for malaria prophylaxis rather than for treatment of toxoplasmosis. However, this does not contradict our interpretation. Because latent CNS colonization by T. gondii is not currently considered clinically actionable, asymptomatic carriers are not offered treatment, and therefore would only receive an anti-Toxoplasma regimen unintentionally, through a medication prescribed for another indication such as malaria prophylaxis. Importantly, atovaquone is an established therapy for toxoplasmosis, including CNS disease, with documented efficacy and CNS penetration in current treatment guidelines. It is therefore reasonable to assume that, during the multi-week course typically administered for malaria prophylaxis, A-P would exert significant anti-Toxoplasma activity in individuals with latent CNS infection, potentially reducing or eliminating parasite burden even though the medication was not prescribed for that purpose.

      The reviewer notes that only ~15% of individuals in the U.S. are Toxoplasma-seropositive, based on surveys performed primarily in young adults of reproductive age (serologic testing is most commonly obtained in women during prenatal care). However, seropositivity increases cumulatively over the lifespan, and few reliable estimates exist for the age groups in which Alzheimer’s disease and dementia occur. Even if we accept the lower estimate of ~15% latent colonization in older adults, this proportion is still smaller than the lifetime cumulative incidence of dementia in the general population.

      Therefore, if latent toxoplasmosis contributes causally to dementia risk, and A-P is capable of eliminating latent Toxoplasma in the subset of individuals who harbor it, then a multi-week course of treatment—such as the one routinely taken for malaria prophylaxis—would be expected to produce a substantial reduction in dementia incidence at the population level, of the same order of magnitude reported here. A protective effect concentrated in a minority of exposed individuals is fully compatible with, and can mechanistically explain, the large overall reduction in risk that we observe.

      Finally, the reviewer notes that A-P is not a first-line treatment for toxoplasmosis due to assumed lower efficacy. This point does not undermine our results. Even a second-line agent, when administered over several weeks—as is routinely done for malaria prophylaxis—is expected to exert substantial anti-Toxoplasma activity. The long duration of exposure in large populations receiving A-P for travel provides a unique natural experiment that does not exist for other anti-Toxoplasma medications, which, when prescribed for their non-Toxoplasma indications, are not taken more than a few days. Thus, the widespread use of A-P for malaria prophylaxis allows a unique opportunity to evaluate long-term outcomes following inadvertent anti-Toxoplasma treatment.

      Moreover, “first line” recommendations in clinical guidelines refer to treatment of acute toxoplasmosis in immunosuppressed individuals, where tachyzoites are actively replicating. These guidelines do not consider efficacy against latent CNS colonization, which is dominated by bradyzoites, a biologically distinct form, in immunocompetent individuals. Therefore, the guideline hierarchy is not informative regarding which medication is more effective at clearing latent brain infection, the stage we consider most relevant to dementia risk.

      (2) A-P exposure may be a marker of subtle demographic features not captured in the dataset such as wealth allowing for global travel and/or genetic predisposition to AD. This raises my suspicion of correlative rather than causal relationships between A-P exposure and AD reduction. The size of the cohort does not eliminate this issue, but rather narrows confidence intervals around potentially misleading odds ratios which have not been adjusted for the multitude of other variables driving incident AD.

      We agree that prior to matching, A-P exposure may be associated with demographic features such as health status or the ability to travel internationally. However, this does not apply after matching. In all age-stratified analyses, exposed and control individuals were rigorously matched on all major risk factors known to influence dementia risk, including age, sex, race/ethnicity, smoking status, hypertension, diabetes, and obesity. Owing to the extremely large pool of individuals in TriNetX (~120M), our matching was performed stringently, producing exposed and unexposed cohorts that are near-identical with respect to the established determinants of dementia risk.

      The reviewer correctly identifies that large cohorts alone do not eliminate confounding; however, confounding must still be biologically and epidemiologically plausible. Any hypothetical confounder capable of producing a 50–70% reduction in dementia incidence over a decade would need to: (1) produce a very large protective effect against dementia; (2) be strongly associated with A-P exposure; and (3) remain entirely uncorrelated with age, sex, race/ethnicity, smoking, diabetes, hypertension and obesity, which have been rigorously matched. No such factor has been proposed. The suggestion that an unspecified ‘subtle demographic feature’ could produce effects of this magnitude remains hypothetical, and no such factor has been described in the dementia risk literature.
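      One way to put a number on how strong such a hypothetical confounder would have to be is the E-value of VanderWeele and Ding. This is a standard sensitivity-analysis tool and is not an analysis reported in the manuscript or in this response. The sketch below applies its closed-form expression to the two ends of the quoted hazard-ratio range (0.34 and 0.51), under the assumption that the hazard ratio can stand in for a risk ratio, which is only approximately true when the outcome is uncommon over follow-up. The function names are illustrative.

```python
import math


def relative_reduction(hr: float) -> float:
    """Relative reduction in hazard implied by a hazard ratio below 1."""
    return 1.0 - hr


def e_value(ratio: float) -> float:
    """E-value for a ratio estimate, treating the ratio as a risk ratio.

    Protective estimates (ratio < 1) are inverted first, so the result is
    always expressed on the "greater than 1" scale.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    rr = 1.0 / ratio if ratio < 1.0 else ratio
    return rr + math.sqrt(rr * (rr - 1.0))


if __name__ == "__main__":
    # Hazard ratios quoted in the response for the age-stratified cohorts.
    for hr in (0.34, 0.51):
        print(
            f"HR {hr:.2f}: relative reduction {relative_reduction(hr):.0%}, "
            f"E-value {e_value(hr):.2f}"
        )
```

      The E-value is read as the minimum risk ratio that an unmeasured confounder would need to have with both the exposure and the outcome, conditional on the matched covariates, to fully account for the observed association. It describes the required strength of such a confounder and does not indicate whether one exists.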

      If a specific evidence-supported confounder is proposed that meets these criteria, we would be pleased to test it empirically in our cohorts. In the absence of such a proposal, the interpretation that the association is merely “correlative rather than causal” remains speculative and does not negate the strength of a replicated, rigorously matched, long-term association across large cohorts in two national health systems.

      (3) The relationship between herpes virus reactivation and Toxo reactivation seems speculative.

      We respectfully disagree with the characterization of the herpesvirus–Toxoplasma interaction as speculative. The mechanism we describe is biologically valid, based on established virology and parasitology literature showing that latent T. gondii infection can reactivate from its bradyzoite state under inflammatory or immune-modifying conditions, including viral triggers. A published clinical report has documented CNS co-reactivation of T. gondii and a herpesvirus, explicitly noting that HHV-6 reactivation can promote Toxoplasma reactivation in neural tissue (Chaupis et al., Int J Infect Dis, 2016).

      Moreover, this mechanism is the only currently evidence-supported explanation that simultaneously and parsimoniously accounts for all of the epidemiologic observations in our study:

      (1) Substantially higher cumulative incidence of dementia in individuals with positive Toxoplasma serology, indicating that latent infection is a risk factor for subsequent cognitive decline;

      (2) Strong protective association following A-P exposure, a medication with established activity against Toxoplasma gondii, including in the CNS;

      (3) Independent protection conferred by VZV vaccination, observed consistently for two vaccines with distinct formulations (one live attenuated, one recombinant protein), whose only shared property is suppression of VZV reactivation;

      (4) Greater protective effect of A-P among individuals who were not vaccinated against VZV, consistent with a model in which dementia risk requires both herpesvirus reactivation and persistent latent Toxoplasma infection—such that reducing either factor alone (via VZV vaccination or anti-Toxoplasma suppression) substantially lowers risk.

      Taken together, these observations are difficult to reconcile under any alternative hypothesis.  

      To date, we are unaware of any other biologically coherent mechanism that can explain all four findings simultaneously. We would welcome any alternative explanation capable of accounting for these converging epidemiologic signals, as such a proposal could meaningfully advance the scientific discussion. In the absence of a competing explanation, the interaction between latent toxoplasmosis and herpesvirus reactivation remains the most parsimonious hypothesis supported by current knowledge.

      Finally, while observational studies are inherently limited in their ability to provide causal inference, the mechanism we propose is biologically grounded and experimentally testable. Our results provide a strong rationale for mechanistic studies and clinical trials, and warrant publication precisely because they generate a verifiable hypothesis that can now be evaluated directly.

      (4) A direct effect of A-P on AD lesions independent of infection is not considered as a hypothesis. Given the limitations above and effects on metabolic pathways, it probably should be. The Toxo hypothesis would be more convincing if the authors could demonstrate an enhanced effect of the drug in Toxo positive individuals with no effect in Toxo negative individuals.

      A direct effect of A-P on established AD lesions is indeed possible, and this hypothesis would be of significant therapeutic interest. However, we did not consider it within the scope of our epidemiologic analyses because all cohorts explicitly excluded individuals with existing dementia. Under these conditions, proposing a disease-modifying effect on established Alzheimer’s lesions based on our data would itself be speculative. Such a mechanism would be better evaluated by mechanistic or interventional studies than by inference from populations without baseline disease.

      We also agree that demonstrating a stronger protective effect among Toxoplasma-positive individuals would be informative. Unfortunately, this “natural experiment” cannot be performed using the available data: Toxoplasma serology is rarely ordered in older adults, and A-P exposure is itself uncommon, resulting in a cohort overlap far too small to yield valid statistical inference (n≈25 in TriNetX).

      Thus, while both proposed hypotheses are scientifically attractive and merit further study, neither can be resolved using currently available real-world clinical data. Our findings provide the rationale to investigate both hypotheses experimentally, and we hope our report will motivate such studies.

      Reviewer #2 (Public review):

      Summary:

      This manuscript examines the association between atovaquone/proguanil use, zoster vaccination, toxoplasmosis serostatus and Alzheimer's Disease, using 2 databases of claims data. The manuscript is well written and concise. The major concerns about the manuscript center around the indications of atovaquone/proguanil use, which would not typically be active against toxoplasmosis at doses given, and the lack of control for potential confounders in the analysis.

      Strengths:

      (1) Use of 2 databases of claims data.

      (2) Unbiased review of medications associated with AD, which identified zoster vaccination associated with decreased risk of AD, replicating findings from other studies.

      We thank the reviewer for the thoughtful assessment and for noting key strengths of our work, including (1) the use of two large national databases, and (2) the unbiased discovery approach that replicated the widely reported association between zoster vaccination and reduced Alzheimer’s disease (AD) risk. We agree that these features highlight the validity and reproducibility of the analytic framework.

      Below we respond to the reviewer’s perceived weaknesses.

      Weaknesses:

      (1) Given that atovaquone/proguanil is likely to be given to a healthy population who is able to travel, concern that there are unmeasured confounders driving the association.

      We agree that, prior to matching, A-P exposure may correlate with demographic or health-related differences (e.g., ability to travel). However, this potential bias was explicitly controlled for in the study design. Across all three age-stratified TriNetX cohorts, exposed and unexposed individuals were rigorously matched on all major established dementia risk factors: age, sex, race/ethnicity, smoking status, obesity, diabetes mellitus, and hypertension. Comparative analyses confirm that these risk factors are equivalently distributed at baseline.

      As noted in our response to Reviewer #1, for any hypothetical unmeasured confounder to explain the results, it would need to satisfy three conditions simultaneously:

      (1) Be capable of producing a 50–70% reduction in dementia incidence sustained over a decade and across three distinct age strata (ages 50–79);

      (2) Be strongly associated with likelihood of receiving A-P;

      (3) Remain entirely uncorrelated with age, sex, race/ethnicity, smoking, diabetes, hypertension, or obesity, all of which were rigorously matched and balanced at baseline.

      No such factor has been proposed in the literature or by the reviewer. Thus, the concern remains hypothetical and unsupported by any measurable demographic or biological mechanism.

      Importantly, empirical evidence contradicts the notion of a “healthy traveler” bias:

      Emergency and inpatient encounter rates prior to exposure were comparable between A-P users and controls. Across the three age-stratified cohorts, emergency visits were similar or slightly higher among A-P users (EMER: 19.6% vs 16.4%, 19.9% vs 14.2%, 22.0% vs 14.8%), and inpatient encounters were effectively equivalent (IMP: 14.8% vs 15.2%, 17.7% vs 17.6%, 22.1% vs 22.2%). These patterns directly contradict the suggestion that A-P users were a healthier or less medically burdened population at baseline.

      Prevalence of mild cognitive impairment was not lower among A-P users and was, in fact, slightly higher in the oldest cohort. Across the three age groups, baseline diagnoses of mild cognitive impairment (MCI) were comparable or slightly higher among exposed individuals (0.1% vs 0.1%, 0.3% vs 0.2%, 1.1% vs 0.6%). These data contradict the suggestion that A-P users had superior baseline cognition.

      The strongest protective association occurred in the youngest stratum (age 50–59; HR 0.34). At this age, when nearly all individuals are sufficiently healthy to travel internationally, A-P uptake is the least likely to be a marker of better health status. A frailty-based “healthy traveler” hypothesis would instead predict the opposite pattern, with older adults showing the greatest apparent benefit, since health limitations are more likely to restrict travel in later life. In contrast, the protective association weakens with increasing age, empirically contradicting any explanation based on differential travel capacity.

      In conclusion, the empirical evidence directly contradicts the existence of a ‘healthy traveler’ effect.
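      The baseline figures quoted above are easier to compare as percentage-point differences. The sketch below only tabulates the numbers given in this response (A-P users versus matched controls). It assumes that the three value pairs in each series are listed from the youngest to the oldest stratum, which the text implies but does not state. The names `STRATA`, `BASELINE` and `point_differences` are illustrative.

```python
# Age strata, assumed to match the order in which values are quoted.
STRATA = ("50-59", "60-69", "70-79")

# Baseline percentages quoted in the response: (A-P users, matched controls).
BASELINE = {
    "emergency": [(19.6, 16.4), (19.9, 14.2), (22.0, 14.8)],
    "inpatient": [(14.8, 15.2), (17.7, 17.6), (22.1, 22.2)],
    "mci": [(0.1, 0.1), (0.3, 0.2), (1.1, 0.6)],
}


def point_differences(pairs):
    """Exposed minus control, in percentage points, rounded to one decimal."""
    return [round(exposed - control, 1) for exposed, control in pairs]


if __name__ == "__main__":
    for measure, pairs in BASELINE.items():
        for stratum, diff in zip(STRATA, point_differences(pairs)):
            # A positive value means the measure was more common among A-P users.
            print(f"{measure:<10} age {stratum}: {diff:+.1f} points")
```

      A positive difference means the indicator was more frequent among A-P users before exposure. These are raw differences between quoted percentages, with no confidence intervals, since the response does not give the underlying counts.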

      (2) The dose of atovaquone in atovaquone/proguanil is unlikely to be adequate suppression of toxo (much less for treatment/elimination of toxo), raising questions about the mechanism.

      A few important points should address the reviewer’s concern:

      In our cohorts, A-P was prescribed for malaria prophylaxis, as correctly noted. In this setting, it is taken for the entire duration of travel, plus several days before and after, typically resulting in many weeks of continuous exposure. This creates an unintentional but scientifically valuable natural experiment, in which a CNS-penetrating anti-Toxoplasma agent is administered for long durations.

      Atovaquone is an established treatment for CNS toxoplasmosis, has strong CNS penetration, and is included in current clinical guidelines for acute toxoplasmosis in immunocompromised patients, although at higher doses. Because latent, asymptomatic CNS colonization is not treated in clinical practice, there are currently no data establishing the dose required to eliminate bradyzoite-stage Toxoplasma in immunocompetent individuals.

      Our observations concern atovaquone–proguanil (A-P), a fixed-dose combination of atovaquone with proguanil, a DHFR inhibitor targeting a key metabolic pathway shared by malaria parasites and T. gondii. The combination has well-established synergistic effects in malaria prophylaxis and the same mechanism would be expected to enhance anti-Toxoplasma activity. This fixed-dose regimen has never been formally evaluated for toxoplasmosis treatment at prolonged durations or against latent bradyzoite infection.

      Our hypothesis does not require or imply complete eradication of Toxoplasma. A clinically meaningful reduction in latent cyst burden among the subset of colonized individuals may be sufficient to alter long-term disease trajectories. Thus, a population-level decrease in dementia incidence does not require universal clearance of infection, but only partial suppression or reduction of parasite load in susceptible individuals, which is entirely compatible with the known pharmacology and duration of A-P exposure.

      (3) Unmeasured bias in the small number of people who had toxoplasma serology in the TriNetX cohort.

      The relatively small number of older adults with Toxoplasma serology stems from current clinical practice: serologic testing is mostly performed in women during reproductive years due to risks in pregnancy, whereas in older adults a positive result has no clinical consequence and therefore testing is rarely ordered.

      Importantly, the seropositive and seronegative groups were drawn from the same underlying population of individuals who underwent serology testing, and the only difference between groups is the test result itself. Because the decision to order a test is made prior to and independent of the result, there is no plausible rationale by which the serology outcome (positive or negative) would introduce a bias favoring either group beyond the result of the test itself.

      Furthermore, the two groups were also rigorously matched on all major dementia risk factors, including age, sex, race/ethnicity, smoking, diabetes, hypertension, and BMI, and these characteristics are similarly distributed between groups. A small sample size does not imply bias; it simply reduces statistical power. Despite this limitation, the observed association (HR = 2.43, p = 0.001) remains strongly significant.

      Finally, this result is consistent with multiple published studies reporting higher rates of Toxoplasma seropositivity among individuals with Alzheimer’s disease, dementia, and even mild cognitive impairment, such that our finding reinforces a broader and independently observed epidemiologic pattern. Importantly, in our cohort the serology testing clearly preceded dementia diagnosis, which supports the plausibility of a causal rather than merely correlative relationship between latent toxoplasmosis and cognitive decline.

      To conclude our provisional response, we thank the editor and reviewers for raising points that will be further addressed and expanded upon in the discussion of the forthcoming revision. We welcome transparent scientific dialogue and acknowledge that, as with all observational research, residual confounding cannot be eliminated with absolute certainty. However, we disagree with the overall Assessment and emphasize that our findings, reproduced independently across two national health systems and three age-stratified cohorts, each rigorously matched on all major determinants of dementia risk, meet, and in many respects exceed, current standards for high-quality observational evidence.

      Assigning the results to “residual confounding” requires more than speculation: it requires identification of a confounding factor that is (1) anchored in established dementia risk literature, (2) empirically plausible, and (3) quantitatively capable of generating a sustained ~50 percent reduction in dementia incidence over a decade. No such factor has been identified to date. We note that the assertion of “residual confounding” has not been supported by a specific, quantitatively plausible mechanism. A hypothetical bias that is both extremely large in effect and uncorrelated with all major risk factors is not statistically or biologically credible.

      The explanation we propose, reduction in dementia risk through elimination of latent Toxoplasma gondii, is biologically grounded, directly supported by independent epidemiologic literature, and uniquely capable of accounting for all convergent observations in our data. No alternative hypothesis has been put forward that can plausibly explain these findings.

      A revised version of the manuscript will be submitted shortly, incorporating expanded baseline analyses, with the strictest possible exclusion criteria (including congenital, vascular, chromosomal, and neurodegenerative disorders such as Parkinson’s disease), and complete tabulated comparisons. These data will further reinforce that the observed protective associations are not attributable to any measurable confounding. We also plan to enhance the discussion in order to address the points raised by the reviewers.

      In light of the expanded analyses, any reservations expressed in the initial Assessment can now be re-evaluated on the basis of the empirical evidence. The findings reported in our study meet, and in several respects exceed, current epidemiologic standards for high-quality observational research, clearly warrant publication, and provide a robust scientific foundation for future mechanistic and interventional studies to determine whether elimination of latent toxoplasmosis can prevent or treat dementia.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The strength of the current study lies in their establishing the molecular mechanism through which PRMT1 could alter craniofacial development through regulation of the transcriptome, but the data presented to support the claim that a PRMT1-SFPQ axis directly regulates intron retention of the relevant gene networks should be robust and with multiple forms of clear validation. For example, elevated intron retention findings are based on the intron retention index, and according to the manuscript, are assessed considering the relative expression of exons and introns from a given transcript. However, delineating between intron retention and other forms of alternative splicing (i.e., cryptic splice site recognition) requires a more comprehensive consideration of the intron splicing defects that could be represented in data. A certain threshold of intron read coverage (i.e., the percent of an intron that is covered by mapped reads) is needed to ascertain if those that are proximal to exons could represent alternative intron ends rather than full intron retention events. In other words, intron retention is a type of alternative splicing that can be difficult to analyze in isolation given the confounding influence of cryptic splicing and cryptic exon inclusion. If other forms of alternative splicing were assessed and not detected, more confident retention calls can be made.

      This manuscript is a mechanistic exploration that follows previous work we published on the role of Prmt1 in craniofacial development, in which genetic deletion of Prmt1 in CNCCs leads to cleft palate and mandibular hypoplasia (PMID: 29986157).

      As the reviewer pointed out, a certain threshold of intron read coverage is needed to assess intron retention events. We employed IRTools to assess the collective changes in intron retention between cell states associated with a given biological function or pathway. IRTools accounts for intron read coverage by checking the evenness of the read distribution within an intron. Specifically, every constitutive intronic region (CIR) is divided into 10 equally sized bins and the proportion of reads that map to each bin is calculated. CIRs are then ranked according to the imbalance in their bin-wise read distribution, represented by the proportion of reads in the most populated bin. Those in the top 1% are considered to contain potentially false IR events and are excluded. We further addressed this question by developing another measure of intron retention, the intron retention coefficient (IRC), which assesses IR events using junction reads (Supplemental Figure-S8). Junction reads that straddle two exons are called exon-exon junction reads (spliced reads), and those that straddle an exon and a neighboring intron are called exon-intron junction reads (retained reads). The IRC of an intron is defined as the fraction of junction reads that are exon-intron junction reads: IRC = exon-intron read-count / (exon-exon read-count + exon-intron read-count), where exon-intron read-count = (5’ exon-intron read-count + 3’ exon-intron read-count) / 2. The IRC of a gene is defined as the exon-intron fraction of all junction reads overlapping or spanning the constitutive introns of this gene. In the calculation of the IRC, only exon-intron junction reads that cover the junction point and overlap each side by at least 8 bp were counted, and only exon-exon junction reads that jump over the relevant junction points and overlap each of the respective exons by at least 8 bp were counted. In this process, the evenness of the proportions of exon-intron junction reads that are 5’ or 3’ exon-intron junction reads is taken into account. As shown in Supplemental Figure S7A and S7B, IRC analysis generated results consistent with those obtained using IRI (Figure 3A and 3I).
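      The two read-based measures described above can be illustrated with a minimal Python sketch. It is not the IRTools implementation: the function and argument names are hypothetical, the junction read counts are assumed to have already passed the 8 bp overlap filter, and the 1% cutoff is applied to whatever set of regions is passed in.

```python
from typing import Dict, Sequence, Set


def intron_irc(exon_exon: int, exon_intron_5p: int, exon_intron_3p: int) -> float:
    """IRC of one intron from junction read counts (hypothetical helper).

    Counts are assumed to be pre-filtered for the minimum junction overlap.
    """
    # The exon-intron count is the mean of the 5' and 3' junction counts.
    exon_intron = (exon_intron_5p + exon_intron_3p) / 2
    total = exon_exon + exon_intron
    if total == 0:
        return float("nan")  # no junction reads, IRC is undefined
    return exon_intron / total


def bin_imbalance(bin_counts: Sequence[int]) -> float:
    """Proportion of a region's reads that fall in its most populated bin."""
    total = sum(bin_counts)
    if total == 0:
        return 0.0
    return max(bin_counts) / total


def flag_uneven_regions(regions: Dict[str, Sequence[int]],
                        top_fraction: float = 0.01) -> Set[str]:
    """Return the IDs of the most imbalanced regions (top `top_fraction`)."""
    ranked = sorted(regions, key=lambda r: bin_imbalance(regions[r]), reverse=True)
    n_flag = int(len(ranked) * top_fraction)
    return set(ranked[:n_flag])
```

      For example, an intron with 90 spliced reads and 8 and 12 retained reads at its 5’ and 3’ junctions has an exon-intron count of 10 and an IRC of 10 / 100 = 0.1.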

      In addition, as the reviewer pointed out, intron retention can be difficult to analyze in isolation. We followed the reviewer’s suggestion that “If other forms of alternative splicing were assessed and not detected, more confident retention calls can be made” and analyzed other forms of alternative splicing for all ECM and GAG genes with a significant IRI increase (genes highlighted in Figure-3A and 3I) using rMATS (Supplemental Figure-S9). Among these genes, only 5 (Cthrc1, Mmp23, Adamts10, Ccdc80 and Col25a1) showed statistically significant changes in skipped exons, 1 (Bmp7) showed significant changes in mutually exclusive exons, and none showed significant changes in alternative 5’ or 3’ splicing. The SE and MXE changes detected were marginal (Supplemental figure S8), while the majority of matrix genes with significant intron retention didn’t exhibit other forms of alternative splicing, further supporting the confidence of the intron retention calls.

      While data presented to support the PRMT1-SFPQ activation axis is quite compelling, that this is directly responsible for the elevated intron retention remains enigmatic. First, in characterizing their PRMT1 knockout model, it is unclear whether the elevated intron retention events directly correspond to downregulated genes.

      In the revised manuscript, we demonstrate IR-triggered NMD as a mechanism for transcript decay and downregulation of matrix genes. When IR-triggered NMD was blocked by the chemical inhibitor NMDI14, the intron-retaining transcripts showed significant accumulation (new Figure-4). NMD is an RNA surveillance system that degrades aberrant RNAs. Intron retention-triggered NMD in cancer has both promotive and suppressive roles, and NMD inhibitors have been tested for cancer therapy, including immunotherapy. During embryonic development, the functional significance of the NMD machinery is suggested by human genetic findings and mouse genetic models. NMD is driven by a protein complex composed of SMG and UPF proteins. Smg6, Upf1, Upf2 and Upf3a knockout mice die at early embryonic stages (E5.5-E9.5), and Smg1 gene trap mutant mice die at E12.5 (PMID: 29272451). SMG9 mutation in human patients causes malformation in the face, hand, heart and brain (PMID: 27018474).

      We show that in CNCCs NMD functions both as a physiological mechanism and as a response invoked by molecular insult. Blocking NMD in CNCCs caused significant accumulation of intron-retaining Adamts2, Alpl, Eln, Matn2, Loxl1 and Bgn transcripts, suggesting a basal role for NMD in degrading intron-retaining transcripts (Figure-4Ba-4Bf). We further demonstrated the accumulation of Adamts2 and Fbln5 using semi-quantitative PCR with the detection of a longer product from Adamts2 intron 19 and Fbln5 intron 7 (Figure-4Ca-4Ch). In CNCCs and ST2 cells, NMD is further invoked by Prmt1 and Sfpq deficiency. In Prmt1-deficient CNCCs, NMD blockage led to higher accumulation of intron-retaining Adamts2 and Alpl transcripts, suggesting that Prmt1 deficiency triggers NMD to reduce intron-containing transcripts (Figure-4Aa, 4Ab). In Sfpq-depleted ST2 cells, blocking NMD caused accumulation of intron-retaining transcripts Col4a2, St6galnac3 and Ptk7 (Figure-9B, 9C).

      Moreover, intron splicing is a well-documented node for gene regulation during embryogenesis and in other proliferation models, and craniofacial defects are known to be associated with 'spliceosomopathies'. However, reproduction of this phenotype does not suggest that the targets of interest are inherently splicing factors, and a more robust assessment is needed to determine the exact nature of alternative splicing in this system. Because there are several known splicing factors downstream of PRMT1 and presented in the supplemental data, the specific attribution of retention to SFPQ would be additionally served by separating its splicing footprint from that of other factors that are primed to cause alternative splicing.

      We have previously shown that a group of splicing factors depends on Prmt1 for arginine methylation, including SFPQ (PMID: 31451547). We tested additional splicing factors that are highly expressed in CNCCs and depend on PRMT1 for arginine methylation: SRSF1, EWSR1, TAF15, TRA2B and G3BP1 (Figure-5, 6 and 10). Among these factors, EWSR1 and TRA2B are both methylated in CNCCs and depend on PRMT1 for methylation (Fig. 5 and Supplemental Figure-S3B, S3C). We weren’t able to assess TAF15 methylation for lack of an effective antibody for the PLA assay. We also demonstrated that their protein expression and subcellular localization were not altered by Prmt1 deletion in CNCCs, unlike SFPQ (Supplemental Figure-S4). To define their splicing footprint, we performed siRNA-mediated knockdown in ST2 cells, followed by RNA-seq and IRI analysis to define differentially regulated genes and introns, which revealed distinct biological pathways regulated by SFPQ, EWSR1, TRA2B and TAF15, but minimal roles of EWSR1, TRA2B and TAF15 in intron retention when compared to SFPQ (Fig. 10F-10S, Supplemental Figure S7A-S7F, Supplemental Tables S4-S6). ECM genes are significantly downregulated by all four splicing factors (Fig. 10F-10I), but EWSR1, TRA2B and TAF15 function through IR-independent mechanisms, such as exon skipping, as exemplified by Postn (Fig. 10J-10S).

      Clarifying the relationship between SFPQ and splicing regulation is important given that the observed splicing defects are incongruous with published data presented by Takeuchi et al., (2018) regarding SFPQ control of neuronal apoptosis in mice. In this system, SFPQ was more specifically attributed to the regulation of transcription elongation over long introns and its knockout did not result in significant splicing changes. Thus, to establish the specificity for the SFPQ in regulating these retention events, authors would need to show that the same phenotype is not achieved by mis-regulation of other splicing factors. That the authors chose SFPQ based on its binding profile is understandable but potentially confounding given its mechanism of action in transcription of long introns (Takeuchi 2018). Because mechanisms and rates of transcription can influence splicing and exon definition interactions, the role of SFPQ as a transcription elongation factor versus a splicing factor is inadequately disentangled by authors.

      To test whether SFPQ acts as a transcription elongation factor, we performed Pol II Cut&Tag in ST2 cells and demonstrated that depletion of SFPQ only caused marginal changes in either the promoter region or gene body of ECM genes, suggesting that the role of SFPQ as a transcriptional activator or elongation factor is minimal (Fig. 7G, 7H). This finding is distinct from SFPQ function in neurons (PMID: 29719248), suggesting that the activation or recruitment of SFPQ in transcriptional regulation may involve tissue-specific factors in neurons.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Lima et al examines the role of Prmt1 and SFPQ in craniofacial development. Specifically, the authors test the idea that Prmt1 directly methylates specific proteins that results in intron retention in matrix proteins. The protein SFPQ is methylated by Prmt1 and functions downstream to mediate Prmt1 activity. The genes with retained introns activate the NMD pathway to reduce the RNA levels. This paper describes an interesting mechanism for the regulation of RNA levels during development.

      Strengths:

      The phenotypes support what the authors claim that Prmt1 is involved in craniofacial development and splicing. The use of state-of-the-art sequencing to determine the specific genes that have intron retention and changes in gene expression is a strength.

      Weaknesses:

      Some of the data seems to contradict the conclusions. And it is unclear how direct the relationships are between Prmt1 and SFPQ.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      First, the claims regarding the effect of PRMT1 loss on splicing are unclear by the section title. In other words, does loss of PRMT1 change the incidence of baseline alternative splicing events, or does it introduce new retention events that are responsible for underwriting the craniofacial phenotype? Consistent with this idea, the narrative could benefit from more cellular and/or histological validations of the transcriptomic defects discovered in the RNAseq, which could help contextualize the bioinformatics data with the developmental defects. Moreover, the conclusions drawn about intron retention could be clarified in terms of how applicable the mechanism is likely to be outside of this tissue-specific set of responsive introns.

      Loss of Prmt1 did not cause a global shift in intron retention, as shown in Supplemental Figure S2. Instead, Prmt1 deletion increased intron retention specifically in genes enriched in cartilage development, glycosaminoglycan biology, dendrite and axon, and decreased intron retention in mitochondria and metabolism genes (Table S1). We also tested matrix protein expression by histology to confirm that the transcriptomic defects revealed at the RNA level resulted in lower protein production. The new data are in Figure 3E-3H.

      Additionally, invoking NMD to align splicing and differential gene expression data is understandable but lacks sufficient controls to be conclusive, such as positive control genes to confirm inhibition of NMD.

      To validate the blockage of NMD, glutathione peroxidase 1 (Gpx1) intron 1, a well-documented substrate for NMD, was tested as a positive control (Fig 4Ac, 4Ad, 9B).

      Additionally, it should be clarified whether NMD is a basal mechanism for the regulation of these introns or whether it is an induced mechanism that is invoked by the molecular insult.

      In CNCCs, NMD functions both as a physiological mechanism and as a response invoked by molecular insult. Please refer to our responses to Reviewer 1’s public review for detailed explanations.

      Further, authors present data downstream of two siRNAs for the same gene target, but it remains unclear how siRNAs for the same gene target produce different effects. It may be helpful for authors to clarify how many of the transcriptomic defects are shared versus unique between the siRNAs.

      To address this question, we used bioinformatic analysis of the whole-genome data to assess the similarity of the changes caused by the two SFPQ-targeting siRNAs. As shown in the new Fig. 7Ba & 7Bb, transcriptomic and intron changes are consistent between the two siRNAs, suggesting that the genes targeted by the two siRNAs predominantly overlap. This overlap is illustrated by scatter plot analysis of RNAseq DEG and IRI data from each siRNA against SFPQ.
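      One simple way to put a number on the agreement seen in such a scatter plot is sketched below in Python. This is an illustration only, not the analysis behind Fig. 7Ba and 7Bb: the function names are hypothetical, and the inputs are assumed to be per-gene changes (for example log2 fold changes or IRI differences) listed in the same gene order for each siRNA.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long lists of per-gene changes."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def shared_direction_fraction(x: Sequence[float], y: Sequence[float]) -> float:
    """Fraction of genes that change in the same direction under both siRNAs.

    Genes with a change of exactly zero under either siRNA are ignored.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a != 0 and b != 0]
    if not pairs:
        return float("nan")
    same = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return same / len(pairs)
```

      A correlation close to 1 together with a high shared-direction fraction would indicate that the two siRNAs shift largely the same genes in the same direction.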

      Finally, we stress the importance of presenting the full conceptual basis for SFPQ's potential role in splicing and gene expression. It is significant to note that SFPQ has been previously studied as a splicing factor and was instead determined to function in support of transcription elongation rather than in splicing. Thus, if the authors are confident that SFPQ manifests directly in splicing changes, they bear the burden of proof to show that neither its role in transcription nor another splicing factor is driving the splicing changes.

      We demonstrated that depletion of SFPQ only caused marginal changes in either the promoter region or gene body of ECM genes, suggesting that the role of SFPQ as a transcriptional activator or elongation factor is minimal (Fig. 7G, 7H). Please refer to responses to Reviewer 1’s public review for detailed explanations.

      Reviewer #2 (Recommendations for the authors):

      (1) It is not clear why the authors focused on intron retention targets vs the other possibilities. Skipped Exon is much higher in terms of the number of changes, please clarify. For the intron retention how is this quantified? The traces are nice, but it is hard to tell which part is retained at this magnification. Also, because the focus is on extracellular matrix (ECM) and NMD it would be nice to show some of those targets here. In the tbx1 trace, some are up and some are down. What does that mean for the gene expression?

      We initially investigated SE and found that the genes with significant changes in Prmt1 CKO CNCCs fall into diverse functional pathways. Among them, a few genes are critical for skeletal formation, including Postn and Fn, and the function of their exon skipping has been documented. For example, the two exons that are skipped in Postn, Exon17 and 21, have been shown to regulate craniofacial skeleton shape and mandibular condyle hypertrophic zone thickness using transgenic mouse models (PMID: 36859617). As illustrated by Figure 10, the skipped exon of Postn is regulated by multiple splicing factors that may perform overlapping functions in vivo.

      Intron retention of each gene is quantified by the ratio of the overall read density of its constitutive intronic regions (CIRs) to the overall read density of its constitutive exonic regions (CERs), defined as the intron retention index (IRI). In the first section of our response to Reviewer 1’s comments, we explained the additional bioinformatic analysis that was performed to address the reviewers’ questions, support the confidence of the intron retention calls and rule out other alternative splicing mechanisms, such as SE, MXE, A5SS or A3SS (Supplemental Figure S5, S6, Table S7).
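      The IRI definition above reduces to a ratio of two read densities, as in the following minimal Python sketch. It assumes that "read density" means reads per base pair summed over all constitutive regions of the gene; the function and argument names are hypothetical and are not taken from IRTools.

```python
def read_density(read_count: float, total_length_bp: int) -> float:
    """Reads per base pair over a set of regions (assumed density definition)."""
    if total_length_bp <= 0:
        raise ValueError("region length must be positive")
    return read_count / total_length_bp


def intron_retention_index(cir_reads: float, cir_length_bp: int,
                           cer_reads: float, cer_length_bp: int) -> float:
    """IRI of a gene: CIR read density divided by CER read density."""
    cir_density = read_density(cir_reads, cir_length_bp)
    cer_density = read_density(cer_reads, cer_length_bp)
    if cer_density == 0:
        return float("nan")  # gene has no exonic reads, IRI is undefined
    return cir_density / cer_density
```

      For example, a gene with 50 reads over 10,000 bp of constitutive introns and 400 reads over 2,000 bp of constitutive exons has an IRI of 0.005 / 0.2 = 0.025.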

      (2) RNA-Sequencing of Prmt1 mutants nicely shows gene expression changes, including in ECM and GAG genes. While validation of the sequencing results is not necessarily required, it would be very interesting to show the expression in situ. In addition, the heat map shows both downregulated but also upregulated transcripts. This is expected since this protein regulates many genes. However, the volcano plot shows a significant number of genes upregulated. It would be interesting to show what the upregulated genes are. And what is the proposed mechanism for Prmt1 regulation of upregulated genes?

      Validation for the transcriptomic changes is shown in Fig. 3E-3H using immunostaining.

      As for upregulated genes in Prmt1 mutant, top pathways include cytokine-mediated signaling pathway, signal transduction by p53 signaling pathway and cell morphogenesis (Figure 2E), which are consistent with our previous reports that Prmt1 deletion induces cytokine production in oral epithelium and leads to p53 accumulation in embryonic epicardium (PMID: 32521264, 29420098). Besides these pathways, Prmt1 deletion also caused upregulation of genes involved in adult behavior, postsynaptic organization and apoptotic process, which is consistent with findings from other labs on PRMT1 function in neuronal and cancer cells (PMID: 34619150, 33127433).

      (3) Specific transcripts were shown to have elevated intron retention involved in the ECM and GAG pathway. However in Figure 3D it seems to show the opposite with intronic expression decreased and exonic increases and intronic decrease. This is very important to the final conclusion of the paper. In addition, is there a direct relationship between increased intron and downregulation of this specific gene expression? It seems a bit correlational as it could also be an indirect mechanism. One way to test this is to do in vitro translation with and without the specific intron to test if it results in lower expression.

      We apologize for the mislabeling in the previous version of Figure 3D, which is now corrected. We also tried to test the direct relationship between intron retention and downregulation of matrix genes such as Adamts2 using in vitro experiments. However, the highly retained introns of matrix genes tend to be long, many of them 10 to 50 kb in length, making it challenging to generate mini-gene constructs for molecular analysis. We used a different approach and demonstrated that inhibition of NMD with the chemical inhibitor NMDI14 caused dramatic accumulation of the Adamts2, Alpl, Eln, Matn2, Loxl1 and Bgn transcripts, suggesting that retained introns trigger NMD to regulate gene expression and that this mechanism acts at a physiological level in CNCCs (Fig. 4). We also blocked NMD in control and Prmt1 null CNCCs, where NMD blockage led to higher accumulation of Adamts2 and Alpl transcripts, suggesting that upon Prmt1 deficiency, NMD is further utilized to degrade intron-containing transcripts (Fig. 4). Similarly, in Sfpq-depleted ST2 cells, blocking NMD caused accumulation of intron-retaining transcripts Col4a2, St6galnac3 and Ptk7 (Fig. 9A, 9B).

      (4) While Figure 4 nicely shows that the methylation of SFPQ is reduced in Prmt1 CKO cells, it is unclear on which residue this methylation occurs. Also, the overall expression of SFPQ is down, so it is possible that the methylation is indirect, i.e., Prmt1 regulates some other methyltransferase that regulates SFPQ. Or that because the overall level of SFPQ is down, there is no protein to methylate. How do the authors differentiate between these possibilities?

      Previously, arginine methylation of SFPQ was characterized using in vitro reactions and cell lines with biochemical assays by Snijders et al. in 2015 (PMID: 25605962). Among all PRMTs that catalyze asymmetric arginine dimethylation (ADMA), SFPQ is methylated only by PRMT1 and PRMT3, with PRMT1 showing higher efficiency than PRMT3. However, PRMT3 is mainly cytosolic, and its expression in CNCCs is about 100-fold lower than that of PRMT1 (Fig. 1). Based on this knowledge, PRMT1 is the primary arginine methyltransferase for SFPQ, a nuclear protein in CNCCs. We and others have shown in a previous publication that SFPQ methylation on arginine 7 and 9 depends on PRMT1 (PMID: 31451547).

      To investigate SFPQ protein degradation in CNCCs, we used MG132 to block proteasomal degradation and observed a partial rescue of SFPQ protein degradation in Prmt1 mutant embryos, suggesting that SFPQ is degraded through a proteasome-mediated mechanism. To address the relationship between SFPQ methylation and protein expression, we assessed arginine methylation of the SFPQ that accumulated after MG132 treatment. The accumulated SFPQ was not methylated, confirming the absence of methylation even when SFPQ protein expression is restored.

      Snijders et al. also showed that citrullination induced by PADI4 regulates SFPQ stability (Snijders 2015). We considered this possibility and assessed the expression levels of PADIs. In E13.5 and E15.5 CNCCs, PADI1-4 mRNA expression levels are very low (TPM<5), suggesting that PADIs may not regulate SFPQ stability in CNCCs. A detailed mechanism as to how PRMT1-mediated SFPQ methylation controls stability awaits further investigation.

      (5) For the Sfpq deleted experiment, it seems that the two knockdowns are not similar in their gene targets, and the GO terms are different except for Wnt signaling. This makes this data difficult to interpret. The genes identified as intron retention are different than the ones identified in Prmt1 deletion and not reduced as much. How does this fit in with the Prmt1 story? If working through Sfpq, it assumes that the targets will be similar and more than 8% would be in common.

      To address the first concern, we used bioinformatic analysis of the whole-genome data to assess the similarity of the changes caused by the two SFPQ-targeting siRNAs. As shown in the new Fig. 7Ba & 7Bb, transcriptomic and intron changes are consistent between the two siRNAs, suggesting that the genes targeted by the two siRNAs predominantly overlap. This overlap is illustrated by scatter plot analysis of RNAseq DEG and IRI data from each siRNA against SFPQ.

      We have previously identified a group of splicing factors that depend on PRMT1 for arginine methylation, including SFPQ (PMID: 31451547). In the new data in Figures 5, 6 and 10, we tested an additional five PRMT1-dependent splicing factors that are highly expressed in CNCCs: SRSF1, EWSR1, TAF15, TRA2B and G3BP1 (Fig. 5, 6 and 10). Among these factors, SRSF1 and G3BP1 are predominantly expressed in the cytosol of NCCs at E13.5. Because pre-mRNA splicing takes place in the nucleus, we excluded these two and focused on the other three proteins. EWSR1 and TRA2B are both methylated in CNCCs and depend on PRMT1 for methylation (Fig. 5). We weren’t able to assess TAF15 methylation for lack of an effective antibody for the PLA assay. We also demonstrated that their protein expression and subcellular localization were not altered by Prmt1 deletion in CNCCs, unlike SFPQ (Fig. S2). To define their splicing footprint, we performed siRNA-mediated knockdown in ST2 cells, followed by RNA-seq and IRI analysis to define differentially regulated genes and introns, which revealed distinct biological pathways regulated by SFPQ, EWSR1, TRA2B and TAF15, but minimal roles of EWSR1, TRA2B and TAF15 in intron retention when compared to SFPQ (Fig. 10F-10I, Supplemental Figure S7A-S7F). ECM genes are significantly downregulated by all four splicing factors (Fig. 10J-10M), but EWSR1, TRA2B and TAF15 regulate transcription or exon skipping instead of IR, as exemplified by Alpl and Postn (Fig. 10N-10T).

      (6) The addition of an NMD mechanism is interesting but not surprising that when inhibiting the pathway broadly, there is an increase in gene expression in the mesoderm cell line. How specific is this to craniofacial development?

      NMD is driven by a protein complex composed of SMG and UPF proteins. We show in the revised manuscript that NMD is both a physiological mechanism in CNCCs and triggered by genetic disturbance (Fig. 4). These data are in line with human patient reports in which SMG9 mutation causes malformation in the face, hand, heart and brain (PMID: 27018474). Mouse genetic studies have also demonstrated roles of NMD components during embryonic development. Smg6, Upf1, Upf2 and Upf3a knockout mice die at early embryonic stages (E5.5-E9.5), and Smg1 gene trap mutant mice die at E12.5 (Han 2018). Additionally, intron retention-triggered NMD in cancer has both promotive and suppressive roles, and NMD inhibitors have been tested for cancer therapy and, recently, cancer immunotherapy. Our findings highlight matrix genes as one of the key targets of NMD during craniofacial development.

      Minor:

      (1) The supplemental figures are difficult to understand. In the first upload there are many figures and tables, some excel files that are separate uploads and some not. Please upload as separate files so it is clear. And also put them in order that they are in the manuscript.

      (2) For the heat map in figure 2B, it would be good to show all the genes or none at all. It seems a bit like cherry-picking to highlight only a few. And they are not labeled where they are located in the graph. Are these the top lines? If so, please label.

      (3) Gene names in Figure 3A are difficult to read. I would also not consider BMP7 an ECM gene.

      (4) A summary diagram of the interactions proposed will help to make this more understandable.

      The supplemental figures have been reorganized and uploaded as separate Word and Excel documents. For the heat map in Fig. 2B, we have removed the gene names. For Fig. 3A, only the most significantly changed genes are labeled in red dots with names. We didn’t label all the genes because of their large number. For the new Figure 3B, we have replaced BMP7. A schematic summary has also been added to Supplemental Fig. S9 to illustrate the PRMT1-SFPQ pathway.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      The authors determine the phylogenetic relation of the roughly two dozen wtf elements of 21 S. pombe isolates and show that none of them in the original S. pombe are essential for robust mitotic growth. It would be interesting to test their meiotic function by simply crossing each deletion mutant with the parent and analyzing spores for non-Mendelian inheritance. If this has been reported already, that information should be added to the manuscript. If not, I suggest the authors do these simple experiments and add this information.

      Thanks for the great summary! All the wtf genes have been tested for meiotic drive phenotypes previously by Bravo Nunez et al. (2020; http://doi.org/10.1371/journal.pgen.1008350). The reference was cited in our original manuscript, and we added the details in the revised manuscript.  

      Strengths:

      The most interesting data (Figure 4) show that one recombinant (wtfC4) between wtf18 and wtf23 produces in mitotic growth a poison counteracted by its own antidote but not by the parental antidotes. Again, it would be interesting to test this recombinant in a more natural setting - meiosis between it and each of the parents.

      Thanks for this insightful comment! As suggested, we have tried to test this recombinant in a more natural setting. We created a recombinant strain (wtfC4) based on the laboratory strain 972h-. Specifically, we replaced the last exon of the original wtf23 gene with the last exon of wtf18. However, we encountered a challenge: since strain 972h- has only one mating type and cannot undergo meiosis on its own, we had to mate the recombinant strain with a BN0 h⁺ strain that only carries the wtf23<sup>antidote</sup>. Unfortunately, despite tens of attempts over nearly a year, we did not observe the expected meiotic driver phenotype. This might be due to issues with the proper splicing and expression of the potential poison and antidote proteins or due to the genetic background. Similarly, the drive activity of wtf13 has been shown to be specifically suppressed in certain backgrounds.

      Weaknesses:

      In the opinion of this reviewer, some minor rewriting is needed.

      We did the rewriting as this reviewer suggested.

      Reviewer #2 (Public review):

      Summary:

      This important study provides a mechanism that can explain the rapid diversification of poison-antidote pairs (wtf genes) in fission yeast: recombination between existing genes.

      Thanks!

      Strengths:

      The authors analyzed the diversity of wtf in S. pombe strains, and found pervasive copy number variations. They further detected signals of recurrent recombination in wtf genes. To address whether recombination can generate novel wtf genes, the authors performed artificial recombination between existing wtf genes, and showed that indeed a new wtf can be generated: the poison cannot be detoxified by the antidotes encoded by parental wtf genes but can be detoxified by its own antidote.

      Thanks for the great summary!

      Weaknesses:

      The study can benefit from demonstrating that the novel poison-antidote constructed by the authors can serve as a meiotic driver.

      Thanks for this insightful comment! As suggested, we have tried to test this recombinant in a more natural setting. We created a recombinant strain (wtfC4) based on the laboratory strain 972h-. Specifically, we replaced the last exon of the original wtf23 gene with the last exon of wtf18. However, we encountered a challenge: since strain 972h- has only one mating type and cannot undergo meiosis on its own, we had to mate the recombinant strain with a BN0 h⁺ strain that only carries the wtf23<sup>antidote</sup>. Unfortunately, despite tens of attempts over nearly a year, we did not observe the expected meiotic driver phenotype. This might be due to issues with the proper splicing and expression of the potential poison and antidote proteins or due to the genetic background. Similarly, the drive activity of wtf13 has been shown to be specifically suppressed in certain backgrounds.

      Reviewer #3 (Public review):

      Summary:

      In this manuscript, Wang and colleagues explore factors contributing to the diversification of wtf meiotic drivers. wtf genes are autonomous, single-gene poison-antidote meiotic drivers that encode both a spore-killing poison (short isoform) and an antidote to the poison (long isoform) through alternative transcriptional initiation. There are dozens of wtf drivers present in the genomes of various yeast species, yet the evolutionary forces driving their diversification remain largely unknown. This manuscript is written in a straightforward and effective manner, and the analyses and experiments are easy to follow and interpret. While I find the research question interesting and the experiments persuasive, they do not provide any deeper mechanistic understanding of this gene family.

      Thanks! Please see the following for our point-by-point response.

      Strengths:

      (1) The authors present a comprehensive compendium and analysis of the evolutionary relationships among wtf genes across 21 strains of S. pombe.

      (2) The authors found that a synthetic chimeric wtf gene, combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves like a meiotic driver that could only be rescued by the chimeric antidote but neither of the parental antidotes. This is a very interesting observation that could account for their inception and diversification.

      Thanks for the great summary!

      Weaknesses:

      (1) Deletion strains

      The authors separately deleted all 25 wtf genes in the S. pombe reference strain. Next, the authors performed a spot assay to evaluate the effect of wtf gene knockout on yeast growth. They report no difference from the WT and conclude that the wtf genes might be largely neutral to the fitness of their carriers in the asexual life cycle, at least in normal growth conditions.

      The authors could have conducted additional quantitative growth assays in yeast, such as growth curves or competition assays, which would have allowed them to detect subtle fitness effects that cannot be quantified with a spot assay. Furthermore, the authors do not rule out simpler explanations, such as genetic redundancy. This could have been addressed by crossing mutants of closely related paralogs or editing multiple wtf genes in the same genetic background.

      Another concern is the lack of detailed information about the 25 knockout strains used in the study. There is no information provided on how these strains were generated or, more importantly, validated. Many of these wtf genes have close paralogs and are flanked by repetitive regions, which could complicate the generation of such deletion strains. As currently presented, these results would be difficult to replicate in other labs due to insufficient methodological details.

      We generated growth curves for all 25 wtf deletion strains and provided the details of the wtf gene knockouts. However, with 25 wtf genes there are too many pairwise combinations to edit, and it is technically challenging to knock out multiple wtf genes together. Nevertheless, our results suggest that single wtf genes have little effect on host fitness under normal conditions.

      (2) Lack of controls

      The authors found that a synthetic chimeric wtf gene, constructed by combining exons 1-5 of wtf23 and exon 6 of wtf18, behaves as a meiotic driver that can be rescued only by its corresponding chimeric antidote, but not by either of the parental antidotes (Figure 4F). In contrast, three other chimeric wtf genes did not display this property (Figure 4C-E). No additional experiments were conducted to explain these differences, and basic control experiments, such as verifying the expression of the chimeric constructs, were not performed to rule out trivial explanations. This should be at the very least discussed. Also, it would have been better to test additional chimeras.

      We verified the expression of the chimeric genes. The last exon of wtf18 is too small (128 bp) to construct further meaningful chimeras.

      (3) Statistical analyses

      In line 130 the authors state that: "Given complex phylogenetic mixing observed among wtf genes (Figure 1E), we tested whether recombination occurred. We detected signals of recombination in the 25 wtf genes of the S. pombe reference genome (p = 0) and in the wtf genes of the 21 S. pombe strains (p = 0) using pairwise homoplasy index (HPI) test." Reporting a p-value of 0 is not appropriate. Exact P-values should be reported. 

      Due to software limitations, the PHI test reports p-values of 0.0 for extremely significant results. We have therefore reported them as <0.0001 in the revised manuscript.

      Recommendations for the authors:

      Reviewing Editor Comments:

      Regarding the synthetic chimeric wtf gene constructed by combining exons of wtf23 and wtf18, the authors did not explicitly test whether it acts as a meiotic driver in the natural context of a cross. Instead, they examined this possibility only through transgenic overexpression experiments. Given that this is arguably the most important claim of the paper, it is critical that the authors perform, report, and discuss such an experiment in a natural context, regardless of the outcome. It is not necessary to test other recombinants or other wtf loci.

      Thanks for this insightful comment! As suggested, we have tried to test this recombinant in a more natural setting. We created a recombinant strain (wtfC4) based on the laboratory strain 972h-. Specifically, we replaced the last exon of the original wtf23 gene with the last exon of wtf18. However, we encountered a challenge: since strain 972h- has only one mating type and cannot undergo meiosis on its own, we had to mate the recombinant strain with a BN0 h⁺ strain that only carries the wtf23<sup>antidote</sup>. Unfortunately, despite tens of attempts over nearly a year, we did not observe the expected meiotic driver phenotype. This might be due to issues with the proper splicing and expression of the potential poison and antidote proteins or due to the genetic background. Similarly, the drive activity of wtf13 has been shown to be specifically suppressed in certain backgrounds.

      Reviewer #1 (Recommendations for the authors):

      The paper is very well written, but some minor points should be corrected or checked.

      (1) Line 95: Why "Putative"? Is it not clear what a wtf pseudogene is?

      “Putative” was removed.

      (2) Line 105: Does "known functional" mean they are active (i.e., have been tested and shown to be active)? If so, a reference should be added.

      We now use “known meiotic drivers” and have added a reference here.

      (3) Line 135: "no recombination signal was tested". Do the authors mean no signal was inferred? 

      We changed “tested” to “detected”.

      (4) Line 147: References for "known functional meiotic drivers (wtf23) and artificially generated meiotic driver (wtf18)" should be given. A statement of how wtf18 was "artificially generated" is essential so the reader knows how that element differs from the wtfC4 generated here.

      We added a reference for wtf23. As for wtf18, we have specified this in the following text, namely “we artificially introduced an in-frame ATG codon right before the start of exon 2, generating wtf18poison/-0M.”

      (5) Lines 154 and 424 say an ATG codon was introduced "right before the start of exon 2," but Figure 4B shows it before exon 1.

      We thank the reviewer. The introduced ATG is the second start codon in the long transcript and the first in the short transcript. The right panel of Figure 4B shows the short transcript, so the text and figure are consistent.

      (6) Line 159: The wtf18 mutant with this additional ATG codon should be tested in meiosis, to see if "putative" is correct.

      Thanks. As with wtfC4, we encountered technical challenges in showing the driver phenotype in a natural setting, and thus removed this statement.

      (7) Line 181: change "driver" to "drive".

      Driver is correct.

      (8) Line 184: insert to read "wtf genes tested". Also, what is the basis for proposing that "the last exon might be crucial for antidote function"?

      “Tested” was added, and the statement was removed.

      (9) Line 198: change to read "detects only large differences".

      Done as suggested.

      (10) Line 204: change "removed" to "removal".

      Done as suggested.

      (11) Lines 242 and 243: Are "Splittree4" and "SplitsTree4" different, or is this a misprint?

      Corrected!

      (12) Lines 274-5 and 412 -3 would read better as "strains were diluted in five 10-fold steps” and “...μL of each dilution spotted on” “…to assay for…"

      Done as suggested.

      (13) Line 284 says "No new data were generated." This is clearly wrong. Perhaps the authors mean there are no supplementary data files.

      Corrected!

      (14) Line 406: Change "is" to "are".

      Corrected!

      (15) Line 413: Surely, they were spotted onto YE agar medium, not liquid medium.

      Corrected!

      (16) Figure 3C: Define "Rho" and the scale used.

      The definition of Rho has been added to the Methods section in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The evidence is largely solid, but the study can benefit from demonstrating that the novel poison-antidote constructed by the authors can serve as a meiotic driver.

      As suggested, we have tried to test this recombinant in a more natural setting. We created a recombinant strain (wtfC4) based on the laboratory strain 972h-. Specifically, we replaced the last exon of the original wtf23 gene with the last exon of wtf18. However, we encountered a challenge: since strain 972h- has only one mating type and cannot undergo meiosis on its own, we had to mate the recombinant strain with a BN0 h⁺ strain that carries the wtf23<sup>antidote</sup>. Unfortunately, despite tens of attempts over nearly a year, we did not observe the expected meiotic driver phenotype. This might be due to issues with the proper splicing and expression of the potential poison and antidote proteins.

      Reviewer #3 (Recommendations for the authors):

      I strongly recommend the authors provide all the details concerning the generation of the knock-out strains, including specific primers used (for both the deletion and validation), the result of these validations, and the specific genotype (and ID) of the strains generated.

      These details are now included in the Materials and Methods section and in Supplementary.

      Please also provide exact P-values (see point 3).

      Due to software limitations, the PHI test reports p-values of 0.0 for extremely significant results. We have therefore reported them as <0.0001 in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #2 (Public review):

      In this valuable manuscript, Lin et al attempt to examine the role of long non coding RNAs (lncRNAs) in human evolution, through a set of population genetics and functional genomics analyses that leverage existing datasets and tools. Although the methods are incomplete and at times inadequate, the results nonetheless point towards a possible contribution of long non coding RNAs to shaping humans, and suggest clear directions for future, more rigorous study.

      Comments on revisions:

      I thank the authors for their revision and changes in response to previous rounds of comments. As before, I appreciate the changes made in response to my comments, and I think everyone is approaching this in the spirit of arriving at the best possible manuscript, but we still have some deep disagreements on the nature of the relevant statistical approach and defining adequate controls. I highlight a couple of places that I think are particularly relevant, but note that given the authors disagree with my interpretation, they should feel free to not respond!

      (1) On the subject of the 0.034 threshold, I had previously stated: "I do not agree with the rationale for this claim, and do not agree that it supports the cutoff of 0.034 used below."

      In their reply to me, the authors state:

      "What we need is a gene number, which (a) indicates genes that effectively differentiate humans from chimpanzees, (b) can be used to set a DBS sequence distance cutoff. Since this study is the first to systematically examine DBSs in humans and chimpanzees, we must estimate this gene number based on studies that identify differentially expressed genes in humans and chimpanzees. We choose Song et al. 2021 (Song et al. Genetic studies of human-chimpanzee divergence using stem cell fusions. PNAS 2021), which identified 5984 differentially expressed genes, including 4377 genes whose differential expression is due to trans-acting differences between humans and chimpanzees. To the best of our knowledge, this is the only published data on trans-acting differences between humans and chimpanzees, and most HS lncRNAs and their DBSs/targets have trans-acting relationships (see Supplementary Table 2). Based on these numbers, we chose a DBS sequence distance cutoff of 0.034, which corresponds to 4248 genes (the top 20%), slightly fewer than 4377."

      I have some notes here. First, Agoglia et al, Nature, 2021, also examined the nature of cis vs trans regulatory differences between human and chimps using a very similar set up to Song et al; their Supplementary Table 4 enables the discovery of genes with cis vs trans effects although admittedly this is less straightforward than the Song et al data. Second, I can't actually tell how the 4377 number is arrived at. From Song et al, "Of 4,671 genes with regulatory changes between human-only and chimpanzee-only iPSC lines, 44.4% (2,073 genes) were regulated primarily in cis, 31.4% (1,465 genes) were regulated primarily in trans, and the remaining 1,133 genes were regulated both in cis and in trans (Fig. 2C). This final category was further broken down into a cis+trans category (cis- and trans-regulatory changes acting in the same direction) and a cis-trans category (cis- and trans-regulatory changes acting in opposite directions)." Even when combining trans-only and cis&trans genes that gives 2,598 genes with evidence for some trans regulation. I cannot find 4,377 in the main text of the Song et al paper.

      Elsewhere in their response, the authors respond to my comment that 0.034 is an arbitrary threshold by repeating the analyses using a cutoff of 0.035. I appreciate the sentiment here, but I would not expect this to make any great difference, given how similar those numbers are! A better approach, and what I had in mind when I mentioned this, would be to test multiple thresholds, ranging from, eg, 0.05 to 0.01 <DBS dist =0.01 -> 0.034 -> 0.05> at some well-defined step size.

      (1) We sincerely thank the reviewer for this critical point. Our initial purpose, based on DBS distances from the human genome to chimpanzee genome and archaic genomes, was that genes with large DBS distances may have contributed more to human evolution. However, our ORA (overrepresentation analysis) explored only genes with large DBS distances (the legend of old Figure 2 was “1256 target genes whose DBSs have the largest distances from modern humans to chimpanzees and Altai Neanderthals are enriched in different Biological Processes GO terms”), with the use of the cutoff (threshold) of 0.034 for defining large distance. The cutoff is not totally unreasonable (as our new results and the following sensitivity analysis indicate), but this approach was indirect and flawed.

      (2) We have now performed ORA using two methods. The first uses only DBS distances. Instead of using a cutoff, we now sort genes by DBS distance (human-chimpanzee distance and human-Altai Neanderthal distance, respectively; see Supplementary Table 5) and use the top 25% and bottom 25% of genes to perform ORA. This directly examines whether DBS distances alone indicate that genes with large DBS distances contribute more to human evolution than genes with small DBS distances. The second also uses the ASE genes (allele-specific expression; genes undergoing human/chimpanzee-specific regulation in the tetraploid human–chimpanzee hybrid iPS cells) reported by Agoglia et al. 2021. We select the top 50% and bottom 50% of genes, with large and small DBS distances respectively, intersect them with the ASE genes from Agoglia et al. 2021 (their Supplementary Table 4), and apply ORA to the intersections. Both methods show that (a) more GO terms are obtained from genes with large DBS distances, and (b) more human evolution-related GO terms are obtained from genes with large DBS distances (Supplementary Tables 5, 6, 7; Figure 2; Supplementary Fig. 15). These results directly suggest that genes with large DBS distances contribute more to human evolution than genes with small DBS distances, which is a key theme of the study.

      (3) Regarding Song et al. 2021, the statement “we differentiated…allotetraploid (H1C1a, H1C1b, H2C2a, H2C2b) lines into ectoderm, mesoderm, and endoderm” led us to assume that their differentiated hybrid cell lines cover more tissue types than those of Agoglia et al. 2021. Now, upon re-examining Supplementary Table 5 of Song et al. and Supplementary Table 4 of Agoglia et al. 2021, we find that the latter more clearly indicates significant ASE genes (p-adj<0.01 and |LFC|>0.5 in GRCh38 and PanTro5).

      (4) We have also performed two additional analyses in response to the suggestion to “test multiple thresholds, ranging from, eg, 0.05 to 0.01 <DBS dist =0.01 -> 0.034 -> 0.05> at some well-defined step size”. First, we performed a multi-threshold sensitivity analysis using a spectrum of cutoffs (0.03, 0.034, 0.04, 0.05), and tracked the number of genes identified and the enrichment significance of key GO terms (e.g., "neuron projection development," "behavior") across these thresholds. The result confirms that while the absolute number of genes varies with the cutoff, the core biological conclusion (specifically, the significant enrichment of target genes in neurodevelopmental and cognitive functions) remains stable and significant. For instance, "behavior" maintains strong statistical significance (FDR<0.01) in both the human-chimpanzee and human-Altai Neanderthal comparisons across all tested cutoffs, and "neuron projection development" also remains significant at three (0.03, 0.034, 0.04) of the four cutoffs in the Altai comparison. This pattern suggests that our core findings regarding neurodevelopmental functions are robust across a range of cutoffs. Nevertheless, we did not extend the analysis to smaller cutoffs (e.g., 0.01 or 0.02) because such values would identify an excessively large number of genes (>10000) for ORA, which would render the GO-term enrichment analysis less meaningful due to a loss of specificity.

      Second, we have performed an additional validation to directly evaluate whether the 0.034 cutoff itself represents a stringent and biologically meaningful value. We sought to empirically determine how often a DBS sequence distance of 0.034 or greater might occur by chance in promoter regions, thereby testing its significance as a marker of potential evolutionary divergence. We randomly sampled 10,000 windows from annotated promoter regions across the hg38 genome, each with a size matching the average length of DBSs (147 bp). We then calculated the per-base sequence distances for these random windows between modern humans and chimpanzees, as well as between modern humans and the three archaic humans (Altai, Denisovan, Vindija). The analysis reveals that a distance of ≥0.034 is a rare event in random promoter sequences: for Human-Chimp, Human-Altai, Human-Denisovan, and Human-Vindija, 5.49% (549/10000), 0.31% (31/10000), 4.47% (447/10000), and 0.03% (3/10000) of random windows reach this distance, respectively. This empirical evidence suggests that 0.034 is a sufficiently strong cutoff for defining large DBS distance: such a distance is unlikely to occur in a random genomic background (P<0.1 for chimpanzee and P<0.05 for the archaic humans), and DBSs exceeding this cutoff are significantly enriched for sequences that have undergone substantial evolutionary change rather than random neutral variation.

      (5) We present the new Figure 2, Supplementary Tables 5, 6, 7, and Supplementary Fig. 15. We have substantially revised section 2.3, related sections in Results, Supplementary Note 3, and Supplementary Table 8. We have removed the related descriptions and explanations from the main text and Supplementary Notes. The results of the above two analyses are presented here as Author response table 1 and Author response image 1.

      Author response table 1.

      Sensitivity analysis of GO-term enrichment across different DBS sequence distance cutoffs. The table shows the numbers of target genes identified and the false discovery rates (FDR) for the enrichment of three selected GO terms at four different distance cutoffs. Note that, unlike in the old Figure 2, the results for chimpanzees and Altai Neanderthals are not directly comparable here, as the numbers of target genes used for the enrichment analysis differ between them at each cutoff.

      Author response image 1.

      Distribution of per-base sequence distances for DBS size-matched random genomic windows in Ensembl-annotated promoter regions, calculated between modern humans and (A) chimpanzee, (B) Altai Neanderthal, (C) Denisovan, and (D) Vindija Neanderthal genomes.
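      The random-window check described in this response boils down to a tail fraction: the share of windows whose per-base distance reaches the cutoff. The following is a minimal sketch of that calculation, not the authors' pipeline. The hit counts are those quoted in the response; the function name `tail_fraction`, the toy distance distribution, and its parameters are assumptions introduced purely for illustration.

```python
import random

CUTOFF = 0.034
N_WINDOWS = 10_000

# Numbers of random promoter windows with distance >= 0.034, as quoted in the response.
reported_hits = {
    "Human-Chimp": 549,
    "Human-Altai": 31,
    "Human-Denisovan": 447,
    "Human-Vindija": 3,
}


def tail_fraction(distances, cutoff):
    """Return the fraction of windows whose per-base distance is at least `cutoff`."""
    return sum(d >= cutoff for d in distances) / len(distances)


# Tail fractions implied by the reported counts.
reported_fractions = {name: hits / N_WINDOWS for name, hits in reported_hits.items()}

# Hypothetical stand-in for real per-window distances (illustration only):
# an exponential distribution with an arbitrary mean of 0.012.
rng = random.Random(0)
toy_distances = [rng.expovariate(1 / 0.012) for _ in range(N_WINDOWS)]

# The same function supports a sweep over the cutoffs used in the sensitivity analysis.
sweep = {c: tail_fraction(toy_distances, c) for c in (0.03, 0.034, 0.04, 0.05)}

for name, frac in reported_fractions.items():
    print(f"{name}: {frac:.2%} of random windows reach {CUTOFF}")
for cutoff, frac in sweep.items():
    print(f"toy data, cutoff {cutoff}: {frac:.2%}")
```

      Because the tail fraction can only shrink as the cutoff grows, a sweep like this shows how quickly the set of "large-distance" windows thins out, which is the quantity the cutoff discussion turns on.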

      (2) The authors have introduced a new TFBS section, as a control for their lncRNAs - this is welcome, though again I would ask for caution when interpreting results. For instance, in their reply to me the authors state: "The number of HS TFs and HS lncRNAs (5 vs 66) <HS TF vs all HS lncRNAs> alone lends strong evidence suggesting that HS lncRNAs have contributed more significantly to human evolution than HS TFs (note that 5 is the union of three intersections between <many2zero + one2zero> and the three <human TF list>)."

      But this assumes the denominator is the same! There are 35899 lncRNAs according to the current GENCODE build; 66/35899 = 0.0018, so, 0.18% of lncRNAs are HS. The authors compare this to 5 TFs. There are 19433 protein coding genes in the current GENCODE build, which naively (5/19433) gives a big depletion (0.026%) relative to the lnc number. However, this assumes all protein coding genes are TFs, which is not the case. A quick search suggests that ~2000 protein coding genes are TFs (see, eg, https://pubmed.ncbi.nlm.nih.gov/34755879/); which gives an enrichment (although I doubt it is a statistically significant one!) of HS TFs over HS lncRNAs (5/2000 = 0.0025). Hence my emphasis on needing to be sure the controls are robust and valid throughout!
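      The denominator argument in this comment can be checked with a few lines of arithmetic. This is only a sketch using the counts quoted above; the figure of roughly 2000 TFs is the reviewer's own rough estimate, and the variable and function names are illustrative assumptions.

```python
# Counts quoted in the comment above; the TF total is a rough estimate, not a curated number.
HS_LNCRNAS, ALL_LNCRNAS = 66, 35899
HS_TFS, ALL_PROTEIN_CODING, APPROX_TFS = 5, 19433, 2000


def share(numerator, denominator):
    """Fraction of a gene class that is human-specific (HS)."""
    return numerator / denominator


lnc_share = share(HS_LNCRNAS, ALL_LNCRNAS)            # HS lncRNAs among all lncRNAs
tf_share_naive = share(HS_TFS, ALL_PROTEIN_CODING)    # wrong denominator: all protein-coding genes
tf_share = share(HS_TFS, APPROX_TFS)                  # denominator restricted to TFs

print(f"HS lncRNAs / all lncRNAs:          {lnc_share:.3%}")
print(f"HS TFs / all protein-coding genes: {tf_share_naive:.3%}")
print(f"HS TFs / ~2000 TFs:                {tf_share:.3%}")
print("HS share is higher for TFs than for lncRNAs:", tf_share > lnc_share)
```

      The direction of the comparison flips depending on which denominator is used, which is why the raw counts of 5 and 66 cannot be compared directly.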

      We thank the reviewer for this comment. While 5 vs 66 reveals a difference, a direct comparison is too simplified. The real take-home message of the new TFBS section is not the numbers but the distributions of HS TFs’ targets and HS lncRNAs’ targets across GTEx organs and tissues (Figure 3 and Supplementary Figures 24, 25) - correlated HS lncRNA-target transcript pairs are highly enriched in brain regions, but correlated HS TF-target transcript pairs are distributed broadly across GTEx tissues and organs. We have now removed the simple comparison of “5 vs 66” and more carefully explained our comparison in section 2.6.

      (3) In my original review I said: line 187: "Notably, 97.81% of the 105141 strong DBSs have counterparts in chimpanzees, suggesting that these DBSs are similar to HARs in evolution and have undergone human-specific evolution." I do not see any support for the inference here. Identifying HARs and acceleration relies on a far more thorough methodology than what's being presented here. Even generously, pairwise comparison between two taxa only cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee.

      In their reply to me, the authors state:

      Here, we actually made an analogy but not an inference; therefore, we used such words as "suggesting" and "similar" instead of using more confirmatory words. We have revised the latter half sentence, saying "raising the possibility that these sequences have evolved considerably during human evolution".

      Is the aim here to draw attention to the ~2.2% of DBS that do not have a counterpart? In that case, it would be better to rewrite the sentence to emphasise those, not the ones that are shared between the two species? I do appreciate the revised wording, though.

      (1) Our original phrasing may be misleading, and we agree entirely that “pairwise comparison between two taxa only cannot polarise the direction of differences; inferring human-specific change requires outgroups beyond chimpanzee”. As explained in that reply, we know and think that DBSs and HARs are two different classes of sequences, and indeed, identifying HARs and acceleration relies on a far more thorough methodology. Yet, three factors prompted us to compare them. First, both suggest the importance of sequences outside genes. Second, both are quite “old” sequences and have undergone considerable evolution recently (although the references are different). Third, both have contributed greatly to human brain evolution.  

      (2) Here, our stress is 97.81% but not 2.2%, and we have made this analogy more clearly and cautiously. Relevant revisions have been made in the Results, Discussion, and Methods sections.   

      (3) We have also further examined whether the 2.2% of DBSs are human-specific gains by analyzing them using the UCSC Multiz Alignments of 100 Vertebrates. The result confirms that all 2248 DBSs are present in the human genome but are absent from the chimpanzee genome and all other aligned vertebrate genomes. We have added this result to the manuscript.

      (4) Finally, Line 408: "Ensembl-annotated transcripts (release 79)" Release 79 is dated to March 2015, which is quite a few releases and genome builds ago. Is this a typo? Both the human and the chimpanzee genome have been significantly improved since then!

      (1) We thank the reviewer for this comment, which prompts us to provide further explanation and additional data. First, we began predicting HS lncRNAs’ DBSs when Ensembl release 79 was available, but did not re-predict DBSs when new Ensembl releases were published because (a) these new Ensembl releases are also based on hg38, (b) we did not find any fault in the LongTarget program during our use, nor did we receive any such report from users, and (c) predicting lncRNAs’ DBSs using the LongTarget program is highly time-consuming.

      (2) Second, to assess the influence of newer Ensembl releases, we compared the promoters annotated in release 79 and in release 115. We found that the vast majority (87.3%) of promoters newly annotated in release 115 belong to non-coding genes. Thus, using release 115 may predict more DBSs in non-coding genes, but downstream analyses based on protein-coding genes would be essentially the same (meaning that all figures and tables would be the same).

      (3) Third, a key element of this study is GTEx data analysis, and these data were also published years ago.  

      (4) Finally, some lncRNA genes have new gene symbols in new Ensembl releases. To allow researchers to use our data conveniently, we have added a new column titled "Gene symbol (Ensembl release115)" to Supplementary Tables 2A and 2B.  

      Summary:

      Major changes based on Reviewer’s comments:

      (1) The following revisions are made to address the comment on “the 0.034 threshold”: (a) Section 2.3, section 2.4, Supplementary Note 3, and related contents in Discussion and Methods are revised, (b) new Figure 2, Supplementary Figure 15, new Supplementary Table 5,6,7, (c) Table 2 and Supplementary Table 8 are revised.

      (2) To address the comment on “new TFBS section”, section 2.6 and section 4.13 are revised.  

      (3) To address the comment on “97.81% and 2.2% of DBSs”, section 2.3 is revised.

      (4) The following revisions are made to address the comment on “release 79”: (a) the old Supplementary Table 2, 3 are merged to Supplementary Table 2AB, and the new column "Gene symbol (Ensembl release115)" is added to Supplementary Table 2AB, (b) accordingly, Supplementary Table 4,5 are renamed to Supplementary Table 3,4.

      Additional revisions:

      (1) Section 2.5 “Young weak DBSs may have greatly promoted recent human evolution” is moved into Supplementary Note 3 (which now has the subtitle “Target genes with specific DBS features are enriched in specific functions”), because this section is short and lacks sufficient cross-validation.

      (2) Considerable minor revisions of sentences have been made.

      (3) Since there are many supplementary figures, the main text now cites only Supplementary Notes, as the reader can easily access supplementary figures in Supplementary Notes.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The present study evaluates the role of visual experience in shaping functional correlations between human extrastriate visual cortex and frontal regions. The authors used fMRI to assess "resting-state" temporal correlations in three groups: sighted adults, congenitally blind adults, and neonates. Previous research has already demonstrated differences in functional correlations between visual and frontal regions in sighted compared to early blind individuals. The novel contribution of the current study lies in the inclusion of an infant dataset, which allows for an assessment of the developmental origins of these differences.

      The main results of the study reveal that correlations between prefrontal and visual regions are more prominent in the blind and infant groups, with the blind group exhibiting greater lateralization. Conversely, correlations between visual and somato-motor cortices are more prominent in sighted adults. Based on these data, the authors conclude that visual experience plays an instructive role in shaping these cortical networks. This study provides valuable insights into the impact of visual experience on the development of functional connectivity in the brain.

      Strengths:

      The dissociations in functional correlations observed among the sighted adult, congenitally blind, and neonate groups provide strong support for the main conclusion regarding postnatal experience-driven shaping of visual-frontal connectivity.

      The inclusion of neonates offers a unique and valuable developmental anchor for interpreting divergence between blind and sighted adults. This is a major advance over prior studies limited to adult comparisons.

      Convergence with prior findings in the blind and sighted adult groups reinforces the reliability and external validity of the present results.

      The split-half reliability analysis in the infant data increases confidence in the robustness of the reported group differences.

      Weaknesses:

      The manuscript risks overstating a mechanistic distinction between sighted and blind development by framing visual experience as "instructive" and blindness as "reorganizing." Similarly, the binary framing of visual experience and blindness as independent may oversimplify shared plasticity mechanisms.

      The interpretation of changes in temporal correlations as altered neural communication does not adequately consider how shifts in shared variance across networks may influence these measures without reflecting true biological reorganization.

      The discussion does not substantively engage with the longstanding debate over whether sensory experience plays an instructive or permissive role in cortical development.

      The relationship between resting-state and task-based findings in blindness remains unclear.

      Reviewer #2 (Public review):

      Summary:

      Tian et al. explore the developmental origins of cortical reorganization in blindness. Previous work has found that a set of regions in the occipital cortex show different functional responses and patterns of functional correlations in blind vs. sighted adults. Here, Tian et al. explore how this organization arises over development. Is the "starting state" more like the blind pattern, or more like the adult pattern? Their analyses reveal that the answer depends on the particular networks investigated. Some functional connections in infants look more like blind than sighted adults; other functional connections look more like sighted than blind adults; and others fall somewhere in the middle, or show an altogether different pattern in infants compared with both sighted and blind adults.

      Strengths:

      The paper addresses very important questions about the starting state in the developing visual cortex, and how cortical networks are shaped by experience. Another clear strength lies in the unequivocal nature of many results. Many results have very large effect sizes, critical interactions between regions and groups are tested and found, and infant analyses are replicated in split halves of the data.

      Weaknesses:

      While potential roles of experience (e.g., visual, cross-modal) are discussed in detail, little consideration is given to the role of experience-independent maturation. The infants scanned are extremely young, only 2 weeks old. It is possible then that the sighted adult pattern may still emerge later in infancy or childhood, regardless of infant visual experience. If so, the blind adult pattern may depend on blindness-related experience only (which may or may not reflect "visual" experience per se). In short, it is not clear that birth, or the first couple weeks of life, are a clear cut "starting point" for development, after which all change can be attributed to experience.

      Reviewer #3 (Public review):

      Summary

      This study aimed to investigate whether the differences observed in the organization of visual brain networks between blind and sighted adults result from a reorganization of an early functional architecture due to blindness, or whether the early architecture is immature at birth and requires visual experience to develop functional connections. This question was investigated through the comparison of 3 groups of subjects with resting-state functional MRI (rs-fMRI). Based on convincing analyses, the study suggests that: 1) secondary visual cortices showed higher connectivity to prefrontal cortical regions (PFC) than to non-visual sensory areas (S1/M1 and A1) in infants like in blind adults, in contrast to sighted adults; 2) the V1 connectivity pattern of infants lies between that of sighted adults (showing stronger functional connectivity with non-visual sensory areas than with PFC) and that of blind adults (showing stronger functional connectivity with PFC than with non-visual sensory areas); 3) the laterality of the connectivity patterns of infants resembled those of sighted adults more than those of blind adults, but infants showed a less differentiated fronto-occipital connectivity pattern than adults.

      Strengths

      - The question investigated in this article is important for understanding the mechanisms of plasticity during typical and impaired development, and the approach considered, which compares different groups of subjects including neonates/infants and blind adults, is highly original.

      - Overall, the presented analyses are solid and well detailed, and the results and discussion are convincing.

      Weaknesses

      - While it is informative to compare the "initial" state (close to birth) and the "final" states in blind and sighted adults to study the impact of post-natal and visual experience, this study does not analyze the chronology of this development and when the specialization of functional connections is completed. This would require investigating the evolution of functional connectivity of the visual system as a function of visual experience and thus as a function of age, at least during toddlerhood given the early and intense maturation of the visual system after birth. This could be achieved by analyzing different developmental periods using open databases such as the Baby Connectome Project.

      - The rationale for grouping full-term neonates and preterm infants (scanned at term-equivalent age) is not understandable when seeking to perform comparisons with adults. Even if the study results do not show differences between full-terms and preterms in terms of functional connectivity differences between regions and of connectivity patterns, the preterm group had different neurodevelopment and post-natal (including visual) experiences (even a few weeks might have an impact). And actually they show reduced connectivity strength systematically for all regions compared with full-terms (Sup Fig 7). Considering a more homogeneous group of neonates would have strengthened the study design.

      - The rationale for presenting results on the connectivity of secondary visual cortices before the one of primary cortices (V1) could be clarified.

      - The authors acknowledge the methodological difficulties for defining regions of interest (ROIs) in infants in a similar way as adults. Since the brain development is not homogeneous and synchronous across brain regions (in particular with the frontal and parietal lobes showing a delayed growth), this poses major problems for registration. This raises the question of whether the study findings could be biased by differences in ROI positioning across groups.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors are appropriately cautious in many parts of the discussion and include several helpful control analyses. Nonetheless, additional clarification of key assumptions and potential confounds would strengthen the paper.

      (1) The current framing labels vision as "instructive" and blindness as "reorganizing," but it is unclear why these two experiential factors are characterized differently. Both involve activity-dependent changes to functional architecture from a shared immature scaffold. Labeling them differently risks conflating divergent outcomes with distinct underlying mechanisms. Just because visual and blind adults show different patterns of functional connectivity does not mean they reflect separate processes. While the discussion briefly acknowledges the possibility of shared plasticity mechanisms, much of the framing across the manuscript, including in the abstract and introduction, implies a dichotomy. A clearer articulation of the criteria used to assign these labels, or reconsideration of whether such a distinction is warranted, would improve conceptual clarity. The current framing appears analogous to saying that "heat causes expansion" and "cold causes contraction" as if these were separate mechanisms, when they are actually two directions of change along a single factor: temperature. A more parsimonious framework, such as activity-dependent reweighting of pre-existing connectivity, may better capture the nature of plasticity at play in both sighted and blind development.

      Following the reviewer’s suggestion, we have revised the manuscript to clarify that both vision and blindness can be understood as manifestations of a common framework of experience-driven plasticity. We removed all mentions of reorganization and clarified the wording throughout.

      Specifically:

      Abstract: “Are infant visual cortices functionally like those of sighted adults, with blindness leading to functional change? We find, on the contrary, that secondary visual cortices of infants are functionally more like those of blind adults: stronger coupling with PFC than with nonvisual sensory-motor networks, suggesting that visual experience modifies elements of the sighted-adult long-range functional connectivity profile. Infant primary visual cortices are in-between blind and sighted adults i.e., more balanced PFC and sensory-motor connectivity than either adult group. The lateralization of occipital-to-frontal connectivity in infants resembles the sighted adults, consistent with the idea that blindness leads to functional change. These results suggest that both vision and blindness modify functional connectivity through experience-driven (i.e., activity-dependent) plasticity.” (Page 1, Line 13)

      Introduction: We replaced “blindness leads to functional reorganization” with “blindness modifies this functional connectivity” (Page 2, Line 52), and the following sentence has also been modified to: “lifetime visual experience shapes connectivity toward the sighted-adult pattern” (Page 2, Line 54). For the lateralization patterns, we now describe them as “blindness-related modification” rather than “reorganization”, to keep the interpretation descriptive rather than mechanistic (Page 4, Line 114).

      (2) In interpreting the functional correlation differences, the discussion should more explicitly consider how statistical interdependence between areas could influence the observed results. For example, an increase in shared variance between visual and motor areas, such as might result from visually guided action, could result in a reduction in the apparent strength of visual-prefrontal temporal correlation (at the resolution of fMRI) without any true biological change in communication between visual-prefrontal cortex. This possibility is not ruled out by reporting groupwise patterns of relative connectivity. A more cautious systems-level framing could help clarify the distinction between neural plasticity and statistical redistribution of variance.

      We thank the reviewer for raising this important point. We agree that resting-state fMRI provides a measure of statistical synchrony in BOLD signals rather than direct causal interactions between regions. This is a fundamental limitation of resting-state fMRI, which we now note in the Discussion section. Such changes in correlation are consistent with a variety of underlying biological mechanisms. The task performed during scanning is one factor that influences cross-region correlations. In the current study, both blind and sighted groups were measured while blindfolded and were not performing visually guided actions during the resting-state fMRI scans. It is possible that past experience with visually guided action changes the resting-state correlations of sighted participants. Indeed, this is one interesting hypothesis.

      In the revised Discussion, we now explicitly note this limitation and clarify that differences in FC do not by themselves establish whether or how underlying neurophysiological mechanisms are changed. We also emphasize that future work will need to investigate whether FC changes are accompanied by alterations in structural connectivity and to probe causal interactions and mechanistic underpinnings as follows:

      “Resting-state functional connectivity captures synchrony in BOLD signal fluctuations rather than causal interactions, and differences in functional connectivity cannot on their own reveal how underlying neurophysiological mechanisms are modified.” (page 13, line 342)

      “Future studies will be needed to determine whether these functional changes are accompanied by alterations in structural connectivity, and to probe causal interactions and mechanistic underpinnings.” (page 13, line 350)

      (3) The mechanistic interpretation of group differences in visual-motor coupling would benefit from stronger network-level justification. Direct connections between these areas are sparse in primates. If effects reflect indirect polysynaptic interactions or shared thalamic input, as the authors suggest, one might expect corresponding group differences in intermediate regions (e.g., parietal cortex, thalamus) that mediate these interactions. Is there any evidence for this in the data?

      We thank the reviewer for raising this point. We agree and, as noted above, resting-state fMRI cannot distinguish between direct causal interactions between two regions and those in which a mediating region is involved. This is a fundamental limitation of resting-state fMRI. The current study focused on testing a specific hypothesis motivated by previously observed group differences between blind and sighted adults; our analyses therefore focused on ROI-to-ROI connectivity between occipital, frontal, and sensory-motor cortices and did not include these additional regions. In prior work, we and others have looked at effects in parietal cortices (Abboud & Cohen, 2019; Bedny et al., 2009; Deen et al., 2015; Kanjlia et al., 2016, 2021; Sen et al., 2022). In blindness, parietal networks show increased, rather than decreased, correlations with some visual areas. Regarding the thalamus, the evidence is less clear, and ongoing work is trying to address this question. A couple of studies suggest that there is indeed increased connectivity between some parts of the thalamus and visual cortex in blindness. Although the anatomical information is limited, some of the work suggests that this increase is with higher-cognitive nuclei of the thalamus (Bedny et al., 2011; Liu et al., 2007).

      We agree that this is an important direction for future work. To acknowledge this point, we have revised the manuscript to highlight the potential role of cortical and subcortical hub regions in mediating connectivity changes. The text has been modified as follows:

      “Connectivity changes between two areas could be mediated by ‘third-party’ hub regions. For example, posterior parietal cortex serves as a cortical hub for multisensory integration and visuo-motor coordination and could mediate occipital-to-sensory-motor communication (Rolls et al., 2023; Sereno & Huang, 2014). Subcortical structures such as the thalamus could also play a mediating role (Vega-Zuniga et al., 2025).” (page 13, line 345)

      (4) The discussion would benefit from deeper engagement with prior work on experience-dependent plasticity, particularly the longstanding distinction between instructive and permissive roles of experience. While the authors briefly define these concepts and reference their historical use, a more explicit consideration of how their findings relate to this broader literature would help clarify whether such distinctions are necessary or appropriate.

      We thank the reviewer for this thoughtful suggestion to engage more explicitly with the longstanding literature on instructive versus permissive roles of experience. However, most of this literature comes from animal models, where experimental manipulations of the anatomical structure, of experience itself (e.g., controlled rearing studies) and sometimes of neural activity patterns allow clear tests of these mechanisms. Such manipulations are not feasible in humans. The terminology in the animal literature does not directly map onto the methods and data available in the present study or in other work with humans. For this reason, the current data do not allow us to fully engage with the debates in the animal literature, and doing so risks overinterpreting our findings.

      Nevertheless, we agree that once the instructive/permissive framework has been introduced, it is important to clarify how our results relate to it, rather than only providing definitions. We have therefore added the following text to the discussion:

      “In humans, such manipulations are not feasible, leaving us to study only the consequences of the presence or absence of vision. Under an instructive account, visual and multisensory experience could strengthen coupling between visual and other non-visual sensory-motor cortices through coordinated activity, thereby establishing the sighted-adult connectivity pattern. In the absence of visual input, by contrast, the lack of such coordinated activity may prevent these couplings from being established. Alternatively, vision may act permissively, indirectly enabling maturational processes that shift connectivity toward the sighted-adult configuration.” (page 14, line 362)

      (5) The revised discussion acknowledges the divergence between resting-state and task-based findings, but does not fully frame the theoretical implications of this discrepancy. Although this study cannot resolve the issue with its own data, a more integrative discussion could help clarify whether these measures reflect distinct functional states, developmental trajectories, or mechanisms of plasticity. Without such framing, readers are left without clear guidance on how to reconcile the present results with prior work on cross-modal recruitment in blindness.

      We thank the reviewer for this thoughtful comment. We agree that knowing how resting-state evidence relates to task-based evidence is a fundamentally important issue. We now discuss this more in the Introduction as well as in the Discussion.

      There is a sizable literature of both task-based and resting-state studies. Some prior studies have measured resting-state and task-based data within the same participants and found relationships (Kanjlia et al., 2016, 2021; Lane et al., 2015). We now clarify this in the introduction. These studies find that within visual cortices of blind people, the task-based profile of a cortical area is related to its resting-state connectivity pattern (Abboud & Cohen, 2019; Deen et al., 2015; Kanjlia et al., 2016, 2021). This suggests that these two measures are related. However, the timecourse of this relationship, its developmental trajectory, and the mechanism of plasticity are not known. We note this now in the introduction on page 2. Primarily this is because there is very little relevant developmental evidence. For example, in the current study we find that the resting-state profile of secondary visual networks in infants is similar to that of blind adults. However, we do not know whether the visual cortices of infants show task-based cross-modal responses. To our knowledge, nobody has tested this question. We agree with the reviewer that raising this question in the paper is better than not commenting on the relationship at all.

      To address the reviewer’s comment, we have expanded the discussion to situate our results within a developmental framework, highlighting how early intrinsic connectivity may scaffold alternative trajectories shaped by either visual experience or blindness. The revised text now reads as follows:

      “Conversely, for people who remain blind throughout life, visual-PFC connectivity could enable recruitment of visual cortices for higher-order non-visual functions, such as language and executive control (Bedny et al., 2011; Kanjlia et al., 2021). Our results suggest that blind adults may build on connectivity patterns already present in infancy: like blind adults, sighted infants show stronger occipital–PFC than occipital–sensory–motor coupling. Repeated engagement of occipital networks during higher cognitive tasks in early development could in turn enhance connectivity and specialization of visual networks for non-visual higher-order functions.

      Some prior studies have measured resting-state and task-based functional profiles in the same participants. These studies find that within visual cortices of blind people, the task-based profile of a cortical area is related to its resting state connectivity pattern (citations.) This suggests that these two measures are related. However, the timecourse of this relationship, the developmental trajectory and mechanism of plasticity is not known. Primarily this is because there is very little relevant developmental evidence. For example, in the current study we find that the resting state profile of secondary visual networks in infants is similar to that of blind adults. However, we do not know whether the visual cortices of infants show enhanced task-based cross modal responses, relative to sighted adults and how this compares to responses observed in blind adults. Future work with infants and children would be able to address this question.

      In the current study, the clearest evidence for functional change driven by blindness was observed for laterality. Connectivity lateralization in sighted infants resembles that of sighted adults, in both V1 and secondary visual cortices. Relative to both sighted infants and sighted adults, blind adults show more lateralized connectivity patterns between occipital and prefrontal cortices. Previous studies suggest that in people born blind occipital and non-occipital language responses are co-lateralized (Lane et al., 2017; Tian et al., 2023). We speculate that habitual activation of visual cortices by higher-cognitive tasks, such as language, which are themselves highly lateralized, contributes to this biased connectivity pattern of occipital cortex in blindness. Taken together, these results suggest a developmental framework in which intrinsic connectivity present in infancy provides a scaffold that is subsequently shaped and reinforced by experience-dependent recruitment, through either visual experience or the lifelong absence of vision in blindness. Longitudinal work across successive developmental stages will be crucial to test how the alternative trajectories shaped by visual experience versus blindness unfold over development.” (page 14-15)

      (6) The split-half reliability analysis is a valuable control. Additional details would clarify what these noise ceilings reflect. Were the rsFC patterns for each ROI calculated only for the ROIs included in the current study or was a broader assessment across the whole brain performed? It also would be helpful to report whether reliability differed for individual ROIs within and between groups. Even if global reliability is matched, selective differences could influence group comparisons. Several infants in the dhcp dataset were scanned twice. Were any second scans included in the current analyses? Comparing first versus second scans directly could strengthen the claim that several weeks of visual experience are insufficient to shift connectivity toward a sighted adult profile.

      We thank the reviewer for the comments on the reliability of the current study.

      In the present study, the noise ceiling was computed from the reliability of the ROI-wise FC profiles used across all analyses. Reliability was estimated using a split-half procedure: each rs-fMRI time series was divided into two equal halves, FC among all ROIs included in the study was computed separately for each half, and the noise ceiling for each ROI was defined as the Pearson correlation between its two FC profiles. We then averaged these ROI-wise noise ceilings to evaluate group-level reliability, which exceeded 0.70 in all three groups and did not differ significantly across groups. This provides an estimate of the upper bound on explainable variance for the exact FC features subjected to statistical testing (Lage-Castellanos et al., 2019). A brief description has been added to the manuscript (page 19, line 518).

      Regarding the reviewer’s question about the scope of rsFC features used in the noise-ceiling analysis: we computed noise ceilings only for the ROIs included in the present study, because all analyses in this work were conducted at the ROI–ROI level and did not involve voxelwise whole-brain FC. Thus, the noise-ceiling estimates correspond directly to the full set of FC features on which all statistical comparisons were based.
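
      The split-half procedure described above could be sketched roughly as follows. This is only an illustration on synthetic data, not the authors' pipeline: the function name `roi_noise_ceilings`, the exclusion of each ROI's self-correlation from its FC profile, and the absence of any Fisher z-transform are assumptions that the response does not specify.

```python
import numpy as np

def roi_noise_ceilings(ts):
    """Per-ROI split-half reliability of FC profiles.

    ts: array of shape (n_timepoints, n_rois) for one participant.
    Returns an array with one Pearson correlation per ROI.
    """
    n_t, n_rois = ts.shape
    half = n_t // 2
    # FC matrix (ROI x ROI Pearson correlations) for each half of the run
    fc_a = np.corrcoef(ts[:half].T)
    fc_b = np.corrcoef(ts[half:2 * half].T)
    ceilings = np.empty(n_rois)
    for i in range(n_rois):
        # Assumption: drop the self-correlation (always 1) from the profile
        others = np.arange(n_rois) != i
        ceilings[i] = np.corrcoef(fc_a[i, others], fc_b[i, others])[0, 1]
    return ceilings

# Synthetic example: 8 hypothetical ROIs driven by 2 shared signals plus noise
rng = np.random.default_rng(0)
shared = rng.standard_normal((400, 2))
loadings = rng.standard_normal((2, 8))
ts = shared @ loadings + 0.5 * rng.standard_normal((400, 8))

ceilings = roi_noise_ceilings(ts)
participant_mean = ceilings.mean()  # averaged over ROIs, as in the response
```

      Averaging `participant_mean` over participants within a group would then give a group-level value of the kind compared across the three groups.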

      As suggested by the reviewer, we examined noise ceilings for each ROI separately. All ROIs showed high absolute reliability (noise ceiling > 0.80) across the three groups, indicating that the ROI-wise FC estimates are generally robust across participants. Although many ROIs exhibited statistically significant group differences in noise ceiling (one-way ANOVA, p < 0.05), the effect sizes were small to moderate (partial η² < 0.14). These differences indicate that reliability may vary modestly across groups at the ROI level, and we cannot fully determine whether such variability contributes to the observed different FC patterns across groups. We have included this point in the revised manuscript (page 19, line 525), along with the full statistical results for the ROI-wise noise ceilings in Supplementary Table S2.
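
      For readers who want to see how such an effect size is obtained, the following minimal sketch computes the F statistic and η² for a one-way design from sums of squares. In a one-way ANOVA, partial η² and η² coincide because the total sum of squares is just the between-group plus within-group sums. The function name and the toy numbers are hypothetical, and the p-value (which would come from the F distribution) is omitted.

```python
import numpy as np

def one_way_anova_eta2(*groups):
    """Return (F, eta squared) for a one-way between-subjects design."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # One-way case: partial eta squared equals eta squared
    eta2 = ss_between / (ss_between + ss_within)
    return f_stat, eta2

# Toy noise-ceiling values for one ROI in three hypothetical groups
f_stat, eta2 = one_way_anova_eta2([1, 2, 3], [2, 3, 4], [3, 4, 5])
```

      The same quantity can be recovered from a reported F value as F·df_between / (F·df_between + df_within).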

      Last, we fully agree that longitudinal comparisons across multiple time points can provide important insights into how early visual experience shapes connectivity. At the same time, in the present dataset, the first scan occurred at a preterm age and the second at term-equivalent age. The differences between the first and second scans would reflect not only additional weeks of visual input, but also differences in prematurity status and overall neurodevelopmental maturity, which would make the interpretation of such comparisons difficult in the context of our current aims. We have clarified in the revised manuscript that only term-equivalent (second) scans were included. We see careful longitudinal work as an important avenue for addressing this question more directly.

      (7) The signal dropout assessment in the infant dataset is a valuable quality control step. Applying the same metric to the adult datasets would help harmonize preprocessing across groups and increase confidence in group-level comparisons.

      Thank you for this valuable suggestion. Following your comment, we applied the same signal dropout assessment to the adult datasets. One participant in the sighted adult group and two participants in the blind adult group showed signal dropout in one ROI each. The corresponding results are now included in the Supplementary Materials (Figure S13). The findings remain unchanged after this additional control analysis. We also added the relevant content to the Methods section as follows:

      “The same signal dropout assessment was also applied to the blind and sighted adults to ensure consistent quality control across groups. One participant in the sighted adult group and two participants in the blind adult group exhibited signal dropout in one ROI each. Excluding these participants did not alter the group-level results (see Figure S13).” (page 16, line 449)

      Minor:

      (8) The authors added accurate anatomical descriptions to the methods but a less precise characterization remains in the introduction: "Anatomically, these regions correspond roughly to the location of areas such as motion area V5/MT+, the lateral occipital complex (LO), V3a and V4v in sighted people."

      We thank the reviewer for this helpful comment. We have revised the Introduction to provide a fuller anatomical description, consistent with the Methods. The text now reads:

      “Anatomically, these regions in sighted people approximately correspond to the locations of motion-sensitive V5/MT+ and the lateral occipital complex (LO), as well as ventral portions of occipito-temporal cortex including V4v and dorsal portions including V3a. The occipital ROI also extends ventrally into the middle portion of the ventral temporal lobe and dorsally into the intraparietal sulcus and superior parietal lobule.” (page 3, line 88)

      (9) Typo: "lager effect" should be "larger effect."

      Secondary visual cortices showed a significant within > between difference in both groups, with a lager effect in the blind group (post-hoc tests, Bonferroni-corrected paired: t-test: sighted adults within hemisphere > between hemisphere: t (49) = 7.441, p = 0.012; blind adults within hemisphere > between hemisphere: t (29) = 10.735, p < 0.001; V1: F(1, 78) =87.211, p < 0.001).

      We thank the reviewer for catching this typo. We have corrected “lager effect” to “larger effect” in the revised manuscript. (page 9, line 214)

      Reviewer #2 (Recommendations for the authors):

      All of my other concerns were adequately addressed.

      We thank the reviewer for their positive evaluation, and we are glad that our revisions have addressed their concerns.

      Reviewer #3 (Recommendations for the authors):

      In my view, qualifying infants as "sighted" is confusing and unnecessary: why not simplifying and homogenizing the wording along the manuscript and figures?

      We thank the reviewer for this suggestion. We agree and have revised the manuscript to use consistent wording, avoiding the qualification of infants as “sighted.”

      l188, I don't understand the sentence "By contrast, in sighted adults, this cross-hemisphere difference is weak or absent."

      We thank the reviewer for noting that this sentence was unclear. We have revised the text to provide a more precise explanation. The text now reads:

      “By contrast, in sighted adults this lateralized pattern is weaker: visual areas in each hemisphere show only a modest preference for ipsilateral prefrontal cortices, and connectivity with the contralateral PFC remains comparatively strong.” (page 8, line 207)

      l193: "Secondary visual cortices showed a significant within > between difference in both groups, with a lager effect in the blind group": providing effect sizes for the 2 groups would strengthen this result (+ note the typo laRger).

      - Figure S7, S11: Please add titles of y-axes.

      Thank you for this helpful suggestion. We have corrected the typo and added the effect sizes for both groups in the revised text. The revised sentence now reads as follows:

      “Secondary visual cortices showed a significant within > between difference in both groups, with a larger effect in the blind group (post-hoc tests, Bonferroni-corrected paired: t-test: sighted adults within hemisphere > between hemisphere: t (49) = 7.441, p = 0.012, Cohen’s d = 0.817; blind adults within hemisphere > between hemisphere: t (29) = 10.735, p < 0.001, Cohen’s d = 1.96).” (page 9, line 214)

      Titles of the y-axes have also been added to Figures S7 and S11.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Lesser et al provide a comprehensive description of Drosophila wing proprioceptive sensory neurons at the electron microscopy resolution. This “tour-de-force” provides a strong foundation for future structural and functional research aimed at understanding wing motor control in Drosophila with implications for understanding wing control across other insects.

      Strengths:

      (1) The authors leverage previous research that described many of the fly wing proprioceptors, and combine this knowledge with EM connectome data such that they now provide a near-complete morphological description of all wing proprioceptors.

      (2) The authors cleverly leverage genetic tools and EM connectome data to tie the location of proprioceptors on the wings with axonal projections in the connectome. This enables them to both align with previous literature as well as make some novel claims.

      (3) In addition to providing a full description of wing proprioceptors, the authors also identified a novel population of sensors on the wing tegula that make direct connections with the B1 wing motor neurons, implicating the role of the tegula in wing movements that was previously underappreciated.

      (4) Despite being the most comprehensive description so far, it is reassuring that the authors clearly state the missing elements in the discussion.

      Weaknesses:

      (1) The authors do their main analysis on data from the FANC connectome but provide corresponding IDs for sensory neurons in the MANC connectome. I wonder how the connectivity matrix compares across FANC and MANC if the authors perform a similar analysis to the one they have done in Figure 2. This could be a valuable addition and potentially also pick up any sexual dimorphism.

      We agree that systematic comparisons will provide valuable insights as more connectome datasets become available. However, the primary goal of this study was to link central axon morphology with peripheral structures in the wing. We deliberately omitted more detailed and quantitative analyses of the downstream VNC circuitry, apart from providing a global view of the connectivity matrix and using it to cluster the sensory axon types. A more detailed and systematic comparison of wing sensorimotor circuit connectivity across different connectome datasets (FANC, MANC, BANC, IMAC) is the subject of ongoing work in our lab, which we feel is beyond the scope of this study. Here, we chose to match the wing proprioceptors to axons in MANC to demonstrate their stereotypy across individuals and to make them more accessible to other researchers. We found no obvious sexual dimorphism at the level of wing sensory neurons. We now note this in the Discussion.

      (2) The authors speculate about the presence of gap junctions based on the density of mitochondria. I’m not convinced about this, given that mitochondrial densities could reflect other things that correlate with energy demands in sub-compartments.

      We have moved speculation about mitochondria and gap junctions to the Discussion.

      (3) I’m intrigued by how the tegula CO is negative for iav. I wonder if authors tried other CO labeling genes like nompc. And what does this mean for the nature of this CO. Some more discussion on this anomaly would be helpful.

      Based on this suggestion, we have added an image showing that tegula CO neurons are labeled by nompC-Gal4.

      (4) The authors conclude there are no proprioceptive neurons in sclerite pterale C based on Chat-Gal4 expression analysis. It would be much more rigorous if authors also tried a pan-neuronal driver like nsyb/elav or other neurotransmitter drivers (Vglut, GAD, etc) to really rule this out. (I hope I didn’t miss this somewhere.)

      To address this, we imaged OK371-GFP, which labels glutamatergic neurons, in the wing and wing hinge. We saw expression in the wing, as others have reported (Neukomm et al., 2014), but we saw no expression at the wing hinge. Apart from a handful of glutamatergic gustatory neurons in the leg, we are not aware of any other sensory neurons in the fly that are not labeled by Chat-Gal4.

      Overall, I consider this an exceptional analysis that will be extremely valuable to the community.

      We sincerely appreciate the reviewer’s positive feedback.

      Reviewer #2 (Public review):

      Summary:

      Lesser et al. present an atlas of Drosophila wing sensory neurons. They proofread the axons of all sensory neurons in the wing nerve of an existing electron microscopy dataset, the female adult fly nerve cord (FANC) connectome. These reconstructed sensory axons were linked with light microscopy images of full-scale morphology to identify their origin in the periphery of the wing and encoded sensory modalities. The authors described the morphology and postsynaptic targets of proprioceptive neurons as well as previously unknown sensory neurons.

      Strengths:

      The authors present a valuable catalogue of wing sensory neurons, including previously undescribed sensory axons in the Drosophila wing. By providing both connectivity information with linked genetic drive lines, this research facilitates future work on the wing motor-sensory network and applications relating to Drosophila flight. The findings were linked to previous research as well as their putative role in the proprioceptive and nerve cord circuitry, providing testable hypotheses for future studies.

      Weaknesses:

      (1) With future use as an atlas, it should be noted that the evidence is based on sensory neurons on only one side of the nerve cord. Fruit flies have stereotyped left/right hemispheres in the brain and left/right hemisegments in the nerve cord. The comparison of left and right neurons of the nervous system can give a sense of how robust the morphological and connectivity findings are. Here, the authors have not compared the left and right side sensory axons from the wing nerve, leaving potential for developmental variability across samples and left/right hemisegments.

      The right ADMN nerve in the FANC dataset is partially severed, making left/right comparisons unreliable (see Azevedo 2024, Extended Data Figure 4). We have updated the text to explain this within the Methods section of the paper.

      (2) Not all links between the EM reconstructions and driver lines are convincing. To strengthen these, for all EM-LM matches in Figures 3-7, rotated views of the driver line (matching the rotated EM views) should be shown to provide a clearer comparison of the data. In particular, Figure 3G and Figure 7B are not very convincing based on the images shown. MCFO imaging of the driver lines in Figure 3G and 7B would make this position stronger if a clone that matches the EM reconstruction could be identified.

      Many of the z-stack images in the paper are from the Janelia FlyLight collection, and unfortunately their imaging parameters were not optimized for orthogonal views. Rotated views are blurry and not especially helpful for comparison to EM reconstruction. We now point out in the text that interested readers can access the z-stacks from FlyLight to see the dorsal-ventral projections.

      Regarding Figure 3G and 7B, we have added markers to the image with corresponding descriptions in the legend to guide the reader through the image of the busy driver line. Although these lines label many cells in the VNC as a whole, they sparsely label cells in the ADMN, making them nonetheless useful for identifying peripheral sensory neurons.

      (3) Figure 7B looks like the driver line might have stochastic expression in the sensory neuron, which further reduces confidence in the result shown in Figure 7C. Is this expression pattern in the wing consistently seen? Many split-GAL4s have stochastic expressions. The evidence would be strengthened if the authors presented multiple examples (~4-5) of each driver line’s expression pattern in the supplement.

      Figure 7B shows sparse labeling of the driver line using the MCFO technique, as specified in the legend. Its unilateral expression is therefore not due to stochastic expression of the Gal4 line. We have added the “MCFO” label to the image to clarify.

      (4) Certain claims in this work lack quantitative evidence. On line 128, for instance, “Overall, our comprehensive reconstruction revealed many morphological subgroups with overlapping postsynaptic partners, suggesting a high degree of integration within wing sensorimotor circuits.” If a claim of subgroups having shared postsynaptic partners is being made, there should have been quantitative evidence. For example, cosine similarity amongst members of each group compared to the cosine similarity of shuffled/randomised sets of axons from different groups. The heat map of cosine similarity in Figure 2B alone is not sufficient.

      We agree that illustrating the extent of shared postsynaptic partners across subgroups strengthens this point. We added a visualization showing pairwise similarity scores for within- and between-cluster neuron pairs (Figure 2B inset). We also performed a permutation test to determine that within-cluster similarity is significantly higher than between clusters, and we report the test in the results as well as the figure legend. This analysis provides a more quantitative summary of the qualitative trends in connectivity that are summarized in Figure 2B.
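
      A permutation test of this kind can be sketched as follows. This is only an illustrative outline, not the analysis code from the paper: the toy connectivity vectors, the function names, the number of permutations, and the choice of test statistic (mean within-cluster minus mean between-cluster cosine similarity) are all assumptions made for the example.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two output-connectivity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def within_minus_between(vectors, labels):
    """Mean within-cluster similarity minus mean between-cluster similarity."""
    within, between = [], []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine_similarity(vectors[i], vectors[j])
            (within if labels[i] == labels[j] else between).append(sim)
    return sum(within) / len(within) - sum(between) / len(between)


def permutation_test(vectors, labels, n_permutations=2000, seed=0):
    """One-sided test: is the observed statistic larger than under shuffled labels?"""
    rng = random.Random(seed)
    observed = within_minus_between(vectors, labels)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the link between axons and cluster labels
        if within_minus_between(vectors, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical toy data: rows are axons, columns are synapse counts onto
# four downstream partners; labels are made-up cluster assignments.
vectors = [
    [9, 8, 0, 0], [8, 9, 1, 0], [7, 9, 0, 1], [9, 7, 1, 1],
    [0, 1, 9, 8], [1, 0, 8, 9], [0, 0, 7, 9], [1, 1, 9, 7],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

observed, p_value = permutation_test(vectors, labels)
print(f"within - between = {observed:.3f}, p = {p_value:.4f}")
```

      Shuffling the cluster labels keeps the cluster sizes fixed while destroying any relationship between cluster membership and connectivity, which is why the shuffled statistics serve as the null distribution.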

      (5) Similarly, claims about putative electrical connections to b1 motor neurons are very speculative. The authors state that “their terminals contain very densely packed mitochondria compared to other cells”, without providing a quantitative comparison to other sensory axons. There is also no quantitative comparison to the one example of another putative electrical connection from the literature. Further, it should be noted that this connection from Trimarchi and Murphey, 1997, is also stated as putative on line 167, which further weakens this evidence. Quantification would strongly strengthen this position. Identification of an example of high mitochondrial density at a confirmed electrical connection would be even better. In the related discussion section “A potential metabolic specialization for flight circuitry”, it should be more clearly noted that the dense mitochondria could be unrelated to a putative electrical connection. If the authors have an alternative hypothesis about the mitochondria density, this should be stated as well.

      We agree with the reviewer that the link between mitochondrial density and metabolic specialization is purely speculative in this context. Based on reviewer feedback, we have moved all mention of the relationship between mitochondrial density and gap junction coupling to the Discussion. We acknowledge that this may seem like a somewhat random and not quantitatively supported observation. However, we found the coincidence striking and worthy of mention, though it is only tangentially relevant to the rest of the paper. From conversations with colleagues, we have also heard that this relationship is consistent with as yet unpublished work in other model organisms (e.g., zebrafish, mouse).

      The electrical coupling to b1 motor neurons is well-established (Fayyazuddin and Dickinson, 1999), and we have updated the text to state this more clearly. However, we agree that whether the specific neurons we have identified based on their anatomy are the same ones functionally identified through whole-nerve recordings remains unknown.

      (6) It would be appropriate to cite previous work using a similar strategy to match sensory axons to their cell bodies/dendrites at the periphery using driver lines and connectomics (see Figure 5 for example in the following paper: https://doi.org/10.7554/eLife.40247 ).

      At this point, there are now dozens of papers that match the axons of sensory neurons to their cell bodies/dendrites in the periphery by comparing light microscopy and connectomics. When we dug in, we found examples in C. elegans, Ciona intestinalis, zebrafish, and mouse, all published prior to the study cited above. For basically every animal for which scientists have acquired EM volumes of neural tissue, they have used other anatomical labeling methods to determine cell types inside and outside the imaged volume. In summary, we found it difficult to establish a single primary citation for this approach. In lieu of this, we have added a citation to an earlier review by a pioneer in EM connectomics that discusses the general approach of matching cells across different labeling/imaging modalities (Meinertzhagen et al., 2009).

      The methods section is very sparse. For the sake of replicability, all sections should be expanded upon.

      We have expanded the Methods section and added a STAR Methods table.

      Reviewer #3 (Public review):

      Summary:

      The authors aim to identify the peripheral end-organ origin in the fly’s wing of all sensory neurons in the anterior dorsomedial nerve. They reconstruct the neurons and their downstream partners in an electron microscopy volume of a female ventral nerve cord, analyse the resulting connectome, and identify their origin with a review of the literature and imaging of genetic driver lines. While some of the neurons were already known through previous work, the authors expand on the identification and create a near-complete map of the wing mechanosensory neurons at synapse resolution.

      Strengths:

      The authors elegantly combine electron microscopy, neuron morphology, connectomics, and light microscopy methods to bridge the gap between fly wing sensory neuron anatomy and ventral nerve cord morphology. Further, they use EM ultrastructural observations to make predictions on the signaling modality of some of the sensory neurons and thus their function in flight.

      The work is as comprehensive as state-of-the-art methods allow to create a near-complete map of the wing mechanosensory neurons. This work will be of importance to the field of fly connectomics and modelling of fly behavior, as well as a useful resource to the Drosophila research community.

      Through this comprehensive mapping of neurons to the connectome, the authors create a lot of hypotheses on neuronal function, partially already confirmed with the literature and partially to be tested in the future. The authors achieved their aim of mapping the periphery of the fly’s wing to axonal projections in the ventral nerve cord, beautifully laying out their results to support their mapping.

      The authors identify the neurons in a previously published connectome of a male fly ventral nerve cord to enable cross-individual analysis of connections. Further, together with their companion paper, Dhawan et al. 2025, describing the haltere sensory neurons in the same EM dataset, they cover the entire mechanosensory space involved in Drosophila flight.

      Weaknesses:

      The connectomic data are only available upon request; the inclusion of a connectivity table of the reconstructed neurons would aid analysis reproducibility and cross-dataset comparisons.

      We have added a connectivity table as well as analysis scripts in the github repository for the paper (https://github.com/EllenLesser/Lesser_eLife_2025).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The methods section should be expanded in every aspect. Most pressing sections are:

      (1) Data and Code availability: All code should be included as a Zenodo database, the suggestion to ask authors for code upon request is inappropriate.

      We have added all code to a public github repository, which is now linked in the Methods section.

      (2) Samples: Standard cornmeal and molasses medium should have a reference, as many institutes use different recipes.

      The recipe used by the University of Washington fly kitchen is based on the Bloomington standard Cornmeal, Molasses and Yeast Medium recipe, which can be found at https://bdsc.indiana.edu/information/recipes/molassesfood.html. The UW recipe is slightly modified for different antifungal ingredients and includes tegosept, propionic acid, and phosphoric acid.

      (3) Table 3: Driver lines labelling wing sensory neurons: The genetic driver lines should have associated Bloomington stock centre numbers. Additionally, relevant information for effector lines used should be included in the methods.

      We now include the Bloomington stock numbers and more information on effector lines in the STAR methods table.

      Minor corrections:

      (1) Lines 119-120: “Notably, many of the axons do not form crisp cluster boundaries, suggesting that multimodal sensory information is integrated at early stages of sensory processing.” We do not follow the logic of this statement and suspect it is a bit too speculative.

      We removed this sentence from the manuscript.

      (2) Figure 1: The ADMN is missing in the schematics and would be helpful to depict for non-experts. Is this what is highlighted in Figure 1D?

      Yes, and we now label 1D as the ADMN wing nerve.

      (3) Figure 1B: Which driver lines are being depicted here? Looking at Table 3 does not clarify. It should be specified at least in the figure legend.

      As stated in the legend, we include a table of all of the driver lines we screened and which sensory structures they label.

      (4) Figure 1C: There are some minor placement issues with the text in the schematic. There is an arrow very close to the “CO” on the top right, which makes the “O” look like the symbol for male. “ax ii” is a bit too close to the wing hinge

      We updated the figure to address this issue.

      (5) Figure 1D: The outlined grey masks are not clear. The use of colour would be very useful for the reader to help understand what the authors are referring to here

      We now use color for the masks.

      (6) Figure 2A: It is unclear if the descending neuron and non-motor efferent neuron are not shown because they are under the described threshold, or to simplify the plot. They should be included in the plot if over the threshold.

      We have updated the legend to specify that the descending and non-motor efferent neurons are excluded to visually simplify the plot. We include the percentage of sensory output to each of these neurons in the legend, and they are included in the connectivity matrix data in the public GitHub repository associated with the paper, which is linked in the Methods.

      (7) Figure 2B: What clustering is used specifically? The method says it’s from Scikit-learn, but there are many types of clustering available in this package.

      We now include the specific clustering type used in the Methods section, which is agglomerative clustering.

      (8) Figure 3A: What does the green box behind the plot represent?

      The green box represents the tegula CO axons, which we now specify in the legend.

      (9) Figure 3C: the “C” is clipped at the top.

      We updated the figure to address this issue.

      (10) Figure 4A: the main text says a “group of four axons” (line 203) while the figure says 5 axons.

      We updated the text to address this issue.

      (11) Line 360: “We found that the campaniform sensilla on the tegula provide the most direct feedback onto wing steering motor neurons”. We struggled to find where this was directly shown, because several sensory axon types directly synapse onto motor neurons.

      We now specify in the text that this finding is shown in Figure 3.

      Reviewer #3 (Recommendations for the authors):

      I would like to congratulate the authors on their beautiful, easy-to-read, and easy-to-comprehend manuscript, with clear figures and nice visualizations. This work provides a valuable resource that will contribute to the interpretability of connectomic data and further to connectome-based modeling of fly behavior.

      We sincerely appreciate the reviewer’s positive feedback.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This article deals with the chemotactic behavior of E. coli bacteria in thin channels (a situation close to 2D). It combines experiments and simulations.

      The authors show experimentally that, in 2D, bacteria swim up a chemotactic gradient much more effectively when they are in the presence of lateral walls. Systematic experiments identify an optimum for chemotaxis for a channel width of ~8 µm, close to the average radius of the circle trajectories of the unconfined bacteria in 2D. It is known that these circles are chiral and impose that the bacteria swim preferentially along the right-side wall when there is no chemotactic gradient. In the presence of a chemotactic gradient, this larger proportion of bacteria swimming on the right wall yields chemotaxis. This effect is backed by numerical simulations and a geometrical analysis.

      If the conclusions drawn from the experiments presented in this article seem clear and interesting, I find that the key elements of the mechanism of this wall-directed chemotaxis are not sufficiently emphasized. Moreover, the paper would be clearer with more details on the hypotheses and the essential ingredients of the analyses.

      We thank the reviewer for these constructive suggestions. We agree that emphasizing the underlying mechanism is crucial for the clarity of our findings. In the revised manuscript, we have now explicitly highlighted the critical roles of chiral circular motion and the alignment effect following side-wall collisions in both the Abstract (lines 25-27) and the Discussion (lines 391-393). Furthermore, we have added a new analysis of bacterial trajectories post-collision (Fig. S2), which demonstrates that cells predominantly align with and swim along the sidewalls. We have also clarified the assumptions in our numerical simulations, specifically how the radius of circular trajectories and the alignment effect are incorporated into the equations of motion. Please refer to our detailed responses in the "Recommendations for the authors" section for further specifics.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors investigated the chemotaxis of E. coli swimming close to the bottom surface in gradients of attractant in channels of increasingly smaller width but fixed height = 30 µm and length ~160 µm. In relatively large channels, they find that on average the cells drift in response to the gradient, despite cells close to the surface away from the walls being known to not be chemotactic because they swim in circles.

      They find that this average drift is due to the cell localization close to the side walls, where they slide along the wall. Whereas the bacteria away from the walls have no chemotaxis (as shown before), the ones on the left side wall go down-gradient on average, but the ones on the right-side wall go up-gradient faster, hence the average drift. They then study the effect of reducing channel width. They find that chemotaxis is higher in channels with a width of about 8 µm, which approximately corresponds to the radius of the circular swimming R. This higher chemotactic drift is concomitant to an increased density of cells on the RSW. They do simulations and modeling to suggest that the disruption of circular swimming upon collision with the wall increases the density of cells on the RSW, with a maximal effect at w = ~ 2/3 R, which is a good match for their experiments.

      Strengths:

      The overall result that confinement at the edge stabilises bacterial motion and allows chemotaxis is very interesting although not entirely unexpected. It is also important for understanding bacterial motility and chemotaxis under ecologically relevant conditions, where bacteria frequently swim under confinement (although its relevance for controlling infections could be questioned). The experimental part of the study is nicely supported by the model.

      Weaknesses:

      Several points of this study, in particular the interpretation of the width effect, need better clarification:

      (1) Context:

      There are a number of highly relevant previous publications that should have been acknowledged and discussed in relation to the current work:

      https://pubs.rsc.org/en/content/articlehtml/2023/sm/d3sm00286a

      https://link.springer.com/article/10.1140/epje/s10189-024-00450-7

      https://doi.org/10.1016/j.bpj.2022.04.008

      https://doi.org/10.1073/pnas.1816315116

      https://www.pnas.org/doi/full/10.1073/pnas.0907542106

      https://doi.org/10.1038/s41467-020-15711-0

      http://doi.org/10.1039/c5sm00939a

      We appreciate the reviewer bringing these important publications to our attention. We have now cited and discussed these works in the Introduction (lines 55-62 and 76-85) to better contextualize our study regarding bacterial motility and chemotaxis in confined geometries.

      (2) Experimental setup:

      a) The channels are built with asymmetric entrances (Figure 1), which could trigger a ratchet effect (because bacteria swim in circle) that could bias the rate at which cells enter into the channel, and which side they follow preferentially, especially for the narrow channel. Since the channel is short (160 µm), that would reflect on the statistics of cell distribution. Controls with straight entrances or with a reversed symmetry of the channel need to be performed to ensure that the reported results are not affected by this asymmetry.

      We appreciate the reviewer's insight regarding the potential ratchet effect caused by asymmetric entrances. To rule this out, we fabricated a control device with straight entrances and repeated the measurements. As shown in Figure S3, the chemotactic drift velocity follows the same trend as observed in the original setup, confirming an optimal width of ~9 µm. These results demonstrate that the entrance geometry does not bias the reported statistics. We have updated the manuscript text at lines 233-235.

      b) The authors say the motile bacteria accumulate mostly at the bottom surface. This is strange, for a small height of 30 µm, the bacteria should be more-or-less evenly spread between the top and bottom surface. How can this be explained?

      We apologize for not explaining this clearly in the text. As shown by Wei et al., Phys. Rev. Lett. 135, 188401 (2025), significant surface accumulation occurs in channels with heights exceeding 20 µm. In our specific experimental setup, we did not use Percoll to counteract gravity. Therefore, the bacteria accumulated mostly at the bottom surface under the combined influence of gravity and hydrodynamic attraction. This bottom-surface localization is supported by our observation that the bacterial trajectories were predominantly clockwise (characteristic of the bottom surface) rather than counter-clockwise (characteristic of the top surface). We have added this explanation to Line 141.

      c) At the edge, some of the bacteria could escape up in the third dimension (http://doi.org/10.1039/c5sm00939a). What is the magnitude of this phenomenon in the current setup? Does it have an effect?

      We thank the reviewer for raising this important point regarding 3D escape. We have quantified this phenomenon and found the escape rate from the edge into the third dimension to be 0.127 s⁻¹. This corresponds to a mean residence time that allows a cell moving at 20 µm/s to travel approximately 157.5 µm along the edge. Since this distance is comparable to the full length of our lanes (~160 µm), most cells traverse the entire edge without escaping. Furthermore, our analysis is based on the average drift of the surface trajectories per unit of time; this metric is independent of the absolute number of cells present. Therefore, the escape phenomenon does not significantly impact our conclusions. We have added a statement clarifying this at line 154.
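
      The arithmetic behind this estimate can be reproduced in a few lines. The sketch assumes that escape occurs at a constant rate, so that the mean residence time at the edge is the reciprocal of the escape rate, and it takes all lengths in micrometres (the channel length is given as ~160 µm in the review). The variable names are illustrative only.

```python
# Values quoted in the response above.
escape_rate_per_s = 0.127      # escape rate from the edge into the third dimension
swim_speed_um_per_s = 20.0     # swimming speed along the edge
lane_length_um = 160.0         # approximate lane length

# Assumption: constant-rate (exponential) escape, so mean residence time = 1 / rate.
mean_residence_time_s = 1.0 / escape_rate_per_s

# Mean distance travelled along the edge before escaping.
mean_distance_um = swim_speed_um_per_s * mean_residence_time_s

print(f"mean residence time: {mean_residence_time_s:.2f} s")
print(f"mean distance along edge: {mean_distance_um:.1f} um")
print(f"fraction of lane length: {mean_distance_um / lane_length_um:.2f}")
```

      Under these assumptions the mean travel distance comes out at roughly 157.5 µm, which is the figure quoted in the response and is close to the lane length.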

      d) What is the cell density in the device? Should we expect cell-cell interactions to play a role here? If not, I would suggest to de-emphasize the connection to chemotaxis in the swarming paper in the introduction and discussion, which doesn't feel very relevant here, and rather focus on the other papers mentioned in point 1.

      The cell density in our experiments was approximately 1.3×10⁻³ µm⁻². Given this low density, we do not expect cell-cell interactions to play a role in the observed behaviors.

      Regarding the connection to swarming chemotaxis: We agree that our low-density setup differs from a high-density swarm; however, we believe the comparison remains relevant for two reasons. First, it provides a necessary contrast to studies showing surface inhibition of chemotaxis. Second, while we eliminate cell-cell interactions, we isolate the geometric aspect of swarming. In a swarm, cells move within narrow lanes created by their neighbors. Our device mimics this specific physical confinement by replacing neighboring cells with PDMS sidewalls. This allows us to decouple the effects of physical confinement from cell-cell interactions. We have added text (line 370) to clarify this rationale and have incorporated the additional references in the Introduction, as suggested in point 1.

      e) We are not entirely convinced by the interpretation of the results in narrow channels. What is the causal relationship between the increased density on the RSW and the higher chemotactic drift? The authors seem to attribute higher drift to this increased RSW density, which emerges due to the geometric reasons. But if there is no initial bias, the same geometric argument would induce the same increased density of down-gradient swimmers on the LSW, and so, no imbalance between RSW and LSW density. Could it be the opposite that the increased RSW density results from chemotaxis (and maybe reinforces it), not the other way around? Confinement could then deplete one wall due to the proximity of the other, and/or modify the swimming pattern - 8 µm is very close to the size of the body + flagellum. To clarify this point, we suggest measuring the bacterial distributions in the absence of a gradient for all channel widths as a control.

      We thank the reviewer for this insightful comment regarding the causal relationship between cell density and chemotactic drift. We apologize if the initial explanation was unclear.

      Regarding the no-gradient control: Without an attractant gradient (and no initial bias), symmetry is not broken and the labels "LSW" and "RSW" are arbitrary. Therefore, in the absence of a gradient, the bacterial distributions on the two sidewalls will be symmetric (within experimental fluctuations) for any channel width.

      Regarding the causality and density imbalance: We agree that the increased RSW density is a result of chemotaxis, which is then reinforced by the lane geometry especially at narrow lane width. The mechanism relies on the coupling of chemotactic bias with surface circularity. The angle ranges that lead to RSW-UG accumulation (Fig. 6A-C) coincide with the up-gradient direction. Because these cells experience suppressed tumbling (longer runs), they can maintain the steady circular trajectories required to reach and align with the RSW. Conversely, while pure geometric analysis suggests a similar potential for LSW-DG accumulation, these trajectories coincide with the down-gradient direction. These cells experience enhanced tumbling, which distorts the circular trajectories. This prevents them from effectively reaching the LSW and also increases the probability of them leaving the wall. Therefore, the causality is indeed a positive feedback loop: the attractant gradient creates an initial bias that allows the RSW-UG fraction to form stable trajectories; the optimal lane width (matching the swimming radius) then maximizes this capture efficiency, further enriching the RSW fraction and enhancing the overall drift.

      We have added clarifications regarding these points in the revised manuscript (the last paragraph of “Results”).

      (3) Simulations:

      The simulations treat the wall interaction very crudely. We would suggest treating it as a mechanical object that exerts elastic or "hard sphere" forces and torques on the bacteria for more realistic modeling.

      We appreciate the reviewer's suggestion to incorporate more detailed mechanical interactions, such as elastic or hard-sphere forces, for the wall collisions. While we agree that a full hydrodynamic or mechanical model would offer higher fidelity, our experimental observations suggest that a simplified kinematic approach is sufficient for the specific phenomena studied here.

      As shown in the new Fig. S2, our analysis of cell trajectories in the 44-µm-wide channels reveals that cells colliding with the sidewalls tend to align with the surface almost instantaneously. The timescale required for this alignment is negligible compared to the typical wall residence time (see also Ref. 6). Consequently, to maintain computational efficiency without sacrificing the essential physics of the accumulation effect, we employed a coarse-grained phenomenological model where a bacterium immediately aligns parallel to the wall upon contact, similar to approaches used previously (Ref. 43). We have added relevant text to the manuscript on lines 168-171.

      Notably, the simulations have a constant (chemotaxis independent) rate of wall escape by tumbling. We would expect that reduced tumbling due to up-gradient motility induces a longer dwell time at the wall.

      We apologize for the confusion. The chemotaxis effect is indeed fully integrated into our simulation. Specifically, the simulated cells sense the chemical gradient and adjust their motor CW bias (B) accordingly. This adjustment directly modulates the tumble rate (k), calculated as k = B/0.31 s<sup>-1</sup>. Consequently, the wall escape rate is not constant but varies with the chemotactic response. We also imposed a maximum detention time limit which, when combined with the variable tumble rate, results in an average wall residence time of approximately 2 s, consistent with our experimental observations (Fig. S6B). We have clarified these details in the final section of 'Materials and Methods'.

      Reviewer #3 (Public review):

      This paper addresses through experiment and simulation the combined effects of bacterial circular swimming near no-slip surfaces and chemotaxis in simple linear gradients. The authors have constructed a microfluidic device in which a gradient of L-aspartate is established to which bacteria respond while swimming while confined in channels of different widths. There is a clear effect that the chemotactic drift velocity reaches a maximum in channel widths of about 8 microns, similar in size to the circular orbits that would prevail in the absence of side walls. Numerical studies of simplified models confirm this connection.

      The experimental aspects of this study are well executed. The design of the microfluidic system is clever in that it allows a kind of "multiplexing" in which all the different channel widths are available to a given sample of bacteria.

      While the data analysis is reasonably convincing, I think that the authors could make much better use of what must be voluminous data on the trajectories of cells by formulating the mathematical problem in terms of a suitable Fokker-Planck equation for the probability distribution of swimming directions. In particular, I would like to see much more analysis of how incipient circular trajectories are interrupted by collisions with the walls and how this relates to enhanced chemotaxis. In essence, there needs to be a much clearer control analysis of trajectories without sidewalls to understand the mechanism in their presence.

      We thank the reviewer for this insightful suggestion. We agree that understanding how circular trajectories are interrupted by wall collisions is central to explaining the enhanced chemotaxis. While we did not explicitly formulate a Fokker-Planck equation, we have addressed the reviewer's core point by employing two complementary mathematical approaches that model the probability distribution of swimming directions and wall interactions:

      (1) Stochastic simulations (Langevin approach): As detailed in the "Simulation of E. coli chemotaxis within lane confinements" subsection of “Results” and Figure 5, we modeled cells as self-propelled particles performing random walks. This model explicitly accounts for the "interruption" of circular trajectories by incorporating a constant angular velocity (circular swimming) and an alignment effect upon collision with sidewalls. These simulations successfully reproduced the experimental trends, confirming that the interplay between circular radius and lane width determines the optimal drift velocity.

      (2) Geometric probability analysis: To provide the "intuitive understanding", we included a specific Geometrical Analysis section (the last subsection of “Results”) and Figure 6. This analysis mathematically formulates the problem by calculating the exact proportion of swimming angles that allow a cell to transition from a circular trajectory in the bulk to an up-gradient trajectory along the Right Sidewall (RSW). By integrating over the possible swimming directions, we derived the probability of wall interception as a function of lane width (w) and swimming radius (r). This analysis reveals that the interruption of circular paths is most favorable for chemotaxis when w ≈ (0.7-0.8)×r.

      (3) Control analysis: regarding the "control analysis of trajectories without sidewalls," we utilized the cells in the Middle Area (MA) of the wide lanes as an internal control. As shown in Fig. 2B and 4A, these cells exhibit typical surface-associated circular swimming (Fig. 3B) but generate zero net drift. This serves as the baseline "no sidewall" condition, demonstrating that the chemotactic enhancement is strictly driven by the rectification of circular swimming into wall-aligned motion at the boundaries.

      The authors argue that these findings may have relevance to a number of physiological and ecological contexts. Yet, each of these would be characterized by significant heterogeneity in pore sizes and geometries, and thus it is very unclear whether or how the findings in this work would carry over to those situations.

      We thank the reviewer for this important observation regarding environmental heterogeneity. We agree that we should be cautious about directly extrapolating to complex ecological contexts without qualification. We have revised the last sentence of the abstract to adopt a more measured tone: "Our results may offer insights into bacterial navigation in complex biological environments such as host tissues and biofilms, providing a preliminary step toward exploring microbial ecology in confined habitats and potential strategies for controlling bacterial infections."

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Key elements of the mechanism of wall-directed chemotaxis are not sufficiently emphasized:

      For instance, the chirality of the trajectories is an essential part of the analysis but is mentioned only briefly in the introduction. In the geometrical analysis, I understand that one of the critical parameters is the angle at which bacteria "collide" with the walls. But, again, this remains largely implicit in the discussion. This comes to the point that these ideas are not even mentioned in the abstract which doesn't provide any hint of a mechanism. An analysis of the actual trajectories of the cells after they hit the walls, as a function of their initial angle would be helpful in comparison with the simulations and the geometrical analysis.

      We appreciate the reviewer's insightful comment regarding the need to better emphasize the mechanism of wall-directed chemotaxis. We agree that the chirality of trajectories and the geometry of wall collisions are central to our analysis and were previously under-emphasized.

      To address this, we have made the following revisions:

      (1) We have revised the Abstract (lines 25-27) and the Discussion (lines 391-393) to explicitly highlight the crucial role of chiral circular motion and the alignment effect following sidewall collisions.

      (2) We further analyzed bacterial trajectories at different collision angles. Typical examples are shown in Supplementary Fig. S2. We observed that cells tend to align with and swim along the sidewalls regardless of their initial collision angles. This finding is now described in the main text at lines 168-171.

      The motion of the bacteria is modelled as run-and-tumble at several places in the manuscript, and in particular in the simulations. Yet, the trajectories of the bacteria seem to be smooth in this almost 2D geometry, except of course when they directly interact with the walls (I hardly see tumbles in the MA region in Figure 1B). Can the authors elaborate on the assumptions made in the numerical simulations? In particular, how is the radius of the trajectories included in these equations of motion (line 514)?

      We apologize for the lack of clarity regarding the bacterial motion model. It has been established that while bacteria do tumble near solid surfaces, they exhibit a smaller reorientation angle compared to bulk fluids; in fact, the most probable reorientation angle on a surface is zero (Ref. 41). Consequently, tumbles are often difficult to distinguish from runs with the naked eye. Additionally, the trajectories in Figure 1B are plotted on a 44 µm × 150 µm canvas with unequal coordinate scales, which may further obscure the visual distinctness of tumbling events.

      Regarding the equations of motion: We modeled the bacteria as self-propelled particles governed by the internal chemotaxis pathway, alternating between run and tumble states. As noted in the equations on lines 286 & 578, we incorporated the circular motion by introducing a constant angular velocity, −ν<sub>0</sub>/r, during the run state. Here, ν<sub>0</sub> represents the swimming speed, r denotes the radius of circular swimming, and the negative sign indicates clockwise chirality. Furthermore, to model the hydrodynamic interaction with the boundaries, we assumed that when a cell collides with a sidewall, its velocity vector instantly aligns parallel to that wall.
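
      The run-state kinematics described above can be illustrated with a minimal sketch. It is not the authors' simulation code: it omits tumbling, rotational noise and the chemotaxis pathway, and the function name, time step and default parameter values are assumptions chosen only for illustration. It keeps the two ingredients named in the response, a constant angular velocity of −ν<sub>0</sub>/r during runs and instantaneous alignment with a sidewall on collision.

```python
import math
import random


def simulate_run(width, radius, speed=20.0, dt=0.01, steps=2000,
                 y0=None, theta0=None, seed=0):
    """Run-state-only sketch of clockwise circular swimming in a lane.

    The lane runs along x; the sidewalls sit at y = 0 and y = width (µm).
    Returns the list of (x, y) positions visited.
    """
    rng = random.Random(seed)
    x = 0.0
    y = rng.uniform(0.0, width) if y0 is None else y0
    theta = rng.uniform(-math.pi, math.pi) if theta0 is None else theta0
    omega = -speed / radius  # constant angular velocity; minus sign = clockwise
    track = [(x, y)]
    for _ in range(steps):
        theta += omega * dt
        x += speed * math.cos(theta) * dt
        y += speed * math.sin(theta) * dt
        if y <= 0.0 or y >= width:
            # Collision: keep the cell inside the lane and align its heading
            # parallel to the wall, preserving the sign of its x-velocity.
            y = min(max(y, 0.0), width)
            theta = 0.0 if math.cos(theta) >= 0.0 else math.pi
        track.append((x, y))
    return track


track = simulate_run(width=8.0, radius=10.0)
print(f"final position: x = {track[-1][0]:.1f} µm, y = {track[-1][1]:.1f} µm")
```

      In this sketch, a cell heading in the +x direction along the wall at y = 0 is pressed back onto that wall by its clockwise curvature at every step, whereas the same heading at y = width curves away from the wall. Each wall therefore holds only one swimming direction, which is the asymmetry that the alignment rule produces under these simplifying assumptions.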

      The comparison of Figure 5B (simulations) with Figure 4B (experiments) does not strike me as so "similar". Why are the points at small widths so noisy (Figure 5AB)? Figure 5C is cut at these widths, it should be plotted over the entire scale.

      We acknowledge that the agreement between simulation and experiment is less robust in the narrowest channels. The discrepancy and "noise" at small widths in Figure 5 arise from the limitations of the self-propelled particle model in highly confined geometries. Specifically, our simulation treats bacteria as point particles and does not explicitly calculate the physical exclusion (steric effects) caused by the finite size of the flagella and cell body.

      In the experimental setup, steric constraints within narrow channels (comparable to the cell size) restrict the cells' ability to turn freely, effectively stabilizing their motion. However, because our model allows particles to reorient more freely than actual cells would in such confined spaces, it produces fluctuations and an overestimation of the drift velocity at small widths. If these confinement effects were fully incorporated, the cell density mismatch between the left and right sidewalls would be reduced, leading to lower drift velocities that match the experimental data more closely.

      Regarding Figure 5C: Since the "active particle" assumption loses physical validity in channels narrower than the scale of the bacterium, the simulation results in this regime are not representative of biological reality. Plotting these non-physical points would distort the analysis. Therefore, we have maintained the truncation of Figure 5C at 4 µm to ensure the data presented is physically meaningful. We have added a clear discussion of these model limitations to the manuscript at lines 310-314.

      These important precisions should be added to the text or in a supplementary section. A validated mechanism describing in detail the impact of the walls on the cell trajectories would greatly improve the conclusions.

      We thank the reviewer for the suggestions. As noted in the responses above, we have incorporated the details concerning the simulation assumptions and the model limitations at narrow widths into the revised manuscript. We have performed further analysis of the collision trajectories between bacteria and the sidewalls. As illustrated in the new Fig. S2, the data confirms that cells tend to align with and swim along the sidewalls following a collision, regardless of the initial impact angle.

      Reviewer #2 (Recommendations for the authors):

      Minor points

      (1) Related to swimming in 3D: The authors should specify the depth of field of the objective in their setup.

      We thank the reviewer for pointing this out. We have calculated the depth of field (DOF) of our objective to be approximately 3.7 µm. This estimate is based on the standard formula:

      where λ = 610 nm (emission wavelength), n = 1.0 (refractive index), NA = 0.45 (numerical aperture), M = 20 (magnification), and e = 6.5 µm (camera resolution). We have added this specification to the "Microscopy and Data Acquisition" section of “Materials and Methods”.
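
      As a consistency check, the sketch below evaluates the widely used total depth-of-field expression, DOF = λn/NA² + ne/(M·NA), with the parameter values listed above. That this is the formula the authors applied is an assumption; it is offered because it reproduces the quoted value of approximately 3.7 µm. Variable names are illustrative.

```python
# Depth of field from the commonly used two-term expression
#   DOF = (wavelength * n) / NA**2 + (n * e) / (M * NA)
# Assumption: this is the "standard formula" referred to in the response.
wavelength_um = 0.610   # emission wavelength (610 nm), in µm
n = 1.0                 # refractive index of the medium
na = 0.45               # numerical aperture of the objective
magnification = 20.0    # objective magnification
e_um = 6.5              # camera resolution (detector element size), in µm

wave_term = wavelength_um * n / na**2               # diffraction-limited term
geometric_term = n * e_um / (magnification * na)    # detector-limited term
dof_um = wave_term + geometric_term

print(f"wave term:      {wave_term:.2f} µm")
print(f"geometric term: {geometric_term:.2f} µm")
print(f"depth of field: {dof_um:.2f} µm")
```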

      (2) Related to the interpretation of the width effect: We think plotting the cell enrichment, i.e., the probabilities P in Figure 4B normalized to the expected value if cells were homogeneously distributed ((3µm)/w for the side walls, (w - 6µm)/w for the middle), would help understand the strength of the wall 'siphoning' effect.

      We thank the reviewer for the suggestion. We have calculated the cell enrichment by normalizing the observed probabilities against the expected values for a homogeneous distribution, as suggested. The resulting relationship between cell enrichment and lane width is presented in Figure S4.
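
      The normalization suggested by the reviewer can be written out as a short sketch. The function name and the example probabilities are hypothetical and are not data from the manuscript; only the expected fractions, (3 µm)/w for each sidewall region and (w − 6 µm)/w for the middle area, come from the comment above.

```python
def cell_enrichment(p_lsw, p_ma, p_rsw, width_um, wall_zone_um=3.0):
    """Normalize observed region probabilities by the homogeneous expectation.

    An enrichment of 1 means the region holds exactly the share of cells
    expected if cells were spread uniformly across the lane width.
    """
    if width_um < 2 * wall_zone_um:
        raise ValueError("lane is narrower than the two sidewall zones")
    expected_wall = wall_zone_um / width_um
    expected_middle = (width_um - 2 * wall_zone_um) / width_um
    return {
        "LSW": p_lsw / expected_wall,
        # No middle area exists when the lane is exactly two wall zones wide.
        "MA": p_ma / expected_middle if expected_middle > 0 else float("nan"),
        "RSW": p_rsw / expected_wall,
    }


# Hypothetical probabilities, for illustration only.
example = cell_enrichment(p_lsw=0.20, p_ma=0.50, p_rsw=0.30, width_um=44.0)
for region, value in example.items():
    print(f"{region}: {value:.2f}")
```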

      Related to simulations:

      (1) Showing vd for the 3 regions in Figure S5 would be helpful also to understand the underlying mechanism.

      We thank the reviewer for the suggestion. The V<sub>d</sub> values for the three regions are shown in Fig. S5.

      (2) Figure 5B vs 4B: There is a mismatch in the right vs left side density at w=6µm in the simulations that is not here in the experiments. What could explain this difference?

      We appreciate the reviewer pointing this out. The mismatch in the simulations is due to the simplified treatment of cells as self-propelled particles, which overlooks the physical volume of the cell body and flagella. In narrow channels (w = 6 µm), these physical constraints would restrict the cells' ability to change direction freely - a factor not fully captured in the simulation. Accounting for these steric effects would trap cells more effectively against the walls, reducing the density asymmetry between the LSW and RSW and lowering the drift velocity. This would bring the simulation results closer to the experimental observations. We have added a discussion of these limitations and effects to the revised manuscript (lines 310-314).

      (3) The simulations essentially assume that the density of motile cells is homogeneous and equal at both x=0 and x=L open ends of the channel. Is it the case in the experiments, even with the gradient, and the walls creating some cell transport?

      We thank the reviewer for pointing this out. The simulation assumption is consistent with our experimental observations. Our data were recorded within 160-μm-long lanes located in the center of the wider (400 μm) cell channel. In this central region, the cells maintain a continuous flux. Furthermore, experiments were performed within 8 min of flow, limiting the time for significant cell density gradients to establish. As illustrated in Author response image 1, the inhomogeneity in the measured cell density distribution is insignificant across the length of the observation window, indicating that the walls and gradient do not create significant heterogeneity at the boundaries of the region of interest.

      Author response image 1.

      The cell density distribution along the gradient field, from the data of the 44-μm-wide lane.

      (4) Line 506: There is something strange with the definition of the bias. B cannot be the tumbling bias if k=B/0.31 s<sup>-1</sup> and the tumble-to-run rate is 5/s, because then the tumbling bias is B/0.31 / (B/0.31 + 5). Please clarify.

      We apologize for the confusion caused by the notation. In our model, B represents the CW bias of the individual flagellar motor, not the macroscopic tumbling bias of the cell. We assume the run-to-tumble rate is equivalent to the motor CCW-to-CW switching rate (k). Previous studies have shown that this rate increases linearly with the motor CW bias according to k=B/t, where t is a characteristic time (Ref. 50).

      Based on experimental data for wildtype cells, the average run time in the near-surface region is ~2.0 s (corresponding to a run-to-tumble rate of ~0.5 s<sup>-1</sup>) (Ref. 11), and the steady-state wildtype CW bias is ~0.15. Using these values, we determined t ~ 0.31 s. Consequently, the switching rate is defined as k=B/0.31 s<sup>-1</sup>. Since the tumble duration is constant (0.2 s) (Ref. 51), the tumble-to-run rate is fixed at 5 s<sup>-1</sup>. We have clarified these definitions and parameter values in lines 569-573.
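
      The distinction between the motor CW bias B and the cell-level tumbling fraction raised by the reviewer can be made concrete with a small sketch. It assumes a two-state (run/tumble) model with constant switching rates, in which the steady-state fraction of time spent tumbling is k/(k + 5 s<sup>-1</sup>); function and constant names are illustrative.

```python
TAU = 0.31            # s, characteristic time quoted in the response
TUMBLE_TO_RUN = 5.0   # s^-1, from the constant 0.2 s tumble duration


def run_to_tumble_rate(cw_bias):
    """Run-to-tumble rate k = B / tau, in s^-1."""
    return cw_bias / TAU


def tumbling_fraction(cw_bias):
    """Steady-state fraction of time spent tumbling in a two-state model."""
    k = run_to_tumble_rate(cw_bias)
    return k / (k + TUMBLE_TO_RUN)


steady_state_bias = 0.15  # wildtype CW bias quoted in the response
k = run_to_tumble_rate(steady_state_bias)
print(f"run-to-tumble rate: {k:.3f} s^-1")
print(f"mean run time:      {1.0 / k:.2f} s")
print(f"tumbling fraction:  {tumbling_fraction(steady_state_bias):.3f}")
```

      With B = 0.15 the mean run time comes out close to the ~2.0 s quoted above, while the fraction of time spent tumbling is below 0.1, which shows that B and the tumbling fraction are different quantities in this parameterization.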

      Other minor comments:

      (1) Line 20 and lines 34-35: We think that the connection to infection is questionable here and should be toned down.

      Thank you for the suggestion. We have revised Line 20 to read: “Understanding bacterial behavior in confined environments is helpful to elucidating microbial ecology and developing strategies to manage bacterial infections.” Additionally, we modified lines 34-35 to state: “Our results may offer insights into bacterial navigation in complex biological environments such as host tissues and biofilms, providing a preliminary step toward exploring microbial ecology in confined habitats and potential strategies for controlling bacterial infections.”

      (2) Line 49: Consider highlighting the change in the sense of rotation at the air-liquid interface.

      Thank you for the suggestion. We have now highlighted the difference in chirality between trajectories at the air-liquid interface and those at the liquid-solid interface. The text has been updated to read: “For example, E. coli swim clockwise when observed from above a solid surface, whereas Caulobacter crescentus move in tight, counter-clockwise circles when viewed from the liquid side.”

      (3) Lines 58-59: The sentence should be better formulated, explaining what is CheY-P and that its concentration changes because of a change in phosphorylation (P).

      Thank you for the suggestion. We have reformulated this section to explicitly define CheY-P and explain how its concentration is regulated through phosphorylation. The revised text reads: “The transmembrane chemoreceptors detect attractants or repellents and transmit signals into the cell by modulating the autophosphorylation of the histidine kinase CheA. Attractant binding suppresses CheA autophosphorylation, while repellent binding promotes it. This modulation alters the concentration of the phosphorylated response regulator protein, CheY-P.”

      (4) Lines 63-64: CheR CheB do a bit more than "facilitating" adaptation, they mediate it. The notation CheB(p) may be confusing, since "-P" was used above for CheY.

      Thank you for pointing this out. We have corrected the notation and strengthened the description of the enzymes' roles. The revised text is: “The adaptation enzymes CheR and CheB methylate and demethylate the receptors, respectively, mediating sensory adaptation.”

      (5) Line 130: there must be a typo in the formula.

      We have replaced the ambiguous lag time variable in Fig. 1C with _n_Δt to ensure mathematical consistency.

      (6) Additionally, \Delta t is both the time between the frame here and the lag time in Figure 1.

      Thank you for highlighting this ambiguity. We have updated the notation to distinguish these two values. The lag time in Figure 1 is now explicitly denoted as _n_Δt, while Δt remains the time interval between individual frames.

      (7) Line 162: "Consistent with previous reports," a reference to said reports is missing.

      Thank you for pointing this out. We have now added the reference (Ref. 41) to support this statement.

      (8) Figure 1B: Are these tracks in the presence of a gradient? Same as used in panel C? This needs to be explained.

      Thank you for this question. We confirm that the tracks shown in Figure 1B were indeed recorded in the presence of a gradient and represent a subset of the data used in Figure 1C. We have clarified this in the figure legend as follows: "Thirty bacterial trajectories selected from the data of the 44-µm-wide lane in gradient assays. These represent a subset of the trajectories analyzed in panel C."

      (9) Simulations: the equation for x(t) should also be given for completeness.

      Thank you for the suggestion. For completeness, we have added the position updating equations for the run state to the Materials and Methods section (lines 579-580). The equations are defined as:

      (10) Figure S2: For the swimming directions that are more unstable due to the surface friction torque, RSW-DG, and LSW-UG, one would have expected that the Up-gradient motion is more persistent than the down gradient one. It seems to be the opposite. Is it significant, and what could be the reason for this?

      We apologize for the lack of clarity in our original explanation. While we would generally expect up-gradient motion to be more persistent than down-gradient motion in bulk fluid, our measurements near the surface show a different trend due to the specific contributions of run and tumble states to the escape rate. Cells swimming up-gradient (UG) in the LSW experience higher probability of running. Consequently, they are subjected to the destabilizing surface friction torque for a greater proportion of time compared to cells swimming down-gradient (DG) in the RSW. This can be explained mathematically. The escape rates for RSW-DG and LSW-UG can be expressed as:

      Where B<sup>+</sup> and B<sup>−</sup> represent the tumble bias (probability of tumbling) when swimming up-gradient and down-gradient, respectively, and k<sub>T</sub> and k<sub>R</sub> denote the escape rates during a tumble and a run, respectively. Due to the chemotactic response, 0 ≤ B<sup>+</sup> < B<sup>−</sup> ≤ 1. Crucially, our system is characterized by k<sub>R</sub> > k<sub>T</sub> (the escape rate is higher during a run than a tumble). Therefore, the lower tumble bias during up-gradient swimming (B<sup>+</sup> < B<sup>−</sup>) increases the weight of the run-state escape term ((1−B<sup>+</sup>)k<sub>R</sub>), leading to a higher overall escape rate for LSW-UG compared to RSW-DG. We have added an intuitive understanding of k<sub>R</sub> > k<sub>T</sub> in the Supplemental text.
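
      One form of the escape rates that is consistent with these definitions, given here as a sketch rather than as the authors' exact expressions, is a bias-weighted average of the tumble-state and run-state escape rates. The symbol k<sub>esc</sub> is introduced only for this illustration.

```latex
\begin{aligned}
k_{\mathrm{esc}}^{\mathrm{LSW\text{-}UG}} &= B^{+} k_{T} + (1 - B^{+})\, k_{R},\\
k_{\mathrm{esc}}^{\mathrm{RSW\text{-}DG}} &= B^{-} k_{T} + (1 - B^{-})\, k_{R},\\
k_{\mathrm{esc}}^{\mathrm{LSW\text{-}UG}} - k_{\mathrm{esc}}^{\mathrm{RSW\text{-}DG}}
  &= (B^{-} - B^{+})\,(k_{R} - k_{T}) > 0 .
\end{aligned}
```

      Under this form, the difference is positive whenever B<sup>+</sup> < B<sup>−</sup> and k<sub>R</sub> > k<sub>T</sub>, which matches the stated conclusion that LSW-UG cells escape faster than RSW-DG cells.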

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This is a careful and comprehensive study demonstrating that effector-dependent conformational switching of the MT lattice from compacted to expanded deploys the alpha tubulin C-terminal tails so as to enhance their ability to bind interactors.

      Strengths:

      The authors use 3 different sensors for the exposure of the alpha CTTs. They show that all 3 sensors report exposure of the alpha CTTs when the lattice is expanded by GMPCPP, or KIF1C, or a hydrolysis-deficient tubulin. They demonstrate that expansion-dependent exposure of the alpha CTTs works in tissue culture cells as well as in vitro.

      Weaknesses:

      There is no information on the status of the beta tubulin CTTs. The study is done with mixed isotype microtubules, both in cells and in vitro. It remains unclear whether all the alpha tubulins in a mixed isotype microtubule lattice behave equivalently, or whether the effect is tubulin isotype-dependent. It remains unclear whether local binding of effectors can locally expand the lattice and locally expose the alpha CTTs.

      Appraisal:

      The authors have gone to considerable lengths to test their hypothesis that microtubule expansion favours deployment of the alpha tubulin C-terminal tail, allowing its interactors, including detyrosinase enzymes, to bind. There is a real prospect that this will change thinking in the field. One very interesting possibility, touched on by the authors, is that the requirement for MAP7 to engage kinesin with the MT might include a direct effect of MAP7 on lattice expansion.

      Impact:

      The possibility that the interactions of MAPS and motors with a particular MT or region feed forward to determine its future interaction patterns is made much more real. Genuinely exciting.

      We thank the reviewer for their positive response to our work. We agree that it will be important to determine if the β-CTT is subject to regulation similar to the α-CTT. However, this will first require the development of sensors that report on the accessibility of the β-CTT, which is a significant undertaking for future work.

      We also agree that it will be important to examine whether all tubulin isotypes behave equivalently in terms of exposure of the α-CTT in response to conformational switching of the microtubule lattice.

      We thank the reviewer for the comment about local expansion of the microtubule lattice. We believe that Figure 3 does show that local binding of effectors can locally expand the lattice and locally expose the alpha-CTTs. We have added text to clarify this.

      Reviewer #2 (Public review):

      The unstructured α- and β-tubulin C-terminal tails (CTTs), which differ between tubulin isoforms, extend from the surface of the microtubule, are post-translationally modified, and help regulate the function of MAPs and motors. Their dynamics and extent of interactions with the microtubule lattice are not well understood. Hotta et al. explore this using a set of three distinct probes that bind to the CTTs of tyrosinated (native) α-tubulin. Under normal cellular conditions, these probes associate with microtubules only to a limited extent, but this binding can be enhanced by various manipulations thought to alter the tubulin lattice conformation (expanded or compact). These include small-molecule treatment (Taxol), changes in nucleotide state, and the binding of microtubule-associated proteins and motors. Overall, the authors conclude that microtubule lattice "expanders" promote probe binding, suggesting that the CTT is generally more accessible under these conditions. Consistent with this, detyrosination is enhanced. Mechanistically, molecular dynamics simulations indicate that the CTT may interact with the microtubule lattice at several sites, and that these interactions are affected by the tubulin nucleotide state.

      Strengths:

      Key strengths of the work include the use of three distinct probes that yield broadly consistent findings, and a wide variety of experimental manipulations (drugs, motors, MAPs) that collectively support the authors' conclusions, alongside a careful quantitative approach.

      Weaknesses:

      The challenges of studying the dynamics of a short, intrinsically disordered protein region within the complex environment of the cellular microtubule lattice, amid numerous other binders and regulators, should not be understated. While it is very plausible that the probes report on CTT accessibility as proposed, the possibility of confounding factors (e.g., effects on MAP or motor binding) cannot be ruled out. Sensitivity to the expression level clearly introduces additional complications. Likewise, for each individual "expander" or "compactor" manipulation, one must consider indirect consequences (e.g., masking of binding sites) in addition to direct effects on the lattice; however, this risk is mitigated by the collective observations all pointing in the same direction.

      The discussion does a good job of placing the findings in context and acknowledging relevant caveats and limitations. Overall, this study introduces an interesting and provocative concept, well supported by experimental data, and provides a strong foundation for future work. This will be a valuable contribution to the field.

      We thank the reviewer for their positive response to our work. We are encouraged that the reviewer feels that the Discussion section does a good job of putting the findings, challenges, and possibility of confounding factors and indirect effects in context. 

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors investigate how the structural state of the microtubule lattice influences the accessibility of the α-tubulin C-terminal tail (CTT). By developing and applying new biosensors, they reveal that the tyrosinated CTT is largely inaccessible under normal conditions but becomes more accessible upon changes to the tubulin conformational state induced by taxol treatment, MAP expression, or GTP-hydrolysis-deficient tubulin. The combination of live imaging, biochemical assays, and simulations suggests that the lattice conformation regulates the exposure of the CTT, providing a potential mechanism for modulating interactions with microtubule-associated proteins. The work addresses a highly topical question in the microtubule field and proposes a new conceptual link between lattice spacing and tail accessibility for tubulin post-translational modification.

      Strengths:

      (1) The study targets a highly relevant and emerging topic: the structural plasticity of the microtubule lattice and its regulatory implications.

      (2) The biosensor design represents a methodological advance, enabling direct visualization of CTT accessibility in living cells.

      (3) Integration of imaging, biochemical assays, and simulations provides a multi-scale perspective on lattice regulation.

      (4) The proposed conceptual framework, in which lattice conformation is a determinant of post-translational modification accessibility, is novel and potentially impactful for understanding microtubule regulation.

      Weaknesses:

      There are a number of weaknesses in the paper, many of which can be addressed textually. Some of the supporting evidence is preliminary and would benefit from additional experimental validation and clearer presentation before the conclusions can be considered fully supported. In particular, the authors should directly test in vitro whether Taxol addition can induce lattice exchange (see comments below).

      We thank the reviewer for their positive response to our work. We have altered the text and provided additional experimental validation as requested (see below).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The resolution of the figures is insufficient.

      (2) The provision of scale bars is inconsistent and insufficient.

      (3) Figure 1E, the scale bar looks like an MT.

      (4) Figure 2C, what does the grey bar indicate?

      (5) Figure 2E, missing scale bar.

      (6) Figure 3 C, D, significance brackets misaligned.

      (7) Figure 3E, consider using the same alpha-beta tubulin / MT graphic as in Figure 1B.

      (8) Figure 5E, show cell boundaries for consistency?

      (9) Figure 6D, stray box above the y-axis.

      (11) Figure S3A, scale bar wrong unit again.

      (12) S3B "fixed" and mount missing scale bar in the inset.

      (13) S4 scale bars without scale, inconsistency in scale bars throughout all the figures.

      We apologize for issues with the figures. We have corrected all of the issues indicated by the reviewer.

      (10) Figure 6F, surprising that 300 mM KCl washes out rigor-binding kinesin.

      We thank the reviewer for this important point. To address the reviewer’s concern, we have added a new supplementary figure (new Figure 6 – Figure Supplement 1) which shows that the washing step removes strongly-bound (apo) KIF5C(1-560)-Halo<sup>554</sup> protein from the microtubules. In addition, we have made a correction to the Materials and Methods section noting that ATP was added in addition to the KCl in the wash buffer. We apologize for omitting this detail in the original submission. We also added text noting that the wash out step was based on Shima et al., 2018 where the observation chamber was washed with either 1 mM ATP and 300 mM K-Pipes or with 10 mM ATP and 500 mM K-Pipes buffer. In our case, the chamber was washed with 3 mM ATP and 300 mM KCl. It is likely that the addition of ATP facilitates the detachment of strongly-bound KIF5C.

      (14) Supplementary movie, please identify alpha- and beta-tubulin for clarity. Please identify the residues lighting up in interaction sites 1, 2, and 3.

      Thank you for the suggestions. We have made the requested changes to the movie.

      Reviewer #2 (Recommendations for the authors):

      There appear to have been some minor issues (perhaps with .pdf conversion) that leave some text and images pixelated in the .pdf provided, alongside some slightly jarring text and image positioning (e.g., Figure 5E panels). The authors should carefully look at the figures to ensure that they are presented in the clearest way possible.

      We apologize for these issues with the figures. We have reviewed the figures carefully to ensure that they are presented in the clearest way possible.

      The authors might consider providing a more definitive structural description of compact vs expanded lattice, highlighting what specific parameters are generally thought to change and by what magnitude. Do these differ between taxol-mediated expansion or the effects of MAPs?

      Thank you for the suggestion. We have added additional information to the Introduction section.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1 should include a schematic overview of all constructs used in the study. A clear illustration showing the probe design, including the origin and function of each component (e.g., tags, domains), would improve clarity.

      Thank you for the suggestion. We have added new illustrations to Figure 1 showing the origin and design (including domains and tags) of each probe.

      (2) Add Western blot data for the 4×CAP-Gly construct to Figure 1C for completeness.

      We thank the reviewer for this suggestion. We carried out a far-western blot using the purified 4xCAPGly-mEGFP protein to probe GST-Y, GST-ΔY, and GST-ΔC2 proteins (new Figure 1 – Figure Supplement 1C). We note that some bleed-through signal can be seen in the lanes containing GST-ΔY and GST-ΔC2 protein due to the imaging requirements and exposure needed to visualize the 4xCAPGly-mEGFP protein. Nevertheless, the blot shows that the purified CAPGly sensor specifically recognizes the native (tyrosinated) CTT sequence of TUBA1A.

      (3) Essential background information on the CAP-Gly domain, SXIP motif, and EB proteins is missing from the Introduction. These concepts appear abruptly in the Results and should be properly introduced.

      Thank you for the suggestion. We have added additional information to the Introduction section about the CAP-Gly domain. However, we feel that introducing the SXIP motif and EB proteins at this point would detract from the flow of the Introduction and we have elected to retain this information in the Results section when we detail development of the 4xCAPGly probe.

      (4) In Figure 2E, it remains possible that the CAP-Gly domain displacement simply follows the displacement of EB proteins. An experiment comparing EB protein localization upon Taxol treatment would clarify this relationship.

      We thank the reviewer for raising this important point. To address the reviewer’s concern, we utilized HeLa cells stably expressing EB3-GFP. We performed live-cell imaging before and after Taxol addition (new Figure 2 – Figure Supplement 1C). EB3-EGFP was lost from the microtubule plus ends within minutes and did not localize to the now-expanded lattice.

      (5) Statements such as "significantly increased" (e.g., line 195) should be replaced with quantitative information (e.g., "1.5-fold increase").

      We have made the suggested changes to the text.

      (6) Phrases like "became accessible" should be revised to "became more accessible," as the observed changes are relative, not absolute. The current wording implies a binary shift, whereas the data show a modest (~1.5-fold) increase.

      We have made the suggested changes to the text.

      (7) Similarly, at line 209, the terms "minimally accessible" versus "accessible" should be rephrased to reflect the small relative change observed; saturation of accessibility is not demonstrated.

      We have made the suggested changes to the text.

      (8) Statements that MAP7 "expands the lattice" (line 222) should be made cautiously; to my knowledge, that has not been clearly established in the literature.

      We thank the reviewer for this important comment. We have added text indicating that MAP7’s ability to induce or preserve an expanded lattice has not been clearly established.

      (9) In Figures 3 and 4, the overexpression of MAP7 results in a strikingly peripheral microtubule network. Why is there this unusual morphology?

      The reviewer raises an interesting question. We are not sure why the overexpression of MAP7 results in a strikingly peripheral microtubule network, but we suspect this is unique to the HeLa cells we are using. We have observed a more uniform MAP7 localization in other cell types [e.g. COS-7 cells (Tymanskyj et al. 2018)], consistent with the literature [e.g. BEAS-2B cells (Shen and Ori-McKenney 2024), HeLa cells (Hooikaas et al. 2019)].

      (10) In Supplementary Figure 5C, the Western blot of detyrosination levels is inconsistent with the text. Untreated cells appear to have higher detyrosination than both wild-type and E254A-overexpressing cells. Do you have any explanation?

      We thank the reviewer for this important comment. We do not have an explanation at this point but plan to revisit this experiment. Unfortunately, the authors who carried out this work recently moved to a new institution and it will be several months before they are able to get the cell lines going and repeat the experiment. We thus elected to remove what was Supp Fig 5C until we can revisit the results. We believe that the important results are in what is now Figure 5 - Figure Supplement 1A,B, which shows that the expression levels of the WT and E254A proteins are similar to each other.

      (11) The image analysis method in Figures 5B and 5D requires clarification. It appears that "density" was calculated from skeletonized probe length over total area, potentially using a strict intensity threshold. It looks like low-intensity binding has been excluded; otherwise, the density would be the same from the images. If so, this should be stated explicitly. A more appropriate analysis might skeletonize and integrate total fluorescence intensity relative to the overall microtubule network.

      We have added additional information to the Materials and Methods section to clarify the image analysis. We appreciate the reviewer’s valuable feedback and the suggestion to use the integrated total fluorescence intensity, which is a theoretically sound approach. While we agree that integrated intensity is a valid metric for specific applications, its appropriate use depends on two main preconditions:

      (1) Consistent microscopy image acquisition conditions.

      (2) Consistent probe expression levels across all cells and experiments.

      We successfully maintained consistent image acquisition conditions (e.g., exposure time) throughout the experiment. However, despite generating stably expressing sensor cell lines to minimize variation, there remains inherent biological variability in probe expression levels between individual cells. Integrated intensity is highly susceptible to this cell-to-cell variability. Relying on it would lead to a systematic error in which differences in the total amount of expressed probe would be mistaken for differences in Y-aCTT accessibility.

      The density metric (skeletonized probe length / total cell area) was deliberately chosen as it serves as a geometric measure rather than an intensity-based normalization. The density metric quantifies the proportion of the microtubule network that is occupied by Y-aCTT-labeled structures, independent of fluorescence intensity. Thus, the density metric provides a more robust and interpretable measure of Y-aCTT accessibility under the variable expression conditions inherent to our experimental system. Therefore, we believe that this geometric approach represents the most appropriate analysis for our image dataset.
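
      The distinction between the geometric density metric and an integrated-intensity metric can be illustrated with a minimal sketch. It assumes that a binary skeleton mask and a binary cell mask have already been produced upstream (for example by thresholding and skeletonization, which are not shown), and it uses skeleton pixel count as a stand-in for skeleton length. The function names and the toy 4x4 images are hypothetical and are not taken from the authors' analysis code.

```python
def skeleton_density(skeleton_mask, cell_mask):
    """Skeleton pixels inside the cell divided by the cell area (in pixels)."""
    skeleton_px = sum(
        1
        for skel_row, cell_row in zip(skeleton_mask, cell_mask)
        for s, c in zip(skel_row, cell_row)
        if s and c
    )
    cell_px = sum(1 for row in cell_mask for c in row if c)
    if cell_px == 0:
        raise ValueError("cell mask is empty")
    return skeleton_px / cell_px


def integrated_intensity(image, cell_mask):
    """Sum of pixel values inside the cell; scales with probe expression."""
    return sum(
        value
        for img_row, cell_row in zip(image, cell_mask)
        for value, c in zip(img_row, cell_row)
        if c
    )


# Hypothetical toy data: one labeled filament crossing a 4x4 cell.
cell = [[1, 1, 1, 1] for _ in range(4)]
skeleton = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Same geometry imaged in a low-expressing and a high-expressing cell.
dim_image = [[10 * s for s in row] for row in skeleton]
bright_image = [[20 * s for s in row] for row in skeleton]

density = skeleton_density(skeleton, cell)
dim_total = integrated_intensity(dim_image, cell)
bright_total = integrated_intensity(bright_image, cell)
```

      In this toy case the density is identical for both cells because the skeleton is the same, whereas the integrated intensity doubles with expression level. Note that in practice the skeleton itself depends on the threshold used to detect the probe signal, so the geometric metric is only as expression-independent as that upstream step.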

      (12) In Figure 5D, the fold-change data are difficult to interpret due to the compressed scale. Replotting is recommended. The text should also discuss the relative fold changes between E254A and Taxol conditions, Figure 2H.

      We appreciate the reviewer's insightful comment. We agree that the presence of significant outliers led to a compressed Y-axis scale in Figure 5D, obscuring the clear difference between the WT-tubulin and E254A-tubulin groups. As suggested, we have replotted Figure 5D using a broken Y-axis to effectively expand the relevant lower range of the data while still accurately representing all data points, including the outliers. We believe that the revised graph significantly enhances the clarity and interpretability of these results. For Figure 2, we have added the relative fold changes to the text as requested.

      (13) Figure 6. The authors should directly test in vitro whether Taxol addition can induce lattice exchange, for example, by adding Taxol to GDP-microtubules and monitoring probe binding. Including such an assay would provide critical mechanistic evidence and substantially strengthen the conclusions. I was waiting for this experiment since Figure 2.

      We thank the reviewer for this suggestion. As suggested, we generated GDP-MTs from HeLa tubulin and added them to two flow chambers. We then flowed the YL1/2<sup>Fab</sup>-EGFP probe into the chambers in the presence of DMSO (vehicle control) or Taxol. Static images were taken and the fluorescence intensity of the probe on microtubules in each chamber was quantified. There was a slight but not statistically significant difference in probe binding between control and Taxol-treated GDP-MTs (Author response image 1). While disappointing, these results underscore our conclusion (Discussion section) that microtubule assembly in vitro may not produce a lattice state resembling that in cells, due to differences in protofilament number, buffer conditions, and/or the lack of MAPs during polymerization.

      Author response image 1.

      References

      Hooikaas, P. J., Martin, M., Muhlethaler, T., Kuijntjes, G. J., Peeters, C. A. E., Katrukha, E. A., Ferrari, L., Stucchi, R., Verhagen, D. G. F., van Riel, W. E., Grigoriev, I., Altelaar, A. F. M., Hoogenraad, C. C., Rudiger, S. G. D., Steinmetz, M. O., Kapitein, L. C. and Akhmanova, A. (2019). MAP7 family proteins regulate kinesin-1 recruitment and activation. J Cell Biol, 218, 1298-1318.

      Shen, Y. and Ori-McKenney, K. M. (2024). Microtubule-associated protein MAP7 promotes tubulin posttranslational modifications and cargo transport to enable osmotic adaptation. Dev Cell, 59, 1553-1570.

      Tymanskyj, S. R., Yang, B. H., Verhey, K. J. and Ma, L. (2018). MAP7 regulates axon morphogenesis by recruiting kinesin-1 to microtubules and modulating organelle transport. Elife, 7.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript uses primarily simulation tools to probe the pathway of cholesterol transport with the smoothened (SMO) protein. The pathway to the protein and within SMO is clearly discovered, and interactions deemed important are tested experimentally to validate the model predictions.

      Strengths:

      The authors have clearly demonstrated how cholesterol might go from the membrane through SMO for the inner and outer leaflets of a symmetrical membrane model. The free energy profiles, structural conformations, and cholesterol-residue interactions are clearly described.

      We thank the reviewer for their kind words.

      (1) Membrane Model: The authors decided to use a rather simple symmetric membrane with just cholesterol, POPC, and PSM at the same concentration for the inner and outer leaflets. This is not representative of asymmetry known to exist in plasma membranes (SM only in the outer leaflet and more cholesterol in this leaflet). This may also be important to the free energy pathway into SMO. Moreover, PE and anionic lipids are present in the inner leaflet and are ignored. While I am not requesting new simulations, I would suggest that the authors should clearly state that their model does not consider lipid concentration leaflet asymmetry, which might play an important role.

      We thank the reviewer for their comment. Membrane asymmetry is inherent in endogenous systems; we acknowledge that as a limitation of our current model. We have addressed the comment by adding this limitation to our discussion in the manuscript.

      Added lines: (End of paragraph 6, Results subsection 2):

      “One possibility that might alter the thermodynamic barriers is native membrane asymmetry, particularly the anionic lipid-rich inner leaflet. This presents as a limitation of our current model.”

      (2) Statistical comparison of barriers: The barriers for pathways 1 and 2 are compared in the text, suggesting that pathway 2 has a slightly higher barrier than pathway 1. However, are these statistically different? If so, the authors should state the p-value. If not, then the text in the manuscript should not state that one pathway is preferred over the other.

      We thank the reviewer for their comment. We have added statistical t-tests for the barriers.

      Changes made: (Paragraph 6, Results subsection 2)

      “However, we also observe that pathway 1 shows a lower thermodynamic barrier (5.8 ± 0.7 kcal/mol v/s 6.5 ± 0.8 kcal/mol, p = 0.0013)”
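
      As a rough illustration of how two barriers reported as mean ± spread can be compared, the following sketch computes Welch's t statistic and its approximate degrees of freedom from summary statistics. Several things here are assumptions: the response does not state which t-test variant was used, whether the ± values are standard deviations or standard errors (standard deviations are assumed), or how many independent estimates underlie each barrier (the sample size of 20 is purely hypothetical). The sketch therefore does not attempt to reproduce the reported p-value.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, dof


# Barriers from the text (kcal/mol); N_HYPOTHETICAL is an assumed sample size.
N_HYPOTHETICAL = 20
t_stat, dof = welch_t(5.8, 0.7, N_HYPOTHETICAL, 6.5, 0.8, N_HYPOTHETICAL)
```

      A negative t statistic here simply reflects that the first barrier (pathway 1) is the lower one. Converting the statistic into a p-value requires the t-distribution with the computed degrees of freedom, and the result depends strongly on the true sample size.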

      (3) Barrier of cholesterol (reasoning): The authors on page 7 argue that there is an enthalpy barrier between the membrane and SMO due to the change in environment. However, cholesterol lies in the membrane with its hydroxyl interacting with the hydrophilic part of the membrane and the other parts in the hydrophobic part. How is the SMO surface any different? It has both characteristics and is likely balanced similarly to uptake cholesterol. Unless this can be better quantified, I would suggest that this logic be removed.

      We thank the reviewer for this suggestion. We have removed the line to avoid confusion.

      Reviewer #2 (Public review):

      Summary:

      In this work, the authors applied a range of computational methods to probe the translocation of cholesterol through the Smoothened receptor. They test whether cholesterol is more likely to enter the receptor straight from the outer leaflet of the membrane or via a binding pathway in the inner leaflet first. Their data reveal that both pathways are plausible but that the free energy barriers of pathway 1 are lower, suggesting this route is preferable. They also probe the pathway of cholesterol transport from the transmembrane region to the cysteine-rich domain (CRD).

      Strengths:

      (1) A wide range of computational techniques is used, including potential of mean force calculations, adaptive sampling, dimensionality reduction using tICA, and MSM modelling. These are all applied rigorously, and the data are very convincing. The computational work is an exemplar of a well-carried out study.

      (2) The computational predictions are experimentally supported using mutagenesis, with an excellent agreement between their PMF and mRNA fold change data.

      (3) The data are described clearly and coherently, with excellent use of figures. They combine their findings into a mechanism for cholesterol transport, which on the whole seems sound.

      (4) The methods are described well, and many of their analysis methods have been made available via GitHub, which is an additional strength.

      Weaknesses:

      (1) Some of the data could be presented a little more clearly. In particular, Figure 7 needs additional annotation to be interpretable. Can the position of the cholesterol be shown on the graph so that we can see the diameter change more clearly?

      We thank the reviewer for this suggestion. We have added the cholesterol positions as requested.

      Changes made: (Caption, Figure 7)

      “The tunnel profile during cholesterol translocation in SMO. (a) Free energy plot of the z-coordinate v/s the tunnel diameter when cholesterol is present in the core TMD. The tunnel shows a spike in the radius in the TMD domain, indicating the presence of a cholesterol-accommodating cavity. (b) Representative figure for the tunnel when a cholesterol molecule is in the TMD. (c) Same as (a), when cholesterol is at the TMD-CRD interface. (d) Same as (b), when cholesterol is at the TMD-CRD interface. (e) Same as (a), when cholesterol is at the CRD binding site. (f) Same as (b), when cholesterol is at the CRD binding site. Tunnel diameters shown as spheres. Cholesterol positions marked on plots using dotted lines. All snapshots presented are frames taken from MD simulations.”

      (2) In Figure 3C, it doesn’t look like the Met is constricting the tunnel at all. What residue is constricting the tunnel here? Can we see the Ala and Met panels from the same angle to compare the landscapes? Or does the mutation significantly change the tunnel? Why not A283 to a bulkier residue? Finally, the legend says that the figure shows that cholesterol can still pass this residue, but it doesn’t really show this. Perhaps if the HOLE graph was plotted, we could see the narrowest point of the tunnel and compare it to the size of cholesterol.

      We thank the reviewer for this suggestion. A283 was mutated to methionine because methionine has a longer, bulkier, sulfur-containing side chain. We have plotted the tunnel radii for both the WT and the A283M mutant and added them as a supplemental figure. As shown in the figure, the presence of methionine does not completely block the tunnel but partially occludes it, thereby slightly increasing the barrier for cholesterol transport.

      Changes made: (End of Results subsection 1)

      “When we calculated the PMF for cholesterol entry, A<sup>2.60f</sup>M mutant showed restricted tunnel but it did not fully block the tunnel (Figure 3—figure Supplement 3).”

      (3) The PMF axis in 3b and d confused me for a bit. Looking at the Supplementary data, it’s clear that, e.g., the F455I change increases the energy barrier for chol entering the receptor. But in 3d this is shown as a -ve change, i.e., favourable. This seems the wrong way around for me. Either switch the sign or make this clearer in the legend, please.

      We thank the reviewer for this suggestion. We measured ∆PMF as PMF<sub>WT</sub> - PMF<sub>mutant</sub>, hence the negative values. We have added additional text to the legend to clarify this.
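
      The sign convention can be made concrete with a short sketch. The wild-type peak of 5.8 kcal/mol is the Pathway 1 barrier quoted elsewhere in this response; the mutant peak of 7.0 kcal/mol is a hypothetical value chosen only for illustration, as is the function name.

```python
def delta_pmf(pmf_wt_peak, pmf_mutant_peak):
    """Difference of peak PMF, defined as wild type minus mutant (kcal/mol)."""
    return pmf_wt_peak - pmf_mutant_peak


pmf_wt = 5.8      # Pathway 1 peak barrier quoted in the response
pmf_mutant = 7.0  # hypothetical mutant barrier, for illustration only

d_pmf = delta_pmf(pmf_wt, pmf_mutant)
mutant_raises_barrier = d_pmf < 0
```

      Under this convention a negative ∆PMF means that the mutation raises the barrier to cholesterol entry, so a negative ∆PMF is expected to accompany reduced activity rather than a more favourable pathway.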

      Changes made: (Caption, Figure 3)

      “(b) ∆Gli1 mRNA fold change (high SHH vs untreated) and ∆PMF (difference of peak PMF, calculated as PMF<sub>WT</sub> - PMF<sub>mutant</sub>) plotted for the mutants in Pathway 1. (c) Example mutant A<sup>2.60f</sup>M shows that cholesterol can enter SMO through Pathway 1 even on a bulky mutation. (d) Same as (b) but for Pathway 2. (e) Example mutant L<sup>5.62f</sup>A shows that cholesterol can enter SMO through Pathway 2 due to lesser steric hindrance. All snapshots presented are frames taken from MD simulations.”

      Changes made: (Caption, Figure 6)

      “(b) ∆Gli1 mRNA fold change (high SHH vs untreated) and ∆ PMF (difference of peak PMF, calculated as PMF<sub>WT</sub> - PMF<sub>mutant</sub>) plotted for mutants along the TMD-CRD pathway. (c, d) Example mutants Y<sup>LD</sup>A and F<sup>5.65f</sup>A show that cholesterol is unable to translocate through this pathway because of the loss of crucial hydrophobic contacts provided by Y207 and F484 and along the solvent-exposed pathway.”

      (4) The impact of G280V is put down to a decrease in flexibility, but it could also be a steric hindrance. This should be discussed.

      We thank the reviewer for this suggestion. We have added it as a possible mechanism of the decrease in activity of SMO.

      Changes made: (Paragraph 5, Results subsection 1)

      “We mutated G280<sup>2.57f</sup> to valine (G<sup>2.57f</sup>V) to test whether reducing the flexibility of TM2 prevents cholesterol entry into the TMD. Consequently, the activity of mSMO showed a decrease. However, this decrease could also be attributed to steric hindrance added by the bulky isopropyl group of valine.”

      (5) Are the reported energy barriers of the two pathways (5.8 ± 0.7 and 6.5 ± 0.8 kcal/mol) significantly and/or substantially different enough to favour one over the other? This could be discussed in the manuscript.

      We thank the reviewer for this suggestion. We have added statistical t-tests for the barriers.

      Changes made: (Paragraph 6, Results subsection 2)

      “However, we also observe that pathway 1 shows a lower thermodynamic barrier (5.8 ± 0.7 kcal/mol v/s 6.5 ± 0.8 kcal/mol, p = 0.001)”

      (6) Are the energy barriers consistent with a passive diffusion-driven process? It feels like, without a source of free energy input (e.g., ion or ATP), these barriers would be difficult to overcome. This could be discussed.

      We thank the reviewer for this suggestion. We have added a discussion to further clarify this point.

      Discussion: (Paragraph 6, Results subsection 2)

      “These values are comparable to ATP-Binding Cassette (ABC) transporters of membrane lipids, which use ATP hydrolysis (-7.54 ± 0.3 kcal/mol) (Meurer et al., 2017) to drive lipid transport from the membrane to an extracellular acceptor. Some of these transporters share the same mechanism as SMO, where the lipid from the inner leaflet is flipped and transported to the extracellular acceptor protein (Tarling et al., 2013). Additionally, for secondary active transporters that do not use ATP for the transport of substrates, a thermodynamic barrier of 5-6 kcal/mol has been reported in the literature (Chan et al., 2022; Selvam et al., 2019; McComas et al., 2023; Thangapandian et al., 2025).”
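
      To give a sense of scale for barriers of this size, the sketch below evaluates the Boltzmann factor exp(-ΔG/RT) for the two reported barriers. This is an illustrative back-of-the-envelope calculation and not part of the authors' analysis: the temperature of 310 K is an assumption, and a Boltzmann factor is only a relative statistical weight, not a transport rate (a rate would also require a prefactor that is not given here).

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_factor(barrier_kcal, temperature_k=310.0):
    """Relative weight exp(-dG/RT) of reaching the top of a free energy barrier."""
    return math.exp(-barrier_kcal / (R_KCAL * temperature_k))


# Barriers quoted in the response (kcal/mol); 310 K is an assumed temperature.
factor_pathway1 = boltzmann_factor(5.8)
factor_pathway2 = boltzmann_factor(6.5)

# How much more often the lower barrier is crossed, all else being equal.
preference_ratio = factor_pathway1 / factor_pathway2
```

      Under these assumptions the 0.7 kcal/mol difference between the two pathways corresponds to roughly a threefold preference for the lower barrier, which is a modest difference relative to the reported uncertainties.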

      (7) Regarding the kinetics from MSM, it is stated that the values seen here are similar to MFS transporters, but this then references another MSM study. A comparison to experimental values would support this section a lot.

      We thank the reviewer for this suggestion. We have added a discussion of the millisecond-scale kinetics measured experimentally for MFS transporters.

      Changes made: (Paragraph 2, Results subsection 5)

      “These timescales are comparable to the substrate transport timescales of Major Facilitator Superfamily (MFS) transporters (Chan et al., 2022). Furthermore, several experimental studies have also resolved the millisecond-scale kinetics of MFS transporters (Blodgett and Carruthers, 2005; Körner et al., 2024; Bazzone et al., 2022; Smirnova et al., 2014; Zhu et al., 2019), further corroborating the results from our study.”

      Reviewer #2 (Recommendations for the authors):

      (1) The heatmaps in Figures 2a and 4a are great. On these, an arrow denotes what looks like a minimum energy path. Is it possible to see this plotted, as this might show the height of the energy barriers more clearly?

      We thank the reviewer for this suggestion. We have computed the minimum energy paths for both pathways and presented them in a supplementary figure.

      Added lines: (Paragraph 4, Results subsection 1):

      For further clarity, we have plotted the minimum energy path taken by cholesterol as it translocates along this pathway (Figure 2—figure Supplement 3a,b).

      Added lines: (Paragraph 4, Results subsection 2):

      For further clarity, we have plotted the minimum energy path taken by cholesterol as it translocates along this pathway (Figure 2—figure Supplement 3c,d).

      (2) The tICA data in S15 is first referred to on line 137, but the technique isn’t introduced until line 222. This makes understanding the data a little confusing. Reordering this might improve readability.

      We thank the reviewer for this suggestion. We have reordered the text to make it clearer.

      Changes made: (Paragraph 2, Results subsection 1) This provides evidence for multiple stable poses along the pathway as observed in the multiple stable poses of cholesterol in Cryo-EM structures of SMO bound to sterols (Deshpande et al., 2019; Qi et al., 2019b, 2020). A reliable estimate of the barriers comes from using the time-lagged Independent Components (tICs), which project the entire dataset along the slowest kinetic degrees of freedom. Overall, the highest barrier along Pathway 1 is 5.8 ± 0.7 kcal/mol, and it is associated with the entry of cholesterol into the TMD (Figure 2—Figure Supplement 2).

      Changes made: (Paragraph 3, Results subsection 2)

      “On plotting the first two components of tICs, (Figure 2—Figure Supplement 2), we observe that the energetic barrier between η and θ is ∼6.5 ± 0.8 kcal/mol.”

      (3) Missing bracket on line 577.

      We thank the reviewer for this suggestion. The typo has been fixed.

      (4) Line 577: Fig. S2nd?

      We thank the reviewer for this suggestion. This typo has been fixed.

      Reviewer #3 (Public review):

      Summary:

      This manuscript presents a study combining molecular dynamics simulations and Hedgehog (Hh) pathway assays to investigate cholesterol translocation pathways to Smoothened (SMO), a G protein-coupled receptor central to Hedgehog signal transduction. The authors identify and characterize two putative cholesterol access routes to the transmembrane domain (TMD) of SMO and propose a model whereby cholesterol traverses through the TMD to the cysteine-rich domain (CRD), which is presented as the primary site of SMO activation. The MD simulations and biochemical experiments are carefully executed and provide useful data.

      Weaknesses:

      However, the manuscript is significantly weakened by a narrow and selective interpretation of the literature, overstatement of certain conclusions, and a lack of appropriate engagement with alternative models that are well-supported by published data, including data from prior work by several of the coauthors of this manuscript. In its current form, the manuscript gives a biased impression of the field and overemphasizes the role of the CRD in cholesterol-mediated SMO activation. Below, I provide specific points where revisions are needed to ensure a more accurate and comprehensive treatment of the biology.

      (1) Overstatement of the CRD as the Orthosteric Site of SMO Activation

      The manuscript repeatedly implies or states that the CRD is the orthosteric site of SMO activation, without adequate acknowledgment of alternative models. To give just a few examples (of many in this manuscript):

      (a) “PTCH is proposed to modulate the Hh signal by decreasing the ability of membrane cholesterol to access SMO’s extracellular cysteine-rich domain (CRD)” (p. 3).

      (b) “In recent years, there has been a vigorous debate on the orthosteric site of SMO” (p. 3).

      (c) “cholesterol must travel through the SMO TMD to reach the orthosteric site in the CRD” (p. 4).

      (d) “we observe cholesterol moving along TM6 to the TMD-CRD interface (common pathway, Fig. 1d) to access the orthosteric binding site in the CRD” (p. 6).

      While the second quote in this list at least acknowledges a debate, the surrounding text suggests that this debate has been entirely resolved in favor of the CRD model. This is misleading and not reflective of the views of other investigators in the field (see, for example, a recent comprehensive review from Zhang and Beachy, Nature Reviews Molecular Cell Biology 2023, which makes the point that both the CRD and 7TM sites are critical for cholesterol activation of SMO as well as PTCH-mediated regulation of SMO-cholesterol interactions).

      In contrast, a large body of literature supports a dual-site model in which both the CRD and the TMD are bona fide cholesterol-binding sites essential for SMO activation. Examples include:

      (a) Byrne et al., Nature 2016: point mutation of the CRD cholesterol binding site impairs, but does not abolish, SMO activation by cholesterol (SMO D99A, Y134F, and combination mutants - Fig 3 of the 2016 study).

      (b) Myers et al., Dev Cell 2013 and PNAS 2017: CRD deletion mutants retain responsiveness to PTCH regulation and cholesterol mimetics (similar Hh responsiveness of a CRD deletion mutant is also observed in Fig. 4 Byrne et al, Nature 2016).

      (c) Deshpande et al., Nature 2019: mutation of residues in the TMD cholesterol binding site blocks SMO activation entirely, strongly implicating the TMD as a required site, in contrast to the partial effects of mutating or deleting the CRD site.

      Qi et al., Nature 2019, and Deshpande et al., Nature 2019, both reported cholesterol binding at the TMD site based on high-resolution structural data. Oddly, Deshpande et al., Nature 2019, is not cited in the discussion of TMD binding on p. 3, despite being one of the first papers to describe cholesterol in the TMD site and its necessity for activation (the authors only cite it regarding activation of SMO by synthetic small molecules).

      Kinnebrew et al., Sci Adv 2022 report that CRD deletion abolished PTCH regulation, which is seemingly at odds with several studies above (e.g., Byrne et al, Nature 2016; Myers et al, Dev Cell 2013); but this difference may reflect the use of an N-terminal GFP fusion to SMO in the Kinnebrew et al 2022, which could alter SMO activation properties by sterically hindering activation at the TMD site by cholesterol (but not synthetic SMO agonists like SAG); in contrast, the earlier work by Byrne et al is not subject to this caveat because it used an untagged, unmodified form of SMO.

      Although overexpression of PTCH1 and SMO (wild-type or mutant) has been noted as a caveat in studies of CRD-independent SMO activation by cholesterol, this reviewer points out that several of the studies listed above include experiments with endogenous PTCH1 and low-level SMO expression, demonstrating that SMO can clearly undergo activation by cholesterol (as well as regulation by PTCH1) in a manner that does not require the CRD.

      Recommendation: The authors should revise the manuscript to provide a more balanced overview of the field and explicitly acknowledge that the CRD is not the sole activation site. Instead, a dual-site model is more consistent with available structural, mutational, and functional data. In addition, the authors should reframe their interpretation of their MD studies to reflect this broader and more accurate view of how cholesterol binds and activates SMO.

      We thank the reviewer for this comprehensive overview of the existing literature. We agree that cholesterol binding to both the TMD and CRD sites is required for full activation of SMO. As described below in responses to comments, we have made changes to the manuscript to make this point clear. For instance, in the revised manuscript, we refrain from calling the CRD cholesterol binding site the “orthosteric site”. Instead, we highlight that the goal of the manuscript is not to resolve the debate over whether the TMD or CRD site is more important for the regulation of SMO by PTCH1 but rather to use molecular dynamics to understand the fascinating question of how cholesterol in the membrane can reach the CRD, located at a significant distance above the outer leaflet of the membrane. We believe that this is an important goal since there is an abundance of evidence that supports the view that PTCH1 inhibits SMO by reducing cholesterol access to the CRD. This evidence is now summarized succinctly in the introduction:

      Changes made: (Paragraph 4, Introduction)

      “While cholesterol binding to both the TMD and CRD sites is required for full SMO activation, our work focuses on how cholesterol gains access to the CRD site, perched above the outer leaflet of the membrane (Luchetti et al., 2016; Kinnebrew et al., 2022). Multiple lines of evidence suggest that PTCH1-regulated cholesterol binding to the CRD plays an instructive role in SMO regulation both in cells and animals. Mutations in residues predicted to make hydrogen bonds with the hydroxyl group of cholesterol bound to the CRD reduced both the potency and efficacy of SHH in cellular signaling assays (Kinnebrew et al., 2022; Byrne et al., 2016) and, more importantly, eliminated HH signaling in mouse embryos (Xiao et al., 2017). Experiments using both covalent and photocrosslinkable sterol probes in live cells directly show that PTCH1 activity reduces sterol access to the CRD (Kinnebrew et al., 2022; Xiao et al., 2017). Notably, our simulations evaluate a path of cholesterol translocation that includes both the TMD and CRD sites: cholesterol first enters the 7-transmembrane domain bundle from the membrane; it then engages the TMD site before continuing along a conduit to the CRD site. Thus, we analyze translocation energetics and residue-level contacts along a path that includes both the TMD and the CRD.”

      However, Reviewer 3 makes several comments below that are biased, inaccurate, or selective. We feel it is important to address these so readers can approach the literature from a balanced perspective. Indeed, the eLife review forum provides an ideal venue to present contrasting views on a scientific model. We encourage the editors to publish both Reviewer 3’s comments and our response in full so readers can read the original papers and reach their own conclusions. It is important to note these issues are not relevant to the quality of the computational and experimental data presented in this paper.

      We have now removed the term “orthosteric” to describe the CRD site throughout the paper and clearly state in the introduction that “both the CRD and TMD sites are required for SMO activation” but that our focus is on how cholesterol moves from the membrane to the CRD site. There is no doubt that cholesterol binding to the CRD plays a key role in SMO activation; our focus on this path is justified and does not devalue the importance of the TMD site. Our prior models (see Figure 7 of Kinnebrew 2022) explicitly include contributions of both sites.

      Now we respond to some of the concerns outlined, individually:

      (1) Byrne et al., Nature 2016: point mutation of the CRD cholesterol binding site impairs, but does not abolish, SMO activation by cholesterol (SMO D99A, Y134F, and combination mutants - Fig 3 of the 2016 study)

      The fact that a point mutation dramatically diminishes (but does not abolish) signaling does not mean that the CRD cholesterol binding site is not important for SMO regulation. Indeed, the reviewer fails to mention that Song et al. (Molecular Cell, 2017) found that a subtle mutation at D99 (D95/99N, a residue that makes a hydrogen bond with the cholesterol hydroxyl) completely abolishes SMO signaling in mouse embryos. Thus, the CRD site is critical for SMO activation in an intact animal, justifying our focus on evaluating the path of cholesterol translocation to the CRD site.

      (2) Myers et al., Dev Cell 2013 and PNAS 2017: CRD deletion mutants retain responsiveness to PTCH regulation and cholesterol mimetics (similar Hh responsiveness of a CRD deletion mutant is also observed in Fig 4 Byrne et al, Nature 2016).

      The reviewer fails to note that CRD-deleted versions of SMO have markedly (>10-fold) higher basal (i.e. ligand-independent) activity compared to full-length SMO. The response to SHH is minimal (∼2-fold), compared to >50-100-fold with full-length SMO. Thus, CRD-deleted SMO is likely in a non-native conformation. Local changes in cholesterol accessibility caused by PTCH1 inactivation or cholesterol loading can cause small fluctuations in delta-CRD activity, but this cannot be used to infer meaningful insights about how native, full-length SMO (with >10-fold lower basal activity) is regulated. We encourage the reviewer to read our previous paper (Kinnebrew et al. 2022), which presents a unified view of how the TMD and CRD sites together regulate SMO activation.

      A more physiological experiment, reported in Kinnebrew et al. 2022, tested mutations in residues that make hydrogen bonds with cholesterol at the CRD and TMD sites in the context of full-length SMO. These mutants were stably expressed at moderate levels in Smo<sup>−/−</sup> cells. Mutations at the CRD site reduced the fold-increase in signaling output in response to SHH, as would be expected for a PTCH1-regulated site. In contrast, analogous mutations in the TMD site reduced the magnitude of both basal and maximal signaling, without affecting the fold-change in response to SHH. In signaling assays, the key parameter in evaluating the impact of a mutation is whether it impacts the change in output in response to a signal (in this case PTCH1 inactivation by SHH). A mutation in SMO that affects PTCH1 regulation is expected to decrease the fold-change in signaling in response to SHH, a criterion that is fulfilled by mutations in the CRD site. Accordingly, mutations in the CRD site abolish SMO signaling in mouse embryos (Xiao et al., 2017).
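
      To make the fold-change criterion concrete, the short Python sketch below computes fold-induction for three SMO variants. It is only an illustration: every number is invented (arbitrary reporter units) and is not a measurement from Kinnebrew et al. 2022 or any other study, and the names `fold_change` and `variants` are hypothetical. It shows how a variant whose basal and maximal outputs are scaled down together can keep the same fold-change as wild type, whereas a variant that loses only its stimulated output does not.

```python
def fold_change(basal, stimulated):
    """Return stimulated/basal signaling output (fold-induction by SHH)."""
    if basal <= 0:
        raise ValueError("basal output must be positive")
    return stimulated / basal


# Hypothetical reporter outputs in arbitrary units; NOT experimental data.
variants = {
    "wild-type": {"basal": 1.0, "shh": 50.0},
    # Stimulated output drops while basal output is unchanged.
    "CRD-site mutant": {"basal": 1.0, "shh": 5.0},
    # Basal and stimulated outputs are both halved.
    "TMD-site mutant": {"basal": 0.5, "shh": 25.0},
}

fold = {name: fold_change(v["basal"], v["shh"]) for name, v in variants.items()}

for name, value in fold.items():
    print(f"{name}: {value:.1f}-fold induction")
```

      In this toy example the hypothetical TMD-site variant retains the wild-type fold-change despite its lower absolute output, while the hypothetical CRD-site variant shows a tenfold smaller induction.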

      (3) Deshpande et al., Nature 2019: mutation of residues in the TMD cholesterol binding site blocks SMO activation entirely, strongly implicating the TMD as a required site, in contrast to the partial effects of mutating or deleting the CRD site.

      Bulky mutations at the TMD site (V333F) that abolish SMO activity were first reported by Byrne et al. 2016 and were used to markedly increase the stability of SMO for protein expression. These mutations indeed stabilize the inactive state of SMO, increasing protein abundance and completely preventing its localization at primary cilia. SMO variants carrying such bulky mutations cannot be used to infer the importance of the TMD site since they do not distinguish between the following possibilities: (1) SMO is inactive because the sterol cannot bind, or (2) SMO is inactive because it is locked in an inactive conformation, or (3) SMO is inactive because it cannot localize to primary cilia (where it must be localized to activate downstream signaling).

      As described in Response 3.3, a better evaluation of the importance of the TMD site is the use of mutations in residues that make hydrogen bonds with the hydroxyl group of TMD cholesterol. These mutations do not markedly increase protein stability or prevent ciliary localization (Kinnebrew 2022, Fig. S2). While a TMD site mutation decreases the magnitude of maximal (and basal) SMO signaling, it does not impact the fold-increase in signal output in response to Hh ligands (the key parameter that should be used to evaluate PTCH1 activity).

      (4) Qi et al., Nature 2019, and Deshpande et al., Nature 2019, both reported cholesterol binding at the TMD site based on high-resolution structural data. Oddly, Deshpande et al., Nature 2019, is not cited in the discussion of TMD binding on p. 3, despite being one of the first papers to describe cholesterol in the TMD site and its necessity for activation (the authors only cite it regarding activation of SMO by synthetic small molecules)

      The reference has now been added at this location in the manuscript.

      (5) Kinnebrew et al., Sci Adv 2022 report that CRD deletion abolished PTCH regulation, which is seemingly at odds with several studies above (e.g., Byrne et al, Nature 2016; Myers et al, Dev Cell 2013); but this difference may reflect the use of an N-terminal GFP fusion to SMO in the Kinnebrew et al 2022, which could alter SMO activation properties by sterically hindering activation at the TMD site by cholesterol (but not synthetic SMO agonists like SAG); in contrast, the earlier work by Byrne et al is not subject to this caveat because it used an untagged, unmodified form of SMO.

      The reviewer fails to note that CRD-deleted versions of SMO have markedly (>10-fold) higher basal activity than full-length SMO. The response to SHH is minimal (∼2-fold), compared to >50-fold with full-length SMO. Thus, CRD-deleted SMO is likely in a non-native conformation. Local changes in cholesterol accessibility caused by PTCH1 inactivation or cholesterol loading can cause small fluctuations in delta-CRD activity, but this cannot be used to infer meaningful insights about how native, full-length SMO (with >10-fold lower basal activity) is regulated. Please see Response 3.3 for further details.

      Reviewer 3 presents an incomplete picture of the extensive experiments reported in Kinnebrew et al. to establish the functionality of YFP-tagged delta-CRD SMO. Most importantly, a TMD-selective sterol analog (KK174) can fully activate YFP-tagged delta-CRD, showing conclusively that the YFP fusion does not block sterol access to the TMD site. The fact that this protein is nearly unresponsive to SHH highlights the critical role of the CRD-bound cholesterol in SMO regulation by PTCH1. Indeed, the YFP-tagged, CRD-deleted SMO was made purposefully to test the requirement of the CRD in a construct that had normal basal activity. Again, this data justifies the value of investigating the path of cholesterol movement from the membrane via the TMD site to the CRD.

      (6) Although overexpression of PTCH1 and SMO (wild-type or mutant) has been noted as a caveat in studies of CRD-independent SMO activation by cholesterol, this reviewer points out that several of the studies listed above include experiments with endogenous PTCH1 and low-level SMO expression, demonstrating that SMO can clearly undergo activation by cholesterol (as well as regulation by PTCH1) in a manner that does not require the CRD.

      This comment is inaccurate. The data presented in Deshpande et al. (and prior work in Myers et al.) used transient transfection to overexpress SMO in Smo<sup>−/−</sup> cells. At the individual cell level, transient transfection produces expression levels that are markedly higher (10-1000-fold) than stable expression (in addition to being more variable). Most scientists would agree that stable expression (as used in Kinnebrew 2022) at a moderate expression level is a better system to compare mutant phenotypes, assess basal and activated signaling, and provide an accurate measure of the fold-change in signal output in response to SHH. Notably, introduction of a mutation in the CRD cholesterol binding site at the endogenous mouse Smo locus (an even better experiment than stable expression) leads to complete loss of SMO activity (PMID 28344083). This result again justifies our investigation of the pathway of cholesterol movement from the membrane to the CRD site.

      We have changed the initial discussion to reflect a more general outlook.

      Changes made: (Paragraph 1, Introduction)

      “PTCH modulates the availability of accessible cholesterol at the primary cilium and thereby regulates SMO, with models invoking effects on both the CRD and 7TM pockets.”

      Changes made: (Results subsection 3, paragraph 1)

      “According to the dual-site model, to reach the binding site in the CRD (ζ), cholesterol must translocate along the TMD-CRD interface from the TM binding site (α∗).”

      Added lines: (Paragraph 5, Results subsection 3):

      “The computational investigation shown here covers the dual-site model, where cholesterol reaches the CRD site via binding to the TM binding site first. In comparison to the CRD site, the TM site is more stable by ∼2 kcal/mol (Figure 2—Figure Supplement 3b, d).”

      Added lines: (Paragraph 2, Conclusions):

      “Here we have explored the role the CRD site plays in SMO activation. In addition, through simulating the CRD site-dependent SMO activation hypothesis, we have also simulated the TMD site-dependent activation. We show that cholesterol is more stable at the TMD site than at the CRD site by ∼2 kcal/mol.”

      (2) Bias in Presentation of Translocation Pathways

      The manuscript presents the model of cholesterol translocation through SMO to the CRD as the predominant (if not sole) mechanism of activation. Statements such as: "Cholesterol traverses SMO to ultimately reach the CRD binding site" (p. 6) suggest an exclusivity that is not supported by prior literature in the field. Indeed, the authors’ own MD data presented here demonstrate more stable cholesterol binding at the TMD than at the CRD (p 17), and binding of cholesterol to the TMD site is essential for SMO activation. As such, it is appropriate to acknowledge that cholesterol may activate SMO by translocating through the TM5/6 tunnel, then binding to the TMD site, as this is a likely route of SMO activation in addition to the CRD translocation route they highlight in their discussion.

      The authors describe two possible translocation pathways (Pathway 1: TM2/3 entry to TMD; Pathway 2: TM5/6 entry and direct CRD transfer), but do not sufficiently acknowledge that their own empirical data support Pathway 2 as more relevant. Indeed, because their experimental data suggest Pathway 2 is more strongly linked to SMO activation, this pathway should be weighted more heavily in the authors’ discussion. In addition, Pathway 2 is linked to cholesterol binding to both the TMD and CRD sites (the former because the TMD binding site is at the terminus of the hydrophobic tunnel, the latter via the translocation pathway described in the present manuscript), so it is appropriate that Pathway 2 figures more prominently than Pathway 1 in the authors’ discussion.

      The authors also claim that "there is no experimental structure with cholesterol in the inner leaflet region of SMO TMD" (p 16). However, a structural study of apo-SMO from the Manglik and Cheng labs (Zhang et al., Nat Comm, 2022) identified a cholesterol molecule docked at the TM5/6 interface and also proposed a "squeezing" mechanism by which cholesterol could enter the TM5/6 pocket from the membrane. The authors do not consider this SMO conformation in their models, nor do they discuss the possibility that conformational dynamics at the TM5/6 interface could facilitate cholesterol flipping and translocation into the hydrophobic conduit, despite both possibilities having precedent in the 2022 empirical cryoEM structural analysis.

      Recommendation: The authors should avoid oversimplifying the SMO cholesterol activation process, either by tempering these claims or broadening their discussion to better reflect the complexity and multiplicity of cholesterol access and activation routes for SMO. They should also consider the 2022 apo-SMO cryoEM structure in their analysis of the TM5/6 translocation pathway.

      We thank the reviewer for this comprehensive overview of the existing literature and for pointing out the parts that we had omitted from the discussion. We agree with the reviewer, since our data show that both pathways are probable. Throughout our manuscript, we have avoided using a competitive approach (that one pathway dominates over the other). Instead, we have evaluated both pathways independently and presented a comparative rather than competitive overview of both pathways from our observations. While we agree that experimental evidence suggests the inner leaflet pathway is possible, we cannot discount the observations made in previous studies that support the outer leaflet pathway, particularly Hedger et al. (2019), Bansal et al. (2023), and Kinnebrew et al. (2021). Therefore, considering the reviewer’s comments, we have made the following changes:

      (1) Added lines: (Paragraph 3, Conclusions):

      “We show that the barriers associated with the pathway starting from the outer leaflet are lower by ∼0.7 kcal/mol (p = 0.0013). We also provide evidence that cholesterol can enter SMO via both leaflets, considering that multiple computational and experimental studies have found cholesterol entry sites and activation modulation via the outer leaflet, between TM2-TM3. This is countered by evidence from multiple experimental and computational studies corroborating entry via the inner leaflet, between TM5-TM6, including this study. Overall, we posit that cholesterol translocation from either pathway is feasible.”

      (2) Changes made: (Paragraph 6, Results subsection 2)

      “Based on our experimental and computational data, we conclude that cholesterol translocation can happen via either pathway. This is supported on the basis of the following observations: mutations along pathway 2 affect SMO activity more significantly, and the presence of a direct conduit that connects the inner leaflet to the TMD binding site. In addition, a resolved structure of SMO in the presence of cholesterol shows a cholesterol situated at the entry point from the membrane into the protein between TM5 and TM6, in the inner leaflet. However, we also observe that pathway 1 shows a lower thermodynamic barrier (5.8 ± 0.7 kcal/mol vs. 6.5 ± 0.8 kcal/mol, p = 0.0013). Additionally, PTCH1 controls cholesterol accessibility in the outer leaflet. This shows that there is a possibility for transport from both leaflets. One possibility that might alter the thermodynamic barriers is native membrane asymmetry, particularly the anionic lipid-rich inner leaflet. This presents as a limitation of our current model.”

      (3) Changes made: (Paragraph 1, Results subsection 2)

      “In a structure resolved in 2022, cholesterol was observed at the interface between the protein and the membrane, in the inner leaflet, between TMs 5 and 6. However, cholesterol in the inner leaflet has a downward orientation, with the polar hydroxyl group pointing intracellularly (η). A striking observation is that this cholesterol binding site pose was never used as a starting point for simulations and was discovered independent of the pose described in Zhang et al. (2022) (Figure 4—Figure Supplement 1).”

      (3) Alternative Possibility: Direct Membrane Access to CRD

      The possibility that the CRD extracts cholesterol directly from the membrane outer leaflet is not considered. While the crystal structures place the CRD in a stable pose above the membrane, multiple cryo-EM studies suggest that the CRD is dynamic and adopts a variety of conformations, raising the possibility that the stability of the CRD in the crystal structures is a result of crystal packing and that the CRD may be far more dynamic under more physiological conditions.

      Recommendation: The authors should explicitly acknowledge and evaluate this potential mechanism and, if feasible, assess its plausibility through MD simulations.

      We thank the reviewer for the suggestion. We have addressed this comment by calculating the distance from the lipid headgroups for each lipid in the membrane to the cholesterol binding site. We show that in our study, we do not observe any bending of the CRD over the membrane, precluding any cholesterol from being extracted from the membrane directly.

      Added lines: (Paragraph 3, Conclusions):

      “An alternative possibility states that the flexibility associated with the CRD would allow it to directly access the membrane, and consequently, cholesterol. In the extensive simulations reported in this study, the binding site of cholesterol in the CRD remains at least 20 Å away from the nearest lipid head group in the membrane, suggesting that such direct extraction and the bending of the CRD do not occur within the timescales sampled (Appendix 2 – Figure 6).

      The mechanistic details of this process are still unexplored and form the basis of future work.”
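
      The distance check described in this response can be sketched as follows. This is a minimal illustration in plain Python, not the analysis code used for the manuscript: the coordinates are toy values, the frame layout and the function names `min_distance` and `min_distance_over_trajectory` are assumptions, and only the 20 Å figure comes from the text above. For each frame, the sketch takes the smallest distance between the CRD binding-site position and any lipid headgroup, then reports the closest approach over all frames.

```python
import math


def min_distance(site, headgroups):
    """Smallest Euclidean distance (in Å) from one point to a set of points."""
    return min(math.dist(site, head) for head in headgroups)


def min_distance_over_trajectory(frames):
    """Closest approach over all frames.

    frames: iterable of (site_xyz, [headgroup_xyz, ...]) tuples, one per frame.
    """
    return min(min_distance(site, heads) for site, heads in frames)


# Toy coordinates in Å (hypothetical, not taken from the simulations).
frames = [
    ((0.0, 0.0, 45.0), [(5.0, 0.0, 20.0), (-8.0, 3.0, 21.0)]),
    ((1.0, 0.5, 43.0), [(4.0, 1.0, 20.5), (-7.0, 2.0, 19.0)]),
]

closest = min_distance_over_trajectory(frames)
threshold = 20.0  # Å, the separation quoted in the added manuscript text

print(f"closest approach: {closest:.1f} Å")
if closest >= threshold:
    print("binding site never comes within the threshold of a headgroup")
else:
    print("binding site approaches the membrane surface")
```

      In a real trajectory, the site position would typically be the centroid of the binding-pocket residues and the headgroup positions would come from a chosen headgroup atom per lipid; both choices are assumptions here.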

      (4) Inconsistent Framing of Study Scope and Limitations

      The discussion contains some contradictory and misleading language. For example, the authors state that "In this study we only focused on the cholesterol movement from the membrane to the CRD binding site," and then several sentences later state that "We outline the entire translocation mechanism from a kinetic and thermodynamic perspective." These statements are at odds. The former appropriately (albeit briefly) notes the limited scope of the modeling, while the latter overstates the generality of the findings.

      In addition, the authors’ narrow focus on the CRD site constitutes a major caveat to the entire work. It should be acknowledged much earlier in the manuscript, preferably in the introduction, rather than mentioned as an aside in the penultimate paragraph of the conclusion.

      Recommendation: The authors should clarify the scope of the study and expand the discussion of its limitations. They should explicitly acknowledge that the study models one of several cholesterol access routes and that the findings do not rule out alternative pathways.

      We thank the reviewer for the suggestion. We have addressed this comment by explicitly mentioning the scope of the study.

      Changes made: (Paragraph 3, Conclusions)

      “We outline the entire translocation mechanism from a kinetic and thermodynamic perspective for one of the leading hypotheses for the activation mechanism of SMO.”

      (5) Summary:

      This study has the potential to make a useful contribution to our understanding of cholesterol translocation and SMO activation. However, in its current form, the manuscript presents an overly narrow and, at times, misleading view of the literature and biological models; as such, it is not nearly as impactful as it could be. I strongly encourage the authors to revise the manuscript to include:

      (1) A more balanced discussion of the CRD vs. TMD binding sites.

      (2) Acknowledgment of alternative cholesterol access pathways.

      (3) More comprehensive citation of prior structural and functional studies.

      (4) Clarification of assumptions and scope.

      Of note, the above suggestions require little to no additional MD simulations or experimental studies, but would significantly enhance the rigor and impact of the work.

      We thank the reviewer for the suggestions. We have taken into account the literature and diverse viewpoints. We have changed the initial discussion to reflect a more general outlook. In the revised version of the manuscript, we have refrained from referring to the CRD site as the orthosteric site. Instead, we refer to it as the CRD sterol-binding site. To better represent the dual-site model, we add further discussion in the Introduction. Throughout our manuscript, we have avoided using a competitive approach (that one pathway dominates over the other). Instead, we have evaluated both pathways independently and presented a comparative rather than competitive overview of both pathways from our observations. We explicitly mention the scope of the study.

    1. Author response:

      We thank the reviewers for their careful reading and constructive feedback. We were glad to see that they recognized both the technical scope of the study and its contribution as the first to apply activation maximization with such fine spatial sampling. Their appreciation for the critical in vivo validation of model-derived stimuli is very encouraging.

      The reviewers raised several important points that we plan to address in the revised manuscript. These center on:

      Model Architecture and Potential Circularity:

      Both reviewers raised the concern that using a CNN-based model could introduce circularity when comparing V4 functional groups to artificial vision systems, and questioned whether similar results would emerge with alternative architectures. We believe that the in vivo verification provides a critical control for this concern: the MEIs synthesized by our model were empirically validated to elicit significantly higher responses than matched natural image controls, demonstrating that the model captures genuine biological tuning properties rather than architectural artifacts. This means that even if these features emerged from the particular architectural choice, the biological neurons seem to prefer the same features. We will clarify this point in the respective section in the revised manuscript.

      Recording locations and spike sorting contamination:

      Reviewer #2 raised concerns about potential correlation artefacts along the silicon probe. Unfortunately, assessing functional correlations across sessions proved challenging because neurons recorded at different penetration sites had non-overlapping receptive fields, precluding direct comparison of responses to identical stimuli across recording sites. We will make this limitation explicit in the manuscript. Furthermore, we maintain conservative standards for spike sorting to minimize the risk of multi-unit activity (MUA) "smearing" across unit definitions. Our primary analyses are restricted to well-isolated single units that meet all isolation metrics. Due to our low-impedance ground placed on the bone, shared-reference contamination as a source of tuning similarity is also mitigated.

      Quantitative Comparisons to Prior Literature:

      Reviewer #2 also noted that our comparisons between MEIs and known V4 tuning properties (e.g., shape, curvature, texture selectivity) were presented qualitatively, and suggested that explicit image analyses or metrics would strengthen these links to prior literature. We will revise the text to more carefully frame these comparisons as qualitative observations consistent with prior findings.

      Alternative Similarity Metrics:

      We will expand our justification for the Böhm et al. contrastive embedding approach in the Methods section. However, we believe that a systematic comparison of multiple clustering and similarity methods is beyond the scope of the current study.

      In the revised manuscript, we will address these points primarily through clarifications and expanded discussion. Specifically, we will: (1) strengthen our discussion of model architecture choice emphasizing that in vivo verification serves as a critical control against architectural artifacts; (2) clarify the stringent matching criteria underlying our closed-loop sample size and its consistency with the larger population analyses; (3) explicitly describe the recording geometry, including the use of multiple grid holes, and explain why direct functional comparisons across penetrations were precluded by non-overlapping receptive fields; (4) better characterize the spatial relationship between receptive fields and MEI masks; (5) reframe comparisons to prior V4 literature as qualitative observations rather than quantitative validations; and (6) expand our justification for the contrastive embedding approach. We believe these revisions will improve the clarity and rigor of the manuscript while appropriately scoping the claims to what the current data support.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The author presents a new method for microRNA target prediction based on (1) a publicly available pretrained Sentence-BERT language model that the author fine-tunes using MeSH information and (2) downstream classification analysis for microRNA target prediction. In particular, the author's approach, named "miRTarDS", attempts to solve the microRNA target prediction problem by utilizing disease information (i.e., semantic similarity scores) from their language model. The author then compares the prediction performance with other sequence- and disease-based methods and attempts to show that miRTarDS is superior or at least comparable to existing methods. The author's general approach to this microRNA target prediction problem seems promising, but fails to demonstrate concrete computational evidence that miRTarDS outperforms other existing methods. The author's claim that disease information-based language models are sufficient is unfounded. The manuscript requires substantial rewriting and reorganization for readers with a strong background in biomedical research.

      We appreciate the reviewer’s careful examination of modeling, benchmarking, and interpretation, and we are particularly encouraged that they found the proposed method promising. We will make corresponding revisions to the manuscript based on the reviewer’s comments.

      A major issue related to the author's claim of computational advance of miRTarDS: The author does not introduce existing biomedical-specific language models, and does not compare them against miRTarDS's fine-tuned model. The performance of miRTarDS is largely dependent on the semantic embedding of disease terms. The author shows in Figure 5 that MeSH-based fine-tuning leads to a substantial improvement in MeSH-based correlation compared to the publicly available pretrained SBERT model "multi-qa-MiniLM-L6-cos-v1" without sacrificing a large amount of BIOSSES-based correlation. However, the author does not compare the performance of MeSH- and BIOSSES-based correlation with existing language models such as ChatGPT, BioBERT, PubMedBERT, and more. Also, the substantial improvement in MeSH-based correlation is a mere indication that the MeSH-based fine-tuning strategy was reasonable and not that it's superior to the publicly available pretrained SBERT model "multi-qa-MiniLM-L6-cos-v1".

      We thank the reviewer for the constructive suggestions regarding the benchmarking of language models. We acknowledge that the performance of miRTarDS largely depends on the semantic embeddings of disease terms. Therefore, in the revision, we will: 1) conduct a literature review to introduce existing biomedical-specific language models, and 2) perform a horizontal comparison between our fine-tuned model and these existing models, to more comprehensively evaluate the model’s capabilities.

      Another major issue is in the author's claim that disease-information from miRTarDS's language model is "sufficient" for accurate microRNA target prediction. Available microRNA targets with experimental evidence are largely biased for those with disease implications that have been reported in the biomedical literature. It's possible that their language model is biased by existing literature that has also been used to build microRNA target databases. Therefore, it is important that the author provides strong evidence that excludes the possibility of data leakage circularity. Similar concerns are prevalent across the manuscript, and so I highly recommend that the author reassess the evaluation frameworks and account for inflated performance, biased conclusions, and self-confirming results.

      We thank the reviewer for the comment. We recognize that existing experimentally validated microRNA targets may be biased toward those reported in biomedical literature as disease-related. To mitigate this bias, we attempted to use the K-Nearest Neighbors (KNN) method to extract predicted microRNA targets that have a very similar number of miRNA-disease and gene-disease entries to the experimentally validated microRNA targets. We then applied Positive-Unlabeled (PU) Learning to classify the two groups. PU Learning is designed for scenarios where only a subset of the training data is explicitly labeled as positive while the remaining data are unlabeled (the unlabeled set contains both potential positives and true negatives), which makes it highly suitable for the application context of this manuscript [1]. Preliminary results show that after applying the new data extraction and classification approach, model performance drops to around F1=0.73 (the MISIM method also shows a decline, with F1 around 0.58; detailed code is available on GitHub). The specific reasons for this require further investigation.
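      The two steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in the authors' repository: the response does not say which distance metric or which PU estimator was used, so the sketch assumes Euclidean distance on the pair of entry counts and the Elkan-Noto correction for PU scores. The function names, pair labels, and counts are hypothetical.

```python
from math import dist


def match_unlabeled(positives, candidates, k=1):
    """For each validated pair, take the k nearest unused predicted pairs.

    Both arguments map a pair label to a tuple
    (number of miRNA-disease entries, number of gene-disease entries).
    Euclidean distance on these counts is an assumption of this sketch.
    """
    remaining = dict(candidates)
    matched = []
    for counts in positives.values():
        nearest = sorted(remaining, key=lambda name: dist(remaining[name], counts))[:k]
        for name in nearest:
            matched.append(name)
            del remaining[name]  # sample without replacement
    return matched


def elkan_noto_adjust(scores_unlabeled, scores_heldout_positive):
    """Elkan-Noto correction (assumed estimator, not confirmed by the source).

    A classifier trained on labeled-vs-unlabeled data outputs g(x).
    With c estimated as the mean of g(x) over held-out labeled positives,
    the probability of being a true positive is approximately g(x) / c.
    """
    c = sum(scores_heldout_positive) / len(scores_heldout_positive)
    return [min(score / c, 1.0) for score in scores_unlabeled]


# Hypothetical toy data
validated = {"miR-a|GENE1": (10, 20), "miR-b|GENE2": (3, 5)}
predicted = {"miR-c|GENE3": (11, 19), "miR-d|GENE4": (50, 60), "miR-e|GENE5": (3, 6)}
matched = match_unlabeled(validated, predicted)
adjusted = elkan_noto_adjust([0.2, 0.4, 0.9], [0.8, 0.8])
```

      Matching without replacement keeps the unlabeled set the same size as the validated set when k=1. Whether that balance is what the authors intended is not stated in the response.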

      Last but not least, the manuscript requires a deeper and careful description and computational encoding of microRNA biology. I'd advise the author to include an expert in microRNA biology to improve the quality of this manuscript. For example, the author uses the pre-miRNA notation and replaces the mature miRNA notation to maintain computational encoding consistency across databases. However, the mature microRNA notation (the "-3p" or "-5p" suffix) is critical, as the 3p and 5p mature microRNAs have different seed sequences and thus different mRNA targets. The 3p mature microRNA would most likely not target an mRNA targeted by the 5p mature microRNA.

      We thank the reviewer for the critique and suggestion. We fully agree with the reviewer that the distinction between the 3p and 5p mature strands is critical for determining mRNA targeting, as they possess distinct seed sequences. In our study, we relied on the miRNA–disease associations provided by the HMDD database, which annotates interactions at the pre-miRNA level: “… the enriched functions of each mature miRNA are aggregated to the corresponding miRNA precursor.” [2] Furthermore, existing literature suggests that the pre-miRNA level can be appropriate and informative for disease association analyses: “Compared with the mature miRNA method, the pre-miRNA method is more useful for studying disease association.” [3] We also find that, in some cases, both strands cooperate to regulate the same or complementary pathways [4]. We acknowledge the reviewer’s point as an important consideration for future revision. We plan to consult or collaborate with biologists to enhance the quality of the manuscript in biology.

      Reviewer #2 (Public review):

      This study introduces a novel knowledge-driven approach, miRTarDS, which enables microRNA-Target Interaction (MTI) prediction by leveraging the disease association degree between a miRNA and its target gene. The core hypothesis is that this single feature is sufficient to distinguish experimentally validated functional MTIs from computationally predicted MTIs in a binary classification setting. To quantify the disease association, the authors fine-tuned a Sentence-BERT (SBERT) model to generate embeddings of disease descriptions and compute their semantic similarity. Using only this disease association feature, miRTarDS achieved an F1 score of 0.88 on the test set.

      We thank the reviewers for their positive feedback, especially for their recognition of the novelty of this manuscript.

      Strengths:

      The primary strength is the innovative use of the disease association degree as an independent feature for MTI classification. In addition, this study successfully adapts and fine-tunes the Sentence-BERT (SBERT) model to quantify the semantic similarity between biomedical texts (disease descriptions). This approach establishes a critical pathway for integrating powerful language models and the vast growth in clinical/disease data into biochemical discovery, like MTI prediction.

      We would like to thank the reviewer again for their positive feedback. We appreciate their recognition of the novelty of our work, as well as their acknowledgment that the proposed method paves the way for integrating language models with clinical/disease data into biochemical discovery.

      Weaknesses:

      The main weakness lies in its definition of the ground-truth dataset, which serves as a foundation for methodological evaluation. The study defines the Negative Set as computationally predicted MTIs that lack experimental evidence. However, the absence of experimental validation does not equate to non-functionality. Similarly, the miRAW sets are classified by whether the target and miRNA could form a stable duplex structure according to RNA structure prediction. This definition is biologically irrelevant, as duplex stability does not fully encapsulate the complex in vivo binding of miRNAs within the AGO protein complex.

      We thank the reviewers for their constructive feedback. We have realized that treating predicted MTI as a negative class may pose some issues. Therefore, we have decided to adopt Positive Unlabeled (PU) Learning in subsequent updates. This classification method can be applied to datasets such as ours, which contain only positive classes and lack negative ones [1]. We used the miRAW dataset to enable a horizontal comparison of our method with traditional sequence-based prediction approaches. We acknowledge that miRAW may overlook some biological insights, and we plan to optimize the construction of test datasets in the future. Some preliminary explorations have already been conducted, and the relevant code is available on GitHub.

      Furthermore, we will make the following revisions: 1) clearly specify the version of miRBase and incorporate more miRNA-related databases; 2) conduct a further literature review on miRNA biological mechanisms to strengthen the biological content of the manuscript; 3) perform a more comprehensive evaluation of the model’s performance; and 4) attempt to identify some representative MTIs that have been overlooked by existing prediction tools but can be predicted by our proposed method.

      References

      (1) Li, F., Dong, S., Leier, A., Han, M., Guo, X., Xu, J., ... & Song, J. (2022). Positive-unlabeled learning in bioinformatics and computational biology: a brief review. Briefings in Bioinformatics, 23(1), bbab461.

      (2) Huang, Z., Shi, J., Gao, Y., Cui, C., Zhang, S., Li, J., ... & Cui, Q. (2019). HMDD v3.0: a database for experimentally supported human microRNA–disease associations. Nucleic Acids Research, 47(D1), D1013-D1017.

      (3) Wang, H., & Ho, C. (2023). The human pre-miRNA distance distribution for exploring disease association. International Journal of Molecular Sciences, 24(2), 1009.

      (4) Mitra, R., Adams, C. M., Jiang, W., Greenawalt, E., & Eischen, C. M. (2020). Pan-cancer analysis reveals cooperativity of both strands of microRNA that regulate tumorigenesis and patient survival. Nature Communications, 11(1), 968.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Ursu, Centeno, and Leblois record from the cerebellum of zebra finches and analyze neurons for auditory and song-related activity. The paper covers a lot of ground, ranging from lesions of the deep nuclei to song and white noise playback inside and outside of singing, and some level of survey of response types across cerebellar lobules, to provide foundational information on cerebellar relationships with song. There are a number of interesting observations in the study, to me most notably, the lack of responsivity of song-related activity in lobule IV to distorted auditory feedback. This observation is interesting in light of the perennial idea that the cerebellum may participate in rapid error corrections in other somatic control domains. If such a role were relevant for song, it stands to reason that some alteration of activity could be found there. Of course, on the other hand, zebra finches do not show rapid corrections during DAF, so perhaps the null result does not resolve much. Nevertheless, these data are important steps forward in establishing the involvement or lack of involvement in a broader set of brain structures beyond the song control system typically studied. While the study presents some interesting and important inroads, in my opinion, there was a general lack of 'polish' to the study that led to ambiguity in the report and confusing displays. This detracted from rigorous reporting of the findings.

      We thank reviewer #1 for their comments. We will clarify the possibly misleading or ambiguous claims and interpretations in the present manuscript and polish the presentation of the results. We will also modify the discussion to better place our results within the current knowledge on the cerebellum and songbirds, and in particular address the link between our findings and the low sensitivity to auditory feedback in zebra finches.

      Reviewer #2 (Public review):

      In this paper, the authors investigate the role of the cerebellum in song production in the zebra finch. First, they replicate prior studies to show that lesions of the lateral deep cerebellar nuclei (latDCN, primarily lobules IV-VII and IX) result in shorter duration syllables and song motifs than sham controls. The authors then record neural activity from the cerebellum during both passive auditory exposure in anesthetized birds and in freely singing animals. The authors claim that across multiple lobules, the cerebellum receives "non-selective" auditory inputs locked to syllable boundaries (based on acute recordings) and that cerebellar neurons display song-locked responses that are unaffected by auditory feedback perturbations (in chronic recordings). Moreover, the authors emphasized the distinct properties of lobule IV, which they argue is tightly locked to the onset and offset of syllables, and conclude that the cerebellum might contribute to the duration of song elements.

      This paper presents novel and useful descriptions of song-related neural activity in the cerebellum. However, there are multiple serious issues. First, there are major issues with the design and presentation of the analysis of the electrophysiological data; based on these, it is unclear whether the authors are justified in some of their conclusions about neural tuning or are entitled to any of their claims about the specific tuning or function of neurons in particular lobules. Second, because the authors' conceptual framework seems to ignore possible non-auditory inputs to the cerebellum, their results on (minimal) effects of auditory manipulation during singing are over-interpreted with respect to providing evidence of a forward model. Third, the paper's central assertion - that the songbird cerebellum may contribute to the duration of vocal events during song - was firmly established by a prior lesion study (Radic et al., 2024). Although the authors do cite this prior study with respect to longer-term postlesion changes after cerebellar lesions, this paper also showed a large change in syllable duration immediately after cerebellar lesion (Figure 5 in Radic et al). The electrophysiological results in the present paper could provide valuable insights into the neural mechanisms underlying this already-described role of the songbird cerebellum; however, given the other concerns above, it is not clear that the authors have done so.

      We thank reviewer #2 for these comments. We will improve the presentation of the results, in particular our cell-type classification of the electrophysiology recordings based on the latest literature and the statistics of the tuning differences between lobules. We will also modify the discussion regarding singing-related internal models and consider non-auditory feedback. Finally, we will clarify the position of our work within the existing songbird literature and state its specific contributions. We fully agree that prior studies have already shown the behavioural effects of lesions, as already clearly mentioned in the introduction and discussion; our aim was to partially reproduce these results before turning to the neural mechanisms. We will clarify this point in our revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Thank you so much for your comprehensive and insightful assessment of our manuscript. We appreciate your recognition of the novelty of our experimental design and the utility of our computational framework for interpreting visual remapping across the lifespan and in clinical populations. We are very grateful for your suggestions regarding the narrative flow, which have helped us to improve the manuscript's focus and coherence. Our responses to your specific concerns are detailed below.

      (1) Relevance of the figure-copy results (pp. 13-15). Is it necessary to include the figure-copy task results within the main text? The manuscript already presents a clear and coherent narrative without this section. The figure-copy task represents a substantial shift from the LOCUS paradigm to an entirely different task that does not measure the same construct. Moreover, the ROCF findings are not fully consistent with the LOCUS results, which introduces confusion and weakens the manuscript's coherence. While I understand the authors' intention to assess the ecological validity of their model, this section does not effectively strengthen the manuscript and may be better removed or placed in the Supplementary Materials.

      We thank the reviewer for their perspective regarding the narrative flow and the transition between the LOCUS paradigm and the ROCF results. However, we remain keen to retain these findings in the main text, as they provide critical ecological and clinical validation for the computational mechanisms identified in our study.

      We think these results strengthen the manuscript for the following main reasons:

      (1) The ROCF we used is a standard neuropsychological tool for identifying constructional apraxia. Our results bridge the gap between basic cognitive neuroscience and clinical application by demonstrating that specific remapping parameters—rather than general memory precision—predict real-world deficits in patients.

      (2) The finding that our winning model explains approximately 62% of the variance in ROCF copy scores across all diagnostic groups further indicates that these parameters from the LOCUS task represent core computational phenotypes that underpin complex, real-life visuospatial construction (copying drawings).

      (3) Previous research has often observed only a weak or indirect link between drawing ability and traditional working memory measures, such as digit span (Senese et al., 2020). This was previously attributed to “deictic” strategies—like frequent eye and hand movements—that minimise the need to hold large amounts of information in memory (Ballard et al., 1995; Cohen, 2005; Draschkow et al., 2021). While our study was not exclusively designed to catalogue all cognitive contributions to drawing, the findings provide significant and novel evidence indicating that transsaccadic integration is a critical driver of constructional (copying drawing) ability. By demonstrating this link, the results provide evidence to stimulate a new direction for future research, shifting the focus from general memory capacity toward the precision of spatial updating across eye movements.

      In summary, by including the ROCF results in the main text, we provide evidence for a functional role for spatial remapping that extends beyond perceptual stability into the domain of complex visuomotor control. We have expanded on these points throughout the revised manuscript:

      In the Introduction: p.2:

      “The clinical relevance of these spatial mechanisms is underscored by significant disruptions to visuospatial processing and constructional apraxia—a deficit in copying and drawing figures—observed in neurodegenerative conditions such as Alzheimer's disease (AD) and Parkinson's disease (PD).[20,21] This raises a crucial question: do clinical impairments in complex visuomotor tasks stem from specific failures in transsaccadic remapping? If so, the computational parameters that define normal spatial updating should also provide a mechanistic account of these clinical deficits, differentiating them from general age-related decline.”

      p.3: "Finally, by linking these mechanistic parameters to a standard clinical measure of constructional ability (the Rey-Osterrieth Complex Figure task), we demonstrate that transsaccadic updating represents a core computational phenotype underpinning real-world visuospatial construction in both health and neurodegeneration."

      In the Results:

      “To assess whether the mechanistic parameters derived from the LOCUS task represent core phenotypes of real-world visuospatial abilities, we also instructed all participants to complete the Rey-Osterrieth Complex Figure copy task (ROCF; Figure 7A) on an Android tablet using a digital pen (see examples in Figure 7B; all Copy data are available in the open dataset: https://osf.io/95ecp/). The ROCF is a gold-standard neuropsychological tool for identifying constructional apraxia.[29] Historically, drawing performance has shown only weak or indirect correlations with traditional working memory measures.[30] This disconnect has been attributed to active visual-sampling strategies—frequent eye movements that treat the environment as an external memory buffer, minimising the necessity of holding large volumes of information in internal working memory.[3–5]

      We hypothesised that drawing accuracy is primarily constrained by the precision of spatial updating across frequent saccades rather than raw memory capacity. To evaluate the ecological validity of the identified saccade-updating mechanism, we modelled individual ROCF copy scores across all four groups using the estimated (maximum a posteriori) parameters from the winning “Dual (Saccade) + Interference” model (Model 7; Figure 8) as regressors in a Bayesian linear model. Prior to inclusion, each regressor was normalised by dividing by the square root of its variance.

      This model successfully explained 61.99% of the variance in ROCF copy scores, indicating that these computational parameters are strong predictors of real-world constructional ability (Figure 8A). … This highlights the critical role of accurate remapping based on saccadic information; even if the core saccadic update mechanism is preserved across groups (as shown in previous analyses), the precision of this updating process is crucial for complex visuospatial tasks. Moreover, worse ROCF copy performance is associated particularly with higher initial angular encoding error. This indicates that imprecision in the initial registration of angular spatial information contributes to difficulties in accurately reproducing complex visual stimuli.”
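      As an illustration of the two quantities named in the passage above, the sketch below shows regressor scaling by the square root of the variance and the proportion of variance explained. It is a simplified stand-in, not the authors' Bayesian linear model. Whether the population or the sample variance was used is not stated, so the population variance is assumed here, and the function names are hypothetical.

```python
from statistics import mean, pvariance


def scale_by_sd(values):
    """Divide a regressor by the square root of its variance (no centring).

    Population variance is an assumption; the source does not specify it.
    """
    sd = pvariance(values) ** 0.5
    return [v / sd for v in values]


def r_squared(observed, predicted):
    """Proportion of variance in `observed` accounted for by `predicted`."""
    grand_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - grand_mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical parameter estimates for four participants
scaled = scale_by_sd([1.0, 2.0, 3.0, 4.0])
```

      After scaling, every regressor has unit variance, so the fitted coefficients can be compared on a common scale.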

      In the Discussion:

      “Importantly, our computational framework establishes a direct mechanistic link between transsaccadic updating and real-world constructional ability. Specifically, higher saccade and angular encoding errors contribute to poorer ROCF copy scores. By mapping these mechanistic estimates onto clinical scores, we found that the parameters derived from our winning model explain approximately 62% of the variance in constructional performance across groups. These findings suggest that the computational parameters identified in the LOCUS task represent core phenotypes of visuospatial ability, providing a mechanistic bridge between basic cognitive theory and clinical presentation.

      This relationship provides novel insights into the cognitive processes underlying drawing, specifically highlighting the role of transsaccadic working memory. Previous research has primarily focused on the roles of fine motor control and eye-hand coordination in this skill.[4,50–55] This is partly because of a consistent failure to find a strong relation between traditional memory measures and copying ability.[4,31] For instance, common measures of working memory, such as digit span and Corsi block tasks, do not directly predict ROCF copying performance.[31,56] Furthermore, in patients with constructional apraxia, these memory performance measures often remain relatively preserved despite significant drawing impairments.[56–58] In the literature, this lack of association has often been attributed to “deictic” visual-sampling strategies, characterised by frequent eye movements that treat the environment as an external memory buffer, thereby minimising the need to maintain a detailed internal representation.[4,59] In a real-world copying task, the ROCF requires a high volume of saccades, making it uniquely sensitive to the precision of the dynamic remapping signals identified here. Recent eye-tracking evidence confirms that patients with AD exhibit significantly more saccades and longer fixations during figure copying compared to controls, potentially as a compensatory response to transsaccadic working memory constraints.[56] This high-frequency sampling (averaging between 150 and 260 saccades for AD patients compared to approximately 100 for healthy controls) renders the task highly dependent on the precision of dynamic remapping signals.[56] To ensure this relationship was not driven by a general "g-factor" or non-spatial memory impairment, we further investigated the role of broader cognitive performance using the ACE-III Memory subscale. We found that the relationship between transsaccadic working memory and ROCF performance remains highly significant, even after controlling for age, education, and ACE-III Memory subscore. This suggests that transsaccadic updating may represent a discrete computational phenotype required for visuomotor control, rather than a non-specific proxy for global cognitive decline.

      In other words, even when visual information is readily available in the world, the act of copying depends critically on working memory across saccades. This reveals a fundamental computational trade-off: while active sampling strategies (characterised by frequent eye-hand movements) effectively reduce the load on capacity-limited working memory, they simultaneously increase the demand for precise spatial updating across eye movements. By treating the external world as an "outside" memory buffer, the brain minimises the volume of information it must hold internally, but it becomes entirely dependent on the reliability with which that information is remapped after each eye movement. This perspective aligns with, rather than contradicts, the traditional view of active sampling, which posits that individuals adapt their gaze and memory strategies based on specific task demands.[3,60] Furthermore, this perspective provides a mechanistic framework for understanding constructional apraxia; in these clinical populations, the impairment may not lie in a reduced memory "span," but rather in the cumulative noise introduced by the constant spatial remapping required during the copying process.[58,61]

      Beyond constructional ability, these findings suggest that the primary evolutionary utility of high-resolution spatial remapping lies in the service of action rather than perception. While spatial remapping is often invoked to explain perceptual stability,[11–13,15] the necessity of high-resolution transsaccadic memory for basic visual perception is debated.[13,62–64] A prevailing view suggests that detailed internal models are unnecessary for perception, given the continuous availability of visual information in the external world.[13,44] Our findings support an alternative perspective, aligning with the proposal that high-resolution transsaccadic memory primarily serves action rather than perception.[13] This is consistent with the need for precise localisation in eye-hand coordination tasks such as pointing or grasping.[65] Even when unaware of intrasaccadic target displacements, individuals rapidly adjust their reaching movements, suggesting direct access of the motor system to remapping signals.[66] Further support comes from evidence that pointing to remembered locations is biased by changes in eye position,[67] and that remapping neurons reside within the dorsal “action” visual pathway, rather than the ventral “perception” visual pathway.[13,68,69] By demonstrating a strong link between transsaccadic working memory and drawing (a complex fine motor skill), our findings suggest that precise visual working memory across eye movements plays an important role in complex fine motor control.”

      (2) Model fitting across age groups (p. 9).

      It is unclear whether it is appropriate to fit healthy young and healthy elderly participants' data to the same model simultaneously. If the goal of the model fitting is to account for behavioral performance across all conditions, combining these groups may be problematic, as the groups differ significantly in overall performance despite showing similar remapping costs. This suggests that model performance might differ meaningfully between age groups. For example, in Figure 4A, participants 22-42 (presumably the elderly group) show the best fit for the Dual (Saccade) model, implying that the Interference component may contribute less to explaining elderly performance.

      Furthermore, although the most complex model emerges as the best-fitting model, the manuscript should explain how model complexity is penalized or balanced in the model comparison procedure. Additionally, are Fixation Decay and Saccade Update necessarily alternative mechanisms? Could both contribute simultaneously to spatial memory representation? A model that includes both mechanisms-e.g., Dual (Fixation) + Dual (Saccade) + Interference-could be tested to determine whether it outperforms Model 7 to rule out the sole contribution of complexity.

      We thank you for the opportunity to expand upon and clarify our modelling approach. Our decision to use a common generative model for both young and older adults was grounded in the empirical finding that there was no significant interaction between age group and saccade condition for either location or colour memory. While older adults demonstrated lower baseline precision, the specific "saccade cost" remained remarkably consistent across cohorts. On this basis, we used a common model to assess quantitative differences in parameter estimates while maintaining a consistent mechanistic framework for comparison.

      Moreover, our winning model nests simpler models as special cases, providing the flexibility to naturally accommodate groups where certain components—such as interference—might play a reduced role. This ultimately confirms that the mechanisms for age-related memory deficits in this task reflect more general decline rather than a qualitative failure of the saccadic remapping process.

      This approach is further supported by the properties of the Bayesian model selection (BMS) procedure we used, which inherently penalises the inclusion of unnecessary parameters. Unlike maximum likelihood methods, BMS compares marginal likelihoods, representing the evidence for a model integrated over its entire parameter space. This follows the principle of Bayesian Occam’s Razor, where a model is only favoured if the improvement in fit justifies the additional parameter space; redundant parameters instead "dilute" the probability mass and lower the model evidence.

      Consequently, we contend that a hybrid model combining fixation and saccade mechanisms is unnecessary, as we have already adjudicated between alternative mechanisms of equal complexity. Specifically, Model 6 (Dual Fixation + Interference) and Model 7 (Dual Saccade + Interference) possess an identical number of parameters. The fact that Model 7 emerged as the clear winner—providing substantial evidence against Model 6 with a Bayes Factor of 6.11—demonstrates that our model selection is driven by the specific mechanistic account of the data rather than a simple preference for complexity.
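      For readers less familiar with the quantities invoked above, they can be written out as follows. The notation is an assumption of this sketch rather than the manuscript's own: $y$ denotes the trial-by-trial responses, $\theta$ the parameters of model $m$, $m_6$ and $m_7$ stand for Models 6 and 7, and $q(\theta)$ is the approximate posterior of a variational scheme such as Variational Laplace.

```latex
\begin{align}
p(y \mid m) &= \int p(y \mid \theta, m)\, p(\theta \mid m)\, d\theta \\
\mathrm{BF}_{76} &= \frac{p(y \mid m_7)}{p(y \mid m_6)} \\
\ln p(y \mid m) &\ge F(q) = \mathbb{E}_{q(\theta)}\big[\ln p(y \mid \theta, m)\big] - \mathrm{KL}\big[q(\theta) \,\|\, p(\theta \mid m)\big]
\end{align}
```

      The first term of $F(q)$ rewards accuracy and the second penalises complexity, which is the sense in which the marginal likelihood penalises redundant parameters. For two models with the same number of parameters, a difference in evidence cannot be attributed to a simple count of parameters.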

      We have revised the Results and Discussion sections of the manuscript to state these points more explicitly for readers and have included references to established literature regarding the robustness of marginal likelihoods in guarding against overfitting.

      In the Results,

      “By fitting these models to the trial-by-trial response data from all healthy participants (N=42), we adjudicated between competing mechanisms to determine which best explained participant performance (Figure 4). We used random-effects Bayesian model selection to identify the most plausible generative model. This process relies on the marginal likelihood (model evidence), which inherently balances model fit against complexity—a principle often referred to as Occam’s razor.[25–27] The analysis yielded a strong result: the “Dual (Saccade) + Interference” model (Model 7 in Table 1) emerged as the winning model, providing substantial evidence against the next best alternative with a Bayes Factor of 6.11.”

      In the Discussion:

      “Our framework employs Variational Laplace, a method used to recover computational phenotypes in clinical populations like those with substance use disorders,[34,35] and the models we fit using this procedure feature time-dependent parameterisation of variance—conceptually similar to the widely-used Hierarchical Gaussian Filter.[36–39] Importantly, the risk of overfitting is mitigated by the Bayesian Model Selection framework; by utilising the marginal likelihood for model comparison, the procedure inherently penalises excessive model complexity and promotes generalisability.[25–27,40] This generalisability was further evidenced by the model's ability to predict performance on the independent ROCF task, confirming that these parameters represent robust mechanistic phenotypes rather than idiosyncratic fits to the initial dataset.”

      Minor point: On p. 9, line 336, Figure 4A does not appear to include the red dashed vertical line that is mentioned as separating the age groups.

      Thank you for pointing out this inconsistency. We apologise for the oversight; upon further review, we concluded that the red dashed vertical line was unnecessary for the clear presentation of the data. We have therefore removed the line from Figure 4A and deleted the corresponding sentence in the figure caption.

      (3) Clarification of conceptual terminology.

      Some conceptual distinctions are unclear. For example, the relationship between "retinal memory" and "transsaccadic memory," as well as between "allocentric map" and "retinotopic representation," is not fully explained. Are these constructs related or distinct? Additionally, the manuscript uses terms such as "allocentric map," "retinotopic representation," and "reference frame" interchangeably, which creates ambiguity. It would be helpful for the authors to clarify the relationships among these terms and apply them consistently.

      Thank you for pointing this out. We have revised the manuscript to ensure that these terms are applied with greater precision and consistency. Our revisions standardise the terminology based on the following distinctions:

      Reference frames: We distinguish between the eye-centred reference frame (coordinate systems that shift with gaze) and the world-centred reference frame (coordinate systems anchored to the environment).

      Retinotopic representation vs. allocentric map: We clarify that retinotopic representations are encoded within an eye-centred reference frame and are updated with every ocular movement. Conversely, the allocentric map is anchored to stable environmental features, remaining invariant to the observer’s gaze direction or position.

      Retinotopic memory vs. transsaccadic memory: We have removed the term "retinal memory" to avoid ambiguity. We now consistently use retinotopic memory to describe the persistence of visual information in eye-centred coordinates within a single fixation. In contrast, transsaccadic memory refers to the higher-level integration of visual information across saccades, which involves the active updating or remapping of representations to maintain stability.

      To incorporate these clarifications, we have implemented the following changes:

      In the Introduction, the second paragraph has been entirely rewritten to establish these definitions at the outset, providing a clearer theoretical framework for the study.

      “Central to this enquiry is the nature of the coordinate system used for the brain's internal spatial representation. Does the brain maintain a single, world-centred (allocentric) map, or does it rely on a dynamic, eye-centred (retinotopic) representation?[11,13,15,16] In the latter system, retinotopic memory preserves spatial information within a fixation, whereas transsaccadic memory describes the active process of updating these representations across eye movements to achieve spatiotopic stability—the perception of a stable world despite eye movements.[11,16–18] If spatial stability is indeed reconstructed through such remapping, the mechanism remains unresolved: do we retain memories of absolute fixation locations, or do we reconstruct these positions from noisy memories of the intervening saccade vectors? We can test these hypotheses by analysing when and where memory errors occur. Assuming that memory precision declines over time,[19] the resulting error distributions should reveal the specific variables that are represented and updated across each saccade.”

      In the Results, the opening section of the Results has been reorganised to align with this terminology. We have ensured that the hypotheses and behavioural data—specifically the definition of "saccade cost"—are introduced using this consistent conceptual vocabulary to improve the overall coherence of the narrative.

      (4) Rationale for the selective disruption hypothesis (p. 4, lines 153-154). The authors hypothesize that "saccades would selectively disrupt location memory while leaving colour memory intact." Providing theoretical or empirical justification for this prediction would strengthen the argument.

      We have revised the Results to state the hypothesis more explicitly and expanded the Discussion to provide a robust theoretical and empirical rationale:

      In the Results,

      “This design allowed us to isolate and quantify the unique impact of saccades on spatial memory, enabling us to test competing hypotheses regarding spatial representation. If spatial memory were solely underpinned by an allocentric mechanism, precision should remain comparable across all conditions as the representation would be world-centred and unaffected by eye movements. Thus, performance in the no-saccade condition should be comparable to the two-saccade condition. Conversely, if spatial memory relies on a retinotopic representation requiring active updating across eye movements, the two-saccade condition was anticipated to be the most challenging due to cumulative decay in the memory traces used for stimulus reconstruction after each saccade.[22] Critically, we hypothesised that this saccade cost would be specific to the spatial domain; while location requires active remapping via noisy oculomotor signals, non-spatial features like colour are not inherently tied to coordinate transformations and should therefore remain stable (see more in Discussion below).

      Meanwhile, the no-saccade condition was expected to yield the most accurate localisation, relying solely on retinotopic information (retinotopic working memory). These predictions were confirmed in young healthy adults (N = 21, mean age = 24.1 years, range 19–34). A repeated measures ANOVA revealed a significant main effect of saccades on location memory (F(2.2, 43.9) = 33.2, p < 0.001, partial η² = 0.62), indicating substantial impairment after eye movements (Figure 2A). In contrast, colour memory remained remarkably stable across all saccade conditions (Figure 2B; F(2.2, 44.7) = 0.68, p = 0.53, partial η² = 0.03).

      This “saccade cost”—the loss of memory precision following an eye movement—indicates that spatial representations require active updating across saccades rather than being maintained in a static, world-centred reference frame.

      Critically, our comparison between spatial and colour memory does not rely on the absolute magnitude of errors, which are measured in different units (degrees of visual angle vs. radians). Instead, we assessed the relative impact of the same saccadic demand on each feature within the same trial. While location recall showed a robust saccade cost, colour recall remained statistically unchanged. To ensure this null effect was not due to a lack of measurement sensitivity, we examined the recency effect; recall performance for the second item was predicted to be better than for the first stimulus in each condition.[23,24] As expected, colour memory for Item 2 was significantly more accurate than for Item 1 (F(1,20) = 6.52, p = 0.02, partial η² = 0.25), demonstrating that the task was sufficiently sensitive to detect standard working memory fluctuations despite the absence of a saccade-induced deficit.”

      In the Discussion (p. 18), we now write:

      “A clear finding was the specificity of the saccade cost to spatial features; it was not observed for non-spatial features like colour, even in neurodegenerative conditions. This discrepancy challenges notions of fixed visual working memory capacity unaffected by saccades.[16,44–46] The differential impact on spatial versus non-spatial features in transsaccadic memory aligns with the established "what" and "where" pathways in visual processing.[32,33] For objects to remain unified, object features must be bound to stable representations of location across saccades.[19] One possibility is that remapping updates both features and location through a shared mechanism, predicting equal saccadic interference for both colour and location in the present study.

      However, our findings suggest otherwise. One potential concern is whether this dissociation simply reflects the inherent spatial noise introduced by fixational eye movements (FEMs), such as microsaccades and drifts.[47] Because locations are stored in a retinotopic frame, fixational instability necessarily shifts retinal coordinates over time. However, the "saccade cost" here was defined as the error increase relative to a no-saccade baseline of equal duration; because both conditions are subject to the same fixational drift, any FEM-induced noise is effectively subtracted out. Thus, despite the ballistic and non-Gaussian nature of FEMs,[48] they cannot account for the presence of a saccade cost in spatial memory alongside its total absence in the colour domain. Another possibility is that this dissociation reflects differences in baseline task difficulty or dynamic range. Yet, the presence of a robust recency effect in colour memory (Figure 2B) confirms that our paradigm was sensitive to memory-dependent variance and was not limited by floor or ceiling effects.

      The fact that identical eye movements, executed simultaneously and with identical vectors, systematically degraded spatial precision while sparing colour suggests a feature-specific susceptibility to transsaccadic remapping. This supports the view that the computational process of updating an object’s location involves a vector-subtraction mechanism, incorporating noisy oculomotor commands (efference copies), that introduces specific spatial variance. Because this remapping is a coordinate transformation, the resulting sensorimotor noise does not functionally propagate to non-spatial feature representations. Consequently, features like colour may be preserved or automatically remapped without the precision loss associated with spatial updating.[11,49] Our paradigm thus provides a refined tool to investigate the architecture of transsaccadic working memory across distinct object features.”
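
The vector-subtraction account in the passage above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the generative model fitted in the study: the function names, the isotropic Gaussian noise on the efference copy, the saccade amplitudes, and the target position are all hypothetical choices made for the example.

```python
import math
import random


def remap(location, saccade, noise_sd, rng):
    """Update an eye-centred location after a saccade by vector subtraction.

    The saccade vector is only available through a noisy efference copy,
    modelled here (as an assumption) with independent Gaussian noise per axis.
    """
    sensed_x = saccade[0] + rng.gauss(0.0, noise_sd)
    sensed_y = saccade[1] + rng.gauss(0.0, noise_sd)
    return (location[0] - sensed_x, location[1] - sensed_y)


def mean_location_error(n_saccades, noise_sd, n_trials=2000, seed=0):
    """Average distance between remembered and true eye-centred location."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        true_loc = (5.0, 2.0)   # hypothetical target position (deg)
        estimate = true_loc     # memory starts out accurate in this sketch
        for _ in range(n_saccades):
            saccade = (rng.uniform(-8.0, 8.0), rng.uniform(-8.0, 8.0))
            estimate = remap(estimate, saccade, noise_sd, rng)
            # The true eye-centred position shifts by the exact saccade vector.
            true_loc = (true_loc[0] - saccade[0], true_loc[1] - saccade[1])
        total += math.hypot(estimate[0] - true_loc[0], estimate[1] - true_loc[1])
    return total / n_trials


if __name__ == "__main__":
    for n in (0, 1, 2):
        print(n, mean_location_error(n, noise_sd=1.0))
```

A colour value stored alongside the location never passes through `remap`, so in this sketch it accumulates no saccade-related error, which is the dissociation the passage describes.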

      (5) Relationship between saccade cost and individual memory performance (p. 4, last paragraph).

      The authors report that larger saccades were associated with greater spatial memory disruption. It would be informative to examine whether individual differences in the magnitude of saccade cost correlate with participants' overall/baseline memory performance (e.g. their memory precision in the no-saccade condition). Such analyses might offer insights into how memory capacity/ability relates to resilience against saccade-induced updating.

      We have now conducted the correlation analysis to determine whether baseline memory capacity (no-saccade condition) predicts resilience to saccade-induced updating. The results indicate that these two factors are independent.

      To clarify the nature of the saccade-induced impairment, we have updated the text as follows:

      p.4: “This “saccade cost”—the loss of memory precision following an eye movement—indicates that spatial representations require active updating across saccades rather than being maintained in a static, world-centred reference frame.”

      p.5: “Further analysis examined whether individual differences in baseline memory precision (no-saccade condition) predicted resilience to saccadic disruption. Crucially, individual saccade costs (defined as the precision loss relative to baseline) did not correlate with baseline precision (rho = 0.20, p = 0.20). This suggests that the noise introduced by transsaccadic remapping acts as an independent, additive source of variance that is not modulated by an individual’s underlying memory capacity. These findings imply a functional dissociation between the mechanisms responsible for maintaining a representation and those involved in its coordinate transformation.”
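
A correlation of this kind can be sketched as follows. This is a minimal illustration assuming that the saccade cost is computed as the per-participant difference in mean error between a saccade condition and the no-saccade baseline, and that rho denotes a Spearman rank correlation; the function names are hypothetical, the numbers in the usage block are invented solely for demonstration, and the exact computation used for the manuscript may differ.

```python
import math


def ranks(values):
    """Return 1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def saccade_costs(baseline_error, saccade_error):
    """Per-participant error increase relative to the no-saccade baseline."""
    return [s - b for b, s in zip(baseline_error, saccade_error)]


if __name__ == "__main__":
    # Invented example values (deg), one entry per hypothetical participant.
    baseline = [1.2, 0.9, 1.5, 1.1, 1.3]
    two_saccade = [2.0, 2.1, 2.2, 1.6, 2.6]
    print(spearman_rho(baseline, saccade_costs(baseline, two_saccade)))
```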

      (6) Model fitting for the healthy elderly group to reveal memory-deficit factors (pp. 11-12). The manuscript discusses model-based insights into components that contribute to spatial memory deficits in AD and PD, but does not discuss components that contribute to spatial memory deficits in the healthy elderly group. Given that the EC group also shows impairments in certain parameters, explaining and discussing these outcomes of the EC group could provide additional insights into age-related memory decline, which would strengthen the study's broader conclusions.

      This is a very good point. We rewrote the corresponding results section (p.12-13):

      “Modelling reveals the sources of spatial memory deficits in healthy aging and neurodegeneration - To understand the source of the observed deficits, we applied the winning ‘Dual (Saccade) + Interference’ model to the data from all participants (YC, EC, AD, and PD). By fitting the model to the entire dataset, we obtained estimates of the parameters for each individual, which then formed the basis for our group-level analysis. To formally test for group differences, we used Parametric Empirical Bayes (PEB), a hierarchical Bayesian approach that compares parameter estimates across groups while accounting for the uncertainty of each estimate [28]. This allowed us to identify which specific cognitive mechanisms, as formalised by the model parameters, were affected by age and disease.

      The Bayesian inversion used here allows us to quantify the posterior mode and variance for each parameter and the covariance between parameters. From these, we can compute the probabilities that pairs of parameters differ from one another, which we report as P(A>B), meaning the posterior probability that the parameter for group A was greater than that for group B.

      We first examined the specific parameters differentiating healthy elderly (EC) from young controls (YC) to isolate the factors contributing to non-pathological, age-related decline. The analysis revealed that healthy ageing is primarily characterised by a significant increase in Radial Decay (P(EC > YC) = 0.995), a heightened susceptibility to Interference (P(EC > YC) = 1.000), and a reduction in initial Angular Encoding precision (P(YC < EC) = 0.002; Figure 6). These results suggest that normal ageing degrades the fidelity of the initial memory trace and its resilience over time, while the core computational process of updating information across saccades remains intact.

      Beyond these baseline ageing effects, our clinical cohorts exhibited more severe and condition-dependent impairments. Radial decay showed a clear, graded impairment: AD patients had a greater decay rate than PD patients (P(AD > PD) = 1.000), who in turn were more impaired than the EC group (P(PD > EC) = 0.996). A similar graded pattern was observed for Interference, where AD patients were most susceptible (P(AD > PD) = 0.999), while the PD and EC groups did not significantly differ (P(PD > EC) = 0.532).

      Patients with AD also showed a tendency towards greater angular decay than controls (P(AD > EC) = 0.772), although this fell below the 95% probability threshold. This effect was influenced by a lower decay rate in the PD group compared to the EC group (P(PD < EC) = 0.037). In contrast, group differences in encoding were less pronounced. While YC exhibited significantly higher precision than all other groups, AD patients showed significantly higher angular encoding error than PD patients (P(AD > PD) = 0.985), though neither group differed significantly from the EC group.

      Crucially, parameters related to the saccade itself—saccade encoding and saccade decay—did not differentiate the groups. This indicates that neither healthy ageing nor the early stages of AD and PD significantly impair the fundamental machinery for transsaccadic remapping. Instead, the visuospatial deficits in these conditions arise from specific mechanistic failures: a faster decay of radial position information and increased susceptibility to interference, both of which are present in healthy ageing but significantly amplified by neurodegeneration.”
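
The P(A>B) values quoted in the passage above can be related to posterior modes, variances, and covariances with a short sketch. It assumes the joint posterior over the two parameters is Gaussian (as a Laplace approximation provides), so that their difference is also Gaussian; the function name and arguments are hypothetical, and the exact computation used in the PEB analysis is not specified in this response.

```python
import math


def prob_greater(mean_a, mean_b, var_a, var_b, cov_ab=0.0):
    """Posterior probability that parameter A exceeds parameter B.

    Under a joint Gaussian posterior, A - B is Gaussian with mean
    (mean_a - mean_b) and variance (var_a + var_b - 2 * cov_ab), so the
    probability is the standard normal CDF of the standardised difference.
    """
    diff_var = var_a + var_b - 2.0 * cov_ab
    if diff_var <= 0.0:
        raise ValueError("variance of the difference must be positive")
    z = (mean_a - mean_b) / math.sqrt(diff_var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Invented posterior summaries for two hypothetical group-level parameters.
    print(prob_greater(mean_a=0.8, mean_b=0.2, var_a=0.04, var_b=0.05, cov_ab=0.01))
```

Positive posterior covariance shrinks the variance of the difference and therefore sharpens the comparison, which is one reason the covariance needs to be carried through rather than ignored.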

      In the Discussion, we added:

      “Although saccade updating was an essential component of the winning model, its two key parameters—initial encoding error and decay rate during maintenance—did not significantly differ across groups. This indicates that the core computational process of updating spatial information based on eye movements is largely preserved in healthy aging and neurodegeneration.

      Instead, group differences were driven by deficits in angular encoding error (precision of initial angle from fixation), angular decay, radial decay (decay in memory of distance from fixation), and interference susceptibility. This implies a functional and neuroanatomical dissociation: while the ventral stream (the “what” pathway) shows an age-related decline in the quality and stability of stored representations, the dorsal-stream (the “where” pathway) parietal-frontal circuits responsible for coordinate transformations remain functionally robust.[31–34] These spatial updating mechanisms appear resilient to the normal ageing trajectory and only break down when challenged by the specific pathological processes seen in Alzheimer’s or Parkinson’s disease.”

      (7) Presentation of saccade conditions in Figure 5 (p. 11). In Figure 5, it may be clearer to group the four saccade conditions together within each patient group. Since the main point is that saccadic interference on spatial memory remains robust across patient groups, grouping conditions by patient type rather than intermixing conditions would emphasize this interpretation.

      There are several valid ways to present these plots, but we chose this format because it allows for a direct visual comparison of the post-hoc group differences within each specific task demand. This arrangement clearly illustrates the graded impairment from young controls through to patients with Alzheimer’s disease across every condition. This structure also directly mirrors our two-way ANOVA, which identified significant main effects for both Group and Condition, but crucially, no significant Group × Condition interaction. We felt that grouping the data by participant group would force readers to look across four separate clusters to compare the slopes, making the stability of the saccadic remapping mechanism much harder to grasp at a glance.

      Reviewer #1 (Recommendations for the authors):

      (1) Formatting of statistical parameters.

      The formatting of statistical symbols should be consistent throughout the manuscript. Some instances of F, p, and t are italicized, while others are not. All statistical symbols should be italicized.

      Thank you for pointing this out. We have audited the manuscript. While we have revised the text to address these instances throughout the Results and Methods sections, any remaining minor formatting inconsistencies will be corrected during the final typesetting stage.

      (2) Minor typographical issues.

      (a) Line 532: "are" should be "be."

      (b) Line 654: "cantered" should be "centered."

      (c) Line 213: In "(p(bonf) < 0.001, |t| ≥ 5.94)," the t value should be reported with its degrees of freedom, and t should be reported before p. The same applies to line 215.

      Thank you for your careful reading. All corrected.

      Reviewer #2 (Public review):

      We thank you for your positive feedback regarding our eye-tracking methodology and computational approach. We appreciate your critical insights into the feature-specific disruption hypothesis and the task structure. We have substantially revised the Results and Discussion concerning saccadic interference with colour memory. Below, we respond to your suggestions point by point:

      Reviewer #2 (Recommendations for the authors):

      (1) The study treats colour and location errors as comparable when arguing that saccades selectively disrupt spatial but not colour memory. However, these measures are defined in entirely different units (degrees of visual angle vs radians on a colour wheel) and are not psychophysically or statistically calibrated. Baseline task difficulty, noise level, or dynamic range do not appear to be calibrated or matched across features. As a result, the null effect of saccades on colour could reflect lower sensitivity or ceiling effects rather than implicit feature-specific robustness.

      We agree that direct comparisons of absolute error magnitudes across different dimensions are not appropriate. Our argument for feature-specific disruption relies not on the scale of errors, but on the presence or absence of a saccade cost within identical trials. In our within-subject design, the same saccade vectors produced a systematic increase in location error while leaving colour error statistically unchanged. To address sensitivity, we observed that colour memory was sufficiently precise to show a significant recency effect (p = 0.02). To further quantify the evidence for the null effect, we performed Bayesian repeated measures ANOVAs, which yielded a BF10 = 0.22. This provides substantial evidence that saccades do not disrupt colour precision, regardless of baseline sensitivity.
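
The reading of BF10 = 0.22 as evidence for the null can be made explicit by inverting the Bayes factor. The sketch below is illustrative only: the function names are hypothetical, and the verbal labels use the commonly rounded thresholds (3, 10, 30, 100) of Jeffreys-style scales, which are conventions rather than part of the analysis itself.

```python
def invert_bayes_factor(bf10: float) -> float:
    """Convert evidence for the alternative (BF10) into evidence for the null (BF01)."""
    if bf10 <= 0.0:
        raise ValueError("a Bayes factor must be positive")
    return 1.0 / bf10


def evidence_label(bf: float) -> str:
    """Verbal label for a Bayes factor >= 1, using rounded conventional thresholds."""
    if bf < 1.0:
        raise ValueError("invert the Bayes factor first so that it is >= 1")
    if bf < 3.0:
        return "anecdotal"
    if bf < 10.0:
        return "substantial"
    if bf < 30.0:
        return "strong"
    if bf < 100.0:
        return "very strong"
    return "decisive"


if __name__ == "__main__":
    bf01 = invert_bayes_factor(0.22)  # evidence for no saccade effect on colour
    print(round(bf01, 2), evidence_label(bf01))
    print(evidence_label(6.11))       # the model-comparison Bayes factor quoted earlier
```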

      We have substantially revised this in Results, Methods and Discussion:

      In the Results:

      “This design allowed us to isolate and quantify the unique impact of saccades on spatial memory, enabling us to test competing hypotheses regarding spatial representation. If spatial memory were solely underpinned by an allocentric mechanism, precision should remain comparable across all conditions as the representation would be world-centred and unaffected by eye movements. Thus, performance in the no-saccade condition should be comparable to the two-saccade condition. Conversely, if spatial memory relies on a retinotopic representation requiring active updating across eye movements, the two-saccade condition was anticipated to be the most challenging due to cumulative decay in the memory traces used for stimulus reconstruction after each saccade.[22] Critically, we hypothesised that this saccade cost would be specific to the spatial domain; while location requires active remapping via noisy oculomotor signals, non-spatial features like colour are not inherently tied to coordinate transformations and should therefore remain stable (see more in Discussion below).

      Meanwhile, the no-saccade condition was expected to yield the most accurate localisation, relying solely on retinotopic information (retinotopic working memory). These predictions were confirmed in young healthy adults (N = 21, mean age = 24.1 years, range 19–34). A repeated measures ANOVA revealed a significant main effect of saccades on location memory (F(2.2, 43.9) = 33.2, p < 0.001, partial η² = 0.62), indicating substantial impairment after eye movements (Figure 2A). In contrast, colour memory remained remarkably stable across all saccade conditions (Figure 2B; F(2.2, 44.7) = 0.68, p = 0.53, partial η² = 0.03).

      This “saccade cost”—the loss of memory precision following an eye movement—indicates that spatial representations require active updating across saccades rather than being maintained in a static, world-centred reference frame.

      Critically, our comparison between spatial and colour memory does not rely on the absolute magnitude of errors, which are measured in different units (degrees of visual angle vs. radians). Instead, we assessed the relative impact of the same saccadic demand on each feature within the same trial. While location recall showed a robust saccade cost, colour recall remained statistically unchanged. To ensure this null effect was not due to a lack of measurement sensitivity, we examined the recency effect; recall performance for the second item was predicted to be better than for the first stimulus in each condition.[23,24] As expected, colour memory for Item 2 was significantly more accurate than for Item 1 (F(1,20) = 6.52, p = 0.02, partial η² = 0.25), demonstrating that the task was sufficiently sensitive to detect standard working memory fluctuations despite the absence of a saccade-induced deficit.”

      In the Methods, at the beginning of “Statistical Analysis”, we added

      “Because location and colour recall involve different scales and units, all analyses were performed independently for each feature to avoid cross-dimensional magnitude comparisons.” (p25)

      In the Discussion, we added:

      “A potential concern is whether the observed dissociation between colour and location reflects differences in baseline task difficulty or dynamic range. Yet, the presence of a robust recency effect in colour memory (Figure 2B) confirms that our paradigm was sensitive to memory-dependent variance and was not limited by floor or ceiling effects.”

      (2) Colour and then location are probed serially, without a counter-balanced order. This fixed response order could introduce a systematic bias because location recall is consistently subject to longer memory retention intervals and cognitive interference from the colour decision. The observed dissociation (saccades impair location but not colour) may therefore reflect task structure rather than implicit feature-specific differences in trans-saccadic memory.

      Thank you for the insightful observation regarding our fixed response order. We acknowledge that a counterbalanced design is typically preferred to mitigate potential order effects. However, we chose this consistent sequence to ensure the task remained accessible for cognitively impaired patients (i.e., the Alzheimer’s disease (AD) and Parkinson’s disease (PD) cohorts). Conducting an eye-tracking memory task with cognitively impaired patients is challenging, as they may struggle with task engagement or forget complex instructions. During the design phase, we prioritised a consistent structure to reduce the cognitive load and task-switching demands that typically challenge these cohorts.

      Critically, because the saccade cost is a relative measure calculated by comparing conditions with identical timings, any bias from the fixed order is present in both the baseline and saccade trials. The disruption we report is therefore a specific effect of eye movements that goes beyond the noise introduced by the retention interval or the preceding colour report.

      We added the following text in the Methods – experimental procedure (p.22):

      “Recall was performed in a fixed order, with colour reported before location. This sequence was primarily chosen to minimise cognitive load and task-switching demands for the two neurological patient cohorts, ensuring the paradigm remained accessible for individuals with AD and PD. While this order results in a slightly longer retention interval for location recall, the saccade cost was identified by comparing location error across experimental conditions with similar timings but varying saccadic demands.”

      (3) Relatedly, because spatial representations are retinotopic, fixational eye movements (FEMs - microsaccades and drift) displace the retinal coordinates of encoded positions, increasing apparent spatial noise with time delays. Colour memory, however, is feature-based and unaffected by small retinal translations. Thus, any between-condition or between-group differences in FEMs could selectively inflate location error and the associated model parameters (encoding noise, decay, interference), while leaving colour error unchanged. Note that FEMs tend to be slightly ballistic [1,2], hence not well modelled with a Gaussian blur.

      This is a very insightful point. We have now addressed this in detail within the discussion:

      “However, our findings suggest otherwise. One potential concern is whether this dissociation simply reflects the inherent spatial noise introduced by fixational eye movements (FEMs), such as microsaccades and drifts.[46] Because locations are stored in a retinotopic frame, fixational instability necessarily shifts retinal coordinates over time. However, the "saccade cost" here was defined as the error increase relative to a no-saccade baseline of equal duration; because both conditions are subject to the same fixational drift, any FEM-induced noise is effectively subtracted out. Thus, despite the ballistic and non-Gaussian nature of FEMs,[47] they cannot account for the presence of a saccade cost in spatial memory alongside its total absence in the colour domain. Another possibility is that this dissociation reflects differences in baseline task difficulty or dynamic range. Yet, the presence of a robust recency effect in colour memory (Figure 2B) confirms that our paradigm was sensitive to memory-dependent variance and was not limited by floor or ceiling effects.”

      (4) There is no in silico demonstration that the modelling framework can recover the true generating model from synthetic data or recover accurate parameters under realistic noise levels, which can be challenging in generative models with a hierarchical structure (as per [3], for example). Figure 8b shows that the parameters possess substantial posterior covariance, which raises concerns as to whether they can be reliably disambiguated.

      Many thanks for this comment. We have added a simple recovery analysis as detailed below but are also keen to ensure we fully answer your question—which has more to do with empirical rather than simulated data—and make clear the rationale for this analysis in this instance.

      We added this in Supplementary Materials:

      “Model validation and recovery analysis

      The following section provides a detailed technical assessment of the model inversion scheme, focusing on the discriminability of the model space and the identifiability of individual parameters.

      Recovery analyses of this sort are typically used prior to collecting data to allow one to determine whether, in principle, the data are useful in disambiguating between hypotheses. In this sense, they have a role analogous to a classical power calculation. However, their utility is limited when used post-hoc when data have already been collected, as the question of whether the models can be disambiguated becomes one of whether non-trivial Bayes factors can be identified from those data.

      The reason for including a recovery analysis here is not to identify whether the model inversion scheme identifies a ‘true’ model. The concept of ‘true generative models’ commits to a strong philosophical position which is at odds with the ‘all models are wrong, but some are useful’ perspective held by many in statistics, e.g., (So, 2017). Of note, one can always confound a model recovery scheme by generating the same data in a simple way, and in (one of an infinite number of) more complex ways. A good model inversion scheme will always recover the simple model and therefore would appear to select the ‘wrong’ model in a recovery analysis. However, it is still the best explanation for the data. For these reasons, we do not necessarily expect ‘good’ recoverability in all parameter ranges. This is further confounded by the relationship between the models we have proposed—e.g., an interference model with very low interference will look almost identical to a model with no interference. The important question here is whether they can be disambiguated with real data.

      Instead, the value of a post-hoc recovery analysis here is to evaluate whether there was a sensible choice of model space—i.e., that it was not a priori guaranteed that a single model (and, specifically, the model we found to be the best explanation for the data) would explain the results of all others. To address this, for each model, we simulated 16 datasets, each of which relied upon parameters sampled from the model priors, which included examples of each of the experimental conditions. We then fit each of these datasets to each of the 7 models to construct the confusion matrix shown in the lower panel of Supplementary Figure 3, by accumulating evidence over each of the 16 participants generated according to each ‘true’ model (columns) for each of the possible explanatory models (rows). This shows that no one model, for the parameter ranges sampled here, explains all other datasets. Interestingly, our ‘winning’ model in the empirical analysis is not the best explanation for any of the datasets simulated (including its own). This is reassuring, in that it implies this model winning was not a foregone conclusion and is driven by the data—not just the choice of model space.”
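
The accumulation step described above can be sketched as follows. This is a minimal illustration and not the authors' code: the nested list `log_evidence[t][d][f]` is a hypothetical layout holding the log model evidence obtained when dataset `d`, simulated from 'true' model `t`, is fitted with model `f`. Pooling by summing log evidence over datasets and assuming a flat prior over models are both assumptions made here for illustration.

```python
import math

def confusion_matrix(log_evidence):
    """Pool log evidence over simulated datasets and normalise per true model.

    log_evidence[t][d][f]: log evidence for fitted model f on dataset d
    simulated from true model t (hypothetical input layout).
    Returns matrix[f][t]: posterior probability of fitted model f (row)
    given data generated by true model t (column), under a flat model prior.
    """
    n_models = len(log_evidence)
    matrix = [[0.0] * n_models for _ in range(n_models)]
    for t, datasets in enumerate(log_evidence):
        # Sum log evidence over all datasets for each candidate model.
        pooled = [sum(d[f] for d in datasets) for f in range(n_models)]
        # Softmax with max-subtraction for numerical stability.
        peak = max(pooled)
        weights = [math.exp(p - peak) for p in pooled]
        total = sum(weights)
        for f in range(n_models):
            matrix[f][t] = weights[f] / total
    return matrix

# Toy input (made-up numbers): 2 models, 3 simulated datasets per model.
toy = [
    [[-10.0, -12.0], [-11.0, -11.5], [-9.0, -10.0]],   # data from model 0
    [[-10.0, -10.2], [-12.0, -11.0], [-10.5, -10.0]],  # data from model 1
]
cm = confusion_matrix(toy)
```

Each column of the result sums to one, so a column dominated by a single row indicates that the corresponding simulated data are explained best by one model.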

      Your point about the posterior covariance is well founded. As we describe in Supplementary Materials, this is an inherent feature of inverse problems (analogous to EEG source localisation). However, the fact that our posterior densities move significantly away from the prior expectations demonstrates that the data are indeed informative. By adopting a Bayesian framework, we are able to explicitly quantify this uncertainty rather than ignoring it, providing a more transparent account of parameter identifiability. We have added the following in the same section of Supplementary Materials:

      “This problem is an inverse problem—inferring parameters from a non-linear model. We therefore expect a degree of posterior covariance between parameters and, consequently, that they cannot be disambiguated with complete certainty. While some degree of posterior covariance is inherent to inverse models—including established methods like EEG source localisation—the fact that many of the parameters are estimated with posterior densities that do not include their prior expectations implies the data are informative about these.

      The advantage of the Bayesian approach we have adopted here is that we can explicitly quantify posterior covariance between these parameters, and therefore the degree to which they can be disambiguated. While the posterior covariance matrices from empirical data are the relevant measure here, the model recovery analysis reported in Supplementary Figure 3 helps us understand how the model inversion scheme behaves for the specific models used.

      The middle panel of the figure is key, along with the correlation coefficients reported in the figure caption. Here, we see at least a weak positive correlation (in some cases much stronger) for almost all parameters and limited movement from prior expectations for those parameters that are less convincingly recovered. This reinforces that the ability of the scheme to recover parameters is best assessed in terms of the degree of movement of posterior from prior values following fitting to empirical data.”

      (5) The authors employ Bayes factors (BFs) to disambiguate models, but BFs would also strengthen the claims that location, but not colour, is impacted by saccades. Despite colour being a circular variable, colour error is analysed using ANOVA on linearised differences (radians). The authors should also arguably use circular statistics, such as the von Mises distribution, for the analysis of colour.

      Regarding the use of circular statistics, you are correct that such error distributions are not suitable for ANOVA, and it is better to use circular statistics. However, for the present dataset, we used the mean absolute angular error per condition (ranging from 0 to π radians), which represents the shortest distance on the colour wheel between the target and the response.

      This approach effectively linearises the measure by removing the 2π wrap-around boundary. Because the observed errors were relatively small and did not cluster near the π boundary, even in the patient cohorts (Figure 5B), the "wrap-around" effect of circular space is negligible. Moreover, by analysing the mean error across trials for each condition, rather than trial-wise data, we invoke the Central Limit Theorem. This ensures that the distribution of these means is approximately normal, satisfying the fundamental assumptions of ANOVA. For these reasons, we adopted simpler linear models. We confirmed that the data did not violate the assumptions of linear statistics. In this low-noise regime, linear and circular models converge on the same conclusions. This has been revised in Methods:

      “For colour memory, we calculated the absolute angular error, defined as the shortest distance on the colour wheel between the target and the reported colour (range 0 to π radians). For the primary statistical analyses, we utilised the mean absolute error per condition for each participant. By analysing these condition-wise means rather than trial-wise raw data, we invoke the Central Limit Theorem, which ensures that the sampling distribution of these means approximates normality. Because the absolute errors in this paradigm were relatively small and did not approach the π boundary (Figure 5B) even in the clinical cohorts, the data were treated as a continuous measure in our linear ANOVAs and regression models. Moreover, because location and colour recall involve different scales and units, all analyses were performed independently for each feature to avoid cross-dimensional magnitude comparisons.”
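
The error measure defined above (shortest distance on the colour wheel, bounded between 0 and π) can be sketched as below. This is an illustrative implementation only; the function names are hypothetical and angles are assumed to be given in radians.

```python
import math

def abs_angular_error(target, response):
    """Shortest distance in radians between two angles on a circle, in [0, pi]."""
    diff = (response - target) % (2 * math.pi)  # wrap into [0, 2*pi)
    return min(diff, 2 * math.pi - diff)        # take the shorter arc

def mean_abs_error(targets, responses):
    """Mean absolute angular error over the trials of one condition."""
    errors = [abs_angular_error(t, r) for t, r in zip(targets, responses)]
    return sum(errors) / len(errors)
```

Taking the shorter arc means that a target at 0.1 rad and a response at 2π − 0.1 rad give an error of 0.2 rad instead of nearly 2π, which is the wrap-around correction the text refers to.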

      We have also now integrated Bayesian repeated measures ANOVA throughout the manuscript. The Results section for the young healthy adults now reads (p. 4):

      “A repeated measures ANOVA revealed a significant main effect of saccades on location memory (F(3, 20) = 51.52, p < 0.001, partial η²=0.72), with Bayesian analysis providing decisive evidence for the inclusion of the saccade factor (BF<sub>incl</sub> = 3.52 x 10^13, P(incl|data) = 1.00). In contrast, colour memory remained remarkably stable across all saccade conditions (F(3, 20) = 0.57, p = 0.64, partial η² =0.03). This null effect was supported by Bayesian analysis, which provided moderate evidence in favour of the null hypothesis (BF<sub>01</sub> = 8.46, P(excl|data) = 0.89), indicating that the data were more than eight times more likely under the null model than a model including saccade-related impairment.”

      For elderly healthy adults:

      “In contrast, colour memory remained unaffected by saccade demands (F(3, 20) = 0.57, p = 0.65, partial η² =0.03), again supported by the Bayesian analysis: BF<sub>01</sub> = 8.68, P(excl|data) = 0.90.”

      For patient cohorts:

      “Bayesian repeated measures ANOVAs further supported this dissociation, providing moderate evidence for the null hypothesis in the AD group (BF<sub>01</sub> = 3.35, P(excl|data) = 0.77) and weak evidence in the PD group (BF<sub>01</sub> = 2.23, P(excl|data) = 0.69). This indicates that even in populations with established neurodegeneration, the detrimental impact of eye movements is specific to the spatial domain.”
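
As a worked check on how the reported Bayes factors relate to the reported exclusion probabilities, the sketch below converts BF01 into a posterior probability for the null. It assumes a comparison between only two models (null versus saccade effect) with equal prior probability; the source does not state the priors or software used, so this is an assumption.

```python
def posterior_prob_null(bf01, prior_null=0.5):
    """Posterior probability of the null model given BF01 and a prior on the null."""
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = bf01 * prior_odds          # Bayes' rule in odds form
    return posterior_odds / (1.0 + posterior_odds)

# Bayes factors quoted in the text for colour memory (YC, EC, AD, PD).
for bf in (8.46, 8.68, 3.35, 2.23):
    print(bf, round(posterior_prob_null(bf), 2))
```

Under these assumptions the four values round to 0.89, 0.90, 0.77 and 0.69, matching the P(excl|data) values quoted above.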

      The related description in Methods – Statistical Analysis has also been updated.

      Minor:

      (1) The modelling is described as computational but is arguably better characterised as a heuristic generative model at Marr's algorithmic level. It does not derive from normative computational principles or describe an implementation in neural circuits.

      We appreciate your perspective on the classification of our model within Marr’s hierarchy. We agree that our framework is best characterised as an algorithmic-level generative model. Our objective was to identify the mechanistic principles governing transsaccadic updating rather than to provide a normative derivation or a specific circuit-level implementation.

      To ensure readers do not over-interpret the term ‘computational’, we have added a clarifying statement in the Discussion acknowledging the algorithmic nature of the model. Interestingly, we note that a model predicated on this form of spatial diffusion implies a neural field representation with a spatial connectivity kernel whose limit approximates the second derivative of a Dirac delta function. While a formal neural field implementation is beyond the scope of the present work, our algorithmic results provide the necessary constraints for such future biophysical models.

      p.20: “While we describe the present framework as 'computational', it is more precisely characterised as an algorithmic-level generative model within Marr’s hierarchy. Our focus was on defining the rules of spatial integration and the sources of eye-movement-induced noise, rather than deriving these processes from normative principles or defining their specific neural implementation.”

      (2) I did not find a description of the recruitment and characterization of the AD and PD patients.

      Apologies for this omission. We have now included a detailed description of participant recruitment and clinical characterisation in the Methods section and also updated Table 2:

      “A total of 87 participants completed the study: 21 young healthy adults (YC), 21 older healthy adults (EC), 23 patients with Parkinson’s disease (PD), and 22 patients with Alzheimer’s disease (AD). Their demographic and clinical details are summarised in Table 2. Initially, 90 participants were recruited (22 YC, 21 EC, 25 PD, 22 AD); however, three individuals (1 YC and 2 PD) were excluded from all analyses due to technical issues during data acquisition.

      All participants were recruited locally in Oxford, UK. None were professional artists, had a history of psychiatric illness, or were taking psychoactive medications (excluding standard dopamine replacement therapy for PD patients). Young participants were recruited via the University of Oxford Department of Experimental Psychology recruitment system. Older healthy volunteers (all >50 years of age) were recruited from the Oxford Dementia and Ageing Research (OxDARE) database.

      Patients with PD were recruited from specialist clinics in the Oxfordshire area. All had a clinical diagnosis of idiopathic Parkinson’s disease and no history of other major neurological or psychiatric illnesses. All patients were tested while on their regular medication regimen (‘ON’ state), but their specific pharmacological profiles, including the types of medication (e.g., levodopa, dopamine agonists, or combinations) and dosages (e.g., levodopa equivalent doses), were not systematically recorded. Disease duration and PD severity were also not recorded for this study.

      Patients with AD were recruited from the Cognitive Disorders Clinic at the John Radcliffe Hospital, Oxford, UK. All AD participants presented with a progressive, multidomain, predominantly amnestic cognitive impairment. Clinical diagnoses were supported by structural MRI and FDG-PET imaging consistent with a clinical diagnosis of AD dementia (e.g., temporo-parietal atrophy and hypometabolism).[69] All neuroimaging was reviewed independently by two senior neurologists (S.T. and M.H.).

      Global cognitive function was assessed using the Addenbrooke’s Cognitive Examination-III (ACE-III).[70] All healthy participants scored above the standard cut-off of 88, with the exception of one elderly participant who scored 85. In the PD group, two participants scored below the cut-off (85 and 79). In the AD group, six participants scored above 88; these individuals were included based on robust clinical and radiological evidence of AD pathology rather than their ACE-III score alone.”

      (3) YA and OA patients appear to differ in gender distribution.

      We acknowledge the difference in gender distribution between the young (71.4% female) and older adult (57.1% female) cohorts. However, we do not anticipate that gender influences the fundamental computational mechanisms of retinotopic maintenance or transsaccadic remapping. These processes represent low-level visuospatial functions for which there is no established evidence of gender-specific differences in precision or coordinate transformation. We have ensured that the gender distribution for each cohort is clearly listed in the demographics table (Table 2) for full transparency.

      Thank you very much for your very insightful feedback!

      Reviewer #3 (Public review):

      Thank you for the positive feedback regarding our inclusion of clinical groups and the identification of computational phenotypes that differentiate these cohorts.

      To address your concerns about the model, we have clarified our use of Bayesian Model Selection, which inherently penalises model complexity to ensure that our results are not driven solely by the number of parameters. We will also provide further evidence regarding model generalisability to address the concern of overfitting.

      Regarding the link with the ROCF, we have revised the manuscript to better highlight the specific relationship between our transsaccadic parameters and the ROCF data and better motivate the inclusion of these results in the main text.

      Below is our response to your suggestions point-by-point:

      (1) The models tested differ in terms of the number of parameters. In general, a larger number of parameters leads to a better goodness of fit. It is not clear how the difference in the number of parameters between the models was taken into account. It is not clear whether the modelling results could be influenced by overfitting (it is not clear how well the model can generalize to new observations).

      To ensure our results were not driven by the number of parameters, we utilised random-effects Bayesian Model Selection (BMS) to adjudicate between our candidate models. Unlike maximum likelihood methods, BMS relies on the marginal likelihood (model evidence), which inherently balances model fit against parsimony, a principle known as Occam’s razor (Rasmussen and Ghahramani, 2000). In this framework, a model is only preferred if the improvement in fit justifies the additional parameter space; redundant parameters actually lower model evidence by diluting the probability mass. We would be happy to point toward literature that discusses how these marginal likelihood approximations provide a more robust guard against overfitting than standard metrics like BIC or AIC (MacKay, 2003; Murray and Ghahramani, 2005; Penny, 2012).
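
The way the marginal likelihood penalises unneeded flexibility can be illustrated with a toy example that is unrelated to the models in the manuscript. The sketch assumes binary outcomes and compares a model with no free parameters (success probability fixed at 0.5) against a more flexible model whose success probability has a uniform prior; both models and all function names are hypothetical.

```python
import math

def log_evidence_fixed(k, n, theta=0.5):
    """Log probability of one sequence with k successes in n trials, theta fixed."""
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

def log_evidence_flexible(k, n):
    """Log marginal likelihood with theta ~ Uniform(0, 1).

    Integrating theta^k * (1 - theta)^(n - k) over [0, 1] gives the Beta
    function B(k + 1, n - k + 1), computed here with log-gamma functions.
    """
    return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)

def bayes_factor_fixed_vs_flexible(k, n):
    """Bayes factor in favour of the fixed (simpler) model."""
    return math.exp(log_evidence_fixed(k, n) - log_evidence_flexible(k, n))

balanced = bayes_factor_fixed_vs_flexible(5, 10)   # data the simple model explains
extreme = bayes_factor_fixed_vs_flexible(10, 10)   # data that need the extra parameter
```

With 5 successes in 10 trials the Bayes factor favours the simpler model (greater than 1) even though the flexible model can fit at least as well at its best parameter value. With 10 successes in 10 trials the flexible model is preferred (Bayes factor below 1), because the improvement in fit outweighs the cost of spreading prior mass over parameter values the data do not support.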

      The fact that the "Dual (Saccade) + Interference" model (Model 7) emerged as the winner—with a Bayes Factor of 6.11 against the next best alternative—demonstrates that its complexity was statistically justified by its superior account of the trial-by-trial data.

      Furthermore, to address the risk of overfitting, we established the generalisability of these parameters by using them to predict performance on an independent clinical task. These parameters successfully explained ~62% of the variance in ROCF copy scores (a very distinct, real-world task), confirming that they represent robust computational phenotypes rather than idiosyncratic fits to the initial dataset.

      In the Results (p10):

      “We used random-effects Bayesian model selection to identify the most plausible generative model. This process relies on the marginal likelihood (model evidence), which inherently balances model fit against complexity—a principle often referred to as Occam’s razor.[25–27]”

      In the Discussion (p17):

      “Importantly, the risk of overfitting is mitigated by the Bayesian Model Selection framework; by utilising the marginal likelihood for model comparison, the procedure inherently penalises excessive model complexity and promotes generalisability.[25–27,42] This generalisability was further evidenced by the model's ability to predict performance on the independent ROCF task, confirming that these parameters represent robust mechanistic phenotypes rather than idiosyncratic fits to the initial dataset.”

      (2) Results specificity: it is not clear how specific the modelling results are with respect to constructional ability (measured via the Rey-Osterrieth Complex Figure test). As with any cognitive test, performance can also be influenced by general, non-specific abilities that contribute broadly to test success.

      We agree that constructional performance is influenced by both specific mechanistic constraints and general cognitive abilities. To isolate the unique contribution of transsaccadic updating, we therefore performed a partial correlation analysis across the entire sample. We examined the relationship between location error in the two-saccades condition (our primary behavioural measure of transsaccadic memory) and ROCF copy scores. Even after partialling out the effects of global cognitive status (ACE-III total score), age, and years of education, the correlation remained highly significant (rho = -0.39, p < 0.001).

      This suggests that our model captures a specific computational phenotype—the precision of spatial updating during active visual sampling—rather than acting as a proxy for non-specific cognitive decline. This mechanistic link explains why traditional working memory measures (e.g., digit span or Corsi blocks) frequently fail to predict drawing performance; unlike those tasks, figure copying requires thousands of saccades, making it uniquely sensitive to the precision of the dynamic remapping signals identified by our modelling framework.

      We added the following text in the Discussion (p19):

      “We also found that the relationship between transsaccadic working memory and ROCF performance remains highly significant (rho = -0.39, p < 0.001), even after controlling for age, education, and global cognitive status (ACE-III total score). Consequently, transsaccadic updating may represent a discrete computational phenotype required for visuomotor control, rather than a non-specific proxy for global cognitive decline.[57]”
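
A rank-based partial correlation of the kind reported above can be sketched as follows. This is a generic illustration and not the authors' analysis code: it rank-transforms all variables and applies the standard recursive partial-correlation formula, which is equivalent to correlating the residuals after regressing out the covariates. Function names are hypothetical, and no p-value is computed.

```python
import math

def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, covariates):
    """Correlation of x and y after removing linear effects of the covariates."""
    if not covariates:
        return pearson(x, y)
    *rest, z = covariates
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_spearman(x, y, covariates):
    """Spearman-type partial correlation: partial_corr applied to ranks."""
    return partial_corr(ranks(x), ranks(y), [ranks(c) for c in covariates])
```

In the setting described above, `x` would hold location error in the two-saccades condition, `y` the ROCF copy scores, and `covariates` the ACE-III total score, age and years of education, each as one list with one entry per participant.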

      Reviewer #3 (Recommendations for the authors):

      (1) The authors mention in the introduction the following: "One key hypothesis is that we use working memory across visual fixations to update perception dynamically", citing the following manuscript:

      Harrison, W. J., Stead, I., Wallis, T. S. A., Bex, P. J. & Mattingley, J. B. A computational account of transsaccadic attentional allocation based on visual gain fields. Proc. Natl. Acad. Sci. U.S.A. 121, e2316608121 (2024).

      However, the manuscript above does not refer explicitly to the involvement of working memory in transsaccadic integration of object location in space. Rather, it takes advantage of recent evidence showing how the true location of a visual object is represented in the activity of neurons in primary visual cortex (A. P. Morris, B. Krekelberg, A stable visual world in primate primary visual cortex. Curr. Biol. 29, 1471-1480.e6 (2019)). The model hypothesizes that true locations of objects are readily available, and then allocates attention in real-world coordinates, allowing efficient coordination of attention and saccadic eye movements.

      Thank you for the clarification. As suggested, we have now cited Morris & Krekelberg (2019) to acknowledge the evidence for stable object locations within the primary visual cortex.

      (2) The authors in the introduction and the title use the terms 'transsaccadic memory' and 'spatial working memory'. However, it is not clear whether these can be used interchangeably or reflect different constructs.

      Classical measures of visuo-spatial working memory are derived from the Corsi task (or similar), where the location of multiple objects is displayed and subsequently remembered. In such tasks, eye movements and saccades are not generally considered, only memory performance, representing the visuo-spatial span.

      Transsaccadic memory tasks instead explicitly measure performance on remembered object locations of features across explicit eye movements, usually using a very limited number of objects (1 or 2, as is the case for the current manuscript).

      While the two constructs share some features, it is not clear whether they represent the same underlying ability or not, especially because in transsaccadic tasks, participants are required to perform one or more saccades, thus representing a dual-task case.

      I think the relationship between 'transsaccadic memory' and 'spatial working memory' should be clarified in the manuscript.

      Thank you. We have added a passage to Methods (Measurement of saccade cost) clarifying that spatial working memory is the broad cognitive construct responsible for short-term maintenance, whereas transsaccadic memory is the specific, dynamic process of remapping representations to maintain stability across eye movements.

      In Methods (p.22):

      “Within this framework, it is important to distinguish between the broad construct of spatial working memory and the specific process of transsaccadic memory. While spatial working memory refers to the general ability to maintain spatial information over short intervals, transsaccadic memory describes the dynamic updating of these representations—termed remapping—to ensure stability across eye movements. Unlike classical 'static' measures of spatial working memory, such as the Corsi block task which focuses on memory span, transsaccadic memory tasks explicitly require the integration of stored visual information with motor signals from intervening saccades. Our paradigm treats transsaccadic updating as a core computational process within spatial working memory, where eye-centred representations are actively reconstructed based on noisy memories of the intervening saccade vectors.”

      (3) In Figure 1, the second row indicates the presentation of item 2. Indeed, in the condition 'saccade-after-item-1', the target in the second row of Figure 1 is displaced, as expected. This clarifies the direction and amplitude of the first saccade requested. However, from Figure 1, it is hard to understand the amplitude and direction of the second requested saccade. I think the figure should be updated, giving a full description of the direction and amplitude of the second saccade as well ('saccade-after-item-2' and 'two-saccades' conditions).

      We agree that making the figure legend more self-contained is beneficial for the reader. While the specific physical parameters and the trial sequence for each condition are detailed in the Results and Methods sections, we have now updated the legend for Figure 1 to explicitly define these details. Specifically, we have clarified that the colour wheel itself served as the target for the second instructed saccade (i.e., the movement from the second fixation cross to the colour wheel location). We have also included the quantitative constraint that all saccade vectors were at least 8.5 degrees of visual angle in amplitude. Given the limited space within a figure legend, we hope these concise additions provide the transparency requested without interrupting the conceptual flow of the diagram.

      Updated Figure 1 legend:

      “Participants were asked to fixate a white cross, wherever it appeared. They had to remember the colour and location of a sequence of two briefly presented coloured squares (Item 1 and 2), each appearing within a white square frame. They then fixated a colour wheel wherever it appeared on the screen, which served as the target for the second instructed saccade (i.e., a movement from the second fixation cross to the colour wheel location). This cued recall of a specific square (Item 1 or Item 2 labelled within the colour wheel). Participants selected the remembered colour on the colour wheel which led to a square of that colour appearing on the screen. They then dragged this square to its remembered location on the screen. Saccadic demands were manipulated by varying the locations of the second frame and the colour wheel, resulting in four conditions that differed in their reliance on retinotopic versus transsaccadic memory: (1) No-Saccade, which provided a baseline measure of within-fixation precision as no eye movements were required; (2) Saccade After Item 1; (3) Saccade After Item 2; (4) Saccades after both items (Two Saccades condition). In all conditions requiring eye movements, saccade vectors were constrained to a minimum amplitude of 8.5° (degrees of visual angle). While the No-Saccade condition isolates retinotopic working memory, conditions (2) to (4) collectively quantify the impact of varying saccadic demands and timings on the maintenance of spatial information, thereby assessing the efficacy of the transsaccadic updating process.”

      (4) The authors write: "Eye tracking analysis confirmed high compliance: participants correctly maintained fixation or executed saccades as instructed on the vast majority of trials (83% ± 14%). Non-compliant trials were excluded from further analysis." 14% of excluded trials are a substantial fraction of trials, given the task requirements. Is this proportion of excluded trials different between experimental groups, and are experimental groups contributing equally to this proportion?

      We thank the reviewer for pointing this out, and we apologise for the confusion. The 83% figure was averaged across all four cohorts and all conditions; compliance was above 90% in the YC, EC and even AD groups, but dropped to roughly 60% in the PD group.

      We have now conducted a full analysis of compliant trial counts using a mixed ANOVA (4 saccade conditions x 4 cohorts). This analysis revealed a main effect of group (F(3, 80) = 8.06, p < 0.001), which was driven by lower compliance in the PD cohort (mean approx. 25.4 trials per condition) compared to the AD, EC, and YC cohorts (means ranging from 35.8 to 38.9 trials per condition). Crucially, however, the interaction between group and condition was not statistically significant (p = 0.151). This indicates that the relative impact of saccade demands on trial retention was consistent across all four groups.

      Because our primary behavioural measure—the saccade cost—is a within-subject comparison of impairment across conditions, these differences in absolute trial numbers do not introduce a systematic bias into our findings. Furthermore, even with the higher attrition in the PD group, we retained a sufficient number of high-quality trials (minimum mean of ~23 trials in the most demanding condition) to support robust trial-by-trial parameter estimation and valid statistical inference. We have updated the Results and Methods to reflect these details.

      In Results (p4):

      “To mitigate potential confounds, we monitored eye position throughout the experiment. Eye-tracking analysis confirmed high compliance in healthy adults, who followed instructions on the vast majority of trials (Younger Adults: 97.2 ± 5.2 %; Older Adults: 91.3 ± 20.4 %). The mean difference between these groups was negligible, representing just 1.25 trials per condition, and was not statistically significant (t(80) = 0.16, p = 1.000; see more in Methods – Eyetracking data analysis). Non-compliant trials were excluded from all further analyses.”

      In Methods (p27):

      “Eye-tracking analysis confirmed high compliance overall, with participants correctly maintaining fixation or executing saccades on the vast majority of trials (83% across all participants). A mixed ANOVA revealed a main effect of group on trial retention (F(3, 80) = 8.06, p < 0.001, partial η² = 0.23), primarily due to lower compliance in the PD cohort (YC: 97±4%; EC: 91±10%; AD: 95±5%; PD: 63±38%). Importantly, there was no significant interaction between group and saccade condition (F(3.36, 80) = 1.78, p = 0.15, partial η² = 0.008), suggesting that trial attrition was not disproportionately affected by specific task demands in any group.

      We acknowledge that this reduced trial count in the PD group represents a limitation for across-cohort comparison. However, the absolute number of compliant trials in the PD group (mean approx. 25 per condition) remained sufficient for robust trial-by-trial parameter estimation. Furthermore, the lack of a significant group-by-condition interaction confirms that the results reported for this cohort remain valid and that our primary finding of a selective spatial memory deficit is robust to these differences in data retention.”

      (5) Modelling

      (a) Degrees of freedom, cross-validation, number of parameters.

      I appreciate the effort in introducing and testing different models. The models increase in complexity and are based on different assumptions about the main drivers and mechanisms underlying the dependent variable. The models differ in the number of parameters. How are the differences in the number of parameters between models taken into account in the modelling analysis? Is there a cost associated with the extra parameters included in the more complex models?

      (b) Cross-validation and overfitting.

      Overfitting can occur when a model learns the training data but cannot generalize to novel datasets. Cross-validation is one approach that can be used to avoid overfitting. Was cross-validation (or other approaches) implemented in the fitting procedure against overfitting? Otherwise, the inference that can be derived from the modelled parameters can be limited.

      To address your concerns regarding model complexity and overfitting, we would like to clarify our use of Bayesian Model Selection (BMS). Unlike frequentist methods that often rely on cross-validation to assess generalisability, we used random-effects BMS based on the marginal likelihood (model evidence). This approach inherently implements Bayesian Occam’s Razor by integrating out the parameters. Under this framework, the use of the marginal likelihood for model selection provides a mathematically equivalent safeguard to frequentist cross-validation, as it evaluates the model's ability to generalise across the entire parameter space rather than just finding a maximum likelihood fit for the training data. Thus, models are penalised not just for the absolute number of parameters, but for their overall functional flexibility. A more complex model is only preferred if the improvement in model fit is substantial enough to outweigh this inherent penalty. The emergence of Model 7 as the winner (Bayes Factor = 6.11 against the next best alternative) confirms that its additional complexity is statistically justified.

      Furthermore, in this study we provided an external validation of these recovered parameters by demonstrating that they explain 62% of the variance in an independent, real-world, clinical task (ROCF copy). This empirical evidence confirms that our model captures robust mechanistic phenotypes rather than idiosyncratic noise. We have updated the Results and Discussion to explicitly state these.

      In Results: (p10)

      “We used random-effects Bayesian model selection to identify the most plausible generative model. This process relies on the marginal likelihood (model evidence), which inherently balances model fit against complexity—a principle often referred to as Occam’s razor.[26–28]”

      In Discussion: (p17)

      “Importantly, the risk of overfitting is mitigated by the Bayesian Model Selection framework; by utilising the marginal likelihood for model comparison, the procedure inherently penalises excessive model complexity and promotes generalisability.[26–28,43] This generalisability was further evidenced by the model's ability to predict performance on the independent ROCF task, confirming that these parameters represent robust mechanistic phenotypes rather than idiosyncratic fits to the initial dataset.”

      (6) n. of participants.

      (a) The authors write the following: "A total of healthy volunteers (21 young adults, mean age = 24.1 years; 21 older adults, mean age = 72.4 years) participated in this study. Their demographics are shown in Table 1. All participants were recruited locally in Oxford." However, Table 1 reports the data from more than 80 participants, divided into 4 groups. Details about the PD and AD groups are missing. Please clarify.

      We apologize for this lack of clarity in the text. We have rewritten and expanded the “Participants” section and corrected Table 2 in the Methods section to reflect the correct number of participants.

      In Methods (p20):

      “A total of 87 participants completed the study: 21 young healthy adults (YC), 21 older healthy adults (EC), 23 patients with Parkinson’s disease (PD), and 22 patients with Alzheimer’s disease (AD). Their demographic and clinical details are summarised in Table 2. Initially, 90 participants were recruited (22 YC, 21 EC, 25 PD, 22 AD); however, three individuals (1 YC and 2 PD) were excluded from all analyses due to technical issues during data acquisition.

      All participants were recruited locally in Oxford, UK. None were professional artists, had a history of psychiatric illness, or were taking psychoactive medications (excluding standard dopamine replacement therapy for PD patients). Young participants were recruited via the University of Oxford Department of Experimental Psychology recruitment system. Older healthy volunteers (all >50 years of age) were recruited from the Oxford Dementia and Ageing Research (OxDARE) database.

      Patients with PD were recruited from specialist clinics in the Oxfordshire area. All had a clinical diagnosis of idiopathic Parkinson’s disease and no history of other major neurological or psychiatric illnesses. While all patients were tested in their regular medication ‘ON’ state, the specific pharmacological profiles, including the exact types of medication (e.g., levodopa, dopamine agonists, or combinations) and dosages (e.g., levodopa equivalent doses), were not systematically recorded. Disease duration and PD severity were also not recorded for this study.

      Patients with AD were recruited from the Cognitive Disorders Clinic at the John Radcliffe Hospital, Oxford, UK. All AD participants presented with a progressive, multidomain, predominantly amnestic cognitive impairment. Clinical diagnoses were supported by structural MRI and FDG-PET imaging consistent with a clinical diagnosis of AD dementia (e.g., temporo-parietal atrophy and hypometabolism).[70] All neuroimaging was reviewed independently by two senior neurologists (S.T. and M.H.).

      Global cognitive function was assessed using the Addenbrooke’s Cognitive Examination-III (ACE-III).[71] All healthy participants scored above the standard cut-off of 88, with the exception of one elderly participant who scored 85. In the PD group, two participants scored below the cut-off (85 and 79). In the AD group, six participants scored above 88; these individuals were included based on robust clinical and radiological evidence of AD pathology rather than their ACE-III score alone.”

      (b) As modelling results rely heavily on the quality of eye movements and eye traces, I believe it is necessary to report details about eye movement calibration quality and eye traces quality for the 4 experimental groups, as noisier data could be expected from naïve and possibly older participants, especially in case of clinical conditions. Potential differences in quality between groups should be discussed in light of the results obtained and whether these could contribute to the observed patterns.

      Thank you for pointing this out. We have revised the Methods about how calibration was done:

      (p27) “Prior to the experiment, a standard nine-point calibration and validation procedure was performed. Participants were instructed to fixate a small black circle with a white centre (0.5 degrees) as it appeared sequentially at nine points forming a 3 x 3 grid across the screen. Calibration was accepted only if the mean validation error was below 0.5 degrees and the maximum error at any single point was below 1.0 degree. If these criteria were not met, or if the experimenter noticed significant gaze drift between blocks, the calibration procedure was repeated. This calibration ensured high spatial accuracy across the entire display area, facilitating the precise monitoring of fixations on item frames and saccadic movements to the response colour wheel.”
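      The acceptance rule described in the quoted Methods text can be sketched as follows. This is a hedged illustration rather than the authors' acquisition code: the function name, default thresholds as arguments, and the example error values are assumptions; only the two criteria (mean validation error below 0.5 degrees, maximum single-point error below 1.0 degree) come from the text.

```python
def calibration_accepted(errors_deg, mean_limit=0.5, max_limit=1.0):
    """Return True if a set of per-point validation errors (in degrees)
    satisfies both acceptance criteria quoted in the Methods:
    mean error below `mean_limit` and largest error below `max_limit`."""
    if not errors_deg:
        raise ValueError("at least one validation point is required")
    mean_error = sum(errors_deg) / len(errors_deg)
    return mean_error < mean_limit and max(errors_deg) < max_limit


if __name__ == "__main__":
    # Hypothetical nine-point validation results (degrees of visual angle).
    good = [0.3] * 9
    one_bad_point = [0.3] * 8 + [1.2]  # mean is fine, one point is too far off
    print(calibration_accepted(good))           # accepted
    print(calibration_accepted(one_bad_point))  # rejected: repeat calibration
```

      Checking the maximum as well as the mean matters because a single poorly calibrated corner can be hidden by a good average.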

      Moreover, as detailed in our response to Point 4, while the PD group exhibited lower compliance, there was no interaction between group and saccade condition for compliance (p = 0.151). This confirms that any noise or trial attrition was distributed evenly across experimental conditions. Consequently, the observed "saccade cost" (the difference in error between conditions) is not an artefact of unequal noise but represents a genuine mechanistic impairment in spatial updating. We have updated the Methods to clarify this distinction.

      Furthermore, our Bayesian framework explicitly estimates precision (random noise) as a distinct parameter from updating cost (saccade cost). This allows the model to partition the variance: even if a clinical group is "noisier" overall, this is captured by the precision parameter, ensuring it does not inflate the specific estimate of saccade-driven memory impairment.

      (7) Figure 5. I suggest reporting these results using boxplots instead of barplots, as the former gives a better overview of the distributions.

      We appreciate the suggestion to use boxplots to better illustrate data distributions. However, we have chosen to retain the current bar plot format due to the visual and statistical complexity of our 4 x 4 x 2 experimental design. Figure 5 represents 16 distinct distributions across four groups and four conditions for both location and colour measures; employing boxplots/violins for this density of data would significantly increase visual clutter and make the figure difficult to parse.

      Furthermore, the primary objective of this figure is to reflect the statistical analysis: to illustrate group differences in overall performance and to highlight the specific finding that patients with AD were significantly more impaired across all conditions than the YC, EC, and PD groups. Our statistical focus remains on the mean effects, specifically the significant main effect of group (F(3, 318) = 59.71, p < 0.001) and the critical null interaction between group and condition (p = 0.90). The error measure most relevant to these comparisons is the standard error of the mean (SEM), rather than the interquartile range (IQR). We think that bar plots provide the most straightforward and scannable representation of these mean differences and the consistent pattern of decay across cohorts for the final manuscript layout.

      To address the reviewer’s request for distributional transparency, we have provided a version of Figure 5 using grouped boxplots in the supplementary material (Supplementary figure 2). We note, however, that the spread of raw data points in these plots does not directly reflect the variance associated with our within-subject statistical comparisons.

      (8) Results specificity, trans-saccadic integration and ROCF. The authors demonstrate that the derived model parameters account for a significant amount of variability in ROCF performance across the experimental groups tested (Figure 8A). However, it remains unclear how specific the modelling results are with respect to the ROCF.

      The ROCF is generally interpreted as a measure of constructional ability. Nevertheless, as with any cognitive test, performance can also be influenced by more general, non-specific abilities that contribute broadly to test success. To more clearly link the specificity between modelling results and constructional ability, it would be helpful to include a test measure for which the model parameters would not be expected to explain performance, for example, a verbal working memory task.

      I am not necessarily suggesting that new data should be collected. However, I believe that the issue of specificity should be acknowledged and discussed as a potential limitation in the current context.

      We appreciate this important point regarding the discriminant validity of our findings. We agree that cognitive performance in clinical populations is often influenced by a general "g-factor" or non-specific executive decline. However, we chose the ROCF Copy task specifically because it is a hallmark clinical measure of constructional ability that effectively serves as a real-world transsaccadic task, requiring participants to integrate spatial information across hundreds of saccades between the model figure and the drawing surface.

      To address the reviewer’s concern regarding specificity, we leveraged the fact that all participants completed the ACE-III, which includes a dedicated verbal memory component (the ACE Memory subscale). We conducted a partial correlation analysis and found that the relationship between transsaccadic working memory and ROCF copy performance remains highly significant (rho = -0.46, p < 0.001), even after controlling for age, education, and the ACE-III Memory subscale score. This suggests that the link between transsaccadic updating and constructional ability is mechanistically specific rather than a byproduct of global cognitive impairment. We have substantially revised the Discussion to highlight this link and the supporting statistical evidence.
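      For readers unfamiliar with the technique, a rank-based (Spearman) partial correlation can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' analysis code: it uses only the standard library, applies the textbook recursive formula for partial correlation to rank-transformed data, and all variable names and example values are hypothetical.

```python
from math import sqrt


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def partial_corr(x, y, covariates):
    """Partial correlation of x and y given a list of covariates,
    removing one covariate at a time with the recursive formula."""
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[0], covariates[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_spearman(x, y, covariates):
    """Spearman partial correlation: rank-transform, then partial out."""
    return partial_corr(ranks(x), ranks(y), [ranks(c) for c in covariates])


if __name__ == "__main__":
    # Hypothetical scores for six participants.
    task_error = [1, 2, 3, 4, 5, 6]
    copy_score = [6, 5, 4, 3, 2, 1]
    memory_subscale = [2, 1, 2, 1, 2, 1]
    print(partial_spearman(task_error, copy_score, [memory_subscale]))
```

      In practice a statistics package would also supply the p-value; the sketch only shows how the covariates are removed before the association is assessed.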

      We first updated the last paragraph of Introduction:

      “Finally, by linking these mechanistic parameters to a standard clinical measure of constructional ability (the Rey-Osterrieth Complex Figure task), we demonstrate that transsaccadic updating represents a core computational phenotype underpinning real-world visuospatial construction in both health and neurodegeneration.”

      The new section in Discussion highlighting the ROCF copy link:

      “Importantly, our computational framework establishes a direct mechanistic link between transsaccadic updating and real-world constructional ability. Specifically, higher saccade and angular encoding errors contribute to poorer ROCF copy scores. By mapping these mechanistic estimates onto clinical scores, we found that the parameters derived from our winning model explain approximately 62% of the variance in constructional performance across groups. These findings suggest that the computational parameters identified in the LOCUS task represent core phenotypes of visuospatial ability, providing a mechanistic bridge between basic cognitive theory and clinical presentation.

      This relationship provides novel insights into the cognitive processes underlying drawing, specifically highlighting the role of transsaccadic working memory. Previous research has primarily focused on the roles of fine motor control and eye-hand coordination in this skill.[4,50–55] This is partly because of a consistent failure to find a strong relation between traditional memory measures and copying ability.[4,31] For instance, common measures of working memory, such as digit span and Corsi block tasks, do not directly predict ROCF copying performance.[31,56] Furthermore, in patients with constructional apraxia, performance on these memory measures often remains relatively preserved despite significant drawing impairments.[56–58] In the literature, this lack of association has often been attributed to “deictic” visual-sampling strategies, characterised by frequent eye movements that treat the environment as an external memory buffer, thereby minimising the need to maintain a detailed internal representation.[4,59] As a real-world copying task, the ROCF requires a high volume of saccades, making it uniquely sensitive to the precision of the dynamic remapping signals identified here. Recent eye-tracking evidence confirms that patients with AD exhibit significantly more saccades and longer fixations during figure copying compared to controls, potentially as a compensatory response to transsaccadic working memory constraints.[56] This high-frequency sampling (averaging between 150 and 260 saccades for AD patients compared to approximately 100 for healthy controls) renders the task highly dependent on the precision of dynamic remapping signals.[56] We also found that the relationship between transsaccadic working memory and ROCF performance remains highly significant (rho = -0.46, p < 0.001), even after controlling for age, education, and ACE-III Memory subscore. Consequently, transsaccadic updating may represent a discrete computational phenotype required for visuomotor control, rather than a non-specific proxy for global cognitive decline.[58]

      In other words, even when visual information is readily available in the world, drawing performance depends critically on working memory across saccades. This reveals a fundamental computational trade-off: while active sampling strategies (characterised by frequent eye-hand movements) effectively reduce the load on capacity-limited working memory, they simultaneously increase the demand for precise spatial updating across eye movements. By treating the external world as an "outside" memory buffer, the brain minimises the volume of information it must hold internally, but it becomes entirely dependent on the reliability with which that information is remapped after each eye movement. This perspective aligns with, rather than contradicts, the traditional view of active sampling, which posits that individuals adapt their gaze and memory strategies based on specific task demands.[3,60] Furthermore, this perspective provides a mechanistic framework for understanding constructional apraxia; in these clinical populations, the impairment may not lie in a reduced memory "span," but rather in the cumulative noise introduced by the constant spatial remapping required during the copying process.[58,61]

      Beyond constructional ability, these findings suggest that the primary evolutionary utility of high-resolution spatial remapping lies in the service of action rather than perception. While spatial remapping is often invoked to explain perceptual stability,[11–13,15] the necessity of high-resolution transsaccadic memory for basic visual perception is debated.[13,62–64] A prevailing view suggests that detailed internal models are unnecessary for perception, given the continuous availability of visual information in the external world.[13,44] Our findings support an alternative perspective, aligning with the proposal that high-resolution transsaccadic memory primarily serves action rather than perception.[13] This is consistent with the need for precise localisation in eye-hand coordination tasks such as pointing or grasping.[65] Even when unaware of intrasaccadic target displacements, individuals rapidly adjust their reaching movements, suggesting direct access of the motor system to remapping signals.[66] Further support comes from evidence that pointing to remembered locations is biased by changes in eye position,[67] and that remapping neurons reside within the dorsal “action” visual pathway, rather than the ventral “perception” visual pathway.[13,68,69] By demonstrating a strong link between transsaccadic working memory and drawing (a complex fine motor skill), our findings suggest that precise visual working memory across eye movements plays an important role in complex fine motor control.”

      We are deeply grateful to the reviewers for their meticulous reading of our manuscript and for the constructive feedback provided throughout this process. Your insights have significantly enhanced the clarity and rigour of our work.

      In addition to the changes requested by the reviewers, we wish to acknowledge a reporting error identified during the revision process. In the original Results section, the repeated measures ANOVA statistics for YC included Greenhouse-Geisser corrections, and the between-subjects degrees of freedom were incorrectly reported as within-subjects residuals. Upon re-evaluation of the data, we confirmed that the assumption of sphericity was not violated; therefore, we have removed the unnecessary Greenhouse-Geisser corrections and corrected the degrees of freedom throughout the Results and Methods sections. We have ensured that these statistical updates are reflected accurately in the revised manuscript and that they do not alter the significance or interpretation of any of our primary findings.

      We hope that these revisions address all the concerns raised and provide a more robust account of our findings. We look forward to your further assessment of our work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Polymers of orthophosphate of varying lengths are abundant in prokaryotes and some eukaryotes, where they regulate many cellular functions. Though they exist in metazoans, few tools exist to study their function. This study documents the development of tools to extract, measure, and deplete inorganic polyphosphates in *Drosophila*. Using these tools, the authors show:

      (1) That polyP levels are negligible in embryos and larvae of all stages while they are feeding. They remain high in pupae but their levels drop in adults.

      (2) That many cells in tissues such as the salivary glands, oocytes, haemocytes, imaginal discs, optic lobe, muscle, and crop, have polyP that is either cytoplasmic or nuclear (within the nucleolus).

      (3) That polyP is necessary in plasmatocytes for blood clotting in Drosophila.

      (4) That polyP controls the timing of eclosion.

      The tools developed in the study are innovative, well-designed, tested, and well-documented. I enjoyed reading about them and I appreciate that the authors have gone looking for the functional role of polyP in flies, which hasn't been demonstrated before. The documentation of polyP in cells is convincing, as is its role in plasmatocytes in clotting.

      We sincerely thank the reviewer for their encouraging assessment and for recognizing both the innovation of the FLYX toolkit and the functional insights it enables. Their remarks underscore the importance of establishing Drosophila as a tractable model for polyP biology, and we are grateful for their constructive feedback, which further strengthened the manuscript.

      Its control of eclosion timing, however, could result from non-specific effects of expressing an exogenous protein in all cells of an animal.

      We now explicitly state this limitation in the revised manuscript (p.16, l.347–349). The issue is that no catalytic-dead ScPpX1 is available as a control in the field. We plan to generate such mutants through systematic structural and functional studies and will update the FLYX toolkit once they are developed and validated. Importantly, the accelerated eclosion phenotype is reproducible and correlates with endogenous polyP dynamics.

      The RNAseq experiments and their associated analyses on polyP-depleted animals and controls have not been discussed in sufficient detail. In its current form, the data look to be extremely variable between replicates and I'm therefore unsure of how the differentially regulated genes were identified.

      We thank the reviewer for pointing out the lack of clarity. We have expanded our RNAseq analysis in the revised manuscript (p.20, l.430–434). Because of inter-sample variation (PC2 = 19.10%, Fig. S7B), we employed Gene Set Enrichment Analysis (GSEA) rather than strict DEG cutoffs. This method is widely used when the goal is to capture pathway-level changes under variability (1). We now also highlight this limitation explicitly (p.20, l.430–432) and provide an additional table with gene-specific fold change (See Supplementary Table for RNA Sequencing Sheet 1). Please note that we have moved RNAseq data to Supplementary Fig. 7 and 8 as suggested in the review.
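      To make the pathway-level logic concrete, the running-sum statistic at the core of gene set enrichment analysis can be sketched as below. This is a deliberately simplified, unweighted version and not the authors' pipeline: standard GSEA additionally weights hits by the ranking metric and assesses significance by permutation, both of which are omitted here. The gene identifiers are hypothetical placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down a list of genes ranked from most up-regulated to most
    down-regulated. Step up by 1/n_hit at each gene in the set and down by
    1/n_miss otherwise; return the running value furthest from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]  # hypothetical gene IDs
    print(enrichment_score(ranked, {"g1", "g2"}))  # set clustered at the top
    print(enrichment_score(ranked, {"g5", "g6"}))  # set clustered at the bottom
```

      Because the score depends on where the members of a set fall in the ranking rather than on each gene crossing a fixed threshold, coordinated but modest shifts can still be detected when replicate variability is high.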

      It is interesting that no kinases and phosphatases have been identified in flies. Is it possible that flies are utilising the polyP from their gut microbiota? It would be interesting to see if these signatures go away in axenic animals.

      This is an interesting possibility. Several observations argue that polyP is synthesized by fly tissues: (i) polyP levels remain very low during feeding stages but build up in wandering third instar larvae after feeding ceases; (ii) PPBD staining is absent from the gut except the crop (Fig. S3O–P); (iii) in C. elegans, intestinal polyP was unaffected when worms were fed polyP-deficient bacteria (2); (iv) depletion of polyP from plasmatocytes alone impairs hemolymph clotting, which would not be expected if gut-derived polyP were the major source of hemolymph polyP. Nevertheless, we agree that microbiota-derived polyP may contribute, and we plan systematic testing in axenic flies in future work.

      Reviewer #2 (Public review):

      Summary:

      The authors of this paper note that although polyphosphate (polyP) is found throughout biology, the biological roles of polyP have been under-explored, especially in multicellular organisms. The authors created transgenic Drosophila that expressed a yeast enzyme that degrades polyP, targeting the enzyme to different subcellular compartments (cytosol, mitochondria, ER, and nucleus, terming these altered flies Cyto-FLYX, Mito-FLYX, etc.). The authors show the localization of polyP in various wild-type fruit fly cell types and demonstrate that the targeting vectors did indeed result in the expression of the polyP degrading enzyme in the cells of the flies. They then go on to examine the effects of polyP depletion using just one of these targeting systems (the Cyto-FLYX). The primary findings from the depletion of cytosolic polyP levels in these flies are that it accelerates eclosion and also appears to participate in hemolymph clotting. Perhaps surprisingly, the flies seemed otherwise healthy and appeared to have few other noticeable defects. The authors use transcriptomics to try to identify pathways altered by the cyto-FLYX construct degrading cytosolic polyP, and it seems likely that their findings in this regard will provide avenues for future investigation. And finally, although the authors found that eclosion is accelerated in the pupae of Drosophila expressing the Cyto-FLYX construct, the reason why this happens remains unexplained.

      Strengths:

      The authors capitalize on the work of other investigators who had previously shown that expression of recombinant yeast exopolyphosphatase could be targeted to specific subcellular compartments to locally deplete polyP, and they also use a recombinant polyP-binding protein (PPBD) developed by others to localize polyP. They combine this with the considerable power of Drosophila genetics to explore the roles of polyP by depleting it in specific compartments and cell types to tease out novel biological roles for polyP in a whole organism. This is a substantial advance.

      We are grateful to the reviewer for their thorough and thoughtful evaluation. Their balanced summary of our work, recognition of the strengths of our genetic tools, and constructive suggestions have been invaluable in clarifying our experiments and strengthening the conclusions.

      Weaknesses:

      Page 4 of the Results (paragraph 1): I'm a bit concerned about the specificity of PPBD as a probe for polyP. The authors show that the fusion partner (GST) isn't responsible for the signal, but I don't think they directly demonstrate that PPBD is binding only to polyP. Could it also bind to other anionic substances? A useful control might be to digest the permeabilized cells and tissues with polyphosphatase prior to PPBD staining and show that the staining is lost.

      To address this concern, we have done two sets of experiments:

      (1) We generated a PPBD mutant (GST-PPBD<sup>Mut</sup>). Using Microscale Thermophoresis, we establish that GST-PPBD binds to polyP<sub>100</sub>-2X-FITC, whereas GST-PPBD<sup>Mut</sup> and GST do not. We found that, unlike the punctate staining pattern of GST-PPBD (wild-type), GST-PPBD<sup>Mut</sup> does not stain hemocytes. This data has been added to the revised manuscript (Fig. 2B-D, p.8, l.151–165).

      (2) A study in C. elegans by Quarles et al. performed an experiment similar to the one suggested by the reviewer. In that study, treating permeabilized tissues with polyphosphatase prior to PPBD staining resulted in a decrease of PPBD-GFP signal from the tissues (2). We performed the same experiment, incubating fixed and permeabilised hemocytes with ScPpX1 or heat-inactivated ScPpX1 protein prior to GST-PPBD staining. We find that both staining intensity and the number of punctae are higher in hemocytes left untreated and in those treated with heat-inactivated ScPpX1. The hemocytes pre-treated with ScPpX1 showed reduced staining intensity and number of punctae. This data has been added to the revised manuscript (Fig. 2E-G, p.8, l.166-172).

      Further, Saito et al. reported that PPBD binds to polyP in vitro, as well as in yeast and mammalian cells, with a high affinity of ~45µM for longer polyP chains (35 mer and above) (3). They also show that the affinity of PPBD with RNA and DNA is very low. Furthermore, PPBD could detect differences in polyP labeling in yeasts grown under different physiological conditions that alter polyP levels (3). Taken together, published work and our results suggest that PPBD specifically labels polyP.

      In the hemolymph clotting experiments, the authors collected 2 ul of hemolymph and then added 1 ul of their test substance (water or a polyP solution). They state that they added either 0.8 or 1.6 nmol polyP in these experiments (the description in the Results differs from that of the Methods). I calculate this will give a polyP concentration of 0.3 or 0.6 mM. This is an extraordinarily high polyP concentration and is much in excess of the polyP concentrations used in most of the experiments testing the effects of polyP on clotting of mammalian plasma. Why did the authors choose this high polyP concentration? Did they try lower concentrations? It seems possible that too high a polyP concentration would actually have less clotting activity than the optimal polyP concentration.

      We repeated the assays using 125 µM polyP, consistent with concentrations employed in mammalian plasma studies (4,5). Even at this lower, physiologically relevant concentration, polyP significantly enhanced clot fibre formation (Included as Fig. S5F–I, p.12, l.241–243). This reconfirms the conclusion that polyP promotes hemolymph clotting.

      Author response image 1.
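      The concentration arithmetic discussed in this exchange can be checked with a short sketch. It assumes, as the reviewer does, a final volume of 3 µl (2 µl hemolymph plus 1 µl test solution) and treats the stated nmol amounts at face value; whether the 125 µM assay used the same volumes is an assumption here, and the function names are hypothetical.

```python
def final_concentration_mM(amount_nmol, total_volume_ul):
    """Concentration in mM; 1 nmol per µl is numerically equal to 1 mM."""
    return amount_nmol / total_volume_ul


def amount_for_concentration_nmol(conc_uM, total_volume_ul):
    """Amount (nmol) needed to reach a target concentration given in µM."""
    return conc_uM / 1000.0 * total_volume_ul


if __name__ == "__main__":
    total_ul = 2.0 + 1.0  # hemolymph plus added test solution (assumed)
    for amount in (0.8, 1.6):
        conc = final_concentration_mM(amount, total_ul)
        print(f"{amount} nmol in {total_ul} µl -> {conc:.2f} mM")
    needed = amount_for_concentration_nmol(125.0, total_ul)
    print(f"125 µM in {total_ul} µl requires {needed:.3f} nmol")
```

      Under these assumptions the two original doses correspond to roughly 0.27 and 0.53 mM, the same order as the reviewer's estimate, while 125 µM corresponds to 0.375 nmol in the same volume.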

      Reviewer #3 (Public review):

      Summary:

      Sarkar, Bhandari, Jaiswal, and colleagues establish a suite of quantitative and genetic tools to use Drosophila melanogaster as a model metazoan organism to study polyphosphate (polyP) biology. By adapting biochemical approaches for use in D. melanogaster, they identify a window of increased polyP levels during development. Using genetic tools, they find that depleting polyP from the cytoplasm alters the timing of metamorphosis, accelerating eclosion. By adapting subcellular imaging approaches for D. melanogaster, they observe polyP in the nucleolus of several cell types. They further demonstrate that polyP localizes to cytoplasmic puncta in hemocytes, and further that depleting polyP from the cytoplasm of hemocytes impairs hemolymph clotting. Together, these findings establish D. melanogaster as a tractable system for advancing our understanding of polyP in metazoans.

      Strengths:

      (1) The FLYX system, combining cell type and compartment-specific expression of ScPpx1, provides a powerful tool for the polyP community.

      (2) The finding that cytoplasmic polyP levels change during development and affect the timing of metamorphosis is an exciting first step in understanding the role of polyP in metazoan development, and possible polyP-related diseases.

      (3) Given the significant existing body of work implicating polyP in the human blood clotting cascade, this study provides compelling evidence that polyP has an ancient role in clotting in metazoans.

      We sincerely thank the reviewer for their generous and insightful comments. Their recognition of both the technical strengths of the FLYX system and the broader biological implications reinforces our confidence that this work will serve as a useful foundation for the community.

      Limitations:

      (1) While the authors demonstrate that HA-ScPpx1 protein localizes to the target organelles in the various FLYX constructs, the capacity of these constructs to deplete polyP from the different cellular compartments is not shown. This is an important control to both demonstrate that the GST-PPBD labeling protocol works, and also to establish the efficacy of compartment-specific depletion. While not necessary to do this for all the constructs, it would be helpful to do this for the cyto-FLYX and nuc-FLYX.

      We confirmed polyP depletion in Cyto-FLYX using the malachite green assay (Fig. 3D, p.10, l.212–214). The efficacy of ScPpX1 has also been demonstrated previously in mammalian mitochondria (6). Our preliminary data from Mito-ScPpX1 expressed ubiquitously with Tubulin-Gal4 showed a reduction in polyP levels when estimated from whole flies (see Author response image 2 below; ongoing investigation). In an independent study focusing on mitochondrial polyP depletion, we are characterizing these lines in detail and plan to determine the amount of polyP contributed to the cellular pool by mitochondria using subcellular fractionation. Direct phenotypic and polyP depletion analyses of Nuc-FLYX and ER-FLYX are also being carried out but are at a preliminary stage. Two challenges have been that polyP levels differ between tissues and that subcellular fractionation yields very little material for polyP analysis. This work requires detailed, independent, and careful analysis, and we therefore refrain from adding these data to the current manuscript.

      Author response image 2.

      Regarding the specificity, Saito et al. reported that PPBD binds to polyP in vitro, as well as in yeast and mammalian cells, with a high affinity of ~45µM for longer polyP chains (35 mer and above) (3). They also show that the affinity of PPBD for RNA and DNA is very low. Further, PPBD could reveal differences in polyP labeling in yeasts grown under different physiological conditions that alter polyP levels.

      To address this concern, we performed two sets of experiments, which are now included in the manuscript to show the specificity of PPBD:

      We generated a PPBD mutant (GST-PPBD<sup>Mut</sup>). Using Microscale Thermophoresis, we established that GST-PPBD binds to polyP<sub>100</sub>-2X-FITC, whereas GST-PPBD<sup>Mut</sup> and GST do not bind polyP<sub>100</sub>-2X-FITC at all. We also found that, unlike wild-type GST-PPBD, which gives a punctate staining pattern, GST-PPBD<sup>Mut</sup> does not stain hemocytes. These data have been added to the revised manuscript (Fig. 2B-D, p.8, l.151–165).

      Quarles et al. performed an experiment in C. elegans similar to the one suggested by the reviewer. In that study, treating permeabilized tissues with polyphosphatase prior to PPBD staining decreased the PPBD-GFP signal from the tissues (2). We performed the same experiment, incubating fixed and permeabilised hemocytes with ScPpX1 or heat-inactivated ScPpX1 protein before GST-PPBD staining. Both the staining intensity and the number of punctae were higher in hemocytes that were left untreated or treated with heat-inactivated ScPpX1, whereas hemocytes pre-treated with ScPpX1 showed reduced staining intensity and fewer punctae. These data have been added to the revised manuscript (Fig. 2E-G, p.8, l.166-172).

      (2) The cell biological data in this study clearly indicates that polyP is enriched in the nucleolus in multiple cell types, consistent with recent findings from other labs, and also that polyP affects gene expression during development. Given that the authors also generate the Nuc-FLYX construct to deplete polyP from the nucleus, it is surprising that they test how depleting cytoplasmic but not nuclear polyP affects development. However, providing these tools is a service to the community, and testing the phenotypic consequences of all the FLYX constructs may arguably be beyond the scope of this first study.

      We agree this is an important avenue. In this first study, we focused on establishing the toolkit and reporting phenotypes with Cyto-FLYX. We are systematically assaying phenotypes from all FLYX constructs, including Nuc-FLYX, in ongoing studies.

      Recommendations for the authors:

      Reviewing Editor Comment:

      The reviewers appreciated the general quality of the rigour and work presented in this manuscript. We also had a few recommendations for the authors. These are listed here and the details related to them can be found in the individual reviews below.

      (1) We suggest including an appropriate control to show that PPBD binds polyP specifically.

      We have updated the response section as follows:

      (a) Highlighted previous literature that showed the specificity of PPBD.

      (b) We show that the punctate staining observed by PPBD is not demonstrated by the mutant PPBD (PPBD<sup>Mut</sup>) in which amino acids that are responsible for polyP binding are mutated.

      (c) We show that PPBD<sup>Mut</sup> does not bind to polyP using Microscale Thermophoresis.

      (d) We show that treatment of fixed and permeabilised hemocytes with ScPpX1 reduces the PPBD staining intensity and number of punctae, as compared to tissues left untreated or treated with heat-inactivated ScPpX1.

      We have included these in our revised manuscript (Fig. 2B-G, p.8, l.151–157).

      (2) The high concentration of PolyP in the clotting assay might be impeding clotting. The authors may want to consider lowering this in their assays.

      We have addressed this concern in our revised manuscript. We have performed the clotting assays with lower polyP concentrations (concentrations previously used in clotting experiments with human blood and polyP). Data is included in Fig. S5F–I, p.12, l.241–243.

      (3) The RNAseq study: can the authors please describe this better and possibly mine it for the regulation of genes that affect eclosion?

      In our revised manuscript, we have included a broader discussion about the RNAseq analysis done in the article in both the ‘results’ and the ‘discussion’ sections, where we have rewritten the narrative from the perspective of accelerated eclosion. (p.15 l.310-335, p. 20, l.431-446).

      (4) Have the authors considered the possibility that the gut microbiota might be contributing to some of their measurements and assays? It would be good to address this upfront - either experimentally, in the discussion, or (ideally) both.

      This is an exciting possibility. Several observations argue that fly tissues synthesize polyP: (i) polyP levels remain very low during feeding stages but build up in wandering third instar larvae after feeding ceases; (ii) PPBD staining is absent from the gut except the crop (Fig. S3O–P); (iii) in C. elegans, intestinal polyP was unaffected when worms were fed polyP-deficient bacteria (2); (iv) depletion of polyP from plasmatocytes alone impairs hemolymph clotting, which would not be expected if gut-derived polyP were the major contributor to polyP in hemolymph. Nevertheless, microbiota-derived polyP may contribute, and we plan systematic testing in axenic flies in future work.

      Reviewer #1 (Recommendations for the authors):

      (1) While the authors have shown that the depletion tool results in a general reduction of polyP levels in Figure 3D, it would have been nice to show this via IHC. Particularly since the depletion depends on the strength of the Gal4, it is possible that the phenotypes are being under-estimated because the depletions are weak.

      We agree that different Gal4 lines have different strengths and will therefore affect polyP levels and the strength of the phenotype differently.

      We performed PPBD staining on hemocytes expressing ScPpX1; however, we observed very intense, uniform staining throughout the cells, which was unexpected. It appears that PPBD recognizes overexpressed ScPpX1. Indeed, an unpublished study by Manisha Mallick (Bhandari lab) found that His-ScPpX1 specifically interacts with GST-PPBD in a protein interaction assay (see Author response image 3). Because of these issues, we refrained from IHC/PPBD-based validation.

      Author response image 3.

      (2) The subcellular tools for depletion are neat! I wonder why the authors didn't test them. For example in the salivary gland for nuclear depletion?

      We have addressed this question in the reviewer responses. We are systematically assaying phenotypes from all FLYX constructs, including Mito-FLYX and Nuc-FLYX, in ongoing independent investigations. As discussed in #1, a possible interaction between ScPpX1 and PPBD makes this test more challenging, and hence each construct requires a detailed investigation.

      (a) Does the absence of clotting defects using Lz-gal4 suggest that PolyP is more crucial in the plasmatocytes and for the initial clotting process? And that it is dispensable/less important in the crystal cells and for the later clotting process? Or is it that the crystal cells just don't have as much polyP? The image (2E-H) certainly looks like it.

      In hemolymph, primary clot formation results from clotting factors secreted by the fat bodies and the plasmatocytes. The crystal cells release factors that help harden the soft clot initially formed. Reports suggest that clotting and melanization of the clot are independent of each other (7). Since crystal cells do not contribute to clot fibre formation, the absence of clotting defects using LzGAL4-CytoFLYX is not surprising. Alternatively, polyP may be secreted from all hemocytes and contribute to clotting; however, crystal cells make up only 5% of hemocytes, and hence polyP depletion in those cells may have a negligible effect on blood clotting.

      Crystal cells do show PPBD staining. Whether polyP levels are significantly lower in crystal cells than in plasmatocytes needs more systematic investigation. Image (2E-H) is a representative image showing the presence of polyP in crystal cells and cannot be used to compare polyP levels between crystal cells and plasmatocytes.

      (b) The RNAseq analyses and data could be better presented. If the data are indeed variable and the differentially expressed genes of low confidence, I might remove that data entirely. I don't think it'll take away from the rest of the work.

      We understand this concern and, therefore, in the revised manuscript, we have included a broader discussion about the RNAseq analysis done in the article in both the ‘results’ and the ‘discussion’ sections, where we have rewritten the narrative from the perspective of accelerated eclosion. (p.15 l.310-335, p. 20, l.431-446). We have also stated the limitations of such studies.

      (c) I would re-phrase the first sentence of the results section.

      We have re-phrased it in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors created several different versions of the FLYX system that would be targeted to different subcellular compartments. They mostly report on the effects of cytosolic targeting, but some of the constructs targeted the polyphosphatase to mitochondria or the nucleus.

      They report that the targeting worked, but I didn't see any results on the effects of those constructs on fly viability, development, etc.

      There is a growing literature of investigators targeting polyphosphatase to mitochondria and showing how depleting mitochondrial polyP alters mitochondrial function. What was the effect of the Nuc-FLYX and Mito-FLYX constructs on the flies?

      Also, the authors should probably cite the papers of others on the effects of depleting mitochondrial polyP in other eukaryotic cells in the context of discussing their findings in flies.

      We have addressed this question in the reviewer responses. We did not see any obvious developmental or viability defects with any of the FLYX lines, and only after careful investigation did we come across the clotting defects in the CytoFLYX. We are currently systematically assaying phenotypes from all FLYX constructs, including Mito-FLYX and Nuc-FLYX, in independent ongoing investigations.

      We have discussed the heterologous expression of mitochondrial polyphosphatase in mammalian cells to justify the need for developing Mito-FLYX (p. 10, l. 197-200). In the discussion section, we also discuss the presence and roles of polyP in the nucleus and how Nuc-FLYX can help study such phenomena (p. 19, l. 399-407).

      (2) The authors should number the pages of their manuscript to make it easier for reviewers to refer to specific pages.

      We have numbered our lines and pages in the revised manuscript.

      (3) Abstract: the abbreviation, "polyP", is not defined in the abstract. The first word in the abstract is "polyphosphate", so it should be defined there.

      We have corrected it in the revised version.

      (4) The authors repeatedly use the phrase, "orange hot", to describe one of the colors in their micrographs, but I don't know how this differs from "orange".

      ‘OrangeHot’ is the name of the LUT used in the ImageJ analysis and is hence used to refer to the colour.

      (5) First page of the Introduction: the phrase, "feeding polyP to αβ expression Alzheimer's model of Caenorhabditis elegans" is awkward (it literally means feeding polyP to the model instead of the worms).

      We have revised it. (p.3, l.55-57).

      (6) Page 2 of the Introduction: The authors should cite this paper when they state that NUDT3 is a polyphosphatase: https://pubmed.ncbi.nlm.nih.gov/34788624/

      We have cited the paper in the revised version of the manuscript. (p.4, l. 68-70)

      (7) Page 2 of Results: The authors report the polyP content in the third instar larva (misspelled as "larval") to five significant digits ("419.30"). Their data do not support more than three significant digits, though.

      We have corrected it in the revised manuscript.

      (8) Page 3 of Results (paragraph 1): When discussing the polyP levels in various larval stages, the authors are extracting total polyP from the larvae. It seems that at least some of the polyP may come from gut microbes. This should probably be mentioned.

      This is an interesting possibility. Several observations argue that polyP is synthesized by fly tissues: (i) polyP levels remain very low during feeding stages but build up in wandering third instar larvae after feeding ceases; (ii) PPBD staining is absent from the gut except the crop (Fig. S3O–P); (iii) in C. elegans, intestinal polyP was unaffected when worms were fed polyP-deficient bacteria (2); (iv) depletion of polyP from plasmatocytes alone impairs hemolymph clotting, which would not be expected if gut-derived polyP were the major contributor to polyP in hemolymph. We mention this limitation in the revised manuscript (p.19-20, l. 425-433).

      (9) Page 3 of Results (paragraph 2): stating that the 4% paraformaldehyde works "best" is imprecise. What do the authors mean by "best"?

      We have addressed this comment in the revised manuscript and corrected it to state that 4% paraformaldehyde performed better than the other two fixation methods we tested, methanol and Bouin’s fixative (p.8, l. 152-154).

      (10) Page 4 of Results (paragraph 2, last line of the page): The scientific literature is vast, so one can never be sure that one knows of all the papers out there, even on a topic as relatively limited as polyP. Therefore, I would recommend qualifying the statement "...this is the first comprehensive tissue staining report...". It would be more accurate (and safer) to say something like, "to our knowledge, this is the first..." There is a similar statement with the word "first" on the next page regarding the FLYX library.

      We have addressed this concern and corrected it accordingly in the revised version of the manuscript (p.9, l. 192-193).

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should include in their discussion a comparison of cell biological observations using the polyP binding domain of E. coli Ppx (GST-PPBD) to fluorescently label polyP in cells and tissues with recent work using a similar approach in C. elegans (Quarles et al., PMID:39413779).

      In the revised manuscript, we have cited the work of Quarles et al. and have added a comparison of observations (p.19,l.408-410). In the discussion, we have also focused on multiple other studies about how polyP presence in different subcellular compartments, like the nucleus, can be assayed and studied with the tools developed in this study.

      (2) The gene expression studies of time-matched Cyto-FLYX vs WT larvae is very intriguing. Given the authors' findings that non-feeding third instar Cyto-FLYX larvae are developmentally ahead of WT larvae, can the observed trends be explained by known changes in gene expression that occur during eclosion? This is mentioned in the results section in the context of genes linked to neurons, but a broader discussion of which pathway changes observed can be explained by the developmental stage difference between the WT and FLYX larvae would be helpful in the discussion.

      We have included a broader discussion about the RNAseq analysis done in the article in both the ‘results’ and the ‘discussion’ sections, where we have rewritten the narrative from the perspective of accelerated eclosion. (p.15 l.310-335, p. 20, l.431-446). We have also stated the limitations of such studies.

      (3) The sentence describing NUDT3 is not referenced.

      We have addressed this comment and have cited the NUDT3 paper in the revised version of the manuscript (p.4, l. 68-70).

      (4) In the first sentence of the results section, the meaning/validity of the statement "The polyP levels have decreased as evolution progressed" is not clear. It might be more straightforward to give an estimate of the total pmoles polyP/mg protein difference between bacteria/yeast and metazoans.

      In the revised manuscript, we have given an estimate of the polyP content across various species across evolution to uphold the statement that polyP levels have decreased as evolution progressed (p. 5, l. 87-91).

      (5) The description of the malachite green assay in the results section describes it as "calorimetric" but this should read "colorimetric?"

      We have corrected it in the revised manuscript.

      References

      (1) Chicco D, Agapito G. Nine quick tips for pathway enrichment analysis. PLoS Comput Biol. 2022 Aug 11;18(8):e1010348.

      (2) Quarles E, Petreanu L, Narain A, Jain A, Rai A, Wang J, et al. Cryosectioning and immunofluorescence of C. elegans reveals endogenous polyphosphate in intestinal endo-lysosomal organelles. Cell Rep Methods. 2024 Oct 8;100879.

      (3) Saito K, Ohtomo R, Kuga-Uetake Y, Aono T, Saito M. Direct labeling of polyphosphate at the ultrastructural level in Saccharomyces cerevisiae by using the affinity of the polyphosphate binding domain of Escherichia coli exopolyphosphatase. Appl Environ Microbiol. 2005 Oct;71(10):5692–701.

      (4) Smith SA, Mutch NJ, Baskar D, Rohloff P, Docampo R, Morrissey JH. Polyphosphate modulates blood coagulation and fibrinolysis. Proc Natl Acad Sci USA. 2006 Jan 24;103(4):903–8.

      (5) Smith SA, Choi SH, Davis-Harrison R, Huyck J, Boettcher J, Rienstra CM, et al. Polyphosphate exerts differential effects on blood clotting, depending on polymer size. Blood. 2010 Nov 18;116(20):4353–9.

      (6) Abramov AY, Fraley C, Diao CT, Winkfein R, Colicos MA, Duchen MR, et al. Targeted polyphosphatase expression alters mitochondrial metabolism and inhibits calcium-dependent cell death. Proc Natl Acad Sci USA. 2007 Nov 13;104(46):18091–6.

      (7) Schmid MR, Dziedziech A, Arefin B, Kienzle T, Wang Z, Akhter M, et al. Insect hemolymph coagulation: Kinetics of classically and non-classically secreted clotting factors. Insect Biochem Mol Biol. 2019 Jun;109:63–71.

      (8) Guan J, Hurto RL, Rai A, Azaldegui CA, Ortiz-Rodríguez LA, Biteen JS, et al. HP-Bodies – Ancestral Condensates that Regulate RNA Turnover and Protein Translation in Bacteria. bioRxiv. 2025; 2025.02.06.636932. doi: https://doi.org/10.1101/2025.02.06.636932.

      (9) Lonetti A, Szijgyarto Z, Bosch D, Loss O, Azevedo C, Saiardi A. Identification of an evolutionarily conserved family of inorganic polyphosphate endopolyphosphatases. J Biol Chem. 2011 Sep 16;286(37):31966–74.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      This paper introduces a dual-pathway model for reconstructing naturalistic speech from intracranial ECoG data. It integrates an acoustic pathway (LSTM + HiFi-GAN for spectral detail) and a linguistic pathway (Transformer + Parler-TTS for linguistic content). Output from the two components is later merged via CosyVoice2.0 voice cloning. Using only 20 minutes of ECoG data per participant, the model achieves high acoustic fidelity and linguistic intelligibility.

      Strengths

      (1) The proposed dual-pathway framework effectively integrates the strengths of neural-to-acoustic and neural-to-text decoding and aligns well with established neurobiological models of dual-stream processing in speech and language.

      (2) The integrated approach achieves robust speech reconstruction using only 20 minutes of ECoG data per subject, demonstrating the efficiency of the proposed method.

      (3) The use of multiple evaluation metrics (MOS, mel-spectrogram R², WER, PER) spanning acoustic, linguistic (phoneme and word), and perceptual dimensions, together with comparisons against noise-degraded baselines, adds strong quantitative rigor to the study.

      We thank Reviewer #1 for the supportive comments and thoughtful feedback. By addressing these comments, we believe we have greatly improved the clarity of our claims and methodology. Below we list our point-by-point responses to the concerns raised by Reviewer #1.

      Weaknesses:

      (1) It is unclear how much the acoustic pathway contributes to the final reconstruction results, based on Figures 3B-E and 4E. Including results from Baseline 2 + CosyVoice and Baseline 3 + CosyVoice could help clarify this contribution.

      We thank Reviewer #1 for this suggestion. However, we believe that applying CosyVoice to the output of Baseline 2 or Baseline 3 in isolation is not methodologically feasible; it would not correctly elucidate the contribution of the acoustic pathway and might lead to misinterpretation.

      The role of CosyVoice 2.0 in our framework is specifically voice cloning and fusion, not standalone enhancement. It is designed to integrate information from two pathways. Its operation requires two key inputs:

      (1) A voice reference speech that provides the target speaker's timbre and prosodic characteristics. In our final pipeline, this is provided by the denoised output of the acoustic pathway (Baseline 2).

      (2) A target word sequence that specifies the linguistic content to be spoken. This is obtained by transcribing the output of the linguistic pathway (Baseline 3) using Whisper ASR.

      Therefore, the standalone outputs of Baseline 2 and Baseline 3 are the purest demonstrations of what each pathway contributes before fusion. The significant improvement in WER/PER and MOS in the final output (compared to Baseline 2) and the significant improvement in mel-spectrogram R² (compared to Baseline 3) together demonstrate the complementary contributions of the two pathways. The fusion via CosyVoice is the mechanism that allows these contributions to be combined. We have added a clearer explanation of CosyVoice's role and the rationale for not testing it on individual baselines in the revised manuscript (Results section: "The fine-tuned voice cloner further enhances...").
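
      The data flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `transcribe` and `clone_voice` are hypothetical placeholders standing in for Whisper ASR and CosyVoice 2.0, and they do not reflect the real interfaces of those tools.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Waveform = Sequence[float]  # placeholder type for an audio signal


@dataclass
class FusionInputs:
    """The two inputs the voice cloner needs."""
    reference_speech: Waveform  # speaker timbre and prosody (acoustic pathway)
    target_text: str            # linguistic content (linguistic pathway)


def build_fusion_inputs(
    acoustic_out: Waveform,
    linguistic_out: Waveform,
    transcribe: Callable[[Waveform], str],  # hypothetical ASR wrapper
) -> FusionInputs:
    """Pair the acoustic-pathway speech with the transcribed linguistic output."""
    return FusionInputs(
        reference_speech=acoustic_out,
        target_text=transcribe(linguistic_out),
    )


def fuse(
    inputs: FusionInputs,
    clone_voice: Callable[[Waveform, str], Waveform],  # hypothetical cloner wrapper
) -> Waveform:
    """Synthesize the target text in the voice of the reference speech."""
    return clone_voice(inputs.reference_speech, inputs.target_text)
```

      Because the cloner consumes one output from each pathway, neither baseline alone supplies both arguments, which is the reason given above for not running CosyVoice on a single baseline.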

      Edits:

      Page 11, Lines 277-282:

      “Voice cloning is used to bridge the gap between acoustic fidelity and linguistic intelligibility in speech reconstruction. This approach strategically combines the strengths of complementary pathways: the acoustic pathway preserves speaker-specific spectral characteristics while the linguistic pathway maintains lexical and phonetic precision. By integrating these components through neural voice cloning, we achieve balanced reconstruction that overcomes the limitations inherent in isolated systems. CosyVoice 2.0, the voice cloner module, serves specifically as a voice cloning and fusion engine, requiring two inputs: (1) a voice reference speech (provided by the denoised output of the acoustic pathway) to specify the target speaker's identity, and (2) a target word sequence (transcribed from the output of the linguistic pathway) to specify the linguistic content. The standalone baseline outputs of the two pathways can be integrated in this way.”

      (2) As noted in the limitations, the reconstruction results heavily rely on pre-trained generative models. However, no comparison is provided with state-of-the-art multimodal LLMs such as Qwen3-Omni, which can process auditory and textual information simultaneously. The rationale for using separate models (Wav2Vec for speech and TTS for text) instead of a single unified generative framework should be clearly justified. In addition, the adaptor employs an LSTM architecture for speech but a Transformer for text, which may introduce confounds in the performance comparison. Is there any theoretical or empirical motivation for adopting recurrent networks for auditory processing and Transformer-based models for textual processing?

      We thank the reviewer for the insightful suggestion regarding multimodal large language models (LLMs) such as Qwen3-Omni. It is important to clarify the distinction between general-purpose interactive multimodal models and models specifically designed for high-fidelity voice cloning and speech synthesis.

      As for the comparison with the state-of-the-art multimodal LLMs:

      Qwen3-Omni and GLM-4-Voice are powerful conversational agents capable of processing multiple modalities including text, speech, image, and video, as described in their documentation (see: https://help.aliyun.com/zh/model-studio/qwen-tts-realtime and https://docs.bigmodel.cn/cn/guide/models/sound-and-video/glm-4-voice). However, they are primarily optimized for interactive dialogue and multimodal understanding rather than for precise, speaker-adaptive speech reconstruction from neural signals. In contrast, CosyVoice 2.0, also developed at Alibaba, is specifically designed for voice cloning and text-to-speech synthesis (see: https://help.aliyun.com/zh/model-studio/text-to-speech). It incorporates advanced speaker adaptation and acoustic modeling capabilities that are essential for reconstructing naturalistic speech from limited neural data. Therefore, our choice of CosyVoice for the final synthesis stage aligns with the goal of integrating acoustic fidelity and linguistic intelligibility, which is central to our study.

      For the selection of LSTM and Transformer in the two pathways:

      The goal of the acoustic adaptor is to reconstruct fine-grained spectrotemporal details (formants, harmonic structures, prosodic contours) with millisecond-to-centisecond precision. These features rely heavily on local temporal dynamics and short-to-medium range dependencies (e.g., within and between phonemes/syllables). In our ablation studies (to be added in the supplementary), we found that Transformer-based adaptors, which inherently emphasize global sentence-level context through self-attention, tended to oversmooth the reconstructed acoustic features, losing critical fine-temporal details essential for naturalness. In contrast, the recurrent nature of LSTMs, with their inherent temporal state propagation, proved more effective at modeling these local sequential dependencies without excessive smoothing, leading to higher mel-spectrogram fidelity. This aligns with the neurobiological observation that early auditory cortex processes sound with precise temporal fidelity. Moreover, from an engineering perspective, LSTM-based decoders have been empirically shown to perform well in sequential prediction tasks with limited data, as evidenced in prior work on sequence modeling and neural decoding (1).

      The goal of the linguistic adaptor is to decode abstract, discrete word tokens. This task benefits from modeling long-range contextual dependencies across a sentence to resolve lexical ambiguity and syntactic structure (e.g., subject-verb agreement). The self-attention mechanism of Transformers is exceptionally well-suited for capturing these global relationships, as evidenced by their dominance in NLP. Our experiments confirmed that a Transformer adaptor outperformed an LSTM-based one in word token prediction accuracy.

      While a unified multimodal LLM could in principle handle both modalities, such models often face challenges in modality imbalance and task specialization. Audio and text modalities have distinct temporal scales, feature distributions, and learning dynamics. By decoupling them into separate pathways with specialized adaptors, we ensure that each modality is processed by an architecture optimized for its inherent structure. This divide-and-conquer strategy avoids the risk of one modality dominating or interfering with the learning of the other, leading to more stable training and better final performance, especially important when adapting to limited neural data.

      Edits:

      Page 9, Lines 214-223:

      “The acoustic pathway, implemented through a bi-directional LSTM neural adaptor architecture (Fig. 1B), specializes in reconstructing fundamental acoustic properties of speech. This module directly processes neural recordings to generate precise time-frequency representations, focusing on preserving speaker-specific spectral characteristics like formant structures, harmonic patterns, and spectral envelope details. Quantitative evaluation confirms its core competency: achieving a mel-spectrogram R² of 0.793 ± 0.016 (Fig. 3B) demonstrates remarkable fidelity in reconstructing acoustic microstructure. This performance level is statistically indistinguishable from original speech degraded by 0 dB additive noise (0.771 ± 0.014, p = 0.242, one-sided t-test). We chose a bidirectional LSTM architecture for this adaptor because its recurrent nature is particularly suited to modeling the fine-grained, short- to medium-range temporal dependencies (e.g., within and between phonemes and syllables) that are critical for acoustic fidelity. An ablation study comparing LSTM against Transformer-based adaptors for this task confirmed that LSTMs yielded superior mel-spectrogram reconstruction fidelity (higher R²), as detailed in Table S1, likely by avoiding the oversmoothing of spectrotemporal details sometimes induced by the strong global context modeling of Transformers”.

      “To confirm that the acoustic pathway’s output is causally dependent on the neural signal rather than the generative prior of the HiFi-GAN, we performed a control analysis in which portions of the input ECoG recording were replaced with Gaussian noise. When either the first half, second half, or the entirety of the neural input was replaced by noise, the mel-spectrogram R² of the reconstructed speech dropped markedly, corresponding to the corrupted segment (Fig. S5). This demonstrates that the reconstruction is temporally locked to the specific neural input and that the model does not ‘hallucinate’ spectrotemporal structure from noise. These results validate that the acoustic pathway performs genuine, input-sensitive neural decoding”.
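
      The quantities used in the two passages above can be illustrated with a short NumPy sketch. It rests on several assumptions that the text does not confirm: R² is computed as a single coefficient of determination over the whole spectrogram, the ECoG array is laid out as time × channels and is roughly unit-variance, and all function names are hypothetical.

```python
import numpy as np


def mel_r2(target: np.ndarray, recon: np.ndarray) -> float:
    """Coefficient of determination between two equally shaped mel-spectrograms."""
    ss_res = np.sum((target - recon) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def add_noise_at_snr(signal: np.ndarray, snr_db: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise; at 0 dB the noise power equals the signal power."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.standard_normal(signal.shape) * np.sqrt(noise_power)
    return signal + noise


def replace_with_noise(ecog: np.ndarray, segment: str,
                       rng: np.random.Generator) -> np.ndarray:
    """Copy `ecog` (time x channels) and overwrite one segment with Gaussian noise.

    `segment` is "first", "second", or "all". Unit-variance noise is an
    assumption that suits z-scored recordings.
    """
    corrupted = ecog.copy()
    half = ecog.shape[0] // 2
    spans = {"first": slice(0, half), "second": slice(half, None), "all": slice(None)}
    span = spans[segment]
    corrupted[span] = rng.standard_normal(corrupted[span].shape)
    return corrupted
```

      In a control of this kind, each corrupted array would be passed through the trained decoder and `mel_r2` evaluated separately on the first and second halves of the output, so that a drop confined to the corrupted half indicates input-locked decoding.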

      Edits:

      Page 10, Lines 272-277:

      “We employed a Transformer-based Seq2Seq architecture for this adaptor to effectively capture the long-range contextual dependencies across a sentence, which are essential for resolving lexical ambiguity and syntactic structure during word token decoding. This choice was validated by an ablation study (Table S2), indicating that the Transformer adaptor outperformed an LSTM-based counterpart in word prediction accuracy.”

      (3) The model is trained on approximately 20 minutes of data per participant, which raises concerns about potential overfitting. It would be helpful if the authors could analyze whether test sentences with higher or lower reconstruction performance include words that were also present in the training set.

      Thank you for raising the important concern regarding potential overfitting given the limited size of our training dataset (~20 minutes per participant). To address this point directly, we performed a detailed lexical overlap analysis between the training and test sets.

      The test set contains 219 unique words. Among these:

      127 words (58.0%) appeared in the training set (primarily high-frequency, common words).

      92 words (42.0%) were entirely novel and did not appear in the training set.

      We further examined whether trials with the best reconstruction (WER = 0) relied more on training vocabulary. Among these top-performing trials, 55.0% of words appeared in the training set. In the worst-performing trials, 51.9% of words appeared in the training set. No significant difference was observed, suggesting that performance is not driven by simple lexical memorization.
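
      A lexical overlap analysis of this kind can be sketched in a few lines of Python. The tokenization below (lower-casing and whitespace splitting) and the function names are assumptions made for illustration; the response does not specify how words were normalized or whether per-trial overlap was counted over tokens or over unique words.

```python
from typing import Iterable, Set, Tuple


def vocabulary(sentences: Iterable[str]) -> Set[str]:
    """Unique lower-cased, whitespace-separated words across sentences."""
    return {word for sentence in sentences for word in sentence.lower().split()}


def vocab_overlap(train: Iterable[str], test: Iterable[str]) -> Tuple[int, int, float]:
    """Return (# test words seen in training, # novel test words, seen fraction)."""
    train_vocab = vocabulary(train)
    test_vocab = vocabulary(test)
    seen = test_vocab & train_vocab
    novel = test_vocab - train_vocab
    return len(seen), len(novel), len(seen) / len(test_vocab)


def trial_overlap(sentence: str, train_vocab: Set[str]) -> float:
    """Fraction of word tokens in one test sentence that occur in the training vocabulary."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    return sum(word in train_vocab for word in words) / len(words)
```

      Averaging `trial_overlap` over the best-performing and worst-performing trials gives the two percentages being compared. The vocabulary counts quoted above are internally consistent: 127 + 92 = 219, with 127/219 ≈ 58.0% and 92/219 ≈ 42.0%.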

      The presence of a substantial proportion of novel words (42%) in the test set, combined with the lack of performance advantage for overlapping content, provides strong evidence that our model is generalizing linguistic and acoustic patterns rather than merely memorizing the training vocabulary. High reconstruction performance on unseen words would be improbable under severe overfitting.

      Therefore, we conclude that while some lexical overlap exists (as expected in natural language), the model’s performance is driven by its ability to decode generalized neural representations, effectively mitigating the overfitting risk highlighted by the reviewer.
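      As an illustration only, the lexical overlap analysis described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the analysis code used in the study: the function name `vocab_overlap`, the whitespace tokenization, and the toy sentences are all hypothetical.

```python
def vocab_overlap(train_sentences, test_sentences):
    """Count unique test words that do / do not occur in the training set.

    Assumption: words are obtained by lower-casing and splitting on
    whitespace; the study's actual tokenization is not specified.
    Returns (n_test_words, n_seen, n_novel).
    """
    train_vocab = {w for s in train_sentences for w in s.lower().split()}
    test_vocab = {w for s in test_sentences for w in s.lower().split()}
    seen = test_vocab & train_vocab
    return len(test_vocab), len(seen), len(test_vocab - seen)


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    # Toy sentences, purely illustrative.
    train = ["the cat sat", "a dog ran"]
    test = ["the dog slept", "a bird sang"]
    total, seen, novel = vocab_overlap(train, test)
    print(total, seen, novel)

    # Consistency check on the counts reported in the response.
    print(percent(127, 219), percent(92, 219))
```

      The same set-intersection logic applied per trial would give the per-trial overlap figures quoted for the best- and worst-performing sentences.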

      (4) The phoneme confusion matrix in Figure 4A does not appear to align with human phoneme confusion patterns. For instance, /s/ and /z/ differ only in voicing, yet the model does not seem to confuse these phonemes. Does this imply that the model and the human brain operate differently at the mechanistic level?

      We thank the reviewer for this detailed observation regarding the difference between our model's phoneme confusion patterns and typical human perceptual confusions (e.g., the lack of /s/-/z/ confusion).

      The reviewer is correct in inferring a mechanistic difference. This divergence is primarily attributable to the Parler-TTS model acting as a powerful linguistic prior. Our linguistic pathway decodes word tokens, which Parler-TTS then converts to speech. Trained on massive corpora to produce canonical pronunciations, Parler-TTS effectively performs an implicit "error correction." For instance, if the neural decoding is ambiguous between the words "sip" and "zip," the TTS model's strong prior for lexical and syntactic context will likely resolve it to the correct word, thereby suppressing purely acoustic confusions like voicing.

      This has important implications for interpreting our model's errors and its relationship to brain function. The phoneme errors in our final output reflect a combination of neural decoding errors and the generative biases of the TTS model, which is optimized for intelligibility rather than mimicking raw human misperception. This does imply our model operates differently from the human auditory periphery. The human brain may first generate a percept with acoustic confusions, which higher-level language regions then disambiguate. Our model effectively bypasses the "confused percept" stage by directly leveraging a pre-trained, high-level language model for disambiguation. This is a design feature contributing to its high intelligibility, not necessarily a flaw. This observation raises a fascinating question: Could a model that more faithfully simulates the hierarchical processing of the human brain (including early acoustic confusions) provide a better fit to neural data at different processing stages? Future work could further address this question.

      Edits:

      add another paragraph in Discussion (Page 14, Lines 397-398):

      “The phoneme confusion pattern observed in our model output (Fig. 4A) differs from classic human auditory confusion matrices. We attribute this divergence primarily to the influence of the Parler-TTS model, which serves as a strong linguistic prior in our pipeline. This component is trained to generate canonical speech from text tokens. When the upstream neural decoding produces an ambiguous or erroneous token sequence, the TTS model’s internal language model likely performs an implicit ‘error correction,’ favoring linguistically probable words and pronunciations. This underscores that our model’s errors arise from a complex interaction between neural decoding fidelity and the generative biases of the synthesis stage”

      (5) In general, is the motivation for adopting the dual-pathway model to better align with the organization of the human brain, or to achieve improved engineering performance? If the goal is primarily engineering-oriented, the authors should compare their approach with a pretrained multimodal LLM rather than relying on the dual-pathway architecture. Conversely, if the design aims to mirror human brain function, additional analysis, such as detailed comparisons of phoneme confusion matrices, should be included to demonstrate that the model exhibits brain-like performance patterns.

      Our primary motivation is engineering improvement: to overcome the fundamental trade-off between acoustic fidelity and linguistic intelligibility that has limited previous neural speech decoding work. The design is inspired by related work on the convergent representation of speech and language perception (2). However, we do not claim that our LSTM and Transformer adaptors precisely simulate the specific neural computations of the human ventral and dorsal streams. The goal was to build a high-performance, data-efficient decoder. We will clarify this point in the Introduction and Discussion, stating that while the architecture is loosely inspired by previous neuroscience results, its primary validation is its engineering performance in achieving state-of-the-art reconstruction quality with minimal data.

      Edits:

      Page 14, Line 358-373:

      “In this study, we present a dual-path framework that synergistically decodes both acoustic and linguistic speech representations from ECoG signals, followed by a fine-tuned zero-shot text-to-speech network to re-synthesize natural speech with unprecedented fidelity and intelligibility. Crucially, by integrating large pre-trained generative models into our acoustic reconstruction pipeline and applying voice cloning technology, our approach preserves acoustic richness while significantly enhancing linguistic intelligibility beyond conventional methods. Our dual-pathway architecture, while inspired by converging neuroscience insights on speech and language perception, was principally designed and validated as an engineering solution. The primary goal was to build a practical decoder that achieves state-of-the-art reconstruction quality with minimal data. The framework's success is therefore ultimately judged by its performance metrics, namely high intelligibility (WER, PER), acoustic fidelity (mel-spectrogram R²), and perceptual quality (MOS), which directly address the core engineering challenge we set out to solve. Using merely 20 minutes of ECoG recordings, our model achieved superior performance with a WER of 18.9% ± 3.3% and PER of 12.0% ± 2.5% (Fig. 2D, E). This integrated architecture, combining pre-trained acoustic (Wav2Vec2.0 and HiFi-GAN) and linguistic (Parler-TTS) models through lightweight neural adaptors, enables efficient mapping of ECoG signals to dual latent spaces. Such methodology substantially reduces the need for extensive neural training data while achieving breakthrough word clarity under severe data constraints. The results demonstrate the feasibility of transferring the knowledge embedded in speech-data pre-trained artificial intelligence (AI) models into neural signal decoding, paving the way for more advanced brain-computer interfaces and neuroprosthetics”.

      Reviewer #2 (Public review):

      Summary:

      The study by Li et al. proposes a dual-path framework that concurrently decodes acoustic and linguistic representations from ECoG recordings. By integrating advanced pre-trained AI models, the approach preserves both acoustic richness and linguistic intelligibility, and achieves a WER of 18.9% with a short (~20-minute) recording.

      Overall, the study offers an advanced and promising framework for speech decoding. The method appears sound, and the results are clear and convincing. My main concerns are the need for additional control analyses and for more comparisons with existing models.

      Strengths:

      (1) This speech-decoding framework employs several advanced pre-trained DNN models, reaching superior performance (WER of 18.9%) with relatively short (~20-minute) neural recording.

      (2) The dual-pathway design is elegant, and the study clearly demonstrates its necessity: The acoustic pathway enhances spectral fidelity while the linguistic pathway improves linguistic intelligibility.

      We thank Reviewer #2 for the supportive comments and thoughtful feedback. By addressing these comments, we believe we have greatly improved the clarity of our claims and methodology. Below we list our point-by-point responses to the concerns raised by Reviewer #2.

      Weaknesses:

      The DNNs used were pre-trained on large corpora, including TIMIT, which is also the source of the experimental stimuli. More generally, as DNNs are powerful at generating speech, additional evidence is needed to show that decoding performance is driven by neural signals rather than by the DNNs' generative capacity.

      Thank you for raising this crucial point regarding the potential for pre-trained DNNs to generate speech independently of the neural input. We fully agree that it is essential to disentangle the contribution of the neural signals from the generative priors of the models. To address this directly, we have conducted two targeted control analyses, as you suggested, and have integrated the results into the revised manuscript (see Fig. S5 and the corresponding description in the Results section):

      (1) Random noise input: We fed Gaussian noise (matched in dimensionality and temporal structure to real ECoG recordings) into the trained adaptors. The outputs were acoustically unstructured and linguistically incoherent, confirming that the generative models alone cannot produce meaningful speech without valid neural input.

      (2) Partial sentence input (real + noise): For the acoustic pathway, we systematically replaced portions of the ECoG input with noise. The reconstruction quality (mel-spectrogram R²) dropped significantly in the corrupted segments, demonstrating that the decoding is temporally locked to the neural signal and does not “hallucinate” speech from noise.

      These results provide strong evidence that our model’s performance is causally dependent on and sensitive to the specific neural input, validating that it performs genuine neural decoding rather than merely leveraging the generative capacity of the pre-trained DNNs.

      The detailed edits are in the “recommendations” below. (See recommendations (1) and (2))

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Clarify the results shown in Figure 4E. The integrated approach appears to perform comparably to Baseline 3 in phoneme class clarity. However, Baseline 3 represents the output of the linguistic pathway alone, which is expected to encode information primarily at the word level.

      We appreciate the reviewer's observation and agree that clarification is needed. The phoneme class clarity (PCC) metric shown in Figure 4E measures whether mis-decoded phonemes are more likely to be confused within their own class (vowel-vowel or consonant-consonant) rather than across classes (vowel-consonant). A higher PCC indicates that the model's errors tend to be phonologically similar sounds (e.g., one vowel mistaken for another), which is a reasonable property for intelligibility.

      We would like to clarify the nature of Baseline 3. As stated in the manuscript (Results section: "The linguistic pathway reconstructs high-intelligibility, higher-level linguistic information"), Baseline 3 is the output of our linguistic pathway. This pathway operates as follows: the ECoG signals are mapped to word tokens via the Transformer adaptor, and these tokens are then synthesized into speech by the frozen Parler-TTS model. Crucially, the input to Parler-TTS is a sequence of word tokens.

      It is important to distinguish between the levels of performance measured: Word Error Rate (WER) reflects accuracy at the lexical level (whole words). The linguistic pathway achieves a low WER by design, as it directly decodes word sequences. Phoneme Error Rate (PER) reflects accuracy at the sublexical phonetic level (phonemes). A low WER generally implies a low PER, because robust word recognition requires reliable phoneme-level representations within the TTS model's prior. This explains why Baseline 3 also exhibits a low PER. However, acoustic fidelity (captured by metrics like mel-spectrogram R²) requires the preservation of fine-grained spectrotemporal details such as pitch, timbre, prosody, and formant structures, information that is not directly encoded at the lexical level and is therefore not a strength of the purely linguistic pathway.
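      To make the distinction between the two error rates concrete, the sketch below computes both from the same edit-distance routine, applied once to word sequences and once to phoneme sequences. It assumes the standard Levenshtein-based definition (substitutions, insertions, and deletions divided by reference length); the response does not state which implementation was used, and the function names and phoneme labels here are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or phonemes)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length (WER or PER)."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One wrong word out of three ...
    ref_words = ["sip", "the", "tea"]
    hyp_words = ["zip", "the", "tea"]
    # ... corresponds to one wrong phoneme out of seven
    # (illustrative phoneme labels, not a specific phone set).
    ref_phones = ["s", "ih", "p", "dh", "ah", "t", "iy"]
    hyp_phones = ["z", "ih", "p", "dh", "ah", "t", "iy"]
    print("WER:", error_rate(ref_words, hyp_words))
    print("PER:", error_rate(ref_phones, hyp_phones))
```

      In this toy case a single voicing error costs a whole word at the lexical level but only one phoneme at the sublexical level, which is why the two metrics are reported separately.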

      While Parler-TTS internally models sub-word/phonetic information to generate the acoustic waveform, the primary linguistic information driving the synthesis is at the lexical (word) level. The generated speech from Baseline 3 therefore contains reconstructed phonemic sequences derived from the decoded word tokens, not from direct phoneme-level decoding of ECoG.

      Therefore, the comparable PCC between our final integrated model and Baseline 3 (linguistic pathway) suggests that the phoneme-level error patterns (i.e., the tendency to confuse within-class phonemes) in our final output are largely inherited from the high-quality linguistic prior embedded in the pre-trained TTS model (Parler-TTS). The integrated framework successfully preserves this desirable property from the linguistic pathway while augmenting it with speaker-specific acoustic details from the acoustic pathway, thereby achieving both high intelligibility (low WER/PER) and high acoustic fidelity (high mel-spectrogram R²).

      We will revise the caption of Figure 4E and the corresponding text in the Results section to make this interpretation explicit.

      Edits:

      Page 12, Lines 317-322:

      “In addition to the confusion matrices, we categorized the phonemes into vowels and consonants to assess the phoneme class clarity. We defined "phoneme class clarity" (PCC) as the proportion of errors where a phoneme was misclassified within the same class versus being misclassified into a different class. The purpose of introducing PCC is to demonstrate that most of the misidentified phonemes belong to the same category (confusion between vowels or consonants), rather than directly comparing the absolute accuracy of phoneme recognition. For instance, a vowel being mistaken for another vowel would be considered a within-class error, whereas a vowel being mistaken for a consonant would be classified as a between-class error” 
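      One possible reading of the PCC definition quoted above is sketched below, for illustration only. It assumes PCC is the number of within-class errors divided by the total number of phoneme errors, computed from aligned (reference, decoded) phoneme pairs; the vowel inventory shown is a small hypothetical subset, and `phoneme_class_clarity` is not a function from the study's code.

```python
# Hypothetical, incomplete vowel inventory used only for this example.
VOWELS = {"aa", "ae", "ah", "eh", "ih", "iy", "ow", "uw"}


def phoneme_class(phoneme):
    """Map a phoneme label to 'vowel' or 'consonant'."""
    return "vowel" if phoneme in VOWELS else "consonant"


def phoneme_class_clarity(aligned_pairs):
    """Fraction of phoneme errors that stay within the same class.

    aligned_pairs: iterable of (reference, decoded) phoneme labels that
    have already been aligned one-to-one (an assumption of this sketch).
    Correctly decoded phonemes are ignored; only errors are counted.
    """
    errors = [(ref, dec) for ref, dec in aligned_pairs if ref != dec]
    if not errors:
        raise ValueError("no phoneme errors to analyze")
    within = sum(phoneme_class(ref) == phoneme_class(dec) for ref, dec in errors)
    return within / len(errors)


if __name__ == "__main__":
    pairs = [
        ("s", "z"),    # consonant -> consonant: within-class error
        ("iy", "ih"),  # vowel -> vowel: within-class error
        ("s", "iy"),   # consonant -> vowel: between-class error
        ("p", "p"),    # correct, not an error
    ]
    print(phoneme_class_clarity(pairs))
```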

      (2) Add results from Baseline 2 + CosyVoice and Baseline 3 + CosyVoice to clarify the contribution of the auditory pathway.

      Thank you for the suggestion. We appreciate the opportunity to clarify the role of CosyVoice in our framework.

      As explained in our response to point (1), CosyVoice 2.0 is designed as a fusion module that requires two inputs: 1) a voice reference (from the acoustic pathway) to specify speaker identity, and 2) a word sequence (from the linguistic pathway) to specify linguistic content. Because it is not a standalone enhancer, applying CosyVoice to a single pathway output (e.g., Baseline 2 or 3 alone) is not feasible: it would not reflect the module's intended function and could lead to misinterpretation of each pathway’s contribution.

      Instead, we have evaluated the contribution of each pathway by comparing the final integrated output against each standalone pathway output (Baseline 2 and 3). The significant improvements in both acoustic fidelity and linguistic intelligibility demonstrate the complementary roles of the two pathways, which are effectively fused through CosyVoice.

      (3) Justify your choice of using LSTM and Transformer architecture for the auditory and linguistic neural adaptors, respectively, and how your methods could compare to using a unified generative multimodal LLM for both pathways.

      Thank you for revisiting this important point. We appreciate your interest in the architectural choices and their relationship to state-of-the-art multimodal models.

      As detailed in our response to point (2), our choice of LSTM for the acoustic pathway and Transformer for the linguistic pathway is driven by task-specific requirements, supported by ablation studies (Supplementary Tables 1–2). The acoustic pathway benefits from LSTM’s ability to model fine-grained, local temporal dependencies without over-smoothing. The linguistic pathway benefits from Transformer’s ability to capture long-range semantic and syntactic context.

      Regarding comparison with unified multimodal LLMs (e.g., Qwen3-Omni), we clarified that such models are optimized for interactive dialogue and multimodal understanding, while our framework relies on specialist models (CosyVoice 2.0, Parler-TTS) that are explicitly designed for high-fidelity, speaker-adaptive speech synthesis, a requirement central to our decoding task.

      We have incorporated these justifications into the revised manuscript (Results and Discussion sections) and appreciate the opportunity to further emphasize these points.

      Edits:

      Page 9, Lines 214-223:

      “The acoustic pathway, implemented through a bi-directional LSTM neural adaptor architecture (Fig. 1B), specializes in reconstructing fundamental acoustic properties of speech. This module directly processes neural recordings to generate precise time-frequency representations, focusing on preserving speaker-specific spectral characteristics like formant structures, harmonic patterns, and spectral envelope details. Quantitative evaluation confirms its core competency: achieving a mel-spectrogram R² of 0.793 ± 0.016 (Fig. 3B) demonstrates remarkable fidelity in reconstructing acoustic microstructure. This performance level is statistically indistinguishable from original speech degraded by 0dB additive noise (0.771 ± 0.014, p = 0.242, one-sided t-test). We chose a bidirectional LSTM architecture for this adaptor because its recurrent nature is particularly suited to modeling the fine-grained, short- to medium-range temporal dependencies (e.g., within and between phonemes and syllables) that are critical for acoustic fidelity. An ablation study comparing LSTM against Transformer-based adaptors for this task confirmed that LSTMs yielded superior mel-spectrogram reconstruction fidelity (higher R²), as detailed in Table S1, likely by avoiding the oversmoothing of spectrotemporal details sometimes induced by the strong global context modeling of Transformers”.

      “To confirm that the acoustic pathway’s output is causally dependent on the neural signal rather than the generative prior of the HiFi-GAN, we performed a control analysis in which portions of the input ECoG recording were replaced with Gaussian noise. When either the first half, second half, or the entirety of the neural input was replaced by noise, the mel-spectrogram R² of the reconstructed speech dropped markedly, corresponding to the corrupted segment (Fig. S5). This demonstrates that the reconstruction is temporally locked to the specific neural input and that the model does not ‘hallucinate’ spectrotemporal structure from noise. These results validate that the acoustic pathway performs genuine, input-sensitive neural decoding”.

      Page 10, Lines 272-277:

      “We employed a Transformer-based Seq2Seq architecture for this adaptor to effectively capture the long-range contextual dependencies across a sentence, which are essential for resolving lexical ambiguity and syntactic structure during word token decoding. This choice was validated by an ablation study (Table S2), indicating that the Transformer adaptor outperformed an LSTM-based counterpart in word prediction accuracy”.

      (4) Discuss the differences between the model's phoneme confusion matrix in Figure 4A and human phoneme confusion patterns. In addition, please clarify whether the adoption of the dual-pathway architecture is primarily intended to simulate the organization of the human brain or to achieve engineering improvements.

      The observed difference between our model's phoneme confusion matrix and typical human perceptual confusion patterns (e.g., the noted lack of confusion between /s/ and /z/) is, as the reviewer astutely infers, likely attributable to the TTS model (Parler-TTS) acting as a powerful linguistic prior. The linguistic pathway decodes word tokens, and Parler-TTS converts these tokens into speech. Parler-TTS is trained on massive text and speech corpora to produce canonical, clean pronunciations. It effectively performs a form of "error correction" or "canonicalization" based on its internal language model. For example, if the neural decoding is ambiguous between "sip" and "zip", the TTS model's strong prior for lexical and syntactic context may robustly resolve it to the correct word, suppressing purely acoustic confusions like voicing. Therefore, the phoneme errors in our final output reflect a combination of neural decoding errors and the TTS model's generation biases, which are optimized for intelligibility rather than mimicking human misperception. We will add this explanation to the paragraph discussing Figure 4A.

      Our primary motivation is engineering improvement, to overcome the fundamental tradeoff between acoustic fidelity and linguistic intelligibility that has limited previous neural speech decoding work. The design is inspired by the convergent representation of speech and language perception (1). However, we do not claim that our LSTM and Transformer adaptors precisely simulate the specific neural computations of the human ventral and dorsal streams. The goal was to build a high-performance, data-efficient decoder. We will clarify this point in the Introduction and Discussion, stating that while the architecture is loosely inspired by previous neuroscience results, its primary validation is its engineering performance in achieving state-of-the-art reconstruction quality with minimal data.

      Edits:

      Pages 2-3, Lines 74-85:

      “Here, we propose a unified and efficient dual-pathway decoding framework that integrates the complementary strengths of both paradigms to enhance, from an engineering perspective, the quality of re-synthesized natural speech. Our method maps intracranial electrocorticography (ECoG) signals into the latent spaces of pre-trained speech and language models via two lightweight neural adaptors: an acoustic pathway, which captures low-level spectral features for naturalistic speech synthesis, and a linguistic pathway, which extracts high-level linguistic tokens for semantic intelligibility. These pathways are fused using a fine-tuned text-to-speech (TTS) generator with voice cloning, producing re-synthesized speech that retains both the acoustic spectrotemporal details, such as the speaker’s timbre and prosody, and the linguistic content of the message. The adaptors rely on near-linear mappings and require only 20 minutes of neural data per participant for training, while the generative modules are pre-trained on large unlabeled corpora and require no neural supervision”.

      Page 14, Lines 358-373:

      “In this study, we present a dual-path framework that synergistically decodes both acoustic and linguistic speech representations from ECoG signals, followed by a fine-tuned zero-shot text-to-speech network to re-synthesize natural speech with unprecedented fidelity and intelligibility. Crucially, by integrating large pre-trained generative models into our acoustic reconstruction pipeline and applying voice cloning technology, our approach preserves acoustic richness while significantly enhancing linguistic intelligibility beyond conventional methods. Our dual-pathway architecture, while inspired by converging neuroscience insights on speech and language perception, was principally designed and validated as an engineering solution. The primary goal was to build a practical decoder that achieves state-of-the-art reconstruction quality with minimal data. The framework's success is therefore ultimately judged by its performance metrics, namely high intelligibility (WER, PER), acoustic fidelity (mel-spectrogram R²), and perceptual quality (MOS), which directly address the core engineering challenge we set out to solve. Using merely 20 minutes of ECoG recordings, our model achieved superior performance with a WER of 18.9% ± 3.3% and PER of 12.0% ± 2.5% (Fig. 2D, E). This integrated architecture, combining pre-trained acoustic (Wav2Vec2.0 and HiFi-GAN) and linguistic (Parler-TTS) models through lightweight neural adaptors, enables efficient mapping of ECoG signals to dual latent spaces. Such methodology substantially reduces the need for extensive neural training data while achieving breakthrough word clarity under severe data constraints. The results demonstrate the feasibility of transferring the knowledge embedded in speech-data pre-trained artificial intelligence (AI) models into neural signal decoding, paving the way for more advanced brain-computer interfaces and neuroprosthetics”.

      Reviewer #2 (Recommendations for the authors):

      (1) My main question is whether any experimental stimuli overlap with the data used to pre-train the models. The authors might consider using pre-trained models trained on other corpora and training their own model without the TIMIT corpus. Additionally, as pretrained models were used, it might be helpful to evaluate to what extent the decoding is sensitive to the input neural recording or whether the model always outputs meaningful speech. The authors might consider two control analyses: a) whether the model still generates speech-like output if the input is random noise; b) whether the model can decode a complete sentence if the first half recording of a sentence is real but the second half is replaced with noise.

      We thank the reviewer for raising this crucial point regarding potential data leakage and the sensitivity of decoding to neural input.

      We confirm that the pre-training phase of our core models (Wav2Vec2.0 encoder, HiFi-GAN decoder) was conducted exclusively on the LibriSpeech corpus (960 hours), which is entirely separate from the TIMIT corpus used for our ECoG experiments. The subsequent fine-tuning of the CosyVoice 2.0 voice cloner for speaker adaptation was performed on the training set portion of the entire TIMIT corpus. Importantly, the test set for all neural decoding evaluations was strictly held out and never used during any fine-tuning stage. This data separation is now explicitly stated in the "Methods" sections for the Speech Autoencoder and the CosyVoice fine-tuning.

      Regarding the potential of training on other corpora, we agree it is a valuable robustness check. Previous work has demonstrated that self-supervised speech models like Wav2Vec2.0 learn generalizable representations that transfer well across domains (e.g., Millet et al., NeurIPS 2022). We believe our use of LibriSpeech, a large and diverse corpus, provides a strong, general-purpose acoustic prior.

      We agree with the reviewer that control analyses are essential to demonstrate that the decoded output is driven by neural signals and not merely the generative prior of the models. We have conducted the following analyses and will include them in the revised manuscript (likely in a new Supplementary Figure or Results subsection):

      (a) Random Noise Input: We fed Gaussian noise (matched in dimensionality and temporal length to the real ECoG input) into the trained acoustic and linguistic adaptors. The outputs were evaluated. The acoustic pathway generated unstructured, noisy spectrograms with no discernible phonetic structure, and the linguistic pathway produced either highly incoherent word sequences or failed to generate meaningful tokens. The fusion via CosyVoice produced unintelligible babble. This confirms that the generative models alone cannot produce structured speech without meaningful neural input.

      (b) Partial Sentence Input (Real + Noise): In the acoustic pathway, we replaced the first half, the second half, or the entire ECoG recording for test sentences with Gaussian noise. The mel-spectrogram R² showed a clear degradation in the reconstructed speech corresponding to the noisy segment. We did not run similar experiments in the linguistic pathway because the TTS generator is a pre-trained model obtained from HuggingFace, and we did not train any parameters of Parler-TTS. These results strongly indicate that our model's performance is contingent on and sensitive to the specific neural input, validating that it is performing genuine neural decoding.
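      The input corruption used in controls (a) and (b) can be sketched as below. This is a minimal illustration, not the study's code: `replace_with_noise` is a hypothetical helper, the recording is represented as a plain list of time frames, and unit-variance Gaussian noise is used on the assumption that the ECoG features were z-scored beforehand.

```python
import random


def replace_with_noise(ecog, segment, seed=0):
    """Return a copy of `ecog` with one segment replaced by Gaussian noise.

    ecog: list of time frames, each a list of per-channel values.
    segment: 'first_half', 'second_half', or 'all'.
    Noise keeps the same number of frames and channels as the input.
    Assumption: zero-mean, unit-variance noise (z-scored recordings).
    """
    n_time = len(ecog)
    half = n_time // 2
    spans = {
        "first_half": range(0, half),
        "second_half": range(half, n_time),
        "all": range(0, n_time),
    }
    if segment not in spans:
        raise ValueError(f"unknown segment: {segment!r}")

    rng = random.Random(seed)
    corrupted = [list(frame) for frame in ecog]  # leave the input untouched
    for t in spans[segment]:
        corrupted[t] = [rng.gauss(0.0, 1.0) for _ in ecog[t]]
    return corrupted


if __name__ == "__main__":
    # Toy recording: 10 time frames, 2 channels.
    recording = [[float(t), float(-t)] for t in range(10)]
    noisy = replace_with_noise(recording, "second_half")
    print(noisy[:5] == recording[:5])  # first half is preserved
    print(noisy[5:] == recording[5:])  # second half has been replaced
```

      Each corrupted recording would then be passed through the trained adaptor, and the reconstruction score computed separately for the intact and the corrupted half.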

      Edits:

      Page 19, Lines 533-538:

      “The parameters in Wav2Vec2.0 were frozen within this training phase. The parameters in HiFi-GAN were optimized using the Adam optimizer with a fixed learning rate of 10⁻⁵, β₁ = 0.9, β₂ = 0.999. We trained this Autoencoder on LibriSpeech, a 960-hour English speech corpus with a sampling rate of 16 kHz, which is entirely separate from the TIMIT corpus used for our ECoG experiments. We spent 12 days in parallel training on 6 Nvidia GeForce RTX3090 GPUs. The maximum training epoch was 2000. The optimization did not stop until the validation loss no longer decreased”.

      Edits:

      Page 9, Lines 214-223:

      “The acoustic pathway, implemented through a bi-directional LSTM neural adaptor architecture (Fig. 1B), specializes in reconstructing fundamental acoustic properties of speech. This module directly processes neural recordings to generate precise time-frequency representations, focusing on preserving speaker-specific spectral characteristics like formant structures, harmonic patterns, and spectral envelope details. Quantitative evaluation confirms its core competency: achieving a mel-spectrogram R² of 0.793 ± 0.016 (Fig. 3B) demonstrates remarkable fidelity in reconstructing acoustic microstructure. This performance level is statistically indistinguishable from original speech degraded by 0dB additive noise (0.771 ± 0.014, p = 0.242, one-sided t-test). We chose a bidirectional LSTM architecture for this adaptor because its recurrent nature is particularly suited to modeling the fine-grained, short- to medium-range temporal dependencies (e.g., within and between phonemes and syllables) that are critical for acoustic fidelity. An ablation study comparing LSTM against Transformer-based adaptors for this task confirmed that LSTMs yielded superior mel-spectrogram reconstruction fidelity (higher R²), as detailed in Table S1, likely by avoiding the oversmoothing of spectrotemporal details sometimes induced by the strong global context modeling of Transformers”.

      “To confirm that the acoustic pathway’s output is causally dependent on the neural signal rather than the generative prior of the HiFi-GAN, we performed a control analysis in which portions of the input ECoG recording were replaced with Gaussian noise. When either the first half, second half, or the entirety of the neural input was replaced by noise, the mel-spectrogram R² of the reconstructed speech dropped markedly, corresponding to the corrupted segment (Fig. S5). This demonstrates that the reconstruction is temporally locked to the specific neural input and that the model does not ‘hallucinate’ spectrotemporal structure from noise. These results validate that the acoustic pathway performs genuine, input-sensitive neural decoding”

      (2) For BCI applications, the decoding speed matters. Please report the model's inference speed. Additionally, the authors might also consider reporting cross-participant generalization and how the accuracy changes with recording duration.

      We thank the reviewer for these practical and important suggestions. 

      Inference Speed: You are absolutely right. On our hardware (single NVIDIA GeForce RTX 3090 GPU), the current pipeline has an inference time that is longer than the duration of the target speech segment. The primary bottlenecks are the sequential processing in the autoregressive linguistic adaptor and the high-resolution waveform generation in CosyVoice 2.0. This latency currently limits real-time application. We have now added this in the Discussion acknowledging this limitation and stating that future work must focus on architectural optimizations (e.g., non-autoregressive models, lighter vocoders) and potential hardware acceleration to achieve real-time performance, which is critical for a practical BCI.

      Cross-Participant Generalization: We agree that this is a key question for scalability. Our framework already addresses part of the cross-participant generalization challenge through the use of pre-trained generative modules (HiFi-GAN, Parler-TTS, CosyVoice 2.0), which are pretrained on large corpora and shared across all participants. Only a small fraction of the model, the lightweight neural adaptors, is subject-specific and requires a small amount of supervised fine-tuning (~20 minutes per participant). This design significantly reduces the per-subject calibration burden. As the reviewer implies, the ultimate goal would be pure zero-shot generalization. A promising future direction is to further improve cross-participant alignment by learning a shared neural feature encoder (e.g., using contrastive or self-supervised learning on aggregated ECoG data) before the personalized adaptors. We have added a paragraph in the Discussion outlining this as a major next step to enhance the framework’s practicality and further reduce calibration time.

      Accuracy vs. Recording Duration: Thank you for this insightful suggestion. To systematically evaluate the impact of training data volume on performance, we conducted additional experiments using progressively smaller subsets of the full training set (i.e., 25%, 50%, and 75%). When more than 50% of the training data was used, performance degraded gracefully rather than catastrophically, which is promising for potential clinical scenarios where data collection may be limited. We have added another figure (Fig. S4) to demonstrate this.

      Edits:

      Pages 15-16, Lines 427-452:

      “There are several limitations in our study. The quality of the re-synthesized speech heavily relies on the performance of the generative model, indicating that future work should focus on refining and enhancing these models. Currently, our study utilized English speech sentences as input stimuli, and the performance of the system in other languages remains to be evaluated. Regarding signal modality and experimental methods, the clinical setting restricts us to collecting data during brief periods of awake neurosurgeries, which limits the amount of usable neural activity recordings. Overcoming this time constraint could facilitate the acquisition of larger datasets, thereby contributing to the re-synthesis of higher-quality natural speech. Furthermore, the inference speed of the current pipeline presents a challenge for real-time applications. On our hardware (a single NVIDIA GeForce RTX 3090 GPU), synthesizing speech from neural data takes approximately two to three times longer than the duration of the target speech segment itself. This latency is primarily attributed to the sequential processing in the autoregressive linguistic adaptor and the computationally intensive high-fidelity waveform generation in the vocoder (CosyVoice 2.0). While the current study focuses on offline reconstruction accuracy, achieving real-time or faster-than-real-time inference is a critical engineering goal for viable speech BCI prosthetics. Future work must therefore prioritize architectural optimizations, such as exploring non-autoregressive decoding strategies and more efficient neural vocoders, alongside potential hardware acceleration. Additionally, exploring non-invasive methods represents another frontier; with the accumulation of more data and the development of more powerful generative models, it may become feasible to achieve effective non-invasive neural decoding for speech resynthesis. 
Moreover, while our framework adopts specialized architectures (LSTM and Transformer) for distinct decoding tasks, an alternative approach is to employ a unified multimodal large language model (LLM) capable of joint acoustic-linguistic processing. Finally, the current framework requires training participant-specific adaptors, which limits its immediate applicability for new users. A critical next step is to develop methods that learn a shared, cross-participant neural feature encoder, for instance, by applying contrastive or self-supervised learning techniques to larger aggregated ECoG datasets. Such an encoder could extract subject-invariant neural representations of speech, serving as a robust initialization before lightweight, personalized fine-tuning. This approach would dramatically reduce the amount of per-subject calibration data and time required, enhancing the practicality and scalability of the decoding framework for real-world BCI applications”

      “In summary, our dual-path framework achieves high speech reconstruction quality by strategically integrating language models for lexical precision and voice cloning for vocal identity preservation, yielding a 37.4% improvement in MOS scores over conventional methods. This approach enables high-fidelity, sentence-level speech synthesis directly from cortical recordings while maintaining speaker-specific vocal characteristics. Despite current constraints in generative model dependency and intraoperative data collection, our work establishes a new foundation for neural decoding development. Future efforts should prioritize: (1) refining few-shot adaptation techniques, (2) developing non-invasive implementations, (3) expanding to dynamic dialogue contexts, and (4) cross-subject applications. The convergence of neurophysiological data with multimodal foundation models promises transformative advances, not only revolutionizing speech BCIs but potentially extending to cognitive prosthetics for memory augmentation and emotional communication. Ultimately, this paradigm will deepen our understanding of neural speech processing while creating clinically viable communication solutions for those with severe speech impairments”

      Edits: 

      add another section in Methods: Page 22, Line 681:

      “Ablation study on training data volume”.

      “To assess the impact of training data quantity on decoding performance, we conducted an additional ablation experiment. For each participant, we created subsets of the full training set corresponding to 25%, 50%, and 75% of the original data by random sampling while preserving the temporal continuity of speech segments. Personalized acoustic and linguistic adaptors were then independently trained from scratch on each subset, following the identical architecture and optimization procedures described above. All other components of the pipeline, including the frozen pre-trained generators (HiFi-GAN, Parler-TTS) and the CosyVoice 2.0 voice cloner, remained unchanged. Performance metrics (mel-spectrogram R², WER, PER) were evaluated on the same held-out test set for all data conditions. The results (Fig. S4) demonstrate that when more than 50% of the training data is utilized, performance degrades gracefully rather than catastrophically, which is a promising indicator for clinical applications with limited data collection time”.
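
      One way to draw such subsets is sketched below. It is only an illustration under stated assumptions: `sample_training_subset` is a hypothetical helper, and "preserving the temporal continuity of speech segments" is interpreted here as sampling whole segments (never individual frames) and keeping the chosen segments in their original order.

```python
import random


def sample_training_subset(segments, fraction, seed=0):
    """Randomly keep `fraction` of whole segments, preserving their original order.

    Assumption: each element of `segments` is one intact speech segment
    (e.g., a sentence), so continuity within a segment is never broken.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    n_keep = max(1, round(fraction * len(segments)))
    chosen = set(rng.sample(range(len(segments)), n_keep))
    return [seg for i, seg in enumerate(segments) if i in chosen]


# Example: build the 25%, 50%, and 75% conditions from 40 placeholder segments
segments = [f"segment_{i:02d}" for i in range(40)]
subsets = {frac: sample_training_subset(segments, frac) for frac in (0.25, 0.5, 0.75)}
```

      Fixing the seed keeps the subsets reproducible, so that adaptors trained on each condition can be compared on the same held-out test set.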

      (3) I appreciate that the author compared their model with the MLP, but more comparisons with previous models could be beneficial. Even simply summarizing some measures of earlier models, such as neural recording duration, WER, PER, etc., is ok.

      Thank you for this suggestion. We agree that a broader comparison contextualizes our contribution. We also acknowledge that given the differences in tasks, signal modality, and amount of data, it’s hard to draw a direct comparison. The main goal of this table is to summarize major studies, their methods and results for reference. We have now added a new Supplementary Table that summarizes key metrics from several recent and relevant studies in neural speech decoding. The table includes:

      - Neural modality (e.g., ECoG, sEEG, Utah array)

      - Approximate amount of neural data used per subject for decoder training

      - Primary task (perception vs. production)

      - Decoding framework

      - Reported Word Error Rate (WER) or similar intelligibility metrics (e.g., Character Error Rate)

      - Reported acoustic fidelity metrics (if available, e.g., spectral correlation)

      This table includes works such as Anumanchipalli et al., Nature 2019; Akbari et al., Sci Rep 2019; Willett et al., Nature 2023; and other contemporary studies. The table clearly shows that our dual-path framework achieves a highly competitive WER (~18.9%) using an exceptionally short neural recording duration (~20 minutes), highlighting its data efficiency. We will refer to this table in the revised manuscript.

      Edits:

      Page 14, Lines 374-376:

      “Our framework establishes a framework for speech decoding by outperforming prior acoustic-only or linguistic-only approaches (Table S3) through integrated pretraining-powered acoustic and linguistic decoding”

      Minor:

      (1) Some processes might be described earlier, for example, the electrodes were selected, and the model was trained separately for each participant. That information was only described in the Method section now.

      Thank you for catching these. We have revised the manuscript accordingly.

      Edits:

      Page 4, Lines 89-95:

      “Our proposed framework for reconstructing speech from intracranial neural recordings is designed around two complementary decoding pathways: an acoustic pathway focused on preserving low-level spectral and prosodic detail, and a linguistic pathway focused on decoding high-level textual and semantic content. For every participant, our adaptor is independently trained, and we select speech-responsive electrodes (selection details are provided in the Methods section) to tailor the model to individual neural patterns. These two streams are ultimately fused to synthesize speech that is both natural-sounding and intelligible, capturing the full richness of spoken language. Fig. 1 provides a schematic overview of this dual-pathway architecture”

      (2) Line 224-228 Figure 2 should be Figure 3

      Thank you for catching these. We have revised the manuscript accordingly. The information about participant-specific training and electrode selection is now briefly mentioned in the "Results" overview (section: "The acoustic and linguistic performance..."), with details still in the Methods. The figure reference error has been corrected.

      Edits:

      Page 7, Lines 224-228:

      “However, exclusive reliance on acoustic reconstruction reveals fundamental limitations. Despite excellent spectral fidelity, the pathway produces critically impaired linguistic intelligibility. At the word level, intelligibility remains unacceptably low (WER = 74.6 ± 5.5%, Fig. 3D), while MOS and phoneme-level precision fare only marginally better (MOS = 2.878 ± 0.205, Fig. 3C; PER = 28.1 ± 2.2%, Fig. 3E)”.
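
      For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words; PER is the same computation over phoneme sequences. The sketch below shows that standard definition only. How transcripts were obtained from the reconstructed audio is not stated in this passage, so the function assumes reference and hypothesis transcripts are already available as strings, and the lowercasing and whitespace tokenization are illustrative choices.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the words consumed so far in ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

      Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.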

      (3) For Figure 3C, why does the MOS seem to be higher for baseline 3 than for ground truth? Is this significant?

      This is a detailed observation. Baseline 3 achieves a mean opinion score of 4.822 ± 0.086 (Fig. 3C), significantly surpassing even the original human speech (4.234 ± 0.097, p = 6.674×10⁻³³). We believe this trend arises because the TIMIT corpus, recorded decades ago, contains inherent acoustic noise and relatively lower fidelity compared to modern speech corpora. In contrast, the Parler-TTS model used in Baseline 3 is trained on massive, high-quality, clean speech datasets. Therefore, it synthesizes speech that listeners may subjectively perceive as "cleaner" or more pleasant, even if it lacks the original speaker's voice. Crucially, as the reviewer implies, our final integrated output does not aim to maximize MOS at the cost of speaker identity; it successfully balances this subjective quality with high intelligibility and restored acoustic fidelity. We will add a brief note explaining this possible reason in the caption of Figure 3C.

      Edits:

      Page 9, Lines 235-245:

      “The linguistic pathway reconstructs high-intelligibility, higher-level linguistic information”

      “The linguistic pathway, instantiated through a pre-trained TTS generator (Fig. 1B), excels in reconstructing abstract linguistic representations. This module operates at the phonological and lexical levels, converting discrete word tokens into continuous speech signals while preserving prosodic contours, syllable boundaries, and phonetic sequences. It achieves a mean opinion score of 4.822 ± 0.086 (Fig. 3C) - significantly surpassing even the original human speech (4.234 ± 0.097, p = 6.674×10⁻³³), likely because the TIMIT corpus, recorded decades ago, contains inherent acoustic noise and relatively lower fidelity compared to modern speech corpora. Complementing this perceptual quality, objective intelligibility metrics confirm outstanding performance: WER reaches 17.7 ± 3.2%, with PER at 11.0 ± 2.3%”.

      Reference

      (1) Chen M X, Firat O, Bapna A, et al. The best of both worlds: Combining recent advances in neural machine translation[C]//Proceedings of the 56th annual meeting of the association for computational linguistics (Volume 1: Long papers). 2018: 76-86

      (2) P. Chen et al. Do Self-Supervised Speech and Language Models Extract Similar Representations as Human Brain? 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024). 2225–2229 (2024).

      (3) H. Akbari, B. Khalighinejad, J. L. Herrero, A. D. Mehta, N. Mesgarani, Towards reconstructing intelligible speech from the human auditory cortex. Scientific reports 9, 874 (2019).

      (4) S. Komeiji et al., Transformer-Based Estimation of Spoken Sentences Using Electrocorticography. Int Conf Acoust Spee, 1311-1315 (2022).

      (5) L. Bellier et al., Music can be reconstructed from human auditory cortex activity using nonlinear decoding models. Plos Biology 21,  (2023).

      (6) F. R. Willett et al., A high-performance speech neuroprosthesis. Nature 620,  (2023).

      (7) S. L. Metzger et al., A high-performance neuroprosthesis for speech decoding and avatar control. Nature 620, 1037-1046 (2023).

      (8) J. W. Li et al., Neural2speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction. Int Conf Acoust Spee, 2200-2204 (2024).

      (9) X. P. Chen et al., A neural speech decoding framework leveraging deep learning and speech synthesis. Nat Mach Intell 6,  (2024).

      (10) M. Wairagkar et al., An instantaneous voice-synthesis neuroprosthesis. Nature,  (2025).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Chen et al. engineered and characterized a suite of next-generation GECIs for the Drosophila NMJ that allow for the visualization of calcium dynamics within the presynaptic compartment, at presynaptic active zones, and in the postsynaptic compartment. These GECIs include ratiometric presynaptic Scar8m (targeted to synaptic vesicles), ratiometric active zone localized Bar8f (targeted to the scaffold molecule BRP), and postsynaptic SynapGCaMP8m. The authors demonstrate that these new indicators are a large improvement on the widely used GCaMP6 and GCaMP7 series GECIs, with increased speed and sensitivity. They show that presynaptic Scar8m accurately captures presynaptic calcium dynamics with superior sensitivity to the GCaMP6 and GCaMP7 series and with similar kinetics to chemical dyes. The active-zone targeted Bar8f sensor was assessed for the ability to detect release-site-specific nanodomain changes, but the authors concluded that this sensor is still too slow to accurately do so. Lastly, the use of postsynaptic SynapGCaMP8m was shown to enable the detection of quantal events with similar resolution to electrophysiological recordings. Finally, the authors developed a Python-based analysis software, CaFire, that enables automated quantification of evoked and spontaneous calcium signals. These tools will greatly expand our ability to detect activity at individual synapses without the need for chemical dyes or electrophysiology.

      We thank this Reviewer for the overall positive assessment of our manuscript and for the incisive comments.

      (1) The role of Excel in the pipeline could be more clearly explained. Lines 182-187 could be better worded to indicate that CaFire provides analysis downstream of intensity detection in ImageJ. Moreover, the data type of the exported data, such as .csv or .xlsx, should be indicated instead of 'export to graphical program such as Microsoft Excel'.

      We thank the Reviewer for these comments, many of which were shared by the other reviewers. In response, we have now 1) more clearly explained the role of Excel in the CaFire pipeline (lines 677-681), 2) revised the wording in lines 676-679 to indicate that CaFire provides analysis downstream of intensity detection in ImageJ, and 3) clarified the exported data type to Excel (lines 677-681). These efforts have improved the clarity and readability of the CaFire analysis pipeline.

      (2) In Figure 2A, the 'Excel' step should either be deleted or included as 'data validation' as ImageJ exports don't require MS Excel or any specific software to be analysed. (Also, the graphic used to depict Excel software in Figure 2A is confusing.)

      We thank the reviewer for this helpful suggestion. In Fig. 2A, we have changed the Excel portion and clarified the processing steps in the revised methods. Specifically, we now indicate that ROIs are first selected in Fiji/ImageJ and analyzed to obtain time-series data containing both the time information and the corresponding imaging mean intensity values. These data are then exported to a spreadsheet file (e.g., Excel), which is used to organize the output before being imported into CaFire for subsequent analysis. These changes can be found in Fig. 2A and the methods (lines 676-681).

      (3) Figure 2B should include the 'Partition Specification' window (as shown on the GitHub) as well as the threshold selection to give the readers a better understanding of how the tool works.

      We absolutely agree with this comment, and have made the suggested changes to Fig. 2B. In particular, we have replaced the software interface panels and now include windows illustrating the Load File, Peak Detection, and Partition functions. These updated screenshots provide a clearer view of how CaFire is used to load the data, detect events, and perform partition specification for subsequent analysis. We agree these changes will give the readers a better understanding of how the tool works, and we thank the reviewer for this comment.

      (4) The presentation of data is well organized throughout the paper. However, in Figure 6C, it is unclear how the heatmaps represent the spatiotemporal fluorescence dynamics of each indicator. Does the signal correspond to a line drawn across the ROI shown in Figure 6B? If so, this should be indicated.

      We apologize that the heatmaps were unclear in Fig. 6C (Fig. 7C in the current revision). Each heatmap is derived from a one-pixel-wide vertical line within a miniature-event ROI. These heatmaps correspond to the fluorescence change in the indicated SynapGCaMP variant of individual quantal events and their traces shown in Fig. 7C, with a representative image of the baseline and peak fluorescence shown in Fig. 7B. Specifically, we have added the following to the revised Fig. 7C legend:

      The corresponding heatmaps below were generated from a single vertical line extracted from a representative miniature-event ROI, and visualize the spatiotemporal fluorescence dynamics (ΔF/F) along that line over time.

      (5) In Figure 6D, the addition of non-matched electrophysiology recordings is confusing. Maybe add "at different time points" to the end of the 6D legend, or consider removing the electrophysiology trace from Figure 6D and referring the reader to the traces in Figure 7A for comparison (considering the same point is made more rigorously in Figure 7).

      This is a good point, one shared with another reviewer. We apologize this was not clear, and have now revised this part of the figure to remove the electrophysiological traces in what is now Fig. 7 while keeping the paired ones still in what is now Fig. 8A as suggested by the reviewer. We agree this helps to clarify the quantal calcium transients.

      (6) In GitHub, an example ImageJ Script for analyzing the images and creating the inputs for CaFire would be helpful to ensure formatting compatibility, especially given potential variability when exporting intensity information for two channels. In the Usage Guide, more information would be helpful, such as how to select ∆R/R, ideally with screenshots of the application being used to analyze example data for both single-channel and two-channel images.

      We agree that additional details added to the GitHub would be helpful for users of CaFire. In response, we have now added the following improvements to the GitHub site: 

      - ImageJ operation screenshots

      Step-by-step illustrations of ROI drawing and Multi Measure extraction.

      - Example Excel file with time and intensity values

      Demonstrates the required data format for CaFire import, including proper headers.

      - CaFire loading screenshots for single-channel and dual-channel imaging

      Shows how to import GCaMP into Channel 1 and mScarlet into Channel 2.

      - Peak Detection and Partition setting screenshots

      Visual examples of automatic peak detection, manual correction, and trace partitioning.

      - Instructions for ROI Extraction and CaFire Analysis

      A written guide describing the full workflow from ROI selection to CaFire data export.

      These changes have improved the usability and accessibility of CaFire, and we thank the reviewer for these points.

      Reviewer #2

      Calcium ions play a key role in synaptic transmission and plasticity. To improve calcium measurements at synaptic terminals, previous studies have targeted genetically encoded calcium indicators (GECIs) to pre- and postsynaptic locations. Here, Chen et al. improve these constructs by incorporating the latest GCaMP8 sensors and a stable red fluorescent protein to enable ratiometric measurements. In addition, they develop a new analysis platform, 'CaFire', to facilitate automated quantification. Using these tools, the authors demonstrate favorable properties of their sensors relative to earlier constructs. Impressively, by positioning postsynaptic GCaMP8m near glutamate receptors, they show that their sensors can report miniature synaptic events with speed and sensitivity approaching that of intracellular electrophysiological recordings. These new sensors and the analysis platform provide a valuable tool for resolving synaptic events using all-optical methods.

      We thank the Reviewer for their overall positive evaluation and comments.

      Major comments:

      (1) While the authors rigorously compared the response amplitude, rise, and decay kinetics of several sensors, key parameters like brightness and photobleaching rates are not reported. I feel that including this information is important as synaptically tethered sensors, compared to freely diffusible cytosolic indicators, can be especially prone to photobleaching, particularly under the high-intensity illumination and high-magnification conditions required for synaptic imaging. Quantifying baseline brightness and photobleaching rates would add valuable information for researchers intending to adopt these tools, especially in the context of prolonged or high-speed imaging experiments.

      This is a good point made by the reviewer, and one we agree will be useful for researchers to be aware of. First, it is important to note that the photobleaching and brightness of the sensors will vary depending on the nature of the user’s imaging equipment, which can vary significantly between widefield microscopes (with various LED or halogen light sources for illumination), laser scanning systems (e.g., line scans with confocal systems), or area scanning systems using resonant scanners (as we use in our current study). Under the same imaging settings, GCaMP8f and 8m exhibit comparable baseline fluorescence, whereas GCaMP6f and 6s are noticeably dimmer; because our aim is to assess each reagent’s potential under optimal conditions, we routinely adjust excitation/camera parameters before acquisition to place baseline fluorescence in an appropriate dynamic range. As an important addition to this study, motivated by the reviewer’s comments above, we now directly compare neuronal cytosolic GCaMP8m expression with our Scar8m sensor, showing higher sensitivity with Scar8m (now shown in the new Fig. 3F-H).

      Regarding photobleaching, GCaMP signals are generally stable, while mScarlet is more prone to bleaching: in presynaptic area-scanned confocal recordings, the mScarlet channel drops by ~15% over 15 secs, whereas GCaMP6s/8f/8m show no obvious bleaching over the same window (lines 549-553). In contrast, in presynaptic widefield imaging using an LED system (CCD), GCaMP8f shows ~8% loss over 15 secs (lines 610-611). Similarly, for postsynaptic SynapGCaMP6f/8f/8m, confocal resonant area scans show no obvious bleaching over 60 secs, while widefield shows ~2–5% bleaching over 60 secs (lines 634-638). Finally, in active-zone/BRP calcium imaging (confocal), mScarlet again bleaches by ~15% over 15 secs, while GCaMP8f/8m show no obvious bleaching. The mScarlet-channel bleaching can be corrected in Huygens SVI (Bleaching correction or via the Deconvolution Wizard), whereas we avoid applying bleaching correction to the green GCaMP channel when no clear decay is present to prevent introducing artifacts. This information is now added to the methods (lines 548-553).

      (2) In several places, the authors compare the performance of their sensors with synthetic calcium dyes, but these comparisons are based on literature values rather than on side-by-side measurements in the same preparation. Given differences in imaging conditions across studies (e.g., illumination, camera sensitivity, and noise), parameters like indicator brightness, SNR, and photobleaching are difficult to compare meaningfully. Additionally, the limited frame rate used in the present study may preclude accurate assessment of rise times relative to fast chemical dyes. These issues weaken the claim made in the abstract that "...a ratiometric presynaptic GCaMP8m sensor accurately captures .. Ca²⁺ changes with superior sensitivity and similar kinetics compared to chemical dyes." The authors should clearly acknowledge these limitations and soften their conclusions. A direct comparison in the same system, if feasible, would greatly strengthen the manuscript.

      We absolutely agree with these points made by the reviewer, and have made a concerted effort to address them through the following:

      We have now directly compared presynaptic calcium responses on the same imaging system using the chemical dye Oregon Green Bapta-1 (OGB-1), one of the primary synthetic calcium indicators used in our field. These experiments reveal that Scar8f exhibits markedly faster kinetics and an improved signal-to-noise ratio compared to OGB-1, with higher peak fluorescence responses (Scar8f: 0.32, OGB-1: 0.23). The rise time constants of the two indicators are comparable (both ~3 msecs), whereas the decay of Scar8f is faster than that of OGB-1 (Scar8f: ~40, OGB-1: ~60), indicating more rapid signal recovery. These results now directly demonstrate the superiority of the new GCaMP8 sensors we have engineered over conventional synthetic dyes, and are now presented in the new Fig. 3A-E of the manuscript.

      We agree with the reviewer that, in the original submission, the relatively slow resonant area scans (~115 fps) limited the temporal resolution of our rise time measurements. To address this, we have re-measured the rise time using higher frame-rate line scans (kHz). For Scar8f, the rise time constant was 6.736 msec with resonant area scans at ~115 fps, but shortened to 2.893 msec when imaged at ~303 fps, indicating that the original protocol underestimated the speed of the true kinetics. In addition, for Bar8m, area scans at ~118 fps yielded a rise time constant of 9.019 msec, whereas line scans at ~1085 fps reduced the rise time constant to 3.230 msec. These new measurements are now incorporated into the manuscript (Figs. 3, 4, and 6) to more accurately reflect the fast kinetics of these indicators.

      (3) The authors state that their indicators can now achieve measurements previously attainable with chemical dyes and electrophysiology. I encourage the authors to also consider how their tools might enable new measurements beyond what these traditional techniques allow. For example, while electrophysiology can detect summed mEPSPs across synapses, imaging could go a step further by spatially resolving the synaptic origin of individual mEPSP events. One could, for instance, image MN-Ib and MN-Is simultaneously without silencing either input, and detect mEPSP events specific to each synapse. This would enable synapse-specific mapping of quantal events - something electrophysiology alone cannot provide. Demonstrating even a proof-of-principle along these lines could highlight the unique advantages of the new tools by showing that they not only match previous methods but also enable new types of measurements.

      These are excellent points raised by the reviewer. In response, we have done the following: 

      We have now included a supplemental video as “proof-of-principle” data showing simultaneous imaging of SynapGCaMP8m quantal events at both MN-Is and -Ib, demonstrating that synapse-specific spatial mapping of quantal events can be obtained with this tool (see new Supplemental Video 1). 

      We have also included an additional discussion of the potential and limitations of these tools for new measurements beyond conventional approaches. This discussion is now presented in lines 419-421 in the manuscript.

      (4) For ratiometric measurements, it is important to estimate and subtract background signals in each channel. Without this correction, the computed ratio may be skewed, as background adds an offset to both channels and can distort the ratio. However, it is not clear from the Methods section whether, or how, background fluorescence was measured and subtracted.

      This is a good point, and we agree more clarification about how ratiometric measurements were made is needed. In response, we have now added the following to the Methods section (lines 548-568):

      Time-lapse videos were stabilized and bleach-corrected prior to analysis, which visibly reduced frame-to-frame motion and intensity drift. In the presynaptic and active-zone mScarlet channel, a bleaching factor of ~1.15 was observed during the 15 sec recording. This bleaching can be corrected using the “Bleaching correction” tool in Huygens SVI. For presynaptic and active-zone GCaMP signals, there was minimal bleaching over these short imaging periods. Therefore, the bleaching correction step for GCaMP was skipped. Both GCaMP and mScarlet channels were processed using the default settings in the Huygens SVI “Deconvolution Wizard” (with the exception of the bleaching correction option). Deconvolution was performed using the CMLE algorithm with the Huygens default stopping criterion and a maximum of 30 iterations, such that the algorithm either converged earlier or, if convergence was not reached, was terminated at this 30-iteration limit; no other iteration settings were used across the GCaMP series. ROIs were drawn on the processed images using Fiji ImageJ software, and mean fluorescence time courses were extracted for the GCaMP and mScarlet channels, yielding F<sub>GCaMP</sub>(t) and F<sub>mScarlet</sub>(t). F(t)s were imported into CaFire with GCaMP assigned to Channel #1 (signal; required) and mScarlet to Channel #2 (baseline/reference; optional). If desired, the mScarlet signal could be smoothed in CaFire using a user-specified moving-average window to reduce high-frequency noise. In CaFire’s ΔR/R mode, the per-frame ratio was computed as R(t)=F<sub>GCaMP</sub>(t)/F<sub>mScarlet</sub>(t); a baseline ratio R0 was estimated from the pre-stimulus period, and the final response was reported as ΔR/R(t)=[R(t)−R0]/R0, which normalizes GCaMP signals to the co-expressed mScarlet reference and thereby reduces variability arising from differences in sensor expression level or illumination across AZs.
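
      The ratiometric calculation described in this passage can be illustrated with a short sketch. It is not CaFire's actual code: `delta_r_over_r` and its parameters are hypothetical names, R0 is taken here as the mean ratio over a user-supplied number of pre-stimulus frames (the text does not specify the estimator), and the optional mScarlet smoothing step is omitted.

```python
import numpy as np


def delta_r_over_r(f_gcamp, f_mscarlet, n_baseline_frames):
    """Compute dR/R(t) = [R(t) - R0] / R0 with R(t) = F_GCaMP(t) / F_mScarlet(t).

    Assumption: R0 is the mean ratio over the first `n_baseline_frames`
    frames, which are taken to be the pre-stimulus period.
    """
    f_gcamp = np.asarray(f_gcamp, dtype=float)
    f_mscarlet = np.asarray(f_mscarlet, dtype=float)
    if f_gcamp.shape != f_mscarlet.shape:
        raise ValueError("both channels must have the same number of frames")
    ratio = f_gcamp / f_mscarlet
    r0 = ratio[:n_baseline_frames].mean()
    return (ratio - r0) / r0


# Example: a transient in the green channel with a flat red reference
green = [200.0, 200.0, 200.0, 400.0, 300.0, 200.0]
red = [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
response = delta_r_over_r(green, red, n_baseline_frames=3)
```

      Because both channels enter only as a ratio, multiplying them by a common factor (for example, a higher sensor expression level at one AZ) leaves ΔR/R unchanged, which is the normalization property the passage relies on.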

      (5) At line 212, the authors claim "... GCaMP8m showing 345.7% higher SNR over GCaMP6s....(Fig. 3D and E) ", yet the cited figure panels do not present any SNR quantification. Figures 3D and E only show response amplitudes and kinetics, which are distinct from SNR. The methods section also does not describe details for how SNR was defined or computed.

      This is another good point. We define SNR operationally as the fractional fluorescence change (ΔF/F). Traces were processed with CaFire, which estimates a per-frame baseline F<sub>0</sub>(t) with a user-configurable sliding window and percentile. In the Load File panel, users can specify both the length of the moving baseline window and the desired percentile; the default settings are a 50-point window and the 30th percentile, which corresponds to a 101-point window centered on each time point (previous 50 to next 50 samples) in which the lower 30% of values are used to estimate F<sub>0</sub>(t). The signal was then computed as ΔF/F=[F(t)−F<sub>0</sub>(t)]/F<sub>0</sub>(t). This ΔF/F value is what we report as SNR throughout the manuscript and is now discussed explicitly in the revised methods (lines 686-693).
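
      The sliding-baseline procedure described above can be sketched as follows. This is an illustration under stated assumptions, not CaFire's implementation: F<sub>0</sub>(t) is taken as the nearest-rank 30th percentile of the window, the window is truncated at the ends of the trace, and all function and parameter names are hypothetical.

```python
def sliding_percentile_baseline(trace, half_window=50, percentile=30):
    """Estimate F0(t) as a low percentile of a window centered on t.

    half_window=50 gives a 101-point window (previous 50 to next 50
    samples). Assumptions: integer percentile, nearest-rank rule, and a
    window that is simply truncated at the edges of the trace.
    """
    n = len(trace)
    baseline = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        window = sorted(trace[lo:hi])
        # Nearest-rank percentile using integer arithmetic (ceiling)
        rank = max(0, (percentile * len(window) + 99) // 100 - 1)
        baseline.append(window[rank])
    return baseline


def delta_f_over_f(trace, half_window=50, percentile=30):
    """Return dF/F = [F(t) - F0(t)] / F0(t) for every frame."""
    f0 = sliding_percentile_baseline(trace, half_window, percentile)
    return [(f - b) / b for f, b in zip(trace, f0)]


# Invented trace: flat baseline of 100 with one transient at frame 60
trace = [100.0] * 200
trace[60] = 150.0
dff = delta_f_over_f(trace)
```

      A low percentile is used rather than the window mean so that brief transients inside the window do not pull the baseline estimate upward.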

      (6) Lines 285-287 "As expected, summed ΔF values scaled strongly and positively with AZ size (Fig. 5F), reflecting a greater number of Cav2 channels at larger AZs". I am not sure about this conclusion. A positive correlation between summed ΔF values and AZ size could simply reflect more GCaMP molecules in larger AZs, which would give rise to larger total fluorescence change even at a given level of calcium increase.

      The reviewer makes a good point, one that we agree should be clarified. The reviewer is indeed correct that larger active zones should have more abundant BRP protein, which in turn will lead to a higher abundance of the Bar8f sensor, which should lead to a higher GCaMP response simply by having more of this sensor. However, the inclusion of the ratiometric mScarlet protein should normalize the response accurately, correcting for this confound, in which the higher abundance of GCaMP should be offset (normalized) by the equally (stoichiometric) higher abundance of mScarlet. Therefore, when the ∆R/R is calculated, the differences in GCaMP abundance at each AZ should be corrected by the ratiometric analysis. We now use an improved BRP::mScarlet3::GCaMP8m (Bar8m) and compute ΔR/R with R(t)=F<sub>GCaMP8m</sub>/F<sub>mScarlet3</sub>. ROIs were drawn over individual AZs (Fig. 6B). CaFire estimated R0 with a sliding 101-point window using the lowest 10% of values, and responses were reported as ΔR/R=[R−R0]/R0. Area-scan examples (118 fps) show robust ΔR/R transients (peaks ≈1.90 and 3.28; tau rise ≈9.0–9.3 ms; Fig. 6C, middle).

      We have now made these points more clearly in the manuscript (lines 700-704) and moved the Bar8f intensity vs active zone size data to Table S1. Together, these revisions address the indicator-abundance confound (via mScarlet normalization).

      (6) Lines 313-314: "SynapGCaMP quantal signals appeared to qualitatively reflect the same events measured with electrophysiological recordings (Fig. 6D)." This statement is quite confusing. In Figure 6D, the corresponding calcium and ephys traces look completely different and appear to reflect distinct sets of events. It was only after reading Figure 7 that I realized the traces shown in Figure 6D might not have been recorded simultaneously. The authors should clarify this point.

      Yes, we absolutely agree with this point, one shared by Reviewer 1. In response, we have removed the electrophysiological traces in Fig. 6 to clarify that just the calcium responses are shown, and save the direct comparison for the Fig. 7 data (now revised Fig. 8).

      (8) Lines 310-313: "SynapGCaMP8m .... striking an optimal balance between speed and sensitivity", and Lines 314-316: "We conclude that SynapGCaMP8m is an optimal indicator to measure quantal transmission events at the synapse." Statements like these are subjective. In the authors' own comparison, GCaMP8m is significantly slower than GCaMP8f (at least in terms of decay time), despite having a moderately higher response amplitude. It is therefore unclear why GCaMP8m is considered 'optimal'. The authors should clarify this point or explain their rationale for prioritizing response amplitude over speed in the context of their application.

      This is another good point that we agree with, as the “optimal” sensor will of course depend on the user’s objectives. Hence, we used the term “an optimal sensor” to indicate it is what we believed to be the best one for our own uses. However, this point should be clarified and better discussed. In response, we have revised the relevant sections of the manuscript to better define why we chose the 8m sensors to strike an optimal balance of speed and sensitivity for our uses, and go on to discuss situations in which other sensor variants might be better suited. These are now presented in lines 223-236 in the revised manuscript, and we thank the reviewer for making these comments, which have improved our study.

      Minor comments

      (1)  Please include the following information in the Methods section:

      (a) For Figures 3 and 4, specify how action potentials were evoked. What type of electrodes were used, where were they placed, and what amount of current or voltage was applied?

      We apologize for neglecting to include this information in the original submission. We have now added this information to the revised Methods section (lines 537-543).

      (b) For imaging experiments, provide information on the filter sets used for each imaging channel, and describe how acquisition was alternated or synchronized between the green and red channels in ratiometric measurements. Additionally, please report the typical illumination intensity (in mW/mm²) for each experimental condition.

      We thank the reviewer for this helpful comment. We have now added detailed information about the imaging configuration to the Methods (lines 512-528) with the following:

      Ca2+ imaging was conducted using a Nikon A1R resonant scanning confocal microscope equipped with a 60x/1.0 NA water-immersion objective (refractive index 1.33). GCaMP signals were acquired using the FITC/GFP channel (488-nm laser excitation; emission collected with a 525/50-nm band-pass filter), and mScarlet/mCherry signals were acquired using the TRITC/mCherry channel (561-nm laser excitation; emission collected with a 595/50-nm band-pass filter). ROIs focused on terminal boutons of MN-Ib or -Is motor neurons. For both channels, the confocal pinhole was set to a fixed diameter of 117.5 µm (approximately three Airy units under these conditions), which increases signal collection while maintaining adequate optical sectioning. Images were acquired as 256 × 64 pixel frames (two 12-bit channels) using bidirectional resonant scanning at a frame rate of ~118 frames/s; the scan zoom in NIS-Elements was adjusted so that this field of view encompassed the entire neuromuscular junction and was kept constant across experiments. In ratiometric recordings, the 488-nm (GCaMP) and 561-nm (mScarlet) channels were acquired in a sequential dual-channel mode using the same bidirectional resonant scan settings: for each time point, a frame was first collected in the green channel and then immediately in the red channel, introducing a small, fixed frame-to-frame temporal offset while preserving matched spatial sampling of the two channels.

      Directly measuring the absolute laser power at the specimen plane (and thus reporting illumination intensity in mW/mm²) is technically challenging on this resonant-scanning system, because it would require inserting a power sensor into the beam path and perturbing the optical alignment; consequently, we are unable to provide reliable absolute mW/mm² values. Instead, we now report all relevant acquisition parameters (objective, numerical aperture, refractive index, pinhole size, scan format, frame rate, and fixed laser/detector settings) and note that laser powers were kept constant within each experimental series and chosen to minimize bleaching and phototoxicity while maintaining an adequate signal-to-noise ratio. We have now added the details requested in the revised Methods section (lines 512-535), including information about the filter sets, acquisition settings, and typical illumination intensity.

      (2) Please clarify what the thin versus thick traces represent in Figures 3D, 3F, 4C, and 4E. Are the thin traces individual trials from the same experiment, or from different experiments/animals? Does the thick trace represent the mean/median across those trials, a fitted curve, or a representative example?

      We apologize that this was not clearer in the original submission. Thin traces are individual stimulus-evoked trials (“sweeps”) acquired sequentially from the same muscle/NMJ in a single preparation; the panel is shown as a representative example of recordings collected across animals. The thick colored trace is the trial-averaged waveform (arithmetic mean) of those thin traces after alignment to stimulus onset and baseline subtraction (no additional smoothing beyond what is stated in Methods). The thick black curve over the decay phase is a single-exponential fit used to estimate τ. Specifically, we fit the decay segment by linear regression on the natural-log–transformed baseline-subtracted signal, which is equivalent to fitting y = y<sub>peak</sub>·e<sup>−t/τdecay</sup> over the decay window (revised Fig.4D and Fig.5C legends).
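
      The log-linear decay fit described above can be sketched as follows. This is an illustrative example rather than the analysis code itself; the function name and the synthetic data are assumptions, and the input is taken to be a baseline-subtracted decay segment with strictly positive values (the logarithm is undefined otherwise).

```python
import math


def fit_decay_tau(times, signal):
    """Fit ln(signal) = ln(y_peak) - t / tau by ordinary least squares.

    Returns (tau, y_peak). All signal values must be > 0.
    """
    logs = [math.log(y) for y in signal]
    n = len(times)
    t_mean = sum(times) / n
    l_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    slope = sxy / sxx                    # slope = -1 / tau
    intercept = l_mean - slope * t_mean  # intercept = ln(y_peak)
    return -1.0 / slope, math.exp(intercept)


# Synthetic noise-free decay: y_peak = 2.0, tau = 40 ms, sampled every 10 ms
times_ms = [10.0 * k for k in range(10)]
decay = [2.0 * math.exp(-t / 40.0) for t in times_ms]
tau_ms, y_peak = fit_decay_tau(times_ms, decay)
```

      On noise-free data the fit recovers the generating parameters; on real traces, low-amplitude points near baseline carry amplified noise after the log transform, so the choice of decay window matters.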

      (3) Please clarify what the reported sample size (n) represents. Does it indicate the number of experimental repeats, the number of boutons or PSDs, or the number of animals?

      Again, we apologize that this was not clear. (n) refers to the number of animals (biological replicates), which is reported in Supplementary Table 1. All imaging was performed at muscle 6, abdominal segment A3. Per preparation, we imaged 1–2 NMJs in total; at each NMJ, we acquired 2–3 imaging stacks, each targeting 2–3 different terminal boutons. For the standard stimulation protocol, we delivered 1 ms stimuli at 1 Hz and captured 14 stimuli in a 15 s time series (lines 730-736).

      Reviewer #3

      Genetically encoded calcium indicators (GECIs) are essential tools in neurobiology and physiology. Technological constraints in targeting and kinetics of previous versions of GECIs have limited their application at the subcellular level. Chen et al. present a set of novel tools that overcome many of these limitations. Through systematic testing in the Drosophila NMJ, they demonstrate improved targeting of GCaMP variants to synaptic compartments and report enhanced brightness and temporal fidelity using members of the GCaMP8 series. These advancements are likely to facilitate more precise investigation of synaptic physiology.

      This is a comprehensive and detailed manuscript that introduces and validates new GECI tools optimized for the study of neurotransmission and neuronal excitability. These tools are likely to be highly impactful across neuroscience subfields. The authors are commended for publicly sharing their imaging software.

      This manuscript could be improved by further testing the GECIs across physiologically relevant ranges of activity, including at high frequency and over long imaging sessions. The authors provide a custom software package (CaFire) for Ca2+ imaging analysis; however, to improve clarity and utility for future users, we recommend providing references to existing Ca2+ imaging tools for context and elaborating on some conceptual and methodological aspects, with more guidance for broader usability. These enhancements would strengthen this already strong manuscript.

      We thank the Reviewer for their overall positive evaluation and comments. 

      Major comments:

      (1) Evaluation of the performance of new GECI variants using physiologically relevant stimuli and frequency. The authors took initial steps towards this goal, but it would be helpful to determine the performance of the different GECIs at higher electrical stimulation frequencies (at least as high as 20 Hz) and for longer (10 seconds) (Newman et al, 2017). This will help scientists choose the right GECI for studies testing the reliability of synaptic transmission, which generally requires prolonged high-frequency stimulation.

      We appreciate this point by the reviewer and agree it would be of interest to evaluate sensor performance with higher frequency stimulation and for a longer duration. In response, we performed a variety of stimulation protocols at high intensities and times, but found it difficult to separate individual responses in these data given the decay kinetics of all calcium sensors. Hence, we elected not to include these in the revised manuscript. However, we have now included an evaluation of the sensors with 20 Hz electrical stimulation for ~1 sec using a direct comparison of Scar8f with OGB-1. These data are now presented in a new Fig. 3D,E and discussed in the manuscript (lines 396-403).

      (2) CaFire.

      The authors mention, in line 182: 'Current approaches to analyze synaptic Ca2+ imaging data either repurpose software designed to analyze electrophysiological data or use custom software developed by groups for their own specific needs.' References should be provided. CaImAn comes to mind (Giovannucci et al., 2019, eLife), but we think there are other software programs aimed at analyzing Ca2+ imaging data that would permit such analysis.

      Thank you for the thoughtful question. At this stage, we’re unable to provide a direct comparison with existing analysis workflows. In surveying prior studies that analyze Drosophila NMJ Ca²⁺ imaging traces, we found that most groups preprocess images in Fiji/ImageJ and then rely on their own custom-made MATLAB or Python scripts for downstream analysis (see Blum et al. 2021; Xing and Wu 2018). Because these pipelines vary widely across labs, a standardized head-to-head evaluation isn’t currently feasible. With CaFire, our goal is to offer a simple, accessible tool that does not require coding experience and minimizes variability introduced by custom scripts. We designed CaFire to lower the barrier to entry, promote reproducibility, and make quantal event analysis more consistent across users. We have added references to the sentence mentioned above.

      Regarding existing software that the reviewer mentioned – CaImAn (Giovannucci et al. 2019): We evaluated CaImAn, which is a powerful framework designed for large-scale, multicellular calcium imaging (e.g., motion correction, denoising, and automated cell/ROI extraction). However, it is not optimized for the per-event kinetics central to our project - such as extracting rise and decay times for individual quantal events at single synapses. Achieving this level of granularity would typically require additional custom Python scripting and parameter tuning within CaImAn’s code-centric interface. This runs counter to CaFire’s design goals of a no-code, task-focused workflow that enables users to analyze miniature events quickly and consistently without specialized programming expertise.

      Regarding Igor Pro (WaveMetrics), (Müller et al. 2012): Igor Pro is another platform that can be used to analyze calcium imaging signals. However, it is commercial (paid) software and generally requires substantial custom scripting to fit the specific analyses we need. In practice, it does not offer a simple, open-source, point-and-click path to per-event kinetic quantification, which is what CaFire is designed to provide.

      The authors should be commended for making their software publicly available, but there are some questions:

      How does CaFire compare to existing tools?

      As mentioned above, we have not been able to adapt the custom scripts used by various labs for our purposes, including software developed in MATLAB (Blum et al. 2021), Python (Xing and Wu 2018), and Igor (Müller et al. 2012). Some in the field do use semi-publicly available software, including Nikon Elements (Chen and Huang 2017) and CaImAn (Giovannucci et al. 2019). However, these platforms are not optimized for the per-event kinetics central to our project - such as extracting rise and decay times for individual quantal events at single synapses. We have added more details about CaFire, mainly focusing on the workflow and measurements, to highlight its advantages: CaFire provides a no-code, standardized pipeline with automated miniature-event detection and per-event metrics (e.g., amplitude, rise time τ, decay time τ), optional ΔR/R support, and an auto-partition feature. Collectively, these features make CaFire simpler to operate without programming expertise, more transparent and reproducible across users, and better aligned with the event-level kinetics required for this project.

      Very few details about the Huygens deconvolution algorithms and input settings were provided in the methods or text (outside of MLE algorithm used in STED images, which was not Ca2+ imaging). Was it blind deconvolution? Did the team distill the point-spread function for the fluorophores? Were both channels processed for ratiometric imaging? Were the same settings used for each channel? Importantly, please include SVI Huygens in the 'Software and Algorithms' Section of the methods.

      We thank the reviewer for raising this important point. We have now expanded the Methods to describe our use of Huygens in more detail and have added SVI Huygens Professional (Scientific Volume Imaging, Hilversum, The Netherlands) to the “Software and Algorithms” section. For Ca²⁺ imaging data, time-lapse stacks were processed in the Huygens Deconvolution Wizard using the standard estimation algorithm (CMLE). This is not a blind deconvolution procedure. Instead, Huygens computes a theoretical point-spread function (PSF) from the full acquisition metadata (objective NA, refractive index, voxel size/sampling, pinhole, excitation/emission wavelengths, etc.); if refractive index values are provided and there is a mismatch, the PSF is adjusted to account for spherical aberration. We did not experimentally distill PSFs from bead measurements, as Huygens’ theoretical PSFs are sufficient for our data.

      Both green (GCaMP) and red (mScarlet) channels were processed for ratiometric imaging using the same workflow (stabilization, optional bleaching correction, and deconvolution within Huygens). For each channel, the PSF, background, and SNR were estimated automatically by the same built-in algorithms, so the underlying procedures were identical even though the numerical values differ between channels because of their distinct wavelengths and noise characteristics. Importantly, Huygens normalizes each PSF to unit total intensity, such that the deconvolution itself does not add or remove signal and therefore preserves intensity ratios between channels; only background subtraction and bleaching correction can change absolute fluorescence values. For the mScarlet channel, where we observed modest bleaching (~1.10 over 15 sec), we applied Huygens’ bleaching correction and visually verified that similar structures maintained comparable intensities after correction. For presynaptic GCaMP signals, bleaching over these short recordings was negligible, so we omitted the bleaching-correction step to avoid introducing multiplicative artifacts. This workflow ensures that ratiometric ΔR/R measurements are based on consistently processed, intensity-conserving deconvolved images in both channels.

      The number of deconvolution iterations could have had an effect when comparing GCAMP series; please provide an average number of iterations used for at least one experiment. For example, Figure 3, Syt::GCAMP6s, Scar8f & Scar8m, and, if applicable, the maximum number of permissible iterations.

      We thank the reviewer for this comment. For all Ca²⁺ imaging datasets, deconvolution in Huygens was performed using the recommended default settings of the CMLE algorithm with a maximum of 30 iterations. The stopping criterion was left at the Huygens default, so the algorithm either converged earlier or, if convergence was not reached, terminated at this 30-iteration limit. No other iteration settings were used across the GCaMP series (lines 555-559).

      Please clarify if the 'Express' settings in Huygens changed algorithms or shifted input parameters.

      We appreciate the reviewer’s question regarding the Huygens “Express” settings. For clarity, we note that all Ca²⁺ imaging data reported in this manuscript were deconvolved using the “Deconvolution Wizard”, not the “Deconvolution Express” mode. In the Wizard, we explicitly selected the CMLE algorithm (or GMLE in a few STED-related cases as recommended by SVI), using the recommended maximum of 30 iterations, and other recommended settings while allowing Huygens to auto-estimate background and SNR for each channel. Bleaching correction was toggled manually per channel (applied to mScarlet when bleaching was evident, omitted for GCaMP when bleaching was negligible), as described in the revised Methods (lines 553-559).

      By contrast, the Deconvolution Express tool in Huygens is a fully automated front-end that can internally adjust both the choice of deconvolution algorithm (e.g., CMLE vs. GMLE/QMLE) and key input parameters such as SNR, number of iterations, and quality threshold based on the selected “smart profile” and the image metadata. In preliminary tests on our datasets, Express sometimes produced results that were either overly smoothed or showed subtle artifacts, so we did not use it for any data included in this study. Instead, we relied exclusively on the Wizard with explicitly controlled settings to ensure consistency and transparency across all GCaMP series and ratiometric analyses.

      We suggest including a sample data set, perhaps in Excel, so that future users can beta test on and organize their data in a similar fashion.

      We agree that this would be useful, a point shared by R1 above. In response, we have added a sample data set to the GitHub site and included sample ImageJ data along with screenshots to explain the analysis in more detail. These improvements are discussed in the manuscript (lines 705-708).

      (3) While the challenges of AZ imaging are mentioned, it is not discussed how the authors tackled each one. What is defined as an active zone? Active zones are usually identified under electron microscopy. Arguably, the limitation of GCaMP-based sensors targeted to individual AZs, being unable to resolve local Ca2+ changes at individual boutons reliably, might be incorrect. This could be a limitation of the optical setup being used here. Please discuss further. What sensor performance do we need to achieve this performance level, and/or what optical setup would we need to resolve such signals?

      We appreciate the reviewer’s thoughtful comments and agree that the technical challenges of active zone (AZ) Ca²⁺ imaging merit further clarification. We defined AZs, as is the convention in our field, as individual BRP puncta at NMJs. These BRP puncta colocalize with individual puncta of other AZ components, including CAC, RBP, Unc13, etc. ROIs were drawn tightly over individual BRP puncta and only clearly separable spots were included.

      To tackle the specific obstacles of AZ imaging (small signal volume, high AZ density, and limited photon budget at high frame rates), we implemented both improved sensors and optimized analysis (Fig. 6). First, we introduced a ratiometric AZ-targeted indicator, BRP::mScarlet3::GCaMP8m (Bar8m), and computed ΔR/R with R(t)=F<sub>GCaMP8m</sub>/F<sub>mScarlet3</sub>. ROIs were drawn over individual AZs (Fig. 6B). Under our standard resonant area-scan conditions (~118 fps), Bar8m produces robust ΔR/R transients at individual AZs (example peaks ≈ 3.28; τ<sub>rise</sub>≈9.0 ms; Fig. 6C, middle), indicating that single-AZ signals can be detected reproducibly when AZs are optically resolvable.

      Second, we increased temporal resolution using high-speed Galvano line-scan imaging (~1058 fps), which markedly sharpened the apparent kinetics (τ<sub>rise</sub>≈3.23 ms) and revealed greater between-AZ variability (Fig. 6C, right; 6D–E). Population analyses show that line scans yield much faster rise times than area scans (Fig. 6D) and a dramatically higher fraction of significantly different AZ pairs (8.28% and 4.14% in 8f and 8m area-scan vs 78.62% in 8m line-scan, lines 721-725), uncovering pronounced AZ-to-AZ heterogeneity in Ca²⁺ signals. Together, these revisions demonstrate that under our current confocal configuration, AZ-targeted GCaMP8m can indeed resolve local Ca²⁺ changes at individual, optically isolated boutons.

      We have revised the Discussion to clarify that our original statement about the limitations of AZ-targeted GCaMPs refers specifically to this combination of sensor and optical setup, rather than an absolute limitation of AZ-level Ca²⁺ imaging. In our view, further improvements in baseline brightness and dynamic range (ΔF/F or ΔR/R per action potential), combined with sub-millisecond kinetics and minimal buffering, together with optical configurations that provide smaller effective PSFs and higher photon collection (e.g., higher-NA objectives, optimized 2-photon or fast line-scan modalities, and potentially super-resolution approaches applied to AZ-localized indicators), are likely to be required to achieve routine, high-fidelity Ca²⁺ measurements at every individual AZ within a neuromuscular junction.

      (4) In Figure 5: Only GCAMP8f (Bar8f fusion protein) is tested here. Consider including testing with GCAMP8m. This is particularly relevant given that GCAMP8m was a more successful GECI for subcellular post-synaptic imaging in Figure 6.

      We appreciate this point and request by Reviewer 3. The main limitation for detecting local calcium changes at AZs is the speed of the calcium sensor, and hence we used the fastest available (GCaMP8f) to test the Bar8f sensor. While replacing GCaMP8f with GCaMP8m would indeed be predicted to enhance sensitivity (SNR), since GCaMP8m does not have faster kinetics relative to GCaMP8f, it is unlikely to be a more successful GECI for visualizing local calcium differences at AZs. 

      That being said, we agree that the Bar8m tool, including the improved mScarlet3 indicator, would likely be of interest and use to the field. Fortunately, we had engineered the Bar8m sensor while this manuscript was in review, and just recently received transgenic flies. We have evaluated this sensor, as requested by the reviewer, and included our findings in Fig. 1 and 6. In short, while the sensitivity is indeed enhanced in Bar8m compared to Bar8f, the kinetics remain insufficient to capture local AZ signals. These findings are discussed in the revised manuscript (lines 424-442, 719-730), and we appreciate the reviewer for raising these important points.

      In earlier experiments, Bar8f yielded relatively weak fluorescence, so we traded frame rate for image quality during resonant area scans (~60 fps). After switching to Bar8m, the signal was bright enough to restore our standard 118 fps area-scan setting. Nevertheless, even with dual-channel resonant area scans and ratiometric (GCaMP/mScarlet) analysis, AZ-to-AZ heterogeneity remained difficult to resolve. Because Ca²⁺ influx at individual active zones evolves on sub-millisecond timescales, we adopted a high-speed single-channel Galvano line-scan (~1 kHz) to capture these rapid transients. We first acquired a brief area image to localize AZ puncta, then positioned the line-scan ROI through the center of the selected AZ. This configuration provided the temporal resolution needed to uncover heterogeneity that was under-sampled in area-scan data. Consistent with this, Bar8m line-scan data showed markedly higher AZ heterogeneity (significant AZ-pair rate ~79%, vs. ~8% for Bar8f area scans and ~4% for Bar8m area scans), highlighting Bar8m’s suitability for quantifying AZ diversity. We have updated the text, Methods, and figure legend accordingly.

      (5) Figure 5D and associated datasets: Why was Interquartile Range (IQR) testing used instead of ZScoring? Generally, IQR is used when the data is heavily skewed or is not normally distributed. Normality was tested using the D'Agostino & Pearson omnibus normality test and found that normality was not violated. Please explain your reasoning for the approach in statistical testing. Correlation coefficients in Figures 5 E & F should also be reported on the graph, not just the table. In Supplementary Table 1. The sub-table between 4D-F and 5E-F, which describes the IQR, should be labeled as such and contain identifiers in the rows describing which quartile is described. The table description should be below. We would recommend a brief table description for each sub-table.

      Thank you for this helpful suggestion. We have updated the analysis in two complementary ways. First, we now perform paired two-tailed t-tests between every two AZs within the same preparation (pairwise AZ–AZ comparisons of peak responses). Using a significance threshold of p<0.05, the fraction of significant AZ pairs is ~79% for Bar8m line-scan data versus ~8% for Bar8f area-scan data, indicating markedly greater AZ-to-AZ diversity when measured at high temporal resolution. Second, for visually marking the outlying AZs, we re-computed the IQR (Q1–Q3) based on the individual values collected from each AZ (15 data points per AZ, 30 AZs for each genotype), and marked AZs whose mean response falls above Q3 or below Q1; IQR is used here solely as a robust dispersion reference rather than for hypothesis testing. Both analyses support the same observation: Bar8m line-scan data reveal substantially higher AZ heterogeneity than Bar8f and Bar8m area-scan data. We have revised the Methods, figure panels, and legends accordingly (t-test details; explicit “IQR (Q1–Q3)” labeling; significant AZ-pair rates reported on the plots) (lines 719-730).
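
      The IQR-based marking step described above can be sketched as follows (the pairwise t-tests are not shown). This is a minimal example under stated assumptions: Q1 and Q3 are computed from the pooled individual values using Python's "inclusive" quantile method, which may differ from the method used for the figures, and the function name, AZ labels, and response values are invented.

```python
from statistics import mean, quantiles


def mark_outlying_azs(responses_by_az):
    """Mark AZs whose mean response lies outside the pooled Q1-Q3 range.

    responses_by_az maps a (hypothetical) AZ label to its list of
    individual peak responses. Q1 and Q3 are computed from all values
    pooled across AZs; the quantile method is an assumption.
    """
    pooled = [v for values in responses_by_az.values() for v in values]
    q1, _median, q3 = quantiles(pooled, n=4, method="inclusive")
    marked = {}
    for az, values in responses_by_az.items():
        az_mean = mean(values)
        if az_mean > q3:
            marked[az] = "above Q3"
        elif az_mean < q1:
            marked[az] = "below Q1"
    return marked


# Invented example: four AZs with three responses each
responses = {
    "AZ1": [1.0, 1.0, 1.0],
    "AZ2": [2.0, 2.0, 2.0],
    "AZ3": [3.0, 3.0, 3.0],
    "AZ4": [10.0, 10.0, 10.0],
}
marked = mark_outlying_azs(responses)
```

      As in the text, the quartiles serve only as a dispersion reference for display; no hypothesis test is implied by the marking.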

      (6) Figure 6 and associated data. The authors mention: ' SynapGCaMP quantal signals appeared to qualitatively reflect the same events measured with electrophysiological recordings (Fig. 6D).' If that was the case, shouldn't the ephys and optical signal show some sort of correlation? The data presented in Figure 6D show no such correlation. Where do these signals come from? It is important to show the ROIs on a reference image.

      We apologize that this was not clear, as similar points were raised by R1 and R2. We were just showing separate (uncorrelated) sample traces of electrophysiological and calcium imaging data. Given how confusing this presentation turned out to be, and the fact that we show the correlated ephys and calcium imaging events in Fig. 7, we have elected to remove the uncorrelated electrophysiological events in Fig. 6 to just focus on the calcium imaging events (now Figures 7 and 8).

      Figure 7B: Were Ca2+ transients not associated with mEPSPs ever detected? What is the rate of such events?

      This is an astute question. Yes indeed, during simultaneous calcium imaging and current clamp electrophysiology recordings, we occasionally observed GCaMP transients without a detectable mEPSP in the electrophysiological trace. This may reflect the detection limit of electrophysiology for very small minis; with our noise level and the technical limitation of the recording rig, events < ~0.2 mV cannot be reliably detected, whereas the optical signal from the same quantal event might still be detected. The fraction of calcium-only events was ~1–10% of all optical miniature events, depending on genotype (higher in lines with smaller average minis). These calcium-only detections were low-amplitude and clustered near the optical threshold (lines 361-365).

      Minor comments

      (1) It should be mentioned in the text or figure legend whether images in Figure 1 were deconvolved, particularly since image pre-processing is only discussed in Figure 2 and after.

      We thank the reviewer for pointing this out. Yes, the confocal images shown in Figure 1 were also deconvolved in Huygens using the CMLE-based workflow described in the revised Methods. We applied deconvolution to improve contrast, reduce out-of-focus blur, and better resolve the morphology of presynaptic boutons, active zones, and postsynaptic structures, so that the localization of each sensor is more clearly visualized. We have now explicitly stated in the Fig. 1 legend and Methods (lines 575-577) that these images were deconvolved prior to display. 

      (2) The abbreviation, SNR, signal-to-noise ratio, is not defined in the text.

      We have corrected this error and thank the reviewer for pointing this out.

      (3) Please comment on the availability of fly stocks and molecular constructs.

      We have clarified that all fly stocks and molecular constructs will be shared upon request (lines 747-750). We are also in the process of depositing the new Scar8f/m, Bar8f/m, and SynapGCaMP sensors to the Bloomington Drosophila Stock Center for public dissemination.

      (4) Please add detection wavelengths and filter cube information for live imaging experiments for both confocal and widefield.

      We thank the reviewer for this helpful suggestion. We have now added the detection wavelengths and filter cube configurations for both confocal and widefield live imaging to the Methods.

      For confocal imaging, GCaMP signals were acquired on a Nikon A1R system using the FITC/GFP channel (488-nm laser excitation; emission collected with a 525/50-nm band-pass filter), and mScarlet signals were acquired using the TRITC/mCherry channel (561-nm laser excitation; emission collected with a 595/50-nm band-pass filter). Both channels were detected with GaAsP detectors under the same pinhole and scan settings described above (lines 512-517).

      For widefield imaging, GCaMP was recorded using a GFP filter cube (LED excitation ~470/40 nm; emission ~525/50 nm), which is now explicitly described in the revised Methods section (lines 632-633).

      (5) Please include a mini frequency analysis in Supplemental Figure S1.

      We apologize for not including this information in the original submission. This is now included in the Supplemental Figure S1.

      (6) In Figure S1B, consider flipping the order of EPSP (currently middle) and mEPSP (currently left), to easily guide the reader through the quantification of Figure S1A (EPSPs, top traces & mEPSPs, bottom traces).

      We agree these modifications would improve readability and clarity. We have now re-ordered the electrophysiological quantifications in Fig. S1B as requested by the reviewer.

      (7) Figure 6C: Consider labeling with sensor name instead of GFP.

      We agree here as well, and have removed “GFP” and instead added the GCaMP variant to the heatmap in Fig. 7C.

      (8) Figure 6E, 7B, 7E: Main statistical differences highlighting sensor performance should be represented on the figures for clarity.

      We did not show these differences in the original submission in an effort to keep the figures “clean” and for clarity, putting the detailed statistical significance in Table S1. However, we agree with the reviewer that it would be easier to see these in the Fig. 6E and 7B,E graphs. This information has now been added to Figs. 7 and 8.

      (9) Please report if the significance tested between the ephys mini (WT vs IIB-/-, WT vs IIA-/-, IIB-/- vs IIA-/-) is the same as for Ca2+ mini (WT vs IIB-/-, WT vs IIA-/-, IIB-/- vs IIA-/-). These should also exhibit a very high correlation (mEPSP (mV) vs Ca2+ mini deltaF/F). These tests would significantly strengthen the final statement of "SynapGCaMP8m can capture physiologically relevant differences in quantal events with similar sensitivity as electrophysiology."

      We agree that adding the more detailed statistical analysis requested by the reviewer would strengthen the evidence for the resolution of quantal calcium imaging using SynapGCaMP8m. We have included the statistical significance between the ephys and calcium minis in Fig. 8 and included the following in the revised methods (lines 358-361), the Fig. 8 legend and Table S1:

      Using two-sample Kolmogorov–Smirnov (K–S) tests, we found that SynapGCaMP8m Ca²⁺ minis (ΔF/F, Fig. 8E) differ significantly across all genotype pairs (WT vs IIB<sup>-/-</sup>, WT vs IIA<sup>-/-</sup>, IIB<sup>-/-</sup> vs IIA<sup>-/-</sup>; all p < 0.0001). The genotype rank order of the group means (±SEM) is IIB<sup>-/-</sup> > WT > IIA<sup>-/-</sup> (0.967 ± 0.036; 0.713 ± 0.021; 0.427 ± 0.017; n=69, 65, 59). For electrophysiological minis (mEPSP amplitude, Fig. 8F), K–S tests likewise show significant differences for the same comparisons (all p < 0.0001) with D statistics of 0.1854, 0.3647, and 0.4043 (WT vs IIB<sup>-/-</sup>, WT vs IIA<sup>-/-</sup>, IIB<sup>-/-</sup> vs IIA<sup>-/-</sup>, respectively). Group means (±SEM) again follow IIB<sup>-/-</sup> > WT > IIA<sup>-/-</sup> (0.824 ± 0.017 mV; 0.636 ± 0.015 mV; 0.383 ± 0.007 mV; n=41 each). These K–S results demonstrate identical significance and rank order across modalities, supporting our conclusion that SynapGCaMP8m resolves physiologically relevant quantal differences with sensitivity comparable to electrophysiology.
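
      For readers unfamiliar with the D statistic reported above, a minimal sketch of how a two-sample K–S statistic is obtained is given below. This is not the authors' code; the function name and the toy amplitudes are hypothetical, and the p-value calculation is omitted (in practice a library routine such as `scipy.stats.ks_2samp` would typically be used for both).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The supremum of |F_a(x) - F_b(x)| is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical mini amplitudes for two genotypes (arbitrary units).
group_1 = [0.2, 0.4, 0.6, 0.8]
group_2 = [0.5, 0.7, 0.9, 1.1]
print(ks_statistic(group_1, group_2))
```

      D ranges from 0 (identical empirical distributions) to 1 (no overlap), so larger values such as the 0.4043 reported for IIB<sup>-/-</sup> vs IIA<sup>-/-</sup> indicate more strongly separated amplitude distributions.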

      References

      Blum, Ian D., Mehmet F. Keleş, El-Sayed Baz, Emily Han, Kristen Park, Skylar Luu, Habon Issa, Matt Brown, Margaret C. W. Ho, Masashi Tabuchi, Sha Liu, and Mark N. Wu. 2021. 'Astroglial Calcium Signaling Encodes Sleep Need in Drosophila', Current Biology, 31: 150-62.e7.

      Chen, Y., and L. M. Huang. 2017. 'A simple and fast method to image calcium activity of neurons from intact dorsal root ganglia using fluorescent chemical Ca(2+) indicators', Mol Pain, 13: 1744806917748051.

      Giovannucci, Andrea, Johannes Friedrich, Pat Gunn, Jérémie Kalfon, Brandon L. Brown, Sue Ann Koay, Jiannis Taxidis, Farzaneh Najafi, Jeffrey L. Gauthier, Pengcheng Zhou, Baljit S. Khakh, David W. Tank, Dmitri B. Chklovskii, and Eftychios A. Pnevmatikakis. 2019. 'CaImAn an open source tool for scalable calcium imaging data analysis', eLife, 8: e38173.

      Müller, M., K. S. Liu, S. J. Sigrist, and G. W. Davis. 2012. 'RIM controls homeostatic plasticity through modulation of the readily-releasable vesicle pool', J Neurosci, 32: 16574-85.

      Wu, Yifan, Keimpe Wierda, Katlijn Vints, Yu-Chun Huang, Valerie Uytterhoeven, Sahil Loomba, Fran Laenen, Marieke Hoekstra, Miranda C. Dyson, Sheng Huang, Chengji Piao, Jiawen Chen, Sambashiva Banala, Chien-Chun Chen, El-Sayed Baz, Luke Lavis, Dion Dickman, Natalia V. Gounko, Stephan Sigrist, Patrik Verstreken, and Sha Liu. 2025. 'Presynaptic Release Probability Determines the Need for Sleep', bioRxiv: 2025.10.16.682770.

      Xing, Xiaomin, and Chun-Fang Wu. 2018. 'Unraveling Synaptic GCaMP Signals: Differential Excitability and Clearance Mechanisms Underlying Distinct Ca<sup>2+</sup> Dynamics in Tonic and Phasic Excitatory, and Aminergic Modulatory Motor Terminals in Drosophila', eNeuro, 5: ENEURO.0362-17.2018.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a system for delivering precisely controlled cutaneous stimuli to freely moving mice by coupling markerless real-time tracking to transdermal optogenetic stimulation, using the tracking signal to direct a laser via galvanometer mirrors. The principal claims are that the system achieves sub-mm targeting accuracy with a latency of <100 ms. The nature of mouse gait enables accurate targeting of forepaws even when mice are moving.

      Strengths:

      The study is of high quality and the evidence for the claims is convincing. There is increasing focus in neurobiology in studying neural function in freely moving animals, engaged in natural behaviour. However, a substantial challenge is how to deliver controlled stimuli to sense organs under such conditions. The system presented here constitutes notable progress towards such experiments in the somatosensory system and is, in my view, a highly significant development that will be of interest to a broad readership.

      Weaknesses:

      (1) "laser spot size was set to 2.00 ± 0.08 mm<sup>2</sup> diameter (coefficient of variation = 3.85)" is unclear. Is the 0.08 SD or SEM? (not stated). Also, is this systematic variation across the arena (or something else)? Readers will want to know how much the spot size varies across the arena - ie SD. CV=4 implies that SD~7 mm. ie non-trivial variation in spot size, implying substantial differences in power delivery (and hence stimulus intensity) when the mouse is in different locations. If I misunderstood, perhaps this helps the authors to clarify. Similarly, it would be informative to have mean & SD (or mean & CV) for power and power density. In future refinements of the system, would it be possible/useful to vary laser power according to arena location?

      We thank the reviewer for their comments and for identifying areas needing more clarity. The previous version was ambiguous: 0.08 refers to the standard deviation (SD). We have removed the ambiguity by stating mean ± SD and reporting a unitless coefficient of variation (CV).

      The revised text reads “laser spot size was set to 2.00 ± 0.08 mm<sup>2</sup> (mean ± SD; coefficient of variation = 0.039).” This makes clear that the variability in spot size is minimal: it is 0.08 mm<sup>2</sup> SD (≈0.03 mm SD in diameter). This should help clarify that spot size variability across the arena is minute and unlikely to contribute meaningfully to differences in stimulus intensity across locations. The power was modulated depending on the experiment, so we provide the unitless CV here in “The absolute optical power and power density were uniform across the glass platform (coefficient of variation 0.035 and 0.029, respectively; Figure 2—figure supplement)”. We are grateful to the reviewer for spotting these omissions.
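
      The arithmetic behind these figures can be checked with a short sketch. It assumes a circular spot and first-order error propagation from area to diameter; neither assumption is stated in the response. With the rounded values quoted above the CV comes out as 0.04, in line with the reported 0.039, which was presumably computed from unrounded measurements.

```python
import math

# Values quoted in the revised text.
area_mean = 2.00  # mm^2
area_sd = 0.08    # mm^2

# Unitless coefficient of variation (SD divided by mean).
cv = area_sd / area_mean

# Assumption: circular spot, so A = pi * d^2 / 4 and d = 2 * sqrt(A / pi).
diameter = 2.0 * math.sqrt(area_mean / math.pi)

# First-order propagation: dd/dA = d / (2A), so sd_d is roughly (d / (2A)) * sd_A.
diameter_sd = diameter / (2.0 * area_mean) * area_sd

print(f"CV = {cv:.3f}, diameter = {diameter:.2f} mm, SD(diameter) = {diameter_sd:.3f} mm")
```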

      The reviewer also asks whether, in the future, it is “possible/useful to vary laser power according to arena location”. This is already possible in our system for infrared cutaneous stimulation using analog modulation (Figure 4). We have added the following sentence to make this clearer: “Laser power could be modulated using the analog control.”

      (2) "The video resolution (1920 x 1200) required a processing time higher than the frame interval (33.33 ms), resulting in real-time pose estimation on a sub-sample of all frames recorded". Given this, how was it possible to achieve 84 ms latency? An important issue for closed-loop research will relate to such delays. Therefore please explain in more depth and (in Discussion) comment on how the latency of the current system might be improved/generalised. For example, although the current system works well for paws it would seem to be less suited to body parts such as the snout that do not naturally have a stationary period during the gait cycle.

      We captured and stored video with a frame-to-frame interval of 33.33 ms (30 fps). DeepLabCut-live! was run in a latency-optimization mode, meaning that new frames are not processed while the network is busy - only the most recent frame is processed when free. The processing latency is measured per processed frame, and intermediate frames are thus skipped while the network is busy. Although a wide field of view and high resolution are required to capture the large environment, which increases the per-frame compute time, the processing latency remained small enough to track and stimulate moving mice. This processing latency of 84 ± 12 ms (mean ± SD) was calculated using the timestamps stored in the output files from DeepLabCut-live!: subtracting the frame acquisition timestamp from the frame processing timestamp across 16,000 processed frames recorded across four mice (4,000 each). In addition, there is a small delay to move the galvanometers and trigger the laser, calculated as 3.3 ± 0.5 ms (mean ± SD; 245 trials). This is described in the manuscript, but it can be combined with the processing latency to give a total closed-loop delay of ≈87 ms. We have therefore expanded the ‘Optical system characterization’ subsection in the Methods, adding “We estimated a processing latency of 84 ± 12 ms (mean ± SD) by subtracting…” and that “In the current configuration the end-to-end closed-loop delay is ≈87 ms from the combination of the processing latency and other delays”. In the Discussion, we now comment on how this latency can be reduced and how this can allow for generalization to more rapidly moving body parts.
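
      The latency bookkeeping described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function name and the toy timestamps are hypothetical, and no DeepLabCut-live! output field names are implied.

```python
from statistics import mean, stdev


def processing_latency_ms(acquired_s, processed_s):
    """Mean and SD (ms) of per-frame latency: processing time minus acquisition time."""
    latencies = [(p - a) * 1000.0 for a, p in zip(acquired_s, processed_s)]
    return mean(latencies), stdev(latencies)


# Hypothetical timestamps (seconds) for three processed frames.
acquired = [0.00, 0.10, 0.20]
processed = [0.08, 0.19, 0.28]
latency_mean, latency_sd = processing_latency_ms(acquired, processed)

# Combining the figures reported above into an end-to-end estimate.
reported_processing_ms = 84.0   # mean processing latency
reported_actuation_ms = 3.3     # galvanometer movement plus laser trigger
total_closed_loop_ms = reported_processing_ms + reported_actuation_ms

# Camera frames that arrive during one inference at 30 fps; these are the
# intermediate frames that are skipped in the latency-optimized mode.
frame_interval_ms = 1000.0 / 30.0
frames_per_inference = reported_processing_ms / frame_interval_ms

print(f"{latency_mean:.1f} ± {latency_sd:.1f} ms; total ≈ {total_closed_loop_ms:.1f} ms")
```

      With the reported values, roughly 2.5 camera frames elapse per processed frame, which is why pose estimation runs on a sub-sample of the recorded frames while the stored video remains at 30 fps.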

      Reviewer #2 (Public review):

      Parkes et al. combined real-time keypoint tracking with transdermal activation of sensory neurons to examine the effects of recruitment of sensory neurons in freely moving mice. This builds on the authors' previous investigations involving transdermal stimulation of sensory neurons in stationary mice. They illustrate multiple scenarios in which their engineering improvements enable more sophisticated behavioral assessments, including (1) stimulation of animals in multiple states in large arenas, (2) multi-animal nociceptive behavior screening through thermal and optogenetic activation, and (3) stimulation of animals running through maze corridors. Overall, the experiments and the methodology, in particular, are written clearly. However, there are multiple concerns and opportunities to fully describe their newfound capabilities that, if addressed, would make it more likely for the community to adopt this methodology:

      The characterization of laser spot size and power density is reported as a coefficient of variation, in which a value of ~3 is interpreted as uniform. My interpretation would differ - data spread so that the standard deviation is three times larger than the mean indicates there is substantial variability in the data. The 2D polynomial fit is shown in Figure 2 - Figure Supplement 1A and, if the fit is good, this does support the uniformity claim (range of spot size is 1.97 to 2.08 mm2 and range of power densities is 66.60 to 73.80 mW). The inclusion of the raw data for these measurements and an estimate of the goodness of fit to the polynomials would better help the reader evaluate whether these parameters are uniform across space and how stable the power density is across repeated stimulations of the same location. Even more helpful would be an estimate of whether the variation in the power density is expected to meaningfully affect the responses of ChR2-expressing sensory neurons.

      We thank the reviewer for their comments. As also noted in response to Reviewer 1, the coefficient of variation (CV) is now reported in unitless form (rather than a percentage) to ensure clarity. For avoidance of doubt, the CV is 0.039 (3.9%), so the variation in laser spot size is minimal – there is negligible spot size variability across the system. The ranges are indeed consistent with uniformity. We have included the goodness-of-fit estimates in the appropriate figure legend “fit with a two-dimensional polynomial (area R<sup>2</sup> = 0.91; power R<sup>2</sup> = 0.75)”. This indicates that the polynomials fit well overall.

      The system already allows for control of spot size. Whether variation in the stimulus affects the responses of ChR2-expressing sensory neurons was examined in our previous work, which focused more on input-output relationships and demonstrated a steep relationship between spot size (range of 0.02 mm<sup>2</sup> to 2.30 mm<sup>2</sup>) and the probability of paw response (Schorscher-Petcu et al. eLife, 2021). In future studies, we aim to use this approach to “titrate” cutaneous inputs as mice move through their environments.

      While the error between the keypoint and laser spot error was reported as ~0.7 to 0.8 mm MAE in Figure 2L, in the methods, the authors report that there is an additional error between predicted keypoints and ground-truth labeling of 1.36 mm MAE during real-time tracking. This suggests that the overall error is not submillimeter, as claimed by the authors, but rather on the order of 1.5 - 2.5 mm, which is considerable given the width of a hind paw is ~5-6 mm and fore paws are even smaller. In my opinion, the claim for submillimeter precision should be softened and the authors should consider that the area of the paw stimulated may differ from trial to trial if, for example, the error is substantial enough that the spot overlaps with the edge of the paw.

      We thank the reviewer for identifying a discrepancy in these reported errors. We clarify this below and in the manuscript.

      The real-time tracking error is the mean absolute Euclidean distance (MAE) between ground truth and DLC on the left hind paw where likelihood was relatively high. More specifically, ground truth was obtained by manual annotation of the left hind paw center. The corresponding DLC keypoint was evaluated in frames with likelihood >0.8 (the stimulation threshold). Across 1,281 frames from five videos of freely exploring mice (30 fps), the MAE was 1.36 mm.

      The targeting error is the MAE between ground truth and the laser spot location, so should reflect the real-time tracking error plus errors from targeting the laser. More specifically, this metric was determined by comparing the manually determined ground truth keypoint of the left hind paw and the actual center of the laser spot. Importantly, this metric was calculated using four five-minute high-speed videos recorded at 270 fps of mice freely exploring the open arena (463 frames) and frames were selected with a likelihood threshold >0.8. This allowed us to resolve the brief laser pulses but inadvertently introduced a difference in spatial scaling. After rescaling, the values give a targeting error MAE now in line with the real-time tracking error (see corrected Figure 2L). This is approximately 1.3 mm across all locomotion speed categories. These errors are small and are limited by the spatial resolution of the cameras. We thank the reviewer for noting this discrepancy and prompting us to get to its root cause.
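
      Both error metrics are mean Euclidean distances over frames that pass the likelihood threshold, converted to millimeters with a camera-specific scale. A minimal sketch is shown below; the function name, the `mm_per_px` parameter, and the toy coordinates are hypothetical, and the snippet is meant only to show where a per-camera spatial scale enters the calculation.

```python
import math


def mean_euclidean_error(ground_truth, predicted, likelihood,
                         threshold=0.8, mm_per_px=1.0):
    """Mean Euclidean distance (mm) over frames whose likelihood exceeds threshold."""
    errors = [
        math.dist(gt, pr) * mm_per_px  # pixel distance scaled to millimeters
        for gt, pr, lik in zip(ground_truth, predicted, likelihood)
        if lik > threshold
    ]
    if not errors:
        raise ValueError("no frames above the likelihood threshold")
    return sum(errors) / len(errors)


# Hypothetical pixel coordinates for three frames; the third fails the threshold.
truth = [(0, 0), (0, 0), (0, 0)]
estimate = [(3, 4), (6, 8), (100, 100)]
lik = [0.90, 0.95, 0.50]

error_px = mean_euclidean_error(truth, estimate, lik)
# The same pixel error expressed with a different (hypothetical) spatial scale.
error_mm = mean_euclidean_error(truth, estimate, lik, mm_per_px=0.5)
print(error_px, error_mm)
```

      Because the scale multiplies every distance, applying the scale of one camera to footage from another shifts the reported MAE by a constant factor, which is the kind of discrepancy that rescaling corrects.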

      We have amended the subtitle on Figure 2L as “Ground truth keypoint to laser spot error” and have avoided the use of submillimeter throughout. We have added the following sentence to clarify this point: “As laser targeting relies on real-time tracking to direct the laser to the specified body part, this metric includes any errors introduced by tracking and targeting”.

      As the major advance of this paper is the ability to stimulate animals during ongoing movement, it seems that the Figure 3 experiment misses an opportunity to evaluate state-dependent whole-body reactions to nociceptor activation. How does the behavioral response relate to the animal's activity just prior to stimulation?

      The reviewer suggests analysis of state-dependent responses. In the Figure 3 experiment, mice were stimulated up to five times when stationary. Analysis of whole body reactions in stationary mice has been described in (Schorscher-Petcu et al. eLife, 2021) and doing this here would be redundant, so instead we now analyse the responses of moving mice in Figure 5. This new analysis shows robust state-dependent responses during movement as suggested by the reviewer. We find two behavioral clusters: one that is for faster, direct (coherent) movement and the other that is for slower assessment (incoherent) movement. Stimulation during the former results in robust and consistent slowing and a shift towards assessment, whereas stimulation during the latter results in a reduction in assessment. We describe and interpret these new data in the Results and Discussion sections and add information in the Methods and Figure legend, as given below. We believe that demonstrating movement state-dependence is a valuable addition to the paper and thank the reviewer for suggesting this.

      Given the characterization of full-body responses to activation of TrpV1 sensory neurons in Figure 4 and in the authors' previous work, stimulation of TrpV1 sensory neurons has surprisingly subtle effects as the mice run through the alternating T maze. The authors indicate that the mice are moving quickly and thus that precise targeting is required, but no evidence is shared about the precision of targeting in this context beyond images of four trials. From the characterization in Figure 2, at max speed (reported at 241 +/- 53 mm/s, which is faster than the high speeds in Figure 2), successful targeting occurs less than 50% of the time. Is the initial characterization consistent with the accuracy in this context? To what extent does inaccuracy in targeting contribute to the subtlety of affecting trajectory coherence and speed? Is there a relationship between animal speed and disruption of the trajectory?

      We thank the reviewer for pointing out the discrepancy in the reported maximum speed. We have corrected the error in the main text: the average maximum speed is 142 ± 26 mm/s (four mice).

      The self-paced T-maze alternation task in Figure 5 demonstrates that mice running in a maze can be stimulated using this method. We did not optimize the particular experimental design to assess the hit accuracy, as this was determined in Figure 2. Instead, we optimized for the pulse frequencies, meaning the galvanometers tracked with processed frames but the laser was triggered whether or not the paw was actually targeted. However, even in this case with the system pulsing in the free-run mode, the laser hit rate was 54 ± 6% (mean ± SEM, n = 7 mice). We have weakened references to submillimeter as it was only inferred from other experiments and was not directly measured here. We find in this experiment that stimulation in freely moving mice can cause them to briefly halt and evaluate. In the future, we will use experimental designs to more optimally examine learning.

      The reviewer also asks if there is a relationship between speed and disruption of the trajectory. We find that this is the case as described above with our additional analysis.

      Reviewer #3 (Public review):

      Summary:

      To explore the diverse nature of somatosensation, Parkes et al. established and characterized a system for precise cutaneous stimulation of mice as they walk and run in naturalistic settings. This paper provides a framework for real-time body part tracking and targeted optical stimuli with high precision, ensuring reliable and consistent cutaneous stimulation. It can be adapted in somatosensation labs as a general technique to explore somatosensory stimulation and its impact on behavior, enabling rigorous investigation of behaviors that were previously difficult or impossible to study.

      Strengths:

      The authors characterized the closed-loop system to ensure that it is optically precise and can precisely target moving mice. The integration of accurate and consistent optogenetic stimulation of the cutaneous afferents allows systematic investigation of somatosensory subtypes during a variety of naturalistic behaviors. Although this study focused on nociceptors innervating the skin (Trpv1::ChR2 animals), this setup can be extended to other cutaneous sensory neuron subtypes, such as low-threshold mechanoreceptors and pruriceptors. This system can also be adapted for studying more complex behaviors, such as the maze assay and goal-directed movements.

      Weaknesses:

      Although the paper has strengths, its weakness is that some behavioral outputs could be analyzed in more detail to reveal different types of responses to painful cutaneous stimuli. For example, paw withdrawals were detected after optogenetically stimulating the paw (Figures 3E and 3F). Animals exhibit different types of responses to painful stimuli on the hind paw in standard pain assays, such as paw lifting, biting, and flicking, each indicating a different level of pain. Improving the behavioral readouts from body part tracking would greatly strengthen this system by providing deeper insights into the role of somatosensation in naturalistic behaviors. Additionally, if the laser spot size could be reduced to a diameter of 2 mm², it would allow the activation of a smaller number of cutaneous afferents, or even a single one, across different skin types in the paw, such as glabrous or hairy skin.

      We thank the reviewer for highlighting how our system can be combined with improved readouts of coping behavior to provide deeper insights. Optogenetic and infrared cutaneous stimulation are well established generators of coping behaviors (lifting, flicking, licking, biting, guarding). Detection of these behaviors is an active and evolving field with progress being made regularly (e.g. Jones et al., eLife 2020 [PAWS]; Wotton et al., Mol Pain 2020; Zhang et al., Pain 2022; Oswell et al., bioRxiv 2024 [LUPE]; Barkai et al., Cell Reports Methods 2025 [BAREfoot], along with more general tools like Hsu et al., Nature Communications 2021 [B-SOiD]; Luxem et al., Communications Biology 2022 [VAME]; Weinreb et al., Nature Methods 2024 [Keypoint-MoSeq]). One output of our system is bodypart keypoints, which are the typical input to many of these tools. We will leave the readers and users of the system to decide which tools are appropriate for their experimental designs - the focus of this current manuscript is describing the novel stimulation approach in moving animals.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) It is hard to see how the rig is arranged from the render of Figure 2AB due to the components being black on black. A particularly useful part of Fig2AB is the aerial view in panel B that shows the light paths. I suggest adding the labelling of Figure 2A also to that. The side/rear views could perhaps be deleted, allowing the aerial view to be larger.

      We appreciate this suggestion and have revised Figure 2B to improve the visibility of the optomechanical components. We have enlarged the side and aerial views, removed the rear view, and added further labels to the aerial view.

      (2) MAE - to interpret the 0.54 result, it would be useful to state the arena size in this paragraph.

      Thank you. We have added the arena size in this paragraph and also added scales in the relevant figure (Figure 2).

      (3) "pairwise correlations of R = 0.999 along both x- and y-axes". Is this correlation between hindpaw keypoint and galvo coordinates?

      Yes, we have added the following to clarify: “...between galvanometer coordinates and hind paw keypoints”

      (4) Latency was 84 ms. Is this mainly/entirely the delay between DLC receiving the camera image and outputting key point coordinates?

      Yes, we hope that the additional detail in the Methods and Discussion described above will now clarify the current closed-loop latencies.

      (5) "Mice move at variable speeds": in this sentence, spell out when "speed" refers to mouse and when it refers to hindpaw. Similarly, Fig 2i. The sentence is potentially confusing to general readers (paws stationary although the mouse is moving). Presumably, it's due to gait. I suggest explaining this here.

      The speed values that relate to the mouse body and paws are now clearer in the main text and in the legend for Figure 2I.

      (6) Figure 2k and associated main text. It is not clear what "success/hit rate" means here.

      We have added the following sentence in the main text: “Hit accuracy refers to the percentage of trials in which the laser successfully targeted (‘hit’) the intended hind paw.” and use hit accuracy throughout instead of success rate.
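
      As a simple illustration of this definition (not the authors' code), hit accuracy per animal and its summary across animals could be computed as sketched below. The function names and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def hit_accuracy(hits):
    """Percentage of trials in which the laser hit the intended hind paw."""
    return 100.0 * sum(hits) / len(hits)


def mean_sem(values):
    """Mean and standard error of the mean across animals."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical trial outcomes for one mouse (True = hit, False = miss).
one_mouse = [True, False, True, True]
accuracy = hit_accuracy(one_mouse)

# Hypothetical per-mouse accuracies (%), summarized as mean and SEM.
group_mean, group_sem = mean_sem([50.0, 60.0])
print(accuracy, group_mean, group_sem)
```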

      (7) Figure 2L. All these points are greater than the "average" 0.54 reported in the text. How is this possible?

      The MAE of 0.54 mm refers to the “predicted and actual laser spot locations” (that is, the difference between where the calibration map should place the laser spot and where it actually fell), while the Figure 2L MAE values refer to the error between the ground truth keypoint and the laser spot (that is, the error between the human-observed paw target and where the laser spot fell). The latter error will include the former error so is expected to be larger. We have clarified this point throughout the text, for example, stating “As laser targeting relies on real-time tracking to direct the laser to the specified body part, this metric inherently accounts for any errors introduced by the tracking and targeting.” This is also discussed above in response to Reviewer 2.

      (8) "large circular arena". State the size here

      We have added this to the Figure 2 legend.

      (9) Figure 3c-left. Can the contrast between the mouse and floor be increased here?

      We have improved the contrast in this image.

      (10) Figure 5c. It is unclear what C1, C2, etc refers to. Mice?

      Yes, these refer to mice. We have removed reference to these now as they are not needed.

      (11) Discussion. A comment. There is scope for elaborating on the potential for new research by combining it with new methods for measurements of neural activity in freely moving animals in the somatosensory system.

      Thank you. We agree and have added more detail on this in the discussion stating “The system may be combined with existing tools to record neural activity in freely-moving mice, such as fiber photometry, miniscopes, or large-scale electrophysiology, and manipulations of this neural activity, such as optogenetics and chemogenetics. This can allow mechanistic dissection of cell and circuit biology in the context of naturalistic behaviors.”

      Reviewer #3 (Recommendations for the authors):

      (1) Include the number of animals for behavior assays for the panels (e.g., Figures 4G).

      Where missing, we now state the number of animals in panels.

      (2) If representative responses are shown, such as in Figures 3E and 4F, include the average response with standard deviation so readers can appreciate the variation in the responses.

      We appreciate the suggestion to show variability in the responses. We have made several changes to Figures 3 and 4. Specifically, to illustrate the variability across multiple trials more clearly, Figure 3E now shows representative keypoint traces for each body part from two mice during their 5 trials. For Figure 4, we have re-analyzed the thermal stimulation trials and shown a raster plot of keypoint-based local motion energy (Figure 4E) sorted by response latency for hundreds of trials. Figure 4G now presents the cumulative distribution for all trials and animals for thermal (18 wild-type mice, 315 trials) and optogenetic stimulation trials (9 Trpv1::ChR2 mice, 181 trials). We also now provide means ± SD for the key metrics for optogenetic and thermal stimulation trials in Figure 4 in the Results section. This keeps the manuscript focused on the methodological advances while showing the trial variability.

      (3) "optical targeting of freely-moving mice in a large environments" should be "optical targeting of freely-moving mice in a large environment".

      Corrected

      (4) Define fps when you first mention this in the manuscript.

      Added

      (5) Data needs to be shown for the claim "Mice concurrently turned their heads toward the stimulus location while repositioning their bodies away from it".

      We state this observation to qualify that the stimulation of stationary mice resulted in behavioral responses “consistent with previous studies”. It would be redundant to repeat our full analysis and might distract from the novelty of the current manuscript. We have restricted this sentence to make it clearer: “Consistent with previous studies, we observed the whole-body behaviors like head orienting concurrent with local withdrawal (Browne et al., Cell Reports 2017; Blivis et al., eLife, 2017.)”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Druker et al. shows that siRNA depletion of PHD1, but not PHD2, increases H3T3 phosphorylation in cells arrested in prometaphase. Additionally, the expression of wild-type RepoMan, but not the RepoMan P604A mutant, restored normal H3T3 phosphorylation localization in cells arrested in prometaphase. Furthermore, the study demonstrates that expression of the RepoMan P604A mutant leads to defects in chromosome alignment and segregation, resulting in increased cell death. These data support a role for PHD1-mediated prolyl hydroxylation in controlling progression through mitosis. This occurs, at least in part, by hydroxylating RepoMan at P604, which regulates its interaction with PP2A during chromosome alignment.

      Strengths:

      The data support most of the conclusions made. However, some issues need to be addressed.

      Weaknesses:

      (1) Although ectopically expressed PHD1 interacts with ectopically expressed RepoMan, there is no evidence that endogenous PHD1 binds to endogenous RepoMan or that PHD1 directly binds to RepoMan.

      We do not fully agree that this comment is accurate - the implication is that we only show interaction between two exogenously expressed proteins, i.e. both exogenous PHD1 and RepoMan, when in fact we show that tagged PHD1 interacts with endogenous RepoMan. The major technical challenge here is the well-known difficulty of detecting endogenous PHD1 in such cell lines. We agree that co-IP studies do not prove that this interaction is direct and never claim to have shown this, though we do feel that a direct interaction is most likely, albeit not proven.

      (2) There is no genetic evidence indicating that PHD1 controls progression through mitosis by catalyzing the hydroxylation of RepoMan.

      We agree that our current study is primarily a biochemical and cell biological study, rather than a genetic study. Nonetheless, similar biochemical and cellular approaches have been widely used and validated in previous studies of mechanisms regulating cell cycle progression and we are confident in the conclusions drawn based on the data obtained so far.

      (3) Data demonstrating the correlation between dynamic changes in RepoMan hydroxylation and H3T3 phosphorylation throughout the cell cycle are needed.

      We agree that it will be very interesting to analyse in more detail the cell cycle dynamics of RepoMan hydroxylation and H3T3 phosphorylation - along with other cell cycle parameters. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments.

      (4) The authors should provide biochemical evidence of the difference in binding ability between RepoMan WT/PP2A and RepoMan P604A/PP2A.

      Here again we agree that it will be very interesting to analyse in future the detailed binding interactions between wt and mutant RepoMan and other interacting proteins, including PP2A. We show reduced interaction in cells by PLA (Figure 5A) and in biochemical analysis (Figure 5C). More in vitro analysis is, in our view, outside the scope of our present study and we are actively engaged in raising the additional funding needed to pursue such future experiments.

      (5) PHD2 is the primary proline hydroxylase in cells. Why does PHD1, but not PHD2, affect RepoMan hydroxylation and subsequent control of mitotic progression? The authors should discuss this issue further.

      We agree with the main point underpinning this comment, i.e., that there are still many things to be learned concerning the specific roles and mechanisms of the different PHD enzymes in vivo. We address this in the Discussion section and look forward to addressing these questions experimentally in future studies.

      Reviewer #2 (Public review):

      Summary:

      This is a concise and interesting article on the role of PHD1-mediated proline hydroxylation of proline residue 604 on RepoMan and its impact on RepoMan-PP1 interactions with phosphatase PP2A-B56 complex leading to dephosphorylation of H3T3 on chromosomes during mitosis. Through biochemical and imaging tools, the authors delineate a key mechanism in the regulation of the progression of the cell cycle. The experiments performed are conclusive with well-designed controls.

      Strengths:

      The authors have utilized cutting-edge imaging and colocalization detection technologies to infer the conclusions in the manuscript.

      Weaknesses:

      Lack of in vitro reconstitution and binding data.

      We agree that it will be very interesting to pursue in vitro reconstitution studies and detailed binding data. We view this as outside the scope of our present study and are actively engaged in raising the additional funding needed to pursue such future experiments. We do provide in vitro hydroxylation data in our accompanying manuscript by Jiang et al., 2025, eLife.

      Reviewer #3 (Public review):

      Summary:

      The manuscript is a comprehensive molecular and cell biological characterisation of the effects of P604 hydroxylation by PHD1 on RepoMan, a regulatory subunit of the PP1gamma complex. The identification and molecular characterisation of the hydroxylation site have been written up and deposited in bioRxiv in a separate manuscript. I reviewed the data and came to the conclusion that the hydroxylation site has been identified and characterised to a very high standard by LC-MS, in cells and in vitro reactions. I conclude that we should have no question about the validity of the PHD1-mediated hydroxylation.

      In the context of the presented manuscript, the authors postulate that hydroxylation on P604 by PHD1 leads to the inactivation of the complex, resulting in the retention of pThr3 in H3. 

      Strengths:

      Compelling data, characterisation of how P604 hydroxylation is likely to induce the interaction between RepoMan and a phosphatase complex, resulting in loading of RepoMan on Chromatin. Loss of the regulation of the hydroxylation site by PHD1 results in mitotic defects.

      Weaknesses:

      Reliance on a Proline-Alanine mutation in RepoMan to mimic an unhydroxylatable protein. The mutation will introduce structural alterations, and inhibition or knockdown of PHD1 would be necessary to strengthen the data on how hydroxylates regulate chromatin loading and interactions with B56/PP2A.

      We do not agree that we rely solely on analysis of the single site pro-ala mutant in RepoMan for our conclusions, since we also present a raft of additional experimental evidence, including knock-down data and experiments using both fumarate and FG. We would also reference the data we present on RepoMan in the parallel study by Jiang et al., which has also been published in eLife (https://doi.org/10.7554/eLife.108128.1). Of course, we agree with the reviewer that even though the mutant RepoMan features only a single amino acid change, this could still result in undetermined structural effects on the RepoMan protein that could conceivably contribute, at least in part, to some of the phenotypic effects observed. We now provide evidence in the current revision (new Figure 5D) that reduced interaction between RepoMan and B56gamma/PP2A is also evident when PHD1 is depleted from cells.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) The manuscript can benefit from improved quality of writing and avoidance of grammatical errors.

      We have checked through the manuscript again and corrected any mistakes we encountered in the current revision.

      (2) Although the data in the manuscript is compelling, it is difficult to rule out indirect effects in the interactions. Hence, in vitro binding assays with purified proteins are important to validate the findings, along with in vitro reconstitution of phosphatase activity.

      It is possible that cofactors and / or additional PTMs are required to promote these interactions in vivo. We have provided in vitro hydroxylation analysis and the additional experiments suggested will be the subject of follow-on future studies.

      (3) Proline to alanine is a drastic mutation in the amino acid backbone. The authors could purify PHD1 and reconstitute P604 hydroxylation to show if it performs as expected.

      This is likely to be a challenging experiment technically, given that RepoMan is a component of multiple distinct complexes, some of which are dynamic. We did not feel able to address this within the scope of the current study.

      (4) The confocal images showing the overlap of two fluorescent signals need to show some sort of quantification and statistics to prove that the overlap is significant.

      We now provide Pearson correlation measurements for Figure 2A in new Figure 2B in the current revision.
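      As an illustration only of the kind of measurement referred to here, the following is a minimal sketch of a pixel-wise Pearson correlation between two fluorescence channels. The function name `pearson_colocalization`, the optional mask, and the synthetic images are assumptions made for this example; they are not the authors' analysis pipeline or data.

```python
import numpy as np


def pearson_colocalization(channel_a, channel_b, mask=None):
    """Pixel-wise Pearson correlation between two same-shaped intensity images.

    Hypothetical helper for illustration; `mask` optionally restricts the
    calculation to a region of interest (e.g. a segmented nucleus).
    """
    a = np.asarray(channel_a, dtype=float)
    b = np.asarray(channel_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("channels must have the same shape")
    if mask is None:
        mask = np.ones(a.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)

    # Mean-centre the pixels inside the region of interest.
    a = a[mask] - a[mask].mean()
    b = b[mask] - b[mask].mean()

    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    if denom == 0:
        raise ValueError("a channel has zero variance inside the mask")
    return float((a * b).sum() / denom)


# Synthetic images (not experimental data): the second channel largely follows the first.
rng = np.random.default_rng(0)
red = rng.random((64, 64))
green = 0.8 * red + 0.2 * rng.random((64, 64))
r = pearson_colocalization(red, green)
```

      Values close to 1 indicate that the two signals rise and fall together across pixels; a coefficient near 0 indicates no linear relationship.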

      (5) Kindly provide a clearer panel for the Western blot of H3T3ph in Figure 3c.

      We have now included a new panel for this Figure in the current revision.

      (6) Kindly also include the figures for validation of siRNAs used in the study.

      We have added this throughout in supplementary figures.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors have shown that PHD1 and RepoMan interact; can the interaction be "trapped" by the addition of DMOG? Generally, hydroxylase substrates can be trapped, which would add an additional layer of confidence that PHD1 and RepoMan form an enzyme-substrate complex. 

      This is something we are planning to do for follow-up studies using the established methods from the von Kriegsheim laboratory.

      (2) How does P604A mutation affect the interaction with PHD1? One would expect a reduction in interaction. 

      Another interesting point we are planning to investigate in the future.

      (3) The effects of expression of the wt and P604A mutant repoman are well-characterised. Could the authors check the effects of overexpressing PHD1 and deadPHD1, inhibition on the mitosis/H3 phosphorylation? My concerns are that a P-A mutation will disrupt the secondary structure, and although it is a good tool, data should be backed up by increasing/decreasing the hydroxylation of RepoMan over the mutation. Repeat some of the most salient experiments where the P604A mutation has been used and modulate the hydP604 by modulating PHD1 activity/expression (such as Chromatin interaction, PLA assay, B56gamma interaction, H3 phosphorylation localisation, Monastrol release, etc.)

      We agree, the PA mutant can potentially affect the protein structure. In our manuscript we have provided pH3 analysis for PHD inhibition using siRNA, FG4592 and fumarate. In the current revision we also include data showing that depletion of PHD1 results in a reduction in interaction between RepoMan and B56gamma/PP2A. This is now presented in new Figure 5D.

      (4) I also have a general question, as a point of interest, as the interaction between PHD1 and RepoMan appears to be cell cycle dependent, is it possible that the hydroxylation status cycles as well? Could this explain how some sub-stoichiometric hydroxylation events observed may be masked by assessing unsynchronised cells in bulk?

      Indeed, a very good question. We believe this is an interesting question for follow-up studies. Given our previous publication showing phosphorylation of PHD1 by CDKs alters substrate binding (Ortmann et al, 2016 JCS), this is our current hypothesis.

    1. Author response:

      We would like to thank the reviewers for their helpful feedback. We appreciate their recognition of many positive features from our study and plan to address the weaknesses with the following set of changes:

      Reviewer #1 rightly points out that the titration of performance throughout the experiment could reduce the overall size of the phasic effect we observed by compressing the overall range of d’. In our revision, we plan to acknowledge the potential consequence of stimulus titration as well as emphasize that the resultant vector length approach we took to quantify phase-behavior coupling is a better reflection of the effect size than the plot of phase-binned d’. Next, we will include language urging caution about the certainty of our double-pass statistics, since half of our participants had far fewer double-pass trials due to a coding error. Finally, we will gladly clarify the methodological details requested and revise the discussions by phrasing several of our interpretations more conservatively: specifically discussing the possibility that the frontal-occipital phase difference could also arise from two counter-phase sources, and including the possibility that sensory noise reduction and sharpened tuning may be two separate mechanisms.
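      For readers unfamiliar with the measure, the sketch below shows one common way to compute a resultant vector length from phase-binned performance. It assumes that each phase bin is weighted by its non-negative d’ value; the function name, the eight-bin layout, and the numbers are hypothetical, and the study's own computation may differ.

```python
import numpy as np


def resultant_vector_length(phases, weights=None):
    """Length and angle of the (weighted) mean resultant vector.

    `phases` are bin-centre angles in radians; `weights` are non-negative
    values per bin (assumed here to be d' per bin). Hypothetical helper.
    """
    phases = np.asarray(phases, dtype=float)
    if weights is None:
        weights = np.ones_like(phases)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")

    # Each bin contributes a unit vector at its phase, scaled by its weight.
    mean_vector = np.sum(weights * np.exp(1j * phases)) / weights.sum()
    return float(np.abs(mean_vector)), float(np.angle(mean_vector))


# Eight evenly spaced phase bins with made-up d' values (not study data).
bin_centres = np.linspace(-np.pi, np.pi, 8, endpoint=False)
flat_dprime = np.full(8, 1.5)                                # no phase dependence
modulated_dprime = 1.5 + 0.5 * np.cos(bin_centres - np.pi / 2)

flat_length, _ = resultant_vector_length(bin_centres, flat_dprime)
mod_length, mod_angle = resultant_vector_length(bin_centres, modulated_dprime)
```

      A length near 0 means performance is spread evenly over phase, whereas larger values indicate that performance concentrates around the returned preferred phase.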

      Reviewer #2 raises concerns about performing group-level statistical analyses on a small sample size. We acknowledge this as a reasonable concern and will include the single-subject effects of our main analysis in the Supplementary Materials as well as discuss that although the sample size is a limitation of our study, there are several justifications for taking a small-n, large-trial approach given our research question. We would also like to highlight that we feel more confident in the reproducibility of our results given the convergence of evidence across multiple measures (phase-d’ coupling, counter-phasic hit and false alarm rates, response consistency, and classification images) which are all pointing towards a consistent interpretation of a phase effect on internal variability.

    1. Author response

      Public Reviews:

      Reviewer #1 (Public review):

      This study presents evidence that the addition of the two GTPases EngA and ObgE to reactions comprised of rRNAs and total ribosomal proteins purified from native bacterial ribosomes can bypass the requirements for non-physiological temperature shifts and Mg²⁺ ion concentrations for in vitro reconstitution of functional E. coli ribosomes.

      Strengths:

      This advance allows ribosome reconstitution in a fully reconstituted protein synthesis system containing individually purified recombinant translation factors, with the reconstituted ribosomes substituting for native purified ribosomes to support protein synthesis. This work potentially represents an important development in the long-term effort to produce synthetic cells.

      Weaknesses:

      While much of the evidence is solid, the analysis is incomplete in certain respects that detract from the scientific quality and significance of the findings:

      (1) The authors do not describe how the native ribosomal proteins (RPs) were purified, and it is unclear whether all subassemblies of RPs have been disrupted in the purification procedure. If not, additional chaperones might be required beyond the two GTPases described here for functional ribosome assembly from individual RPs.

      Native ribosomal proteins (RPs) were prepared from native ribosomes, according to the well-established protocol described by Dr. Knud H. Nierhaus [Nierhaus, K. H. Reconstitution of ribosomes in Ribosomes and protein synthesis: A Practical Approach (Spedding G. eds.) 161-189, IRL Press at Oxford University Press, New York (1990)]. In this method, ribosomal proteins are subjected to dialysis in 6 M urea buffer, a strong denaturing condition that may completely disrupt ribosomal structure and dissociate all ribosomal protein subassemblies. To make this point clear, we will describe the ribosomal protein (RP) preparation procedure in the manuscript, rather than merely referring to the book.

      In addition, we would like to clarify one point related to this comment. The focus of the present study is to show that the presence of two factors is required for single-step ribosome reconstitution under translation-compatible, cell-free conditions. We do not intend to claim that these two factors are absolutely sufficient for ribosome reconstitution. Hence, we will revise the manuscript to more explicitly state what this work does and does not conclude.

      (2) Reconstitution studies in the past have succeeded by using all recombinant, individually purified RPs, which would clearly address the issue in the preceding comment and also eliminate the possibility that an unknown ribosome assembly factor that co-purifies with native ribosomes has been added to the reconstitution reactions along with the RPs.

      As noted in the response to the Comment (1), the focus of the present study is the requirement of the two factors for functional ribosome assembly. Therefore, we consider that it is not necessary to completely exclude the possibility that unknown ribosome assembly factors are present in the RP preparation. Nevertheless, we agree that it is important to clarify what factors, if any, are co-present in the RP fraction. To address this, we plan to add proteomic analysis results of the TP70 preparation.

      We also agree that additional, as-yet-unidentified components, including factors involved in rRNA modification, could plausibly further improve assembly efficiency. We will explicitly note this possibility in the Discussion.

      Finally, extending the system to the use of in vitro-transcribed rRNA and fully recombinant ribosomal proteins could be essentially a next step of this study, and we are currently exploring these directions in our laboratory. However, we consider them beyond the scope of the present study and will provide them as future perspectives of this study in the Discussion.

      (3) They never compared the efficiency of the reconstituted ribosomes to native ribosomes added to the "PURE" in vitro protein synthesis system, making it unclear what proportion of the reconstituted ribosomes are functional, and how protein yield per mRNA molecule compares to that given by the PURE system programmed with purified native ribosomes.

      We consider that it is feasible to estimate the GFP synthesis rate from the increase in fluorescence over time under conditions where the template mRNA is in excess, and to compare this rate directly between reconstituted and native ribosomes. We will therefore consider performing this experiment. This comparison should provide insight into what fraction of ribosomes reconstituted in our system are functionally active.

      By contrast, quantifying protein yield per mRNA molecule is substantially more challenging. The translation system is complex, and the apparent yield per mRNA can vary depending on factors such as differences in polysome formation efficiency. In addition, the PURE system is a coupled transcription–translation setup that starts from DNA templates, which further complicates rigorous normalization on a per-mRNA basis. Because the main focus of this study is to determine how many functionally active ribosomes can be reconstituted under translation-compatible conditions, we plan to address this comment by carrying out the former experiment.
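      The rate comparison proposed in this response can be sketched as follows: fit a straight line to the fluorescence time course over a window where the signal rises linearly, take the slope as the synthesis rate, and express the reconstituted-ribosome rate relative to the native-ribosome rate. The function name, the fitting window, and all numbers are assumptions for illustration; they are synthetic values, not measurements from the study.

```python
import numpy as np


def synthesis_rate(time_min, fluorescence, window=None):
    """Slope of fluorescence versus time (arbitrary units per minute).

    `window` is an optional (start, end) pair, in minutes, selecting the
    part of the trace assumed to be linear. Hypothetical helper.
    """
    t = np.asarray(time_min, dtype=float)
    f = np.asarray(fluorescence, dtype=float)
    if window is not None:
        keep = (t >= window[0]) & (t <= window[1])
        t, f = t[keep], f[keep]
    if t.size < 2:
        raise ValueError("need at least two time points in the window")
    slope, _intercept = np.polyfit(t, f, 1)  # first-order least-squares fit
    return float(slope)


# Synthetic traces (made-up values, same background, different slopes).
t = np.arange(0, 61, 5, dtype=float)
native_trace = 100.0 + 20.0 * t
reconstituted_trace = 100.0 + 5.0 * t

native_rate = synthesis_rate(t, native_trace, window=(10, 50))
reconstituted_rate = synthesis_rate(t, reconstituted_trace, window=(10, 50))
relative_activity = reconstituted_rate / native_rate
```

      Under the stated assumption that template mRNA is in excess and equal amounts of ribosomes are added, this ratio gives a rough estimate of the fraction of reconstituted ribosomes that are functionally active.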

      (4) They also have not examined the synthesized GFP protein by SDS-PAGE to determine what proportion is full-length.

      Because we can add an affinity tag to the GFP reporter, it should be feasible to selectively purify the synthesized protein from the reaction mixture and analyze it by SDS–PAGE. We therefore plan to perform this experiment.

      (5) The previous development of the PURE system included examinations of the synthesis of multiple proteins, one of which was an enzyme whose specific activity could be compared to that of the native enzyme. This would be a significant improvement to the current study. They could also have programmed the translation reactions containing reconstituted ribosomes with (i) total native mRNA and compared the products in SDS-PAGE to those obtained with the control PURE system containing native ribosomes; (ii) with specific reporter mRNAs designed to examine dependence on a Shine-Dalgarno sequence and the impact of an in-frame stop codon in prematurely terminating translation to assess the fidelity of initiation and termination events; and (iii) an mRNA with a programmed frameshift site to assess elongation fidelity displayed by their reconstituted ribosomes.

      Following the recommendation, we plan to test the synthesis of at least one additional protein with enzymatic activity, in addition to GFP, so that the activity of the translated product can be assessed.

      We agree that comparing translation products using total mRNA, testing dependence on the Shine–Dalgarno sequence, and performing dedicated assays to evaluate initiation/elongation/termination fidelity are all attractive and valuable studies. However, we consider these to be beyond the scope of the present manuscript. We will therefore describe them explicitly as future directions in the Discussion.

      At the same time, we anticipate that mass spectrometric (MS) analysis of GFP and the enzyme product(s) that we attempt to synthesize could partially address concerns related to product integrity (e.g., truncations) and, to some extent, translational fidelity. We therefore plan to carry out MS analysis of these translated products.

      Reviewer #2 (Public review):

      This study presents a significant advance in the field of in vitro ribosome assembly by demonstrating that the bacterial GTPases EngA and ObgE enable single-step reconstitution of functional 50S ribosomal subunits under near-physiological conditions, specifically at 37 °C and with total Mg²⁺ concentrations below 10 mM.

      This achievement directly addresses a long-standing limitation of the traditional two-step in vitro assembly protocol (Nierhaus & Dohme, PNAS 1974), which requires non-physiological temperatures (44-50 °C), and high Mg²⁺ concentrations (~20 mM). Inspired by the integrated Synthesis, Assembly, and Translation (iSAT) platform (Jewett et al., Mol Syst Biol 2013), leveraging E. coli S150 crude extract, which supplies essential assembly factors, the authors hypothesize that specific ribosome biogenesis factors, particularly GTPases present in such extracts, may be responsible for enabling assembly under mild conditions. Through systematic screening, they identify EngA and ObgE as the minimal pair sufficient to replace the need for temperature and Mg²⁺ shifts when using phenol-extracted (i.e., mature, modified) rRNA and purified TP70 proteins.

      However, several important concerns remain:

      (1) Dependence on Native rRNA Limits Generalizability

      The current system relies on rRNA extracted from native ribosomes via phenol, which retains natural post-transcriptional modifications. As the authors note (lines 302-304), attempts to assemble active 50S subunits using in vitro transcribed rRNA, even in the presence of EngA and ObgE, failed. This contrasts with iSAT, where in vitro transcribed rRNA can yield functional (though reduced-activity, ~20% of native) ribosomes, presumably due to the presence of rRNA modification enzymes and additional chaperones in the S150 extract. Thus, while this study successfully isolates two key GTPase factors that mimic part of iSAT's functionality, it does not fully recapitulate iSAT's capacity for de novo assembly from unmodified RNA. The manuscript should clarify that the in vitro assembly demonstrated here is contingent on using native rRNA and does not yet achieve true bottom-up reconstruction from synthetic parts. Moreover, given iSAT's success with transcribed rRNA, could a similar systematic omission approach (e.g., adding individual factors) help identify the additional components required to support unmodified rRNA folding?

      We fully recognize the reviewer’s point that our current system has not yet achieved a true bottom-up reconstruction. Although we intended to state this clearly in the manuscript, the fact that this concern remains indicates that our description was not sufficiently explicit. We will therefore revisit the organization and wording of the manuscript and revise it to ensure that this limitation is clearly communicated to readers.

      (2) Imprecise Use of "Physiological Mg²⁺ Concentration"

      The abstract states that assembly occurs at "physiological Mg²⁺ concentration" (<10 mM). However, while this total Mg²⁺ level aligns with optimized in vitro translation buffers (e.g., in PURE or iSAT systems), it exceeds estimates of free cytosolic [Mg²⁺] in E. coli (~1-2 mM). The authors should clarify that they refer to total Mg²⁺ concentrations compatible with cell-free protein synthesis, not necessarily intracellular free ion levels, to avoid misleading readers about true physiological relevance.

      We agree that this is a very reasonable point. We will therefore revise the manuscript to clarify that we are referring to the total Mg²⁺ concentration compatible with cell-free protein synthesis, rather than the intracellular free Mg²⁺ level under physiological conditions.

      In summary, this work elegantly bridges the gap between the two-step method and the extract-dependent iSAT system by identifying two defined GTPases that capture a core functionality of cellular extracts: enabling ribosome assembly under translation-compatible conditions. However, the reliance on native rRNA underscores that additional factors - likely present in iSAT's S150 extract - are still needed for full de novo reconstitution from unmodified transcripts. Future work combining the precision of this defined system with the completeness of iSAT may ultimately realize truly autonomous synthetic ribosome biogenesis.

    1. Author response:

      Thank you for your letter and for the constructive feedback from the reviewers on our manuscript (eLife-RP-RA-2025-109174). We appreciate the time and expertise you and the reviewers have dedicated to improving our work.

      We have carefully considered all comments and have developed a comprehensive revision plan. To address the primary concerns, we will conduct several new experiments designed to provide robust support for our key conclusions. Other points will be addressed through textual revisions, including the addition of existing ADMET data and an expanded discussion section.

      We are confident that these revisions will fully satisfy the reviewers' concerns and significantly strengthen the manuscript. Our detailed experimental plan and point-by-point responses are provided below.

      (1) Addressing "Qualitative analyses of some of the lipid measures, as opposed to more quantitative analyses"

      Supplementary experiments and analyses

      We will add the assessment of hepatic triglyceride and total cholesterol levels in liver tissues from control, experimental, and drug-treated mice, thereby providing further quantitative validation.

      (2) Addressing "SREBP2"

      Supplementary experiments and analyses

      We will include a luciferase assay to determine whether alcohol plus PA induces SREBP2 activation in AML-12 cells.

      As suggested, we will assess the expression levels of SREBP2 downstream target genes (Hmgcr, Hmgcs, Ldlr, and Lcn2) in both in vitro and in vivo models.

      (3) Timeline and process arrangement of supplementary experiments

      To comprehensively address these issues, we plan to purchase the following required reagents and have formulated the following experimental plan:

      Author response table 1.

      Given the time required for reagent acquisition and the execution of these in vitro and in vivo experiments, we kindly request an extension of the revision deadline by 8 weeks. This will ensure the comprehensive and high-quality completion of all necessary studies.

      We will fully commit to delivering a thoroughly revised manuscript that robustly addresses all reviewer comments and aligns with the high standards of eLife. We greatly appreciate your guidance and flexibility.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript addresses an important question: how do circadian clocks adjust to a complex rhythmic environment with multiple daily rhythms? The focus is on the temperature and light cycles (TC and LD) and their phase relationship. In nature, TC usually lags the LD cycle, but the phase delay can vary depending on seasonal and daily weather conditions. The authors present evidence that circadian behavior adjusts to different TC/LD phase relationships, that temperature-sensitive tim splicing patterns might underlie some of these responses, and that artificial selection for preferential evening or morning eclosion behavior impacts how flies respond to different LD/TC phase relationships.

      Strength:

      Experiments are conducted on control strains and strains that have been selected in the laboratory for preferential morning or evening eclosion phenotypes. This study is thus quite unique as it allows us to probe whether this artificial selection impacted how animals respond to different environmental conditions, and thus gives hints on how evolution might shape circadian oscillators and their entrainment. The authors focused on circadian locomotor behavior and timeless (tim) splicing because warm and cold-specific transcripts have been described as playing an important role in determining temperature-dependent circadian behavior. Not surprisingly, the results are complex, but there are interesting observations. In particular, the "late" strain appears to be able to adjust more efficiently its evening peak in response to changes in the phase relationship between temperature and light cycles, but the morning peak seems less responsive in this strain. Differences in the circadian pattern of expression of different tim mRNA isoforms are found under specific LD/TC conditions.

      We sincerely thank the reviewer for this generous assessment and for recognizing several key strengths of our study. We are particularly gratified that the reviewer values our use of long-term laboratory-selected chronotype lines (350+ generations), which provide a unique evolutionary perspective on how artificial selection reshapes circadian responses to complex LD/TC phase relationships—precisely our core research question.

      Weaknesses:

      These observations are interesting, but in the absence of specific genetic manipulations, it is difficult to establish a causative link between tim molecular phenotypes and behavior. The study is thus quite descriptive. It would be worth testing available tim splicing mutants, or mutants for regulators of tim splicing, to understand in more detail and more directly how tim splicing determines behavioral adaptation to different phase relationships between temperature and light cycles. Also, I wonder whether polymorphisms in or around tim splicing sites, or in tim splicing regulators, were selected in the early or late strains.

      We thank the reviewer for this insightful comment. We agree that our current data do not establish a direct causal link between tim splicing (or Psi) and behaviour, and we appreciate that some of our wording (e.g. “linking circadian gene splicing to behavioural plasticity” or describing tim splicing as a “pivotal node”) may have suggested unintended causal links. In the revision, we will (i) explicitly state in the Abstract, Introduction, and early Discussion that the main aim was to test whether selection for timing of eclosion is accompanied by correlated evolution of temperature‑dependent tim splicing patterns and evening activity plasticity under complex LD/TC regimes, and (ii) consistently describe the molecular findings as correlational and hypothesis‑generating rather than causal. We will also add phrases throughout the text to point the reader more clearly to existing passages where we already emphasize “correlated evolution” and explicitly label our mechanistic ideas as “we speculate” / “we hypothesize” and as future experiments.

      We fully agree that studies using tim splicing mutants or manipulations of splicing regulators under in‑sync and out‑of‑sync LD/TC regimes will be essential to ascertain what role tim variants play under such environmental conditions, and we will highlight this as a key future direction. At the same time, we emphasize that the long‑term selection lines provide a complementary perspective to classical mutant analyses by revealing how behavioural and molecular phenotypes can exhibit correlated evolution under a specific, chronobiologically relevant selection pressure (timing of emergence).

      Finally, we appreciate the suggestion regarding polymorphisms. Whole‑genome analyses of these lines in a PhD thesis from our group (Ghosh, 2022, unpublished, doctoral dissertation) reveal significant SNPs in intronic regions of timeless in both Early and Late populations, as well as SNPs in CG7879, a gene implicated in alternative mRNA splicing, in the Late line. Because these analyses are ongoing and not yet peer‑reviewed, we do not present them as main results.

      I also have a major methodological concern. The authors studied how the evening and morning phases are adjusted under different conditions and different strains. They divided the daily cycle into 12h morning and 12h evening periods, and calculated the phase of morning and evening activity using circular statistics. However, the non-circadian "startle" responses to light or temperature transitions should have a very important impact on phase calculation, and thus at least partially obscure actual circadian morning and evening peak phase changes. Moreover, the timing of the temperature-up startle drifts with the temperature cycles, and will even shift from the morning to the evening portion of the divided daily cycle. Its amplitude also varies as a function of the LD/TC phase relationship. Note that the startle responses and their changes under different conditions will also affect SSD quantifications.

      We thank the reviewer for this perceptive methodological concern, which we had anticipated and systematically quantified but had not included in the original submission. The reviewer is absolutely correct that non-circadian startle responses to zeitgeber transitions could confound both circular phase (CoM) calculations and SSD quantifications, particularly as TC drift creates shifting startle locations across morning/evening windows.

      We will be including startle response quantification (previously conducted but unpublished) as a new Supplementary figure, systematically measuring SSD in 1-hour windows immediately following each of the four environmental transitions (lights-ON, lights-OFF, temperature rise and temperature fall) across all six LDTC regimes (2-12hr TC-LD lags) for all 12 selection lines (early<sub>1-4</sub>, control<sub>1-4</sub>, late<sub>1-4</sub>).

      Author response image 1.

      Startle responses in selection lines under LDTC regimes: SSD calculated to assess startle response to each of the transitions (1-hour window after the transition used for calculations). Error bars are 95% Tukey’s confidence intervals for the main effect of selection in a two-factor ANOVA design with block as a random factor. Non-overlapping error bars indicate significant differences among the values. SSD values between in-sync and out-of-sync regimes for a range of phase relationships between LD and TC cycles (A) LDTC 2-hr, (B) LDTC 4-hr, (C) LDTC 6-hr, (D) LDTC 8-hr, (E) LDTC 10-hr, (F) LDTC 12-hr.

      Key findings directly addressing the reviewer's concerns:

      (1) Morning phase advances in LDTC 8-12hr regimes are explained by quantified nocturnal startle activity around temperature rise transitions occurring within morning windows. Critically, these startles show no selection line differences, confirming they represent equivalent non-circadian confounds across lines.

      (2) Early selection lines exhibit significantly heightened startle responses specifically to temperature rise in LDTC 4hr and 6hr regimes (early > control ≥ late), demonstrating that startle responses themselves exhibit correlated evolution with emergence timing—an important novel finding that strengthens our evolutionary story.

      (3) Startle responses differed among selection lines only for the temperature rise transition under two of the regimes used, LDTC 4 hr and 6 hr. Under LDTC 4 hr, the temperature rise transition falls in the morning window and, despite early having significantly greater startle than late, the overall morning SSD (over the 12-hour morning window) did not differ significantly among the selection lines for this regime. Thus, eliminating the startle window would make the selection lines more similar to one another. On the other hand, under the LDTC 6 hr regime, the startle response to temperature rise falls in the 12-hour evening window. In this case too, early showed higher startle than control and late. A higher startle in early would thus contribute to the observed differences among selection lines. We agree with the reviewer that eliminating this startle peak would lead to a clearer interpretation of the change in circadian evening activity.

      We deliberately preserved all behavioural data without filtering out startle windows, since filtering would require arbitrary cutoffs such as 1 hr, 2 hr or 3 hr post transition, or until the startle peak declines in different selection lines under different regimes. In the revised version, we will add complementary analyses excluding the startle windows to obtain mean phase and SSD values which are unaffected by the startle responses.
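
      A minimal sketch of how startle windows could be excluded, assuming activity is binned into equal-width bins over a 24-hour cycle and that a fixed window after each transition is dropped. The function name, the 1-hour default, and the example transition times are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def startle_mask(n_bins, transitions_h, window_h=1.0, period_h=24.0):
    """Return a boolean array: True for bins to keep, False for bins whose
    start time falls within `window_h` hours after any transition."""
    bin_h = period_h / n_bins
    starts = np.arange(n_bins) * bin_h           # start time of each bin (h)
    keep = np.ones(n_bins, dtype=bool)
    for t in transitions_h:
        elapsed = (starts - t) % period_h        # hours since transition, wrapped
        keep &= ~(elapsed < window_h)
    return keep

# Hypothetical example: lights-ON at 0 h, lights-OFF at 12 h,
# temperature rise at 6 h, temperature fall at 18 h, hourly bins.
keep = startle_mask(24, [0.0, 12.0, 6.0, 18.0])
```

      The resulting mask can be passed to any downstream phase or SSD calculation, so the same analysis can be run with and without the startle bins and the two results compared across cutoffs.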

      For the circadian phase, these issues seem, for example, quite obvious for the morning peak in Figure 1. According to the phase quantification on panel D, there is essentially no change in the morning phase when the temperature cycle is shifted by 6 hours compared to the LD cycle, but the behavior trace on panel B clearly shows a phase advance of morning anticipation. Comparison between the graphs on panels C and D also indicates that there are methodological caveats, as they do not correlate well.

      Because of the various masking effects, phase quantification under entrainment is a thorny problem in Drosophila. I would suggest testing other measurements of anticipatory behavior to complement or perhaps supersede the current behavior analysis. For example, the authors could employ the anticipatory index used in many previous studies, measure the onset of morning or evening activity, or, if more reliable, the time at which 50% of anticipatory activity is reached. Termination of activity could also be considered. Interestingly, it seems there are clear effects on evening activity termination in Figure 3. All these methods will be impacted by startle responses under specific LD/TC phase relationships, but their combination might prove informative.

      We agree that phase quantification under entrained conditions in Drosophila is challenging and that anticipatory indices, onset/offset measures, and T50 metrics each have particular strengths and weaknesses. In designing our analysis, we chose to avoid metrics that require arbitrary or subjective criteria (e.g. defining activity thresholds or durations for anticipation, or visually marking onset/offset), because these can substantially affect the estimated phase and reduce comparability across regimes and genotypes. Instead, we used two fully quantitative, parameter-free measures applied to the entire waveform within defined windows: (i) SSD to capture waveform change in shape/amplitude and (ii) circular mean phase of activity (CoM) restricted to the 12 h morning and 12 h evening windows. By integrating over the entire window, these measures are less sensitive to the exact choice of threshold and to short-lived, high-amplitude startles at transitions, and they treat all bins within the window in a consistent, reproducible way across all LDTC regimes and lines. Panels C (SSD) and D (CoM) are intentionally complementary, not redundant: SSD reflects how much the waveform changes in shape and amplitude, whereas CoM reflects the timing of the center of mass of activity. Under conditions where masking alters amplitude and introduces short-lived bouts without a major shift of the main peak, it is expected that SSD and CoM will not correlate linearly across regimes.
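
      As a rough illustration of the two measures described above, the sketch below computes an activity-weighted circular mean phase and a sum of squared differences between two activity profiles. It assumes equal-width bins over 24 hours, bin centres as the phase of each bin, and SSD defined as the sum of squared bin-wise differences between two profiles; these definitions, the helper names, and the window boundaries are assumptions for illustration and may differ from the calculation in the revised Methods.

```python
import numpy as np

def window_mask(n_bins, start_h, length_h=12.0, period_h=24.0):
    """True for bins whose centre lies in [start_h, start_h + length_h), wrapped."""
    centers_h = (np.arange(n_bins) + 0.5) * period_h / n_bins
    return ((centers_h - start_h) % period_h) < length_h

def circular_mean_phase(activity, keep=None, period_h=24.0):
    """Activity-weighted circular mean phase in hours (centre of mass, CoM)."""
    activity = np.asarray(activity, dtype=float)
    n = activity.size
    centers_h = (np.arange(n) + 0.5) * period_h / n
    weights = activity if keep is None else np.where(keep, activity, 0.0)
    theta = 2.0 * np.pi * centers_h / period_h   # bin phase as an angle
    x = np.sum(weights * np.cos(theta))
    y = np.sum(weights * np.sin(theta))
    return (np.arctan2(y, x) * period_h / (2.0 * np.pi)) % period_h

def ssd(profile_a, profile_b, keep=None):
    """Sum of squared bin-wise differences between two activity profiles."""
    diff = np.asarray(profile_a, dtype=float) - np.asarray(profile_b, dtype=float)
    if keep is not None:
        diff = diff[keep]
    return float(np.sum(diff ** 2))
```

      Because both functions accept an optional boolean mask, the same code can be restricted to a 12 h morning or evening window, or applied to data with startle bins removed.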

      We will be including a detailed calculation of how CoM is obtained in our methods for the revised version.  

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to dissect the plasticity of circadian outputs by combining evolutionary biology with chronobiology. By utilizing Drosophila strains selected for "Late" and "Early" adult emergence, they sought to investigate whether selection for developmental timing co-evolves with plasticity in daily locomotor activity. Specifically, they examined how these diverse lines respond to complex, desynchronized environmental cues (temperature and light cycles) and investigated the molecular role of the splicing factor Psi and timeless isoforms in mediating this plasticity.

      Major strengths and weaknesses:

      The primary strength of this work is the novel utilization of long-term selection lines to address fundamental questions about how organisms cope with complex environmental cues. The behavioral data are compelling, clearly demonstrating that "Late" and "Early" flies possess distinct capabilities to track temperature cycles when they are desynchronized from light cycles.

      We sincerely thank the reviewer for this enthusiastic recognition of our study's core strengths. We are particularly gratified that the reviewer highlights our novel use of long-term selection lines (350+ generations) as the primary strength, enabling us to address fundamental evolutionary questions about circadian plasticity under complex environmental cues. We thank them for identifying our behavioral data as compelling (Figs 1, 3), which robustly demonstrate selection-driven divergence in temperature cycle tracking.

      However, a significant weakness lies in the causal links proposed between the molecular findings and these behavioral phenotypes. The molecular insights (Figures 2, 4, 5, and 6) rely on mRNA extracted from whole heads. As head tissue is dominated by photoreceptor cells and glia rather than the specific pacemaker neurons (LNv, LNd) driving these behaviors, this approach introduces a confound. Differential splicing observed here may reflect the state of the compound eye rather than the central clock circuit, a distinction highlighted by recent studies (e.g., Ma et al., PNAS 2023).

      We thank the reviewer for highlighting this important methodological consideration. We fully agree that whole-head extracts do not provide spatial resolution to distinguish central pacemaker neurons (~100-200 total) from compound eyes and glia, and that cell-type-specific profiling represents the critical next experimental step. As mentioned in our response to Reviewer 1, we appreciate the issue with our phrasing and will be revising it accordingly to more clearly describe that we do not claim any causal connections between expression of the tim splice variants in particular circadian neurons and their contribution to the phenotype observed.

      We chose whole-head extracts for practical reasons aligned with our study's specific goals:

      (1) Fly numbers: Our artificially selected populations are maintained at large numbers (~1000s per line). Whole-head extracts enabled sampling ~150 flies per time point = ~600 flies per genotype per environmental regime, providing a means to faithfully sample the variation that may exist in such randomly mating populations.

      (2) Established method for characterizing splicing patterns: The majority of temperature-dependent period/timeless splicing studies have successfully used whole-head extracts (Majercak et al., 1999; Shakhmantsir et al., 2018; Martin Anduaga et al., 2019) to characterize splicing dynamics under novel conditions.

      (3) Novel environmental regimes: Our primary molecular contribution was documenting timeless splicing patterns under previously untested LDTC phase relationships (TC 2-12hr lags relative to LD) and testing whether these exhibit selection-dependent differences consistent with behavioral divergence.

      Furthermore, while the authors report that Psi mRNA loses rhythmicity under out-of-sync conditions, this correlation does not definitively prove that Psi oscillation is required for the observed splicing patterns or behavioral plasticity. The amplitude of the reported Psi rhythm is also low (~1.5 fold) and variable, raising questions about its functional significance in the absence of manipulation experiments (such as constitutive expression) to test causality.

      We thank the reviewer for this insightful comment and appreciate that our phrasing has been misleading. We will especially pay attention to this issue, raised by two reviewers, and clearly highlight our results as correlated evolution and hypothesis-generating.

      We appreciate the reviewer highlighting these points and would like to draw attention to the following passage in our Discussion section:

      “Psi and levels of tim-cold and tim-sc (Foley et al., 2019). We observe that this correlation is most clearly upheld under temperature cycles wherein tim-medium and Psi peak in-phase while the cold-induced transcripts start rising when Psi falls (Figure 8A1&2). Under LDTC in-sync conditions this relationship is weaker, even though Psi is rhythmic, potentially due to light-modulated factors influencing timeless splicing (Figure 8B1&2). This is in line with Psi’s established role in regulating activity phasing under TC 12:12 but not LD 12:12 (Foley et al., 2019). This is also supported by the fact that while tim-medium and tim-cold are rhythmic under LD 12:12 (Shakhmantsir et al., 2018), Psi is not (datasets from Kuintzle et al., 2017; Rodriguez et al., 2013). Assuming this to be true across genetic backgrounds and sexes and combined with our similar findings for these three transcripts under LDTC out-of-sync (Figure 2B3, D3&E3), we speculate that Psi rhythmicity may not be essential for tim-medium or tim-cold rhythmicity especially under conditions wherein light cycles are present along with temperature cycles (Figure 8C1&2). Our study opens avenues for future experiments manipulating PSI expression under varying light-temperature regimes to dissect its precise regulatory interactions. We hypothesize that flies with Psi knocked down in the clock neurons should exhibit a less pronounced shift of the evening activity under the range of LDTC out-of-sync conditions for which activity is assayed in our study. On the other hand, its overexpression should cause larger delays in response to delayed temperature cycles due to the increased levels of tim-medium translating into delay in TIM protein accumulation.”

      Appraisal of aims and conclusions:

      The authors successfully demonstrate the co-evolution of emergence timing and activity plasticity, achieving their aim on the behavioral level. However, the conclusion that the specific molecular mechanism involves the loss of Psi rhythmicity driving timeless splicing changes is not yet fully supported by the data. The current evidence is correlative, and without spatial resolution (specific clock neurons) or causal manipulation, the mechanistic model remains speculative.

      This study is likely to be of significant interest to the chronobiology and evolutionary biology communities as it highlights the "enhanced plasticity" of circadian clocks as an adaptive trait. The findings suggest that plasticity to phase lags - common in nature where temperature often lags light - may be a key evolutionary adaptation. Addressing the mechanistic gaps would significantly increase the utility of these findings for understanding the molecular basis of circadian plasticity.

      Thank you for this thoughtful appraisal affirming our successful demonstration of co-evolution between emergence timing and circadian activity plasticity.

      Reviewer #3 (Public review):

      Summary:

      This study attempts to mimic in the laboratory changing seasonal phase relationships between light and temperature and determine their effects on Drosophila circadian locomotor behavior and on the underlying splicing patterns of a canonical clock gene, timeless. The results are then extended to strains that have been selected over many years for early or late circadian phase phenotypes.

      Strengths:

      A lot of work, and some results showing that the phasing of behavioural and molecular phenotypes is slightly altered in the predicted directions in the selected strains.

      We thank the reviewer for acknowledging the substantial experimental effort across 7 environmental regimes (6 LDTC phase relationships + LDTC in-phase), 12 replicate populations (early<sub>1-4</sub>, control<sub>1-4</sub>, late<sub>1-4</sub>), and comprehensive behavioural + molecular phenotyping.

      Weaknesses:

      The experimental conditions are extremely artificial, with immediate light and temperature transitions compared to the gradual changes observed in nature. Studies in the wild have shown how the laboratory reveals artifacts that are not observed in nature. The behavioural and molecular effects are very small, and some of the graphs and second-order analyses of the main effects appear contradictory. Consequently, the Discussion is very speculative as it is based on such small laboratory effects.

      We thank the reviewer for these important points regarding ecological validity, effect sizes, and interpretation scope.

      (1) Behavioural effects are robust across population replicates in selection lines (not small/weak)

      Our study assayed 12 populations total (4 replicate populations each of early, control, and late selection lines) under 7 LDTC regimes. Critically, selection effects were consistent across all 4 replicate populations within each selection line for every condition tested. In these randomly mating large populations, the mixed model ANOVA reveals highly significant selection×regime interactions [F(5,45)=4.1, p=0.003; Fig 3E, Table S2], demonstrating strong, replicated evolutionary divergence in evening temperature sensitivity.

      (2) Molecular effects test critical evolutionary hypothesis

      As stated in our Introduction, "selection can shape circadian gene splicing and temperature responsiveness" (Low et al., 2008, 2012). Our laboratory-selected chronotype populations—known to exhibit evolved temperature responsiveness (Abhilash et al., 2019, 2020; Nikhil et al., 2014; Vaze et al., 2012)—provide an apt system to test whether selection for temporal niche leads to divergence in timeless splicing. With ~600 heads per environmental regime per selection line, we detect statistically robust, selection line-specific temporal profiles [early4 advanced timeless phase (Fig 4A4); late4 prolonged tim-cold (Fig 5A4); significant regime×selection×time interactions (Tables S3-S5)], providing initial robust evidence of correlated molecular evolution under novel LDTC regimes.

      (3) Systematic design fills critical field gap

      Artificial conditions like LD/DD have been useful in revealing fundamental zeitgeber principles. Our systematic 2-12hr TC-LD lags directly implement the design validated by Pittendrigh & Bruce (1959) and Oda & Friesen (2011), who discuss how such experimental designs can provide a more comprehensive understanding of zeitgeber integration compared to studies with only one phase jump between two zeitgebers.

      (4) Ramping regimes as essential next step

      Gradual ramping regimes better mimic nature and represent critical future experiments. New Discussion addition in the revised version: "Ramping LDTC regimes can test whether selection-specific zeitgeber hierarchy persists under naturalistic gradients." While ramping experiments are essential, we would like to emphasize that we aimed to use this experimental design as a tool to test if evening activity exhibits greater temperature sensitivity and if this property of the circadian system can undergo correlated evolution upon selection for timing of eclosion/emergence.

      (5) New startle quantification addresses masking

      Our startle quantification (which will be added as a new supplementary figure) confirms circadian evening tracking persists despite quantified, selection-independent masking in most of the regimes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Hao Jiang et al. described a systematic approach to identify proline hydroxylation proteins. The authors implemented a proteomic strategy with HILIC-chromatographic separation and reported an identification of 4993 sites from HEK293 cells (4 replicates) and 3247 sites from RCC4 cells (3 replicates) with 1412 sites overlapping between the two cell lines. From the analysis, the authors identified 225 sites and 184 sites respectively from 293 and RCC4 cells with HyPro diagnostic ion. The identifications were validated by analyzing a few synthetic peptides, with a specific focus on Repo-man (CDCA2) through comparing MS/MS spectra, retention time, and diagnostic ions. With SILAC analysis and recombinant enzyme assay, the study showed that Repo-man HyPro604 is a target of the PHD1 enzyme.

      Strengths:

      The study involved extensive LC-MS analysis and was carefully implemented. The identification of over 4000 confident proline hydroxylation sites would be a valuable resource for the community. The characterization of Repo-man proline hydroxylation is a novel finding.

      Weaknesses:

      However, as a study mainly focused on methodology, the findings from the experimental data did not convincingly demonstrate the sensitivity and specificity of the workflow for site-specific identification of proline hydroxylation in global studies.

      Proline hydroxylation is an enzymatic post translational protein modification, catalysed by prolyl hydroxylases (PHDs), which can have profound biological significance, e.g. altering protein half-life and/or the stability of protein-protein interactions. Furthermore, there has been controversy in the field as to the true number of protein targets for PHDs in cells. Thus, there is a clear need for methods to enable the robust identification of genuine PHD targets and to reliably map sites of PHD-catalysed proline hydroxylation in proteins. We believe, therefore, that our methodology, as reported here in Jiang et al., is an important contribution towards this goal. We note that our methodology has already been used successfully by others (https://doi.org/10.1016/j.mcpro.2025.100969). While further improvements in this methodology may of course be developed in future, we are not currently aware of any superior methods that have been reported previously in the literature. The criticism made by the reviewer notably does not include reference to any such alternative published methodology that interested researchers can use which would offer superior results to the approach we document in this study.

      Major concerns:

      (1) The study applied HILIC-based chromatographic separation with a goal of enriching and separating hydroxyproline-containing peptides. However, as the authors mentioned, such an approach is not specific to proline hydroxylation. In addition, many other chromatography techniques can achieve deep proteome fractionation such as high pH reverse phase fractionation, strong-cation exchange etc. There was no data in this study to demonstrate that the strategy offered improved coverage of proline hydroxylation proteins, as the identifications of the HyPro sites could be achieved through deep fractionation and a highly sensitive LCMS setup. The data of Figure 2A and S1A were somewhat confusing without a clear explanation of the heat map representations. 

      The data we present in this study demonstrate clearly that peptides with hydroxylated prolines are enriched in specific HILIC fractions (F10-F18), in comparison with total unfractionated peptides derived from cell extracts. We also refer the reviewer to our previously published study by Bensaddek et al. (International Journal of Mass Spectrometry: doi:10.1016/j.ijms.2015.07.029), which was reference 41 in this study, in which we compared directly the performance of both HILIC and strong anionic exchange chromatography (hSAX). This showed that HILIC provided superior enrichment to hSAX for peptides containing hydroxylated proline residues. To clarify this point for readers, we have now included a specific reference to our previous study at the start of the Results section in our current revision. Currently, we use HILIC to provide a degree of enrichment for proline hydroxylated peptides because we are not aware of alternative chromatographic methods that in our hands provide better results.

      We have included descriptions of the information shown in the heatmaps in the associated figure legends and captions.

      (2) The study reported that the HyPro immonium ion is a diagnostic ion for HyPro identification. However, the data showed that only around 5% of the identifications had such a diagnostic ion. In comparison, acetyl-lysine immonium ion was previously reported to be a useful marker for acetyllysine peptides (PMID: 18338905), and the strategy offered a sensitivity of 70% with a specificity of 98%. In this study, the sensitivity of HyPro immonium ion was quite low. The authors also clearly demonstrated that the presence of immonium ion varied significantly due to MS settings, peptide sequence, and abundance. With further complications from L/I immonium ions, it became very challenging to implement this strategy in a global LC-MS analysis to either validate or invalidate HyPro identifications.

      The reviewer appears to have misunderstood the point we make with regard to the identification of the immonium ion and its use as a diagnostic marker for proline hydroxylation in MS analyses. We do not claim that this immonium ion is an essential diagnostic marker for proline hydroxylation. As the reviewer notes, with respect to the acetyl-lysine modification, the corresponding immonium ion is often used in MS studies as a diagnostic for identification of specific post translational modifications. Previous studies have reported that the immonium ion for hydroxylated proline is detected when the transcription factor HIF is analysed, but is often absent with other putative PHD targets, which has been used as an argument that these targets are not genuine proline hydroxylation sites. We are not, therefore, introducing the idea in this study that the hydroxy-proline immonium ion is a required diagnostic marker for proline hydroxylation, but instead demonstrating that detection of this ion, at least in some peptide sequences, may require the use of higher MS collision energies than are typically required for routine peptide identification. We believe that this is an interesting observation that can help to clear up discussions in the literature regarding the true prevalence of PHD-catalysed proline hydroxylation in different target proteins. Our data suggest that, in future MS studies analysing suspected PHD target proteins, two different collision energies might need to be used, i.e., normal collision energy for the routine identification of a peptide, combined with use of a higher collision energy if the hydroxy-proline immonium ion was not already detected.
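
      To illustrate why the Leu/Ile immonium ions raised by the reviewer complicate this readout, the sketch below derives the theoretical m/z of the hydroxyproline immonium ion (C4H8NO+) and the Leu/Ile immonium ion (C5H12N+) from standard monoisotopic masses and classifies an observed fragment peak by ppm error. The values are computed here rather than taken from the manuscript, and the 20 ppm tolerance and function names are illustrative assumptions.

```python
# Monoisotopic atomic masses (u) and the electron mass (u).
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON = 0.00054858

def cation_mz(formula):
    """m/z of a singly charged cation with the given elemental composition."""
    return sum(MASS[el] * n for el, n in formula.items()) - ELECTRON

HYPRO_IMMONIUM = cation_mz({"C": 4, "H": 8, "N": 1, "O": 1})   # about 86.0600
LEU_ILE_IMMONIUM = cation_mz({"C": 5, "H": 12, "N": 1})        # about 86.0964

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

def classify_peak(mz, tol_ppm=20.0):
    """Label a fragment peak near m/z 86 (tolerance is an assumed value)."""
    if abs(ppm_error(mz, HYPRO_IMMONIUM)) <= tol_ppm:
        return "HyPro immonium"
    if abs(ppm_error(mz, LEU_ILE_IMMONIUM)) <= tol_ppm:
        return "Leu/Ile immonium"
    return "other"
```

      The two ions differ by roughly 0.036 m/z units, so they share a nominal mass and can only be separated when fragment spectra are acquired with sufficient mass accuracy and resolution.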

      (3) The study aimed to apply the HILIC-based proteomics workflow to identify HyPro proteins regulated by the PHD enzyme. However, the quantification strategy was not rigorous. The study just considered the HyPro proteins not identified by FG-4592 treatment as potential PHD targeted proteins. There are a few issues. First, such an analysis was not quantitative without reproducibility or statistical analysis. Second, it did not take into consideration that data-dependent LC-MS analysis was not comprehensive and some peptide ions may not be identified due to background interferences. Lastly, FG-4592 treatment for 24 hrs could lead to wide changes in gene expressions and protein abundances. Therefore, it is not informative to draw conclusions based on the data for bioinformatic analysis.

      We refer the reviewer to the data we present in this study using SILAC analysis, combined with our MS workflow, to achieve a more accurate quantitative picture of proline hydroxylation levels. While we agree that the point the reviewer makes is valid, regarding our data dependent LC-MS/MS analysis potentially not being comprehensive, this means, however, that we are potentially underestimating the true prevalence of proline hydroxylated peptides, not overestimating the level of these modified peptides. We also refer the reviewer to the accompanying study by Druker et al. (eLife 2025; doi.org/10.7554/eLife.108131.1) in which we present a detailed follow-on study demonstrating the functional significance of the novel proline hydroxylation site we detected in the protein RepoMan (CDCA2). Therefore, even if we have not achieved a fully comprehensive analysis of all proline hydroxylated peptides catalysed by PHD enzymes, we believe that we have advanced the field by documenting a workflow that is able to identify and validate novel PHD targets.

      (4) The authors performed an in vitro PHD1 enzyme assay to validate that Repo-man can be hydroxylated by PHD1. However, Figure 9 did not show quantitatively PHD1-induced increase in Repo-man HyPro abundance and it is difficult to assess its reaction efficiency to compare with HIF1a HyPro.

      The analysis shown in Figure 9 was not intended to quantify the efficiency of in vitro hydroxylation of RepoMan by PHD1, but rather to answer the question, ‘Can recombinant PHD1 alone hydroxylate P604 on RepoMan in vitro, yes or no?’. The data show that the answer here is ‘yes’. Clearly, the HIF peptide is a more efficient substrate in vitro for recombinant PHD1 than the RepoMan peptide and we have now included a statement in the Discussion that addresses the significance of this observation more directly.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Jiang et al. developed a robust workflow for identifying proline hydroxylation sites in proteins. They identified proline hydroxylation sites in HEK293 and RCC4 cells. The authors found that the more hydrophilic HILIC fractions were enriched in peptides containing hydroxylated proline residues. These peptides showed differences in charge and mass distribution compared to unmodified or oxidized peptides. The intensity of the diagnostic hydroxyproline immonium ion depended on parameters including MS collision energy, parent peptide concentration, and the sequence of amino acids adjacent to the modified proline residue. Additionally, they demonstrate that a combination of retention time in LC and optimized MS parameter settings reliably identifies proline hydroxylation sites in peptides, even when multiple proline residues are present.

      Strengths:

      Overall, the manuscript presents an advanced, standardized protocol for identifying proline hydroxylation. The experiments were well designed, and the developed protocol is straightforward, which may help resolve confusion in the field.

      Weaknesses:

      (1) The authors should provide a summary of the standard protocol for identifying proline hydroxylation sites in proteins that can easily be followed by others.

      This is a good suggestion and we have now included a figure (Figure 10) with a summary of our workflow in the current revision.

      (2) Cockman et al. proposed that HIF-α is the only physiologically relevant target for PHDs. Their approach is considered the gold standard for identifying PHD targets. Therefore, the authors should discuss the major progress they made in this manuscript that challenges Cockman's conclusion.

      While we had mentioned the Cockman et al. paper in the Introduction, we had not focussed on this somewhat controversial issue. However, in response to the Reviewer’s request, we have now added a comment in the Discussion section of the current revision on how our new data address the proposal discussed previously by Cockman et al. In brief, we believe that our findings are not consistent with a model in which PHDs have no protein targets other than HIFs.

      Reviewer #3 (Public review): 

      Summary:

      The authors present a new method for detecting and identifying proline hydroxylation sites within the proteome. This tool utilizes traditional LC-MS technology with optimized parameters, combined with HILIC-based separation techniques. The authors show that they pick up known hydroxy-proline sites and also validate a new site discovered through their pipeline.

      Strengths:

      The manuscript utilizes state-of-the-art mass spectrometric techniques with optimized collision parameters to ensure proper detection of the immonium ions, which is an advance compared to earlier, similar approaches. The use of synthetic control peptides in the HILIC separation step clearly demonstrates the ability of the method to reliably distinguish hydroxy-proline from oxidized methionine-containing peptides. Using this method, they identify a site on CDCA2, which they go on to validate in vitro and also study its role in the regulation of mitotic progression in an associated manuscript.

      Weaknesses:

      Despite the authors' claim about the specificity of this method in picking up the intended peptides, there is a good number of potential false positives that also get picked up (owing to the limitations of MS-based readout), and the authors' criteria for downstream filtering of such peptides require further clarification. In the same vein, a broader and more diverse cell-based validation approach would help to substantiate the claims regarding enrichment of peptides in the described pathway analyses.

      We of course agree that false positives may arise, as is true for essentially all PTM studies. There are two issues here: first, are the identified sites technically correct (i.e. not misidentifications from the MS data), and second, are the identified modifications of biological significance? We have addressed this using the popular MaxQuant software suite to evaluate the modifications identified and to control the false discovery rate (FDR) at both the precursor and protein level, as described in the manuscript. We are aware that false positives could arise from confusing oxidation of methionine with hydroxylation of proline. Therefore, to address the issue as to whether we could identify bona fide PHD protein targets outside of the HIF family, we adopted a conservative approach by simply filtering out peptides where there was a methionine residue within three amino acids of the predicted proline hydroxylation site. This was a pragmatic decision made to reduce the likelihood of false positives in our dataset and we recognise that this likely results in us overlooking some genuine proline hydroxylation sites that occur near methionine residues. To address the potential biological relevance of the proline hydroxylation sites identified, we analysed extracts from cells treated with FG inhibitors. Of course, a detailed understanding of biological significance relies upon follow-on experimental analyses for each site, which we have performed for P604 on RepoMan in the accompanying study by Druker et al. (eLife 2025; doi.org/10.7554/eLife.108131.1).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The finding that the immonium ion intensities of L/I did not increase with increasing collision energy was surprising. Was this specific to this synthetic peptide?

      We agree this is an interesting and unexpected finding. We have no reason to believe that it is specific to synthetic peptides per se, but rather think this reflects an effect of amino acid composition in the peptides analysed. It will be interesting to explore this phenomenon in more detail in future.

      (2) The sequence logos in Figure 4 seemed to lack any amino acid enrichment in most positions except for collagen peptides. Have these findings been tested with statistical analysis?

      The results we show for sequence logo analysis were generated using WebLogo (10.1101/gr.849004) and correspond to an analysis of all proline hydroxylated peptides we detected across all cell lines and replicates analysed. The fact that collagens are highly abundant proteins with very high levels of proline hydroxylation likely explains why collagen peptides dominated the outcome of the sequence logo analysis. There is clearly scope for more detailed follow-up analysis in future of the sequence specificity of proline hydroxylation sites in non-collagen proteins that are validated PHD targets.

      (3) Overall figure quality was not ideal. The resolution and font sizes of figures should be carefully evaluated and adjusted. The figure legend should contain a title for the figure. Annotations of the figures were somewhat confusing. 

      We agree with the criticism of the figure resolution in the review copies - the lower resolution appears to have been generated after we had uploaded higher-resolution original images. We are again providing higher-resolution versions of all figures for the current revision.

      Reviewer #3 (Recommendations for the authors):

      Certain concerns regarding portions of the manuscript that need addressing:

      (1) "These data show that two different cell lines show unique profiles of proteins with hydroxylated peptides." - It is difficult to make this statement conclusively after profiling the prolyl hydroxy proteome from just two cell lines, especially since the amino acids with the highest frequency in the most enriched peptides are similar in both cell lines.

      We agree with this point and have changed the current revision to state instead, “This shows that each of the two cell lines analysed have distinct profiles.”

      (2) "We noted that there was a high frequency of a methionine residues appearing either at the first, second, or even third positions after the HyPro site.." - According to the authors' claim, the advantage of their method was that they were able to overcome the limitation of older methods that couldn't separate methionine oxidation from proline hydroxylation. However, in this statement, they say that the high frequency of methionine residues may be because of the similar mass shift. These statements are contradictory. The authors should either tone down the claim or prove that those are indeed hydroxyproline sites. Is it possible that in the filtering step of excluding these high-frequency methionine-containing peptides, we are losing potential positive hits for hydroxy-proline sites? What is the authors' take on this?

      We respectfully do not agree that our “statements are contradictory” with respect to the potential confusion between identification of methionine oxidation and proline hydroxylation, but acknowledge that we have not explained this issue clearly enough. It is a fact that the similar mass shift resulting from proline hydroxylation and methionine oxidation is a technical challenge that can potentially lead to misidentifications in MS studies and that is what we state clearly in the manuscript. We have addressed this issue head-on experimentally in this study and show using synthetic peptides how detailed analysis of specific proline hydroxylation sites in target proteins can be distinguished from methionine oxidation, based upon differential chromatographic behaviour of peptides with either hydroxylated proline or oxidised methionine, as well as by detailed analysis of fragmentation spectra. However, in the case of our global analysis, as we were not able to perform synthetic peptide comparisons for every putative site identified, we took the pragmatic approach of filtering out examples of peptides where a methionine residue was present within three residues of a potential proline hydroxylation site. This was done simply to reduce the possibility of misidentification in the set of novel proline hydroxylated peptides identified and we accept that as a consequence we are likely filtering out peptides that include bona fide proline hydroxylation sites. We have clarified this point in the current revision and hope to be able to address this issue more comprehensively in future studies.
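
      For illustration, the proximity filter described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors’ actual pipeline: the function names, the example peptides, and the choice to check three residues on either side of the proline are all assumptions made here for clarity.

```python
def has_nearby_methionine(sequence: str, pro_index: int, window: int = 3) -> bool:
    """Return True if a methionine lies within `window` residues of the proline.

    `pro_index` is the 0-based position of the putative hydroxyproline.
    Checking both sides of the site is an assumption of this sketch.
    """
    if sequence[pro_index] != "P":
        raise ValueError("pro_index does not point at a proline")
    start = max(0, pro_index - window)
    end = min(len(sequence), pro_index + window + 1)
    return "M" in sequence[start:end]


def filter_candidates(candidates, window: int = 3):
    """Keep only (sequence, pro_index) pairs with no methionine near the site."""
    return [
        (seq, idx)
        for seq, idx in candidates
        if not has_nearby_methionine(seq, idx, window)
    ]


# Made-up peptides, for illustration only.
candidates = [
    ("AGPMLK", 2),    # M is 1 residue after P: removed
    ("AGPLLLMK", 2),  # M is 4 residues after P: kept
    ("MAAAPGGK", 4),  # M is 4 residues before P: kept
    ("LMAPGGK", 3),   # M is 2 residues before P: removed
]
kept = filter_candidates(candidates)
```

      A filter of this kind is deliberately conservative: it discards any genuine hydroxyproline site that happens to sit next to a methionine, which is the trade-off acknowledged in the response above.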

      (3) "Accordingly, a score cut-off of 40 for hydroxylated peptides and a localisation probability cut-off of more than 0.5 for hydroxylated peptides was performed." Could the authors shed more light and clarify what was the basis for this value of cut-off to be used in this filtering step? Is this sample dependent? What should be the criteria to determine this value?

      We used MaxQuant software (10.1016/j.cell.2006.09.026) for PTM analysis, in which a localization probability score of 0.75 and a score cut-off of 40 are commonly used thresholds to define high confidence. The reason that we used 0.5 at the first step was to investigate how likely it might be that the misassignment of delta m/z +16 Da (oxidation) on methionine would affect the identification of hydroxylation on proline. However, we note that in the final results set used for analysis, all putative proline hydroxylated peptides that had a methionine residue near to the hydroxylated proline were disregarded as a pragmatic step to reduce the probability of false identifications.
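
      As a rough illustration of how the two thresholds interact, the sketch below applies a score cut-off of 40 together with either the exploratory (0.5) or the commonly used (0.75) localization probability cut-off. The record type, field names, and values are hypothetical and do not correspond to MaxQuant column names or to the authors’ data; treating the score cut-off as inclusive is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCall:
    """One putative hydroxyproline call (field names are hypothetical)."""
    peptide: str
    score: float
    localization_probability: float


def passes(call: SiteCall, min_score: float = 40.0, min_loc_prob: float = 0.5) -> bool:
    # Localization must be strictly above the cut-off ("more than 0.5");
    # the inclusive score comparison is an assumption of this sketch.
    return call.score >= min_score and call.localization_probability > min_loc_prob


# Made-up values, for illustration only.
calls = [
    SiteCall("PEPTIDEA", score=85.0, localization_probability=0.98),
    SiteCall("PEPTIDEB", score=52.0, localization_probability=0.60),
    SiteCall("PEPTIDEC", score=35.0, localization_probability=0.90),
    SiteCall("PEPTIDED", score=70.0, localization_probability=0.50),
]

exploratory = [c.peptide for c in calls if passes(c, min_loc_prob=0.5)]
high_confidence = [c.peptide for c in calls if passes(c, min_loc_prob=0.75)]
```

      Comparing the two lists shows which calls are admitted only by the looser cut-off; those are the calls where site localization is least certain.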

      (4) The authors are requested to kindly make the HPLC and MS traces more legible and use high-resolution images, with clearly labeled values on the peaks. Kindly extract coordinates from the underlying data files to plot the curves if needed to make it clearer.

      We have reviewed the clarity of all images and figures in the current revision.

      (5) There seem to be no error bars in Figure 3, Figure 7E, and the panels of Figure 8 with bar graphs. Are those single-replicate data?

      These specific figures are from single replicate data.

      (6) For Figure 9C, the control with only PHD1 (no peptide) is missing. 

      The ‘no peptide control’ was not included in the figure because it is simply a blank lane and there is nothing to see.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study aimed to determine whether bacterial translation inhibitors affect mitochondria through the same mechanisms. Using mitoribosome profiling, the authors found that most antibiotics, except telithromycin, act similarly in both systems. These insights could help in the development of antibiotics with reduced mitochondrial toxicity.

      They also identified potential novel mitochondrial translation events, proposing new initiation sites for MT-ND1 and MT-ND5. These insights not only challenge existing annotations but also open new avenues for research on mitochondrial function.

      Strengths:

      Ribosome profiling is a state-of-the-art method for monitoring the translatome at very high resolution. Using mitoribosome profiling, the authors convincingly demonstrate that most of the analyzed antibiotics act in the same way on both bacterial and mitochondrial ribosomes, except for telithromycin. Additionally, the authors report possible alternative translation events, raising new questions about the mechanisms behind mitochondrial initiation and start codon recognition in mammals.

      Weaknesses:

      The main weaknesses of this study are:

      While the authors highlight an interesting difference in the inhibitory mechanism of telithromycin on bacterial and mitochondrial ribosomes, mechanistic explanations or hypotheses are lacking.

      We acknowledge that we were not able to present a clear explanation for potential mechanistic differences of telithromycin inhibition between mitochondrial and bacterial ribosomes. In future work, structural analyses in collaboration with experts will provide these insights.

      The assignment of alternative start codons in MT-ND1 and MT-ND5 is very interesting but does not seem to fully align with structural data.

      We appreciate the reviewer’s comment and consulted a cryo-EM expert to review our findings in the context of the available structural data. We downloaded the density map and reviewed the N-termini of MT-ND1 and MT-ND5. We only observed the density of the N-terminus of MT-ND1 at low confidence. At an RMSD of 2, we could not observe density for the side chains of Met and Pro, and there are gaps in the density for what is modeled as the main chain. The assignment of these residues may have been overlooked due to the expectation that they should be present in the peptide.

      For MT-ND5, we did observe some density that could be part of the main chain; however, it did not fill out until we reduced the stringency, and we did not observe density mapping to side chain residues. To summarize, we do not confidently see density for either the side chain or the main chain for either peptide.

      The newly proposed translation events in the ncRNAs are preliminary and should be further substantiated with additional evidence or interpreted with more caution.

      We agree with the reviewer that we did not provide conclusive evidence that our phased ribosome footprinting data on mitochondrial non-coding RNAs are proof of novel translation events. We do acknowledge this in the main text: “Due to both the short ORFs, minimal read coverage, and lack of a detectable peptide we could not determine if translation elongation occurred on the mitochondrial tRNAs. These sites may be unproductive mitoribosome binding events or simply from tRNAs partially digesting during MNase treatment.”

      Reviewer #2 (Public review):

      In this study, the authors set out to explore how antibiotics known to inhibit bacterial protein synthesis also affect mitoribosomes in HEK cells. They achieved this through mitoribosome profiling, where RNase I and MNase were used to generate mitoribosome-protected fragments, followed by sequencing to map the regions where translation arrest occurs. This profiling identified the codon-specific impact of antibiotics on mitochondrial translation.

      The study finds that most antibiotics tested inhibit mitochondrial translation similarly to their bacterial counterparts, except telithromycin, which exhibited distinct stalling patterns. Specifically, chloramphenicol and linezolid selectively inhibited translation when certain amino acids were in the penultimate position of the nascent peptide, which aligns with their known bacterial mechanism. Telithromycin stalls translation at an R/K-X-R/K motif in bacteria, and the study demonstrated a preference for arresting at an R/K/A-X-K motif in mitochondria. Additionally, alternative translation initiation sites were identified in MT-ND1 and MT-ND5, with non-canonical start codons. Overall, the paper presents a comprehensive analysis of antibiotics in the context of mitochondrial translation toxicity, and the identification of alternative translation initiation sites will provide valuable insights for researchers in the mitochondrial translation field.

      From my perspective as a structural biologist working on the human mitoribosome, I appreciate the use of mitoribosome profiling to explore off-target antibiotic effects and the discovery of alternative mitochondrial translation initiation sites. However, the description is somewhat limited by a focus on this single methodology. The authors could strengthen their discussion by incorporating structural approaches, which have contributed significantly to the field. For example, antibiotics such as paromomycin and linezolid have been modeled in the human mitoribosome (PMID: 25838379), while streptomycin has been resolved (10.7554/eLife.77460), and erythromycin was previously discussed (PMID: 24675956). The reason we can now describe off-target effects more meaningfully is due to the availability of fully modified human mitoribosome structures, including mitochondria-specific modifications and their roles in stabilizing the decoding center and binding ligands, mRNA, and tRNAs (10.1038/s41467-024-48163-x).

      These and other relevant studies should be acknowledged throughout the paper to provide additional context.

      We appreciate the work that has previously revealed how different antibiotics bind the mitochondrial ribosome. We have included these references in the manuscript to provide background and context for this work in relationship to the field.

      Reviewer #3 (Public review):

      Summary:

      Recently, the off-target activity of antibiotics on the human mitoribosome has received more attention in the mitochondrial field. Hafner et al. applied mitoribosome profiling to study the effect of antibiotics on protein translation in mitochondria, as there are similarities between the bacterial ribosome and the mitoribosome. The authors conclude that some antibiotics act on mitochondrial translation initiation by the same mechanism as in bacteria. On the other hand, the authors showed that chloramphenicol, linezolid and telithromycin trap mitochondrial translation in a context-dependent manner. More interestingly, during deep analysis of the 5' end of ORFs, the authors reported alternative start codons for the ND1 and ND5 proteins instead of the previously known ones. This is a novel finding in the field and it also provides another application of the technique for further study of mitochondrial translation.

      Strengths:

      This is the first study to apply the mitoribosome profiling method to analyze cells treated with multiple antibiotics.

      The mitoribosome profiling method had been optimized carefully and has been suggested to be a novel method to study translation events in mitochondria. The manuscript is constructive and written well.

      Weaknesses:

      This is a novel and interesting study; however, most of the conclusions come from mitoribosome profiling analysis. As a result, the manuscript lacks cellular biochemical data to provide more evidence and support the findings.

      We thank the reviewer for the positive assessment of our work. We agree that future biochemical and structural experiments will strengthen the conclusions we derive from the ribosome profiling.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In Fig. 1A, the quality of the Western blot for the sucrose gradient is suboptimal. I recommend enhancing the quality of the Western blot image and providing the sucrose gradient sedimentation patterns for both the mtSSU and mtLSU to confirm the accurate selection of the monosome fraction. Additionally, to correctly assign the A260 peaks to mitochondrial and cytosolic ribosomes, it would be helpful to include markers for both the cytoribosomal LSU and SSU, too. Furthermore, do the authors observe mitochondrial polysomes in their sucrose gradient? If so, were those fractions fully excluded from the analysis?

      We repeated our sucrose gradient and Western blotting with antibodies for the large and small subunits of the mitoribosome. We did not repeat western blotting for the cytoribosomes as the 40S, 60S, and 80S peaks are present in their canonical heights and locations on a sucrose gradient. Western blotting indicates that the large and small subunits of the mitoribosome are located in the fraction taken for mitoribo-seq. We do see trace amounts of mitoribosome in fractions past the 55S site. Those fractions were excluded from library preparation.

      The MNase footprints exhibited a bimodal distribution, which the authors suggest may indicate that "MNase-treatment may have captured two distinct conformations of the ribosome." It would be relevant to clarify whether an enzyme titration was performed, as excessive MNase could lead to ribosomal RNA degradation, potentially influencing the footprints.

      We did not perform a titration and instead based our concentration on the protocol from Rooijers et al., 2013. We included a statement of this and a reference to the concentration in the methods.

      Is there an explanation for why RNase I footprinting reveals a very high peak at the 5'-end of the MT-CYB transcript, whereas this is not observed with MNase footprinting?

      It is not clear. The intensity of peaks at the 5’ end of the transcripts varies. We do observe that the relative intensity of the 5’ peak is greater for RNase I footprinted samples than MNase-treated samples.

      I understand that throughout the manuscript, the authors use MT-CYB as an example to illustrate the effects of the antibiotics on mitochondrial translation. However, to strengthen the generality of the conclusions, it would be beneficial to provide the read distribution across the entire mitochondrial transcriptome, possibly in the supplementary material. Additionally, I suggest including the read distribution for MT-CYB in untreated cells to improve data interpretation and enhance the clarity of figures (e.g., Figs. 1B, 2B, 3B).

      As these experiments were generated across multiple mitoribo-seq experiments, each was done with its own control experiment. It would be inaccurate to show a single trace as representative of all experiments. Instead, we include Supplementary Figure 1, which shows the untreated MT-CYB trace for all control samples and indicates which treatment they pair with.

      It would be very valuable to label each individual data point in the read phasing shown in Fig. 1D with the corresponding transcripts. For improved data visualization, I suggest assigning distinct colors to each transcript.

      We are concerned that including the name of each gene in the main figure would be too difficult for the reader to accurately interpret. Instead, we have added a Supplementary Table with those values.

      How do the authors explain the significant peak (approx. 10,000 reads) at the 5' end of the transcript in the presence of tiamulin (Fig. 2B)? Does this peak correspond to the start codon, and how does it relate to the quantification reported in Fig. 2C?

      Yes, this represents the start codon. These reads are likely derived from the start codon as they are mapping to the 5’ end of the transcript. There are differences in sequencing depth depending on the experiment, so what is critical is the relative distribution of reads on the transcript rather than comparing absolute reads between experiments. MT-CYB has 54% of the reads at the start site, which is representative of what we see across all genes.

      Throughout the manuscript, I found the usage of the terms "5' end" and "start codon" somewhat confusing, as they appear to be used synonymously in some instances. For example, in Fig. 2C, the y-axis label states "ribosomes at start codon," while the figure caption mentions "...percentage of reads that map to the 5' end of mitochondrial transcripts." Given the size of the graphs, it is also challenging for the reader to determine whether the peaks correspond specifically to the start codon or if multiple peaks accumulate at the initial codons.

      We selected this language because two different types of analysis are being carried out. The ribosome profiling in Figures 2 and 3 was carried out with RNase I, which poorly maps the ribosomes at the start codon when we do the read length analysis in Figure 4. Ribosome footprints at the 5’ end may include ribosomes that are on the 2-4 codons following the start codon, so it would not be accurate to label those as “ribosomes at a start codon.” We have renamed the axis to “Ribosomes at 5’ end”. Wig files are available online for all mitoribosome profiling experiments. In this case, the assigned “P-site” is several codons after the start codon due to the offset applied and the minimal 5’ UTR. Thus, it is less important to see which codon the density is assigned to than to see the general distribution of the reads.
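
      To make the “ribosomes at 5’ end” quantity concrete, the sketch below computes the fraction of a transcript’s reads that fall in a 5’ window and shows that it does not depend on sequencing depth. It is only an illustration: the function name, the window of five codons (start codon plus up to four following codons), and the per-codon counts are assumptions and not values from the study.

```python
def fraction_at_five_prime_end(codon_counts, n_codons: int = 5) -> float:
    """Fraction of all reads on a transcript assigned to its first `n_codons` codons.

    `codon_counts` is a list of read counts per codon, ordered 5' to 3'.
    The default window size is an assumption of this sketch.
    """
    total = sum(codon_counts)
    if total == 0:
        return 0.0
    return sum(codon_counts[:n_codons]) / total


# Made-up per-codon read counts for one transcript at two sequencing depths.
shallow = [27, 0, 0, 0, 0, 10, 8, 5]
deep = [count * 200 for count in shallow]  # same distribution, 200x more reads

shallow_fraction = fraction_at_five_prime_end(shallow)
deep_fraction = fraction_at_five_prime_end(deep)
```

      Both libraries give the same fraction even though the absolute peak heights differ by two orders of magnitude, which is why the relative distribution of reads, rather than the absolute read count, is the quantity compared between experiments.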

      The authors state, "Cells treated with telithromycin did show a slight increase in MRPF abundance at the 5' end of MT-CYB" and "the cumulative distribution of MRPFs suggested that ribosome density was biased towards the 5' end of the gene for chloramphenicol and telithromycin, but not significantly for linezolid." Could this observation be linked to the presence of specific stalling motifs in that region? If so, it would be beneficial to display such motifs on the graphs of the read distribution across the transcriptome to substantiate the context-dependent inhibition.

      Thank you for this suggestion. For chloramphenicol and linezolid, alanine, serine, and threonine make up nearly 25% of the mitochondrial proteome. As such, there are numerous stall sites across the transcript. Given their identical stalling motifs, the difference between chloramphenicol and linezolid is due to sequence-specific differences. Potentially, this could be due to conditions such as the final concentration of antibiotic inside the mitochondria and the on/off rate of an inhibitor with the translating mitoribosome. Both may affect the kinetics of stalling and allow mitoribosomes to evade early stall sites.

      We have also included the sites of all A/K/R-X-K motifs located in the genome and the calculated fold change for each position. As a note, this includes sites that do not pass the minimum filter set by our analysis and we note this in the text.
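
      A motif scan of this kind can be sketched with a regular expression. The example below is a minimal illustration, assuming that X stands for any residue and that overlapping occurrences should all be reported; the function name and the test sequence are made up and are not taken from the study.

```python
import re

# A/K/R, then any residue, then K. The lookahead makes matches zero-width,
# so overlapping occurrences of the motif are all reported.
MOTIF = re.compile(r"(?=([AKR].K))")


def find_motif_sites(protein: str):
    """Return (0-based start index, matched tripeptide) for every A/K/R-X-K site."""
    return [(m.start(1), m.group(1)) for m in MOTIF.finditer(protein)]


# Made-up sequence, for illustration only.
sites = find_motif_sites("MAKRAKLKGK")
```

      Each reported position could then be paired with a fold change in footprint density between treated and untreated samples, keeping in mind that positions with low read coverage may not pass a minimum-count filter.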

      The comment raises an additional question: Does the increased density at the 5’ end derive from stalled mitoribosomes or queued mitoribosomes behind a stalling event? Recent work by Iwasaki’s group shows that mitoribosomes can form disomes and queue behind each other. However, we could not observe 30 aa periodicities behind stalling events that would be indicative of collided mitoribosomes.

      In Fig. 3E, the authors report an additional and very interesting observation that is not discussed. Linezolid treatment causes reduced ribosome occupancy when proline or glycine codons occupy the P-site, or when the amino acids have been incorporated into the polypeptide chain and occupy the -1 position. It is known that the translation of proline and glycine frequently leads to ribosome stalling due to the physicochemical properties of these amino acids. Has this effect of linezolid been reported in the bacterial translation system? Additionally, can the authors propose hypotheses for the mechanism behind this observation? A similar observation is noted for telithromycin when glycine occupies the same positions, as well as when aspartate occupies the P- and A-sites.

      In bacteria, linezolid does have an “anti-stalling” motif when glycine is present in the A-site. However, this is due to the size of the residue being compatible with antibiotic binding.

      The most likely cause of this effect is a redistribution of ribosome footprints. As the antibiotics introduce new arrest sites, ribosome density at other sites relatively decreases. This is likely an artifact from mitoribosomes redistributing from endogenously slow codons to new arrest sites. When carrying out our disome profiling in the presence of anisomycin, we see a similar effect: cytoribosomes are redistributed from endogenous stalling sites, such as proline, throughout the gene. As a result, translation at proline appears “more efficient” upon treatment with an inhibitor, but this is instead an artifact of analysis.
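
      The normalization effect described here can be illustrated with a toy calculation. In the sketch below, the absolute number of footprints at a proline site is unchanged by treatment, yet its relative occupancy halves because a new arrest site absorbs half of all footprints. All site names and counts are invented for illustration and do not represent measured data.

```python
import math


def relative_occupancy(counts: dict) -> dict:
    """Normalize raw footprint counts so that they sum to 1 across sites."""
    total = sum(counts.values())
    return {site: n / total for site, n in counts.items()}


# Made-up footprint counts, for illustration only.
untreated = {"proline_site": 30, "other_sites": 70}
treated = {"proline_site": 30, "other_sites": 70, "new_arrest_site": 100}

before = relative_occupancy(untreated)["proline_site"]
after = relative_occupancy(treated)["proline_site"]
log2_fold_change = math.log2(after / before)
```

      The log2 fold change at the proline site is negative even though nothing changed at that site, so an apparent drop in occupancy after normalization does not by itself show that the inhibitor relieves stalling there.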

      Figure 3F could benefit from indicating which mtDNA-encoded protein corresponds to each of the strongest stalling motifs.

      We have included a supplementary figure to highlight which mitochondrially-encoded genes contain the R/K/A-X-K motif and noted in the text that mitochondrial translation may be unevenly inhibited.

      The legend "increasing mRPF abundance" in Fig. 4C may be missing the corresponding colors.

      The legend applies to all sections of the figure. We double-checked the range of the colors in the tables, and they do match the legend.

      The observation that the start codons in MT-ND1 and MT-ND5 might differ from the annotated canonical ones is intriguing. While the ribosome profiling data appear clear, mass spectrometry (MS) analysis may be misleading. The absence of evidence does not necessarily imply evidence of absence. How does this proposed conclusion correlate with the structural data obtained from HEK cells? For instance, the cryo-EM structural model of a complex I-containing human supercomplex (PDB: 5XTD, PMID: 28844695) shows the presence of Pro2 in MT-ND1 and the full-length MT-ND5 protein. The authors should carefully examine structural data to ascertain whether alternative forms of MT-ND1 and MT-ND5 are actually observed in the assembled complex I.

      We really appreciate this comment. We sat down with an expert in cryo-EM and reviewed the figure. We downloaded the density map and reviewed the N-termini of MT-ND1 and MT-ND5. We only observed the density of the N-terminus of MT-ND1 at low confidence. At an RMSD of 2, we could not observe density for the side chains of Met and Pro, and there are gaps in the density for what is modeled as the main chain. The assignment of these residues may have been overlooked due to the expectation that they should be present in the peptide.

      For MT-ND5, we did observe some density that could be part of the main chain; however, it did not fill out until we reduced the stringency, and we did not observe density mapping to side chain residues. To summarize, we do not confidently see density for either the side chain or the main chain for either peptide.

      Given that ribosome profiling is based on the assumption that ribosomes protect mRNA fragments from RNase digestion, interpreting the data related to Fig. 5 and the proposed existence of translation events involving ncRNAs is challenging. Most importantly, tRNAs and rRNAs are highly folded RNA molecules and, by definition, are protected by ribosomal proteins. Simultaneously, as the authors point out, "These reads could either be products of random digestion of the abundant background of ncRNAs or be genuine MRPFs." RNase I preferentially digests single-stranded RNA (ssRNA), but excess enzyme can still lead to degradation. Consequently, many random tRNA/rRNA fragments may be generated by RNase digestion, potentially resulting in artifacts. I suggest that the authors examine what happens to these reads when mitochondrial translation is inhibited.

      We have low-quality mitoribo-seq with initiation inhibitors and MNase showing footprints of the same size. We do not have a small-molecule inhibitor that is able to completely ablate translation, as they instead stabilize mitoribosomes at different steps in translation. We have considered alternative ways of capturing a background rRNA and tRNA digestion pattern; however, these have their own drawbacks. Dissociation with EDTA prior to digestion or carrying out library prep on the small and large subunits may capture mitoribosomes no longer in the process of translation; however, dissociated subunits would have different surfaces now available for digestion and may not capture tRNAs.

      Regarding the statement, "While the ORF on MT-TS1 is longer, MRPF density was low and we did not observe read phasing and thus it is likely not translated (not shown)," the data should not be excluded unless a clear explanation is provided for why translation would not occur from this specific RNA.

      We have included this value in the graph as well as in Supplementary Figure 1.

      The graph in Fig. 5B shows the periodicity of only the putative RNR1 ORF, but not that of the other proposed ORFs. What is the reason for this?

      We have included the MT-TS1 putative ORF in Figure 5 and Figure S1. Other ORFs did not have density in the ORF. If these are real mitoribosome footprints at these start codons, it may be due to them being transient binding events that never result in elongation. Alternatively, they may be due to tRNA degradation during library preparation.
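
A minimal sketch of how read phasing (three-nucleotide periodicity) over a putative ORF can be summarized. This is an illustration only, not the pipeline used in the study: the function name `frame_fractions`, the use of already offset-corrected read positions, and the toy coordinates are all assumptions.

```python
from collections import Counter

def frame_fractions(read_positions, orf_start):
    """Return the fraction of reads falling in codon frames 0, 1, and 2.

    read_positions: offset-corrected footprint positions (hypothetical input).
    orf_start: coordinate of the first nucleotide of the putative start codon.
    """
    # Assign each read to a frame relative to the start codon.
    counts = Counter((pos - orf_start) % 3 for pos in read_positions)
    total = sum(counts.values())
    if total == 0:
        # No reads over the ORF: nothing to phase.
        return [0.0, 0.0, 0.0]
    return [counts[frame] / total for frame in range(3)]

# Toy positions, not real data: four of five reads sit in frame 0.
fractions = frame_fractions([100, 103, 106, 107, 109], orf_start=100)
```

A strong enrichment in one frame is what is usually read as evidence of elongating ribosomes, whereas a roughly even split across frames is more consistent with background digestion products.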

      The assumption that the UUG codon can serve as a start site for mitochondrial translation has not been substantiated. Recent data have identified translation initiation events from non-ATG/ATA codons (near-cognate and sub-cognate) using retapamulin, but UUG was not among them. Can the authors detect such events in their ribosome profiling data collected in the presence of retapamulin, tiamulin, or josamycin?

      The report of translation initiation at non-ATG/ATA codons strongly disagrees with our findings. We report that sites of translation initiation observed within annotated coding regions in mitochondria occur at the annotated start sites, while the other report finds frequent alternative initiation events. We have looked for those arrest sites and did not observe them.

      In the section "Mitoribosome profiling reveals novel translation events," the title may be misleading given the preliminary nature of the results. To support such a claim, the authors should provide experimental evidence demonstrating that the proposed translation events genuinely exist and result in the synthesis of previously unidentified polypeptides. Alternatively, the interpretation should be approached with greater caution and more clearly indicated as preliminary.

      We agree with the reviewers that a distinction should be made between reporting truly novel translation events, like the recently reported MT-ND5-dORF, and sites we suspect mitoribosomes may be binding and that require detailed follow-up. We altered the section title to suggest that this may be showing novel translation events. Additionally, we included a statement in the discussion that these MRPFs may be simply tRNA digestion by RNase I.

      Although located at the 5' end of RNR1, the newly identified ORF is situated 79 nt downstream. According to current knowledge, this appears to be a lengthened 5' UTR that may hinder mitoribosome loading. The authors should speculate on potential initiation mechanisms.

      The start of the putative ORF is not located 79 nt downstream, but at the 8th nucleotide. The reviewer may be including the tRNA-Phe in their calculation, which is cleaved from MT-RNR1. This start site is closer to the 5’ end than the one we found for MT-ND5.

      To enhance the interpretation of the mitoribosome profiling data, the authors could complement their findings with classical metabolic labeling using (35)S-methionine. This approach would allow for a different assessment of the stringency of the inhibition under the tested experimental conditions.

      We are currently working on these experiments using mito-funcats. One future direction for this work is to understand how the cell responds to different mechanisms of translation inhibition. For example, we are trying to understand if telithromycin, which appears highly selective, only partially inhibits translation of the mtDNA-encoded proteome.

      Reviewer #2 (Recommendations for the authors):

      Other small editorial comments:

      Line 24: "translate proteins"?

      Revised for clarity

      Line 24: The sentence describing mitochondrial translation as "closely resembling the one in prokaryotes" could be reformulated. While the core of the mitoribosome is conserved, the entire apparatus has many mitochondria-specific features.

      Since this is the abstract, we simplified the point by saying that mitoribosomes are more similar to prokaryotic than cytosolic ribosomes.

      Clarified to highlight that the mitochondrial system is more similar to the bacterial system than the eukaryotic system.

      Line 33: "novel" or "previously unrecognized" ?

      Rewritten for clarity.

      Lines 33-35: The claim made here is not shown in the paper.

      We removed the more aspirational goal of this paper and focused on the main findings of the paper.

      Lines 44, 47, 89 (and elsewhere): "cytoplasmic" or "cytosolic" ?

      Both terms are used in the literature. We opted for cytoplasmic as it can also include ribosomes not free in the cytosol, such as those bound to the ER.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should state why they chose these antibiotics for mitoribosome profiling analysis over other antibiotics from the same group. Did they screen multiple antibiotics to determine the candidates for the next step?

      We selected antibiotics that had a known stalling motif in bacteria (initiation or context-dependent elongation inhibitors). In addition, we carried out mitoribosome profiling with erythromycin, azithromycin, thiostrepton, and kanamycin in this work. However, we did not see any effect from these drugs in mitoribosome profiling. We are currently testing other inhibitors, such as doxycycline and tigecycline, and looking at optimizing treatment conditions to identify stalling motifs in samples that previously showed no difference.

      (2) What is the reason for choosing these concentrations of the antibiotics retapamulin, tiamulin, and josamycin? Are they at the IC50 value or above it? On the other hand, none of this information has been provided for the antibiotics in the next part. The authors should provide a biochemical study of the effect of these antibiotics on cell survival and/or protein translation, such as a 35S assay or steady-state levels of mtDNA-encoded proteins upon cell treatment with these antibiotics.

      Prior to mitoribo-seq, we carried out time and concentration assays with all antibiotics. 100 µg/ml and a 30-minute treatment was tolerable for all antibiotics except retapamulin. We aimed to treat cells with a relatively high concentration of inhibitor in order to capture actively translating mitoribosomes. We were concerned that longer treatments may lead to decreased translation initiation, leading to the capture of fewer mitoribosomes. These concentrations were nearly identical to contemporary conditions carried out in Bibel et al, RNA 2025.

      (3) Why did the authors choose MT-CYB as the representative for further analysis in the second and third parts of the manuscript?

      We chose MT-CYB because its length allowed for easy visualization. Some mitochondrial genes, such as MT-ND6, had a propensity for stronger stalling at initiation. While coverage was throughout the genes, it was difficult to visualize the changes within the ORF. Also, MT-CYB was less visually complex than polycistronic transcripts. All wigs were uploaded to GEO.

      (4) Page 11, lines 233-234: the authors state that telithromycin induces stalling at the R/K/A-X-K motif. The authors should further analyze which proteins encoded by the mitochondrial genome contain this motif. Furthermore, as in comment 2, the authors should confirm by 35S assay or WB which mtDNA-encoded proteins are affected.

      We have included a supplementary figure showing which mitochondrial genes contain these motifs.
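
A scan for such motifs can be sketched as below. This is a hedged illustration, not the analysis behind the supplementary figure: it assumes the R/K/A-X-K motif can be written as the regular expression `[RKA].K`, and the helper name `find_motifs` and the toy sequence are hypothetical.

```python
import re

# Assumed encoding of the R/K/A-X-K motif: R, K, or A, then any residue, then K.
# The lookahead lets overlapping occurrences be reported as separate hits.
MOTIF = re.compile(r"(?=([RKA].K))")

def find_motifs(protein_seq):
    """Return (0-based position, tripeptide) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein_seq.upper())]

# Toy sequence for illustration only; not a real mitochondrial protein.
hits = find_motifs("MARKKLAPKT")
```

Running the same function over each mtDNA-encoded protein sequence would give a per-gene list of candidate stalling sites that can be compared with the observed footprint peaks.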

      (5) The results and conclusion from the fourth paragraph are very interesting. The authors suggest an alternative start codon for two mtDNA-encoded proteins, ND1 and ND5, based on ribosome profiling analysis. Again, I have several comments on this part:

      (a) For the accumulation of the alternative start codon of ND1 and ND5 as suggested in the manuscript, do the authors observe this trend with the initiation inhibitors used in the second paragraph of the manuscript?

      We did not observe similar read lengths with retapamulin, tiamulin, or josamycin, which produced read lengths that were consistent with other RNase I footprinted samples.

      (b) This observation was further confirmed by MS with a peptide from the ND1 protein. The authors should show the MS peak indicating the MW of the peptide and the MS/MS data for the peptide that support this hypothesis.

      We are including the MS/MS report for this peptide.

      (c) Interestingly, several high-resolution structures of mammalian complex I have been reported so far (PMID: 7614227, 10396290, 38870289), ND1 and ND5 protein express full sequences with fMet at the distal N-terminal. This is different to the suggestion from the manuscript. Could the author discuss or comment on that?

      This point was brought up by another reviewer. We have carefully analyzed the density map of PMID: 28844695. We sat down with an expert in cryo-EM and reviewed the figure. We downloaded the density map and reviewed the N-termini of MT-ND1 and MT-ND5. We only observed the density of the N-terminus of MT-ND1 at low confidence. At an RMSD of 2, we could not observe density for the sidechains of Met and Pro, and there is a gap in density for what is modeled as the main chain. The assignment of these residues may have been overlooked due to the expectation that they should be present in the peptide.

      For MT-ND5, we did observe some density that could be part of the main chain; however, it did not fill out until we reduced the stringency, and we did not observe density mapping to side chain residues. To summarize, we do not confidently see density for either the side chain or the main chain for either peptide.

      Minor comments:

      The methods should be written more precisely so that other groups can easily repeat the experiments. For example:

      (1) The authors should indicate the exact HEK293 cell line used in this study.

      We have indicated the exact cell line.

      (2) Page 22, line 471: which (number) fractions had been collected. The Western Blot analysis shown in Figure 1A should be repeated with both proteins from small and large subunits.

      We have repeated the Western blot with antibodies for large and small subunits. We took fractions 8 and 9, which are now indicated in the text and figure.

      (3) Page 23, line 502: is the number of cells used for the MS experiment correct? Or is this the number of cells per mL?

      This is correct and is based on the kit protocol. It is not cells per mL. We have clarified the kit being used in the methods.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This work provides an important resource identifying 72 proteins as novel candidates for plasma membrane and/or cell wall damage repair in budding yeast, and describes the temporal coordination of exocytosis and endocytosis during the repair process. The data are convincing; however, additional experimental validation will better support the claim that repair proteins shuttle between the bud tip and the damage site.

      We thank the editors and reviewers for their positive assessment of our work and the constructive feedback to improve our manuscript. We agree with the assessment that additional validation of repair protein shuttling between the bud tip and the damage site is required to further support the model.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Yamazaki et al. conducted multiple microscopy-based GFP localization screens, from which they identified proteins that are associated with PM/cell wall damage stress response. Specifically, the authors identified that bud-localized TMD-containing proteins and endocytotic proteins are associated with PM damage stress. The authors further demonstrated that polarized exocytosis and CME are temporally coupled in response to PM damage, and CME is required for polarized exocytosis and the targeting of TMD-containing proteins to the damage site. From these results, the authors proposed a model that CME delivers TMD-containing repair proteins between the bud tip and the damage site.

      Strengths:

      Overall, this is a well-written manuscript, and the experiments are well-conducted. The authors identified many repair proteins and revealed the temporal coordination of different categories of repair proteins. Furthermore, the authors demonstrated that CME is required for targeting of repair proteins to the damage site, as well as cellular survival in response to stress related to PM/cell wall damage. Although the roles of CME and bud-localized proteins in damage repair are not completely new to the field, this work does have conceptual advances by identifying novel repair proteins and proposing the intriguing model that the repairing cargoes are shuttled between the bud tip and the damaged site through coupled exocytosis and endocytosis.

      Weaknesses:

      While the results presented in this manuscript are convincing, they might not be sufficient to support some of the authors' claims. Especially in the last two result sessions, the authors claimed CME delivers TMD-containing repair proteins from the bud tip to the damage site. The model is no doubt highly possible based on the data, but caveats still exist. For example, the repair proteins might not be transported from one localization to another localization, but are degraded and resynthesized. Although the Gal-induced expression system can further support the model to some extent, I think more direct verification (such as FLIP or photo-convertible fluorescence tags to distinguish between pre-existing and newly synthesized proteins) would significantly improve the strength of evidence.

      Major experiment suggestions:

      (1) The authors may want to provide more direct evidence for "protein shuttling" and for excluding the possibility that proteins at the bud are degraded and synthesized de novo near the damage site. For example, if the authors could use FLIP to bleach bud-localized fluorescent proteins, and the damaged site does not show fluorescent proteins upon laser damage, this will strongly support the authors' model. Alternatively, the authors could use photo-convertible tags (e.g., Dendra) to differentiate between preexisting repair proteins and newly synthesized proteins.

      We thank the reviewer for evaluating our work and giving us important feedback. We agree that the FLIP and photo-convertible experiments will further confirm our model. Here, due to time and resource constraints, we decided not to perform this experiment. Instead, we have discussed this limitation in lines 363-366. Our proposed model of repair protein shuttling should be further tested in our future work.

      (2) In line with point 1, the authors used Gal-inducible expression, which supported their model. However, the authors may need to show protein abundance in galactose, glucose, and upon PM damage. Western blot would be ideal to show the level of full-length proteins, or whole-cell fluorescence quantification can also roughly indicate the protein abundance. Otherwise, we cannot assume that the tagged proteins are only expressed when they are growing in galactose-containing media.

      Thank you very much for raising the concern and suggesting the important experiments. We agree that the Western blot experiment to confirm the mNG-Snc1 expression in each medium will further strengthen our conclusion. Along with point (1), further investigation of repair protein shuttling between the bud tip and the damage site and the mechanisms underlying it will be an important future direction. As described above, we have discussed this limitation in lines 363-366.

      (3) Similarly, for Myo2 and Exo70 localization in CME mutants (Figure 4), it might be worth doing a western or whole-cell fluorescence quantification to exclude the caveat that CME deficiency might affect protein abundance or synthesis.

      We thank the reviewer for suggesting the point. Following the reviewer’s suggestion, we quantified the whole-cell fluorescence of WT and CME mutants and verified that the effect of the CME deletion on the expression levels of Myo2-sfGFP and Exo70-mNG is minimal (Figure S6). We added the description in lines 211-212.

      (4) From the authors' model in Figure 7, it looks like the repair proteins contribute to bud growth. Does laser damage to the mother cell prevent bud growth due to the reduction of TMD-containing repair proteins at the bud? If the authors could provide evidence for that, it would further support the model.

      Thank you very much for raising the important point. We speculate that the reduction of TMD-containing proteins at the bud by CME is one of the causes of cell growth arrest after PM damage (1). This is because TMD-containing repair proteins at the bud tip, including phospholipid flippases (Dnf1/Dnf2), Snc1, and Dfg5, are involved in polarized cell growth (2-4). This will be an important future direction as well.

      (5) Is the PM repair cell-cycle-dependent? For example, would the recruitment of repair proteins to the damage site be impaired when the cells are under alpha-factor arrest?

      Thank you for raising this interesting point. Indeed, the senior author Kono previously performed this experiment when she was in David Pellman’s lab. The preliminary results suggest that Pkc1 can be targeted to the damage site, without any impairment, under alpha-factor arrest. A more comprehensive analysis in the future will contribute to concluding the relation between PM repair and the cell cycle.

      Reviewer #2 (Public review):

      This paper reports the identification of plasma membrane repair proteins, revealing spatiotemporal cellular responses to plasma membrane damage. The study highlights a combination of sodium dodecyl sulfate (SDS) and laser damage for identifying and characterizing proteins involved in plasma membrane (PM) repair in Saccharomyces cerevisiae. Of the 80 PM repair proteins identified, 72 were novel. The use of both proteomic and microscopy approaches revealed the spatiotemporal coordination of exocytosis and clathrin-mediated endocytosis (CME) during repair. Interestingly, the authors were able to demonstrate that exocytosis dominates early and CME later, with CME also playing an essential role in trafficking transmembrane-domain (TMD)-containing repair proteins between the bud tip and the damage site.

      Weaknesses/limitations:

      (1) Why are the authors saying that Pkc1 is the best characterized repair protein? What is the evidence?

      We would like to thank the reviewer for taking his/her time to evaluate our work and for valuable suggestions. We described Pkc1 as “best characterized” because it was the first protein reported to accumulate at the laser damage site in budding yeast (5). However, as the reviewer suggested, we do not have enough evidence to describe Pkc1 as “best characterized”. We therefore used “one of the known repair proteins” to mention Pkc1 in the manuscript (lines 90-91).

      (2) It is unclear why the authors decided to continue with the laser damage assay using exclusively the C-terminal GFP-tagged library. Potentially, this could have missed N-terminal tag-dependent localizations and functions and may have excluded functionally important repair proteins.

      Thank you very much for the comments. We decided to use the C-terminal GFP-tagged library for the laser damage assay because we intended to evaluate the proteins at endogenous expression levels. The N-terminal sfGFP-tagged library is expressed by the NOP1 promoter, while the C-terminal GFP-tagged library is expressed by the endogenous promoters. We clarified these points in lines 114-118. We agree with the reviewer that we may have missed some portion of repair proteins with N-terminal-dependent localization and functions by this approach. Therefore, in our manuscript, we discussed these limitations in lines 281-289.

      (3) The use of SDS and laser damage may bias toward proteins responsive to these specific stresses, potentially missing proteins involved in other forms of plasma membrane injuries (such as mechanical, osmotic, etc.). SDS stress is known to indirectly induce oxidative stress and heat-shock responses.

      Thank you very much for raising this point. We agree that the combination of SDS and laser may be biased to identify PM repair proteins. Therefore, in the manuscript, we discussed this point as a limitation of this work in lines 292-298.

      (4) It is unclear what the scale bars of Figures 3, 5, and 6 are. These should be included in the figure legend.

      We apologize for the missing scale bars. We added them to the legends of the figures in the manuscript.

      (5) Figure 4 should be organized to compare WT vs. mutant, which would emphasize the magnitude of impairment.

      Thank you for raising this point. Following the suggestion, we updated Figure 4 to compare WT vs. mutant and clarified this in the figure legends in the manuscript.

      (6) It would be interesting to expand on possible mechanisms for CME-mediated sorting and retargeting of TMD proteins, including a speculative model.

      Thank you very much for this important suggestion. We think it will be very important to characterize the mechanism of CME-mediated TMD protein trafficking between the bud tip and the damage site. In the manuscript, we discussed the possible mechanism for CME activation at the damage site in lines 328-333. We speculate that the activation of the CME may facilitate the retargeting of the TMD proteins from the damage site to the bud tip.

      We do not have a model of how CME is activated at the bud tip to sort and target the TMD proteins to the damage site. One possibility is that the cell cycle arrest after PM damage (1) may affect the localization of CME proteins because the cell cycle affects the localization of some of the CME proteins (6). We will work on the mechanism of repair protein sorting from the bud tip to the damage site in our future work.

      Reviewer #3 (Public review):

      Summary:

      This work aims to understand how cells repair damage to the plasma membrane (PM). This is important, as failure to do so will result in cell lysis and death. Therefore, this is an important fundamental question with broad implications for all eukaryotic cells. Despite this importance, there are relatively few proteins known to contribute to this repair process. This study expands the number of experimentally validated PM repair proteins from 8 to 80. Further, they use precise laser-induced damage of the PM/cell wall and use live-cell imaging to track the recruitment of repair proteins to these damage sites. They focus on repair proteins that are involved in either exocytosis or clathrin-mediated endocytosis (CME) to understand how these membrane remodeling processes contribute to PM repair. Through these experiments, they find that while exocytosis and CME both occur at the sites of PM damage, exocytosis predominates in the early stages of repairs, while CME predominates in the later stages of repairs. Lastly, they propose that CME is responsible for diverting repair proteins localized to the growing bud cell to the site of PM damage.

      Strengths:

      The manuscript is very well written, and the experiments presented flow logically. The use of laser-induced damage and live-cell imaging to validate the proteome-wide screen using SDS-induced damage strengthens the role of the identified candidates in PM/cell wall repair.

      Weaknesses:

      (1) Could the authors estimate the fraction of their candidates that are associated with cell wall repair versus plasma membrane repair? Understanding how many of these proteins may be associated with the repair of the cell wall or PM may be useful for thinking about how these results are relevant to systems that do or do not have a cell wall. Perhaps this is already in their GO analysis, but I don't see it mentioned in the manuscript.

      We would like to thank the reviewer for taking his/her time to evaluate our work and for the valuable suggestions. We agree that this is important information to include. Although it may be difficult to completely distinguish the PM repair and cell wall repair proteins, we have identified at least six proteins involved in cell wall synthesis (Flc1, Dfg5, Smi1, Skg1, Tos7, and Chs3). We included this information in lines 142-146 in the manuscript.

      (2) Do the authors identify actin cable-associated proteins or formin regulators associated with sites of PM damage? Prior work from the senior author (reference 26) shows that the formin Bnr1 relocalizes to sites of PM damage, so it would be interesting if Bnr1 and its regulators (e.g., Bud14, Smy1, etc) are recruited to these sites as well. These may play a role in directing PM repair proteins (see more below).

      Thank you for the suggestion. We identified several Bnr1-interacting proteins, including Bud6, Bil1, and Smy1 (Table S2), although Bnr1 itself was not identified in our screening. This could be attributed to the fact that (1) C-terminal GFP fusion impaired the function of Bnr1, and (2) a single GFP fusion is not sufficient to visualize the weak signal at the damage site. Indeed, in reference 26, 3GFP-Bnr1 (N-terminal 3xGFP fusion) was used.

      (3) Do the authors suspect that actin cables play a role in the relocalization of material from the bud tip to PM damage sites? They mention that TMD proteins are secretory vesicle cargo (lines 134-143) and that Myo2 localizes to damage sites. Together, this suggests a possible role for cable-based transport of repair proteins. While this may be the focus of future work, some additional discussion of the role of cables would strengthen their proposed mechanism (steps 3 and 4 in Figure 7).

      Thank you very much for the suggestion. We agree that actin cables may play a role in the targeting of vesicles and repair proteins to the damage site. Following the reviewer’s suggestion, we discussed the roles of Bnr1 and actin cables for repair protein trafficking in lines 309-313 in the manuscript.

      (4) Lines 248-249: I find the rationale for using an inducible Gal promoter here unclear. Some clarification is needed.

      Thank you for raising this point. We clarified this as much as we could in lines 249-255 in the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The N-terminal GFP collection screen is interesting but seems irrelevant to the rest of the results. The authors discussed that in the discussion part, but it might be worth showing how many hits from the laser damage screen (in Figure 2) overlap with the N-terminal GFP screen hits.

      Thank you for the suggestion. We found that 48 out of 80 repair proteins are hits in the N-terminal GFP library (Table S1 and S2). This result suggested that the N-terminal library is also a useful resource for identifying repair proteins. In the manuscript, we discussed it in lines 288-289.

      (2) SDS treatment seems a harsh stressor. As the authors mentioned, the overlapped hits from the N- and C-terminal GFP screen might be more general stress factors. Thus, I think Line 84 (the subtitle) might be overclaiming, and the authors might need to tone down the sentence.

      Thank you for the suggestion. Following the reviewer’s suggestion, we changed the sentence to “Proteome-scale identification of SDS-responsive proteins” in the manuscript. We believe that the new sentence describes our findings more precisely.

      (3) Line 103-106, it does not seem obvious to me that the protein puncta in the cytoplasm are due to endocytosis. The authors might need to provide more experimental evidence for the conclusion, or at least provide more reasoning/references on that aspect (e.g., several specific protein hits belonging to that group have been shown to be endocytosed).

      Thank you very much for raising this point. We agree with the reviewer and deleted the description that these puncta are due to endocytosis in the manuscript.

      (4) For Figure 1D and S1C, the authors annotated some of the localization changes clearly, but some are confusing to me. For example, "from bud tip/neck" to where? And from where to "Puncta/foci"? A clearer annotation might help the readers to understand the categorization.

      Thank you very much for the suggestion. These annotations were defined because it is difficult to conclusively describe the protein localization after SDS treatment. To convincingly identify the destination of the GFP fusion proteins, dual-color imaging of proteins with organelle markers or deep learning-based localization estimation is required. We feel that this might be out of the scope of this work. Therefore, as criteria, we used the protein localization in normal/non-stressed conditions reported in (7) and the Saccharomyces Genome Database (SGD). We clarified this annotation definition in the manuscript (lines 413-436).

      (5) For localization in Figure 2C, as I understand, does it refer to the "before damage/normal" localization? If so, I think it would be helpful to state that these localizations are based on the untreated/normal conditions in the text.

      Yes, it refers to the “before damage/normal localization”. Following the reviewer’s suggestion, we stated that these localizations are based on these conditions in the manuscript (line 130).

      (6) The authors mentioned "four classes" in Line 120, but did not mention the "PM to cytoplasm" class in the text. It would be helpful to discuss/speculate why these transporters might contribute to PM damage repair.

      Thank you very much for this suggestion. We speculated that these transporters are endocytosed after PM damage because endocytosis of PM proteins contributes to cell adaptation to environmental stress (8). We mentioned it in the manuscript (lines 120-122).

      (7) Lines 175-180: My understanding of the text is that the signals of Exo70-mNG/Dnf1-mNG peak before the Ede1-mSc-I signal peaks. They occur simultaneously, but their dominant phases are different. It is clearer when looking at the data, but I think the conclusion sentences themselves are confusing to me. The authors might consider rewriting the sentences to make them more straightforward.

      Thank you very much for pointing this out. Following the reviewer’s suggestion, we revised the sentence (lines 177-182 in the manuscript).

      Reviewer #2 (Recommendations for the authors):

      It would be interesting to expand on the functional characterization of the 72 novel candidates and explore possible mechanisms for CME-mediated sorting and retargeting of TMD proteins by including a speculative model.

      Thank you very much for the comment. We agree that the further characterization of novel repair proteins and exploration of the possible mechanisms for CME-mediated TMD protein sorting and retargeting are truly important. This should be our important future direction.

      Reviewer #3 (Recommendations for the authors):

      The x-axis in Figure 1C is labeled 'Ratio' - what is this a ratio of?

      Thank you for raising this point. It is the ratio of the number of proteins associated with a GO term to the total number of proteins in the background. We clarified it in the legend of Figure 1C in the manuscript.

      References

      (1) K. Kono, A. Al-Zain, L. Schroeder, M. Nakanishi, A. E. Ikui, Plasma membrane/cell wall perturbation activates a novel cell cycle checkpoint during G1 in Saccharomyces cerevisiae. Proc Natl Acad Sci U S A 113, 6910-6915 (2016).

      (2) A. Das et al., Flippase-mediated phospholipid asymmetry promotes fast Cdc42 recycling in dynamic maintenance of cell polarity. Nat Cell Biol 14, 304-310 (2012).

      (3) M. Adnan et al., SNARE Protein Snc1 Is Essential for Vesicle Trafficking, Membrane Fusion and Protein Secretion in Fungi. Cells 12 (2023).

      (4) H.-U. Mösch, G. R. Fink, Dissection of Filamentous Growth by Transposon Mutagenesis in Saccharomyces cerevisiae. Genetics 145, 671-684 (1997).

      (5) K. Kono, Y. Saeki, S. Yoshida, K. Tanaka, D. Pellman, Proteasomal degradation resolves competition between cell polarization and cellular wound healing. Cell 150, 151-164 (2012).

      (6) A. Litsios et al., Proteome-scale movements and compartment connectivity during the eukaryotic cell cycle. Cell 187, 1490-1507.e1421 (2024).

      (7) W.-K. Huh et al., Global analysis of protein localization in budding yeast. Nature 425, 686-691 (2003).

      (8) T. López-Hernández, V. Haucke, T. Maritzen, Endocytosis in the adaptation to cellular stress. Cell Stress 4, 230-247 (2020).

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review):

      Summary:

      Morgan et al. studied how paternal dietary alteration influenced testicular phenotype, placental and fetal growth using a mouse model of paternal low protein diet (LPD) or Western Diet (WD) feeding, with or without supplementation of methyl-donors and carriers (MD). They found diet- and sex-specific effects of paternal diet alteration. All experimental diets decreased paternal body weight and the number of spermatogonial stem cells, while fertility was unaffected. WD males (irrespective of MD) showed signs of adiposity and metabolic dysfunction, abnormal seminiferous tubules, and dysregulation of testicular genes related to chromatin homeostasis. Conversely, LPD induced abnormalities in the early placental cone, fetal growth restriction, and placental insufficiency, which were partly ameliorated by MD. The paternal diets changed the placental transcriptome in a sex-specific manner and led to a loss of sexual dimorphism in the placental transcriptome. These data provide a novel insight into how paternal health can affect the outcome of pregnancies, which is often overlooked in prenatal care.

      Strengths:

      The authors have performed a well-designed study using commonly used mouse models of paternal underfeeding (low protein) and overfeeding (Western diet). They performed comprehensive phenotyping at multiple timepoints, including the fathers, the early placenta, and the late gestation feto-placental unit. The inclusion of both testicular and placental morphological and transcriptomic analysis is a powerful, non-biased tool for such exploratory observational studies. The authors describe changes in testicular gene expression revolving around histone (methylation) pathways that are linked to altered offspring development (H3.3 and H3K4), which is in line with hypothesised paternal contributions to offspring health. The authors report sex differences in control placentas that mimic those in humans, providing potential for translatability of the findings. The exploration of sexual dimorphism (often overlooked) and its absence in response to dietary modification is novel and contributes to the evidence-base for the inclusion of both sexes in developmental studies.

      Weaknesses:

      The data are overall consistent with the conclusions of the authors. The paternal and pregnancy data are discussed separately, instead of linking the paternal phenotype to offspring outcomes. Some clarifications regarding the methods and the model would improve the interpretation of the findings.

      (1) The authors insufficiently discuss their rationale for studying methyl-donors and carriers as micronutrient supplementation in their mouse model. The impact of the findings would be better disseminated if their role were explained in more detail.

      We acknowledge the Reviewer’s comments regarding the amount of detail in support of the inclusion of methyl carriers and donors within our diet. Therefore, we will revise the manuscript to include more justification, especially within the Introduction section, for their inclusion.

      (2) It is unclear from the methods exactly how long the male mice were kept on their respective diets at the time of mating and culling. Male mice were kept on the diet between 8 and 24 weeks before mating, which is a large window in which the males undergo a considerable change in body weight (Figure 1A). If males were mated at 8 weeks but phenotyped at 24 weeks, or if there were differences between groups, this complicates the interpretation of the findings and the extrapolation of the paternal phenotype to changes seen in the fetoplacental unit. The same applies to paternal age, which is an important known factor affecting male fertility and offspring outcomes.

      We thank the Reviewer for their comments regarding the ages of the males analysed. We will provide more detailed descriptions of the males in our manuscript. However, all male ages were balanced across all groups.

      (3) The male mice exhibited lower body weights when fed experimental diets compared to the control diet, even when placed on the hypercaloric Western Diet. As paternal body weight is an important contributor to offspring health, this is an important confounder that needs to be addressed. This may also have translational implications; in humans, consumption of a Western-style diet is often associated with weight gain. The cause of the weight discrepancy is also unaddressed. It is mentioned that the isocaloric LPD was fed ad libitum, while it is unclear whether the WD was also fed ad libitum, or whether males under- or over-ate on each experimental diet.

      We agree with the Reviewer that the general trend towards a lighter body weight for our experimental animals is unexpected. We can confirm that all diets were fed ad libitum. However, as males were group housed, we were unable to measure food consumption for individual males. We also observed that males fed the high-fat diets often shredded significant quantities of their diet rather than eating it, which prevented accurate measurement of food intake.

      We also agree with the Reviewer that body weight can be a significant confounder for many paternal and offspring parameters. However, while the experimental males did become lighter, there were no statistical differences between groups in mean body weight. As such, body weight was not included as a variable within our statistical analysis.

      (4) The description and presentation of certain statistical analyses could be improved.

      (i) It is unclear what statistical analysis has been performed on the time-course data in Figure 1A (if any). If one-way ANOVA was performed at each timepoint (as the methods and legend suggest), this is an inaccurate method to analyse time-course data.

      (ii) It is unclear what methods were used to test the relative abundance of microbiome species at the family level (Figure 2L), whether correction was applied for multiple testing, and what the stars represent in the figure.

      (iii) Mentioning whether siblings were used in any analyses would improve transparency, and if so, whether statistical correction needed to be applied to control for confounding by the father.

      We apologize for the lack of clarity regarding the statistical analyses. Going forward, we will revise the manuscript and include a more detailed description of the different analyses, the inclusion of siblings, and the correction for multiple testing.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated the effects of a low-protein diet (LPD) and a high sugar- and fat-rich diet (Western diet, WD) on paternal metabolic and reproductive parameters and fetoplacental development and gene expression. They did not observe significant effects on fertility; however, they reported gut microbiota dysbiosis, alterations in testicular morphology, and severe detrimental effects on spermatogenesis. In addition, they examined whether the adverse effects of these diets could be prevented by supplementation with methyl donors. Although LPD and WD showed limited negative effects on paternal reproductive health (with no impairment of reproductive success), the consequences on fetal and placental development were evident and, as reported in many previous studies, were sex-dependent.

      Strengths:

      This study is of high quality and addresses a research question of great global relevance, particularly in light of the growing concern regarding the exponential increase in metabolic disorders, such as obesity and diabetes, worldwide. The work highlights the importance of a balanced paternal diet in regulating the expression of metabolic genes in the offspring at both fetal and placental levels. The identification of genes involved in metabolic pathways that may influence offspring health after birth is highly valuable, strengthening the manuscript and emphasizing the need to further investigate long-term outcomes in adult offspring.

      The histological analyses performed on paternal testes clearly demonstrate diet-induced damage. Moreover, although placental morphometric analyses and detailed histological assessments of the different placental zones did not reveal significant differences between groups, their inclusion is important. These results indicate that even in the absence of overt placental phenotypic changes, placental function may still be altered, with potential consequences for fetal programming.

      Weaknesses:

      Overall, this manuscript presents a rich and comprehensive dataset; however, this has resulted in the analysis of paternal gut dysbiosis remaining largely descriptive. While still valuable, this raises questions regarding why supplementation with methyl donors was unable to restore gut microbial balance in animals receiving the modified diets.

      We thank the Reviewer for their considered thoughts on the gut dysbiosis induced in our models and the minimal impact of the methyl donors and carriers. We will include additional text within the Discussion to acknowledge this. However, at this point in time, we are unsure as to why the methyl donors had minimal impact. It could be that the macronutrients (i.e. protein, fat, carbohydrates) have more of an influence on gut bacterial profiles than micronutrients. Alternatively, due to the prolonged nature of our feeding regimens, any initial influences of the methyl donors may become diluted out over time. We will amend the text to reflect these potential factors.

    1. Author response:

      Weaknesses:

      (1) Several conclusions are insufficiently supported at this point. For example, evidence that the Hiw foci represent bona fide liquid-liquid phase (LLP) separated condensates is limited. Sensitivity to 1,6-hexanediol is not definitive proof of their liquid condensate nature, and their recovery kinetics after 1,6-hexanediol wash-out and their morphology are inconsistent with a pure liquid behaviour. Furthermore, the claim that the Hiw foci are non-vesicular is not strongly supported, as it is only based on the lack of colocalization with a handful of endosomal proteins.

      We agree that, at the current stage of the manuscript, we have presented data only on Hiw foci in the VNC and shown that they are sensitive to 1,6-HD but not to 2,5-HD. To further provide definitive proof that these are bona fide condensates, we will now perform in vitro analysis of different domains of Hiw and the Hiw IDR region. In addition, we will also investigate the Hiw-GFP behavior in non-neuronal and transiently transfected cell lines using FRAP and other protocols previously applied to condensate-forming proteins.

      Finally, we will perform an in-depth analysis of the Hiw condensates for their colocalization with endocytic proteins and cellular compartments and determine whether they are part of any known vesicular structures.

      (2) Importantly, the appearance of the putative condensates is correlative rather than causative for synaptic overgrowth, and in the absence of a mechanistic link between endocytosis and Hiw condensation, the causality is difficult to address. Of note is that the putative condensates are already present (albeit to a lesser extent) in the absence of endocytic defects and that the conclusions rely heavily on overexpressed GFP-Hiw, which may perturb normal protein behaviour and artificially induce condensation or aggregation.

      To investigate the formation of condensates and their relation to synaptic growth, we will perform a time-course analysis of changes at the NMJ and correlate with the Hiw condensate appearance in the VNC of shi<sup>ts</sup> expressing GFP-Hiw, along with appropriate controls. The GFP transgene used is a functional transgene and well established for studying Hiw behaviour. The Hiw condensates do not form when expressed on an otherwise wild-type background. We will further assess the formation of Hiw condensates in other endocytic mutants with appropriate controls.

      (3) The use of hypomorphic mutants in genetic experiments also introduces some ambiguity in their interpretation, as the results may reflect dosage effects from multiple pathways rather than pathway order. Finally, the manuscript would benefit from a more comprehensive reference to relevant literature on JNKKKs and BMP signalling, as well as on the recycling endosome function in synaptic growth and the regulation of the aforementioned pathways.

      We will perform genetic analysis using homozygous mutants of the wit and saxophone genes to further support epistatic interactions between the BMP signaling pathway and synaptic growth. We will also strengthen the Discussion.

    1. Author response:

      eLife Assessment

      The authors use sequencing of nascent DNA (DNA linked to an RNA primer, "SNS-Seq") to localise DNA replication origins in Trypanosoma brucei, so this work will be of interest to those studying either Kinetoplastids or DNA replication. The paper presents the SNS-seq results for only part of the genome, and there are significant discrepancies between the SNS-Seq results and those from other, previously-published results obtained using other origin mapping methods. The reasons for the differences are unknown and from the data available, it is not possible to assess which origin-mapping method is most suitable for origin mapping in T. brucei. Thus at present, the evidence that origins are distributed as the authors claim - and not where previously mapped - is inadequate.

      We would like to clarify a few points regarding our study. Our primary objective was to characterise the topology and genome-wide distribution of short nascent-strand (SNS) enrichments. The stranded SNS-seq approach provides the high strand-specific resolution required to analyse origins. The observation that SNS-seq peaks (potential origins) are most frequently found in intergenic regions is not an artefact of analysing only part of the genome; rather, it is a result of analysing the entire genome.

      We agree that orthogonal validation is necessary. However, neither MFA-seq nor TbORC1/CDC6 ChIP-on-chip has yet been experimentally validated as definitive markers of origin activity in T. brucei, nor do they validate each other. 

      Public Reviews:

      Reviewer #1 (Public review):

      In this paper, Stanojcic and colleagues attempt to map sites of DNA replication initiation in the genome of the African trypanosome, Trypanosoma brucei. Their approach to this mapping is to isolate 'short-nascent strands' (SNSs), a strategy adopted previously in other eukaryotes (including in the related parasite Leishmania major), which involves isolation of DNA molecules whose termini contain replication-priming RNA. By mapping the isolated and sequenced SNSs to the genome (SNS-seq), the authors suggest that they have identified origins, which they localise to intergenic (strictly, inter-CDS) regions within polycistronic transcription units and suggest display very extensive overlap with previously mapped R-loops in the same loci. Finally, having defined locations of SNS-seq mapping, they suggest they have identified G4 and nucleosome features of origins, again using previously generated data.

      Though there is merit in applying a new approach to understand DNA replication initiation in T. brucei, where previous work has used MFA-seq and ChIP of a subunit of the Origin Replication Complex (ORC), there are two significant deficiencies in the study that must be addressed to ensure rigour and accuracy.

      (1) The suggestion that the SNS-seq data is mapping DNA replication origins that are present in inter-CDS regions of the polycistronic transcription units of T. brucei is novel and does not agree with existing data on the localisation of ORC1/CDC6, and it is very unclear if it agrees with previous mapping of DNA replication by MFA-seq due to the way the authors have presented this correlation. For these reasons, the findings essentially rely on a single experimental approach, which must be further tested to ensure SNS-seq is truly detecting origins. Indeed, in this regard, the very extensive overlap of SNS-seq signal with RNA-DNA hybrids should be tested further to rule out the possibility that the approach is mapping these structures and not origins.

      (2) The authors' presentation of their SNS-seq data is too limited and therefore potentially provides a misleading view of DNA replication in the genome of T. brucei. The work is presented through a narrow focus on SNS-seq signal in the inter-CDS regions within polycistronic transcription units, which constitute only part of the genome, ignoring both the transcription start and stop sites at the ends of the units and the large subtelomeres, which are mainly transcriptionally silent. The authors must present a fuller and more balanced view of SNS-seq mapping across the whole genome to ensure full understanding and clarity.

      Regarding comparisons with previous work:

      Two other attempts to identify origins in T. brucei —ORC1/CDC6 binding sites (ChIP-on-chip, PMID: 22840408) and MFA-seq (PMID: 22840408, 27228154)—were both produced by the McCulloch group. These methods do not validate each other; in fact, MFA-seq origins overlap with only 4.4% of the 953 ORC1/CDC6 sites (PMID: 29491738). Therefore, low overlap between SNS-seq peaks and ORC1/CDC6 sites cannot disqualify our findings. Similar low overlaps are observed in other parasites (PMID: 38441981, PMID: 38038269, PMID: 36808528) and in human cells (PMID: 38567819).

      We also would like to emphasize that the ORC1/CDC6 dataset originally published (PMID: 22840408) is no longer available; only a re-analysis by TritrypDB exists, which differs significantly from the published version (personal communication from Richard McCulloch). While the McCulloch group reported a predominant localization of ORC1/CDC6 sites within SSRs at transcription start and termination regions, our re-analysis indicates that only 10.3% of TbORC1/CDC6-12Myc sites overlapped with 41.8% of SSRs.

      MFA-seq does not map individual origins; rather, it detects replicated genomic regions by comparing DNA copy number between the S and G1 phases of the cell cycle (PMID: 36640769; PMID: 37469113; PMID: 36455525). The broad replicated regions (0.1–0.5 Mbp) identified by MFA-seq in T. brucei are likely to contain multiple origins, rather than just one. In that sense, we disagree with McCulloch’s group, who claimed that there is a single origin per broad peak. Our analysis shows that up to 50% of the origins detected by stranded SNS-seq are located within broad MFA-seq regions. Neither the methodology used by McCulloch’s group to infer single origins from MFA-seq regions nor the precise positions of these regions have been published or made available, making direct comparison difficult.
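
The copy-number comparison that underlies MFA-seq can be illustrated with a minimal sketch. This is not the pipeline used in any of the cited studies: the function name `copy_number_ratio`, the `pseudocount` parameter, and the per-bin read counts are all hypothetical and serve only to show why the readout is a broad region rather than a point.

```python
def copy_number_ratio(s_counts, g1_counts, pseudocount=1.0):
    """Per-bin S/G1 read-depth ratio after scaling each sample by its library size."""
    s_total = sum(s_counts)
    g1_total = sum(g1_counts)
    ratios = []
    for s, g1 in zip(s_counts, g1_counts):
        s_norm = (s + pseudocount) / s_total      # pseudocount avoids division by zero
        g1_norm = (g1 + pseudocount) / g1_total
        ratios.append(s_norm / g1_norm)
    return ratios

# Invented read counts for five adjacent genomic bins (illustration only).
s_phase = [120, 180, 260, 190, 110]
g1_phase = [100, 100, 100, 100, 100]

ratios = copy_number_ratio(s_phase, g1_phase)
# Index of the bin with the highest relative copy number in S phase.
peak_bin = max(range(len(ratios)), key=ratios.__getitem__)
```

In this toy example the ratio rises and falls over several neighbouring bins, so the signal marks a replicated region; it does not by itself say how many initiation sites lie inside that region.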

      Finally, the genomic features we describe—poly(dA/dT) stretches, G4 structures and nucleosome occupancy patterns—are consistent with origin topology described in other organisms.

      On the concern that SNS-seq may map RNA-DNA hybrids rather than replication origins: Isolation and sequencing of short nascent strands (SNS) is a well-established and widely used technique for high-resolution origin mapping. This technique has been employed for decades in various laboratories, with numerous publications documenting its use. We followed the published protocol for SNS isolation (Cayrou et al., Methods, 2012, PMID: 22796403). RNA-DNA hybrids cannot persist through the multiple denaturation steps in our workflow, as they melt at 95°C (Roberts and Crothers, Science, 1992; PMID: 1279808). Even in the unlikely event that some hybrids remained, they would not be incorporated into libraries prepared using a single-stranded DNA protocol and therefore would not be sequenced (see Figure 1B and Methods).

      Furthermore, our analysis shows that only a small proportion (1.7%) of previously reported RNA-DNA hybrids overlap with SNS-seq origins. It is important to note that RNA-primed nascent strands naturally form RNA-DNA hybrids during replication initiation, meaning the enrichment of RNA-DNA hybrids near origins is both expected and biologically relevant.

      On the claim that our analysis focuses narrowly on inter-CDS regions and ignores other genomic compartments: this is incorrect. We mapped and analyzed stranded SNS-seq data across the entire genome of T. brucei 427 wild-type strain (Müller et al., Nature, 2018; PMID: 30333624), including both core and subtelomeric regions. Our findings indicate that most origins are located in intergenic regions, but all analyses were performed using the full set of detected origins, regardless of location.

      We did not ignore transcription start and stop sites (TSS/TTS). The manuscript already includes origin distribution across genomic compartments as defined by TriTrypDB (Fig. 2C) and addresses overlap with TSS, TTS and HT in the section “Spatial coordination between the activity of the origin and transcription”. While this overlap is minimal, we have included metaplots in the revised manuscript for clarity.

      Reviewer #2 (Public review):

      Summary: 

      Stanojcic et al. investigate the origins of DNA replication in the unicellular parasite Trypanosoma brucei. They perform two experiments, stranded SNS-seq and DNA molecular combing. Further, they integrate various publicly available datasets, such as G4-seq and DRIP-seq, into their extensive analysis. Using this data, they elucidate the structure of the origins of replication. In particular, they find various properties located at or around origins, such as polynucleotide stretches, G-quadruplex structures, regions of low and high nucleosome occupancy, R-loops, and that origins are mostly present in intergenic regions. Combining their population-level SNS-seq and their single-molecule DNA molecular combing data, they elucidate the total number of origins as well as the number of origins active in a single cell.

      Strengths:

      (1) A very strong part of this manuscript is that the authors integrate several other datasets and investigate a large number of properties around origins of replication. Data analysis clearly shows the enrichment of various properties at the origins, and the manuscript concludes with a very well-presented model that clearly explains the authors' understanding and interpretation of the data.

      We sincerely thank you for this positive feedback.

      (2) The DNA combing experiment is an excellent orthogonal approach to the SNS-seq data. The authors used the different properties of the two experiments (one giving location information, one giving single-molecule information) well to extract information and contrast the experiments.

      Thank you very much for this remark.

      (3) The discussion is exemplary, as the authors openly discuss the strengths and weaknesses of the approaches used. Further, the discussion serves its purpose of putting the results in both an evolutionary and a trypanosome-focused context.

      Thank you for appreciating our discussion.

      Weaknesses:

      I have major concerns about the origin of replication sites determined from the SNS-seq data. As a caveat, I want to state that, before reading this manuscript, SNS-seq was unknown to me; hence, some of my concerns might be misplaced.

      (1) I do not understand why SNS-seq would create peaks. Replication should originate in one locus, then move outward in both directions until the replication fork moving outward from another origin is encountered. Hence, in an asynchronous population average measurement, I would expect SNS data to be broad regions of + and -, which, taken together, cover the whole genome. Why are there so many regions not covered at all by reads, and why are there such narrow peaks?

      Thank you for asking these questions. As you correctly point out, replication forks progress in both directions from their origins and ultimately converge at termination sites. However, the SNS-seq method specifically isolates short nascent strands (SNSs) of 0.5–2.5 kb using a sucrose gradient. These short fragments are generated immediately after origin firing and mark the sites of replication initiation, rather than the entire replicated regions. Consequently: (i) SNS-seq does not capture long replication forks or termination regions, only the immediate vicinity of origins. (ii) The narrow peaks indicate the size of selected SNSs (0.5–2.5 kb) and the fact that many cells initiate replication at the same genomic sites, leading to localized enrichment. (iii) Regions without coverage refer to genomic areas that do not serve as efficient origins in the analyzed cell population. Thus, SNS-seq is designed to map origin positions, but not the entire replicated regions.

      (2) I am concerned that up to 96% of all peaks are filtered away. If there is so much noise in the data, how can one be sure that the peaks that remain are real? Specifically, if the authors placed the same number of peaks as was measured randomly in intergenic regions, would 4% of these peaks pass the filtering process by chance?

      Maintaining the strandedness of the sequenced DNA fibres enabled us to filter the peaks, thereby increasing the probability that the filtered peak pairs corresponded to origins. Two SNS peaks must be oriented in a way that reflects the topology of the SNS strands within an active origin: the upstream peak must be on the minus strand and followed by the downstream peak on the plus strand.
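
The orientation rule just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' bioinformatics workflow: the `Peak` type, the `pair_divergent_peaks` function, and the `max_gap` cutoff are hypothetical names and values introduced for the example.

```python
from typing import NamedTuple

class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

def pair_divergent_peaks(peaks, max_gap=2500):
    """Keep a minus-strand peak only if the next peak downstream is on the plus strand.

    max_gap is a hypothetical distance cutoff in bp, not a value from the manuscript.
    """
    ordered = sorted(peaks, key=lambda p: (p.chrom, p.start))
    pairs = []
    for up, down in zip(ordered, ordered[1:]):
        same_chrom = up.chrom == down.chrom
        divergent = up.strand == "-" and down.strand == "+"
        close = down.start - up.end <= max_gap
        if same_chrom and divergent and close:
            pairs.append((up, down))
    return pairs

peaks = [
    Peak("chr1", 1000, 1800, "-"),
    Peak("chr1", 2000, 2900, "+"),    # pairs with the minus-strand peak above
    Peak("chr1", 9000, 9800, "+"),
    Peak("chr1", 10000, 10900, "-"),  # plus then minus: wrong orientation, discarded
]
candidate_origins = pair_divergent_peaks(peaks)
```

Only the first two peaks survive, because they are the only adjacent pair with a minus-strand peak upstream of a plus-strand peak.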

      As suggested by the reviewer, we tested whether randomly placed plus and minus peaks could reproduce the number of filter-passing peaks using the same bioinformatics workflow. Only 1–6% of random peaks passed the filters, compared with 4–12% in our experimental data, resulting in about 50% fewer selected regions (origins). Moreover, the “origins” from random peaks showed 0% reproducibility across replicates, whereas the experimental data showed 7–64% reproducibility. These results indicate that the retained peaks are highly unlikely to arise by chance and support the specificity of our approach. Thank you for this suggestion.
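
A random-placement control of this kind could be sketched as follows. This is an assumption-laden illustration, not the analysis reported above: the function names, the fixed peak width, the `max_gap` cutoff, the genome length, and the number of trials are all arbitrary choices made for the example, and peaks are placed uniformly over a single linear sequence rather than restricted to intergenic regions.

```python
import random

def count_divergent_pairs(peaks, max_gap=2500):
    """Count adjacent (minus, plus) peak pairs; each peak is (start, end, strand)."""
    ordered = sorted(peaks)
    return sum(
        1
        for up, down in zip(ordered, ordered[1:])
        if up[2] == "-" and down[2] == "+" and down[0] - up[1] <= max_gap
    )

def random_peaks(n_peaks, genome_length, peak_width, rng):
    """Place fixed-width peaks uniformly at random, each with a random strand."""
    peaks = []
    for _ in range(n_peaks):
        start = rng.randrange(genome_length - peak_width)
        peaks.append((start, start + peak_width, rng.choice("+-")))
    return peaks

def null_pass_fraction(n_peaks, genome_length, peak_width=800, n_trials=100, seed=0):
    """Average fraction of random peaks that pass the orientation filter."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_trials):
        peaks = random_peaks(n_peaks, genome_length, peak_width, rng)
        passing = 2 * count_divergent_pairs(peaks)  # each retained pair uses two peaks
        fractions.append(passing / n_peaks)
    return sum(fractions) / n_trials

# Arbitrary peak number and sequence length, chosen only for illustration.
expected_by_chance = null_pass_fraction(n_peaks=1000, genome_length=1_000_000)
```

Comparing `expected_by_chance` with the fraction retained in the real data gives a rough sense of how much of the filtered set could be explained by peak density alone; reproducibility across replicates is a separate check that this sketch does not cover.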

      (3) There are 3 previous studies that map origins of replication in T. brucei. Devlin et al. 2016, Tiengwe et al. 2012, and Krasiļņikova et al. 2025 (https://doi.org/10.1038/s41467-025-56087-3), all with a different technique: MFA-seq. All three previous studies mostly agree on the locations and number of origins. The authors compared their results to the first two, but not the last study; they found that their results are vastly different from the previous studies (see Supplementary Figure 8A). In their discussion, the authors defend this discrepancy mostly by stating that the discrepancy between these methods has been observed in other organisms. I believe that, given the situation that the other studies precede this manuscript, it is the authors' duty to investigate the differences more than by merely pointing to other organisms. A conclusion should be reached on why the results are different, e.g., by orthogonally validating origins absent in the previous studies.

      The MFA-seq data for T. brucei were published in two studies by McCulloch’s group: Tiengwe et al. (2012) using TREU927 PCF cells, and Devlin et al. (2016) using PCF and BSF Lister427 cells. In Krasilnikova et al. (2025), previously published MFA-seq data from Devlin et al. were remapped to a new genome assembly without generating new MFA-seq data, which explains why we did not include that comparison.

      Clarifying the differences between MFA-seq and our stranded SNS-seq data is essential. MFA-seq and SNS-seq interrogate different aspects of replication. SNS-seq is a widely used, high-resolution method for mapping individual replication origins, whereas MFA-seq detects replicated regions by comparing DNA copy number between S and G1 phases. MFA-seq identified broad replicated regions (0.1–0.5 Mb) that were interpreted by McCulloch’s group as containing a single origin. We disagree with this interpretation and consider that there are multiple origins in each broad peak; theoretical considerations of replication timing indicate that far more origins are required for complete genome duplication during the short S-phase. Once this assumption is reconsidered, MFA-seq and SNS-seq results become complementary: MFA-seq identifies replicated regions, while SNS-seq pinpoints individual origins within those regions. Our analysis revealed that up to 50% of the origins detected by stranded SNS-seq were located within the broad MFA peaks. This pattern, in which broad MFA-seq regions contain multiple initiation sites, has also recently been found in Leishmania by McCulloch’s team using nanopore sequencing (PMID: 26481451). Nanopore sequencing showed numerous initiation sites within MFA-seq regions and numerous additional sites outside these regions in asynchronous cells, consistent with what we observed using stranded SNS-seq in T. brucei. We will expand our discussion and conclude that the discrepancy arises from methodological differences and interpretation. The two approaches provide complementary insights into replication dynamics, rather than ‘vastly different’ results.
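
The kind of overlap statistic quoted here (the share of SNS-seq origins that fall inside broad MFA-seq regions) can be computed as in the sketch below. It is a simplified illustration with invented coordinates on a single chromosome; `fraction_within` is a hypothetical helper, and the midpoint criterion is an assumption, since the overlap rule actually used is not stated in this response.

```python
def fraction_within(origins, regions):
    """Fraction of origin midpoints that fall inside any (start, end) region."""
    if not origins:
        return 0.0
    inside = 0
    for start, end in origins:
        midpoint = (start + end) // 2
        if any(r_start <= midpoint < r_end for r_start, r_end in regions):
            inside += 1
    return inside / len(origins)

# Invented coordinates: four narrow origins and one broad replicated region.
origins = [(1_000, 1_400), (150_000, 150_600), (420_000, 420_300), (900_000, 900_500)]
broad_regions = [(100_000, 450_000)]

share = fraction_within(origins, broad_regions)
```

In this toy case two of the four origins lie inside the single broad region, which shows how one broad region can be compatible with several distinct initiation sites.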

      We recognize the importance of validating our results in future using an alternative mapping method and functional assays. However, it is important to emphasize that stranded SNS-seq is an origin mapping technique with a very high level of resolution. This technique can detect regions between two divergent SNS peaks, which should represent regions of DNA replication initiation. At present, no alternative technique has been developed that can match this level of resolution.

      (4) Some patterns that were identified to be associated with origins of replication, such as G-quadruplexes and nucleosomes phasing, are known to be biases of SNS-seq (see Foulk et al. Characterizing and controlling intrinsic biases of lambda exonuclease in nascent strand sequencing reveals phasing between nucleosomes and G-quadruplex motifs around a subset of human replication origins. Genome Res. 2015;25(5):725-735. doi:10.1101/gr.183848.114).

      It is important to note that the conditions used in our study differ significantly from those applied in Foulk et al. (Genome Res, 2015). We used SNS isolation and enzymatic treatments as described in previous reports (Cayrou, C. et al. Genome Res, 2015 and Cayrou, C. et al. Methods, 2012). Here, we enriched the SNS by size on a sucrose gradient and then treated this SNS-enriched fraction with high amounts of repeated λ-exonuclease treatments (100 U for 16 h at 37 °C; see Methods). In contrast, Foulk et al. used sonicated total genomic DNA for origin mapping, without enrichment of SNS on a sucrose gradient as we did, and then they performed a λ-exonuclease treatment. A previous study (Cayrou, C. et al. Genome Res, 2015, Figure S2, which can be found at https://genome.cshlp.org/content/25/12/1873/suppl/DC1) has shown that complete digestion of G4-rich DNA sequences is achieved under the conditions we used.

      Furthermore, the SNS-depleted control (without RNA) was included in our experimental approach. This control represents all molecules that are difficult to digest with lambda exonuclease, including G4 structures. Peak calling was performed against this background control, with the aim of removing false positive peaks resulting from undigested DNA structures. We have explained this step more clearly in the revised manuscript.

      The key benefit of our study is that the orientation of the enrichments (peaks) remains consistent throughout the sequencing process. We identified an enrichment of two divergent strands synthesised on complementary strands containing G4s. These two divergent strands themselves do not, however, contain G4s (see Fig. 8 for the model). Therefore, the enriched molecules detected in our study do not contain G4s. They are complementary to the strands enriched with G4s. This means that the observed enrichment of G4s cannot be an artefact of the enzymatic treatments used in this study. We added this part to the discussion of the revised manuscript.

      We also performed an additional control which is not mentioned in the manuscript. In parallel with replicating cells, we isolated the DNA from the stationary phase of growth, which primarily contains non-replicating cells. Following the three λ-exonuclease treatments, there was insufficient DNA remaining from the stationary phase cells to prepare the libraries for sequencing. This control strongly indicated that there was little to no contaminating DNA present with the SNS molecules after λ-exonuclease enrichment.

    1. Author response:

      The following is the authors’ response to the current reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting dissociable contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative instructed-probability task, Bayesian behavioural modeling, and model-based fMRI analyses provides a solid foundation for the main claims; however, major interpretational limitations remain, particularly a potential confound between posterior switch probability and time in the neuroimaging results. At the behavioural level, reliance on explicitly instructed conditional probabilities leaves open alternative explanations that complicate attribution to a single computational mechanism, such that clearer disambiguation between competing accounts and stronger control of temporal and representational confounds would further strengthen the evidence.

      Thank you. In this revision, we will focus on addressing Reviewer 3’s concern about the potential confound between posterior probability and time in the neuroimaging results. First, we will present whole-brain results of subjects’ probability estimates (their subjective posterior probability of switch) after controlling for the effect of time on probability of switch (the intertemporal prior). Second, we will compare the effect of probability estimates (Pt) on vmPFC and ventral striatum activity, which we found to correlate with Pt, with and without including the intertemporal prior in the GLM. Third, to address Reviewer 3’s comment that vmPFC and ventral striatum cannot be located from the Tables of Activation in the supplement, we will add slice-by-slice images of the whole-brain results on Pt to the Supplemental Information in addition to the Tables of Activation.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      Weaknesses:

      The authors have adequately addressed my prior concerns.

      Thank you for reviewing our paper and providing constructive comments that helped us improve our paper.

      Reviewer #3 (Public review):

      Thank you again for reviewing the manuscript. In this revision, we will focus on addressing your concern about the potential confound between posterior probability and time in the neuroimaging results. First, we will present whole-brain results of subjects’ probability estimates (Pt, their subjective posterior probability of switch) after controlling for the effect of time on probability of switch (the intertemporal prior). Second, we will compare the effect of probability estimates (Pt) on vmPFC and ventral striatum activity, which we found to correlate with Pt, with and without including the intertemporal prior in the GLM. These results will be summarized in a new figure (Figure 4).

      Finally, to address your comment that you were not able to locate vmPFC and ventral striatum in the Tables of Activation, we will add slice-by-slice images of the whole-brain results on Pt to the supplement in addition to the Tables of Activation.

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).
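
      As a minimal sketch of the normative benchmark for the task described above, the following computes the Bayesian posterior probability that a regime shift has occurred after each ball. It assumes that the block starts in the red regime, that a shift to the blue regime can occur with a fixed probability before each draw and is irreversible, and that both urns share the same diagnosticity. The function and argument names are illustrative and are not taken from the manuscript.

```python
def regime_shift_posterior(signals, q, p):
    """Posterior probability of a regime shift after each signal.

    signals: sequence of "red" / "blue" draws (assumed coding)
    q: per-period probability of a red-to-blue shift (assumed absorbing)
    p: probability that the red urn yields red (and the blue urn yields blue)
    """
    posterior = 0.0  # the block is assumed to start in the red regime
    history = []
    for signal in signals:
        # A shift may have occurred just before this draw.
        prior = posterior + (1.0 - posterior) * q
        # Likelihood of the observed colour under each regime.
        like_blue = p if signal == "blue" else 1.0 - p
        like_red = 1.0 - p if signal == "blue" else p
        # Bayes' rule for the current regime being blue (i.e. shifted).
        numerator = prior * like_blue
        posterior = numerator / (numerator + (1.0 - prior) * like_red)
        history.append(posterior)
    return history


if __name__ == "__main__":
    # Example block: mostly red balls, then a run of blue balls.
    draws = ["red"] * 6 + ["blue"] * 4
    for t, pt in enumerate(regime_shift_posterior(draws, q=0.05, p=0.7), 1):
        print(t, round(pt, 3))
```

      An optimal observer in this sketch responds strongly to both q and p; the behavioural finding described next is that participants' reports vary less with these parameters than such a benchmark would.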

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual-difference effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      In the response to this comment the authors have pointed out their own previous work showing that system neglect can occur even when numerical probabilities are not used. This is reassuring but there remains a large body of classic work showing that observers do struggle with conditional probabilities of the type presented in the task.

      Thank you. Yes, people do struggle with conditional probabilities in many studies. However, as our previous work suggested (Massey and Wu, 2005), system neglect was likely not due to response mode (having to enter probability estimates, making binary predictions, etc.).

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      We thank the reviewer for this comment. We do not disagree that there are alternative models that can describe over- and underreactions seen in the dataset. However, we do wish to point out that since we began with the normative Bayesian model, the natural progression in case the normative model fails to capture data is to modify the starting model. It is under this context that we developed the system-neglect model. It was a simple extension (a parameterized version) of the normative Bayesian model.

      Regarding the hyperprior idea, even if the participants have a hyperprior, there has to be some function that describes/implements attraction to the mean. Having a hyperprior itself does not imply attraction to this hyperprior. We therefore were not sure why the hyperprior itself can produce attraction to the mean.

      We did look further into the possibility of attraction to the mean. First, as suggested by the reviewer, we looked into another dataset with a different mean ground-truth value. In Massey and Wu (2005), the transition probabilities were [0.02 0.05 0.1 0.2], which differ from those in the current study [0.01 0.05 0.1], and over- and underreactions were also found there. Second, we reason that for the attraction-to-the-mean idea to work, subjects need to know the mean of the system parameters. This would take time to develop because we did not tell subjects about the mean. If the behavior is caused by attraction to the mean, subjects’ behavior would differ between the early stage of the experiment, when they had little idea about the mean, and the late stage, when they knew about the mean. We will further analyze and compare participants’ data at the beginning of the experiment with data at the end of the experiment.

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers, resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020).

      We thank the reviewer for pointing out these potential explanations. Again, we do not disagree that any model in which participants don’t fully use the numerical information they were given would produce system neglect. It is hard to separate ‘not fully using numerical information’ from ‘lack of sensitivity to the numerical information’. We will respond in more detail to the four example reasons later.

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      Again, we do not disagree with the reviewer on the modeling statement. However, we also wish to point out that the system-neglect model we had is a simple extension of the normative Bayesian model. Had we gone to a non-Bayesian framework, we would have faced the criticism of why we simply do not consider a simple extension of the starting model. In response, we will add a section in Discussion summarizing our exchange on this matter.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, Pt always increases with sample number (as by the time of later samples, there have been more opportunities for a regime shift). To control for this the authors include, in a supplementary analysis, an 'intertemporal prior.' I would have preferred to see the results of this better-controlled analysis presented in the main figure. From the tables in the SI it is very difficult to tell how the results change with the inclusion of the control regressors.

      Thank you. In response, we will add a new figure, now Figure 4, showing the results of Pt and delta Pt from GLM-2 where we added the intertemporal prior as a regressor to control for temporal confounds. We compared Pt and delta Pt results in vmPFC and ventral striatum between GLM-1 and GLM-2. We also will show the results of intertemporal prior on vmPFC and ventral striatum under GLM-2.
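
      The temporal confound discussed here can be illustrated with a short sketch. Assuming a constant per-period transition probability q and an irreversible shift, the probability that a shift has occurred by period t, before seeing any signals, is 1 - (1 - q)^t, which rises monotonically across the 10 periods. Whether the manuscript's intertemporal prior takes exactly this form is an assumption here, and the function name is illustrative.

```python
def prior_shift_by_period(q, n_periods=10):
    """Signal-independent probability that a shift has occurred by each period.

    Assumes a constant per-period transition probability q and that a
    shift, once it happens, is not reversed within the block.
    """
    return [1.0 - (1.0 - q) ** t for t in range(1, n_periods + 1)]


if __name__ == "__main__":
    for q in (0.01, 0.05, 0.10):  # transition probabilities used in the task
        print(q, [round(v, 3) for v in prior_shift_by_period(q)])
```

      Because this quantity grows with period number whatever the signals are, a regressor of this kind absorbs variance in Pt that is attributable to elapsed time alone.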

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiments are useful controls. For example, in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was: if subjects saw a similar visual layout and were just instructed to press buttons to indicate two-digit numbers, would we observe activity in the vmPFC, ventral striatum, and frontoparietal network as we did in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect estimates of the probability that the current regime is the blue regime, and therefore may not be specifically about change detection. In Experiment 2, we tested that idea, namely whether what we found about Pt was unique to change detection. In Experiment 2, subjects estimated the probability that the current regime is the blue regime (just as they did in Experiment 1) except that there were no regime shifts involved. In other words, it is possible that the regions we identified were generally associated with probability estimation and not specifically with probability estimates of change. We used Experiment 2 to examine whether this was true.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 and 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, Pt, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of Pt. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to probability of regime change), the effect of Pt did not particularly have to do with change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards’ (1968) classic “bookbag-and-poker chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of Pt can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime-shift.”

      To further make sure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as we described below on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
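
      The coding of the action-handedness regressor quoted above can be sketched as follows. The quoted text gives only three examples (“23”, “75”, “90”); the mapping of digits 1–5 to the left hand and 6–0 to the right hand is an assumption chosen to be consistent with those examples, and the names below are illustrative rather than taken from the manuscript.

```python
# Assumed key-to-hand mapping; consistent with the quoted examples only.
LEFT_HAND_DIGITS = set("12345")
RIGHT_HAND_DIGITS = set("67890")


def action_handedness(estimate):
    """Return -1 (both presses left), 0 (one of each) or 1 (both presses right).

    estimate: the two-digit response as a string, e.g. "23".
    """
    valid = LEFT_HAND_DIGITS | RIGHT_HAND_DIGITS
    if len(estimate) != 2 or any(d not in valid for d in estimate):
        raise ValueError("estimate must be a two-digit string")
    n_right = sum(d in RIGHT_HAND_DIGITS for d in estimate)
    return n_right - 1  # 0, 1 or 2 right-hand presses -> -1, 0 or 1


if __name__ == "__main__":
    for response in ("23", "75", "90"):
        print(response, action_handedness(response))
```

      Entered as a parametric modulator, such a variable captures variance related to which hand was used, so that any remaining effect of the probability estimate is less likely to reflect response laterality.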

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      Thank you. We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Thank you for pointing out the inclusion of the intertemporal prior in glm2, this seems like an important control that would address my criticism. Why not present this better-controlled analysis in the main figure, rather than the results for glm1 which has no effective control of the increasing posterior probability of a reversal with time?

      Thank you for this suggestion. We added a new figure (Figure 4) that showed results from GLM-2. In this new figure, we showed whole-brain results on Pt and delta Pt, ROI results of vmPFC and ventral striatum on Pt, delta Pt, and intertemporal prior.

      The reason we kept results from GLM-1 (Figure 3) was primarily because we wanted to compare the effect of Pt between experiments under an identical GLM. In other words, the regressors in GLM-1 were identical across all 3 experiments. In Experiments 1 and 2, Pt and delta Pt were respectively probability estimates and belief updates that the current regime was the Blue regime. In Experiment 3, Pt and delta Pt were simply the number subjects were instructed to press (Pt) and the change in number between successive periods (delta Pt).

      As a further point I could not navigate the tables of fMRI activations in SI and recommend replacing or supplementing these with images. For example I cannot actually find a vmPFC or ventral striatum cluster listed for the effect of Pt in GLM1 (version in table S1), which I thought were the main results? Beyond that, comparing how much weaker (or not) those results are when additional confound regressors are included in GLM2 seems impossible.

      The vmPFC and ventral striatum were part of the cluster labeled as Central Opercular cortex. In response, we will provide information about coordinates on the local maxima within the cluster. We will also add slice-by-slice images showing the effect of Pt.


      The following is the authors’ response to the original reviews

      eLife Assessment

      This study offers valuable insights into how humans detect and adapt to regime shifts, highlighting distinct contributions of the frontoparietal network and ventromedial prefrontal cortex to sensitivity to signal diagnosticity and transition probabilities. The combination of an innovative task design, behavioral modeling, and model-based fMRI analyses provides a solid foundation for the conclusions; however, the neuroimaging results have several limitations, particularly a potential confound between the posterior probability of a switch and the passage of time that may not be fully controlled by including trial number as a regressor. The control experiments intended to address this issue also appear conceptually inconsistent and, at the behavioral level, while informing participants of conditional probabilities rather than requiring learning is theoretically elegant, such information is difficult to apply accurately, as shown by well-documented challenges with conditional reasoning and base-rate neglect. Expressing these probabilities as natural frequencies rather than percentages may have improved comprehension. Overall, the study advances understanding of belief updating under uncertainty but would benefit from more intuitive probabilistic framing and stronger control of temporal confounds in future work.

      We thank the editors for the assessment and we appreciate your efforts in reviewing the paper. The editors added several limitations in the assessment based on the new reviewer 3 in this round, which we would like to clarify below.

      With regard to temporal confounds, we clarified in the main text and response to Reviewer 3 that we had already addressed the potential confound between posterior probability of a switch and passage of time in GLM-2 with the inclusion of intertemporal prior. After adding intertemporal prior in the GLM, we still observed the same fMRI results on probability estimates. In addition, we did two other robustness checks, which we mentioned in the manuscript.

      With regard to response mode (probability estimation rather than choice or indicating natural frequencies), we wish to point out that previous research by Massey and Wu (2005), on which the current study was based, addressed the concern that participants show system-neglect tendencies because of the mode of information delivery, namely indicating beliefs by reporting probability estimates rather than through choice or another response mode. Massey and Wu (2005, Study 3) found the same biases when participants performed a choice task that did not require them to indicate probability estimates.

      With regard to the control experiments, they were in fact not intended to address the confound between posterior probability and passage of time. Rather, they aimed to address whether the neural findings were unique to change detection (Experiment 2) and to address visual and motor confounds (Experiment 3). These aims and the results of the control experiments were mentioned on pages 18-19.

      We also wish to highlight that we had performed detailed model comparisons following reviewer 2’s suggestions. Although reviewer 2 was unable to re-review the manuscript, we believe this provides insight into the literature on change detection. See “Incorporating signal dependency into system-neglect model led to better models for regime-shift detection” (p.27-30). The model comparison showed that system-neglect models that incorporate signal dependency describe participants’ probability estimates better than the original system-neglect model. This suggests that people respond to change-consistent and change-inconsistent signals differently when judging whether the regime had changed. This was not reported in previous behavioral studies and was largely inspired by the neural finding on signal dependency in the frontoparietal cortex. It indicates that neural findings can provide novel insights into computational modeling of behavior.

      To better highlight and summarize our key contributions, we added a paragraph at the beginning of Discussion:

      “In this study, we investigated how humans detect changes in the environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices and in accordance with the computational roles they played in change detection. By introducing the framework in system neglect and providing evidence for its neural implementations, this study offered both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”    

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study examines human biases in a regime-change task, in which participants have to report the probability of a regime change in the face of noisy data. The behavioral results indicate that humans display systematic biases, in particular, overreaction in stable but noisy environments and underreaction in volatile settings with more certain signals. fMRI results suggest that a frontoparietal brain network is selectively involved in representing subjective sensitivity to noise, while the vmPFC selectively represents sensitivity to the rate of change.

      Strengths:

      - The study relies on a task that measures regime-change detection primarily based on descriptive information about the noisiness and rate of change. This distinguishes the study from prior work using reversal-learning or change-point tasks in which participants are required to learn these parameters from experiences. The authors discuss these differences comprehensively.

      - The study uses a simple Bayes-optimal model combined with model fitting, which seems to describe the data well. The model is comprehensively validated.

      - The authors apply model-based fMRI analyses that provide a close link to behavioral results, offering an elegant way to examine individual biases.

      We thank the reviewer for the comments.

      Weaknesses:

      The authors have adequately addressed most of my prior concerns.

      We thank the reviewer for recognizing our effort in addressing your concerns.

      My only remaining comment concerns the z-test of the correlations. I agree with the non-parametric test based on bootstrapping at the subject level, providing evidence for significant differences in correlations within the left IFG and IPS.

      However, the parametric test seems inadequate to me. The equation presented is described as the Fisher z-test, but the numerator uses the raw correlation coefficients (r) rather than the Fisher-transformed values (z). To my understanding, the subtraction should involve the Fisher z-scores, not the raw correlations.

      More importantly, the Fisher z-test in its standard form assumes that the correlations come from independent samples, as reflected in the denominator (which uses the n of each independent sample). However, in my opinion, the two correlations are not independent but computed within-subject. In such cases, parametric tests should take into account the dependency. I believe one appropriate method for the current case (correlated correlation coefficients sharing a variable [behavioral slope]) is explained here:

      Meng, X.-l., Rosenthal, R., & Rubin, D. B. (1992). Comparing correlated correlation coefficients. Psychological Bulletin, 111(1), 172-175. https://doi.org/10.1037/0033-2909.111.1.172

      It should be implemented here:

      Diedenhofen B, Musch J (2015) cocor: A Comprehensive Solution for the Statistical Comparison of Correlations. PLoS ONE 10(4): e0121945. https://doi.org/10.1371/journal.pone.0121945

      My recommendation is to verify whether my assumptions hold, and if so, perform a test that takes correlated correlations into account. Or, to focus exclusively on the non-parametric test.

      In any case, I recommend a short discussion of these findings and how the authors interpret that some of the differences in correlations are not significant.

      Thank you for the careful check. This was indeed a mistake on our part. We also agree that the two correlations are not independent. We therefore replaced the test with one that accounts for dependent correlations, following Meng et al. (1992) as suggested by the reviewer. We updated the Methods section on p.56-57:

      “In the parametric test, we adopted the approach of Meng et al. (1992) to statistically compare the two correlation coefficients. This approach specifically tests differences between dependent correlation coefficients according to the following equation

      $$z = \left(z_{r_1} - z_{r_2}\right)\sqrt{\frac{N-3}{2\,(1-r_x)\,h}}$$

      where N is the number of subjects, z<sub>ri</sub> is the Fisher z-transformed value of r<sub>i</sub> (r<sub>1</sub> = r<sub>blue</sub> and r<sub>2</sub> = r<sub>red</sub>), and r<sub>x</sub> is the correlation between the neural sensitivity at change-consistent signals and change-inconsistent signals. The computation of h is based on the following equations

      $$h = \frac{1 - f\,\overline{r^2}}{1 - \overline{r^2}}, \qquad f = \frac{1 - r_x}{2\left(1 - \overline{r^2}\right)}, \qquad \overline{r^2} = \frac{r_1^2 + r_2^2}{2}$$

      where $\overline{r^2}$ is the mean of the $r_i^2$, and f should be set to 1 if f > 1.”
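
      As an illustration only, the Meng et al. (1992) comparison of two dependent correlations that share one variable can be sketched in a few lines of Python. This is a minimal sketch based on the published formula, not the authors' analysis code; the function name `meng_z_test` and its argument names are hypothetical, and the example inputs are made up rather than taken from the study.

```python
import math


def meng_z_test(r1, r2, rx, n):
    """Meng, Rosenthal & Rubin (1992) z-test for two dependent correlations.

    r1, r2 : correlations of the shared variable with each of the two
             other variables (here: hypothetical example values)
    rx     : correlation between the two non-shared variables (must be < 1)
    n      : number of subjects (must be > 3)
    Returns (z, one-tailed p-value for the hypothesis r1 > r2).
    """
    # Fisher z-transform of each correlation (atanh is the Fisher transform).
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Mean of the squared correlations.
    r2_mean = (r1 ** 2 + r2 ** 2) / 2
    # f is capped at 1, as the formula requires.
    f = min((1 - rx) / (2 * (1 - r2_mean)), 1.0)
    h = (1 - f * r2_mean) / (1 - r2_mean)
    z = (z1 - z2) * math.sqrt((n - 3) / (2 * (1 - rx) * h))
    # Upper-tail probability of the standard normal distribution.
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_one_tailed


# Hypothetical inputs for illustration only.
z, p = meng_z_test(r1=0.6, r2=0.2, rx=0.4, n=30)
print(f"z = {z:.4f}, one-tailed p = {p:.4f}")
```

      Unlike the independent-samples Fisher test, the denominator here depends on r<sub>x</sub>, so the test becomes more sensitive as the two neural measures become more strongly correlated with each other.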

      We updated on the Results section on p.29:

      “Since these correlation coefficients were not independent, we compared them using the test developed in Meng et al. (1992) (see Methods). We found that for two of the five ROIs in the frontoparietal network, namely the left IFG and left IPS, the difference in correlation was significant (one-tailed z test; left IFG: z = 1.8908, p = 0.0293; left IPS: z = 2.2584, p = 0.0049). For the remaining three ROIs, the difference in correlation was not significant (dmPFC: z = 0.9522, p = 0.1705; right IFG: z = 0.9860, p = 0.1621; right IPS: z = 1.4833, p = 0.0690).”

      We added a Discussion on these results on p.41:

      “Interestingly, such sensitivity to signal diagnosticity was only present in the frontoparietal network when participants encountered change-consistent signals. However, while most brain areas within this network responded in this fashion, only the left IPS and left IFG showed a significant difference in coding individual participants’ sensitivity to signal diagnosticity between change-consistent and change-inconsistent signals. Unlike the left IPS and left IFG, we observed in dmPFC a marginally significant correlation with behavioral sensitivity at change-inconsistent signals as well. Together, these results indicate that while different brain areas in the frontoparietal network responded similarly to change-consistent signals, there was a greater degree of heterogeneity in responding to change-inconsistent signals.”

      Reviewer #3 (Public review):

      This study concerns how observers (human participants) detect changes in the statistics of their environment, termed regime shifts. To make this concrete, a series of 10 balls are drawn from an urn that contains mainly red or mainly blue balls. If there is a regime shift, the urn is changed over (from mainly red to mainly blue) at some point in the 10 trials. Participants report their belief that there has been a regime shift as a % probability. Their judgement should (mathematically) depend on the prior probability of a regime shift (which is set at one of three levels) and the strength of evidence (also one of three levels, operationalized as the proportion of red balls in the mostly-blue urn and vice versa). Participants are directly instructed of the prior probability of regime shift and proportion of red balls, which are presented on-screen as numerical probabilities. The task therefore differs from most previous work on this question in that probabilities are instructed rather than learned by observation, and beliefs are reported as numerical probabilities rather than being inferred from participants' choice behaviour (as in many bandit tasks, such as Behrens 2007 Nature Neurosci).

      The key behavioural finding is that participants over-estimate the prior probability of regime change when it is low, and under-estimate it when it is high; and participants over-estimate the strength of evidence when it is low and under-estimate it when it is high. In other words, participants make much less distinction between the different generative environments than an optimal observer would. This is termed 'system neglect'. A neuroeconomic-style mathematical model is presented and fit to data.

      Functional MRI results show that strength of evidence for a regime shift (roughly, the surprise associated with a blue ball from an apparently red urn) is associated with activity in the frontal-parietal orienting network. Meanwhile, at time-points where the probability of a regime shift is high, there is activity in another network including vmPFC. Both networks show individual-difference effects, such that people who were more sensitive to strength of evidence and prior probability show more activity in the frontal-parietal and vmPFC-linked networks respectively.

      We thank the reviewer for the overall descriptions of the manuscript.

      Strengths

      (1) The study provides a different task for looking at change-detection and how this depends on estimates of environmental volatility and sensory evidence strength, in which participants are directly and precisely informed of the environmental volatility and sensory evidence strength rather than inferring them through observation as in most previous studies

      (2) Participants directly provide belief estimates as probabilities rather than experimenters inferring them from choice behaviour as in most previous studies

      (3) The results are consistent with well-established findings that surprising sensory events activate the frontal-parietal orienting network whilst updating of beliefs about the world ('regime shift') activates vmPFC.

      Thank you for these assessments.

      Weaknesses

      (1) The use of numerical probabilities (both to describe the environments to participants, and for participants to report their beliefs) may be problematic because people are notoriously bad at interpreting probabilities presented in this way, and show poor ability to reason with this information (see Kahneman's classic work on probabilistic reasoning, and how it can be improved by using natural frequencies). Therefore the fact that, in the present study, people do not fully use this information, or use it inaccurately, may reflect the mode of information delivery.

      We appreciate the reviewer’s concern about this issue. The concern was addressed in Massey and Wu (2005), where participants performed a choice task in which they were not asked to provide probability estimates (Study 3 in Massey and Wu, 2005). Instead, participants in Study 3 were asked to predict the color of the ball before seeing a signal. This was a more intuitive way for participants to indicate their belief about regime shift. The results from the choice task were identical to those found in the probability estimation task (Study 1 in Massey and Wu, 2005). We take this as evidence that the system-neglect behavior the participants showed was less likely to be due to the mode of information delivery.

      (2) Although a very precise model of 'system neglect' is presented, many other models could fit the data.

      For example, you would get similar effects due to attraction of parameter estimates towards a global mean - essentially application of a hyper-prior in which the parameters applied by each participant in each block are attracted towards the experiment-wise mean values of these parameters. For example, the prior probability of regime shift ground-truth values [0.01, 0.05, 0.10] are mapped to subjective values of [0.037, 0.052, 0.069]; this would occur if observers apply a hyper-prior that the probability of regime shift is about 0.05 (the average value over all blocks). This 'attraction to the mean' is a well-established phenomenon and cannot be ruled out with the current data (I suppose you could rule it out by comparing to another dataset in which the mean ground-truth value was different).

      We thank the reviewer for this comment. It is true that the system-neglect model is not entirely inconsistent with regression to the mean, regardless of whether the implementation has a hyper prior or not. In fact, our behavioral measure of sensitivity to transition probability and signal diagnosticity, which we termed the behavioral slope, is based on linear regression analysis. In general, the modeling approach in this paper is to start from a generative model that defines ideal performance and consider modifying the generative model when systematic deviations in actual performance from the ideal are observed. In this approach, a generative Bayesian model with hyper priors would be more complex to begin with, and a regression to the mean idea by itself does not generate a priori predictions.
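
      To make the notion of a regression-based sensitivity measure concrete, the following minimal sketch regresses the subjective transition probabilities quoted by the reviewer on the ground-truth values. It is an illustration under stated assumptions, not the authors' analysis: the helper `ols_slope` is hypothetical, and the authors' behavioral slope may be computed on different quantities. A slope of 1 would indicate full sensitivity to the environment, whereas a slope below 1 indicates compression toward the mean, which both system neglect and a hyper-prior account would produce.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Ground-truth and subjective transition probabilities quoted in the review.
true_q = [0.01, 0.05, 0.10]
subjective_q = [0.037, 0.052, 0.069]

slope = ols_slope(true_q, subjective_q)
print(f"sensitivity slope = {slope:.3f}")  # well below 1: compressed sensitivity
```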

      More generally, any model in which participants don't fully use the numerical information they were given would produce apparent 'system neglect'. Four qualitatively different example reasons are: 1. Some individual participants completely ignored the probability values given. 2. Participants did not ignore the probability values given, but combined them with a hyperprior as above. 3. Participants had a reporting bias where their reported beliefs that a regime-change had occurred tend to be shifted towards 50% (rather than reporting 'confident' values such as 5% or 95%). 4. Participants underweighted probability outliers resulting in underweighting of evidence in the 'high signal diagnosticity' environment (10.1016/j.neuron.2014.01.020)

      In summary I agree that any model that fits the data would have to capture the idea that participants don't differentiate between the different environments as much as they should, but I think there are a number of qualitatively different reasons why they might do this - of which the above are only examples - hence I find it problematic that the authors present the behaviour as evidence for one extremely specific model.

      Thank you for raising this point. The modeling principle we adopt is the following. We start from the normative model—the Bayesian model—that defined what normative behavior should look like. We compared participants’ behavior with the Bayesian model and found systematic deviations from it. To explain those systematic deviations, we considered modeling options within the confines of the same modeling framework. In other words, we considered a parameterized version of the Bayesian model, which is the system-neglect model and examined through model comparison the best modeling choice. This modeling approach is not uncommon in economics and psychology. For example, Kahneman and Tversky adopted this approach when proposing prospect theory, a modification of expected utility theory where expected utility theory can be seen as one specific model for how utility of an option should be computed.

      (3) Despite efforts to control confounds in the fMRI study, including two control experiments, I think some confounds remain.

      For example, a network of regions is presented as correlating with the cumulative probability that there has been a regime shift in this block of 10 samples (Pt). However, regardless of the exact samples shown, doesn't Pt always increase with sample number (as by the time of later samples, there have been more opportunities for a regime shift)? Unless this is completely linear, the effect won't be controlled by including trial number as a co-regressor (which was done).

      Thank you for raising this concern. Yes, Pt always increases with sample number regardless of evidence (seeing change-consistent or change-inconsistent signals). This is captured by the ‘intertemporal prior’ in the Bayesian model, which we included as a regressor in our GLM analysis (GLM-2), in addition to Pt. In short, GLM-1 had Pt and sample number. GLM-2 had Pt, intertemporal prior, and sample number, among other regressors. And we found that, in both GLM-1 and GLM-2, both vmPFC and ventral striatum correlated with Pt.

      To make this clearer, we updated the main text to further clarify this on p.18:

      “We examined the robustness of P<sub>t</sub> representations in these two regions in several follow-up analyses. First, we implemented a GLM (GLM-2 in Methods) that, in addition to P<sub>t</sub>, included various task-related variables contributing to P<sub>t</sub> as regressors (Fig. S7 in SI). Specifically, to account for the fact that the probability of regime change increased over time, we included the intertemporal prior as a regressor in GLM-2. The intertemporal prior is the natural logarithm of the odds in favor of regime shift in the t-th period, where q is the transition probability and t = 1,…,10 is the period (see Eq. 1 in Methods). It describes normatively how the prior probability of change increased over time regardless of the signals (blue and red balls) the subjects saw during a trial. Including it along with P<sub>t</sub> would clarify whether any effect of P<sub>t</sub> can otherwise be attributed to the intertemporal prior. Second, we implemented a GLM that replaced P<sub>t</sub> with the log odds of P<sub>t</sub>, ln (P<sub>t</sub>/(1-P<sub>t</sub>)) (Fig. S8 in SI). Third, we implemented a GLM that examined P<sub>t</sub> separately on periods when change-consistent (blue balls) and change-inconsistent (red balls) signals appeared (Fig. S9 in SI). Each of these analyses showed the same pattern of correlations between P<sub>t</sub> and activation in vmPFC and ventral striatum, further establishing the robustness of the P<sub>t</sub> findings.”
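
      The reviewer's point that the prior probability of change grows nonlinearly with sample number can be illustrated with a short sketch. It assumes, for illustration only, that a shift can occur at most once per trial with a constant per-period probability q, so that the probability of a shift having occurred by period t is 1 - (1 - q)^t; the exact form of Eq. 1 in the manuscript is not reproduced here and may differ. The function names are hypothetical.

```python
import math


def prob_shift_by(t, q):
    """Probability that a shift has occurred by period t.

    Assumption (not taken from the manuscript): at most one shift per trial,
    occurring with constant probability q in each period.
    """
    return 1 - (1 - q) ** t


def intertemporal_log_odds(t, q):
    """Natural log of the prior odds in favor of a shift by period t."""
    p = prob_shift_by(t, q)
    return math.log(p / (1 - p))


# The three transition probabilities used in the task.
for q in (0.01, 0.05, 0.10):
    values = [intertemporal_log_odds(t, q) for t in range(1, 11)]
    steps = [b - a for a, b in zip(values, values[1:])]
    print(f"q = {q:.2f}: first step = {steps[0]:.3f}, last step = {steps[-1]:.3f}")
```

      Under this assumption the log odds rise at every period, but the increments shrink over time, so a linear sample-number regressor alone would not fully absorb the trend. This is why a dedicated intertemporal-prior regressor is informative.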

      On the other hand, two additional fMRI experiments are done as control experiments and the effect of Pt in the main study is compared to Pt in these control experiments. Whilst I admire the effort in carrying out control studies, I can't understand how these particular experiment are useful controls. For example in experiment 3 participants simply type in numbers presented on the screen - how can we even have an estimate of Pt from this task?

      We thank the reviewer for this comment. On the one hand, the effect of Pt we see in brain activity could simply be due to motor confounds, and the purpose of Experiment 3 was to control for them. Our question was: if subjects saw a similar visual layout and were only instructed to press buttons to indicate two-digit numbers, would we observe the same activity in vmPFC, ventral striatum, and the frontoparietal network as in the main experiment (Experiment 1)?

      On the other hand, the effect of Pt could simply reflect estimates of the probability that the current regime is the blue regime, and therefore not be specific to change detection. In other words, it is possible that the regions we identified were generally associated with probability estimation and not specifically with probability estimates of change. In Experiment 2, we tested this idea, namely whether what we found about Pt was unique to change detection. Subjects in Experiment 2 estimated the probability that the current regime is the blue regime (just as they did in Experiment 1), except that there were no regime shifts involved.

      To make the purpose of the two control experiments clearer, we updated the paragraph describing the control experiments on page 9:

      “To establish the neural representations for regime-shift estimation, we performed three fMRI experiments (n = 30 subjects for each experiment, 90 subjects in total). Experiment 1 was the main experiment, while Experiments 2 and 3 were control experiments that ruled out two important confounds (Fig. 1E). The control experiments were designed to clarify whether any effect of subjects’ probability estimates of a regime shift, P<sub>t</sub>, in brain activity can be uniquely attributed to change detection. Here we considered two major confounds that can contribute to the effect of P<sub>t</sub>. First, since subjects in Experiment 1 made judgments about the probability that the current regime is the blue regime (which corresponded to probability of regime change), the effect of P<sub>t</sub> might not be specific to change detection. To address this issue, in Experiment 2 subjects made exactly the same judgments as in Experiment 1 except that the environments were stationary (no transition from one regime to another was possible), as in Edwards' (1968) classic “bookbag-and-poker chip” studies. Subjects in both experiments had to estimate the probability that the current regime is the blue regime, but this estimation corresponded to the estimates of regime change only in Experiment 1. Therefore, activity that correlated with probability estimates in Experiment 1 but not in Experiment 2 can be uniquely attributed to representing regime-shift judgments. Second, the effect of P<sub>t</sub> can be due to motor preparation and/or execution, as subjects in Experiment 1 entered two-digit numbers with button presses to indicate their probability estimates. To address this issue, in Experiment 3 subjects performed a task where they were presented with two-digit numbers and were instructed to enter the numbers with button presses. By comparing the fMRI results of these experiments, we were therefore able to establish the neural representations that can be uniquely attributed to the probability estimates of regime-shift.”

      To further make sure that the probability-estimate signals in Experiment 1 were not due to motor confounds, we implemented an action-handedness regressor in the GLM, as we described below on page 19:

      “Finally, we note that in GLM-1, we implemented an “action-handedness” regressor to directly address the motor-confound issue, that higher probability estimates preferentially involved right-handed responses for entering higher digits. The action-handedness regressor was parametric, coding -1 if both finger presses involved the left hand (e.g., a subject pressed “23” as her probability estimate when seeing a signal), 0 if using one left finger and one right finger (e.g., “75”), and 1 if both finger presses involved the right hand (e.g., “90”). Taken together, these results ruled out motor confounds and suggested that vmPFC and ventral striatum represent subjects’ probability estimates of change (regime shifts) and belief revision.”
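
      The coding scheme for the action-handedness regressor can be sketched as follows. The digit-to-hand mapping is an assumption inferred from the three examples in the quoted passage ("23", "75", "90"): digits 1 to 5 are taken to be pressed with the left hand, and digits 6 to 9 and 0 with the right hand. The actual button layout is not stated here, and the function name is hypothetical.

```python
# Assumed mapping (not confirmed by the text): 1-5 left hand, 6-9 and 0 right hand.
LEFT_HAND_DIGITS = set("12345")
RIGHT_HAND_DIGITS = set("67890")


def action_handedness(estimate: str) -> int:
    """Code a two-digit response as -1 (both left), 0 (mixed), or 1 (both right)."""
    valid = LEFT_HAND_DIGITS | RIGHT_HAND_DIGITS
    if len(estimate) != 2 or any(d not in valid for d in estimate):
        raise ValueError("estimate must be a two-digit string such as '23'")
    right_presses = sum(d in RIGHT_HAND_DIGITS for d in estimate)
    # 0 right-hand presses -> -1, 1 -> 0, 2 -> +1
    return right_presses - 1


# The three examples given in the quoted passage.
for response in ("23", "75", "90"):
    print(response, action_handedness(response))
```

      Because the regressor depends only on which hand pressed each digit, it can absorb variance that would otherwise be shared between high probability estimates and right-hand motor activity.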

      (4) The Discussion is very long, and whilst a lot of related literature is cited, I found it hard to pin down within the discussion, what the key contributions of this study are. In my opinion it would be better to have a short but incisive discussion highlighting the advances in understanding that arise from the current study, rather than reviewing the field so broadly.

      Thank you. We thank the reviewer for pushing us to highlight the key contributions. In response, we added a paragraph at the beginning of Discussion to better highlight our contributions:

      “In this study, we investigated how humans detect changes in their environments and the neural mechanisms that contribute to how we might under- and overreact in our judgments. Combining a novel behavioral paradigm with computational modeling and fMRI, we discovered that sensitivity to environmental parameters that directly impact change detection is a key mechanism for under- and overreactions. This mechanism is implemented by distinct brain networks in the frontal and parietal cortices, in accordance with the computational roles they play in change detection. By introducing the system-neglect framework and providing evidence for its neural implementation, this study offers both theoretical and empirical insights into how systematic judgment biases arise in dynamic environments.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Many of the figures are too tiny - the writing is very small, as are the pictures of brains. I'd suggest adjusting these so they will be readable without enlarging.

      Thank you. We apologize for the poor readability of the figures. We have enlarged the figures (Fig. 5 in particular) and their font size to make them more readable.

    1. Author response:

      General Response

      We thank the reviewers for their positive assessment of our work and for acknowledging the timeliness of the problem and the novelty of using domain adaptation to address model mismatch. We appreciate the constructive feedback regarding validation and clarity. In the revised manuscript, we will address these points as follows:

      (1) Systematic Validation: We will design and perform systematic in silico experiments to evaluate the method beyond the single in vivo dataset, including robustness tests regarding recording length and network synchrony.

      (2) Recurrent Networks & Failure Analysis: We will test our method on synthetic datasets generated from highly recurrent networks and analyze exactly when the method breaks as a function of mismatch magnitude.

      (3) Method Comparisons: We will report the Matthews Correlation Coefficient (MCC) for the approach by English et al. (2017) and expand our comparison and discussion of GLM-based methods.

      (4) Clarifications: We will rigorously define the dataset details (labeling, recording methodology), mathematical notation, and machine learning terminology ('data', 'labels').

      (5) Discussion of Limitations: We will explicitly discuss the challenges and limitations inherent in generalizing to more recurrently connected regions.

      Below are our more detailed responses:

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The validation of the approach is incomplete: due to its very limited size, the single ground-truth dataset considered does not provide a sufficient basis to draw a strong conclusion. While the authors correctly note that this is the only dataset of its kind, the value of this validation is limited compared to what could be done by carefully designing in silico experiments.

      We thank the reviewer for acknowledging the scarcity of suitable in vivo ground-truth datasets and the limitations this poses. We agree that additional validation is necessary to draw strong conclusions. In the revised manuscript, we will systematically design and perform in silico experiments for evaluations beyond the single in vivo dataset.

      (2) Surprisingly, the authors fail to compare their method to the approach originally proposed for the data they validate on (English et al., 2017).

      We agree that this is an essential comparison. We will report the Matthews Correlation Coefficient (MCC) result of the approach by English et al. (2017) on the spontaneous period of the recording.
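
      For readers unfamiliar with the metric, the Matthews Correlation Coefficient can be computed directly from the confusion matrix of predicted versus ground-truth connections. The sketch below is a generic illustration with made-up counts, not the evaluation pipeline used in the study; the function name `mcc` and the zero-denominator convention (returning 0) are assumptions.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    tp/fp: predicted connections that are / are not true connections
    tn/fn: predicted non-connections that are / are not true non-connections
    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for a sparse network: few true connections, many non-connections.
print(f"MCC = {mcc(tp=8, fp=2, tn=185, fn=5):.3f}")
```

      MCC uses all four cells of the confusion matrix, which makes it more informative than accuracy when connected pairs are rare relative to unconnected ones.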

      (3) The authors make a commendable effort to study the method's robustness by pushing the limits of the dataset. However, the logic of the robustness analysis is often unclear, and once again, the limited size of the dataset poses major limitations to the authors.

      We appreciate the reviewer recognizing our initial efforts to evaluate robustness. In our original draft, we tested recording length, network model choices, and analyzed failure cases. However, we agree that the limited real data restricts the scope of these tests. To address this, we will perform more systematic robustness tests on the newly generated synthetic datasets in the revised version, allowing us to evaluate performance under a wider range of conditions.

      (4) The lack of details concerning both the approach and the validation makes it challenging for the reader to establish the technical soundness of the study.

      We will revise the manuscript thoroughly to better present the methodology of our framework and the validation pipelines. We will ensure that the figures and text clearly articulate the technical details required to assess the soundness of the study.

      Although in the current form this study does not provide enough basis to judge the impact of DeepDAM in the broader neuroscience community, it nevertheless puts forward a valuable and novel idea: using domain adaptation to mitigate the problem of model mismatch. This approach might be leveraged in future studies and methods to infer connectivity.

      We thank the reviewer again for acknowledging the novelty and importance of our work.

      Reviewer #2 (Public review):

      While the validation data set was well chosen and of high quality, it remains a single dataset and also remains a non-recurrent network. The authors acknowledge this in the discussion, but I wanted to chime in to say that for the method to be more than convincing, it would need to have been tested on more datasets. It should be acknowledged that the problem becomes more complicated in a recurrent excitatory network, and thus the method may not work as well in the cortex or in CA3.

      We will carefully revise our text to specifically discuss this limitation and the challenges inherent in generalizing to more recurrently connected regions. Furthermore, to empirically address this concern, we will test our method extensively on synthetic datasets generated from highly recurrent networks to quantify performance in these regimes.

      While the method is shown to work on this particular dataset (plus the two others at the end), I was left wondering when the method breaks. And it should break if the models are sufficiently mismatched. Such a question can be addressed using synthetic-synthetic models. This was an important intuition, and an important check on the general nature of the method, that I was missing.

      We thank the reviewer for this insight regarding the general nature of the method. While we previously analyzed failure cases regarding strong covariation and low spike counts, we agree that a systematic analysis of mismatch magnitude is missing. Building on our planned experiments with synthetic data, we will analyze and discuss exactly when the method breaks as a function of the mismatch magnitude between datasets.

      While the choice of state-of-the-art is good in my opinion, I was looking for comments on the methods prior to that. For instance, methods based on GLMs have been used by the Pillow, Paninski, and Truccolo groups. I could not find a decent discussion of these methods in the main text and thought that both their acknowledgement and the rationale for dismissing them were missing.

      As the reviewer noted, we extensively compared our method with a GLM-based method (GLMCC) and CoNNECT, whose superiority over other GLM-based methods, such as the extended GLM method (Ren et al., 2020, J Neurophysiol), has already been demonstrated in the original paper (Endo et al., Sci Rep, 2021). However, we acknowledge that the discussion of the broader GLM literature was insufficient. To make the comparison more thorough, we will conduct comparisons with additional GLM-based methods and include a detailed discussion of these approaches.

      Endo, D., Kobayashi, R., Bartolo, R., Averbeck, B. B., Sugase-Miyamoto, Y., Hayashi, K., ... & Shinomoto, S. (2021). A convolutional neural network for estimating synaptic connectivity from spike trains. Scientific Reports, 11(1), 12087.

      Ren, N., Ito, S., Hafizi, H., Beggs, J. M., & Stevenson, I. H. (2020). Model-based detection of putative synaptic connections from spike recordings with latency and type constraints. Journal of Neurophysiology, 124(6), 1588-1604.

      While most of the text was very clear, I thought that page 11 was odd and missing much in terms of introductions. Foremost is the introduction of the dataset, which is never really done. Page 11 refers to 'this dataset', while the previous sentence was saying that having such a dataset would be important and is challenging. The dataset needs to be properly described: what's the method for labeling, what's the brain area, what were the spike recording methodologies, what is meant by two labeling methodologies, what do we know about the idiosyncrasies of the particular network the recording came from (like CA1 is non-recurrent, so which connections)? I was surprised to see 'English et al.' cited in text only on page 13 since their data has been hailed from the beginning.

      Further elements that needed definition are Nsyn and i, which were not defined in the context of Equations 2-3: I was not sure if they referred to different samples or different variants of the synthetic model. I also would have preferred having the function f defined earlier, as it is defined for Equation 3, but appears in Equation 2.

      When the loss functions are described, it would be important to define 'data' and 'labels' here. This machine learning jargon has a concrete interpretation in this context, and making this concrete would be very important for the readership.

      We thank the reviewer for these constructive comments on the writing. We will clarify the introduction of the dataset (labeling method, brain area, recording methodology) and ensure all mathematical terms (such as Nsyn, i, and function f) and machine learning terminology (definitions of 'data' and 'labels' in this context) are rigorously defined upon first use in the revised manuscript.

      While I appreciated that there was a section on robustness, I did not find that the features studied were the most important. In this context, I was surprised that the other datasets were relegated to supplementary, as these appeared more relevant.

      Robustness is an important aspect of our framework to demonstrate its applicability to real experimental scenarios. We specifically analyzed how synchrony between neurons, the number of recorded spikes and the choice of the network influence the performance of our method. We also agree that these aspects are limited by the one dataset we evaluated on. Therefore, we will test the robustness of our method more systematically on synthetic datasets.

      With more extensive analysis on synthetic datasets, we believe that the results on inferring biophysical properties of single neuron and microcircuit models should remain in the supplement, such that the main figures focus purely on synaptic connectivity inference.

      Some of the figures have text that is too small. In particular, Figure 2 has text that is way too small. It seemed to me that the pseudo code could stand alone, and the screenshot of the equations did not need to be repeated in a figure, especially if their size becomes so small that we can't even read them.

      We will remove the pseudo-code and equations from Figure 2 to improve readability. The pseudo-code will be presented as a distinct box in the main text.

    1. Author response:

      Thank you very much for the constructive feedback on our manuscript, "Simple Methods to Acutely Measure Multiple Timing Metrics among Sexual Repertoire of Male Drosophila," and for the opportunity to address the reviewers' comments. We appreciate the time and effort the reviewers have invested in evaluating our work, and we agree that their suggestions will significantly strengthen the manuscript.

      We are currently working diligently to address all the concerns raised in the public reviews and recommendations. Below is an outline of the major revisions we plan to implement in the revised version:

      (1) Statistical Rigor and Analysis

      We acknowledge the statistical limitations pointed out by Reviewer #2. We will re-analyze the multi-group data in Figures 3 and 4 using one-way and two-way ANOVA, respectively, with appropriate post-hoc tests (e.g., Tukey's HSD), to properly account for multiple comparisons and interaction effects between genotype and training conditions.
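
      As a rough illustration of the one-way ANOVA step named above, the sketch below computes only the F statistic and its degrees of freedom from group samples. The function name and the mating-duration values are hypothetical and are not taken from the manuscript. The p-value, the two-way model, and the Tukey's HSD post-hoc test would normally come from a statistics package and are not shown.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical mating durations (minutes) for three genotypes.
groups = [[20.0, 21.0, 19.0], [15.0, 16.0, 14.0], [20.0, 22.0, 21.0]]
f_stat, df_b, df_w = one_way_anova_f(groups)
```

      A large F relative to its degrees of freedom suggests that at least one group mean differs. Identifying which pairs differ is the job of the post-hoc test.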

      (2) Comparison with Existing Tools

      As suggested by both reviewers, we will provide a detailed comparison of DrosoMating with established automated tracking systems (e.g., FlyTracker, JAABA, Ctrax), and we will describe specific use cases where DrosoMating offers distinct advantages in terms of cost, accessibility, and ease of use for high-throughput screening.

      (3) Control for Locomotor Activity

      To address the potential confound of general locomotor deficits in w1118 and y1 mutants, we will calculate and present general locomotion metrics (e.g., average velocity, total distance traveled) from our tracking data to dissociate motor defects from specific courtship deficits.
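
      The two locomotion metrics mentioned above can be sketched as follows, assuming the tracking data are available as per-frame (x, y) centroids in millimetres with a known frame rate. The function name, the units, and the example track are assumptions for illustration and do not describe DrosoMating's actual output format.

```python
import math


def locomotion_metrics(positions, fps):
    """Return (total_distance, mean_speed) for a list of (x, y) positions.

    positions: one (x, y) centroid per frame, in mm (assumed unit).
    fps: frames per second of the recording.
    """
    if len(positions) < 2:
        return 0.0, 0.0
    # Sum the straight-line displacement between consecutive frames.
    total = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
    duration = (len(positions) - 1) / fps  # seconds covered by the track
    return total, total / duration


# Hypothetical four-frame track recorded at 2 frames per second.
track = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0), (11.0, 10.0)]
total_distance, mean_speed = locomotion_metrics(track, fps=2.0)
```

      Comparing these values across genotypes is one way to check whether a courtship phenotype coincides with a general reduction in movement.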

      (4) Software Capabilities and Cross-Species Applicability

      We will clarify how DrosoMating handles fly identification during mating (including occlusion management). We will also discuss or test the software's applicability across different *Drosophila* species, as requested.

      (5) Minor Corrections

      We will address all textual errors, standardize terminology (e.g., "Mating Duration" vs. "Copulation Duration"), improve figure legibility, and provide complete statistical details for all figures.

      We believe these revisions will substantially improve the rigor, clarity, and utility of our manuscript. We aim to resubmit the revised version within the standard timeframe and will ensure the preprint is updated accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This fundamental study identifies a new mechanism that involves a mycobacterial nucleomodulin manipulation of the host histone methyltransferase COMPASS complex to promote infection. Although other intracellular pathogens are known to manipulate histone methylation, this is the first report demonstrating the specific targeting of the COMPASS complex by a pathogen. The rigorous experimental design using state-of-the art bioinformatic analysis, protein modeling, molecular and cellular interaction, and functional approaches, culminating with in vivo infection modeling, provides convincing, unequivocal evidence that supports the authors' claims. This work will be of particular interest to cellular microbiologists working on microbial virulence mechanisms and effectors, specifically nucleomodulins, and cell/cancer biologists that examine COMPASS dysfunction in cancer biology.

      Strengths:

      (1) The strengths of this study include the rigorous and comprehensive experimental design that involved numerous state-of-the-art approaches to identify potential nucleomodulins, define molecular nucleomodulin-host interactions, cellular nucleomodulin localization, intracellular survival, and inflammatory gene transcriptional responses, and confirmation of the inflammatory and infection phenotype in a small animal model.

      (2) The use of bioinformatic, cellular, and in vivo modeling that are consistent and support the overall conclusions is a strength of the study. In addition, the rigorous experimental design and data analysis, including the supplemental data provided, further strengthen the evidence supporting the conclusions.

      Weaknesses:

      (1) This work could be stronger if the MgdE-COMPASS subunit interactions that negatively impact COMPASS complex function were better defined. Since the COMPASS complex consists of many enzymes, examining the functional impact on each of the components would be interesting.

      We thank the reviewer for this insightful comment. Biochemical assays could help to interpret the functional impact of the MgdE interaction on each of the components. However, purifying the COMPASS complex is itself a difficult task because of the complexity of the full complex, its dynamic structural properties, and its limited solubility.

      (2) Examining the impact of WDR5 inhibitors on histone methylation, gene transcription, and mycobacterial infection could provide additional rigor and provide useful information related to the mechanisms and specific role of WDR5 inhibition on mycobacterial infection.

      We thank the reviewer for the comment. A previous study showed that WIN-site inhibitors, such as compound C6, can displace WDR5 from chromatin, leading to a reduction in global H3K4me3 levels and suppression of immune-related gene expression (Hung et al., Nucleic Acids Res, 2018; Bryan et al., Nucleic Acids Res, 2020). These results closely mirror the functional effects we observed for MgdE, suggesting that MgdE may act as a functional mimic of WDR5 inhibition. This supports our proposed model in which MgdE disrupts COMPASS activity by targeting WDR5, thereby dampening host pro-inflammatory responses.

      (3) The interaction between MgdE and COMPASS complex subunit ASH2L is relatively undefined, and studies to understand the relationship between WDR5 and ASH2L in COMPASS complex function during infection could provide interesting molecular details that are undefined in this study.

      We thank the reviewer for the comment. In this study, we constructed single and multiple point mutants of MgdE at residues S<sup>80</sup>, D<sup>244</sup>, and H<sup>247</sup> to identify key amino acids involved in its interaction with ASH2L (Figure 5A and B; New Figure S4C). However, these mutations did not disrupt the interaction between MgdE and ASH2L, suggesting that additional residues are involved in the interaction.

      ASH2L and WDR5 function cooperatively within the WRAD module to stabilize the SET domain and promote H3K4 methyltransferase activity under physiological conditions (Couture and Skiniotis, Epigenetics, 2013; Qu et al., Cell, 2018; Rahman et al., Proc Natl Acad Sci U S A, 2022). ASH2L interacts with RbBP5 via its SPRY domain, whereas WDR5 bridges MLL1 and RbBP5 through the WIN and WBM motifs (Chen et al., Cell Res, 2012; Park et al., Nat Commun, 2019). The interaction status between ASH2L and WDR5 during mycobacterial infection could not be determined in our current study.

      (4) The AlphaFold prediction results for all the nuclear proteins examined could be useful. Since the interaction predictions with COMPASS subunits range from 0.77 for WDR5 to 0.47 for ASH2L, it is not clear how the focus on the COMPASS complex over other nuclear proteins was determined.

      We thank the reviewer for the comment. We employed AlphaFold to predict the interactions between MgdE and the major nuclear proteins. This screen identified several subunits of the SET1/COMPASS complex as high-confidence candidates for interaction with MgdE (Figure S4A). This result is consistent with a proteomic study by Penn et al. which reported potential interactions between MgdE and components of the human SET1/COMPASS complex based on affinity purification-mass spectrometry analysis (Penn et al., Mol Cell, 2018).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Chen et al addresses an important aspect of pathogenesis for mycobacterial pathogens, seeking to understand how bacterial effector proteins disrupt the host immune response. To address this question, the authors sought to identify bacterial effectors from M. tuberculosis (Mtb) that localize to the host nucleus and disrupt host gene expression as a means of impairing host immune function.

      Strengths:

      The researchers conducted a rigorous bioinformatic analysis to identify secreted effectors containing mammalian nuclear localization signal (NLS) sequences, which formed the basis of quantitative microscopy analysis to identify bacterial proteins that had nuclear targeting within human cells. The study used two complementary methods to detect protein-protein interaction: yeast two-hybrid assays and reciprocal immunoprecipitation (IP). The combined use of these techniques provides strong evidence of interactions between MgdE and SET1 components and suggests that the interactions are, in fact, direct. The authors also carried out a rigorous analysis of changes in gene expression in macrophages infected with the mgdE mutant BCG. They found strong and consistent effects on key cytokines such as IL6 and CSF1/2, suggesting that nuclear-localized MgdE does, in fact, alter gene expression during infection of macrophages.

      Weaknesses:

      There are some drawbacks in this study that limit the application of the findings to M. tuberculosis (Mtb) pathogenesis. The first concern is that much of the study relies on ectopic overexpression of proteins either in transfected non-immune cells (HEK293T) or in yeast, using 2-hybrid approaches. Some of their data in 293T cells is hard to interpret, and it is unclear if the protein-protein interactions they identify occur during natural infection with mycobacteria. The second major concern is that pathogenesis is studied using the BCG vaccine strain rather than virulent Mtb. However, overall, the key findings of the paper - that MgdE interacts with SET1 and alters gene expression are well-supported.

      We thank the reviewer for the comment. We agree that ectopic overexpression may not fully reflect the native state, although this approach has been adopted in many similar studies (Drerup et al., Molecular plant, 2013; Chen et al., Cell host & microbe, 2018; Ge et al., Autophagy, 2021). In addition, we will perform an MgdE localization experiment using Mtb-infected macrophages to strengthen the evidence under natural infection conditions.

      We agree with the reviewer that the BCG strain cannot fully recapitulate the pathogenicity or immunological complexity of M. tuberculosis infection. We employed BCG as a biosafe surrogate model because it has been accepted in many related studies (Wang et al., Nat Immunol, 2015; Wang et al., Nat Commun, 2017; Péan et al., Nat Commun, 2017; Li et al., J Biol Chem, 2020).

      Reviewer #3 (Public review):

      In this study, Chen L et al. systematically analyzed the mycobacterial nucleomodulins and identified MgdE as a key nucleomodulin in pathogenesis. They found that MgdE enters into host cell nucleus through two nuclear localization signals, KRIR<sup>108-111</sup> and RLRRPR<sup>300-305</sup>, and then interacts with COMPASS complex subunits ASH2L and WDR5 to suppress H3K4 methylation-mediated transcription of pro-inflammatory cytokines, thereby promoting mycobacterial survival. This study is potentially interesting, but there are several critical issues that need to be addressed to support the conclusions of the manuscript.

      (1) Figure 2: The study identified MgdE as a nucleomodulin in mycobacteria and demonstrated its nuclear translocation via dual NLS motifs. The authors examined MgdE nuclear translocation through ectopic expression in HEK293T cells, which may not reflect physiological conditions. Nuclear-cytoplasmic fractionation experiments under mycobacterial infection should be performed to determine MgdE localization.

      We thank the reviewer for this insightful comment. In the revised manuscript, we addressed this concern by performing nuclear-cytoplasmic fractionation experiments using M. bovis BCG-infected macrophages to assess the subcellular localization of MgdE (New Figure 2F) (Lines 146–155). Nuclear-cytoplasmic fractionation experiments showed that WT MgdE and the NLS single mutants (MgdE<sup>ΔNLS1</sup> and MgdE<sup>ΔNLS2</sup>) could be detected both in the cytoplasm and in the nucleus, while the double mutant MgdE<sup>ΔNLS1-2</sup> was detectable only in the cytoplasm. These findings strongly indicate that MgdE is capable of translocating into the host cell nucleus during BCG infection, and that this nuclear localization relies on the dual NLS motifs.

      (2) Figure 2F: The authors detected MgdE-EGFP using an anti-GFP antibody, but EGFP as a control was not detected in its lane. The authors should address this technical issue.

      We thank the reviewer for this question. In the revised manuscript, we have included the uncropped immunoblot images, which clearly show the EGFP band in the corresponding lane. These have been provided in the New Figure 2E.

      (3) Figure 3C-3H: The data showing that the expression of all detected genes in 24 h is comparable to that in 4 h (but not 0 h) during WT BCG infection is beyond comprehension. The issue is also present in Figure 7C, Figure 7D, and Figure S7. Moreover, since Il6, Il1β (pro-inflammatory), and Il10 (anti-inflammatory) were all upregulated upon MgdE deletion, how do the authors explain the phenomenon that MgdE deletion simultaneously enhanced these gene expressions?

      We thank the reviewer for the comment. A relative quantification method was used in our qPCR experiments to normalize the WT expression levels in Figure 3C–3H, Figure 7C, 7D, and New Figure S6.

      The concurrent induction of both types of cytokines likely represents a dynamic host strategy to fine-tune immune responses during infection. This interpretation is supported by previous studies (Podleśny-Drabiniok et al., Cell Rep, 2025; Cicchese et al., Immunological Reviews, 2018).
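
      The response above refers to a relative quantification method for the qPCR data without naming it. One widely used approach is the 2^-ΔΔCt calculation, sketched below. Whether the authors used exactly this method is an assumption, as are the function name and the Ct values. The formula also assumes close to 100% amplification efficiency for both the target and the reference gene.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change of a target gene by the 2^-ddCt method.

    ct_target, ct_ref: Ct values of the target and reference gene in the sample.
    ct_target_cal, ct_ref_cal: the same Ct values in the calibrator
    (for example, the WT infection that other conditions are normalized to).
    """
    delta_ct_sample = ct_target - ct_ref            # normalize to reference gene
    delta_ct_calibrator = ct_target_cal - ct_ref_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target amplifies two cycles earlier in the sample.
fold = relative_expression(ct_target=22.0, ct_ref=18.0,
                           ct_target_cal=24.0, ct_ref_cal=18.0)
```

      Under this scheme the calibrator condition is 1 by construction, so similar values at two time points mean similar expression relative to the calibrator, not necessarily similar absolute transcript levels.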

      (4) Figure 5: The authors confirmed the interactions between MgdE and WDR5/ASH2L. How does the interaction between MgdE and WDR5 inhibit COMPASS-dependent methyltransferase activity? Additionally, the precise MgdE-ASH2L binding interface and its functional impact on COMPASS assembly or activity require clarification.

      We thank the reviewer for this insightful comment. We cautiously speculate that the MgdE interaction inhibits COMPASS-dependent methyltransferase activity by interfering with the integrity and stability of the COMPASS complex. Accordingly, we have incorporated the following discussion into the revised manuscript (Lines 303-315):

      “The COMPASS complex facilitates H3K4 methylation through a conserved assembly mechanism involving multiple core subunits. WDR5, a central scaffolding component, interacts with RbBP5 and ASH2L to promote complex assembly and enzymatic activity (Qu et al., 2018; Wysocka et al., 2005). It also recognizes the WIN motif of methyltransferases such as MLL1, thereby anchoring them to the complex and stabilizing the ASH2L-RbBP5 dimer (Hsu et al., Cell, 2018). ASH2L further contributes to COMPASS activation by interacting with both RbBP5 and DPY30 and by stabilizing the SET domain, which is essential for efficient substrate recognition and catalysis (Qu et al., Cell, 2018; Park et al., Nat Commun, 2019). Our work shows that MgdE binds both WDR5 and ASH2L and inhibits the methyltransferase activity of the COMPASS complex. Site-directed mutagenesis revealed that residues D<sup>224</sup> and H<sup>247</sup> of MgdE are critical for WDR5 binding, as the double mutant MgdE-D<sup>224</sup>A/H<sup>247</sup>A fails to interact with WDR5 and shows diminished suppression of H3K4me3 levels (Figure 5D).”

      Regarding the precise MgdE-ASH2L binding interface, we attempted to identify the key interaction site by introducing point mutations into ASH2L. However, these mutations did not disrupt the interaction (Figure 5A and B; New Figure S4C), suggesting that more residues are involved in the interaction.

      (5) Figure 6: The authors proposed that the MgdE-regulated COMPASS complex-H3K4me3 axis suppresses pro-inflammatory responses, but the presented data do not sufficiently support this claim. H3K4me3 inhibitor should be employed to verify cytokine production during infection.

      We thank the reviewer for the comment. We have now revised the description in lines 220-221 and lines 867-868 to read: "MgdE suppresses host inflammatory responses probably by inhibition of COMPASS complex-mediated H3K4 methylation."

      (6) There appears to be a discrepancy between the results shown in Figure S7 and its accompanying legend. The data related to inflammatory responses seem to be missing, and the data on bacterial colonization are confusing (bacterial DNA expression or CFU assay?).

      We thank the reviewer for the comment. New Figure S6 specifically addresses the effect of MgdE on bacterial colonization in the spleens of infected mice, which was assessed by quantitative PCR rather than by CFU assay.

      We have now revised the legend of New Figure S6 as below (Lines 986-991):

      “MgdE facilitates bacterial colonization in the spleens of infected mice. Bacterial colonization was assessed in splenic homogenates from infected mice (as described in Figure 7A) by quantifying bacterial DNA using quantitative PCR at 2, 14, 21, 28, and 56 days post-infection.”

      (7) Line 112-116: Please provide the original experimental data demonstrating nuclear localization of the 56 proteins harboring putative NLS motifs.

      We thank the reviewer for the comment. We will provide these data in New Table S3.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      There are a few concerns about specific experiments:

      Major Comments:

      (1) Questions about the exact constructs used in their microscopy studies and the behavior of their controls. GFP is used as a negative control, but in the data they provide, the GFP signal is actually nuclear-localized (for example, Figure 1c, Figure 2a). Later figures do show other constructs with clear cytoplasmic localization, such as the delta-NLS-MgdE-GFP in Figure 2D. This raises significant questions about how the microscopy images were analyzed and clouds the interpretation of these findings. It is also not clear if their microscopy studies use the mature MgdE, lacking the TAT signal peptide after signal peptidase cleavage (the form that would be delivered into the host cell) or if they are transfecting the pro-protein that still has the TAT signal peptide (a form that would be present in the bacterial cell but that would not be found in the host cell). This should be clarified, and if their construct still has the TAT peptide, then key findings such as nuclear localization and NLS function should be confirmed with the mature protein lacking the signal peptide.

      We thank the reviewer for this question. EGFP can passively diffuse through nuclear pores owing to its small size (Petrovic et al., Science, 2022; Yaseen et al., Nat Commun, 2015; Bhat et al., Nucleic Acids Res, 2015). However, upon transfection with EGFP-tagged wild-type MgdE and its NLS deletion mutants (MgdE<sup>ΔNLS1</sup>, MgdE<sup>ΔNLS2</sup>, and MgdE<sup>ΔNLS1-2</sup>), we observed significantly stronger nuclear fluorescence in cells expressing wild-type MgdE than in cells expressing EGFP alone. Notably, the MgdE<sup>ΔNLS1-2</sup>-EGFP mutant showed almost no detectable nuclear fluorescence (Figure 2C, D, and E). These results indicate that (i) the MgdE-EGFP fusion protein cannot enter the nucleus by passive diffusion, and (ii) EGFP does not interfere with the nuclear targeting ability of MgdE.

      We did not construct a signal peptide-deleted MgdE for transfection assays. Instead, we performed an infection experiment using recombinant M. bovis BCG strains expressing FLAG-tagged wild-type MgdE. The mature MgdE protein (signal peptide cleaved) was detected in the nuclear fraction (New Figure 2F), suggesting that the signal peptide is not required for the nuclear localization of MgdE.

      (2) The localization of MgdE is not shown during actual infection. The study would be greatly strengthened by an analysis of the BCG strain expressing their MgdE-FLAG construct.

      We thank the reviewer for the comment. In the revised manuscript, we constructed M. bovis BCG strains expressing FLAG-tagged wild-type MgdE as well as NLS deletion mutants (MgdE<sup>ΔNLS1</sup>, MgdE<sup>ΔNLS2</sup>, and MgdE<sup>ΔNLS1-2</sup>). These strains were used to infect THP-1 cells, and nuclear-cytoplasmic fractionation was performed 24 hours post-infection.

      Nuclear-cytoplasmic fractionation experiments showed that WT MgdE and the NLS single mutants could be detected both in the cytoplasm and in the nucleus by immunoblotting, while the double mutant MgdE<sup>ΔNLS1-2</sup> was detectable only in the cytoplasm (New Figure 2F) (Lines 146–155). These findings indicate that MgdE is capable of entering the host cell nucleus during BCG infection, and that this nuclear localization depends on the presence of both its N-terminal and C-terminal NLS motifs.

      (3) Their pathogenesis studies suggesting a role for MgdE would be greatly strengthened by studying MgdE in virulent Mtb rather than the BCG vaccine strain. If this is not possible because of technical limitations (such as lack of a BSL3 facility), then at least a thorough discussion of studies that examined Rv1075c/MgdE in Mtb is important. This would include a discussion of the phenotype observed in a previously published study examining the Mtb Rv1075c mutant that showed a minimal phenotype in mice (PMID: 31001637) and would also include a discussion of whether Rv1075c was identified in any of the several in vivo Tn-Seq studies done on Mtb.

      We thank the reviewer for this insightful comment. In the revised manuscript, we have incorporated a more thorough discussion of prior studies that examined Rv1075c/MgdE in Mtb, including the reported minimal phenotype of an Mtb MgdE mutant in mice (PMID: 31001637) (Lines 288–294).

      In the latest TnSeq studies in M. tuberculosis, Rv1075c/MgdE was not classified as essential for in vivo survival or virulence (James et al., NPJ Vaccines, 2025; Zhang et al., Cell, 2013). However, this absence should not be interpreted as evidence of dispensability, since these datasets also failed to identify some well-characterized virulence factors, including Rv2067c (Singh et al., Nat Commun, 2023), PtpA (Qiang et al., Nat Commun, 2023), and PtpB (Chai et al., Science, 2022), which have been demonstrated to be required for the virulence of Mtb.

      Minor Comments:

      (1) Multiple figures, including 3B and 7A, use axes with multiple discontinuities where either a log scale or multiple graphs would be more appropriate.

      We sincerely thank the reviewer for pointing this out. In the revised manuscript, we have updated Figure 3B and Figure 7A.

      (2) Figure 1C - Analysis of only nuclear MFI can be very misleading because it is affected by the total expression of each construct. Ratios of nuclear to cytoplasmic MFI are a more rigorous analysis.

      We thank the reviewer for this comment. We agree that analyzing the ratio of nuclear to cytoplasmic mean fluorescence intensity (MFI) provides a more rigorous quantification of nuclear localization, particularly when comparing constructs with different expression levels. However, the analysis presented in Figure 1C was intended as a preliminary qualitative screen to identify Tat/SPI-associated proteins with potential nuclear localization, rather than a detailed quantitative assessment.
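
      The nuclear-to-cytoplasmic MFI ratio discussed above can be sketched as follows, assuming a single-channel image and two binary masks (nucleus and whole cell) of the same shape, represented here as nested lists. The function name, the mask representation, and the toy values are hypothetical. In practice the masks would come from a nuclear stain and a cell segmentation step.

```python
from statistics import mean


def nuc_cyto_ratio(image, nucleus_mask, cell_mask):
    """Ratio of mean nuclear to mean cytoplasmic intensity for one cell.

    Cytoplasm is taken as pixels inside the cell mask but outside the
    nucleus mask. Raises statistics.StatisticsError if either region is empty.
    """
    nuclear, cytoplasmic = [], []
    for img_row, nuc_row, cell_row in zip(image, nucleus_mask, cell_mask):
        for value, in_nucleus, in_cell in zip(img_row, nuc_row, cell_row):
            if in_nucleus:
                nuclear.append(value)
            elif in_cell:
                cytoplasmic.append(value)
    return mean(nuclear) / mean(cytoplasmic)


# Toy 3x3 image: two bright nuclear pixels, four dim cytoplasmic pixels.
image = [[10, 10, 0], [10, 40, 0], [10, 40, 0]]
nucleus_mask = [[0, 0, 0], [0, 1, 0], [0, 1, 0]]
cell_mask = [[1, 1, 0], [1, 1, 0], [1, 1, 0]]
ratio = nuc_cyto_ratio(image, nucleus_mask, cell_mask)
```

      Because both regions are measured in the same cell, the ratio is largely insensitive to differences in total expression between constructs, which is the concern the reviewer raises about nuclear MFI alone.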

      (3) Figure 5C - Controls missing and unclear interpretation of their mutant phenotype. There is no mock or empty-vector control transfection, and their immunoblot shows a massive increase in total cellular H3K4me3 signal in the bulk population, although their prior transfection data show only a small fraction of cells are expressing MgdE. They also see a massive increase in methylation in cells transfected with the inactive mutant, but the reason for this is unclear. Together, these data raise questions about the specificity of the increasing methylation they observe. An empty vector control should be included, and the phenotype of the mutant explained.

      We thank the reviewer for this comment. In the revised manuscript, we transfected HEK293T cells with an empty EGFP vector and performed a quantitative analysis of H3K4me3 levels. The results demonstrated that, at the same time point, cells expressing MgdE showed significantly lower levels of H3K4me3 compared to both the EGFP control and the catalytically inactive mutant MgdE (D<sup>244</sup>A/H<sup>247</sup>A) (New Figure 5D) (Lines 213–216). These findings support the conclusion that MgdE specifically suppresses H3K4me3 levels in cells.

      (4) Figure S1A - The secretion assay is lacking a critical control of immunoblotting a cytoplasmic bacterial protein to demonstrate that autolysis is not releasing proteins into the culture filtrate non-specifically - a common problem with secretion assays in mycobacteria.

      We thank the reviewer for this comment. To address this concern, we examined FLAG-tagged MgdE and the secreted antigen Ag85B in the culture supernatants while monitoring the cytoplasmic protein GlpX as a control for lysis. The absence of GlpX in the supernatant confirmed that there was no autolysis in the experiment. We could detect MgdE-FLAG in the culture supernatant (New Figure S2A), indicating that MgdE is a secreted protein.

      (5) The volcano plot of their data shows that the proteins with the smallest p-values have the smallest fold-changes. This is unusual for a transcriptomic dataset and should be explained.

      We thank the reviewer for this comment. We are not certain whether the p-values correlate with the fold-changes in this transcriptomic dataset; this relationship likely varies from case to case.

      Reviewer #3 (Recommendations for the authors):

      There are several minor comments:

      (1) Line 104-109: The number of proteins harboring NLS motifs and candidate proteins assigned to the four distinct pathways does not match the data presented in Table S2. Please recheck the details. Figure 1A and B, as well as Figure S1A and B, should also be corrected accordingly.

      We thank the reviewer for the comment. We have carefully rechecked the details, and the numbers have been confirmed and updated.

      (2) Please add the scale bar in all image figures, including Figure 1C, Figure 2D, Figure 5C, Figure 7B, and Figure S2.

      We thank the reviewer for this suggestion. We have now added scale bars to all relevant image figures in the revised manuscript, including Figure 1C, New Figure 2C, Figure 5C, Figure 7B, and New Figure S2B.

      (3) Please add the molecular marker in all immunoblotting figures, including Figure 2C, Figure 2F, Figure 4B, Figure 4C, Figure 5B, Figure 5D, and Figure S5.

      We thank the reviewer for this suggestion. We have now added the molecular marker in all immunoblotting figures in the revised manuscript, including New Figure 2E–F, Figure 4B–C, Figure 5B and D, Figure S2A, New Figure S2E and New Figure S4C.

      References

      Bryan AF, Wang J, Howard GC, Guarnaccia AD, Woodley CM, Aho ER, Rellinger EJ, Matlock BK, Flaherty DK, Lorey SL, Chung DH, Fesik SW, Liu Q, Weissmiller AM, Tansey WP (2020) WDR5 is a conserved regulator of protein synthesis gene expression Nucleic Acids Res 48:2924-2941.

      Chai Q, Yu S, Zhong Y, Lu Z, Qiu C, Yu Y, Zhang X, Zhang Y, Lei Z, Qiang L, Li BX, Pang Y, Qiu XB, Wang J, Liu CH (2022) A bacterial phospholipid phosphatase inhibits host pyroptosis by hijacking ubiquitin Science 378(6616):eabq0132.

      Chen C, Nguyen BN, Mitchell G, Margolis SR, Ma D, Portnoy DA (2018) The listeriolysin O PEST-like sequence co-opts AP-2-mediated endocytosis to prevent plasma membrane damage during Listeria infection Cell host & microbe 23: 786-795.

      Chen Y, Cao F, Wan B, Dou Y, Lei M (2012) Structure of the SPRY domain of human Ash2L and its interactions with RbBP5 and DPY30 Cell Res 22:598–602.

      Cicchese JM, Evans S, Hult C, Joslyn LR, Wessler T, Millar JA, Marino S, Cilfone NA, Mattila JT, Linderman JJ, Kirschner DE (2018) Dynamic balance of pro‐ and anti‐inflammatory signals controls disease and limits pathology Immunological Reviews 285: 147–167.

      Couture JF, Skiniotis G (2013) Assembling a COMPASS Epigenetics 8:349-54.

      Drerup MM, Schlücking K, Hashimoto K, Manishankar P, Steinhorst L, Kuchitsu K, Kudla J (2013) The calcineurin B-like calcium sensors CBL1 and CBL9 together with their interacting protein kinase CIPK26 regulate the Arabidopsis NADPH oxidase RBOHF Molecular plant 6: 559-569.

      Ge P, Lei Z, Yu Y, Lu Z, Qiang L, Chai Q, Zhang Y, Zhao D, Li B, Pang Y, Liu C, Wang J (2021) M. tuberculosis PknG Manipulates Host Autophagy Flux to Promote Pathogen Intracellular Survival Autophagy 18: 576–94.

      Hung KH, Woo YH, Lin IY, Liu CH, Wang LC, Chen HY, Chiang BL, Lin KI (2018) The KDM4A/KDM4C/NF-κB and WDR5 epigenetic cascade regulates the activation of B cells Nucleic Acids Res 46:5547–5560.

      James KS, Jain N, Witzl K, Cicchetti N, Fortune SM, Ioerger TR, Martinot AJ, Carey AF (2025) TnSeq identifies genetic requirements of Mycobacterium tuberculosis for survival under vaccine-induced immunity NPJ Vaccines 10:103.

      Li X, Chen L, Liao J, Hui J, Li W, He ZG (2020) A novel stress-inducible CmtR-ESX3-Zn²⁺ regulatory pathway essential for survival of Mycobacterium bovis under oxidative stress J Biol Chem 295:17083–17099.

      Park SH, Ayoub A, Lee YT, Xu J, Kim H, Zheng W, Zhang B, Sha L, An S, Zhang Y, Cianfrocco MA, Su M, Dou Y, Cho US (2019) Cryo-EM structure of the human MLL1 core complex bound to the nucleosome Nat Commun 10:5540.

      Penn BH, Netter Z, Johnson JR, Von Dollen J, Jang GM, Johnson T, Ohol YM, Maher C, Bell SL, Geiger K (2018) An Mtb-human protein-protein interaction map identifies a switch between host antiviral and antibacterial responses Mol Cell 71:637-648.e5.

      Petrovic S, Samanta D, Perriches T, Bley CJ, Thierbach K, Brown B, Nie S, Mobbs GW, Stevens TA, Liu X, Tomaleri GP, Schaus L, Hoelz A (2022) Architecture of the linker-scaffold in the nuclear pore Science 376: eabm9798.

      Podleśny-Drabiniok A, Romero-Molina C, Patel T, See WY, Liu Y, Marcora E, Goate AM (2025) Cytokine-induced reprogramming of human macrophages toward Alzheimer's disease-relevant molecular and cellular phenotypes in vitro Cell Rep 44:115909.

      Qiang L, Zhang Y, Lei Z, Lu Z, Tan S, Ge P, Chai Q, Zhao M, Zhang X, Li B, Pang Y, Zhang L, Liu CH, Wang J (2023) A mycobacterial effector promotes ferroptosis-dependent pathogenicity and dissemination Nat Commun 14:1430.

      Qu Q, Takahashi YH, Yang Y, Hu H, Zhang Y, Brunzelle JS, Couture JF, Shilatifard A, Skiniotis G (2018) Structure and Conformational Dynamics of a COMPASS Histone H3K4 Methyltransferase Complex Cell 174:1117-1126.e12.

      Rahman S, Hoffmann NA, Worden EJ, Smith ML, Namitz KEW, Knutson BA, Cosgrove MS, Wolberger C (2022) Multistate structures of the MLL1-WRAD complex bound to H2B-ubiquitinated nucleosome Proc Natl Acad Sci U S A 119:e2205691119.

      Sharma G, Upadhyay S, Srilalitha M, Nandicoori VK, Khosla S (2015) The interaction of mycobacterial protein Rv2966c with host chromatin is mediated through non-CpG methylation and histone H3/H4 binding Nucleic Acids Res 43:3922-37.

      Singh PR, Dadireddy V, Udupa S, Kalladi SM, Shee S, Khosla S, Rajmani RS, Singh A, Ramakumar S, Nagaraja V (2023) The Mycobacterium tuberculosis methyltransferase Rv2067c manipulates host epigenetic programming to promote its own survival Nat Commun 14:8497.

      Wang J, Ge P, Qiang L, Tian F, Zhao D, Chai Q, Zhu M, Zhou R, Meng G, Iwakura Y, Gao GF, Liu CH (2017) The mycobacterial phosphatase PtpA regulates the expression of host genes and promotes cell proliferation Nat Commun 8:244.

      Wang J, Li BX, Ge PP, Li J, Wang Q, Gao GF, Qiu XB, Liu CH (2015) Mycobacterium tuberculosis suppresses innate immunity by coopting the host ubiquitin system Nat Immunol 16:237–245.

      Wysocka J, Swigut T, Milne TA, Dou Y, Zhang X, Burlingame AL, Roeder RG, Brivanlou AH, Allis CD (2005) WDR5 associates with histone H3 methylated at K4 and is essential for H3 K4 methylation and vertebrate development Cell 121:859-72.

      Yaseen I, Kaur P, Nandicoori VK, Khosla S (2015) Mycobacteria modulate host epigenetic machinery by Rv1988 methylation of a non-tail arginine of histone H3 Nat Commun 6:8922.

      Zhang L, Kent JE, Whitaker M, Young DC, Herrmann D, Aleshin AE, Ko YH, Cingolani G, Saad JS, Moody DB, Marassi FM, Ehrt S, Niederweis M (2022) A periplasmic cinched protein is required for siderophore secretion and virulence of Mycobacterium tuberculosis Nat Commun 13:2255.

      Zhang YJ, Reddy MC, Ioerger TR, Rothchild AC, Dartois V, Schuster BM, Trauner A, Wallis D, Galaviz S, Huttenhower C, Sacchettini JC, Behar SM, Rubin EJ (2013) Tryptophan biosynthesis protects mycobacteria from CD4 T-cell-mediated killing Cell 155:1296-308.

    1. Author response:

      Reviewer #1

      We thank the reviewer for their thoughtful and constructive assessment of AutoMorphoTrack and for recognizing its potential utility as an open-source end-to-end workflow for organelle analysis.

      (1) Novelty and relationship to existing tools / FIJI workflows

      We appreciate this concern and agree that many of the underlying image-processing operations (e.g., thresholding, morphological cleanup, region properties) are well-established. Our goal with AutoMorphoTrack is not to introduce new segmentation algorithms, but rather to provide a curated, reproducible, and extensible end-to-end workflow that integrates segmentation, morphology, tracking, motility, and colocalization into a single, transparent pipeline tailored for live-cell organelle imaging.

      While an experienced user could assemble similar analyses ad hoc using FIJI or custom scripts, our contribution lies in:

      Unifying these steps into a single workflow with consistent parameterization and outputs,

      Generating standardized, publication-ready visualizations and tables at each step,

      Enabling batch and longitudinal analyses across cells and conditions, and

      Lowering the barrier for users who do not routinely write custom analysis code.

      We note that the documentation-style presentation of the manuscript is intentional, as it serves both as a methods paper and a practical reference for users implementing the workflow. We agree, however, that the manuscript currently overemphasizes step-by-step execution at the expense of positioning. In revision, we will more explicitly frame AutoMorphoTrack as a workflow integration and usability contribution, rather than a fundamentally new algorithmic advance.

      We will also cite and discuss the image.sc example referenced by the reviewer, clarifying conceptual overlap and differences in scope.

      (2) Comparison to existing pipelines (Imaris, CellProfiler, CellPose, StarDist)

      We agree and thank the reviewer for highlighting this omission. In the revised manuscript, we will expand the related-work and positioning section to explicitly compare AutoMorphoTrack with established commercial (e.g., Imaris) and open-source (e.g., CellProfiler, MiNA, MitoGraph) platforms, as well as learning-based segmentation tools such as CellPose and StarDist.

      Rather than claiming superiority, we will clarify trade-offs, emphasizing that AutoMorphoTrack prioritizes:

      Transparency and parameter interpretability,

      Lightweight dependencies suitable for small live-imaging datasets,

      Direct integration of morphology, tracking, and colocalization in a single workflow, and

      Ease of modification for domain-specific use cases.

      (3) AI / chatbot integration

      We appreciate this critique and agree that the current description is insufficiently precise. AutoMorphoTrack does not implement a native natural-language interface. Instead, our intent was to convey that the workflow can be executed and modified with assistance from external large language models (LLMs) in a notebook-based environment.

      In revision, we will revise this section to:

      Clearly distinguish AutoMorphoTrack’s functionality from that of external LLM tools,

      Remove any implication of a built-in AI interface, and

      Provide concrete, reproducible examples of how non-coding users may interact with the pipeline using natural-language prompts mediated by external tools.

      Reviewer #2

      We thank the reviewer for their detailed and technically rigorous evaluation. We appreciate the recognition of the workflow’s motivation and structure, and we agree that several aspects of validation, positioning, and quantitative reporting must be strengthened.

      (1) AI-assisted / natural-language functionality

      We agree with this critique. AutoMorphoTrack does not provide a native natural-language execution layer, and the manuscript currently overstates this aspect. In revision, we will explicitly scope any reference to AI assistance as external, optional support for code generation and parameter editing, with clearly documented examples and stated limitations.

      We agree that conflating external LLM capabilities with the software itself risks misleading readers, and we will correct this accordingly.

      (2) Lack of quantitative validation

      We fully agree that the current manuscript lacks formal quantitative validation. In the revised version, we will add a dedicated validation section including:

      Segmentation accuracy compared to expert annotations using overlap metrics (e.g., Dice / IoU),

      Tracking fidelity assessed using manually annotated tracks and/or synthetic ground truth,

      Sensitivity analyses for key parameters (e.g., thresholding and linking distance), and

      Explicit discussion of failure modes and quality-control indicators.

      We acknowledge that without such validation, claims of robustness are not sufficiently supported.
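      The overlap metrics named above (Dice and IoU) can be illustrated with a minimal sketch. It is not taken from the AutoMorphoTrack code base; the function name `dice_and_iou`, the nested-list mask format, and the toy masks are assumptions made for illustration only.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; masks must hold the same number of pixels.
    """
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must contain the same number of pixels")

    intersection = sum(p and t for p, t in zip(pred_flat, truth_flat))
    pred_area = sum(pred_flat)
    truth_area = sum(truth_flat)
    union = pred_area + truth_area - intersection

    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention, not a rule).
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


# Toy example: automated segmentation vs. an expert annotation (both made up).
predicted = [[0, 1, 1],
             [0, 1, 0],
             [0, 0, 0]]
annotated = [[0, 1, 1],
             [0, 0, 0],
             [0, 0, 0]]

dice, iou = dice_and_iou(predicted, annotated)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")
```

      Dice weights the shared pixels twice relative to the two mask areas, so it is never lower than IoU for the same pair of masks; reporting both makes comparison with other tools easier.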

      (3) Benchmarking and positioning relative to existing tools

      We agree and will substantially strengthen AutoMorphoTrack’s benchmarking and positioning relative to existing platforms. Rather than framing novelty algorithmically, we will clarify that the primary contribution is a reproducible, integrated workflow designed specifically for two-organelle live imaging in neurons, with transparent parameters and standardized outputs.

      We note that our goal is not to exhaustively benchmark against all available tools, but rather to provide representative comparisons that clarify operating regimes, assumptions, and trade-offs. We will add a comparative table and/or qualitative comparison highlighting strengths, assumptions, and limitations relative to existing tools.

      (4) Core algorithms and robustness

      We agree that reliance on threshold-based segmentation introduces sensitivity to imaging conditions. In revision, we will:

      Explicitly discuss the operating regime and assumptions under which AutoMorphoTrack performs reliably,

      Clarify that the framework is modular and can accept alternative segmentation backends, and

      Include guidance on when outputs should be treated with caution.

      (5) Figure, metric, and statistical issues

      We thank the reviewer for identifying several critical issues and agree that these undermine confidence. In revision, we will correct all figure, metric-definition, and reporting inconsistencies, including:

      Resolving circularity values exceeding 1 by correcting computation and/or labeling errors,

      Revising physically invalid displacement plots and clarifying kernel-density limitations,

      Ensuring colocalization metrics are consistently defined, named, and interpreted, with explicit clarification of whether calculations are intensity- or mask-based,

      Correcting figure legends to match displayed panels, and

      Clearly reporting sample size, sampling units, and statistical assumptions, including handling of multiple comparisons where applicable.
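      To make the circularity issue concrete, the sketch below uses the standard definition, circularity = 4πA/P², which equals 1 for an ideal circle and is lower for every other continuous shape. It is an illustration under stated assumptions, not the pipeline's implementation: the function name and the example measurements are hypothetical, and the 3x3-pixel case shows only one common way values above 1 can arise (a perimeter estimate that is too short relative to the pixel-counted area).

```python
import math


def circularity(area, perimeter):
    """Circularity 4*pi*A / P**2 (hypothetical helper for illustration)."""
    if area < 0 or perimeter <= 0:
        raise ValueError("area must be >= 0 and perimeter must be > 0")
    return 4 * math.pi * area / perimeter ** 2


# Ideal circle of radius 5: circularity is 1 up to floating-point error.
r = 5.0
circle = circularity(math.pi * r ** 2, 2 * math.pi * r)

# Unit square: 4*pi*1 / 16 = pi / 4.
square = circularity(1.0, 4.0)

# 3x3-pixel blob: area counted as 9 pixels, perimeter traced through the
# centers of the boundary pixels (a 2x2 square path of length 8).
small_blob = circularity(9.0, 8.0)

print(circle, square, small_blob)
```

      Because the area and the perimeter come from different discretizations in the last case, the ratio exceeds 1 even though the formula is applied correctly; small objects are most affected.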

      (6) Value-added demonstration

      We agree that the manuscript would benefit from a clearer demonstration of value-added use cases. In revision, we will include at least one realistic example showing how AutoMorphoTrack enables a complete, reproducible analysis workflow with reduced setup burden compared to manually assembling multiple tools.

      (7) Editorial suggestions

      We agree and will streamline the Results section to reduce procedural repetition and focus more on validation, limitations, and quality-control guidance.

      Reviewer #3

      We thank the reviewer for their positive assessment of clarity and organization, and for the constructive practical feedback.

      Installation issues

      We appreciate the detailed report of installation failures and acknowledge that the current packaging and distribution are inadequate. Prior to revision, we will:

      Fix the package structure to support standard installation methods,

      Ensure all required files (e.g., setup configuration, README) are correctly included,

      Test installation on clean environments across platforms, and

      Correct broken links to notebooks and documentation.

      We agree that without a functional installation pathway, the utility of the tool is severely limited.

      AI-assisted claims

      We agree with the reviewer and echo our responses above. The AI-assisted description will be clarified and appropriately scoped in the revised manuscript.

      Additional suggestions

      Color accessibility: We will revise all figures to use colorblind-safe palettes.

      Velocity–displacement diagonal: We will explicitly explain the origin of this relationship, including whether it reflects dataset properties, tracking assumptions, or minimum detectable motion.

      Integrated correlation metric: We agree that Spearman correlation is more appropriate for many of these relationships and will replace Pearson correlations accordingly.

      Supplementary movies: We agree that providing raw movies would improve interpretability and will add representative examples as supplementary material.
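      The difference between Pearson and Spearman correlation mentioned above can be shown with a short sketch. Spearman correlation is the Pearson correlation of the ranks, so it captures any monotonic relationship rather than only linear ones. The helper names and the displacement and velocity values below are hypothetical and are not drawn from the manuscript's data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one averaged rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; fails with ZeroDivisionError if an input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but non-linear relationship.
displacement = [1.0, 2.0, 3.0, 4.0, 5.0]
velocity = [d ** 3 for d in displacement]

r_pearson = pearson(displacement, velocity)
r_spearman = spearman(displacement, velocity)
print(f"Pearson = {r_pearson:.3f}, Spearman = {r_spearman:.3f}")
```

      For this toy series the relationship is perfectly monotonic, so the rank-based coefficient reaches its maximum while the linear coefficient does not.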

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors distinguished afferent inputs to different cell populations in the VTA using dimensionality reduction approaches and found significantly distinct patterns between normal and drug treatment conditions. They also demonstrated negative correlations of the inputs induced by drugs with gene expression of ion channels or proteins involved in synaptic transmission and demonstrated the knockdown of one of the voltage-gated calcium ion channels caused decreased inputs.

      Weaknesses:

      (1) For quantifications of brain regions in this study, boundaries were based on the Franklin-Paxinos (FP) atlas according to previous studies (Beier KT et al 2015, Beier KT et al 2019). It has been reported that significant discrepancies exist between the anatomical labels on the FP atlas and the Allen Brain Atlas (ref: Chon U et al., Nat Commun 2019). Although a summary of conversion is provided as a sheet, the authors need to describe how consistent or different the brain boundaries they defined in the manuscript are with the Allen Brain Atlas by adding histology images. Also, I wonder how reliable the annotations were for over a hundred animals with manual quantification. The authors should briefly explain it rather than citing previous studies in the Material and Methods Section.

      We thank the reviewer for their attention to this point; indeed, neuroanatomical detail is often overlooked in modern neuroscience, occasionally leading to spurious conclusions. We acknowledge that there are significant discrepancies in brain region definitions across atlases, which can make cross-study comparisons difficult. Here, all cells were manually quantified by Dr. Kevin Beier, as in previous studies (Beier et al., Cell 2015; Nature 2017; Cell Reports 2019; Tian et al., Cell Reports 2022; Tian et al., Neuron 2024; Hubbard et al., Neuropsychopharmacology, 2025). As such, these studies are internally consistent with respect to the definition of brain regions, which is critical here since our analysis in this manuscript relates to data quantified by a single individual. Several brain regions were quite easy to distinguish anatomically, such as the medial habenula and lateral habenula. Others, such as the extended amygdala area, are much more difficult. We have now provided example images in Figure S1 that detail the anatomical boundaries that we used, overlaid on images of Neurotrace blue (fluorescent Nissl stain).

      (2) Regarding the ellipsoids in the PC, although it's written in the manuscript that "Ellipsoids were centered at the average coordinate of a condition and stretched one standard deviation along the primary and secondary axes", it's intuitively hard to understand in some figures such as Figure 2O, P and Figure S1. The authors need to make their data analysis methods more accessible by providing source code to the public.

      The source code is now available to the public at https://github.com/ktbartas/Bartas_et_al_eLife_2024, which is noted in the Code Availability statement. The code for generating ellipsoids is in the first notebook, `0-dataexploration-master-euclidean.ipynb`, in the function `confidence_ellipse`, which is called from `make_pca_plots` and `umap_and_heatmap`. Example plots are rendered in the notebooks and can be viewed directly on GitHub.
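      For readers who want an intuition for such ellipses before opening the notebooks, here is a minimal sketch of one common construction. It is not the repository's `confidence_ellipse` function. It assumes that "primary and secondary axes" means the principal axes of each condition's 2D covariance matrix, and the function name `sd_ellipse` and the sample points are hypothetical.

```python
import math


def sd_ellipse(points):
    """One-standard-deviation ellipse for 2D points.

    Returns (center, (major_sd, minor_sd), angle), where angle is the
    orientation of the major axis in radians, measured from the x-axis.
    Assumption: axes are the principal axes of the sample covariance.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")

    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n

    # Sample covariance matrix entries (n - 1 in the denominator).
    sxx = sum((p[0] - cx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - cy) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean_var = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    major_var = mean_var + spread
    minor_var = max(mean_var - spread, 0.0)  # guard against tiny negative round-off

    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), (math.sqrt(major_var), math.sqrt(minor_var)), angle


# Hypothetical PC1/PC2 coordinates for the animals of one condition.
coords = [(0, 0), (2, 2), (4, 4), (1, 3), (3, 1)]
center, (major_sd, minor_sd), angle = sd_ellipse(coords)
print(center, major_sd, minor_sd, math.degrees(angle))
```

      The ellipse is then drawn at `center` with semi-axis lengths `major_sd` and `minor_sd`, rotated by `angle`. With few animals per condition the covariance estimate is noisy, which may explain why some ellipses look unintuitive.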

      (3) In histology images (Figure 1B and 3K), the authors need to add dashed lines or arrows to guide the reader's attention.

      Dashed lines have been added to these figure panels as requested.

      (4) In Figure 2A and G, apparently there are significant differences in other brain regions such as NAcMed or PBN. If they are also statistically significant, the authors should note them as well and draw asterisks(*).

      We appreciate the care in ensuring that statistics are being applied and shown appropriately. In panel A (now Figure 3A), the two-way ANOVA interaction term was not significant (p = 0.9365), so we did not find further comparisons justified. However, for Figure 3G, the interaction term was significant (p = 0.0001), and thus further pairwise comparisons were performed with Sidak's correction for multiple comparisons. After correction, the only two brain regions that were significantly different were the DStr (p = 0.0051) and GPe (p = 0.0036). While the NAcMed and PBN visually look different, according to the corrected statistics, they were not significantly different (NAcMed p = 0.5037, PBN p = 0.8123). The notations in our original figure thus accurately reflected these statistics.

      (5) In Figure 2N about the spatial distribution of starter cells, the authors need to add histology images for each experimental condition (i.e. saline, fluoxetine, cocaine, methamphetamine, amphetamine, nicotine, and morphine) as supplementary figures.
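      The decision logic described above (test the interaction term first, then run Sidak-corrected pairwise comparisons only if it is significant) can be sketched as follows. The interaction p-values are the ones quoted in the response; the raw pairwise p-values, the function names, and the 0.05 threshold are assumptions for illustration, and statistics packages may implement the correction with additional details.

```python
def sidak_adjust(p_values):
    """Sidak-adjusted p-values for a family of m comparisons: 1 - (1 - p)**m."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


def posthoc_if_interaction(interaction_p, raw_pairwise_p, alpha=0.05):
    """Return adjusted pairwise p-values, or None if the interaction is not significant."""
    if interaction_p >= alpha:
        return None  # no post hoc comparisons are run
    return sidak_adjust(raw_pairwise_p)


# Hypothetical unadjusted pairwise p-values (not from the manuscript).
hypothetical_raw_p = [0.001, 0.02, 0.30]

skipped = posthoc_if_interaction(0.9365, hypothetical_raw_p)   # interaction p for Figure 3A
adjusted = posthoc_if_interaction(0.0001, hypothetical_raw_p)  # interaction p for Figure 3G

print(skipped)
print([round(p, 4) for p in adjusted])
```

      Each adjusted value is at least as large as its raw counterpart, which is why regions that look different by eye can fall short of significance once the whole family of comparisons is taken into account.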

      We have now provided these as Figure S2.

      (6) In the manuscript, it is necessary to explain why Cacna1e was selected among other calcium ion channels.

      We have added a sentence to the "Functional validation of link between gene expression and RABV labeling" section (lines 722-724).

      Reviewer #2 (Public review):

      The application of rabies virus (RabV)-mediated transsynaptic tracing has been widely utilized for mapping cell-type-specific neural connectivities and examining potential modifications in response to biological phenomena or pharmacological interventions. Despite the predominant focus of studies on quantifying and analyzing labeling patterns within individual brain regions based on labeling abundance, such an approach may inadvertently overlook systemic alterations. There exists a considerable opportunity to integrate RabV tracing data with the global connectivity patterns and the transcriptomic signatures of labeled brain regions. In the present study, the authors take an important step towards achieving these objectives. Specifically, the authors conducted an intensive reanalysis of a previously generated large dataset of RabV tracing to the ventral tegmental area (VTA) using dimension reduction methods such as PCA and UMAP. This reaffirmed the authors' earlier conclusion that different cell types in the VTA, namely dopamine neurons (DA) and GABAergic neurons, exhibit quantitatively distinct input patterns, and a single dose of addictive drugs, such as cocaine and morphine, induced altered labeling patterns. Additionally, the authors illustrate that distinct axes of PCA can discriminate experimental variations, such as minor differences in the injection site of viral tracers, from bona fide alterations in labeling patterns caused by drugs of abuse. While the specific mechanisms underlying altered labeling in most brain regions remain unclear, whether involving synaptic strength, synaptic numbers, pre-synaptic activities, or other factors, the present study underscores the efficacy of an informatics approach in extracting more comprehensive information from the RabV-based circuit mapping data.

      Moreover, the authors showcased the utility of their previously devised bulk gene expression patterns inferred by the Allen Gene Expression Atlas (AGEA) and "projection portrait" derived from bulk axon mapping data sourced from the Allen Mouse Brain Connectivity Atlas. The utilization of such bulk data rests upon several limitations. For instance, the collection of axon mapping data involves an arbitrary selection of both cell type-specific and non-specific data, which might overlook crucial presynaptic partners, and often includes contamination from neighboring undesired brain regions. Concerns arise regarding the quantitativeness of AGEA, which may also include the potential oversight of key presynaptic partners. Nevertheless, the authors conscientiously acknowledged these potential limitations associated with the dataset. Notably, building on the observation of a positive correlation between the basal expression levels of Ca2+ channels and the extent of drug-induced changes in RabV labeling patterns, the authors conducted a CRISPRi-based knockdown of a single Ca2+ channel gene. This intervention resulted in a reduction of RabV labeling, supporting that the observed gene expression patterns have causality in RabV labeling efficiency. While a more nuanced discussion is necessary for interpreting this result (see below), overall I commend the authors for their efforts to leverage the existing dataset in a more meaningful way. This endeavor has the potential to contribute significantly to our understanding of the mechanisms underlying alterations in RabV labeling induced by drugs of abuse. Finally, drawing upon the aforementioned reanalysis of previous data, the authors underscored that a single administration of ketamine/xylazine anesthesia could induce enduring modifications in RabV labeling patterns for VTA DA neurons, specifically those projecting to the nucleus accumbens and amygdala.

      Given the potential impact of such alterations on motivational behaviors at a broader level, I fully agree that prudent consideration is warranted when employing ketamine/xylazine for the investigation of motivational behaviors in mice.

      Specific Points:

      (1) Beyond advancements in bioinformatics, readers may find it insightful to explore whether the PCA/UMAP-based approach yields novel biological insights. For example, the authors are encouraged to discuss more functional implications of PBN and LH in the context of drugs of abuse, as their labeling abundance could elucidate the PC2 axis in Fig. 2M.

      Thank you for this suggestion: we added text (Lines 787-795) discussing the LH and PBN (and GPe) specifically, but also highlighted the importance of our approach in hypothesis-generating science.

      (2) While I appreciate the experimental data on Cacna1e knockdown, I am unclear about the rationale behind specifically focusing on Cacna1e. The logic behind the statement, "This means that expression of this gene is not inhibitory towards RABV transmission," is also unclear. Loss-of-function experiments only signify the necessity or permissive functions of a gene. In this context, Cacna1e expression levels are required for efficient RabV labeling, but this neither supports nor excludes the possibility that this gene expression instructively suppresses RabV labeling/transmission, which could be assessed through gain-of-function experiments.

      We thank the reviewer for their suggestions regarding this result, and agree that a gain-of-function experiment would be required to provide clearer evidence on this point. We therefore understand that our original phrasing may have been misleading. Thus, we have edited this section to the more conservative statement: “These results indicate that reduced levels of Cacna1e likely lower the number of RABV-labeled inputs from the NAcLat, and directly link the levels of Cacna1e and RABV input labeling” (lines 742-744), and we refrain from over-interpreting the results. As mentioned above in response to R1, we added a sentence to explain the rationale behind focusing on Cacna1e (lines 722-724).

      Reviewer #3 (Public Review):

      Summary:

      Authors mapped monosynaptic inputs to dopamine, GABA, and glutamate neurons in VTA under different anesthesia methods, and under drugs (cocaine, morphine, methamphetamine, amphetamine, nicotine, fluoxetine). They found that input patterns under different conditions are separated, and identified some key brain areas to contribute to such separation. They also searched a database for gene expression patterns that are common across input brain areas with some changes by anesthesia or drug administration.

      Strengths:

      The whole-brain approach to address drug effects is appealing and their conclusion is clear. The methodology and motivation are clearly explained.

      Weaknesses:

      While gene expression analyses may not be related to their findings on the anatomical effects of drugs, this will be a nice starting point for follow-up studies. 

      We understand and agree with the suggestion that gene expression allows us to provide correlative observations between in situ hybridization datasets and rabies mapping datasets, and that these results do not show causality. As such, future studies would be needed to assess this in more detail. We have added a line in the discussion to this effect (lines 851-853).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation:

      (1) There are a couple of packages available for 3D whole-brain reconstructions based on Allen Brain Atlas (eg. https://github.com/tractatus/wholebrain, https://github.com/lahammond/BrainJ), which would be helpful to align with the gene expression or other data from Allen Institute.

      This comment is related to the weakness noted by R1 that we responded to earlier in this rebuttal (see comment 1), about the discrepancies between the Franklin-Paxinos atlas and the Allen Brain Atlas. We agree that a systematic comparison of these two atlases using a tool like wholebrain or BrainJ would be valuable for the field. However, it would be a substantial amount of work, and likely would be an independent study in itself. We believe that the resolution of these atlases was sufficient to make our key conclusions here (e.g., identify gene expression patterns that relate to drug-induced changes in rabies virus labeling patterns, and develop a testable hypothesis for CRISPR-based gene editing). They are also based on the same atlases and region definitions that have been applied in our previous studies (e.g., Beier et al., Cell 2015; Beier et al., Nature 2017; Beier et al., Cell Reports 2019; Tian et al., Cell Reports 2022; Tian et al., Neuron 2024; Hubbard et al., Neuropsychopharmacology 2025, etc.). The expression of Cacna1e is relatively consistent across the NAc, as we have now detailed in Figure S13.

      (2) There are so far two kinds of rabies virus strains available in the neuroscience field (SAD-B19 or CVS-N2c). It is recommended to describe which strain was used in the Material and Methods Section because labeling efficiency and toxicity is quite different between the strains (Reardon TR et al., Neuron 2016).

      We have now noted that we used SAD B19 for all experiments (Lines 141-142).

      Minor corrections to the text and figures:

      (1) In Figure 1A, the color differences are not clear (i.e. light gray and dark gray). The figure can be simplified.

      In addition, images/figures generally should not overlap with other figures/images (Figures 2A-F, 2G-L).

      (2) In Figures 7C and D, the authors could add enlarged views of starter cells in VTA and NAcLat.

      We have attempted to simplify schematics and figures throughout. High-magnification images of cells have been added as insets in what is now Figure 10 (formerly Figure 7).

      Reviewer #2 (Recommendations For the authors):

      The number of animals for each graph should be explicated within the figure legend. For example, Figure 1C and Figure 7E lack this information. It is also advisable to delineate the definition of error bars within the figure legend.

      We have now added mouse numbers to all figures and/or legends, as appropriate. We also indicated in the legend at the end of Figure 1 how error bars and asterisks are defined. Furthermore, we added a sentence to the methods saying that in UMAP and PCA plots each dot is an animal (lines 244-245).

      The visual representations, particularly in Figures 1 and 3, are overcrowded. Furthermore, the arrangement of figure subpanels does not consistently adhere to the sequence of explication in the main text, significantly compromising the readability of the text. The authors are encouraged to consider the possibility of segmenting dense figures into two if there exists no upper limit for the number of figure displays. To illustrate, in Figure 3Q, crucial details about experimental conditions are denoted by numerical references, owing to spatial constraints.

      We agree that the figure layout and its misalignment with a linear read of the text were not ideal. Therefore, we broke our figures, especially the original Figures 1-4, into multiple sub-figures, including both main and supplemental figures. This freed space to rearrange the figure panels, allowing the story to be told in a linear fashion. All figures and panels should now be read in order.

      I am seeking clarification on how to interpret the term "overlap" at the bottom of figures illustrating Gene Ontology analysis.

      We have clarified the meaning of overlap in this context (lines 324-325): The ‘overlap’ term on the x-axis of these plots means the number of genes in the correlated gene lists that were also within the list of genes for the corresponding GO term.
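      The definition of "overlap" given above amounts to the size of a set intersection. A minimal sketch follows; the function name and the placeholder gene symbols are hypothetical and do not correspond to real GO term membership.

```python
def go_overlap(correlated_genes, go_term_genes):
    """Count genes present in both the correlated gene list and a GO term's gene list."""
    shared = set(correlated_genes) & set(go_term_genes)
    return len(shared), sorted(shared)


# Placeholder gene symbols for illustration only.
correlated = ["GeneA", "GeneB", "GeneC", "GeneD"]
go_term = ["GeneB", "GeneD", "GeneE"]

count, shared = go_overlap(correlated, go_term)
print(count, shared)
```

      Using sets means duplicated gene symbols in either list are counted once.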

      The authors could provide Cacna1e gene expression patterns within the NAc from the AGEA data.

      Cacna1e expression data are now provided in Figure S13.

      Additionally, the meaning of "controls" in Figure 7F, along with the "No gRNA" condition, remains ambiguous. While the text mentions "no shRNA", the involvement of shRNA in this experiment lacks clarity.

      We now clarify that the control conditions are based on previously published data where no AAVs were injected into NAcLat. This is now clarified in the legend for Figure 10F (lines 1277-1578). We also corrected “shRNA” to “gRNA” in the text.

    1. Author response:

      The following is the authors’ response to the original reviews

      We appreciate the reviewers’ insightful comments. In response, we conducted three new experiments, summarized in Author response table 1. After the table, we provide detailed responses to each comment.

      Author response table 1.

      Summary of new experiments and results.

      Reviewer #1 (Public review):

      The authors show that corticotropin-releasing factor (CRF) neurons in the central amygdala (CeA) and bed nucleus of the stria terminalis (BNST) monosynaptically target cholinergic interneurons (CINs) in the dorsal striatum of rodents. Functionally, activation of CRFR1 receptors increases CIN firing rate, and this modulation was reduced by pre-exposure to ethanol. This is an interesting finding, with potential significance for alcohol use disorders, but some conclusions could use additional support.

      Strengths:

      Well-conceived circuit mapping experiments identify a novel pathway by which the CeA and BNST can modulate dorsal striatal function by controlling cholinergic tone. Important insight into how CRF, a neuropeptide that is important in mediating aspects of stress, affective/motivational processes, and drug-seeking, modulates dorsal striatal function.

      Weaknesses:

      (1) Tracing and expression experiments were performed both in mice and rats (in a mostly nonoverlapping way). While these species are similar in many ways, some conclusions are based on assumptions of similarities that the presented data do not directly show. In most cases, this should be addressed in the text (but see point number 2).

      In the revised manuscript, we have clarified this limitation in the first paragraph of the Methods and the third paragraph of the Discussion and avoid cross-species claims, limiting our conclusions to the species in which each assay was performed. Specifically, we now state that while mice and rats share many conserved amygdalostriatal components, our tracing and expression studies were performed in a species-specific manner, and direct cross-species comparisons of CRF–CIN connectivity and CRFR1 expression were not assessed. We further note that future studies will be needed to determine the extent to which these observations are conserved across species as more tools become available.

      (2) Experiments in rats show that CRFR1 expression is largely confined to a subpopulation of striatal CINs. Is this true in mice, too? Since most electrophysiological experiments are done in various synaptic antagonists and/or TTX, it does not affect the interpretation of those data, but non-CIN expression of CRFR1 could potentially have a large impact on bath CRF-induced acetylcholine release.

      To address whether CRFR1 expression in striatal CINs is conserved across species, we performed new histological experiments using CRFR1-GFP mice. Striatal sections were immunostained with anti-ChAT, and we found that approximately 10% of CINs express CRFR1 (new Fig. 4D, 4E). This result indicates that, similar to rats, a subset of CINs in mice express CRFR1. However, the proportion of CRFR1+ CINs is lower than the proportion of CRF-responsive CINs observed during electrophysiology experiments, suggesting that CRF may also modulate CIN activity indirectly through network or synaptic mechanisms. We have also noted in the revised Discussion that while CRFR1 expression is confirmed in a subset of CINs, the broader distribution of CRFR1 among other striatal cell types remains to be determined (third paragraph of Discussion).

      In our study, bath application of CRF increased striatal ACh release. Because striatal ACh is released primarily from CINs, and CRFR1 is an excitatory receptor, this effect is most likely mediated by CRF activation of CRFR1 on CINs, leading to enhanced CIN activity and ACh release. Although CRFR1 may also be expressed on other striatal neurons, these cell types—medium spiny neurons and GABAergic interneurons—are inhibitory. If CRF were to activate CRFR1 on these GABAergic neurons, the resulting increase in GABA release would suppress CIN activity and consequently reduce, rather than enhance, ACh release. Given that most CINs responded functionally while only a small subset expressed CRFR1, these findings imply that indirect mechanisms, such as CRF modulation of local circuits influencing CIN excitability, may also contribute to the observed increase in ACh release. Together, these data support a model in which CRF primarily enhances ACh release via activation of CRFR1-expressing CINs, while indirect network effects may further amplify this response.

      (3) Experiments in rats show that about 30% of CINs express CRFR1. Did only a similar percentage of CINs in mice respond to bath application of CRF? The effect sizes and error bars in Figure 5 imply that the majority of recorded CINs likely responded. Were exclusion criteria used in these experiments?

      We thank the reviewer for this insightful question. In our mouse cell-attached recordings, ~80% of CINs increased firing during CRF bath application, and all recorded cells were included in the analysis (no exclusions based on response direction/magnitude; cells were only required to meet standard recording-quality criteria such as stable baseline firing and seal).

      Using a CRFR1-GFP reporter mouse, we found that ~10% of striatal CINs are GFP+, suggesting that the high proportion of CRF-responsive CINs cannot be explained solely by somatic reporter-labeled CRFR1 expression. Importantly, the CRF-induced increase in CIN firing is blocked by the selective CRFR1 antagonist NBI 35695 (Fig. 5B–C), supporting a CRFR1-dependent mechanism at the circuit level. We now discuss several non-mutually exclusive explanations for this apparent discrepancy: (i) reporter lines (e.g., CRFR1-GFP) may underestimate functional CRFR1 expression, particularly for low-level or compartmentalized receptor pools; (ii) bath-applied CRF may act indirectly via CRFR1 on presynaptic afferents, thereby enhancing excitatory drive onto CINs; and (iii) electrical coupling among CINs could allow direct effects in a subset of CINs to propagate through the CIN network (Ren, Liu et al. 2021). We added this discussion to the revised manuscript (fourth paragraph of the Discussion).

      (4) The conclusion that prior acute alcohol exposure reduces the ability of subsequent alcohol exposure to suppress CIN activity in the presence of CRF may be a bit overstated. In Figure 6D (no ethanol preexposure), ethanol does not fully suppress CIN firing rate to baseline after CRF exposure. The attenuated effect of CRF on CIN firing rate after ethanol pre-treatment (6E) may just reduce the maximum potential effect that ethanol can have on firing rate after CRF, due to a lowered starting point. It is possible that the lack of significant effect of ethanol after CRF in pre-treated mice is an issue of experimental sensitivity. Related to this point, does pre-treatment with ethanol reduce the later CIN response to acute ethanol application (in the absence of CRF)?

      In the revised manuscript, we have tempered our interpretation in the final Results section and throughout the Discussion to emphasize that ethanol pre-exposure attenuates, rather than abolishes, the CRF-induced increase in CIN firing. We also note the reviewer’s important point that in Figure 6D, ethanol does not fully suppress firing to baseline after CRF exposure, consistent with a partial effect. Regarding the reviewer’s question, our experiments were specifically designed to test interactions between CRF and ethanol, so we did not assess whether ethanol pre-treatment alters subsequent responses to ethanol alone. We now explicitly acknowledge CRF-dependent and CRF-independent effects of ethanol on CIN activity as an important point for future studies to disentangle (sixth paragraph of the Discussion). For example, comparing ethanol responses with and without prior ethanol exposure, in the absence of CRF, could resolve this question.

      (5) More details about the area of the dorsal striatum being examined would be helpful (i.e., a-p axis).

      We now provide more detail regarding the anterior–posterior axis of the dorsal striatum examined. Most recordings and imaging were performed in the posterior dorsomedial striatum (pDMS), corresponding to coronal slices posterior to the crossing of the anterior commissure and anterior to the tail of the striatum (starting around 0.62 mm and ending at −1.3 mm relative to bregma). While our primary focus was on posterior slices, some anterior slices were included to increase the sample size. These details have been added to the Methods (last sentence of the ‘Histology and cell counting’ section and of the ‘Slice electrophysiology’ section).

      Reviewer #2 (Public review):

      Essoh and colleagues present a thorough and elegant study identifying the central amygdala and BNST as key sources of CRF input to the dorsal striatum. Using monosynaptic rabies tracing and electrophysiology, they show direct connections to cholinergic interneurons. The study builds on previous findings that CRF increases CIN firing, extending them by measuring acetylcholine levels in slices and applying optogenetic stimulation of CRF+ fibers. It also uncovers a novel interaction between alcohol and CRF signaling in the striatum, likely to spark significant interest and future research.

      Strengths:

      A key strength is the integration of anatomical and functional approaches to demonstrate these projections and assess their impact on target cells, striatal cholinergic interneurons.

      Weaknesses:

      (1) The nature of the interaction between alcohol and CRF actions on cholinergic neurons remains unclear. Also, further clarification of the ACh sensor used and others is required.

      We have clarified the nature of the interaction between alcohol and CRF signaling in CINs and have provided additional details regarding the acetylcholine sensor used. These issues are addressed in detail in our responses to the specific comments below.

      Reviewer #2 (Recommendations for the authors):

      (1) The interaction between the effects of alcohol and CRF is a novel and important part of this study. When considering possible mechanisms underlying the findings in the discussion, there is no mention of occlusion. Given that incubation with alcohol produced a similar increase in firing of CINs as CRF, occlusion could be a parsimonious explanation for the observed interaction. Have the authors considered blocking the effects of alcohol on CINs with a CRF-R1 antagonist? Another experiment that could address the occlusion would be to test if alcohol also increases ACh levels as CRF did.

      We thank the reviewer for proposing occlusion as a potential mechanism underlying the interaction between alcohol and CRF. We agree that, in principle, alcohol-induced endogenous CRF release could occlude subsequent exogenous CRF-mediated potentiation of CIN firing, and we carefully considered this possibility.

      However, several observations from our data argue against occlusion driven by acute alcohol exposure or withdrawal in this preparation. First, as shown in Fig. 6A, bath application of alcohol transiently reduced CIN firing, and firing recovered to baseline levels after washout without any rebound increase. Second, in Fig. 6D–E, the baseline firing rates under control conditions and following alcohol pretreatment were comparable, indicating that acute alcohol exposure and short-term withdrawal did not produce a sustained increase in CIN excitability. Together, these results suggest that acute withdrawal in slices is less likely to trigger substantial endogenous CRF release capable of occluding subsequent exogenous CRF effects.

      While we and others have previously reported increased spontaneous CIN firing following prolonged in vivo alcohol exposure and extended withdrawal periods (e.g., 21 days), short-term withdrawal (e.g., 1 day) does not robustly alter baseline CIN firing (Ma, Huang et al. 2021, Huang, Chen et al. 2024). Consistent with these prior findings, the absence of a rebound or elevated baseline firing in the present slice experiments discouraged further pursuit of an endogenous CRF occlusion mechanism under acute conditions.

      We also considered experimentally testing occlusion by blocking CRFR1 signaling during alcohol pre-treatment. However, this approach is technically challenging in slice recordings, as CRFR1 antagonists require prolonged incubation (~1 hour) during alcohol exposure. Because it is unclear whether endogenous CRF release is triggered by alcohol incubation itself or by withdrawal, the antagonist would need to remain present throughout both the incubation and withdrawal periods. This leaves insufficient time for complete washout of the CRFR1 antagonist prior to subsequent bath application of exogenous CRF to assess its effects on CIN firing. Consequently, residual antagonist presence would confound the interpretation of the exogenous CRF response.

      Finally, regarding the possibility that alcohol increases acetylcholine release, we did not observe alcohol-induced increases in CIN firing in slices, arguing against elevated ACh signaling under these conditions. Consistent with prior work (Ma, Huang et al. 2021, Huang, Chen et al. 2024), alcohol-induced increases in CIN excitability and cholinergic signaling appear to depend on prolonged in vivo exposure and extended withdrawal rather than acute slice-level manipulations.

      We have now incorporated discussion of occlusion as a potential mechanism (seventh paragraph) and clarified why our data and technical considerations argue against it in the present study. We thank the reviewer for this wonderful suggestion, which we will test in future in vivo studies.

      (2) Retrograde monosynaptic tracing of inputs to CIN. Results state the finding of labeling in all previously reported area..." Can the authors report these areas? A list in the text or a bar plot, if there is quantification, will suffice. This information will serve as important validation and replication of previous findings.

      We thank the reviewer for this constructive suggestion. We agree that summarizing the anatomical sources of CIN input provides important validation of our tracing results. In the revised Results, we now list the major input regions observed, including the striatum itself, cortex (e.g., cingulate cortex, motor cortex, somatosensory cortex), thalamus (e.g., parafascicular thalamic nucleus, centrolateral thalamic nucleus), globus pallidus, and midbrain (first paragraph of the Results). Quantitative analysis of relative input strength will be presented in a separate study that expands on these findings. Here, we limit the current manuscript to the functional characterization of CRF and alcohol modulation of CINs.

      (3) Given the difference in connectivity among striatal subregions, it would be important to describe in more detail the injection site in the results and figures. In the figure, for example, you might want to include the AP coordinates, given that it is such a zoomed-in image, it is hard to tell how anterior/posterior the site is. I imagine that the picture is a representative image of the injection site, but maybe having a side image with overlay of injection sites in all the animals used, would help.

      The anterior–posterior (AP) coordinates for representative images have been included in the panels and reiterated more clearly in the revised Results section and figure legends. In the legend for Figure 3B, a list of AP coordinates for each animal used for Figure 3A-3E has been added.

      (4) Figure 1D inset, there seem to be some double-labeled cells in the zoomed in BNST images. The authors might want to comment on this. It seemed far from the injection site. Do D1-MSN so far away show connectivity to CINs?

      Upon closer inspection of the BNST images, we noted that a small number of double-labeled cells were indeed present, consistent with our previous report of a subset of D1R-expressing neurons (~10%) in the BNST, with the majority being D2R-expressing neurons (Lu, Cheng et al. 2021). Given the BNST’s anatomical proximity to the dorsal striatum, it is plausible that some D1R-expressing neurons in this region provide monosynaptic input to CINs, highlighting a potential ventral-to-dorsal connection that merits further study.

      (5) Can the author provide quantification of the onset delay of the optogenetic evoked CRF+ axon responses onto CINs? The claim of monosynaptic connectivity is well supported by the TTX/4AP experiment but additional information on the timing will strengthen that conclusion.

      We thank the reviewer for this insightful suggestion. Quantifying the onset latency of optogenetically evoked CRF<sup>+</sup> axon responses onto CINs provides valuable confirmation of monosynaptic connectivity. To address this, we performed new latency measurements under the same recording conditions as the TTX/4-AP experiments. The average onset latency from the start of the optical stimulation was 5.85 ± 0.37 ms (new Figure 3J), consistent with direct monosynaptic transmission.

      As an additional reference, we analyzed latency data from a separate project in which we optogenetically stimulated cholinergic interneurons and recorded synaptic responses in medium spiny neurons. This circuit, known to involve disynaptic transmission from CINs to MSNs via nAChR-expressing interneurons (Author response image 1) (English, Ibanez-Sandoval et al. 2011), exhibited a significantly longer latency (18.34 ± 0.70 ms; t<sub>(29)</sub> = 10.3, p < 0.001) compared to CRF⁺ CeA/BNST inputs to CINs (5.85 ± 0.37 ms). Together, these results further support that CRF⁺ axons form direct functional synapses onto CINs.

      Author response image 1.

      Latency of disynaptic transmission from CINs to MSNs via interneurons. A) Schematic illustrating optogenetic stimulation of Chrimson-expressing CINs, leading to excitation of nAChR-expressing interneurons that release GABA onto recorded MSNs. B) Sample trace of disynaptic transmission (left) and bar graph summarizing onset latency (right) from light stimulation to synaptic response onset (n = 23 neurons from 3 mice).
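      As a rough consistency check on the latency comparison above, the t statistic can be recomputed from the summary values alone. The following is a minimal sketch, not the authors' analysis code. It assumes that the reported "±" values are standard errors of the mean, that the test was an unpaired pooled-variance (Student's) t-test, and that the CRF⁺ input group contained 8 cells. That last number is not stated in the response; it is inferred here from the reported 29 degrees of freedom and the 23 neurons in the disynaptic group. The function name is hypothetical.

```python
import math


def pooled_t_from_summary(mean1, sem1, n1, mean2, sem2, n2):
    """Student's (pooled-variance) t statistic from means, SEMs, and group sizes.

    Returns (t, degrees_of_freedom). Assumes each SEM equals SD / sqrt(n).
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Recover the sample variances from the SEMs: SD^2 = SEM^2 * n
    var1 = (sem1 ** 2) * n1
    var2 = (sem2 ** 2) * n2
    df = n1 + n2 - 2
    # Pooled variance weights each group by its own degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean2 - mean1) / se_diff, df


# n1 = 8 is an assumption inferred from the reported df; it is not given in the text.
t_stat, df = pooled_t_from_summary(5.85, 0.37, 8, 18.34, 0.70, 23)
print(f"t({df}) = {t_stat:.2f}")
```

      Under these assumptions the result lands close to the reported value; small differences would be expected from rounding of the published means and errors.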

      (6) The ACh sensor reported is "AAV-GRABACh4m" but the reference is for GRAB-ACh3.0. Also, BrainVTA has GRAB-ACh4.3. Is this the vector? Could you please check the name of the construct and report the corresponding reference, as well as clarify the meaning of the additional "m". They have a mutant version of the GRAB-ACH that researchers use for control, and of course, you want to use it as a control, but not for the test experiment.

      GRAB-ACh4m is the correct acetylcholine sensor used in this study. The ACh4 series (including ACh4h, ACh4m, and ACh4l; personal communication with Dr. Yulong Li’s lab) represents an updated generation following GRAB-ACh3.0. Although the ACh4 family has not yet been formally published, these constructs are publicly available through BrainVTA (https://www.brainvta.tech/plus/view.php?aid=2680).

      The suffix “m” does not indicate a mutant control; rather, it denotes a medium-affinity variant within the ACh4 sensor family. Importantly, the mutant (non-responsive) control sensor is only available for GRAB-ACh3.0 (ACh3.0mut) and does not exist for the ACh4 series.

      Our laboratory has previously used GRAB-ACh4m in multiple peer-reviewed publications (Huang, Chen et al. 2024, Gangal, Iannucci et al. 2025, Purvines, Gangal et al. 2025), and its use has also been reported by independent groups in recent preprints (Potjer, Wu et al. 2025, Touponse, Pomrenze et al. 2025). We have now clarified the construct name and its relationship to GRAB-ACh3.0 in the Methods ‘Reagents’ section, and we have corrected the reference accordingly.

      (7) Are CRF-R1+ CINs equally abundant in the DMS and DLS? From the image in Figure 4, it seems that a larger percentage of CINs are CRFR1+ in the DLS than in DMS. Is this true? The authors probably already have this data, or it should be easy to get, and it could be additional information that was not studied before.

      We did not perform a quantitative comparison of CRFR1+ CIN abundance between the DMS and DLS in the present study. While the representative images in Figure 4 may appear to suggest regional differences, these panels were selected to illustrate labeling quality rather than relative density and should not be interpreted as evidence of unequal distribution. We have clarified this point in the revised Discussion (last sentence of the third paragraph) and note that future studies will be needed to systematically evaluate potential regional differences in CRFR1 expression, which could have important implications for dorsal striatal function.

      (8) The manuscript states several times that there are no CRF+ neurons in the dorsal striatum. At the same time, there are reports of CRF+ neurons in the ventral striatum and their role in learning. Could the authors include mention of the studies by the Lemos group (10.1016/j.biopsych.2024.08.006)?

      We have revised the Discussion section to clarify that our findings pertain specifically to the dorsal striatum and now acknowledge the presence and functional relevance of CRF+ neurons in the ventral striatum, citing the Lemos group’s study (fifth paragraph of the Discussion).

      (9) For the histology analysis, please express cell counts as "density", not just number of cells, by providing an area (e.g., "number of cell/ µm2").

      In the revised manuscript, all histological outcomes have been recalculated as cell density (cells/mm<sup>2</sup>) by normalizing raw cell counts to the measured area of each region of interest (ROI). Figures that previously displayed absolute counts now present densities (cells/mm<sup>2</sup>), with corresponding updates made to figure legends and text. We note one exception in Figure 4B, where the comparison between the total number of CINs and CRFR1+ CINs is best represented as cell counts rather than normalized values, as the counting was conducted in the same area (within the same ROI) of the dorsostriatal subregion.
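      The normalization described above amounts to dividing each raw count by its ROI area after converting that area to mm<sup>2</sup>. A minimal sketch follows, assuming the ROI area is measured in µm<sup>2</sup> (the unit of the measurement software is not stated in the response); the function name and the example numbers are hypothetical and are not taken from the manuscript.

```python
UM2_PER_MM2 = 1_000_000.0  # 1 mm = 1000 µm, so 1 mm^2 = 1e6 µm^2


def cell_density_per_mm2(cell_count: int, roi_area_um2: float) -> float:
    """Convert a raw cell count within an ROI into cells per mm^2."""
    if cell_count < 0:
        raise ValueError("cell_count must be non-negative")
    if roi_area_um2 <= 0:
        raise ValueError("roi_area_um2 must be positive")
    roi_area_mm2 = roi_area_um2 / UM2_PER_MM2
    return cell_count / roi_area_mm2


# Hypothetical example: 12 cells counted in an ROI of 500,000 µm^2 (0.5 mm^2)
print(cell_density_per_mm2(12, 500_000.0))
```

      Reporting in cells/mm<sup>2</sup> rather than cells/µm<sup>2</sup> keeps the values in a readable range, since counts per µm<sup>2</sup> would be six orders of magnitude smaller.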

      (10) Figure 2C, we can see there are some labeled fibers in the striatum cut. Would it be possible to get a better confocal image?

      Figure 2C has been replaced with a higher-quality confocal image captured at the same magnification and scale. The updated image provides improved clarity and resolution, ensuring accurate visualization of labeled CRF+ fibers, but not cell bodies, within the striatum.

      (11) The ACh measurements in the slice are very informative and an important addition. I first thought that these experiments with the GRAB-ACh sensor were performed in ChAT-eGFP mice. After reading more carefully, I realized they were done in wild-type mice. Would you include the wildtype label in the figure as well? The ChAT-eGFP BAC transgenic line was reported to have enhanced ACh packaging and increased ACh release, which could have magnified the signals. So, it is important to highlight the experiments were done in wildtype mice.

      We now label the figure with ‘WT mice’ and note in the legend that all GRAB-ACh experiments were performed in wild-type mice, not ChAT-eGFP mice, to avoid confounds in ACh release. We thank the reviewer for this important suggestion.

      Reviewer #3 (Public review):

      The authors demonstrate that CRF neurons in the extended amygdala form GABAergic synapses onto cholinergic interneurons and that CRF can excite these neurons. The evidence is strong, however, the authors fail to make a compelling connection showing CRF released from these extended amygdala neurons is mediating any of these effects. Further, they show that acute alcohol appears to modulate this action, although the effect size is not particularly robust.

      Strengths:

      This is an exciting connection from the extended amygdala to the striatum that provides a new direction for how these regions can modulate behavior. The work is rigorous and well done.

      Weaknesses:

      (1) While the authors show that opto stim of these neurons can increase firing, this is not shown to be CRFR1 dependent. In addition, the effects of acute ethanol are not particularly robust or rigorously evaluated. Further, the opto stim experiments are conducted in an Ai32 mouse, so it is impossible to determine if that is from CEA and BNST, vs. another population of CRF-containing neurons. This is an important caveat.

      We added recordings with the CRFR1 antagonist antalarmin. Light-evoked increases in CIN firing were abolished under CRFR1 blockade, linking the effect to CRFR1 (Figure 5J, 5K). We also clarify that CRF-Cre;Ai32 does not isolate CeA versus BNST sources, so we temper regional claims and highlight this as a limitation. The acute ethanol effects are modest but consistent; we expanded the discussion of dose and preparation constraints in acute slice physiology and note that in vivo studies will be needed to define the network-level impact.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors could bring some of this data together by examining CRFR1 dependence of optical stimulation-induced increases in firing. Further, the authors have devoted significant effort to exploring how the BNST and CEA project to the CIN, yet their ephys does not explore site-specific infusion of ChR2 into either region. How are we to be sure it is not some other population of CRF neurons mediating this effect? The alcohol data does not appear particularly robust, but I think if the authors wanted to, they could explore other concentrations. Mostly I think it is important to discuss the limitations of acute alcohol on a brain slice.

      We thank the reviewer for these thoughtful comments, which helped us strengthen the mechanistic interpretation of the CRF-CIN interaction. In the revised manuscript, we have addressed each point as follows:

      - CRFR1 dependence of optogenetically evoked responses: We performed new recordings in which optogenetic stimulation of CRF⁺ terminals in the dorsal striatum was conducted in the presence of the CRFR1 antagonist antalarmin. The increase in CIN firing evoked by light stimulation was abolished under CRFR1 blockade, confirming that this effect is mediated through CRFR1 activation (new Figure 5J, 5K, third paragraph of the corresponding Results section). These results directly link the functional effects of CRF⁺ terminal activation to CRFR1 signaling on CINs.

      - CeA vs. BNST projection specificity: The reviewer is correct that CeA and BNST projections were not analyzed separately. Because these pathways were previously uncharacterized, our experiments were designed to first establish the monosynaptic connections from CeA/BNST CRF neurons to striatal CINs. Future studies will explore the specific contribution of each site. However, our data exclude the possibility that other CRF neurons mediate this effect, as we selectively infused Cre-dependent opsins into both the CeA and BNST of CRF-Cre mice (Figure 3G-3J).

      - Limitations of acute slice experiments: We have expanded the Discussion (sixth paragraph) to acknowledge that acute slice physiology cannot fully capture the dynamic and network-level effects of ethanol observed in vivo. While this preparation enables mechanistic precision, factors such as washout, diffusion constraints, and the absence of systemic feedback may underestimate ethanol’s impact on CINs. We now explicitly note this limitation and highlight the need for in vivo studies to examine behavioral and circuit-level implications of CRF–alcohol interactions.

      Collectively, these revisions clarify the CRFR1 dependence of CRF<sup>+</sup> terminal effects and reaffirm that both CeA and BNST projections contribute to CIN modulation while addressing the methodological limitations of the slice preparation.

      Reviewer #4 (Public review):

      This manuscript presents a compelling and methodologically rigorous investigation into how corticotropin-releasing factor (CRF) modulates cholinergic interneurons (CINs) in the dorsal striatum - a brain region central to cognitive flexibility and action selection-and how this circuit is disrupted by alcohol exposure. Through an integrated series of anatomical, optogenetic, electrophysiological, and imaging experiments, the authors uncover a previously uncharacterized CRF⁺ projection from the central amygdala (CeA) and bed nucleus of the stria terminalis (BNST) to dorsal striatal CINs.

      Strengths:

      Key strengths of the study include the use of state-of-the-art monosynaptic rabies tracing, CRF-Cre transgenic models, CRFR1 reporter lines, and functional validation of synaptic connectivity and neurotransmitter release. The finding that CRF enhances CIN excitability and acetylcholine (ACh) release via CRFR1, and that this effect is attenuated by acute alcohol exposure and withdrawal, provides important mechanistic insight into how stress and alcohol interact to impair striatal function. These results position CRF signaling in CINs as a novel contributor to alcohol use disorder (AUD) pathophysiology, with implications for relapse vulnerability and cognitive inflexibility associated with chronic alcohol intake. The study is well-structured, with a clear rationale, thorough methodology, and logical progression of results. The discussion effectively contextualizes the findings within broader addiction neuroscience literature and suggests meaningful future directions, including therapeutic targeting of CRFR1 signaling in the dorsal striatum.

      Weaknesses:

      (1) Minor areas for improvement include occasional redundancy in phrasing, slightly overlong descriptions in the abstract and significance sections, and a need for more concise language in some places. Nevertheless, these do not detract from the manuscript's overall quality or impact. Overall, this is a highly valuable contribution to the fields of addiction neuroscience and striatal circuit function, offering novel insights into stress-alcohol interactions at the cellular and circuit level, which requires minor editorial revisions.

      We have streamlined the abstract and significance statement, reduced redundancy, and improved conciseness throughout the text. We appreciate the reviewer’s feedback, which has helped us further strengthen the clarity and readability of the manuscript.

      Reviewer #4 (Recommendations for the authors):

      (1) Line 29-30: Slightly verbose. Consider: "Alcohol relapse is associated with corticotropin-releasing factor (CRF) signaling and altered reward pathway function, though the precise mechanisms are unclear."

      The sentence has been revised as recommended to improve clarity and conciseness in the introductory section (Lines 31-32).

      (2) Lines 39-43: Good synthesis, but could better emphasize the novelty of identifying a CRF-CIN pathway.

      The abstract has been revised to more clearly emphasize the novelty of identifying a CRF-CIN pathway and its functional significance (Line 42-43).

      (3) Lines 66-68: Consider integrating clinical relevance more directly, e.g., "AUD affects over 14 million adults in the U.S., with relapse often triggered by stress...".

      The introduction has been revised to more directly emphasize the clinical relevance of alcohol use disorder, including its high prevalence and the role of stress in relapse, thereby underscoring the translational significance of our findings (Lines 68-69).

      (4) Line 83: Repetition of "goal-directed learning, habit formation, and behavioral flexibility" appears multiple times; consider variety.

      We have varied the phrasing in the Introduction to avoid redundancy. Specifically, in place of repeating “goal-directed learning, habit formation, and behavioral flexibility,” we now use alternative terms such as “action selection,” “habitual responding,” and “cognitive flexibility,” depending on the context.

      (5) Lines 107-116: Clarify why both rats and mice were used-do they serve different experimental purposes?

      We now explain that each species was used for complementary experimental purposes. Rats were used for histological validation of CRFR1 expression using the CRFR1-Cre-tdTomato line, which has been extensively characterized in this species. Mice were used for the majority of electrophysiological, optogenetic, and GRAB-ACh sensor experiments due to the availability of well-established transgenic CRF-Cre-driver lines. This division allowed us to leverage the most appropriate tools in each species to address different aspects of the study. We have clarified this rationale in the Methods (first paragraph of the “Animals” section) and Discussion (third paragraph).

      (6) Electrophysiology section: The distinction between acute exposure vs. withdrawal could be further emphasized.

      To better highlight the distinction between acute alcohol exposure and withdrawal, we have clarified the timing and context of each condition within the Results section for Figure 6. Specifically, we now distinguish the immediate suppressive effects of alcohol observed during bath application (acute exposure) from the subsequent changes in CIN firing measured after washout (withdrawal). These revisions clarify the temporal dynamics and functional implications of CRF–alcohol interactions in our experimental design.

      (7) Lines 227-229: Reword for clarity: "Significantly more BNST neurons projected to CINs compared to the CeA...".

      The sentence has been reworded to clarify as recommended (Lines 247-248).

      (8) Lines 373-374: Consider connecting the CRF-CIN circuit to behavioral inflexibility in AUD more directly.

      We have modified the sentence (Lines 390-395) to more explicitly link alcohol-induced dysregulation of the CRF–CIN circuit to behavioral inflexibility in AUD, consistent with the established role of CINs in action selection and cognitive flexibility.

      (9) Lines 387-389: This is an excellent point about stress resilience; consider expanding with examples or potential implications.

      We thank the reviewer for this insightful suggestion. In the revised Discussion (sixth paragraph), we expanded this section to more directly connect alcohol-induced disruption of CRF–CIN signaling with impaired stress resilience and behavioral inflexibility. Specifically, we now note that such dysregulation may compromise stress resilience mechanisms mediated by CRF–cholinergic interactions in the striatum and related corticostriatal circuits. We further discuss how impaired CIN responsiveness could blunt adaptive behavioral adjustments under stress, biasing animals toward habitual or compulsive alcohol seeking. This addition highlights the broader implication that alcohol-induced alterations in CRF–CIN signaling may contribute to relapse vulnerability by undermining adaptive stress coping.

      References

      English, D. F., O. Ibanez-Sandoval, E. Stark, F. Tecuapetla, G. Buzsaki, K. Deisseroth, J. M. Tepper and T. Koos (2011). "GABAergic circuits mediate the reinforcement-related signals of striatal cholinergic interneurons." Nat Neurosci 15(1): 123–130.

      Gangal, H., J. Iannucci, Y. Huang, R. Chen, W. Purvines, W. T. Davis, A. Rivera, G. Johnson, X. Xie, S. Mukherjee, V. Vierkant, K. Mims, K. O'Neill, X. Wang, L. A. Shapiro and J. Wang (2025). "Traumatic brain injury exacerbates alcohol consumption and neuroinflammation with decline in cognition and cholinergic activity." Transl Psychiatry 15(1): 403.

      Huang, Z., R. Chen, M. Ho, X. Xie, H. Gangal, X. Wang and J. Wang (2024). "Dynamic responses of striatal cholinergic interneurons control behavioral flexibility." Sci Adv 10(51): eadn2446.

      Lu, J. Y., Y. F. Cheng, X. Y. Xie, K. Woodson, J. Bonifacio, E. Disney, B. Barbee, X. H. Wang, M. Zaidi and J. Wang (2021). "Whole-Brain Mapping of Direct Inputs to Dopamine D1 and D2 Receptor-Expressing Medium Spiny Neurons in the Posterior Dorsomedial Striatum." Eneuro 8(1).

      Ma, T., Z. Huang, X. Xie, Y. Cheng, X. Zhuang, M. J. Childs, H. Gangal, X. Wang, L. N. Smith, R. J. Smith, Y. Zhou and J. Wang (2021). "Chronic alcohol drinking persistently suppresses thalamostriatal excitation of cholinergic neurons to impair cognitive flexibility." J Clin Invest 132(4): e154969.

      Potjer, E. V., X. Wu, A. N. Kane and J. G. Parker (2025). "Parkinsonian striatal acetylcholine dynamics are refractory to L-DOPA treatment." bioRxiv.

      Purvines, W., H. Gangal, X. Xie, J. Ramos, X. Wang, R. Miranda and J. Wang (2025). "Perinatal and prenatal alcohol exposure impairs striatal cholinergic function and cognitive flexibility in adult offspring." Neuropharmacology 279: 110627.

      Ren, Y., Y. Liu and M. Luo (2021). "Gap Junctions Between Striatal D1 Neurons and Cholinergic Interneurons." Front Cell Neurosci 15: 674399.

      Touponse, G. C., M. B. Pomrenze, T. Yassine, V. Mehta, N. Denomme, Z. Zhang, R. C. Malenka and N. Eshel (2025). "Cholinergic modulation of dopamine release drives effortful behavior." bioRxiv.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      This study presents an interesting behavioral paradigm and reveals interactive effects of social hierarchy and threat type on defensive behaviors. However, addressing the aforementioned points regarding methodological detail, rigor in behavioral classification, depth of result interpretation, and focus of the discussion is essential to strengthen the reliability and impact of the conclusions in a revised manuscript. 

      Strengths: 

      The paper is logically sound, featuring detailed classification and analysis of behaviors, with a focus on behavioral categories and transitions, thereby establishing a relatively robust research framework. 

      Weaknesses: 

      Several points require clarification or further revision. 

      (1) Methods and Terminology Regarding Social Hierarchy: 

      The study uses the tube test to determine subordinate status, but the methodological description is quite brief. Please provide a more detailed account of the experimental procedure and the criteria used for determination. 

      We will add more details about how the tube test was performed in the revised manuscript.

      The dominance hierarchy is established based on pairs of mice. However, the use of terms like "group cohesion" - typically applied to larger groups - to describe dyadic interactions seems overstated. Please revise the terminology to more accurately reflect the pairwise experimental setup.

      Thanks for the comment. We agree that the term “group cohesion” can be misleading and will replace it with “social engagement”.

      (2) Criteria and Validity of Behavioral Classification: 

      The criteria for classifying mouse behaviors (e.g., passive defense, active defense) are not sufficiently clear. Please explicitly state the operational definitions and distinguishing features for each behavioral category. 

      Passive defense was defined as an immobility-based defensive strategy characterized by suppression of locomotor activity. This category included freezing and tail rattling, which in our study involved minimal body displacement aside from rapid tail vibration. Active defense was defined as a movement- or posture-dependent defensive strategy, encompassing behaviors that involved locomotor engagement or spatial repositioning relative to the threat, including approach, investigation, withdrawal, and stretch-attend. We will clarify this in the revised manuscript.

      How was the meaningfulness and distinctness of these behavioral categories ensured to avoid overlap? For instance, based on Figure 3E, is "active defense" synonymous with "investigative defense," involving movement to the near region followed by return to the far region? This requires clearer delineation. 

      Defensive behaviors in the rat exposure paradigm were grouped into two categories: passive and active defense, each comprising distinct behaviors. All the manually annotated behaviors were mutually exclusive; that is, each video frame was assigned a single behavioral label to avoid overlap across behaviors. Active defense includes four behaviors: approach, investigation, withdrawal, and stretch-attend. We will clarify these points in the revised manuscript.

      The current analysis focuses on a few core behaviors, while other recorded behaviors appear less relevant. Please clarify the principles for selecting or categorizing all recorded behaviors.

      Thank you for pointing this out. In the current study, we focused primarily on defensive and social behaviors. We also included several neutral solitary behaviors related to anxiety and defensive state, such as sniffing, grooming, and rearing, which were consistently expressed across animals and closely linked to our main findings. We will clarify this rationale in the revised manuscript.

      (3) Interpretation of Key Findings and Mechanistic Insights:

      Looming exposure increased the proportion of proactive bouts in the dominant zone but decreased it in the subordinate zone (Figure 4G), with a similar trend during rat exposure. Please provide a potential explanation for this consistent pattern. Does this consistency arise from shared neural mechanisms, or do different behavioral strategies converge to produce similar outputs under both threats?

      Thanks for bringing up this important question. The increase in proactive bouts in dominant mice, observed in both paradigms, suggests a rank-dependent reorganization of dyadic interaction under threat. We propose that this convergence reflects a shared neural mechanism that links defensive state with social-rank information, potentially mediated by overlapping hypothalamic and prefrontal circuits. We will expand the Discussion to incorporate this explanation.

      (4) Support for Claims and Study Limitations:

      The manuscript states that this work addresses a gap by showing defensive responses are jointly shaped by threat type and social rank, emphasizing survival-critical behaviors over fear or stress alone. However, it is possible that the behavioral differences stem from varying degrees of danger perception rather than purely strategic choices. This warrants a clear description and a deeper discussion to address this possibility.

      We thank the reviewer for this insightful comment. We agree that, in principle, behavioral differences could arise from variations in perceived danger rather than strategic choice. In humans, decisions can sometimes reflect value-based strategies that override perceived danger. In contrast, under naturalistic threat conditions, mice likely rely predominantly on danger perception to make behavioral decisions, and such responses are expected to be consistent with value-based strategies shaped by natural selection. In the revised manuscript, we will expand the Discussion to address the role of threat perception and its relationship to decision-making in our behavioral paradigms.

      The Discussion section proposes numerous brain regions potentially involved in fear and social regulation. As this is a behavioral study, the extensive speculation on specific neural circuitry involvement, without supporting neuroscience data, appears insufficiently grounded and somewhat vague. It is recommended to focus the discussion more on the implications of the behavioral findings themselves or to explicitly frame these neural hypotheses as directions for future research.

      We will revise the Discussion to focus more directly on behavioral findings and add explicit neural hypotheses as potential future directions.

      Reviewer #2 (Public review):

      Summary:

      The authors investigate how dominance hierarchy shapes defensive strategies in mice under two naturalistic threats: a transient visual looming stimulus and a sustained live rat. By comparing single versus paired testing, they report that social presence attenuates fear and that dominant and subordinate mice exhibit different patterns of defensive and social behaviors depending on threat type. The work provides a rich behavioral dataset and a potentially useful framework for studying hierarchical modulation of innate fear.

      Strengths:

      (1) The study uses two ecologically meaningful threat paradigms, allowing comparison across transient and sustained threat contexts.

      (2) Behavioral quantification is detailed, with manual annotation of multiple behavior types and transition-matrix level analysis.

      (3) The comparison of dominant versus subordinate pairs is novel in the context of innate fear.

      (4) The manuscript is well-organized and clearly written.

      (5) Figures are visually informative and support major claims.

      Weaknesses:

      Lack of neural mechanism insights.

      The current study focused on behavior. In the revised manuscript, we will incorporate a discussion of potential neural mechanisms and highlight this as an important direction for future work.

      Reviewer #3 (Public review):

      Summary:

      This study examines how dominance hierarchy influences innate defensive behaviors in pair-housed male mice exposed to two types of naturalistic threats: a transient looming stimulus and a sustained live rat. The authors show that social presence reduces fear-related behaviors and promotes active defense, with dominant mice benefiting more prominently. They also demonstrate that threat exposure reinforces social roles and increases group cohesion. The work highlights the bidirectional interaction between social structure and defensive behavior.

      Strengths:

      This study makes a valuable contribution to behavioral neuroscience through its well-designed examination of socially modulated fear. A key strength is the use of two ethologically relevant threat paradigms - a transient looming stimulus and a sustained live predator, enabling a nuanced comparison of defensive behaviors. The experimental design is robust, systematically comparing animals tested alone versus with their cage mate to cleanly isolate social effects. The behavioral analysis is sophisticated, employing detailed transition maps that reveal how social context reshapes behavioral sequences, going beyond simple duration measurements. The finding that social modulation is rank-dependent adds significant depth, linking social hierarchy to adaptive defense strategies. Furthermore, the demonstration that threat exposure reciprocally enhances social cohesion provides a compelling systems-level perspective. Together, these elements establish a strong behavioral framework for future investigations into the neural circuits underlying socially modulated innate fear.

      Weaknesses:

      The study exhibits several limitations. The neural mechanism proposed is speculative, as the study provides no causal evidence.

      Establishing causal evidence for neural mechanisms is beyond the scope of the current behavioral study. We highlight this as an important direction for future work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates the control signals that drive event model updating during continuous experience. The authors apply predictions from previously published computational models to fMRI data acquired while participants watched naturalistic video stimuli. They first examine the time course of BOLD pattern changes around human-annotated event boundaries, revealing pattern changes preceding the boundary in anterior temporal and then parietal regions, followed by pattern stabilization across many regions. The authors then analyze time courses around boundaries generated by a model that updates event models based on prediction error and another that uses prediction uncertainty. These analyses reveal overlapping but partially distinct dynamics for each boundary type, suggesting that both signals may contribute to event segmentation processes in the brain.

      Strengths:

      (1) The question addressed by this paper is of high interest to researchers working on event cognition, perception, and memory. There has been considerable debate about what kinds of signals drive event boundaries, and this paper directly engages with that debate by comparing prediction error and prediction uncertainty as candidate control signals.

      (2) The authors use computational models that explain significant variance in human boundary judgments, and they report the variance explained clearly in the paper.

      (3) The authors' method of using computational models to generate predictions about when event model updating should occur is a valuable mechanistic alternative to methods like HMM or GSBS, which are data-driven.

      (4) The paper utilizes an analysis framework that characterizes how multivariate BOLD pattern dissimilarity evolves before and after boundaries. This approach offers an advance over previous work focused on just the boundary or post-boundary points.

      We appreciate this reviewer’s recognition of the significance of this research problem, and of the value of the approach taken by this paper.

      Weaknesses:

      (1) While the paper raises the possibility that both prediction error and uncertainty could serve as control signals, it does not offer a strong theoretical rationale for why the brain would benefit from multiple (empirically correlated) signals. What distinct advantages do these signals provide? This may be discussed in the authors' prior modeling work, but is left too implicit in this paper.

      We added a brief discussion in the introduction highlighting the complementary advantages of prediction error and prediction uncertainty, and cited prior theoretical work that elaborates on this point. Specifically, we now note that prediction error can act as a reactive trigger, signaling when the current event model is no longer sufficient (Zacks et al., 2007). In contrast, prediction uncertainty is framed as proactive, allowing the system to prepare for upcoming changes even before they occur (Baldwin & Kosie, 2021; Kuperberg, 2021). Together, this makes clearer why these two signals could each provide complementary benefits for effective event model updating.

      "One potential signal to control event model updating is prediction error—the difference between the system’s prediction and what actually occurs. A transient increase in prediction error is a valid indicator that the current model no longer adequately captures the current activity. Event Segmentation Theory (EST; Zacks et al., 2007) proposes that event models are updated when prediction error increases beyond a threshold, indicating that the current model no longer adequately captures ongoing activity. A related but computationally distinct proposal is that prediction uncertainty (also termed "unpredictability") can serve as a control signal (Baldwin & Kosie, 2021). The advantage of relying on prediction uncertainty to detect event boundaries is that it is inherently proactive: the cognitive system can start looking for cues about what might come next before the next event starts (Baldwin & Kosie, 2021; Kuperberg, 2021)."

      (2) Boundaries derived from prediction error and uncertainty are correlated for the naturalistic stimuli. This raises some concerns about how well their distinct contributions to brain activity can be separated. The authors should consider whether they can leverage timepoints where the models make different predictions to make a stronger case for brain regions that are responsive to one vs the other.

      We addressed this concern by adding an analysis that explicitly tests the unique contributions of prediction error– and prediction uncertainty–driven boundaries to neural pattern shifts. In the revised manuscript, we describe how we fit a combined FIR model that included both boundary types as predictors and then compared this model against versions with only one predictor. This allowed us to identify the variance explained by each boundary type over and above the other. The results revealed two partially dissociable sets of brain regions sensitive to error- versus uncertainty-driven boundaries (see Figure S1), strengthening our argument that these signals make distinct contributions.

      "To account for the correlation between uncertainty-driven boundaries and error-driven boundaries, we also fitted a FIR model that predicted pattern dissimilarity from both types of boundaries (combined FIR) for each parcel. Then, we performed two likelihood ratio tests: combined FIR to error FIR, which measures the unique contribution of uncertainty boundaries to pattern dissimilarity, and combined FIR to uncertainty FIR, which measures the unique contribution of error boundaries to pattern dissimilarity. The analysis also revealed two dissociable sets of brain regions associated with each boundary type (see Figure S1)."
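
      The model comparison described in the excerpt above can be illustrated with a minimal sketch. It assumes ordinary least squares with Gaussian residuals, lags expressed in samples, and synthetic data; the function names (`fir_design`, `gaussian_loglik`, `likelihood_ratio_test`) and all numerical settings are hypothetical and are not taken from the authors' code. In particular, the sketch ignores the temporal autocorrelation of fMRI time series, which a real analysis would need to handle.

```python
import numpy as np
from scipy import stats


def fir_design(onsets, n_timepoints, lags):
    """FIR design matrix: X[t, j] = 1 if a boundary occurred at time t - lags[j]."""
    onsets = np.asarray(onsets, dtype=int)
    X = np.zeros((n_timepoints, len(lags)))
    for j, lag in enumerate(lags):
        idx = onsets + lag
        idx = idx[(idx >= 0) & (idx < n_timepoints)]
        X[idx, j] = 1.0
    return X


def gaussian_loglik(y, X):
    """Maximized Gaussian log-likelihood of an OLS fit that includes an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    n = len(y)
    sigma2 = resid @ resid / n  # maximum-likelihood variance estimate
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)


def likelihood_ratio_test(y, X_reduced, X_full):
    """Chi-square likelihood ratio test of a reduced model nested in a full model."""
    stat = 2.0 * (gaussian_loglik(y, X_full) - gaussian_loglik(y, X_reduced))
    df = X_full.shape[1] - X_reduced.shape[1]
    return stat, stats.chi2.sf(stat, df)


# Synthetic example: dissimilarity is driven by "uncertainty" boundaries only.
rng = np.random.default_rng(0)
n_timepoints = 600
lags = list(range(-5, 6))                      # hypothetical window, in samples
unc_onsets = np.arange(30, n_timepoints, 60)   # hypothetical boundary times
err_onsets = np.arange(50, n_timepoints, 60)
X_unc = fir_design(unc_onsets, n_timepoints, lags)
X_err = fir_design(err_onsets, n_timepoints, lags)
true_beta = np.exp(-0.5 * (np.array(lags) / 2.0) ** 2)
y = 0.3 + 2.0 * (X_unc @ true_beta) + 0.5 * rng.standard_normal(n_timepoints)

X_both = np.column_stack([X_err, X_unc])       # combined FIR
# Combined vs. error-only: unique contribution of uncertainty boundaries.
stat_unc, p_unc = likelihood_ratio_test(y, X_err, X_both)
# Combined vs. uncertainty-only: unique contribution of error boundaries.
stat_err, p_err = likelihood_ratio_test(y, X_unc, X_both)
```

      In this toy setting only the first test should reach significance, because the simulated signal depends on the uncertainty boundaries alone.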

      (3) The authors refer to a baseline measure of pattern dissimilarity, which their dissimilarity measure of interest is relative to, but it's not clear how this baseline is computed. Since the interpretation of increases or decreases in dissimilarity depends on this reference point, more clarity is needed.

      We clarified how the FIR baseline is estimated in the methods section. Specifically, we now explain that the FIR coefficients should be interpreted relative to a reference level, which reflects the expected dissimilarity when timepoints are far from an event boundary. This makes it clear what serves as the comparison point for observed increases or decreases in dissimilarity.

      "The coefficients from the FIR model indicate changes relative to baseline, which can be conceptualized as the expected value when far from event boundaries."
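
      One way to make this reference level explicit is to write the FIR model with an intercept. The notation below is an illustrative assumption rather than the authors' own: $d_t$ is the pattern dissimilarity at time $t$, $b_t$ equals 1 if a boundary occurs at time $t$ and 0 otherwise, and $K$ is the half-width of the window.

```latex
d_t = \beta_0 + \sum_{k=-K}^{K} \beta_k \, b_{t-k} + \varepsilon_t
```

      Under this formulation, $\beta_0$ is the expected dissimilarity when no boundary falls within the window around $t$ (the baseline), and each $\beta_k$ is the deviation from that baseline $k$ samples after a boundary (before it, for negative $k$).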

      (4) The authors report an average event length of ~20 seconds, and they also look at +20 and -20 seconds around each event boundary. Thus, it's unclear how often pre- and post-boundary timepoints are part of adjacent events. This complicates the interpretations of the reported time courses.

      This point is related to Reviewer 2's comment and is addressed below.

      (5) The authors describe a sequence of neural pattern shifts during each type of boundary, but offer little setup of what pattern shifts we might expect or why. They also offer little discussion of what cognitive processes these shifts might reflect. The paper would benefit from a more thorough setup for the neural results and a discussion that comments on how the results inform our understanding of what these brain regions contribute to event models.

      We thank the reviewer for this advice on how better to set the context for the different potential outcomes of the study. We expanded both the introduction and discussion to better set up expectations for neural pattern shifts and to interpret what these shifts may reflect. In the introduction, we now describe prior findings showing that sensory regions tend to update more quickly than higher-order multimodal regions (Baldassano et al., 2017; Geerligs et al., 2021, 2022), and we highlight that it remains unclear whether higher-order updates precede or follow those in lower-order regions. We also note that our analytic approach is well-suited to address this open question. In the discussion, we then interpret our results in light of this framework. Specifically, we describe how we observed early shifts in higher-order areas such as anterior temporal and prefrontal cortex, followed by shifts in parietal and dorsal attention regions closer to event boundaries. This pattern runs counter to the traditional bottom-up temporal hierarchy view and instead supports a model of top-down updating, where high-level representations are updated first and subsequently influence lower-level processing (Friston, 2005; Kuperberg, 2021). To make this interpretation concrete, we added an example: in a narrative where a goal is reached midway—for instance, a mystery solved before the story formally ends—higher-order regions may update the event representation at that point, and this updated model then cascades down to shape processing in lower-level regions. Finally, we note that the widespread stabilization of neural patterns after boundaries may signal the establishment of a new event model.

      Excerpt from Introduction:

      “More recently, multivariate approaches have provided insights into neural representations during event segmentation. One prominent approach uses hidden Markov models (HMMs) to detect moments when the brain switches from one stable activity pattern to another (Baldassano et al., 2017) during movie viewing; these periods of relative stability were referred to as "neural states" to distinguish them from subjectively perceived events. Sensory regions like visual and auditory cortex showed faster transitions between neural states. Multi-modal regions like the posterior medial cortex, angular gyrus, and intraparietal sulcus showed slower neural state shifts, and these shifts aligned with subjectively reported event boundaries. Geerligs et al. (2021, 2022) employed a different analytical approach called Greedy State Boundary Search (GSBS) to identify neural state boundaries. Their findings echoed the HMM results: short-lived neural states were observed in early sensory areas (visual, auditory, and somatosensory cortex), while longer-lasting states appeared in multi-modal regions, including the angular gyrus, posterior middle/inferior temporal cortex, precuneus, anterior temporal pole, and anterior insula. Particularly prolonged states were found in higher-order regions such as lateral and medial prefrontal cortex.

      The previous evidence about evoked responses at event boundaries indicates that these are dynamic phenomena evolving over many seconds, with different brain areas showing different dynamics (Ben-Yakov & Henson, 2018; Burunat et al., 2024; Kurby & Zacks, 2018; Speer et al., 2007; Zacks, 2010). Less is known about the dynamics of pattern shifts at event boundaries (e.g., whether shifts observed in higher-order regions precede or follow shifts observed in lower-level regions), because the HMM and GSBS analysis methods do not directly provide moment-by-moment measures of pattern shifts. Both the spatial and temporal aspects of evoked responses and pattern shifts at event boundaries have the potential to provide evidence about two potential control processes (error-driven and uncertainty-driven) for event model updating.”

      Excerpt from Discussion:

      “We first characterized the neural signatures of human event segmentation by examining both univariate activity changes and multivariate pattern changes around subjectively identified event boundaries. Using multivariate pattern dissimilarity, we observed a structured progression of neural reconfiguration surrounding human-identified event boundaries. The largest pattern shifts were observed near event boundaries (~4.5 s before) in dorsal attention and parietal regions; these correspond with regions identified by Geerligs et al. (2022) as shifting their patterns on a fast-to-intermediate timescale. We also observed smaller pattern shifts roughly 12 seconds prior to event boundaries in higher-order regions within anterior temporal cortex and prefrontal cortex, which Geerligs et al. (2022) identified as slow-changing regions. This is puzzling. One prevalent proposal, based on the idea of a cortical hierarchy of increasing temporal receptive windows (TRWs), suggests that higher-order regions should update representations after lower-order regions do (Chang et al., 2021). In this view, areas with shorter TRWs (e.g., word-level processors) pass information upward, where it is integrated into progressively larger narrative units (phrases, sentences, events). This proposal predicts that neural shifts in higher-order regions follow those in lower-order regions. By contrast, our findings indicate the opposite sequence. Our findings suggest that the brain might engage in top-down event representation updating, with changes in coarser-grain representations propagating downward to influence finer-grain representations (Friston, 2005; Kuperberg, 2021).
      For example, in a narrative where the main goal is achieved midway—such as a detective solving a mystery before the story formally ends—higher-order regions might update the overarching event representation at that point, and this updated model could then cascade down to reconfigure how lower-level regions process the remaining sensory and contextual details. In the period after a boundary (around +12 seconds), we found widespread stabilization of neural patterns across the brain, suggesting the establishment of a new event model. Future work could focus on understanding the mechanisms behind the temporal progression of neural pattern changes around event boundaries.”

      Reviewer #2 (Public review):

      Summary:

      Tan et al. examined how multivoxel patterns shift in time windows surrounding event boundaries caused by both prediction errors and prediction uncertainty. They observed that some regions of the brain show earlier pattern shifts than others, followed by periods of increased stability. The authors combine their recent computational model to estimate event boundaries that are based on prediction error vs. uncertainty and use this to examine the moment-to-moment dynamics of pattern changes. I believe this is a meaningful contribution that will be of interest to memory, attention, and complex cognition research.

      Strengths:

      The authors have shown exceptional transparency in terms of sharing their data, code, and stimuli, which is beneficial to the field for future examinations and to the reproduction of findings. The manuscript is well written with clear figures. The study starts from a strong theoretical background to understand how the brain represents events and has used a well-curated set of stimuli. Overall, the authors extend the event segmentation theory beyond prediction error to include prediction uncertainty, which is an important theoretical shift that has implications in episodic memory encoding, the use of semantic and schematic knowledge, and attentional processing.

      We thank the reviewer for their support for our use of open science practices, and for their appreciation of the importance of incorporating prediction uncertainty into models of event comprehension.

      Weaknesses:

      The data presented is limited to the cortex, and subcortical contributions would be interesting to explore. Further, the temporal window around event boundaries of 20 seconds is approximately the length of the average event (21.4 seconds), and many of the observed pattern effects occur relatively distal from event boundaries themselves, which makes the link to the theoretical background challenging. Finally, while multivariate pattern shifts were examined at event boundaries related to either prediction error or prediction uncertainty, there was no exploration of univariate activity differences between these two different types of boundaries, which would be valuable.

      The fact that we observed neural pattern shifts well before boundaries was indeed unexpected, and we now offer a more extensive interpretation in the discussion section. Specifically, we added text noting that shifts emerged in higher-order anterior temporal and prefrontal regions roughly 12 seconds before boundaries, whereas shifts occurred in lower-level dorsal attention and parietal regions closer to boundaries. This sequence contrasts with the traditional bottom-up temporal hierarchy view and instead suggests a possible top-down updating mechanism, in which higher-order representations reorganize first and propagate changes to lower-level areas (Friston, 2005; Kuperberg, 2021). (See excerpt for Reviewer 1’s comment #5.)

      With respect to univariate activity, we did not find strong differences between error-driven and uncertainty-driven boundaries. This makes the multivariate analyses particularly informative for detecting differences in neural pattern dynamics. To support further exploration, we have also shared the temporal progression of univariate BOLD responses on OpenNeuro (BOLD_coefficients_brain_animation_pe_SEM_bold.html and BOLD_coefficients_brain_animation_uncertainty_SEM_bold.html in the derivatives/figures/brain_maps_and_timecourses/ directory; https://doi.org/10.18112/openneuro.ds005551.v1.0.4) for interested researchers.

      Reviewer #3 (Public review):

      Summary:

      The aim of this study was to investigate the temporal progression of the neural response to event boundaries in relation to uncertainty and error. Specifically, the authors asked (1) how neural activity changes before and after event boundaries, (2) if uncertainty and error both contribute to explaining the occurrence of event boundaries, and (3) if uncertainty and error have unique contributions to explaining the temporal progression of neural activity.

      Strengths:

      One strength of this paper is that it builds on an already validated computational model. It relies on straightforward and interpretable analysis techniques to answer the main question, with a smart combination of pattern similarity metrics and FIR. This combination of methods may also be an inspiration to other researchers in the field working on similar questions. The paper is well written and easy to follow. The paper convincingly shows that (1) there is a temporal progression of neural activity change before and after an event boundary, and (2) event boundaries are predicted best by the combination of uncertainty and error signals.

      We thank the reviewer for their thoughtful and supportive comments, particularly regarding the use of the computational model and the analysis approaches.

      Weaknesses:

      (1) The current analysis of the neural data does not convincingly show that uncertainty and prediction error both contribute to the neural responses. As both terms are modelled in separate FIR models, it may be that the responses we see for both are mostly driven by shared variance. Given that the correlation between the two is very high (r=0.49), this seems likely. The strong overlap in the neural responses elicited by both, as shown in Figure 6, also suggests that what we see may mainly be shared variance. To improve the interpretability of these effects, I think it is essential to know whether uncertainty and error explain similar or unique parts of the variance. The observation that they have distinct temporal profiles is suggestive of some dissociation, but not as convincing as adding them both to a single model.

      We appreciate this point. It is closely related to Reviewer 1's comment 2; please refer to our response above.

      (2) The results for uncertainty and error show that uncertainty has strong effects before or at boundary onset, while error is related to more stabilization after boundary onset. This makes me wonder about the temporal contribution of each of these. Could it be the case that increases in uncertainty are early indicators of a boundary, and errors tend to occur later?

      We share the intuition that increases in uncertainty are early indicators of a boundary and that errors tend to occur later. If that were the case, we would expect a lag between prediction uncertainty and prediction error. We examined the lagged correlation between prediction uncertainty and prediction error, and the optimal lag was 0 for both the uncertainty-driven and error-driven models. This indicates that when prediction uncertainty rises, prediction error rises simultaneously.

      Author response image 1.
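
      The lagged-correlation check described in this response can be sketched as follows. This is an illustrative example on synthetic signals, not the authors' analysis code: the function name `lagged_correlation`, the lag range, and the simulated time series are all assumptions.

```python
import numpy as np


def lagged_correlation(x, y, max_lag):
    """Pearson r between x[t] and y[t + lag] for each lag in [-max_lag, max_lag].

    A positive optimal lag means that y trails x by that many samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        r[i] = np.corrcoef(a, b)[0, 1]
    return lags, r


# Synthetic stand-ins for the two model-derived signals.
rng = np.random.default_rng(1)
n = 500
uncertainty = np.convolve(rng.standard_normal(n), np.ones(5) / 5, mode="same")

# Case 1: "error" rises together with uncertainty (no delay).
error_sync = uncertainty + 0.1 * rng.standard_normal(n)
lags, r_sync = lagged_correlation(uncertainty, error_sync, max_lag=10)
best_lag_sync = int(lags[np.argmax(r_sync)])

# Case 2: "error" trails uncertainty by 3 samples.
error_late = np.roll(uncertainty, 3) + 0.1 * rng.standard_normal(n)
lags, r_late = lagged_correlation(uncertainty, error_late, max_lag=10)
best_lag_late = int(lags[np.argmax(r_late)])
```

      An optimal lag of 0, as reported above, corresponds to the first case; a positive optimal lag would instead have suggested that error follows uncertainty.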

      (3) Given that there is a 24-second period during which the neural responses are shaped by event boundaries, it would be important to know more about the average distance between boundaries and the variability of this distance. This will help establish whether the FIR model can properly capture a return to baseline.

      We have added details about the distribution of event lengths. Specifically, we now report that the mean length of subjectively identified events was 21.4 seconds (median 22.2 s, SD 16.1 s). For model-derived boundaries, the average event lengths were 28.96 seconds for the uncertainty-driven model and 24.7 seconds for the error-driven model.

      " For each activity, a separate group of 30 participants had previously segmented each movie to identify fine-grained event boundaries (Bezdek et al., 2022). The mean event length was 21.4 s (median 22.2 s, SD 16.1 s). Mean event lengths for uncertainty-driven model and error-driven model were 28.96s, and 24.7s, respectively (Nguyen et al., 2024)."

      (4) Given that there is an early onset and long-lasting response of the brain to these event boundaries, I wonder what causes this. Is it the case that uncertainty or errors already increase at 12 seconds before the boundaries occur? Or are there other markers in the movie that the brain can use to foreshadow an event boundary? And if uncertainty or errors do increase already 12 seconds before an event boundary, do you see a similar neural response at moments with similar levels of error or uncertainty, which are not followed by a boundary? This would reveal whether the neural activity patterns are specific to event boundaries or whether these are general markers of error and uncertainty.

      We appreciate this point; it is similar to reviewer 2’s comment 2. Please see our response to that comment above.

      (5) It is known that different brain regions have different delays of their BOLD response. Could these delays contribute to the propagation of the neural activity across different brain areas in this study?

      Our analyses use ±20 s FIR windows, and the key effects we report include shifts ~12s before boundaries in higher-order cortex and ~4.5s pre-boundary in dorsal attention/parietal areas. Given the literature above, region-dependent BOLD delays are much smaller (~1–2s) than the temporal structure we observe (Taylor et al., 2018), making it unlikely that HRF lag alone explains our multi-second, region-specific progression.

      (6) In the FIR plots, timepoints -12, 0, and 12 are shown. These long intervals preclude an understanding of the full temporal progression of these effects.

      For page length purposes, we did not include all timepoints. We uploaded a brain animation of all timepoints and coefficients for each parcel to OpenNeuro (PATTERN_coefficients_brain_animation_human_fine_pattern.html and PATTERN_coefficients_lines_human_fine.html in the derivatives/figures/brain_maps_and_timecourses/ directory; https://doi.org/10.18112/openneuro.ds005551.v1.0.4) for interested researchers.

      References

      Taylor, A. J., Kim, J. H., & Ress, D. (2018). Characterization of the hemodynamic response function across the majority of human cerebral cortex. NeuroImage, 173, 322–331. https://doi.org/10.1016/j.neuroimage.2018.02.061

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer 1

      Minor

      The main substance of my previous comment I suppose targeted a deeper issue - namely whether such a result is reflecting a resolution to a 'neural prediction' puzzle or a 'perceptual prediction' puzzle. Of course, these results tell us a great deal about a potential resolution for how dampening and sharpening might co-exist in the brain - but in the absence of corresponding perceptual effects (or a lack of correlation between neural and perceptual variables - as outlined in this revision) I do wonder if any claims about implications for perception might need moderation or caveating. To be honest, I don't think the authors *need* to make any more changes along these lines for this paper to be acceptable - it is more an issue they might wish to consider themselves when contextualizing their findings.

      Thank you for the thoughtful comment. We have now added a caveat to the relevant section of the discussion to make it clearer that we are discussing neural results, not perceptual results (p.20, lines 378-379).

      I am also happy with the changes that the authors have made justifying which claims can and cannot be made based on a statistical decoding test against 'chance' in a single condition using t-tests. I was perhaps a little unclear when I spoke about 'comparisons against 0' in my original review, when the key issue (as the authors have intuited!) is about comparisons against 'chance' (where e.g., 0% decoding above chance is the same thing as 'chance'!). The authors are of course correct in the amendment they have made on p.29 to make clear this is a 'fixed effects analysis' - though I still worry this could be a little cryptic for the average reader. I am not suggesting that the authors run more analyses, or revise any conclusions, but I think it would be more transparent if a note was added along the lines of "while the fixed effects approach (one-sample t-test) enables us to establish whether some consistent informative patterns are detectable in these particular subjects, the results from our paired t-tests support inference to the wider population".

      This sentence has been added for increased transparency (p. 27, lines 544-547).

      Reviewer 3

      Major

      (1) In the previous round of comments, I noted that: "I am not fully convinced that Figures 3A/B and the associated results support the idea that early learning stages result in dampening and later stages in sharpening. The inference made requires, in my opinion, not only a significant effect in one time bin and the absence of an effect in other bins. Instead, to reliably make this inference one would need a contrast showing a difference in decoding accuracy between bins, or ideally an analysis not contingent on seemingly arbitrary binning of data, but a decrease (or increase) in the slope of the decoding accuracy across trials. Moreover, the decoding analyses seem to be at the edge of SNR, hence making any interpretation that depends on the absence of an effect in some bins yet more problematic and implausible". The authors responded: "we fitted a logarithmic model to quantify the change of the decoding benefit over trials, then found the trial index for which the change of the logarithmic fit was < 0.1%. Given the results of this analysis and to ensure a sufficient number of trials, we focused our further analyses on bins 1-2". However, I do not see how this new analysis addresses the concern that the conclusion highlights differences in decoding performance between bins 1 and 2, yet no contrast between these bins is performed. While I appreciate the addition of the new model, in my current understanding it does not solve the problem I raised. I still believe that if the authors wish to conclude that an effect differs between two bins they must contrast these directly and/or use a different appropriate analysis approach.

      Relatedly, the logarithmic model fitting and how it justifies the focus on analysis bin 1-2 needs to be explained better, especially the rationale of the analysis, the choice of parameters (e.g., why logarithmic, why change of logarithmic fit < 0.1% as criterion, etc), and why certain inferences follow from this analysis. Also, the reporting of the associated results seems rather sparse in the current iteration of the manuscript.

      We thank the reviewer for this important point. Following your suggestion, we conducted additional post-hoc tests directly comparing the first and second bins. We found significant differences between bins in the invalid trials, but not the valid trials, suggesting that sharpening/dampening effects are condition specific. This is discussed in the manuscript on p.14, lines 268-271; p.15, 280-284; p.20, lines 382-386.

      A logarithmic analysis was chosen as learning is usually found to be a nonlinear process; learning effects occur rapidly before stabilising relatively early, as seen in Fig. 2D. This is consistent with other research which found that logarithmic fits efficiently describe learning curves in statistical learning (Kang et al., 2023; Siegelman et al., 2018; Choi et al., 2020). By utilising a change of logarithmic fit at <0.1% as a criterion, it is ensured that virtually zero learning took place after that point, allowing us to focus our analysis on learning effects as they developed and providing a more accurate model of representational change. This is explained in the manuscript on p.13, lines 250-251; p.27-28, lines 557-563.
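
      To make the stabilisation criterion concrete, the following is a minimal sketch of a logarithmic fit followed by a search for the trial at which the fit stops changing. It is not the authors' code. It assumes a model of the form y = a + b·ln(t) fitted by ordinary least squares, and it assumes that "change of the logarithmic fit < 0.1%" means the relative change in the fitted value between consecutive trials; the function names and the toy curve are hypothetical.

```python
import math


def fit_log(y):
    """Least-squares fit of y[t-1] ~ a + b * ln(t) for trials t = 1..len(y)."""
    x = [math.log(t) for t in range(1, len(y) + 1)]
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    a = my - b * mx
    return a, b


def plateau_trial(a, b, n_trials, tol=0.001):
    """First trial t where the fitted curve changes by less than `tol`
    (relative change, an assumed reading of the 0.1% criterion) from t to t + 1.
    Returns None if the criterion is never met within n_trials."""
    for t in range(1, n_trials):
        f_t = a + b * math.log(t)
        f_next = a + b * math.log(t + 1)
        if f_t != 0 and abs(f_next - f_t) / abs(f_t) < tol:
            return t
    return None


# Toy learning curve (hypothetical values): fast early change, then a plateau.
toy_curve = [0.5 + 0.02 * math.log(t) for t in range(1, 101)]
a_hat, b_hat = fit_log(toy_curve)
print(a_hat, b_hat, plateau_trial(a_hat, b_hat, n_trials=100))
```

      Under this reading, trials before the returned index are those in which the fitted curve is still changing appreciably, which is the period the analysis focuses on.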

      (2) A critical point the authors raise is that they investigate the buildup of expectations during training. They go on to show that the dampening effect disappears quickly, concluding: "the decoding benefit of invalid predictions [...] disappeared after approximately 15 minutes (or 50 trials per condition)". Maybe the authors can correct me, but my best understanding is as follows: Each bin has 50 trials per condition. The 2:1 condition has 4 leading images, this would mean ~12 trials per leading stimulus, 25% of which are unexpected, so ~9 expected trials per pair. Bin 1 represents the first time the participants see the associations. Therefore, the conclusion is that participants learn the associations so rapidly that ~9 expected trials per pair suffice to not only learn the expectations (in a probabilistic context) but learn them sufficiently well such that they result in a significant decoding difference in that same bin. If so, this would seem surprisingly fast, given that participants learn by means of incidental statistical learning (i.e. they were not informed about the statistical regularities). I acknowledge that we do not know how quickly the dampening/sharpening effects develop, however surprising results should be accompanied with a critical evaluation and exceptionally strong evidence (see point 1). Consider for example the following alternative account to explain these results. Category pairs were fixed across and within participants, i.e. the same leading image categories always predicted the same trailing image categories for all participants. Some category pairings will necessarily result in a larger representational overlap (i.e., visual similarity, etc.) and hence differences in decoding accuracy due to adaptation and related effects. For example, house → barn will result in a different decoding performance compared to coffee cup → barn, simply due to the larger visual and semantic similarity between house and barn compared to coffee cup and barn. These effects should occur upon first stimulus presentation, independent of statistical learning, and may attenuate over time e.g., due to increasing familiarity with the categories (i.e., an overall attenuation leading to smaller between condition differences) or pairs.

      We apologise for the confusion; there are 50 expected trials per bin per condition. The trial breakdown is as follows. Each participant completed 1728 trials, split equally across 3 mappings (two 2:1 maps and one 1:2 map), giving 1152 trials in the 2:1 mapping. Stimuli were expected in 75% of trials (864), leaving 216 per bin, and 54 per leading image in each bin. We have clarified this in the manuscript (p.14, line 267; p.15, line 280). This is in line with similar studies in the field (e.g. Han et al., 2019).
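
      The trial breakdown above can be checked with a short worked calculation. This is only an illustrative sketch: the number of bins (4) is inferred from 864 / 216 rather than stated in the response, and the number of leading images (4) is taken from the reviewer's comment; the function and variable names are hypothetical.

```python
from fractions import Fraction


def trial_breakdown(
    total_trials=1728,
    n_mappings=3,
    n_two_to_one_mappings=2,
    p_expected=Fraction(3, 4),
    n_bins=4,            # inferred from 864 / 216, not stated explicitly
    n_leading_images=4,  # from the reviewer's description of the 2:1 condition
):
    """Reproduce the step-by-step trial counts for the 2:1 mapping."""
    per_mapping = total_trials // n_mappings
    two_to_one = per_mapping * n_two_to_one_mappings
    expected = int(two_to_one * p_expected)
    expected_per_bin = expected // n_bins
    expected_per_leading_image_per_bin = expected_per_bin // n_leading_images
    return {
        "per_mapping": per_mapping,
        "two_to_one": two_to_one,
        "expected": expected,
        "expected_per_bin": expected_per_bin,
        "expected_per_leading_image_per_bin": expected_per_leading_image_per_bin,
    }


for name, count in trial_breakdown().items():
    print(f"{name}: {count}")
```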

      (3) In response to my previous comment asking why the authors think their study may have found different results compared to multiple previous studies (e.g. Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011), particularly the sharpening to dampening switch, the authors emphasize the use of non-repeated stimuli (no repetition suppression and no familiarity confound) in their design. However, I fail to see how familiarity or RS could account for the absence of sharpening/dampening inversion in previous studies.

      First, if the authors' argument is about stimulus novelty and familiarity as described by Feuerriegel et al., 2021, I believe this point does not apply to the cited studies. Feuerriegel et al., 2021 note: "Relative stimulus novelty can be an important confound in situations where expected stimulus identities are presented often within an experiment, but neutral or surprising stimuli are presented only rarely", which indeed is a critical confound. However, none of the studies (Han et al., 2019; Richter et al., 2018; Kumar et al., 2017; Meyer and Olson, 2011) contained this confound, because all stimuli served as expected and unexpected stimuli, with the expectation status solely determined by the preceding cue. Thus, participants were equally familiar with the images across expectation conditions.

      Second, for a similar reason the authors' argument for RS accounting for the different results does not hold either in my opinion. Again, as Feuerriegel et al. 2021 correctly point out: "Adaptation-related effects can mimic ES when the expected stimuli are a repetition of the last-seen stimulus or have been encountered more recently than stimuli in neutral expectation conditions." However, it is critical to consider the precise design of previous studies. Taking again the example of Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011: to my knowledge none of these studies contained manipulations that would result in a more frequent or recent repetition of any specific stimulus in the expected compared to unexpected condition. The crucial manipulation in all these previous studies is not that a single stimulus or stimulus feature (which could be subject to familiarity or RS) determines the expectation status, but rather the transitional probability (i.e. cue-stimulus pairing) of a particular stimulus given the cue. Therefore, unless I am missing something critical, simple RS seems unlikely to differ between expectation conditions in the previous studies and hence seems implausible to account for differences in results compared to the current study.

      Moreover, studies cited by the authors (e.g. Todorovic & de Lange, 2012) showed that RS and ES are separable in time, again making me wonder how avoiding stimulus repetition should account for the difference in the present study compared to previous ones. I am happy to be corrected in my understanding, but with the currently provided arguments by the authors I do not see how RS and familiarity can account for the discrepancy in results.

      The reviewer is correct in that the studies cited (Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011) ensure that participants are equally familiar with the images across expectation conditions. Where the present study differs is that participants are not familiar with individual exemplars at all. Han et al., 2019 used a pool of 30 individual images, and subjects underwent exposure sessions lasting two hours each daily for 34 days prior to testing. Kumar et al., 2017 used a pool of 12 images with subjects being exposed to each sequential pair 816 times over the course of the training period. Meyer & Olson, 2011 used pure tones at five different pitch levels. While familiarity of stimuli across conditions was controlled for in these studies in the sense that familiarity was constant across conditions, novelty was not controlled for. The present study uses a pool of ~3500 images, which are unrepeated across trials.

      Feuerriegel et al., 2021 also points out: “There are also effects of adaptation that are dependent on the recent stimulation history extending beyond the last encountered stimulus and long-lag repetition effects that occur when the first and second presentation of a stimulus is separated by tens or even hundreds of intervening images”. Bearing this in mind, and given the very small pool of stimuli being used by Han et al., 2019; Kumar et al., 2017; Meyer and Olson, 2011, it stands to reason that these studies may still have built-in but unaccounted for effects relating to the repetition of exemplars. Thus, our avoidance of those possible confounds, in addition to foregoing any prior training, may elicit differing results. Furthermore, as pointed out by Walsh et al. 2020, methodological heterogeneity (such as subject training) can produce contrasting results as PP makes divergent predictions regarding the properties of prediction error given different permutations of variables such as training, transitional probabilities, and conditional probabilities. In our case, the use of differing methodology was intentional. These issues have been discussed in more detail on p.5, lines 112-115; p.19, lines 368-377; p.20, lines 378-379.

      Minor

      (1) The authors note in their reply to my previous questions that: "As mentioned above, we opted to target our ERP analyses on Oz due to controversies in the literature regarding univariate effects of ES (Feuerriegel et al., 2021)". This might be a lack of understanding on my side, but how are concerns about the reliability of ES, as outlined by Feuerriegel et al. (2021), an argument for restricting analyses to 1 EEG channel (Oz)? Could one not argue equally well that precisely because of these concerns we should be less selective and instead average across multiple (occipital) channels to improve the reliability of results?

      The reviewer is correct in suggesting that a cluster of occipital electrodes may be more reliable than reporting one single electrode. We have amended the analysis to examine electrodes Oz, O1, and O2 (p.9, lines 187-188; p.11, lines 197-201).

      (2) The authors provide a github link for the dataset and code. However, I doubt that github is a suitable location to share EEG data (which at present I also cannot find linked in the github repo). Do the authors plan to share the EEG data and if so where?

      Thank you for bringing this to my attention. EEG data has now been uploaded at osf.io/x7ydf and linked to the github repository (p.28, lines 569-570).

      (3) The figure text could benefit from additional information; e.g. Fig.1C and Fig.3 do not clarify what the asterisk indicates; p < ? with or without multiple comparison correction?

      Thank you for pointing out this oversight; the figure texts have been amended (p. 9, line 168; p.16, line 289).

  3. Jan 2026
    1. Author response:

      We thank all reviewers for their comments. We appreciate the acknowledgement that the paper is important and that results support the major conclusions. We are planning to address the specific concerns as noted by the reviewers in the following way:

      Public Reviews:

      Reviewer #2 (Public review):

      (1) The authors generate a new tool, a Gal4 knock-in of the jam2b locus, to track EGFP-expressing cells over time and follow the developmental trajectory of jam2b-expressing cells. Figure 1 characterizes the line. However, it lacks quantification, e.g., how many etv2-expressing cells also show EGFP expression or the contribution of EGFP-expressing cells to different types of blood vessels. This type of quantification would be useful, as it would also allow for comparison of their findings to their previous data examining the contribution of SVF cells to different types of blood vessels. All the authors state is that at 30 hpf, EGFP-expressing cells can be seen in the vasculature (apparently the PCV).

      It is not clear why the authors do not use a nuclear marker for both ECs (as they did in their previous publication) and for jam2b-expressing cells. UAS:nEGFP and UAS:NLS-mcherry (e.g. pt424tg) transgenic lines are available. This would circumvent the problem the authors encounter with the strong fluorescence visible in the yolk extension. It would also facilitate quantifying the contribution of jam2b cells to different types of blood vessels.

      We agree with the importance of quantification. We had performed quantification of jam2b<sup>Gt(2A-Gal4)</sup>;UAS:GFP contribution to different vascular beds, which was shown in Suppl. Fig. S3. We will clarify this in the revision. We also agree that nuclear GFP or mCherry would help to visualize and quantify cells. Unfortunately, we do not have a nuclear UAS:GFP or UAS:mCherry line in our possession, and it would take too long to import one within the standard revision timeline. We are working on the construct and will attempt to establish the line; we therefore hope to clarify these results with the nuclear line in the revised manuscript.

      (2) The time-lapse movie in Figure 2 is not very informative, as it just provides a single example of a dividing cell contributing to the PCV. Also, quantifications are needed. As SVF cells appear to expand significantly after their initial specification, it would be informative to know how many cell divisions and which types of blood vessels jam2b-expressing cells contribute to. Can the authors observe cells that give rise to different types of blood vessels? Jam2b expression in LPM cells apparently precedes expression of etv2. Is etv2 needed for maintenance, or do Jam2b-expressing cells contribute to different types of tissues in etv2 mutant embryos? Comparing time-lapse analysis in wildtype and etv2 mutant embryos would address this question.

      The time-lapse was meant to serve as an illustration and confirmation of jam2b cell contribution to vasculature. As noted above, Suppl. Fig. S3 provides quantification of jam2b cell contribution to different vascular beds. We had previously performed detailed time-lapse analysis and quantification of SVF cell migration to PCV, SIA and SIV using etv2-2A-Venus line (Metikala et al 2022, Dev Cell), which has some of the same (or similar) information. It is very challenging to obtain this data using jam2b reporter line due to extensive and bright GFP expression in the mesothelial layer over the yolk and yolk extension; for that reason we can only trace some GFP cells but not all of them. Regarding etv2 requirement for jam2b maintenance, we intend to address this question by analyzing jam2b cell contribution in etv2 MO injected embryos, which recapitulates the phenotype in jam2b mutants.

      (3) In Figure 3, the authors generate UAS:Cre and UAS:Cre-ERT2 transgenic lines to lineage trace the jam2b-expressing cells. It is again not clear why the authors do not use a responder line containing nuclear-localized fluorescent proteins to circumvent the strong expression of fluorescent proteins in the yolk extension. It is also unclear why the two transgenic lines give very different results regarding the number of cells being labelled. The ERT2 fusions label around 3 cells in the SIA, while the Cre line labels only about 1.5 cells per embryo, with very little contribution of labelled cells to other blood vessels. One would expect the Cre line requiring tamoxifen induction to label fewer cells when compared to the constitutive Cre line. What is the reason for this discrepancy? Are the lines single integration? Is there silencing? This needs to be better characterized, also regarding the reproducibility of the experiments. If the Cre lines were to be multiple copy integrations, outcrossing the line might lead to lower expression levels in future generations. 

      It is also not clear how the authors conclude from these findings that "SVF cells show major contribution to the SIA and SIV" when only 1.5 or 3 cells of the SIA are labelled, with even fewer cells labelled in other blood vessels. They speculate that this might be due to low recombination efficiency, a question they then set out to answer using photoconversion of etv2:KAEDE expressing cells, an experiment that they also performed in their 2014 and 2022 publications. To check for low recombination efficiency, the authors could examine the expression of Cre mRNA in their transgenic embryos. Do many more jam2b expressing cells express Cre mRNA than they observe in their switch lines? They could also compare their experiments using Cre recombinase with those using EGFP expression in jam2b cells. EGFP is relatively stable, and the time frames the authors analyze are short. As no quantification of EGFP-expressing cells is provided in Figure 1, this comparison is currently not possible. Do these two different approaches answer different questions here? 

      The reviewer brings up important points, which we appreciate. Unfortunately, we do not have a nuclear switch line in our possession, and it is not possible to obtain it within the normal manuscript revision timeline. Regarding UAS:Cre and UAS:CreERT2 lines, they both show rather similar labeling, with most labeled cells present in the SIA. The difference in cell number (1.5 versus 3) is likely due to different levels of Cre expression, which may vary depending on the integration site. The lines most likely are multi-copy integrations, which can be helpful, as this would result in higher Cre expression. We will address the silencing question by performing in situ hybridization or HCR analysis for Cre or CreERT2 and comparing it with endogenous jam2b expression, as the reviewer suggested. We have noticed that the switch line used, actb2:loxP-BFP-loxP-dsRed, exhibits lower recombination frequency compared to other switch lines (we used it because it was compatible with endothelial fli1:GFP line). We will attempt to answer this question by crossing to other switch lines, which may exhibit higher recombination frequency. In principle, UAS:GFP and switch lines should produce a similar result, except that GFP decays over time and therefore our initial expectation was that switch lines may produce a more accurate result. However, this may not be the case due to low recombination efficiency, which we will attempt to address in the revision.

      (4) Concerning the etv2:KAEDE photoconversion experiments: The percentages the authors report for SVF cells' contribution to the SIV and SIA differ from their previous study (Dev Cell, 2022). In that publication, SVF cells contributed 28% to the SIA and 48% to the SIV. In the present study, the numbers are close to 80% for both vessels. The difference is that the previous study analyzed 2dpf old embryos and the new one 4dpf old embryos. Do SVF-derived cells proliferate more than PCV-derived cells, or is there another explanation for this change in percentage contribution? 

      These numbers refer to different experiments; we apologize for the confusion. As reported earlier in Metikala et al 2022, 28% of SVF cells contributed to the SIA and 48% to the SIV by 3 dpf (not 2 dpf; only PCV analysis was done at 2 dpf); SIA and SIV analysis was done based on time-lapse image analysis of etv2-2A-Venus line at 3 dpf, shown in Fig. 3C in Metikala et al. However, this only refers to SVF cell contribution. It does not mean that 28% or 48% of cells in SIA or SIV are derived from SVF. The total fraction of SIA and SIV cells that are derived from SVF has not been quantified in the previous study, because that would require accurate tracking of all SVF cells, which is experimentally challenging. Etv2:Kaede experiment is slightly different, because it reports newly formed cells after 24 hpf. It cannot tell if new cells are all derived from SVF cells, although we are not aware of any other source of new endothelial cells at these stages. In the previous study by Metikala et al 2022, we reported ~22 newly formed cells in SIA and ~50 newly formed cells in SIV by 3 dpf (Fig. 1 in Metikala et al 2022), although the entire number of cells was not quantified, therefore the percentage was not known. In the current study, we attempted to estimate the entire percentage of green only Kaede cells, which was close to 80% in both SIA and SIV at 4 dpf. Please note that this estimate was performed in the posterior portion of SIA and SIV that overlies the yolk extension and where SVF cells are observed. We did not quantify cells in the anterior SIV portion, which forms the basket over the yolk.

      (5) Single-cell sequencing data: Why do the authors not show jam2b expression in their single-cell sequencing data? They sorted for (presumably) jam2b-expressing cells and hypothesize that jam2b expression in ECs at this time point is important for the generation of intestinal vasculature. Do ECs in cluster 15 express jam2b? Why are no other top marker genes (tal1, etv2, egfl7, npas4l) included in the dot plot in Figure 5b?

      We appreciate the suggestion and will include additional marker genes as well as jam2b in the revised version of the manuscript.

      (6) Concerns about cell autonomy of mutant phenotypes: The authors need to perform in situ hybridization to characterize jam2a expression. Can it be seen in SVF cells? The double mutants show a clear phenotype in intestinal vessel development; however, it is unclear whether this is due to a cell-autonomous function of jam2a/b within SVF cells. The authors need to address this issue, as jam2b and potentially also jam2a are expressed within the tissue surrounding the forming SVF. For instance, do transplanted mutant cells contribute to the intestinal vasculature to the same extent as wild-type cells do?

      jam2a expression has been characterized in the previous studies and it is shown in the Suppl. Fig. S4E. It is primarily enriched in the skeletal muscle. However, our single-cell RNA-seq analysis shows that SVF cells also express jam2a. We will include additional data on jam2a expression in the revised manuscript. We agree that transplantation to address cell autonomy is an important experiment, yet there are some practical challenges to it. Jam2a,jam2b mutant phenotype is only partially penetrant, and about 50% reduction in SVF cell number, as well as partial SIA and SIV phenotypes are observed. Only a small number of transplanted cells may contribute to intestinal vasculature, therefore it may be challenging to see the differences, given the partial penetrance. In an attempt to address the cell-autonomy question, we will try a different approach. We will overexpress jam2b labeled with 2A-mCherry, and test if it can rescue the mutant phenotype in a cell-autonomous manner. Overexpression will be done in a mosaic manner, with a higher number of cells labeled than in a typical transplantation experiment.

      (7) Finally, the authors analyze the phenotypes of hand2 mutants and their impact on the expression of jam2b and etv2. They observe a reduction in jam2b and etv2 expression in SVF cells. However, they do not show the vascular phenotypes of hand2 mutants. Is the formation of the SIA and SIV disturbed? Is hand2 cell autonomously needed in ECs? The authors suggest that hand2 controls SVF development through the regulation of jam2b. However, they also show that jam2b mutants do not have a phenotype on their own. Clearly, hand2, if it were to be required in ECs, regulates other genes important for SVF development. These might then regulate jam2b expression. The clear linear relationship, as the title suggests, is not convincingly shown by the data.

      As suggested, we will add the analysis of SIA and SIV in hand2 mutants during the revision process. We could not assess that easily because the line was not maintained in vascular fli1:GFP background. We do not know if hand2 is required cell-autonomously. This is an important question, but it may be answered better in a separate study. Regarding hand2-jam2b axis, it is very clear that jam2b expression in the posterior lateral plate mesoderm is completely lost in hand2 mutants, except for its more anterior domain over the yolk. This does support the idea that hand2 functions upstream of jam2b. However, the relationship may not be necessarily direct. We agree that hand2 may regulate additional genes involved in SVF cell development. We will attempt to clarify this relationship and test if jam2b overexpression may rescue hand2 mutant phenotype.

      Reviewer #3 (Public review):

      (1) Overall molecular mechanisms of Jam2 function are not fully uncovered in the study. How do the adhesion molecules Jam2a and Jam2b regulate SVF cell formation? Are they responsible for migration, adhesion or fate determination of these structures? The authors should provide a more in-depth study of the jam2a, jam2b mutations and assess the processes affected in these mutants. Combining these mutants with etv2:Kaede can also provide a stronger causative link between their functions and defects in SVF formation.

      Our data argue that the initial SVF cell specification (based on etv2 expression) is reduced in jam2a;jam2b mutants. We do not know if the migration or fate determination of the remaining SVF cells is also affected, although this may be more challenging to answer, as there are only a few SVF cells remaining. We agree that further mechanistic studies of jam2a,jam2b function are needed. However, we think that this would be better addressed in a separate study. We are currently raising mutants crossed into the fli1:Kaede line, which should confirm that there are fewer new cells that emerge after Kaede photoconversion in jam2a,jam2b mutants.

      (2) Have the authors tested the specificity of the jam2b knock-in reporter line? This is an important experiment, as many of the conclusions derive from lineage tracing and fluorescence reporting from this knock-in line. One suggestion is to cross the jam2b:GFP or jam2b:Gal4, UAS:GFP line to the generated jam2b mutants, and examine the expression pattern of these lines. Considering that the ISH experiment showed lack of jam2b expression, the reporter line should not be expressed in the jam2b mutants.

      We show in Suppl. Fig. 2 that the jam2b<sup>Gt(2A-Gal4)</sup>;UAS:GFP knock-in line has a similar expression pattern to jam2b mRNA by in situ hybridization, which argues for its specificity. In the revision, we plan to use HCR analysis to confirm that jam2b mRNA is expressed in the same cells as jam2b<sup>Gt(2A-Gal4)</sup>;UAS:GFP, as additional evidence for its specificity. Unfortunately, it is not feasible to cross the jam2b knock-in line into jam2b mutants, as suggested by the reviewer. Because the jam2b knock-in line targets the endogenous jam2b genomic locus, which is very close in the genome to the jam2b promoter deletion in jam2b mutants, the recombination frequency would be very low, and we would not get double jam2b knock-in and knock-out events in the same chromosome.

      (3) The rationale behind the regeneration study is not clear, and the mechanisms underlying the phenotype are not well described. How do the authors explain the phenotype with the impaired regeneration, and what is the significance of this finding as it relates to SVF formation and function? 

      We apologize for this omission. This experiment was more thoroughly described in our previous study by Metikala et al 2022. In that study we showed that when endothelial cells are ablated by treating with MTZ from 6 to 45 hpf, this results in ablation of all vascular endothelial cells except for SVF cells, because they originate later than other cells. We subsequently showed that these SVF cells can partially form PCV and intestinal vasculature, helping them regenerate, which was confirmed by time-lapse imaging. In the current study, we tested if jam2a; jam2b double mutants show defects in such vascular regeneration. Indeed, regeneration after cell ablation was reduced, which correlated with reduction in SVF cell number. This argues that jam2a/b function is required for SVF cell emergence and vascular recovery after endothelial cell ablation. We will provide a better description of this experiment and discuss interpretations in the revised manuscript.

      (4) The authors need to include representative images of jam2b>CreERT2 with 4-OH activation at different timepoints in Figure 3.

      Yes, thanks for noting this; these images will be included in the revised manuscript.

      (5) The etv2:Kaede photoconversion experiment to show that the majority of intestinal vasculature derives after 24 hours needs to be supplemented with additional data on photoconverted post-24-hour-old endothelial cells, with the expectation that the majority of intestinal endothelial cells at 4 days will then be labeled with red Kaede. In addition, there have been data that show the red Kaede protein is not stable past several days in vivo, and 3 days might be sufficient for the removal or degradation of this photoconverted protein. Thus, the statement that intestinal vasculature forms largely by new vasculogenesis might be too strong based on existing data.

      It is apparent from Fig. 4B that many other vessels, such as the dorsal aorta and many intersegmental vessels show robust red Kaede expression at 4 dpf, arguing that there is sufficient photoconverted Kaede present at this stage, and its degradation is unlikely to be the reason. However, we are planning to include additional control experiments, as suggested by the reviewer, to make this argument stronger.

      (6) To strengthen the claim that hand2 acts upstream of jam2b, the authors can perform combinatorial genetic epistatic analysis and examine whether jam2b mutations worsen hand2 homozygous or heterozygous effects on the SVF. Similarly, overexpressing jam2b might rescue the loss of SVF/etv2 expression in hand2 mutants. 

      We appreciate this suggestion. Double epistatic analysis, while informative, can be tricky. In this case, we are dealing with jam2a; jam2b redundancy and also the maternal effect. It may take considerable effort to generate different combinations of triple mutant lines (jam2a,jam2b,hand2), and it is unclear whether double or triple heterozygous embryos will show any defects to clarify their epistatic relationship. Instead, as suggested, we are planning to overexpress jam2b in wild-type and hand2 mutants to address this point.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Wu and colleagues aimed to explain previous findings that adolescents, compared to adults, show reduced cooperation following cooperative behaviour from a partner in several social scenarios. The authors analysed behavioural data from adolescents and adults performing a zero-sum Prisoner's Dilemma task and compared a range of social and non-social reinforcement learning models to identify potential algorithmic differences. Their findings suggest that adolescents' lower cooperation is best explained by a reduced learning rate for cooperative outcomes, rather than differences in prior expectations about the cooperativeness of a partner. The authors situate their results within the broader literature, proposing that adolescents' behaviour reflects a stronger preference for self-interest rather than a deficit in mentalising.

      Strengths:

      The work as a whole suggests that, in line with past work, adolescents prioritise value accumulation, and this can be, in part, explained by algorithmic differences in weighted value learning. The authors situate their work very clearly in past literature, and make it obvious the gap they are testing and trying to explain. The work also includes social contexts that move the field beyond non-social value accumulation in adolescents. The authors compare a series of formal approaches that might explain the results and establish generative and model-comparison procedures to demonstrate the validity of their winning model and individual parameters. The writing was clear, and the presentation of the results was logical and well-structured.

      We thank the reviewer for recognizing the strengths of our work.

      Weaknesses:

      (Q1) I also have some concerns about the methods used to fit and approximate parameters of interest. Namely, the use of maximum likelihood versus hierarchical methods to fit models on an individual level, which may reduce some of the outliers noted in the supplement, and also may improve model identifiability.

      We thank the reviewer for this suggestion. Following the comment, we added a hierarchical Bayesian estimation. We built a hierarchical model with both group-level (adolescent group and adult group) and individual-level structures for the best-fitting model. Four Markov chains with 4,000 samples each were run, and the model converged well (see Figure supplement 7).

      We then analyzed the posterior parameters for adolescents and adults separately. The results were consistent with those from the MLE analysis (see Figure 2—figure supplement 5). These additional results have been included in the Appendix Analysis section (also see Figure supplement 5 and 7). In addition, we have updated the code and provided the link for reference. We appreciate the reviewer’s suggestion, which improved our analysis.

      (Q2) There was also little discussion given the structure of the Prisoner's Dilemma, and the strategy of the game (that defection is always dominant), meaning that the preferences of the adolescents cannot necessarily be distinguished from the incentives of the game, i.e. they may seem less cooperative simply because they want to play the dominant strategy, rather than a lower preferences for cooperation if all else was the same.

      We thank the reviewer for this comment and agree that adolescents’ lower cooperation may partly reflect a rational response to the incentive structure of the Prisoner’s Dilemma.

      However, our computational modeling explicitly addressed this possibility. Model 4 (inequality aversion) captures decisions that are driven purely by self-interest or aversion to unequal outcomes, including a parameter reflecting disutility from advantageous inequality, which represents self-oriented motives. If participants’ behavior were solely guided by the payoff-dominant strategy, this model should have provided the best fit. However, our model comparison showed that Model 5 (social reward) performed better in both adolescents and adults, suggesting that cooperative behavior is better explained by valuing social outcomes beyond payoff structures.

      Moreover, if adolescents’ lower cooperation simply reflected a strategic response to the payoff structure, with defection adopted as the more rewarding option, then adolescents should show reduced cooperation across all rounds. Instead, adolescents and adults behaved similarly when partners defected, but adolescents cooperated less when partners cooperated and showed little increase in cooperation even after consecutive cooperative responses. This pattern suggests that adolescents’ lower cooperation cannot be explained solely by strategic responses to payoff structures but rather reflects a reduced sensitivity to others’ cooperative behavior or weaker social reciprocity motives. We have expanded our Discussion to acknowledge this important point and to clarify how the behavioral and modeling results address the reviewer’s concern.

      “Overall, these findings indicate that adolescents’ lower cooperation is unlikely to be driven solely by strategic considerations, but may instead reflect differences in the valuation of others’ cooperation or reduced motivation to reciprocate. Although defection is the payoff-dominant strategy in the Prisoner’s Dilemma, the selective pattern of adolescents’ cooperation and the model comparison results indicate that their reduced cooperation cannot be fully explained by strategic incentives, but rather reflects weaker valuation of social reciprocity.”

      Appraisal & Discussion:

      (Q3) The authors have partially achieved their aims, but I believe the manuscript would benefit from additional methodological clarification, specifically regarding the use of hierarchical model fitting and the inclusion of Bayes Factors, to more robustly support their conclusions. It would also be important to investigate the source of the model confusion observed in two of their models.

      We thank the reviewer for this comment. In the revised manuscript, we have clarified the hierarchical Bayesian modeling procedure for the best-fitting model, including the group- and individual-level structure and convergence diagnostics. The hierarchical approach produced results that fully replicated those obtained from the original maximum-likelihood estimation, confirming the robustness of our findings. Please also see the response to Q1.

      Regarding the model confusion between the inequality aversion (Model 4) and social reward (Model 5) models in the model recovery analysis, both models’ simulated behaviors were best captured by the baseline model. This pattern arises because neither model includes learning or updating processes. Given that our task involves dynamic, multi-round interactions, models lacking a learning mechanism cannot adequately capture participants’ trial-by-trial adjustments, resulting in similar behavioral patterns that are better explained by the baseline model during model recovery. We have added a clarification of this point to the Results:

      “The overlap between Models 4 and 5 likely arises because neither model incorporates a learning mechanism, making them less able to account for trial-by-trial adjustments in this dynamic task.”

      (Q4) I am unconvinced by the claim that failures in mentalising have been empirically ruled out, even though I am theoretically inclined to believe that adolescents can mentalise using the same procedures as adults. While reinforcement learning models are useful for identifying biases in learning weights, they do not directly capture formal representations of others' mental states. Greater clarity on this point is needed in the discussion, or a toning down of this language.

      We sincerely thank the reviewer for this professional comment. We agree that our prior wording regarding adolescents’ capacity to mentalise was somewhat overgeneralized. Accordingly, we have toned down the language in both the Abstract and the Discussion to better align our statements with what the present study directly tests. Specifically, our revisions focus on adolescents’ and adults’ ability to predict others’ cooperation in social learning. This is consistent with the evidence from our analyses examining adolescents’ and adults’ model-based expectations and self-reported scores on partner cooperativeness (see Figure 4). In the revised Discussion, we state:

      “Our results suggest that the lower levels of cooperation observed in adolescents stem from a stronger motive to prioritize self-interest rather than a deficiency in predicting others’ cooperation in social learning”.

      (Q5) Additionally, a more detailed discussion of the incentives embedded in the Prisoner's Dilemma task would be valuable. In particular, the authors' interpretation of reduced adolescent cooperativeness might be reconsidered in light of the zero-sum nature of the game, which differs from broader conceptualisations of cooperation in contexts where defection is not structurally incentivised.

      We thank the reviewer for this comment and agree that adolescents’ lower cooperation may partly reflect a rational response to the incentive structure of the Prisoner’s Dilemma. However, our behavioral and computational evidence suggests that this pattern cannot be explained solely by strategic responses to payoff structures, but rather reflects a reduced sensitivity to others’ cooperative behavior or weaker social reciprocity motives. We have expanded the Discussion to acknowledge this point and to clarify how both behavioral and modeling results address the reviewer’s concern (see also our response to Q2).

      (Q6) Overall, I believe this work has the potential to make a meaningful contribution to the field. Its impact would be strengthened by more rigorous modelling checks and fitting procedures, as well as by framing the findings in terms of the specific game-theoretic context, rather than general cooperation.

      We thank the reviewer for the professional comments, which have helped us improve our work.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates age-related differences in cooperative behavior by comparing adolescents and adults in a repeated Prisoner's Dilemma Game (rPDG). The authors find that adolescents exhibit lower levels of cooperation than adults. Specifically, adolescents reciprocate partners' cooperation to a lesser degree than adults do. Through computational modeling, they show that this relatively low cooperation rate is not due to impaired expectations or mentalizing deficits, but rather a diminished intrinsic reward for reciprocity. A social reinforcement learning model with asymmetric learning rate best captured these dynamics, revealing age-related differences in how positive and negative outcomes drive behavioral updates. These findings contribute to understanding the developmental trajectory of cooperation and highlight adolescence as a period marked by heightened sensitivity to immediate rewards at the expense of long-term prosocial gains.

      Strengths:

      (1) Rigid model comparison and parameter recovery procedure.

      (2) Conceptually comprehensive model space.

      (3) Well-powered samples.

      We thank the reviewer for highlighting the strengths of our work.

      Weaknesses:

      (Q1) A key conceptual distinction between learning from non-human agents (e.g., bandit machines) and human partners is that the latter are typically assumed to possess stable behavioral dispositions or moral traits. When a non-human source abruptly shifts behavior (e.g., from 80% to 20% reward), learners may simply update their expectations. In contrast, a sudden behavioral shift by a previously cooperative human partner can prompt higher-order inferences about the partner's trustworthiness or the integrity of the experimental setup (e.g., whether the partner is truly interactive or human). The authors may consider whether their modeling framework captures such higher-order social inferences. Specifically, trait-based models, such as those explored in Hackel et al. (2015, Nature Neuroscience), suggest that learners form enduring beliefs about others' moral dispositions, which then modulate trial-by-trial learning. A learner who believes their partner is inherently cooperative may update less in response to a surprising defection, effectively showing a trait-based dampening of learning rate.

      We thank the reviewer for this thoughtful comment. We agree that social learning from human partners may involve higher-order inferences beyond simple reinforcement learning from non-human sources. To address this, we had previously included such mechanisms in our behavioral modeling. In Model 7 (Social Reward Model with Influence), we tested a higher-order belief-updating process in which participants’ expectations about their partner’s cooperation were shaped not only by the partner’s previous choices but also by the inferred influence of their own past actions on the partner’s subsequent behavior. In other words, participants could adjust their belief about the partner’s cooperation by considering how their partner’s belief about them might change. Model comparison showed that Model 7 did not outperform the best-fitting model, suggesting that incorporating higher-order influence updates added limited explanatory value in this context. As suggested by the reviewer, we have further clarified this point in the revised manuscript.

      Regarding trait-based frameworks, we appreciate the reviewer’s reference to Hackel et al. (2015). That study elegantly demonstrated that learners form relatively stable beliefs about others’ social dispositions, such as generosity, especially when the task structure provides explicit cues for trait inference (e.g., resource allocations and giving proportions). By contrast, our study was not designed to isolate trait learning, but rather to capture how participants update their expectations about a partner’s cooperation over repeated interactions. In this sense, cooperativeness in our framework can be viewed as a trait-like latent belief that evolves as evidence accumulates. Thus, while our model does not include a dedicated trait module that directly modulates learning rates, the belief-updating component of our best-fitting model effectively tracks a dynamic, partner-specific cooperativeness, potentially reflecting a prosocial tendency.

      (Q2) This asymmetry in belief updating has been observed in prior work (e.g., Siegel et al., 2018, Nature Human Behaviour) and could be captured using a dynamic or belief-weighted learning rate. Models incorporating such mechanisms (e.g., dynamic learning rate models as in Jian Li et al., 2011, Nature Neuroscience) could better account for flexible adjustments in response to surprising behavior, particularly in the social domain.

      We thank the reviewer for the suggestion. Following the comment, we implemented an additional model incorporating a dynamic learning rate based on the magnitude of prediction errors. Specifically, we developed Model 9, a social reward model with a Pearce–Hall learning algorithm (dynamic learning rate), in which participants’ beliefs about their partner’s cooperation probability are updated using a Rescorla–Wagner rule with a learning rate dynamically modulated by the Pearce–Hall (PH) error learning mechanism. In this framework, the learning rate increases following surprising outcomes (larger prediction errors) and decreases as expectations become more stable (see Appendix Analysis section for details).
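
      For readers unfamiliar with this family of models, the mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the manuscript: the hybrid Rescorla–Wagner/Pearce–Hall form, the function names, and the parameters `eta` and `kappa` are all hypothetical, and the exact equations of Model 9 are given in the Appendix Analysis section.

```python
def pearce_hall_update(p, alpha, outcome, eta=0.3, kappa=1.0):
    """One belief update with a Pearce-Hall style dynamic learning rate.

    p       : current belief that the partner cooperates (0..1)
    alpha   : current associability, i.e. the dynamic learning rate (0..1)
    outcome : 1 if the partner cooperated, 0 if the partner defected
    eta     : hypothetical weight given to the newest absolute prediction error
    kappa   : hypothetical scaling factor on the belief update
    """
    delta = outcome - p                           # prediction error
    new_p = p + kappa * alpha * delta             # Rescorla-Wagner step scaled by alpha
    new_p = min(1.0, max(0.0, new_p))             # keep the belief a valid probability
    # Associability tracks recent surprise: it rises after large |delta|
    # and decays while outcomes match expectations.
    new_alpha = eta * abs(delta) + (1.0 - eta) * alpha
    return new_p, new_alpha


def run_beliefs(outcomes, p0=0.5, alpha0=0.5, eta=0.3, kappa=1.0):
    """Return the (belief, learning rate) pair after each observed outcome."""
    p, alpha = p0, alpha0
    history = []
    for outcome in outcomes:
        p, alpha = pearce_hall_update(p, alpha, outcome, eta, kappa)
        history.append((p, alpha))
    return history


if __name__ == "__main__":
    # Twenty cooperative rounds followed by one unexpected defection.
    for step, (p, alpha) in enumerate(run_beliefs([1] * 20 + [0]), start=1):
        print(f"round {step:2d}: belief={p:.3f} learning_rate={alpha:.3f}")
```

      In this sketch the learning rate shrinks across the run of cooperative rounds and rises again after the surprising defection, which is the qualitative behavior the response attributes to the dynamic learning rate.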

      The results showed that this dynamic learning rate model did not outperform our best-fitting model in either adolescents or adults (see Figure supplement 6). We greatly appreciate the reviewer’s suggestion, which has strengthened the scope of our analysis. We have now added these analyses to the Appendix Analysis section (also Figure Supplement 6) and expanded the Discussion to acknowledge this modeling extension and further discuss its implications.

      (Q3) Second, the developmental interpretation of the observed effects would be strengthened by considering possible non-linear relationships between age and model parameters. For instance, certain cognitive or affective traits relevant to social learning, such as sensitivity to reciprocity or reward updating, may follow non-monotonic trajectories, peaking in late adolescence or early adulthood. Fitting age as a continuous variable, possibly with quadratic or spline terms, may yield more nuanced developmental insights.

      We thank the reviewer for this professional comment. In addition to the linear analyses, we further conducted exploratory analyses to examine potential non-linear relationships between age and the model parameters. Specifically, we fit LMMs for each of the four parameters as outcomes (α+, α-, β, and ω). The fixed effects included age, a quadratic age term, and gender, and the random effects included subject-specific random intercepts and random slopes for age and gender. Model comparison using BIC did not indicate improvement for the quadratic models over the linear models for α<sup>+</sup> (ΔBIC<sub>quadratic-linear</sub> = 5.09), α<sup>-</sup>(ΔBIC<sub>quadratic-linear</sub> = 3.04), β (ΔBIC<sub>quadratic-linear</sub> = 3.9), or ω (ΔBIC<sub>quadratic-linear</sub>= 0). Moreover, the quadratic age term was not significant for α<sup>+</sup>, α<sup>−</sup>, or β (all ps > 0.10). For ω, we observed a significant linear age effect (b = 1.41, t = 2.65, p = 0.009) and a significant quadratic age effect (b = −0.03, t = −2.39, p = 0.018; see Author response image 1). This pattern is broadly consistent with the group effect reported in the main text. The shaded area in the figure represents the 95% confidence interval. As shown, the interval widens at older ages (≥ 26 years) due to fewer participants in that range, which limits the robustness of the inferred quadratic effect. In consideration of the limited precision at older ages and the lack of BIC improvement, we did not emphasize the quadratic effect in the revised manuscript and present these results here as exploratory.

      Author response image 1.

      Linear and quadratic model fits showing the relationship between age and the ω parameter, with 95% confidence intervals.
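
      The linear-versus-quadratic comparison reported above can be illustrated with a simplified sketch. The analyses in the response used linear mixed models with random effects; the code below replaces them with ordinary least squares purely to show how a ΔBIC (quadratic minus linear) is computed, so its numbers would not match the reported values. The function names are hypothetical.

```python
import numpy as np


def gaussian_bic(y, X):
    """BIC of an ordinary least-squares fit with Gaussian errors."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = np.mean(resid ** 2)                  # ML estimate of the residual variance
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    n_params = k + 1                              # coefficients plus the variance
    return n_params * np.log(n) - 2.0 * log_lik


def delta_bic_quadratic_vs_linear(age, y):
    """Return BIC(quadratic) - BIC(linear); positive values favor the linear model."""
    age = np.asarray(age, dtype=float)
    y = np.asarray(y, dtype=float)
    age_c = age - age.mean()                      # centering reduces collinearity with age^2
    X_lin = np.column_stack([np.ones_like(age_c), age_c])
    X_quad = np.column_stack([X_lin, age_c ** 2])
    return gaussian_bic(y, X_quad) - gaussian_bic(y, X_lin)


if __name__ == "__main__":
    # Synthetic illustration only; these are not the study's data.
    age = np.linspace(12, 30, 50)
    y = (age - 21.0) ** 2 + 0.1 * np.sin(np.arange(50))
    print("delta BIC (quadratic - linear):", delta_bic_quadratic_vs_linear(age, y))
```

      Under this sign convention, the positive or zero ΔBIC values reported for α<sup>+</sup>, α<sup>−</sup>, β, and ω mean that the extra quadratic term did not earn its complexity penalty.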

      (Q4) Finally, the two age groups compared - adolescents (high school students) and adults (university students) - differ not only in age but also in sociocultural and economic backgrounds. High school students are likely more homogenous in regional background (e.g., Beijing locals), while university students may be drawn from a broader geographic and socioeconomic pool. Additionally, differences in financial independence, family structure (e.g., single-child status), and social network complexity may systematically affect cooperative behavior and valuation of rewards. Although these factors are difficult to control fully, the authors should more explicitly address the extent to which their findings reflect biological development versus social and contextual influences.

      We appreciate this comment. Indeed, adolescents (high school students) and adults (university students) differ not only in age but also in sociocultural and socioeconomic backgrounds. In our study, all participants were recruited from Beijing and surrounding regions, which helps minimize large regional and cultural variability. Moreover, we accounted for individual-level random effects and included participants’ social value orientation (SVO) as an individual difference measure.

      Nonetheless, we acknowledge that other contextual factors, such as differences in financial independence, socioeconomic status, and social experience, may also contribute to group differences in cooperative behavior and reward valuation. Although our results are broadly consistent with developmental theories of reward sensitivity and social decision-making, sociocultural influences cannot be entirely ruled out. Future work with more demographically matched samples or with socioeconomic and regional variables explicitly controlled will help clarify the relative contributions of biological and contextual factors. Accordingly, we have revised the Discussion to include the following statement:

      “Third, although both age groups were recruited from Beijing and nearby regions, minimizing major regional and cultural variation, adolescents and adults may still differ in socioeconomic status, financial independence, and social experience. Such contextual differences could interact with developmental processes in shaping cooperative behavior and reward valuation. Future research with demographically matched samples or explicit measures of socioeconomic background will help disentangle biological from sociocultural influences.”

      Reviewer #3 (Public review):

      Summary:

      Wu and colleagues find that in a repeated Prisoner's Dilemma, adolescents, compared to adults, are less likely to increase their cooperation behavior in response to repeated cooperation from a simulated partner. In contrast, after repeated defection by the partner, both age groups show comparable behavior.

      To uncover the mechanisms underlying these patterns, the authors compare eight different models. They report that a social reward learning model, which includes separate learning rates for positive and negative prediction errors, best fits the behavior of both groups. Key parameters in this winning model vary with age: notably, the intrinsic value of cooperating is lower in adolescents. Adults and adolescents also differ in learning rates for positive and negative prediction errors, as well as in the inverse temperature parameter.

      Strengths:

      The modeling results are compelling in their ability to distinguish between learned expectations and the intrinsic value of cooperation. The authors skillfully compare relevant models to demonstrate which mechanisms drive cooperation behavior in the two age groups.

      We thank the reviewer for recognizing the strengths of our work.

      Weaknesses:

      (Q1) Some of the claims made are not fully supported by the data:

      The central parameter reflecting preference for cooperation is positive in both groups. Thus, framing the results as self-interest versus other-interest may be misleading.

      We thank the reviewer for this insightful comment. In the social reward model, the cooperation preference parameter is positive by definition, as defection in the rPDG always yields a +2 monetary advantage regardless of the partner’s action. This positive value represents the additional subjective reward assigned to mutual cooperation (e.g., reciprocity value) that counterbalances the monetary gain from defection. Although the estimated social reward parameter ω was positive, the effective advantage of cooperation is Δ = p × ω − 2. Given participants’ inferred beliefs p, Δ was negative for most trials (p × ω < 2), indicating that the social reward was insufficient to offset the +2 advantage of defection. Thus, both adolescents and adults valued cooperation positively, but adolescents’ smaller ω and weaker responsiveness to sustained partner cooperation suggest a stronger weighting on immediate monetary payoffs.
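
      The arithmetic behind this argument can be made concrete with a short sketch. Only the expression for Δ and the +2 defection advantage come from the response; the logistic (softmax) choice rule with inverse temperature β and the function names are assumptions made for illustration, and the example values are not estimates from the study.

```python
import math

DEFECTION_BONUS = 2.0  # monetary advantage of defecting, as stated in the response


def cooperation_advantage(p, omega):
    """Effective advantage of cooperating: delta = p * omega - 2."""
    return p * omega - DEFECTION_BONUS


def p_cooperate(p, omega, beta):
    """Assumed logistic choice rule: probability of cooperating given delta."""
    return 1.0 / (1.0 + math.exp(-beta * cooperation_advantage(p, omega)))


if __name__ == "__main__":
    # Illustrative values only. Cooperation is favored when p * omega > 2,
    # i.e. when omega > 2 / p.
    for p, omega in [(0.8, 2.0), (0.8, 3.0), (0.5, 4.0)]:
        delta = cooperation_advantage(p, omega)
        print(f"p={p}, omega={omega}: delta={delta:+.2f}, "
              f"P(cooperate)={p_cooperate(p, omega, beta=1.0):.3f}")
```

      The sketch makes the threshold explicit: a positive ω raises the value of cooperating, but cooperation only becomes the preferred option once ω exceeds 2/p, so a smaller ω can leave Δ negative even when the partner is expected to cooperate.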

      In this light, our framing of adolescents as more self-interested derives from their behavioral pattern: even when they recognized sustained partner cooperation and held high expectations of partner cooperation, adolescents showed lower cooperative behavior and reciprocity rewards compared with adults. Whereas adults increased cooperation after two or three consecutive partner cooperations, this pattern was absent among adolescents. We therefore interpret their behavior as relatively more self-interested, reflecting reduced sensitivity to the social reward from mutual cooperation rather than a categorical shift from self-interest to other-interest, as elaborated in the Discussion.

      (Q2) It is unclear why the authors assume adolescents and adults have the same expectations about the partner's cooperation, yet simultaneously demonstrate age-related differences in learning about the partner. To support their claim mechanistically, simulations showing that differences in cooperation preference (i.e., the w parameter), rather than differences in learning, drive behavioral differences would be helpful.

      We thank the reviewer for raising this important point. In our model, both adolescents and adults updated their beliefs about partner cooperation using an asymmetric reinforcement learning (RL) rule. Although adolescents exhibited a higher positive and a lower negative learning rate than adults, the two groups did not differ significantly in their overall updating of partner cooperation probability (Fig. 4a-b). We then examined the social reward parameter ω, which was significantly smaller in adolescents and determined the intrinsic value of mutual cooperation (i.e., p×ω). This variable differed significantly between groups and closely matched the behavioral pattern.
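
      The asymmetric updating rule referred to above can be sketched as follows. This is a minimal illustration, assuming a standard Rescorla–Wagner update with separate learning rates for positive and negative prediction errors; the function names and the example learning rates are hypothetical and are not the fitted values.

```python
def asymmetric_update(p, partner_cooperated, alpha_pos, alpha_neg):
    """Update the belief that the partner cooperates after one observed choice.

    alpha_pos is applied when the partner cooperates more than expected
    (positive prediction error); alpha_neg is applied after a defection
    (negative prediction error).
    """
    outcome = 1.0 if partner_cooperated else 0.0
    delta = outcome - p                           # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return p + alpha * delta


def belief_trajectory(partner_choices, alpha_pos, alpha_neg, p0=0.5):
    """Return the belief after each observed partner choice."""
    p = p0
    beliefs = []
    for cooperated in partner_choices:
        p = asymmetric_update(p, cooperated, alpha_pos, alpha_neg)
        beliefs.append(p)
    return beliefs


if __name__ == "__main__":
    choices = [True, True, False, True, True, True]
    # Hypothetical learning rates, chosen only to show the asymmetry.
    print(belief_trajectory(choices, alpha_pos=0.4, alpha_neg=0.1))
```

      Because the belief p enters choice only through p × ω, two groups can track the partner similarly and still differ in cooperation when ω differs.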

      Following the reviewer’s suggestion, we conducted additional simulations varying one model parameter at a time while holding the others constant. The difference in mean cooperation probability between adults and adolescents served as the index (positive = higher cooperation in adults). As shown in Author response image 2, decreases in ω most effectively reproduced the observed group difference (shaded area), indicating that age-related differences in cooperation are primarily driven by variation in the social reward parameter ω rather than by the other parameters.

      Author response image 2.

      Simulation results showing how variations in each model parameter affect the group difference in mean cooperation probability (Adults – Adolescents). Based on the best-fitting Model 8 and parameters estimated from all participants, each line represents one parameter (i.e., α+, α-, ω, β) systematically varied within the tested range (α±: 0.1–0.9; ω, β: 1–9) while other parameters were held constant. Positive values indicate higher cooperation in adults. Smaller ω values most strongly reproduced the observed group difference, suggesting that reduced social reward weighting primarily drives adolescents’ lower cooperation.

      (Q3) Two different schedules of 120 trials were used: one with stable partner behavior and one with behavior changing after 20 trials. While results for order effects are reported, the results for the stable vs. changing phases within each schedule are not. Since learning is influenced by reward structure, it is important to test whether key findings hold across both phases.

      We thank the reviewer for this thoughtful and professional comment. In our GLMM and LMM analyses, we focused on trial order rather than explicitly including the stable vs. changing phase factor, due to concerns about multicollinearity. In our design, phases occur in specific temporal segments, which introduces strong collinearity with trial order. In multi-round interactions, order effects also capture variance related to phase transitions.

      Nonetheless, to directly address this concern, we conducted additional robustness analyses by adding a phase variable (stable vs. changing) to GLMM1, LMM1, and LMM3 alongside the original covariates. Across these specifications, the key findings were replicated (see GLMM<sub>sup</sub>2 and LMM<sub>sup</sub>4–5; Tables 9-11), and the direction and significance of main effects remained unchanged, indicating that our conclusions are robust to phase differences.

      (Q4) The division of participants at the legal threshold of 18 years should be more explicitly justified. The age distribution appears continuous rather than clearly split. Providing rationale and including continuous analyses would clarify how groupings were determined.

      We thank the reviewer for this thoughtful comment. We divided participants at the legal threshold of 18 years for both conceptual and practical reasons grounded in prior literature and policy. In many countries and regions, 18 marks the age of legal majority and is widely used as the boundary between adolescence and adulthood in behavioral and clinical research. Empirically, prior studies indicate that psychosocial maturity and executive functions approach adult levels around this age, with key cognitive capacities stabilizing in late adolescence (Icenogle et al., 2019; Tervo-Clemmens et al., 2023). We have clarified this rationale in the Introduction section of the revised manuscript.

      “Based on legal criteria for majority and prior empirical work, we adopt 18 years as the boundary between adolescence and adulthood (Icenogle et al., 2019; Tervo-Clemmens et al., 2023).”

      We fully agree that the underlying age distribution is continuous rather than sharply divided. To address this, we conducted additional analyses treating age as a continuous predictor (see GLMM<sub>sup</sub>1 and LMM<sub>sup</sub>1–3; Tables S1-S4), which generally replicated the patterns observed with the categorical grouping. Nevertheless, given the limited age range of our sample, the generalizability of these findings to fine-grained developmental differences remains constrained. Therefore, our primary analyses continue to focus on the contrast between adolescents and adults, rather than attempting to model a full developmental trajectory.

      (Q5) Claims of null effects (e.g., in the abstract: "adults increased their intrinsic reward for reciprocating... a pattern absent in adolescents") should be supported with appropriate statistics, such as Bayesian regression.

      We thank the reviewer for highlighting the importance of rigor when interpreting potential null effects. To address this concern, we conducted Bayes factor analyses of the intrinsic reward for reciprocity and reported the corresponding BF10 for all relevant post hoc comparisons. This approach quantifies the relative evidence for the alternative versus the null hypothesis, thereby providing a more direct assessment of null effects. The analysis procedure is now described in the Methods and Materials section:

      “Post hoc comparisons were conducted using Bayes factor analyses with MATLAB’s bayesFactor Toolbox (version v3.0, Krekelberg, 2024), with a Cauchy prior scale σ = 0.707.”

      (Q6) Once claims are more closely aligned with the data, the study will offer a valuable contribution to the field, given its use of relevant models and a well-established paradigm.

      We are grateful for the reviewer’s generous appraisal and insightful comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) I commend the authors on a well-structured, clear, and interesting piece of work. I have several questions and recommendations that, if addressed, I believe will strengthen the manuscript.

      We thank the reviewer for commending the organization of our paper.

      (2) Introduction: - Why use a zero-sum (Prisoner's Dilemma; PD) versus a mixed-motive game (e.g. Trust Task) to study cooperation? In a finite set of rounds, the dominant strategy can be to defect in a PD.

      We thank the reviewer for this helpful comment. We agree that both the rationale for using the repeated Prisoner’s Dilemma game (rPDG) and the limitations of this framework should be clarified. We chose the rPDG to isolate the core motivational conflict between self-interest and joint welfare, as its symmetric and simultaneous structure avoids the sequential trust and reputation dynamics inherent to asymmetric tasks such as the Trust Game (King-Casas et al., 2005; Rilling et al., 2002).

      Although a finitely repeated rPDG theoretically favors defection, extensive prior research shows that cooperation can still emerge in long repeated interactions when players rely on learning and reciprocity rather than backward induction (Rilling et al., 2002; Fareri et al., 2015). Our design employed 120 consecutive rounds, allowing participants to update expectations about partner behavior and to establish stable reciprocity patterns over time. We have added the following clarification to the Introduction:

      “The rPDG provides a symmetric and simultaneous framework that isolates the motivational conflict between self-interest and joint welfare, avoiding the sequential trust and reputation dynamics characteristic of asymmetric tasks such as the Trust Game (Rilling et al., 2002; King-Casas et al., 2005)”

      (3) Methods:

      Did the participants know how long the PD would go on for?

      Were the participants informed that the partner was real/simulated?

      Were the participants informed that the partner was going to be the same for all rounds?

      We thank the reviewer for the meticulous review, which helped us present the experimental design and reporting details more clearly. We provide the following clarifications: I. Participants were not informed of the total number of rounds in the rPDG. This prevented endgame expectations and avoided distraction from counting rounds, which could introduce additional effects. II. Participants were told that their partner was another human participant in the laboratory. However, the partner’s behavior was predetermined by a computer program. This design enabled tighter experimental control and ensured consistent conditions across age groups, supporting valid comparisons. III. Participants were informed that they would interact with the same partner across all rounds, aligning with the essence of a multi-round interaction paradigm and stabilizing partner-related expectations. For transparency, we have clarified these points in the Methods and Materials section:

      “Participants were told that their partner was another human participant in the laboratory and that they would interact with the same partner across all rounds. However, in reality, the actions of the partner were predetermined by a computer program. This setup allowed for a clear comparison of the behavioral responses between adolescents and adults. Participants were not informed of the total number of rounds in the rPDG.”

      (4) The authors mention that an SVO was also recorded to indicate participant prosociality. Where are the results of this? Did this track game play at all? Could cooperativeness be explained broadly as an SVO preference that penetrated into game-play behaviour?

      We thank the reviewer for pointing this out. We agree that individual differences in prosociality may shape cooperative behavior, so we conducted additional analyses incorporating SVO. Specifically, we extended GLMM1 and LMM3 by adding the measured SVO as a fixed effect with random slopes, yielding GLMM<sub>sup</sub>3 and LMM<sub>sup</sub>6 (Tables 12–13). The results showed that higher SVO was associated with greater cooperation, whereas its effect on the reward for reciprocity was not significant. Importantly, the primary findings remained unchanged after controlling for SVO. These results indicate that cooperativeness in our task cannot be explained solely by a broad SVO preference, although a more prosocial orientation was associated with greater cooperation. We have reported these analyses and results in the Appendix Analysis section.

      (5) Why was AIC chosen rather than BIC to compare model dominance?

      We apologize for the lack of clarity. Both the Akaike Information Criterion (AIC; Akaike, 1974) and the Bayesian Information Criterion (BIC; Schwarz, 1978) are information-theoretic criteria for model comparison, and neither depends on whether the models being compared are nested (Burnham et al., 2002). We have added the following clarification to the Methods.

      “We chose to use the AICc as the metric of goodness-of-fit for model comparison for the following statistical reasons. First, BIC is derived based on the assumption that the “true model” must be one of the models in the limited model set one compares (Burnham et al., 2002; Gelman & Shalizi, 2013), which is unrealistic in our case. In contrast, AIC does not rely on this unrealistic “true model” assumption and instead selects the model that has the highest predictive power in the model set (Gelman et al., 2014). Second, AIC is also more robust than BIC for finite sample sizes (Vrieze, 2012).”
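
      For reference, the three criteria mentioned here can be computed from a model's maximized log-likelihood as in the short sketch below. The formulas are the standard textbook definitions; the function names and the example numbers are illustrative assumptions and do not come from the fitted models.

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """AIC with the small-sample correction term 2k(k+1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical example: 4 free parameters, 120 trials, log-likelihood of -100.
example = {
    "AIC": aic(-100.0, 4),
    "AICc": aicc(-100.0, 4, 120),
    "BIC": bic(-100.0, 4, 120),
}
```

      Lower values indicate a better trade-off between fit and complexity for all three criteria. The AICc correction term shrinks toward zero as the number of observations grows relative to the number of parameters.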

      (6) I believe the model fitting procedure might benefit from hierarchical estimation, rather than maximum likelihood methods. Adolescents in particular seem to show multiple outliers in a^+ and w^+ at the lower end of the distributions in Figure S2. There are several packages to allow hierarchical estimation and model comparison in MATLAB (which I believe is the language used for this analysis;

      see https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007043).

      We thank the reviewer for this helpful comment and for referring us to relevant methodological work (Piray et al., 2019). We have addressed this point by incorporating hierarchical Bayesian estimation, which effectively mitigates outlier effects and improves model identifiability. The results replicated those obtained with MLE fitting and further revealed group-level differences in key parameters. Please see our detailed response to Reviewer #1 Q1 for the full description of this analysis and results.

      (7) Results: Model confusion seems to show that the inequality aversion and social reward models were consistently confused with the baseline model. Is this explained or investigated? I could not find an explanation for this.

      The apparent overlap between the inequality aversion (Model 4) and social reward (Model 5) models in the recovery analysis likely arises because neither model includes a learning mechanism, making them unable to capture trial-by-trial adjustments in this dynamic task. Consequently, both were best fit by the baseline model. Please see Response to Reviewer #1 Q3 for related discussion.

      (8) Figures 3e and 3f show the correlation between asymmetric learning rates and age. It seems that both a^+ and a^- are around 0.35-0.40 for young adolescents, and this becomes more polarised with age. Could it be that with age comes an increasing discernment of positive and negative outcomes on beliefs, and younger ages compress both positive and negative values together? Given the higher stochasticity in younger ages (\beta), it may also be that these values simply represent higher uncertainty over how to act in any given situation within a social context (assuming the differences in groups are true).

      We appreciate this insightful interpretation. Indeed, both α+ and α- cluster around 0.35–0.40 in younger adolescents and become increasingly polarized with age, suggesting that sensitivity to positive versus negative feedback is less differentiated early in development and becomes more distinct over time. This interpretation remains tentative and warrants further validation. Based on this comment, we have revised the Discussion to include this developmental interpretation.

      We also clarify that in our model β denotes the inverse temperature parameter; higher β reflects greater choice precision and value sensitivity, not higher stochasticity. Accordingly, adolescents showed higher β values, indicating more value-based and less exploratory choices, whereas adults displayed relatively greater exploratory cooperation. These group differences were also replicated using hierarchical Bayesian estimation (see Response to Reviewer #1 Q1). In response to this comment, we have added a statement in the Discussion highlighting this developmental interpretation.

      “Together, these findings suggest that the differentiation between positive and negative learning rates changes with age, reflecting more selective feedback sensitivity in development, while higher β values in adolescents indicate greater value sensitivity. This interpretation remains tentative and requires further validation in future research.”

      (9) A parameter partial correlation matrix (off-diagonal) would be helpful to understand the relationship between parameters in both adolescents and adults separately. This may provide a good overview of how the model properties may change with age (e.g. a^+'s relation to \beta).

      We thank the reviewer for this helpful comment. We fully agree that a parameter partial correlation matrix can further elucidate the relationships among parameters. Accordingly, we conducted a partial correlation analysis and added the visually presented results to the revised manuscript as Figure 2-figure supplement 4.
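
      To make the notion of a partial correlation concrete, the sketch below computes the correlation between two variables after removing the linear influence of a third. It covers only the first-order case with a single control variable; the analysis in the supplement may condition on all remaining parameters at once, so the function names and toy data here are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: x and y share variance only through z.
z = [1.0, 2.0, 3.0, 4.0]
x = [2.0, 1.0, 2.0, 5.0]   # z plus a component orthogonal to z
y = [2.0, -1.0, 6.0, 3.0]  # z plus a different orthogonal component
raw_r = pearson(x, y)
partial_r = partial_corr(x, y, z)
```

      In the toy data the raw correlation between x and y is positive, but it vanishes once z is controlled for, which is the kind of dependence a partial correlation matrix is meant to expose.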

      (10) It would be helpful to have Bayes Factors reported with each statistical tests given that several p-values fall within the 0.01 and 0.10.

      We thank the reviewer for this important recommendation. We have conducted Bayes factor analyses and reported BF10 for all relevant post hoc comparisons. We also clarified our analysis in the Methods and Materials section:

      “Post hoc comparisons were conducted using Bayes factor analyses with MATLAB’s bayesFactor Toolbox (version v3.0, Krekelberg, 2024), with a Cauchy prior scale σ = 0.707.”

      (11) Discussion: I believe the language around ruling out failures in mentalising needs to be toned down. RL models do not enable formal representational differences required to assess mentalising, but they can distinguish biases in value learning, which in itself is interesting. If the authors were to show that more complex 'ToM-like' Bayesian models were beaten by RL models across the board, and this did not differ across adults and adolescents, there would be a stronger case to make this claim. I think the authors either need to include Bayesian models in their comparison, or tone down their language on this point, and/or suggest ways in which this point might be more thoroughly investigated (e.g., using structured models on the same task and running comparisons: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0087619).

      We thank the reviewer for the comments. Please see our response to Reviewer 1 (Appraisal & Discussion section) for details.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors may want to show the winning model earlier (perhaps near the beginning of the Results section, when model parameters are first mentioned).

      We thank the reviewer for this suggestion. We agree that highlighting the winning model early improves clarity. Currently, we have mentioned the winning model before the beginning of the Results section. Specifically, in the penultimate paragraph of the Introduction we state:

      “We identified the asymmetric RL learning model as the winning model that best explained the cooperative decisions of both adolescents and adults.”

      Reviewer #3 (Recommendations for the authors):

      (1) In addition to the points mentioned above, I suggest the following:

      Clarify plots by clearly explaining each variable. In particular, the indices 1 vs. 1,2 vs 1,2,3 were not immediately understandable.

      We thank the reviewer for this suggestion. We agree that the indices were not immediately clear. We have revised the figure captions (Figures 1 and 4) to define these terms explicitly:

      “The x-axis represents the consistency of the partner’s actions in previous trials (t<sub>−1</sub>: last trial; t<sub>−1,2</sub>: last two trials; t<sub>−1,2,3</sub>: last three trials).”

      (2) It's unclear why the index stops at 3. If this isn't the maximum possible number of consecutive cooperation trials, please consider including all relevant data, as adolescents might show a trend similar to adults over more trials.

      We thank the reviewer for raising this point. In our exploratory analyses, we also examined longer streaks of consecutive partner cooperation or defection (up to four or five trials). Two empirical considerations led us to set the cutoff at three in the final analyses. First, the influence of partner behavior diminished sharply with temporal distance. In both GLMMs and LMMs, coefficients for earlier partner choices were small and unstable, and their inclusion substantially increased model complexity and multicollinearity. This recency pattern is consistent with learning and decision models emphasizing stronger weighting of recent evidence (Fudenberg & Levine, 2014; Fudenberg & Peysakhovich, 2016). Second, streaks longer than three were rare, especially among some participants, leading to data sparsity and inflated uncertainty. Including these sparse conditions risked biasing group estimates rather than clarifying them. Balancing informativeness and stability, we therefore restricted the index to three consecutive partner choices in the main analyses, which we believe sufficiently capture individuals’ general tendencies in reciprocal cooperation.

      (3) The term "reciprocity" may not be necessary. Since it appears to reflect a general preference for cooperation, it may be clearer to refer to the specific behavior or parameter being measured. This would also avoid confusion, especially since adolescents do show negative reciprocity in response to repeated defection.

      We thank the reviewer for this comment. In our work, we compute the intrinsic reward for reciprocity as p × ω, where p is the partner cooperation expectation and ω is the cooperation preference. In the rPDG, this value framework manifests as a reciprocity-derived reward: sustained mutual cooperation maximizes joint benefits, and the resulting choice pattern reflects a value for reciprocity, contingent on the expected cooperation of the partner. This quantity enters the trade-off between U<sub>cooperation</sub> and U<sub>defection</sub> and captures the participant’s intrinsic reward for reciprocity versus the additional monetary payoff of defection. Therefore, we consider “reciprocity” an appropriate term for this construct.

      (4) Interpretation of parameters should closely reflect what they specifically measure.

      We thank the reviewer for pointing this out. We have refined the relevant interpretations of parameters in the current Results and Discussion sections.

      (5) Prior research has shown links between Theory of Mind (ToM) and cooperation (e.g., Martínez-Velázquez et al., 2024). It would be valuable to test whether this also holds in your dataset.

      We thank the reviewer for this thoughtful comment. Although we did not directly measure participants’ ToM, our design allowed us to estimate participants’ trial-by-trial inferences (i.e., expectations) about their partner’s cooperation probability. We therefore treat these cooperation expectations as an indirect proxy for belief inference, which is related to ToM processes. To test whether this belief-inference component relates to cooperation in our dataset, we further conducted an exploratory analysis (GLMM<sub>sup</sub>4) in which participants’ choices were regressed on their cooperation expectations, group, and the group × cooperation-expectation interaction, controlling for trial number and gender, with random effects. Consistent with the ToM–cooperation link in prior research (Martínez-Velázquez et al., 2024), participants’ expectations about their partner’s cooperation significantly predicted their cooperative behavior (Table 14), suggesting that decisions were shaped by social learning about others’ inferred actions. Moreover, the interaction between group and cooperation expectation was not significant, indicating that this inference-driven social learning process likely operates similarly in adolescents and adults. This aligns with our primary modeling results showing that both age groups update beliefs via an asymmetric learning process. We have reported these analyses in the Appendix Analysis section.

      (6) More informative table captions would help the reader. Please clarify how variables are coded (e.g., is female = 0 or 1? Is adolescent = 0 or 1?), to avoid the need to search across the manuscript for this information.

      We thank the reviewer for raising this point. We have added clear and standardized variable coding in the table notes of all tables to make them more informative and avoid the need to search the paper. We have ensured consistent wording and formatting across all tables.

      (7) I hope these comments are helpful and support the authors in further strengthening their manuscript.

      We thank the three reviewers for their comments, which have been helpful in strengthening this work.

      References

      (1) Fudenberg, D., & Levine, D. K. (2014). Recency, consistent learning, and Nash equilibrium. Proceedings of the National Academy of Sciences of the United States of America, 111(Suppl. 3), 10826–10829. https://doi.org/10.1073/pnas.1400987111

      (2) Fudenberg, D., & Peysakhovich, A. (2016). Recency, records, and recaps: Learning and nonequilibrium behavior in a simple decision problem. ACM Transactions on Economics and Computation, 4(4), Article 23, 1–18. https://doi.org/10.1145/2956581

      (3) Hackel, L., Doll, B., & Amodio, D. (2015). Instrumental learning of traits versus rewards: Dissociable neural correlates and effects on choice. Nature Neuroscience, 18, 1233–1235. https://doi.org/10.1038/nn.4080

      (4) Icenogle, G., Steinberg, L., Duell, N., Chein, J., Chang, L., Chaudhary, N., Di Giunta, L., Dodge, K. A., Fanti, K. A., Lansford, J. E., Oburu, P., Pastorelli, C., Skinner, A. T., Sorbring, E., Tapanya, S., Uribe Tirado, L. M., Alampay, L. P., Al-Hassan, S. M., Takash, H. M. S., & Bacchini, D. (2019). Adolescents’ cognitive capacity reaches adult levels prior to their psychosocial maturity: Evidence for a “maturity gap” in a multinational, cross-sectional sample. Law and Human Behavior, 43(1), 69–85. https://doi.org/10.1037/lhb0000315

      (5) Krekelberg, B. (2024). Matlab Toolbox for Bayes Factor Analysis (v3.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.13744717

      (6) Martínez-Velázquez, E. S., Ponce-Juárez, S. P., Díaz Furlong, A., & Sequeira, H. (2024). Cooperative behavior in adolescents: A contribution of empathy and emotional regulation? Frontiers in Psychology, 15, 1342458. https://doi.org/10.3389/fpsyg.2024.1342458

      (7) Tervo-Clemmens, B., Calabro, F. J., Parr, A. C., et al. (2023). A canonical trajectory of executive function maturation from adolescence to adulthood. Nature Communications, 14, 6922. https://doi.org/10.1038/s41467-023-42540-8

      (8) King-Casas, B., Tomlin, D., Anen, C., Camerer, C. F., Quartz, S. R., & Montague, P. R. (2005). Getting to know you: reputation and trust in a two-person economic exchange. Science, 308(5718), 78-83. https://doi.org/10.1126/science.1108062

      (9) Rilling, J. K., Gutman, D. A., Zeh, T. R., Pagnoni, G., Berns, G. S., & Kilts, C. D. (2002). A neural basis for social cooperation. Neuron, 35(2), 395-405. https://doi.org/10.1016/s0896-6273(02)00755-9

      (10) Fareri, D. S., Chang, L. J., & Delgado, M. R. (2015). Computational substrates of social value in interpersonal collaboration. Journal of Neuroscience, 35(21), 8170-8180. https://doi.org/10.1523/JNEUROSCI.4775-14.2015

      (11) Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723. https://doi.org/10.1109/TAC.1974.1100705

      (12) Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461-464. https://doi.org/10.1214/aos/1176344136

      (13) Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach (2nd ed.). Springer. https://doi.org/10.1007/b97636

      (14) Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8–38. https://doi.org/10.1111/j.2044-8317.2011.02037.x

      (15) Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b16018

      (16) Vrieze, S. I. (2012). Model selection and psychological theory: A discussion of the differences between the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). Psychological Methods, 17(2), 228–243. https://doi.org/10.1037/a0027127

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      This work by Reitz, Z. L. et al. developed an automated tool for high-throughput identification of microbial metallophore biosynthetic gene clusters (BGCs) by integrating knowledge of chelating moiety diversity and transporter gene families. The study aimed to create a comprehensive detection system combining chelator-based and transporter-based identification strategies, validate the tool through large-scale genomic mining, and investigate the evolutionary history of metallophore biosynthesis across bacteria.

      Major strengths include providing the first automated, high-throughput tool for metallophore BGC identification, representing a significant advancement over manual curation approaches. The ensemble strategy effectively combines complementary detection methods, and experimental validation using HPLC-HRMS strengthens confidence in computational predictions. The work pioneers a global analysis of metallophore diversity across the bacterial kingdom and provides a valuable dataset for future computational modeling.

      Some limitations merit consideration. First, ground truth datasets derived from manual curation may introduce selection bias toward well-characterized systems, potentially affecting performance assessment accuracy. Second, the model's dependence on known chelating moieties and transporter families constrains its ability to detect novel metallophore architectures, limiting discovery potential in metagenomic datasets. Third, while the proposed evolutionary hypothesis is internally consistent, it lacks direct validation and remains speculative without additional phylogenetic studies.

      The authors successfully achieved their stated objectives. The tool demonstrates robust performance metrics and practical utility through large-scale application to representative genomes. Results strongly support their conclusions through rigorous validation, including experimental confirmation of predicted metallophores via HPLC-HRMS analysis.

      The work provides a significant and immediate impact by enabling the transition from labor-intensive manual approaches to automated screening. The comprehensive phylogenetic framework advances understanding of bacterial metal acquisition evolution, informing future studies on microbial metal homeostasis. Community utility is substantial, since the tool and accompanying dataset create essential resources for comparative genomics, algorithm development, and targeted experimental validation of novel metallophores.

      We thank the reviewer for their valuable feedback. We appreciate the positive words, and agree with their listed limitations. Regarding the following comment:

      “Third, while the proposed evolutionary hypothesis is internally consistent, it lacks direct validation and remains speculative without additional phylogenetic studies.”

      We agree that additional phylogenetic analyses are needed in future studies. For the revised manuscript, we have validated our evolutionary hypotheses by additionally analyzing two gene families using the likelihood-based tool AleRax, which implements a probabilistic DTL model. The results were consistent with the eMPRess parsimony-based reconstructions, showing comparable patterns of rare duplication, moderate gene loss, and extensive horizontal transfer. Both methods identified similar lineages as the most probable origin and major recipients of transfer events. This agreement between independent reconciliation frameworks supports the reliability of our evolutionary conclusions. We have added a statement referencing this cross-method validation in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      This study presents a systematic and well-executed effort to identify and classify bacterial NRP metallophores. The authors curate key chelator biosynthetic genes from previously characterized NRP-metallophore biosynthetic gene clusters (BGCs) and translate these features into an HMM-based detection module integrated within the antiSMASH platform.

      The new algorithm is compared with a transporter-based siderophore prediction approach, demonstrating improved precision and recall. The authors further apply the algorithm to large-scale bacterial genome mining and, through reconciliation of chelator biosynthetic gene trees with the GTDB species tree using eMPRess, infer that several chelating groups may have originated prior to the Great Oxidation Event.

      Overall, this work provides a valuable computational framework that will greatly assist future in silico screening and preliminary identification of metallophore-related BGCs across bacterial taxa.

      Strengths:

      (1) The study provides a comprehensive curation of chelator biosynthetic genes involved in NRP-metallophore biosynthesis and translates this knowledge into an HMM-based detection algorithm, which will be highly useful for the initial screening and annotation of metallophore-related BGCs within antiSMASH.

      (2) The genome-wide survey across a large bacterial dataset offers an informative and quantitative overview of the taxonomic distribution of NRP-metallophore biosynthetic chelator groups, thereby expanding our understanding of their phylogenetic prevalence.

      (3) The comparative evolutionary analysis, linking chelator biosynthetic genes to bacterial phylogeny, provides an interesting and valuable perspective on the potential origin and diversification of NRP-metallophore chelating groups.

      We greatly appreciate these comments.

      Weaknesses:

      (1) Although the rule-based HMM detection performs well in identifying major categories of NRP-metallophore biosynthetic modules, it currently lacks the resolution to discriminate between fine-scale structural or biochemical variations among different metallophore types.

      We agree that this is a current limitation to the methodology. More specific metallophore structural prediction is among our future goals for antiSMASH. We have added a statement to this effect in the conclusion.

      (2) While the comparison with the transporter-based siderophore prediction approach is convincing overall, more information about the dataset balance and composition would be appreciated. In particular, specifying the BGC identities, source organisms, and Gram-positive versus Gram-negative classification would improve transparency. In the supplementary tables, the "Just TonB" section seems to include only BGCs from Gram-negative bacteria - if so, this should be clearly stated, as Gram type strongly influences siderophore transport systems.

      The reviewer raises good points here. An additional ZIP file containing all BGCs used for the manual curation was inadvertently left out of the supplemental dataset for the first version of the manuscript. We have added columns with source organisms and Gram stain (retrieved from BacDive) to Table S2. F1 scores were similar for the Gram-positive and Gram-negative subsets, as seen in the new Table S2.

      We thank the reviewer for suggesting this additional analysis, and have added a brief statement in the revised manuscript.

      The “Just TonB” section (in which we tested the performance of requiring TonB without another transporter) was not used for the manuscript. We will preserve it in the revised Table S2 for transparency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In line 43:

      "excreted" should be replaced by "secreted".

      Done.

      (2) In lines 158-159:

      "we manually predicted metallophore production among a large set of BGCs."

      If they are first "annotated with default antiSMASH v6.1", then it is not entirely manual, right? I would suggest making this sentence clearer.

      We have revised the language.

      (3) In lines 165-169:

      It would be good to show the confusion matrix of these results.

      The confusion matrices are found in Table S2, columns AL-AR.
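
      As a reading aid for those confusion matrices, the following minimal sketch shows how precision, recall, and F1 follow from the four counts. The class name and the example counts are hypothetical and chosen only for illustration; they are not values from Table S2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts from comparing predicted labels against manually curated labels."""
    tp: int  # predicted metallophore, curated metallophore
    fp: int  # predicted metallophore, curated non-metallophore
    fn: int  # predicted non-metallophore, curated metallophore
    tn: int  # predicted non-metallophore, curated non-metallophore

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for illustration only (not taken from Table S2).
example = ConfusionMatrix(tp=8, fp=2, fn=2, tn=88)
```

      Computing the same three metrics separately for the Gram-positive and Gram-negative rows would give the kind of subset comparison described in the public review response.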

      (4) In Table 1:

      Method names (AntiSMASH rules/Transporter genes) could be misleading, since they are all AntiSMASH-based, right?

      We have adjusted the methods to clarify that while the transporter genes were detected using a modified version of antiSMASH, they are not related to our chelator-based detection rule (which is now correctly singular throughout the text).

      (5) Line 198:

      There are accidental spaces and characters inserted here.

      We could not find any accidental spaces and characters here.

      (6) Line 209:

      "In total, 3,264 NRP metallophore BGC regions were detected"

      Is this number correct? I don't see a correspondence in Table 1.

      We have added the following sentence to the Table 1 legend: “An additional 54 BGC regions were detected as NRP metallophores without meeting the requirements for the antiSMASH NRPS rule.”

      (7) Line 294:

      "From B. brennerae, we identified four catecholic compounds"

      From the bacterial cells or the culture supernatant? I think it is important to state this in a more precise way. If it is from the supernatant, it could be from EVs.

      We state in line 292 that “organic compounds were extracted from the culture supernatants”. As our goal was only to confirm the ability of the strains to produce the predicted metallophores, the precise localization (including cell pellet or EVs) was not explored.

      (8) Lines 349-357:

      These results would benefit greatly from a visualization strategy.

      Thank you, we have added a reference to the existing visualization in Fig. 5, Ring C.

      (9) Lines 452-454:

      How could clusters be de-replicated? Is there an identity equivalence scheme or similarity metric?

      The BGC regions were de-replicated with BiG-SCAPE, which uses multiple similarity metrics as described in Navarro-Muñoz et al., 2020. Clusters could be dereplicated further using a stricter cutoff.

      (10) Line 457:

      "relatively low number of published genomes."

      Could metagenome-assembled genomes help in that matter?

      This is a good question, but we find that MAGs are usually too fragmented to yield complete NRPS BGC regions. We’ve added additional sentences earlier in the discussion: “Detection rates were also lower for fragmented genomes; unfortunately, this limitation (inherent to antiSMASH itself) may hinder the identification of metallophore biosynthesis in metagenomes. As long-read sequencing of metagenomes becomes more common, we expect that detection will improve.”

      (11) Lines 514-515:

      "Adequately-performing pHMMs for Asp and His β-hydroxylase subtypes could not be constructed using the above method."

      What is the overall impact of this discrepancy in the methodology for these specific groups?

      The phylogeny-based methodology was used to reduce false positives. We expect this method will have improved precision at the possible expense of recall.

      (12) Lines 543-545:

      "RefSeq representative bacterial genomes were dereplicated at the genus level using R, randomly selecting one genome for each of the 330 genera determined by GTDB"

      Isn't it more of a random sampling than a dereplication? Dereplication would involve methods such as ANI computation.

      You are correct; we have adjusted the language to clarify.
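
      To make the distinction concrete, a minimal sketch of genus-level random sampling is given below. The original analysis was done in R; this Python version, the function name `sample_one_per_genus`, and the toy accessions are assumptions for illustration only. No similarity metric such as ANI is computed, which is why "random sampling" is the more accurate term.

```python
import random
from collections import defaultdict


def sample_one_per_genus(genomes, seed=0):
    """Pick one genome accession at random for each genus.

    `genomes` is an iterable of (accession, genus) pairs. A fixed seed
    makes the selection reproducible.
    """
    by_genus = defaultdict(list)
    for accession, genus in genomes:
        by_genus[genus].append(accession)
    rng = random.Random(seed)
    # Sort genera and accessions so the result depends only on the seed,
    # not on the input order.
    return {
        genus: rng.choice(sorted(accessions))
        for genus, accessions in sorted(by_genus.items())
    }


# Hypothetical accessions and genera, for illustration only.
toy = [("G1", "Bacillus"), ("G2", "Bacillus"), ("G3", "Vibrio")]
picked = sample_one_per_genus(toy, seed=1)
```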

      (13) Lines 559-560: "were filtered to remove clusters on contig edges."

      This sentence is confusing because networks will be mentioned soon, and they also have edges (not the edges mentioned here), and they could also be clustered (not the clusters mentioned here). Is there a way to make the terminology clearer?

      Thank you, we have adjusted the text to read “BGC regions on contig boundaries”.

      (14) Line 560:

      "The resulting 2,523 BGC regions, as well as 78 previously reported BGCs "

      How many were there before filtering?

      We have added the number: 3,264.

      (15) Lines 579-580:

      Confusing terminology, as mentioned in Lines 559-560.

      Adjusted as above.

      General comments and questions:

      An objective suggestion to enrich the discussion is to address the role of bacterial extracellular vesicles (EVs) as metallophore carriers. Studies show that EVs, such as outer membrane vesicles, can transport siderophores or other metallophores for iron acquisition in various bacteria, functioning as "public goods" for community-wide nutrient sharing. Highlighting this mechanism would add ecological and functional context to the manuscript. In the future, EV-associated metallophore transport could also be considered for integration into computational detection tools.

      We thank the reviewer for the suggestion; however, we do not think that such a discussion is needed. We briefly discuss the ecological function of metallophores as public goods (and public bads) in the first paragraph of the introduction. We did not find any reports that EV-associated genes co-localize with metallophore BGCs, which would be required for their presence to be a useful marker of metallophore production.

      Is there a feasible path to more generalizable detection of chelating motifs using chemistry-aware features? For example, a machine learning classifier trained on submolecular descriptors (e.g., functional groups, coordination motifs, SMARTS patterns, graph fingerprints, metal-binding propensity scores) could complement the current genome-based approach and broaden coverage beyond known metallophore families. While the discussion mentions future extensions centered on genomic features, integrating chemical information from predicted or known products (or biosynthetic logic inferred from BGC composition) could be explored. A hybrid framework-linking BGC-derived features with chemistry-derived features-may improve both recall for novel metallophore classes and precision in distinguishing true chelators from confounders, thereby increasing overall accuracy.

      We can envision a classifier that uses submolecular descriptors to predict the ability of a molecule to bind metal ions. However, starting with a BGC and accurately predicting the structure of a hitherto unknown chelating moiety will likely prove difficult. We have added a sentence to the discussion stating that a future tool could use accessory genes to more completely predict chemical structure.

      Although the initial analysis was conducted using RefSeq genomes, what are the anticipated challenges and limitations when scaling this method for BGC prospecting in metagenome-assembled genomes (MAGs), particularly considering the inherent quality differences, assembly fragmentation, and taxonomic uncertainties that characterize MAG datasets compared to curated reference genomes?

      Please see our response to comment 10, line 457. Our pHMM-based approach is designed to be robust to organism taxonomy; however, fragmentation is a significant barrier to accurate antiSMASH-based BGC detection (including in contig-level single-isolate genomes, see Table 1).

      Reviewer #2 (Recommendations for the authors):

      (1) In the "Chemical identification of genome-predicted siderophores across taxa" section, it would be helpful to annotate the cross-species similarities between predicted metallophore BGCs and their reference clusters (Ref BGCs). As currently described, the main text seems to highlight the cross-species resolving power of BiG-SCAPE itself rather than demonstrating the taxonomic generalizability of the chelator HMM-based detection module.

      Thank you for this comment. We intended to display that the new rule is useful for detecting BGCs in unexplored taxa, but we acknowledge that there is not a great diversity in the strains we selected. We have removed “across taxa” to avoid misleading the reader and clarify our intent.

      (2) In addition to using eMPRess for gene-species reconciliation, it may be beneficial to explore or at least reference alternative reconciliation tools to validate the inferred duplication, transfer, and loss (DTL) scenarios. Incorporating such cross-method comparisons would enhance the robustness and credibility of the evolutionary conclusions.

      We appreciate this valuable suggestion. To validate the robustness of our reconciliation-based inferences, we additionally analyzed two gene families using the likelihood-based tool AleRax, which implements a probabilistic DTL model. The results were consistent with the eMPRess parsimony-based reconstructions, showing comparable patterns of rare duplication, moderate gene loss, and extensive horizontal transfer. Both methods identified similar lineages as the most probable origin and major recipients of transfer events. This agreement between independent reconciliation frameworks supports the reliability of our evolutionary conclusions. We have added a brief statement referencing this cross-method validation in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) Legionella effectors are often activated by binding to eukaryote-specific host factors, including actin. The authors should test the following: a) whether Lfat1 can fatty acylate small G-proteins in vitro; b) whether this activity is dependent on actin binding; and c) whether expression of the Y240A mutant in mammalian cells affects the fatty acylation of Rac3 (Figure 6B), or other small G-proteins.

      We were not able to express and purify full-length recombinant Lfat1 to perform fatty acylation of small GTPases in vitro. However, when overexpressed in cellulo, the Y240A mutant retained the ability to fatty acylate Rac3 and another small GTPase, RheB (see Figure 6-figure supplement 2). We postulate that under infection conditions, actin binding might be required to fatty acylate certain GTPases because of the small amount of effector protein that is secreted into the host cell.

      (2) It should be demonstrated that lysine residues on small G-proteins are indeed targeted by Lfat1. Ideally, the functional consequences of these modifications should also be investigated. For example, does fatty acylation of G-proteins affect GTPase activity or binding to downstream effectors?

      We have mutated K178 on RheB and showed that this mutation abolished its fatty acylation by Lfat1 (see Author response image 1 below). We were not able to test whether fatty acylation by Lfat1 affects downstream effector binding.

      Author response image 1.

      (3) Line 138: Can the authors clarify whether the Lfat1 ABD induces bundling of F-actin filaments or promotes actin oligomerization? Does the Lfat1 ABD form multimers that bring multiple filaments together? If Lfat1 induces actin oligomerization, this effect should be experimentally tested and reported. Additionally, the impact of Lfat1 binding on actin filament stability should be assessed. This is particularly important given the proposed use of the ABD as an actin probe.

      The ABD does not form oligomers, as evidenced by its gel filtration profile. However, we do see F-actin bundling in our in vitro F-actin polymerization experiment when both actin and the ABD are at high concentrations (data not shown). At low concentrations of the ABD, there is no aggregation or bundling of F-actin.

      (4) Line 180: I think it's too premature to refer to the interaction as having "high specificity and affinity." We really don't know what else it's binding to.

      We have revised the text and reworded the sentence by removing "high specificity and affinity."

      (5) The authors should reconsider the color scheme used in the structural figures, particularly in Figures 2D and S4.

      We are not sure what the specific concern is regarding the color scheme of the structural figures.

      (6) In Figure 3E, the WT curve fits the data poorly, possibly because the actin concentration exceeds the Kd of the interaction. It might fit better to a quadratic.

      We have performed quadratic fitting and replaced Figure 3E.
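
      For readers unfamiliar with the distinction, the sketch below contrasts the standard quadratic (ligand-depletion) 1:1 binding equation with the simple hyperbolic form. It is an illustration under stated assumptions, not the fitting procedure used for Figure 3E: the function and parameter names are hypothetical, and which component was titrated in the assay is not specified here.

```python
import math


def fraction_bound_quadratic(titrant_total, fixed_total, kd):
    """Fraction of the fixed component bound, for 1:1 binding with depletion.

    Solves [C]^2 - (T + F + Kd)[C] + T*F = 0 for the complex concentration
    [C], where T and F are the total titrant and fixed concentrations
    (same units as Kd), then returns [C] / F.
    """
    s = titrant_total + fixed_total + kd
    complex_conc = (s - math.sqrt(s * s - 4.0 * titrant_total * fixed_total)) / 2.0
    return complex_conc / fixed_total


def fraction_bound_hyperbolic(titrant_total, kd):
    """Simple hyperbolic form; assumes free titrant equals total titrant."""
    return titrant_total / (titrant_total + kd)
```

      When the fixed component is not dilute relative to the Kd, the hyperbolic form overestimates binding at a given total titrant concentration, which is one reason a quadratic fit can describe such data better.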

      (7) The authors propose that the individual helices of the Lfat1 ABD could be expressed on separate proteins and used to target multi-component biological complexes to F-actin by genetically fusing each component to a split alpha-helix. This is an intriguing idea, but it should be tested as a proof of concept to support its feasibility and potential utility.

      It is a good suggestion. We plan to thoroughly test the feasibility of this idea as one of our future directions.

      (8) The plot in Figure S2D appears cropped on the X-axis or was generated from a ~2× binned map rather than the deposited one (pixel size ~0.83 Å, plot suggests ~1.6 Å). The reported pixel size is inconsistent between the Methods and Table 1-please clarify whether 0.83 Å refers to super-resolution.

      Yes, 0.83 Å is the super-resolution pixel size. We have updated the cryoEM table accordingly.

      Reviewer #2:

      Weaknesses:

      (1) The authors should use biochemical reactions to analyze the KFAT of Lfat1 on one or two small GTPases shown to be modified by this effector in cellulo. Such reactions may allow them to determine the role of actin binding in its biochemical activity. This notion is particularly relevant in light of recent studies showing that actin is a co-factor for the activity of LnaB and Ceg14 (PMID: 39009586; PMID: 38776962; PMID: 40394005). In addition, the study should be discussed in the context of these recent findings on the role of actin in the activity of L. pneumophila effectors.

      We have new data showing that actin binding does not affect Lfat1 enzymatic activity (see response to Reviewer #1). We have added these data to the paper as Figure S7. Accordingly, we also revised the discussion by adding the following paragraph.

      “The discovery of Lfat1 as an F-actin–binding lysine fatty acyl transferase raised the intriguing question of whether its enzymatic activity depends on F-actin binding. Recent studies have shown that other Legionella effectors, such as LnaB and Ceg14, use actin as a co-factor to regulate their activities. For instance, LnaB binds monomeric G-actin to enhance its phosphoryl-AMPylase activity toward phosphorylated residues, resulting in unique ADPylation modifications in host proteins (Fu et al, 2024; Wang et al, 2024). Similarly, Ceg14 is activated by host actin to convert ATP and dATP into adenosine and deoxyadenosine monophosphate, thereby modulating ATP levels in L. pneumophila–infected cells (He et al, 2025). However, this does not appear to be the case for Lfat1. We found that Lfat1 mutants defective in F-actin binding retained the ability to modify host small GTPases when expressed in cells (Figure S7). These findings suggest that, rather than serving as a co-factor, F-actin may serve to localize Lfat1 via its actin-binding domain (ABD), thereby confining its activity to regions enriched in F-actin and enabling spatial specificity in the modification of host targets.”

      (2) The development of the ABD domain of Lfat1 as an F-actin domain is a nice extension of the biochemical and structural experiments. The authors need to compare the new probe to those currently commonly used ones, such as Lifeact, in labeling of the actin cytoskeleton structure.

      We fully agree with the reviewer’s insightful suggestion. However, a direct comparison of the Lfat1 ABD domain with commonly used actin probes such as Lifeact, as well as evaluation of the split α-helix probe (as suggested by Reviewer #1), would require extensive and technically demanding experiments. These are important directions that we plan to pursue in future studies.

      For all other minor points, we have made corrections/changes in our revised text and figures.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Yamamoto et al. presents a model by which the four main axes of the limb are required for limb regeneration to occur in the axolotl. A longstanding question in regeneration biology is how existing positional information is used to regenerate the correct missing elements. The limb provides an accessible experimental system by which to study the involvement of the anteroposterior, dorsoventral, and proximodistal axes in the regenerating limb. Extensive experimentation has been performed in this area using grafting experiments. Yamamoto et al. use the accessory limb model and some molecular tools to address this question. There are some interesting observations in the study. In particular, one strength, the potent induction of accessory limbs in the dorsal axis with BMP2+Fgf2+Fgf8, is very interesting. Although interesting, the study makes bold claims about determining the molecular basis of DV positional cues, but the experimental evidence is not definitive and does not take into account the previous work on DV patterning in the amniote limb. Also, testing the hypothesis on blastemas after limb amputation would be needed to support the strong claims in the study.

      Strengths:

      The manuscript presents some novel new phenotypes generated in axolotl limbs due to Wnt signaling. This is generally the first example in which Wnt signaling has provided a gain of function in the axolotl limb model. They also present a potent way of inducing limb patterning in the dorsal axis by the addition of just beads loaded with Bmp2+Fgf8+Fgf2.

      Comments on revised version:

      Re-evaluation: The authors have significantly improved the manuscript and their conclusions reflect the current state of knowledge in DV patterning of tetrapod limbs. My only point of consideration is their claim of mesenchymal and epithelial expression of Wnt10b and the finding that Fgf2 and Wnt10b are lowly expressed. It is based upon the failed ISH, but this doesn't mean they aren't expressed. In interpreting the Li et al. scRNAseq dataset, conclusions depend heavily on how one analyzes and interprets it. The 7DPA sample shows a very low representation of epithelial cells compared to other time points, but this is likely a technical issue. Even the epithelial marker, Krt17, and the CT/fibroblast marker show some expression elsewhere. If other time points are included in the analysis, Wnt10b would be interpreted as relatively highly expressed almost exclusively in the epithelium. By selecting the 7dpa timepoint, which may or may not represent the MB stage as it wasn't shown in the paper, the conclusions may be based upon incomplete data. I don't expect the authors to do more work, but it is worth mentioning this possibility. The authors have considered and made efforts to resolve previous concerns.

      We are grateful for the constructive comments. As Reviewer #1 suggested, we now note in the Discussion section that clearer expression patterns of Wnt10b and Fgf2 may be detectable in scRNA-seq analyses at other stages, and that low-level signals of epithelial and CT/fibroblast markers outside their expected clusters may reflect technical bias. We also agree with the reviewer that our unsuccessful ISH experiments and the low abundance detected by RT-qPCR do not demonstrate absence of expression, and that conclusions from reanalyzing the Li et al. scRNA-seq dataset can depend strongly on analytical choices. We focused on the 7 dpa sample because our RT-qPCR data suggested that Wnt10b and Fgf2 may be most enriched around the MB stage (the original study refers to 7 dpa as MB). However, in the Discussion section we explicitly acknowledge that analyzing a single time point, especially one with a low representation of epithelial cells, may yield incomplete or stage-biased interpretations, and that inclusion of additional datasets could reveal clearer and potentially different expression patterns. In the Results section, we also tempered our wording regarding the inferred cellular sources to avoid over-interpretation based on the current data.

      Reviewer #2 (Public review):

      Summary:

      This study explores how signals from all sides of a developing limb, front/back and top/bottom, work together to guide the regrowth of a fully patterned limb in axolotls, a type of salamander known for its impressive ability to regenerate limbs. Using a model called the Accessory Limb Model (ALM), the researchers created early staged limb regenerates (called blastemas) with cells from different sides of the limb. They discovered that successful limb regrowth only happens when the blastema contains cells from both the top (dorsal) and bottom (ventral) of the limb. They also found that a key gene involved in front/back limb patterning, called Shh (Sonic hedgehog), is only turned on when cells from both the dorsal and ventral sides come into contact. The study identified two important molecules, Wnt10B and FGF2, that help activate Shh when dorsal and ventral cells interact. Finally, the authors propose a new model that explains how cells from all four sides of a limb, dorsal, ventral, anterior (front), and posterior (back), contribute at both the cellular and molecular level to rebuilding a properly structured limb during regeneration.

      Strengths:

      The techniques used in this study, like delicate surgeries, tissue grafting, and implanting tiny beads soaked with growth factors, are extremely difficult, and only a few research groups in the world can do them successfully. These methods are essential for answering important questions about how animals like axolotls regenerate limbs with the correct structure and orientation. To understand how cells from different sides of the limb communicate during regeneration, the researchers used a technique called in situ hybridization, which lets them see where specific genes are active in the developing limb. They clearly showed that the gene Shh, which helps pattern the front and back of the limb, only turns on when cells from both the top (dorsal) and bottom (ventral) sides are present and interacting. The team also took a broad, unbiased approach to figure out which signaling molecules are unique to dorsal and ventral limb cells. They tested these molecules individually and discovered which could substitute for actual dorsal and ventral cells, providing the same necessary signals for proper limb development. Overall, this study makes a major contribution to our understanding of how complex signals guide limb regeneration, showing how different regions of the limb work together at both the cellular and molecular levels to rebuild a fully patterned structure.

      Weaknesses:

      Because the expressional analyses are performed on thin sections of regenerating tissue, in the original manuscript, they provided only a limited view of the gene expression patterns in their experiments, opening the possibility that they could be missing some expression in other regions of the blastema. Additionally, the quantification method of the expressional phenotypes in most of the experiments did not appear to be based on a rigorous methodology. The authors' inclusion of an alternate expression analysis, qRT-PCR, on the entire blastema helped validate that the authors are not missing something in the revised manuscript.

      Overall, the number of replicates per sample group in the original manuscript was quite low (sometimes as low as 3), which was especially risky with challenging techniques like the ones the authors employ. The authors have improved the rigor of the experiment in the revised manuscript by increasing the number of replicates. The authors have not performed a power analysis to calculate the number of animals used in each experiment that is sufficient to identify possible statistical differences between groups. However, the authors have indicated that there was not sufficient preliminary data to appropriately make these quantifications.

      Likewise, in the original manuscript, the authors used an AI-generated algorithm to quantify symmetry on the dorsal/ventral axis, and my concern was that this approach doesn't appear to account for possible biases due to tissue sectioning angles. They also seem to arbitrarily pick locations in each sample group to compare symmetry measurements. There are other methods, which include using specific muscle groups and nerve bundles as dorsal/ventral landmarks, that would more clearly show differences in symmetry. The authors have now sufficiently addressed this concern by including transverse sections of the limbs and have explained the limitations of using a landmark-based approach in their quantification strategy.

      We are grateful for the careful evaluation of the technical rigor and quantification. We have benefited from the reviewer’s earlier feedback, which guided revisions that improved the manuscript’s rigor and presentation.

      Reviewer #3 (Public review):

      Summary:

      After salamander limb amputation, the cross-section of the stump has two major axes: anterior-posterior and dorsal-ventral. Cells from all axial positions (anterior, posterior, dorsal, ventral) are necessary for regeneration, yet the molecular basis for this requirement has remained unknown. To address this gap, Yamamoto et al. took advantage of the ALM assay, in which defined positional identities can be combined on demand and their effects assessed through the outgrowth of an ectopic limb. They propose a compelling model in which dorsal and ventral cells communicate by secreting Wnt10b and Fgf2 ligands respectively, with this interaction inducing Shh expression in posterior cells. Shh was previously shown to induce limb outgrowth in collaboration with anterior Fgf8 (PMID: 27120163). Thus, this study completes a concept in which four secreted signals from four axial positions interact for limb patterning. Notably, this work firmly places dorsal-ventral interactions upstream of anterior-posterior, which is striking for a field that has been focussed on anterior-posterior communication. The ligands identified (Wnt10b, Fgf2) are different to those implicated in dorsal-ventral patterning in the non-regenerative mouse and chick models. The strength of this study is in the context of ALM/ectopic limb engineering. Although the authors attempt to assay the expression of Wnt10b and Fgf2 during limb regeneration after amputation, they were unable to pinpoint the precise expression domains of these genes beyond 'dorsal' and 'ventral' blastema. Given that experimental perturbations were not performed in regenerating limbs - almost exclusively under ALM conditions - this author finds the title "Dorsoventral-mediated Shh induction is required for axolotl limb regeneration" a little misleading.

      Strengths:

      (1) The ALM and use of GFP grafts for lineage tracing (Figures 1-3) take full advantage of the salamander model's unique ability to outgrow patterned limbs under defined conditions. As far as I am aware, the ALM has not been combined with precise grafts that assay 2 axial positions at once, as performed in Figure 3. The number of ALMs performed in this study deserves special mention, considering the challenging surgery involved.

      (2) The authors identify that posterior Shh is not expressed unless both dorsal and ventral cells are present. This echoes previous work in mouse limb development models (AER/ectoderm-mesoderm interaction) but this link between axes was not known in salamanders. The authors elegantly reconstitute dorsal-ventral communication by grafting, finding that this is sufficient to trigger Shh expression (Figure 3 - although see also section on Weaknesses).

      (3) Impressively, the authors discovered two molecules sufficient to substitute dorsal or ventral cells through electroporation into dorsal- or ventral- depleted ALMs (Figure 5). These molecules did not change the positional identity of target cells. The same group previously identified the ventral factor (Fgf2) to be a nerve-derived factor essential for regeneration. In Figure 6, the authors demonstrate that nerve-derived factors, including Fgf2, are alone sufficient to grow out ectopic limbs from a dorsal wound. Limb induction with a 3-factor cocktail without supplementing with other cells is conceptually important for regenerative engineering.

      (4) The writing style and presentation of results is very clear.

      Overall appraisal:

      This is a logical and well-executed study that creatively uses the axolotl model to advance an important framework for understanding limb patterning. The relevance of the mechanisms to normal limb regeneration are not yet substantiated, in the opinion of this reviewer. Additionally, Wnt10b and Fgf2 should be considered molecules sufficient to substitute dorsal and ventral identity (solely in terms of inducing Shh expression). It is not yet clear whether these molecules are truly necessary (loss of function would address this).

      Comments on revisions:

      Congratulations - I still find this an elegant and easy-to-read study with significant implications for the field! Linking your mechanisms to normal limb regeneration (i.e. regenerating blastema, not ALM), as well as characterising the cell populations involved, will be interesting directions for the future.

      We are grateful for the constructive comments. To address the concerns raised by Reviewer #3, we cited in the Introduction section previous studies in which the ALM was used as an alternative experimental system for studying limb regeneration (Nacu et al., 2016, Nature, PMID: 27120163; Satoh et al., 2007, Developmental Biology, PMID: 17959163). We are confident that our ALM-based data provide a reasonable basis for understanding limb regeneration. We agree that important questions remain, such as which cell populations express Wnt10b and Fgf2 and how endogenous WNT10B and FGF2 signals induce Shh expression in normal regeneration, and these should be investigated in future studies to deepen our understanding of limb regeneration.


      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewing Editor Comments:

      The authors should be commended for addressing this gap - how cues from the DV axis interact with the AP axis during limb regeneration. Overall, the concept presented in this manuscript is extremely interesting and could be of high value to the field. However, the manuscript in its current form is lacking a few important data and resolution to fully support their conclusions, and the following needs to be addressed before publication:

      (1) ISH data on Wnt10b and FGF2 from various regeneration time points are essential to derive the conclusion. Preferably multiplex ISH of Wnt10b/Fgf2/Shh or at least canonical ISH on serial sections to demonstrate their expression in dermis/epidermis and order of gene expression i.e. Shh is only expressed after expression of Wnt10b/FGF2. It would certainly help if this can also be shown in regular blastema.

      We are grateful for the constructive suggestion on assessing Wnt10b and Fgf2 expression during regular regeneration, and we agree that clarifying their expression patterns in regular blastemas is important for strengthening the conclusions of our study. Because we cannot currently ensure sufficient sensitivity with multiplex FISH in our laboratory (partly due to high background), we conducted conventional ISH on serial sections of regular blastemas at several time points (Fig. S5A). However, the expression patterns of Wnt10b and Fgf2 were not clear. To complement the ISH results, we performed RT-qPCR on microdissected dorsal and ventral halves of regular blastemas at the MB stage (Fig. S5B). We found that Wnt10b and Fgf2 were expressed at significantly higher levels in the dorsal and ventral halves, respectively, compared to the opposite half. This dorsal/ventral biased expression of Wnt10b/Fgf2 is consistent with our RNA-seq data. We further quantified expression levels of Wnt10b, Fgf2, and Shh across stages (intact, EB, MB, LB, and ED) and found that Wnt10b and Fgf2 peaked at the MB stage, whereas Shh peaked at the LB stage, consistent with the editor’s request regarding the order of gene expression (Fig. S5C). This temporal offset in upregulation supports our model. These results are now included in the revised manuscript (Line 294‒306).

      To identify the cell types expressing Wnt10b or Fgf2, we analyzed published single-cell RNA-seq data (7 dpa blastema (MB), Li et al., 2021). As a result, Fgf2 expression was observed in the mesenchymal cluster, whereas Wnt10b expression was observed in both mesenchymal and epithelial clusters (Fig. S6). However, because only a small fraction of cells expressed Wnt10b, the principal cellular source of WNT10B protein remains unclear. The apparent low abundance likely contributes to the weak ISH signals and reflects current technical limitations. In addition, Wnt10b and Fgf2 expression did not follow Lmx1b expression (Fig. S6J, K), and Wnt10b and Fgf2 themselves were not exclusive (Fig. S6L). These results are now included in the revised manuscript (Line 307‒321). Together with the RT-qPCR data (Fig. S5B), these results suggest that Wnt10b and Fgf2 are not exclusively confined to purely dorsal or ventral cells at the single-cell level, even though they show dorsoventral bias when assessed in bulk tissue. These results suggest that Wnt10b/Fgf2 expression is not restricted to dorsal/ventral cells but mediated by dorsal/ventral cells, and co-existence of both signals should provide a permissive environment for Shh induction. Defining the precise spatial patterns of Wnt10b and Fgf2 in regular regeneration will therefore be an important goal for future work.  

      (2) Validation of the absence of gene expression via qRT PCR in the given sample will increase the rigor, as suggested by reviewers.

      We thank the editor for this important suggestion and agree that validation by qRT-PCR increases the rigor of our study. Accordingly, we performed RT-qPCR on AntBL, PostBL, DorBL, and VentBL to corroborate the ISH results. The results are now included in Fig. 2. We also verified Shh expression following electroporation by RT-qPCR, and the quantitative results are now provided in Fig. 5.

      (3) Please increase n for experiments where necessary and mention n values in the figures.

      We thank the editor for this helpful comment and agree on the importance of providing sufficient sample sizes. Accordingly, we increased the n for the relevant experiments and have indicated the n values in the corresponding figure legends.

      (4) Most comments by all three reviewers are constructive and largely focus on improving the tone and language of the manuscript, and I expect that the authors should take care of them.

      We thank the reviewers for their constructive feedback on the tone and language of the manuscript. We have carefully revised the text according to each comment, and we hope these modifications have improved both clarity and readability.

      In addition, in revising the manuscript we also refined the conceptual framework. Our new analysis of Wnt10b and Fgf2 expression during normal regeneration suggests that these genes are not expressed in a strictly dorsal- or ventral-specific manner at the single-cell level. When these observations are considered together with (i) the RNA-seq comparison of dorsally and ventrally induced ALM blastemas, (ii) RT-qPCR of microdissected dorsal and ventral halves of regenerating blastemas, and (iii) the functional electroporation experiments, our interpretation is that Wnt10b and Fgf2 act as dorsal- and ventral-mediated signals, respectively: their production is regulated by dorsal or ventral cells, and the presence of both signals is required to induce Shh expression. Given these points, we now think our conclusion might be explained without using the confusing term “positional cue.” Because the distinction between “positional cue” and “positional information” could be confusing, as noted by the reviewers, we rewrote our manuscript without using “positional cue.”

      Reviewer #1 (Recommendations for the authors):

      (1) Line 61: More explanation for what a double-half limb means is needed.

      We thank the reviewer for this suggestion. We have revised the manuscript (Line 73‒76). Specifically, we now explain that a double-dorsal limb, for example, is a chimeric limb generated by excising the ventral half and replacing it with a dorsal half from the contralateral limb while preserving the anteroposterior orientation.

      (2) Line 63-65: "Such blastemas form hypomorphic, spike-like structures or fail to regenerate entirely." This statement does not represent the breadth of work on the APDV axis in limb regeneration. The cited Bryant 1976 reference tested only double-posterior and double-anterior newt limbs, demonstrating the importance of disposition along the AP axis, not DV. Others have shown that the regeneration of double-half limbs depends upon the age of the animal and the length of time between the grafting of double-half limbs and amputation. Also, some double-dorsal or double-ventral limbs will regenerate complete AP axes with symmetrical DV duplications (Burton, Holder, and Jesani, 1986). Also, sometimes half dorsal stylopods regenerate half dorsal and half ventral, or regenerate only half ventral, suggesting there are no inductive cues across the DV axis as there are along the AP axis. Considering this is the basis of the study under question, more is needed to convince that the DV axis is necessary for the generation of the AP axis.

      We thank the reviewer for this detailed and constructive comment. We acknowledge that previous studies have reported a range of outcomes for double-half limbs. For example, Burton et al. (1986) described regeneration defects in double-dorsal (DD) and double-ventral (VV) limbs, although limb patterning did occur in some cases (Burton et al., 1986, Table 1). As the reviewer notes, regenerative outcomes depend on variables such as animal age and the interval between construction of the double-half limb and amputation, sometimes called the effect of healing time (Tank and Holder, 1978). Moreover, variability has been reported not only in DD/VV limbs but also in double-anterior (AA) and double-posterior (PP) limbs (e.g., Bryant, 1976; Bryant and Baca, 1978; Burton et al., 1986). In the revised manuscript, we have therefore modified the statement to avoid over-generalization and to emphasize that regeneration can be incomplete under these conditions (Line 76‒82). Importantly, in order to provide the additional evidence requested and to directly re-evaluate whether dorsal and ventral cells are required for limb patterning, we performed the ALM experiments shown in Fig. 1. The ALM system allows us to assess this question in a binary manner (regeneration vs. non-regeneration), thereby strengthening the rationale for our conclusions regarding the necessity of the APDV orientations. We also revised a sentence at the beginning of the Results section to emphasize this point (Line 139‒140).

      (3) Line 71: "These findings suggest that specific signals from all four positional domains must be integrated for successful limb patterning, such that the absence of any one of them leads to failure." I was under the impression that half posterior limbs can grow all elements, but half anterior can only grow anterior elements.

      We thank the reviewer for this helpful clarification. As summarized by Stocum, half-limb experiments show that while some digit formation can occur, limb patterning remains incomplete in both anterior-half and posterior-half limbs in some cases (Stocum, 2017). We see this point as closely related to the broader question of whether proper limb patterning requires the integration of signals from all four positional domains. As noted in our response above, our ALM experiments in Fig. 1 were designed to test this point directly, and our data support the interpretation that cells from all four orientations are necessary for correct limb patterning.

      (4) Line 79-81: This is stated later in lines 98-105. I suggest expanding here or removing it here.

      We thank the reviewer for this suggestion. In the original version, lines 79–81 introduced our use of the terms “positional cue” and “positional information,” and this content partially overlapped with what later appeared in lines 98–105. In the revised manuscript, we have substantially rewritten this section (Line 82‒84), including the sentences corresponding to lines 79–81 in the original version, to remove the term “positional cue,” as explained in our response to the Editor’s comment (4). Our revision reflects new analyses indicating that Wnt10b and Fgf2 appear not to be strictly restricted to dorsal or ventral cell populations, and we now describe these factors as dorsal- or ventral-mediated signals that act across dorsoventral domains to induce Shh expression. Accordingly, we no longer maintain the original use of “positional cue” and “positional information.”

      (5) Line 92 - 93: "Similarly, an ALM blastema can be induced in a position-specific manner along the limb axes. In this case, the induced ALM blastema will lack cells from the opposite side." This sentence is difficult to follow. Isn't it the same thing stated in lines 88-90?

      We thank the reviewer for this comment. We revised the sentence to improve readability and to avoid redundancy with original Lines 88–90 (Line 104‒106).

      (6) Line 107: I think the appropriate reference is McCusker et al., 2014 (Position-specific induction of ectopic limbs in non-regenerating blastemas on axolotl forelimbs), although Vieira et al., 2019 can be included here. In addition, Ludolph et al 1990 should be cited.

      We thank the reviewer for this suggestion. We have added McCusker et al. (2014) and Ludolph et al. (1990) as references in the revised manuscript (Line 120‒121).

      (7) Line 107-109: A missing point is how the ventral information is established in the amniote limb. From what I remember, it is the expression of Engrailed 1, which inhibits the ventral expression of Wnt7a, and hence Lmx1b. This would suggest that there is no secreted ventral cue. This is a relatively large omission in the manuscript.

      We thank the reviewer for this comment. We agree that ventral fate in amniotes is specified by En1 in the ventral ectoderm, which represses Wnt7a and thereby prevents induction of Lmx1b; accordingly, a secreted ventral morphogen analogous to dorsal Wnt7a has not been established. We added this point to the revised Introduction (Line 61‒64).

      By contrast, in axolotl limb regeneration, our previous work on Lmx1b expression suggests that DV identities reflect the original positional identity rather than being re-specified during regeneration (Yamamoto et al., 2022). Within this framework, our original use of the term “ventral positional cue” does not imply a ventral patterning morphogen in the amniote sense; rather, it denotes downstream signals induced by cells bearing ventral identity that are required for the blastema to form a patterned limb. This interpretation is consistent with classic studies on double-half chimeras and ectopic contacts between opposite regions (Iten & Bryant, 1975; Bryant & Iten, 1976; Maden, 1980; Stocum, 1982) as well as with our ALM data (Fig. 1). For this reason, we intentionally used the term “positional cues” to refer to signals provided by cells bearing ventral identity, which can be considered separable from the DV patterning mechanism itself, in the original text. As explained in our response to the Editor’s comment (4), we describe these signals as “signals mediated by dorsal/ventral cells,” rather than “positional cues” in the revised manuscript.

      The necessity of dorsal- and ventral-mediated signals is supported by classic studies on the double-half experiment. In the non-regenerating cases, structural patterns along the anteroposterior axis appear to be lost even though both anterior and posterior cells should, in principle, be present in a blastema induced from a double-dorsal or double-ventral limb. In limb development of amniotes, Wnt7a/Lmx1b or En-1 mutants show that limbs can exhibit anteroposterior patterning even when tissues are dorsalized or ventralized, that is, in the relative absence of ventral or dorsal cells, respectively (Riddle et al., 1995; Chen et al., 1998; Loomis et al., 1996). Taken together, axolotl limb regeneration, in which the presence of both dorsal and ventral cells plays a role in anteroposterior patterning, should differ from other model organisms. It is therefore reasonable to predict the existence of dorsal- and ventral-mediated signals in axolotl limb regeneration. We included this point in the revised manuscript (Line 82‒89). However, there is no evidence that these signals are secreted molecules. For this reason, we have carefully used the term “dorsal-/ventral-mediated signals” in the Introduction without implying secretion.

      (8) Introduction - In general, the argument is a bit misleading. It is written as if it is known that a ventral cue is necessary, but the evidence from other animal models is lacking, from what I know. I may be wrong, but further argument would strengthen the reasoning for the study.

      We thank the reviewer for this thoughtful comment. We agree that it should not read as if it is known that a ventral cue is necessary. In the revised Introduction, we have addressed this in several ways. First, as described in our response to comment (7), we now explicitly note that in amniote limb development ventral identity is specified by En1-mediated repression of Wnt7a, and that a secreted ventral morphogen equivalent to dorsal Wnt7a has not been established. Second, we removed the term “positional cue” and no longer present “ventral positional cue” as a defined entity. Instead, we use mechanistic phrasing such as “signals mediated by ventral cells” and “signals mediated by dorsal cells,” which does not assume that such signals are secreted morphogens or universally conserved. Third, we have reframed the role of dorsal- and ventral-mediated signals as a working hypothesis specific to axolotl limb regeneration, rather than as a general conclusion across model systems.

      (9) Line 129: Remove "As mentioned before".

      We thank the reviewer for this suggestion. We have removed the phrase “As mentioned before” in the revised manuscript (Line 143).

      (10) Figure 1: Are Lmx1, Fgf8, and Shh mutually exclusive? Multiplexed FISH would provide this information, and is a relatively important question considering the strong claims in the study.

      We thank the reviewer for raising this important point. As noted in our response to the editor’s comment, we cannot currently ensure sufficiently high detection sensitivity with multiplex FISH in our laboratory. However, based on previous reports (Nacu et al., 2016), Fgf8 and Shh should be mutually exclusive. In contrast, with respect to Lmx1b, our analysis suggests that its expression is not mutually exclusive with either Fgf8 or Shh, at least in terms of their expression domains. To confirm this, we analyzed the published scRNA-seq data, and the results were added to supplemental figure 6. Fgf8 and Shh were expressed in both Lmx1b-positive and Lmx1b-negative cells (Fig. S6H, I), but Fgf8 and Shh themselves were mutually exclusive (Fig. S6M). This point is now included in the revised manuscript (Line 314‒317).

      (11) Results section and Figure 2: More evidence is needed for the lack of Shh expression ISH in tissue sections. Demonstrating the absence of something needs some qPCR or other validation to make such a claim.

      We thank the reviewer for this suggestion. We performed qRT-PCR on ALM blastemas to complement the ISH data (Fig. 2).

      (12) Line 179: I think they are likely leucistic d/d animals and not wild-type animals based upon the images.

      We thank the reviewer for this observation. In the revised manuscript, we have corrected the description to “leucistic animals” (Line 194).

      (13) Line 183-186: I'm a bit confused about this interpretation. If Shh turns on in just a posterior blastema, wouldn't it turn on in a grafted posterior tissue into a dorsal or ventral region? Isn't this independent of environment, meaning Shh turns on if the cells are posterior, regardless of environment?

      Our interpretation is that only posterior-derived cells possess the competency to express Shh. In other words, whether a cell is capable of expressing Shh depends on its original positional identity (Iwata et al., 2020), but whether it actually expresses Shh depends on the environment in which the cell is placed. The results of Fig. 3E and G indicate that Shh activation is dependent on environment and that the posterior identity is not sufficient to activate Shh expression. We have revised the manuscript to emphasize this distinction more clearly (Line 198‒203).

      (14) Figure 4: Do the limbs have an elbow, or is it just a hand?

      We thank the reviewer for this thoughtful question. From the appearance, an elbow-like structure can occasionally be seen; however, we did not examine the skeletal pattern in detail because all regenerated limbs used for this analysis were sectioned for the purpose of symmetry evaluation, and we therefore cannot state this conclusively. While this is indeed an important point, analyzing proximodistal patterning would require a very large number of additional experiments, which falls outside the main focus of the present study. For this reason, and also to minimize animal use in accordance with ethical considerations, we did not pursue further experiments here. In response to this point, we have added a description of the skeletal morphology of ectopic limbs induced by BMP2+FGF2+FGF8 bead implantation (Fig. 6). In these experiments, multiple ectopic limbs were induced along the same host limb. In most cases, these ectopic limbs did not show fusion with the proximal host skeleton, similar to standard ALM-induced limbs, although in one case we observed fusion at the stylopod level. We now note this observation in the revised manuscript (Line 347‒354).

      We regard the relationship between APDV positional information and proximodistal patterning as an important subject for future investigation.

      (15) Line 203 - 237: I appreciate the symmetry score to estimate the DV axis. Are there landmarks that would better suggest a double-dorsal or double-ventral phenotype, like was done in the original double-half limb papers?

      We thank the reviewer for this thoughtful comment. In most cases, the limbs induced by the ALM exhibit abnormal and highly variable morphologies compared to normal limbs, making it difficult to apply consistent morphological landmarks as used in the original double-half limb studies. For this reason, we focused our analysis on “morphological symmetry” as a quantitative measure of DV axis patterning, and we have added this explanation to the manuscript (Line 232‒235). Additionally, we provided transverse sections along the proximodistal axis as supplemental figures (Figs. S2 and S4). In addition to reporting the symmetry score, we have explicitly stated in the text that symmetry was also assessed by visual inspection of these sections.

      (16) Line 245-247: The experiment was done using bulk sequencing, so both the epithelium and mesenchyme were included in the sample. The posterior (Shh) and anterior (Fgf8) patterning cues are mesenchymally expressed. In amniotes, the dorsal cue has been thought to be Wnt7a from the epithelium. Can ISH, FISH, or previous scRNAseq data be used to identify genes expressed in the mesenchyme versus epithelium? This is very important if the authors want to make the claim for defining "The molecular basis of the dorsal and ventral positional cues" as was stated by the authors.

      We thank the reviewer for highlighting this important point. As the reviewer notes, our bulk RNA-seq data do not distinguish between epithelial and mesenchymal expression domains. As noted in our response to the editor’s comment, we performed ISH and qPCR on regular blastemas. However, these approaches did not provide definitive information regarding the specific cell types expressing Wnt10b and Fgf2. To complement this, we re-analyzed publicly available single-cell RNA-seq data (from Li et al., 2021). As a result, Fgf2 was expressed mainly by mesenchymal cells, and Wnt10b expression was observed in both mesenchymal and epithelial cells. These results are now included in the revised manuscript (Line 294‒321) and in supplemental figures (Fig. S6, S7).

      (17) Was engrailed 1, lmx1b, or Wnt7a differentially expressed along the DV axis, suggesting similar signaling between? Are these expressed in mesenchyme? Previous work suggests Wnt7a is expressed throughout the mesenchyme, but publicly available scRNAseq suggests that it is expressed in the epithelium.

      We thank the reviewer for this important comment. As noted, the reported expression patterns of DV-related genes are not consistent across studies, which likely reflects the technical difficulty of detecting these genes with high sensitivity. In our own experiments, expression of DV markers other than Lmx1b has been very weak or unclear by ISH. Whether these genes are expressed in the epithelium or mesenchyme also appears to vary depending on the detection method used. In our RNA-seq dataset, Wnt7a expression was detected at very low levels and showed no significant difference along the DV axis, while En1 expression was nearly absent. We have clarified these results in the revised manuscript (Line 437‒441). Our reanalysis of the published scRNA-seq likewise detected Wnt7a in only a very small fraction of cells. Accordingly, we consider it premature to reach a definitive conclusion—such as whether Wnt7a is broadly mesenchymal or restricted to epithelium—as suggested in prior reports. We also note that whether Wnt7a is epithelial or mesenchymal does not affect the conclusions or arguments of the present study. Although the roles of Wnt7a and En1 in axolotl DV patterning are certainly important, we feel that drawing a definitive conclusion on this issue lies beyond the scope of the present study, and we have therefore limited our description to a straightforward presentation of the data.

      (18) Line 247-249: The sentence suggests that all the ligands were tried. This should be included in the supplemental data.

      We thank the reviewer for this clarification. In fact, we tested only Wnt4, Wnt10b, Fgf2, Fgf7, and Tgfb2, and all of these results are presented in the figures. To avoid misunderstanding, we have revised the text to explicitly state that our analysis focused on these five genes (Line 272‒274).

      (19) Line 249: An n =3 seems low and qPCR would be a more sensitive means of measuring gene induction compared to ISH. The ISH would confirm the qPCR results. Figure 5C is also not the most convincing image of Shh induction without support from a secondary method.

      We have increased the sample size for these experiments (Line 277‒280). In addition, to complement the ISH results, we confirmed Shh induction by qPCR following electroporation of Wnt10b and Fgf2 (Fig. 5D, E). In addition, because Shh signal in the Wnt10b-electroporated VentBL images was particularly weak and difficult to discern, we replaced that panel with a representative example in which Shh signal is more clearly visible. These data are now included in the revised manuscript (Line 280‒282).

      (20) Line 253: It is confusing why Wnt10b, but not Wnt4 would work? As far as I know, both are canonical Wnt ligands. Was Wnt7a identified as expressed in the RNAseq, but not dorsally localized? Would electroporation of Wnt7a do the same thing as Wnt10b and hence have the same dorsalizing patterning mechanisms as amniotes?

      We thank the reviewer for raising this challenging but important question. Wnt10b was identified directly from our bulk RNA-seq analysis, as was Wnt4. The difference in the ability of Wnt10b and Wnt4 to induce Shh expression in VentBL may reflect differences in how these ligands activate downstream WNT signaling programs. WNT10B is a potent activator of the canonical WNT/β-catenin pathway (Bennett et al., 2005), although WNT10B has also been reported to trigger a β-catenin–independent pathway (Lin et al., 2021). By contrast, WNT4 can signal through both canonical and non-canonical (β-catenin–independent) pathways, and the balance between these outputs is known to depend on cellular context (Li et al., 2013; Li et al., 2019). Consistent with a requirement for canonical WNT signaling, we found that pharmacological activation of canonical WNT signaling with BIO (a GSK3 inhibitor) was also sufficient to induce Shh expression in VentBL. Nevertheless, it is still unclear why Wnt10b, but not Wnt4, was able to induce Shh under our experimental conditions. One possible explanation is that different WNT ligands can engage the same receptors (e.g., Frizzled/LRP6) yet drive distinct downstream transcriptional programs, resulting in ligand-specific outputs; this may depend on the state of the responding cells, as Voss et al. predicted (Voss et al., 2025). This point is now included in the revised discussion section (Line 402‒412). At present, we cannot distinguish between these possibilities experimentally, and we therefore refrain from making a stronger mechanistic claim.

      With respect to Wnt7a, we detected Wnt7a expression at very low levels, and without a clear dorsoventral bias, in our RNA-seq analysis of ALM blastemas (we describe this point in Line 437‒440). This is consistent with previous work suggesting that axolotl Wnt7a is not restricted to the dorsal region in regeneration. Because of this low and unbiased expression, and because our data already implicated Wnt10b as a dorsal-mediated signal that can act across dorsoventral domains to permit Shh induction, we did not prioritize Wnt7a electroporation in the present study. We therefore cannot conclude whether Wnt7a would behave similarly to Wnt10b in this context.

      Importantly, these uncertainties about ligand-specific mechanisms do not alter our main conclusion. Our data support the idea that a dorsal-mediated WNT signal (represented here by WNT10B and canonical WNT activation) and a ventral-mediated FGF signal (FGF2) must act together to permit Shh induction, and that the coexistence of these dorsal- and ventral-mediated signals is required for patterned limb formation in axolotl limb regeneration.

      (21) Is canonical Wnt signaling induced after electroporation of Wnt10b or Wnt4? qPCR of Lef1 and axin is the most common way of showing this.

      We thank the reviewer for this helpful suggestion. In addition to examining Shh expression, we also assessed canonical WNT signaling by qPCR analysis of Axin2 and Lef1 following Wnt10b electroporation. The data are now included in Fig. 5.

      (22) Line 255-256: qPCR was presented for Figure 5D, but ISH was used for everything else. Is there a technical reason that just qPCR was used for the bead experiments?

      We thank the reviewer for this helpful comment. In the original submission, our goal was to test whether treatment with commercial FGF2 protein or BIO could reproduce the results obtained by electroporation. In the revised manuscript, to avoid confusion between distinct experimental aims, we removed the FGF2–bead data from this section and instead used RT-qPCR to quantitatively corroborate Shh induction after electroporation (Fig. 5D–E). RT-qPCR provided a sensitive, whole-blastema readout and allowed a paired design (left limb: factor; right limb: GFP control) that increased statistical power while minimizing animal use. To address the reviewer’s point more directly, we additionally performed ISH for the BIO treatment and now include those results in Supplementary Figure 3 (Line 287‒288).

      (23) Line 261-263: The authors did not show where Wnt10B or Fgf2 is expressed in the limb as claimed. The RNAseq was bulk, so ISH of these genes is needed to make this claim. Where are Wnt10b and Fgf2 expressed in the amputated limb? Do they show a dorsal (Wnt10b) and ventral (Fgf2) expression pattern?

      We thank the reviewer for raising this important point. As noted in our response to the editor’s comment, we performed ISH on serial sections of regular blastemas at several time points (Fig. S5A). However, the expression patterns of Wnt10b and Fgf2 along the dorsoventral axis were not clear. To complement the ISH results, we performed RT-qPCR on microdissected dorsal and ventral halves of regular blastemas at the MB stage (Fig. S5B). We found that Wnt10b and Fgf2 were expressed at significantly higher levels in the dorsal and ventral halves, respectively, compared to the opposite half. This dorsal/ventral biased expression of Wnt10b/Fgf2 is consistent with our RNA-seq data. To identify the cell types expressing Wnt10b or Fgf2, we analyzed published single-cell RNA-seq data (7 dpa blastema (MB), Li et al., 2021). As a result, Fgf2 expression was observed in the mesenchymal cluster, whereas Wnt10b expression was observed in both mesenchymal and epithelial clusters (Fig. S6). However, because only a small fraction of cells expressed Wnt10b, the principal cellular source of WNT10B protein remains unclear. The apparent low abundance likely contributes to the weak ISH signals and reflects current technical limitations. In addition, Wnt10b and Fgf2 expression did not follow Lmx1b expression (Fig. S6J, K), and Wnt10b and Fgf2 themselves were not exclusive (Fig. S6L). Together with the RT-qPCR data (Fig. S5B), these results suggest that Wnt10b and Fgf2 are not exclusively confined to purely dorsal or ventral cells at the single-cell level, even though they show dorsoventral bias when assessed in bulk tissue, suggesting that Wnt10b/Fgf2 expression is not dorsal-/ventral-specific but mediated by dorsal/ventral cells. Defining the precise spatial patterns of Wnt10b and Fgf2 in regular regeneration will therefore be an important goal for future work. These points are now included in the revised manuscript (Line 485‒501).

      (24) Line 266-288: The formation of multiple limbs is impressive. Do these new limbs correspond to the PD location they are generated?

      We thank the reviewer for this interesting question. Interestingly, from our observations, there does appear to be a tendency for the induced limbs to vary in length depending on their PD location. The skeletal patterns of the induced multiple limbs are now included in Fig. 6. However, as noted earlier, the supernumerary limbs exhibit highly variable morphologies, and a rigorous analysis of PD correlation would require a large number of induced limbs. Since this lies outside the main focus of the present study, we have not pursued this point further in the manuscript.

      (25) Line 288: The minimal requirement for claiming the molecular basis for DV signaling was identified is to ISH or multiplexed FISH for Wnt10b and Fgf2 in amputated limb blastemas to show they are expressed in the mesenchyme or epithelium and are dorsally and ventrally expressed, respectively. In addition, the current understanding of DV patterning through Wnt7a, Lmx1b, and En1 shown not to be important in this model.

      We thank the reviewer for this comment and fully agree with the point raised. We would like to clarify that we are not claiming to have identified the molecular basis of DV patterning. As the reviewer notes, molecules such as Lmx1b, Wnt7a, and En1 are well identified in other animal models as key regulators of DV positional identity. There is no doubt that these molecules play central roles in DV patterning. However, in axolotl limb regeneration, clear DV-specific expression has not been demonstrated for these genes except for Lmx1b. Therefore, further studies will be required to elucidate the molecular basis of DV patterning in axolotls.

      Our focus here is more limited: we aim to identify the molecular basis for the mechanisms in which positional domain-mediated signals (FGF8, SHH, WNT10B, and FGF2) regulate the limb patterning process, rather than the molecular basis of DV patterning. In fact, our results on Wnt10b and Fgf2 suggest that these genes did not affect dorsoventral identities.

      We recognize that this distinction was not sufficiently clear in the original text, and we have revised the manuscript to describe DV patterning mechanisms in other animals and clarify that the dorsal- and ventral-mediated signals are distinct from DV patterning (Line 444‒450). In particular, we now avoid claiming that the molecular basis for DV signaling was identified.

      (26) Line 335: References are needed for this statement. From what I found, Wnt4 can be canonical or non-canonical.

      We thank the reviewer for this helpful comment. We have revised the manuscript (Line 404‒407). We added these citations at the relevant location and adjusted nearby wording to avoid implying pathway exclusivity, in alignment with our response to comment (20).

      (27) Line 337-338: The authors cannot claim "that canonical, but not non-canonical, WNT signaling contributes to Shh induction" as this was not thoroughly tested and is based upon the negative result that Wnt4 electroporation did not induce Shh expression.

      We thank the reviewer for this important clarification. We agree that our data do not allow us to conclude that non-canonical WNT signaling in general does not contribute to Shh induction. Accordingly, we have removed the phrase “but not non-canonical” and revised the text to emphasize that, within the scope of our experiments, Shh induction was not observed following Wnt4 electroporation, whereas it was observed with Wnt10b.

      (28) Line 345: In order to claim "WNT10B via the canonical WNT pathway...appears to regulate Shh expression" needs at least qPCR to show WNT10B induces canonical signaling.

      We thank the reviewer for this comment. As noted in our response to comment (21), we also assessed canonical WNT signaling by qPCR analysis of Axin2 and Lef1 following Wnt10b electroporation (Line 282‒285).

      (29) Lines 361-372: A few studies have been performed on DV patterning of the mouse digit regeneration in regards to Lmx1b and En1. It may be good to discuss how the current study aligns with these findings.

      We appreciate the reviewer’s suggestion. As the reviewer notes, several studies have been performed on dorsoventral (DV) patterning in mouse digit tip regeneration in relation to Lmx1b and En1 (e.g., Johnson et al., 2022; Castilla-Ibeas et al., 2023). In the present study, however, our main conclusion differs in scope from those of studies on mouse digit tip regeneration. We show that, in the axolotl, pre-existing dorsal and ventral identities (as reflected by dorsally derived and ventrally derived cells in the ALM blastema) are required together to induce Shh expression, and that this Shh induction in turn supports anteroposterior interaction at the limb level. This mechanism, in which dorsal-mediated and ventral-mediated signals act in combination to permit Shh expression, does not have a clear direct counterpart in the mouse digit tip literature. Moreover, even with respect to Lmx1b, the two systems behave differently. In mouse digit tip regeneration, loss of Lmx1b during regeneration does not grossly affect DV morphology of the regenerate (Johnson et al., 2022). By contrast, in our axolotl ALM system, the presence or absence of Lmx1b-positive dorsal tissue correlates with the final dorsoventral organization of the induced limb-like structures (e.g., production of double-dorsal or double-ventral symmetric structures in the absence of appropriate dorsoventral contact). Thus, the role of dorsoventral identity in our model is directly tied to patterned limb outgrowth at the whole-limb scale, whereas in the mouse digit tip it has been reported primarily in the context of digit tip regrowth and bone regeneration competence, not robust DV repatterning (Johnson et al., 2022).

      For these reasons, we believe that an extended discussion of mouse digit tip regeneration would risk implying a mechanistic equivalence between axolotl limb regeneration and mouse digit tip regeneration that is not supported by current data. Because the regenerative contexts differ, and because Lmx1b does not appear to re-establish DV patterning in the mouse regenerates (Johnson et al., 2022), we have chosen not to include an explicit discussion of mouse digit tip regeneration in the main text.

      (30) Line 408-433: Although I appreciate generating a model, this section takes some liberties to tell a narrative that is not entirely supported by previous literature or this study. For example, lines 415-416 state "Wnt10b and Fgf2 are expressed at higher levels in dorsal and the ventral blastemal cells, respectively" which were not shown in the study or other studies.

      We thank the reviewer for this important comment. We agree that the original model based on RNA-seq data overstated the evidence. To address this point experimentally, we examined Wnt10b and Fgf2 expression in regular blastemas (Supplemental Figure 5 and 6). Accordingly, our model is now framed as an inductive mechanism for Shh expression—supported by results in ALM (WNT10B in VentBL; FGF2 in DorBL) and by DV-biased expression. Concretely, the sentence previously paraphrased as “Wnt10b and Fgf2 are expressed at higher levels in dorsal and ventral blastemal cells, respectively” has been replaced with wording that (i) avoids single-cell DV specificity and (ii) emphasizes dorsal-/ventral-mediated regulation and the requirement for both signals to allow Shh induction (Line 510‒511).

      Reviewer #2 (Recommendations for the authors):

      (1) Introduction:

      The authors' definitions of positional cues vs positional information are a little hard to follow, and do not appear to be completely accurate. From my understanding of what the authors explain, "positional information" is defined as a signal that generates positional identities in the regenerating tissue. This is a somewhat different definition than what I previously understood, which is the intrinsic (likely epigenetic) cellular identity associated with specific positional coordinates. On the other hand, the authors define "positional cues" as signals that help organize the cells according to the different axes, but don't actually generate positional identities in the regenerating cells. The authors provide two examples: Wnt7a as an example of positional information, and FGF8 as a positional cue. I think that according to the authors' definitions, FGF8 (and probably Shh) are bona fide positional cues, since both signals work together to organize the regenerating limb cells - yet do not generate positional identities, because ectopic limbs formed from blastemas where these pathways have been activated do not regenerate (Nacu et al 2016). However, I am not sure Wnt7a constitutes an example of a "positional information" signal, since as far as I know, it has not been shown to generate stable dorsal limb identities (that remain after the signal has stopped) - at least yet. If it has, the authors should cite the paper that showed this. I think that some sort of diagram to help define these visually will be really helpful, especially to people who do not study regenerative patterning.

      We thank the reviewer for this thoughtful comment. We now agree with the reviewer that our use of “positional cue” and “positional information” may have been confusing. In the revision—and as noted in our response to the Editor’s comment (4)—we have removed the term “positional cue” and no longer attempt to contrast it with “positional information.” Instead, we adopt phrasing that reflects our data and hypothesis: during limb patterning, dorsal-mediated signals act on ventral cells and ventral-mediated signals act on dorsal cells to induce Shh expression. This wording avoids implying that these signals specify dorsoventral identity.

      Regarding WNT7A, we agree it has not been shown to generate a stable dorsal identity after signal withdrawal. In the revised Introduction we therefore describe WNT7A in amniote limb development as an extracellular regulator that induces Lmx1b in dorsal mesenchyme (with En1 repressing Wnt7a ventrally), rather than labeling it as “positional information” in a strict, identity-imprinting sense. We highlight this contrast because, in our axolotl experiments, WNT10B and FGF2 did not alter Lmx1b expression or dorsal–ventral limb characteristics when overexpressed, consistent with the idea that they act downstream of DV identity to enable Shh induction, not to establish DV identity.

      (2) Results:

      It would be helpful if the number of replicates per sample group were reported in the figure legends.

      We thank the reviewer for this suggestion. In accordance with the comment, we have added the number of replicates (n) for each sample group in the figure legends.

      Figure 2 shows ISH for A/P and D/V transcripts in different-positioned blastemas without tissue grafts. The images show interesting patterns, including the lack of Shh expression in all blastemas except in posterior-located blastemas, and localization of the dorsal transcript (Lmx1b) to the dorsal half of A or P located blastemas. My only concern about this data is that the expression patterns are described in only a small part of the ectopic blastema (how representative is it?) and the diagrams infer that these expression patterns are reflective of the entire blastema, which can't be determined by the limited field of view. It is okay if the expression patterns are not present in the entire blastema -in fact, that might be an important observation in terms of who is generating (and might be receiving) these signals.

      We thank the reviewer for this insightful comment. Because Fgf8 and Shh expression was detectable only in a limited subset of cells, the original submission included only high-magnification images. In response to the reviewer’s valid concern about representativeness, we have now added low-magnification overviews of the entire blastema as a supplemental figure (Fig. S1) and clarified in the figure legend that these expression patterns can be focal rather than pan-blastemal (Line 795‒796).

      In Figure 3, they look at all of these expression patterns in the grafted blastemas, showing that Shh expression is only visible when both D and V cells are present in the blastema. My only concern about this data is that the number of replicates is very low (some groups having only an N=3), and it is unclear how many sections the authors visualized for each replicate. This is especially important for the sample groups where they report no Shh expression -I agree that it is not observable in the single example sections they provide, but it is uncertain what is happening in other regions of the blastema.

      We thank the reviewer for this important comment. To increase the reliability of the results, we have increased the number of biological replicates in groups where n was previously low. For all samples, we collected serial sections spanning the entire blastema. For blastemas in which Shh expression was observed, we present representative sections showing the signal. For blastemas without detectable Shh expression, we selected a section from the central region that contains GFP-positive cells for the Figure. To make these points explicit, we have added the following clarification to the Fig. 3 legend (Line 811‒815).

      Figure 4: Shh overexpression in A/P/D/V blastemas - expression induces ectopic limbs in A/D/V locations. They analyzed the symmetry of these regenerates (assuming that D- and V-located blastemas will exhibit D/V symmetry because they only contain cells from one side of that axis). I am a little concerned about how the symmetry assay is performed, since oblique sections through the digits could look asymmetric, while they are actually symmetric. It is also unclear how the angle of the boxes that the symmetry scores were based on was decided - I imagine that the score would change depending on the angle. It also appears that the authors picked different digits to perform this analysis on the different sample groups. I also admit that the logic of the AI-based classification scheme that the authors used for their symmetry scoring analysis (in both Figures 4 and 5) is elusive to me. I think it would have been more informative if the authors leveraged the structural landmarks, like the localization of specific muscle groups. (If this experiment were performed in WT animals, the authors could have used pigment cell localization)... or generate more proximal sections to look at landmarks in the zeugopod.

      We thank the reviewer for these detailed comments regarding the symmetry analysis. Because reliance on a computed symmetry score alone could raise the concerns noted by the reviewer, we now provide transverse sections along the proximodistal axis as supplemental figures (Figs. S2 and S4). These include levels corresponding to the distal end of the zeugopod and the proximal end of the autopod. In addition to reporting the symmetry score, we have explicitly stated in the text that symmetry was also assessed by visual inspection of these sections.

      As also noted in our response to Reviewer #1 (comment 15), ALM-induced limbs frequently exhibit abnormal and highly variable morphologies, which makes it difficult to use consistent anatomical landmarks such as particular digits or muscle groups. For this reason, we focused our analysis on morphological symmetry rather than landmark-based metrics, and we emphasize this rationale in the revised text (Line 232‒235).

      Regarding the use of bounding boxes, this procedure was chosen to minimize the effects of curvature or fixation-induced distortion. For each section, the box angle was adjusted so that the outer contour (epidermal surface) was aligned symmetrically; this procedure was applied uniformly across all conditions to avoid bias. We analyzed multiple biological replicates in each group, which helps mitigate potential artifacts due to oblique sectioning. To further reduce bias, we increased the number of fields included in the analysis to n = 24 per group in the revised version.

      In addition, staining intensity varied among samples, such that a region identified as “muscle” in one sample could be assigned differently in another if classification were based solely on color. To avoid this problem, we used a machine-learning classifier trained separately for each sample, allowing us to group the same tissues consistently within that sample irrespective of intensity differences. In the context of ALM-induced limbs, where stable anatomical landmarks are not available, we consider this strategy the most appropriate. We have added this rationale to the revised manuscript for clarity (Line 239‒247).
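
      To make the idea of a box-based symmetry score concrete, here is a minimal sketch. It is not the scoring procedure used in the manuscript, whose exact formula is not given in this response. It assumes that a section has already been cropped to its bounding box, rotated so that the putative dorsoventral axis runs top to bottom, and converted by a per-sample classifier into a grid of integer tissue labels. The function name `mirror_symmetry_score` and the label values are hypothetical.

```python
def mirror_symmetry_score(labels):
    """Fraction of pixels whose tissue label equals that of the mirrored pixel.

    `labels` is a rectangular list of rows (top = one side of the section,
    bottom = the opposite side). Rows are compared across the horizontal
    midline; a middle row (odd row count) is its own mirror and is skipped.
    Returns 1.0 for a perfectly mirror-symmetric label map.
    """
    n_rows = len(labels)
    if n_rows == 0:
        raise ValueError("labels must contain at least one row")
    n_cols = len(labels[0])
    if any(len(row) != n_cols for row in labels):
        raise ValueError("labels must be rectangular")

    matches = 0
    total = 0
    for r in range(n_rows // 2):
        mirrored_row = labels[n_rows - 1 - r]
        for c in range(n_cols):
            total += 1
            if labels[r][c] == mirrored_row[c]:
                matches += 1
    # With fewer than two rows there is nothing to compare.
    return matches / total if total else 1.0


# Hypothetical labels: 1 = epidermis, 2 = muscle, 3 = cartilage.
symmetric_section = [[1, 1], [2, 2], [1, 1]]
asymmetric_section = [[1, 1], [2, 2], [3, 1]]
print(mirror_symmetry_score(symmetric_section))
print(mirror_symmetry_score(asymmetric_section))
```

      Because labels are compared only within one section, a score of this kind is insensitive to staining-intensity differences between samples, but it still depends on the box angle chosen for each section.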

      Figure 5: The number of replicates in sample groups is relatively low and is quite variable between groups (ranging between 3 and 7 replicates). Zoom in to visualize Shh expression is small relative to the blastema, and it is difficult to discern why the authors positioned the window where they did, and how they maintained consistency among their different sample groups. In the examples of positive Shh expression - the signal is low and hard to see. Validating these expression patterns using some sort of quantitative transcriptional assay (like qRTPCR) would increase the rigor of this experiment ... especially given that they will be able to analyze gene expression in the entire blastema as opposed to sections that might not capture localized expression.

      We thank the reviewer for this important comment. To increase the rigor of these experiments, we have increased the number of biological replicates in groups where n was previously low. In addition, because Shh signal in the Wnt10b-electroporated VentBL images was particularly weak and difficult to discern, we replaced that panel with a representative example in which Shh signal is more clearly visible. We also validated the Shh expression for Wnt10b–electroporated VentBL and Fgf2–electroporated DorBL by RT-qPCR, which assesses gene expression across the entire blastema. These results are now included in Fig. 5 and Line 280‒282. Finally, we clarified in the figure legend how the “window” for imaging was chosen: for samples with detectable Shh expression, the window was placed in the region where the signal was observed; for conditions without detectable Shh expression, the window was positioned in a comparable region containing GFP-positive cells (Line 836‒839). These revisions are included in the revised manuscript.

      Figure 6: They treat dorsal and ventral wounds with gelatin beads soaked in a combination of BMP2+FGF8 (nerve factors) and FGF2 (proposed ventral factor). Remarkably, they observe ectopic limb expression in only dorsal wounds, further supporting the idea that FGF2 provides the "ventral" signal. They show examples of this impressive phenotype on limbs with multiple ectopic structures that formed along the Pr/Di axis. Including images of tubulin staining (as they have in Figures 1 and 2) would help to ensure that the blastemas (or final regenerates) are devoid of nerves. The authors' whole-mount skeletal staining, which shows fusion of the ectopic humerus with the host humerus, is a phenotype associated with deep wounding, which could provide an opportunity for more cellular contribution from different limb axes.

      We thank the reviewer for these constructive comments. As noted in the prior study, when beads are used to induce blastemas without surgical nerve orientation, fine nerve ingrowth can still occur (Makanae et al., 2014), and the induced blastemas are not completely devoid of nerves. While it is still uncertain whether these recruited nerves are functional after blastema induction, it is an important point, and we added sentences about this in the revised manuscript (Line 341‒345).

      Regarding the skeletal phenotype, despite careful implantation to avoid injuring deep tissues, bead-induced ectopic limbs on the dorsal side occasionally displayed fusion of the stylopod with the host humerus—a phenotype associated with deep wounding, as the reviewer notes. This observation suggests that contributions from a broader cellular population cannot be excluded. However, because fusion was observed in only 1 of 16 induced limbs analyzed, and because ectopic limbs induced at the forearm (zeugopod) level did not exhibit such fusion (n=1/6 for stylopod-level inductions; n=0/10 for zeugopod-level inductions), we believe that our main conclusion remains valid. Because fusion is not a typical outcome, we now present representative non-fusion cases—including zeugopod-origin examples—in the figure (Fig. 6L1, L2), and we report the fusion incidence explicitly in the text (Line 350‒354). We also note in the revised manuscript that stylopod fusion can occur in a minority of cases (Line 347‒349).

      Figure 7 nicely summarizes their findings and model for patterning.

      We thank the reviewer for this positive comment.

      The table is cut off in the PDF, so it cannot be evaluated at this time.

      In our copy of the PDF, the table appears in full, so this may have been a formatting issue. We have carefully checked the file and ensured that the table is completely included in the revised submission.

      There is a supplemental figure that doesn't seem to be referenced in the text.

      The supplemental figure (Fig. S1 of the original manuscript) is referenced in the text, but it may have been overlooked. To improve clarity, we have expanded the description in the manuscript so that the supplemental figure is more clearly referenced (Line 285‒291).

      (3) Materials and Methods:

      No power analysis was performed to calculate sample group sizes. The authors have used these experimental techniques in the past and could have easily used past data to inform these calculations.

      We thank the reviewer for this important comment. We did not include a power analysis in the manuscript because this was the first time we compared Shh and other gene expression levels among ALM blastemas of different positional origins using RT-qPCR in our experimental system. As we did not have prior knowledge of the expected variability under these specific conditions, it was difficult to predetermine appropriate sample sizes.
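
      For readers unfamiliar with the calculation the reviewer refers to, the following sketch shows one common way to estimate group sizes once pilot data exist. It is an illustration only and was not performed in the study. It uses the standard normal approximation for a two-sided, two-sample comparison of means, n = 2((z_(1-alpha/2) + z_power) / d)^2, where d is the standardized effect size (difference in means divided by the pooled standard deviation). The function name and the example effect sizes are assumptions.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sided two-sample test.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2.
    For very small n a t-based calculation gives slightly larger values.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical standardized effect sizes, e.g. estimated from pilot RT-qPCR data.
for d in (0.5, 1.0, 2.0):
    print(d, samples_per_group(d))
```

      The required n falls with the square of the effect size, which is why an estimate of the expected variability is needed before group sizes can be fixed.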

      Reviewer #3 (Recommendations for the authors):

      General:

      Congratulations - I found this an elegant and easy-to-read study with significant implications for the field! If possible, I would urge you to consider adding some more characterisation of Wnt10b and Fgf2- which cell types are they expressed in? If you can link your mechanisms to normal limb regeneration too (i.e., regenerating blastema, not ALM), this would significantly elevate the interest in your study.

      We sincerely thank the reviewer for these encouraging comments. As also noted in our response to the editor’s comment, we have analyzed the expression patterns of Wnt10b and Fgf2 in regular blastemas (Line 294‒306). Although clear specific expression patterns along the dorsoventral axis were not detected by ISH, likely due to technical limitations of sensitivity, RT-qPCR revealed significantly higher expression levels of Wnt10b in the dorsal half and Fgf2 in the ventral half of a regular blastema (Fig. S5). In addition, we analyzed published single-cell RNA-seq data (7 dpa blastema, Li et al., 2021) (Line 307‒321). As a result, Fgf2 expression was observed in the mesenchymal clusters, whereas Wnt10b expression was observed in both mesenchymal and epithelial clusters (Fig. S6). However, because only a small fraction of cells expressed Wnt10b, the principal cellular source of WNT10B protein remains unclear. Therefore, defining the precise spatial patterns of Wnt10b and Fgf2 in regular regeneration will be an important goal for future work.

      Data availability:

      I assume that the RNA-sequencing data will be deposited at a public repository.

      RNA-seq FASTQ files have been deposited in the DNA Data Bank of Japan (DDBJ; https://www.ddbj.nig.ac.jp/) under BioProject accession PRJDB38065. We have added a Data availability section to the revised manuscript.

      References

      Castilla-Ibeas, A., Zdral, S., Oberg, K. C., & Ros, M. A. (2024). The limb dorsoventral axis: Lmx1b’s role in development, pathology, evolution, and regeneration. Developmental Dynamics, 253(9), 798–814. https://doi.org/10.1002/dvdy.695

      Johnson, G. L., Glasser, M. B., Charles, J. F., Duryea, J., & Lehoczky, J. A. (2022). En1 and Lmx1b do not recapitulate embryonic dorsal-ventral limb patterning functions during mouse digit tip regeneration. Cell Reports, 41(8), 111701. https://doi.org/10.1016/j.celrep.2022.111701

      Stocum, D. (2017). Mechanisms of urodele limb regeneration. Regeneration, 4. https://doi.org/10.1002/reg2.92

      Tank, P. W., & Holder, N. (1978). The effect of healing time on the proximodistal organization of double-half forelimb regenerates in the axolotl, Ambystoma mexicanum. Developmental Biology, 66(1), 72–85. https://doi.org/10.1016/0012-1606(78)90274-9

    1. Author response:

      Global answer about the ATP analogs (concerns the 3 reviewers)

      We use ATP-vanadate essentially to determine the FRET efficiency of the closed state, but these data are not included in our theoretical model. Thus, even if the reviewers' comments on the observation of a non-negligible fraction of proteins in the open state in the presence of ATP-vanadate are justified, this has no consequence for our conclusions on the effect of curvature on the conformational changes of BmrA with ATP or AMP-PNP.

      We agree with the comments of the reviewers that the binding of vanadate is not irreversible, but the reported lifetime of the closed state is very long compared to our experimental conditions (see (Urbatsch et al. JBC (1995)) on PgP).

      Nevertheless, we will perform new experiments independent of ATP analogs using the E504A BmrA mutant. It has been shown structurally and enzymatically to bind and not hydrolyze ATP and to be 100% in a closed conformation at 5 mM ATP (A. Gobet et al., Nat. Commun. 16, 1745 (2025)). It will clear up all doubts about our experiments.

      We will also add new references:

      I. L. Urbatsch, B. Sankaran, J. Weber, A. E. Senior, J. Biol. Chem. 270, 19383 (1995)

      T. Baukrowitz, T.-C. Hwang, A. C. Nairn, D. C. Gadsby, Neuron 12, 473 (1994)

      A. Gobet et al., Nat. Commun. 16, 1745 (2025)

      Y. Liu, M. Liao, Sci. Adv. 11, eadv9721 (2025) (on the effect of vanadate and temperature on a plant ABC)

      Public Reviews:

      Reviewer #1 (Public review):

      (1) An important aspect of this paper is the difference in mechanism between inhibitors AMP-PNP (a substrate analog) and vanadate (together with ADP, forms a transition state analog inhibitor). The mechanisms and inhibitory constants/binding affinities of these inhibitors are not very well-supported in the current form of the manuscript, either through citations or through experiments. Related to this, the interpretation of the different curvature response of BmrA in the presence of vanadate vs AMPPNP is not very clear.

      See the global answer about ATP-analogs (above)

      (2) Overall, the energetic contribution of the membrane curvature is subtle (less than a kT), so while the principles seem generalizable among membrane proteins, whether these principles impact transport or cell physiology remains to be established.

      It is correct that the effect is limited to high curvature in the case of BmrA. Our theoretical model allows predictions for different protein parameters. The effect is particularly dependent on the protein size and on protein conicity, which can vary over a wide range. We show that larger proteins, such as Piezo1, are in principle expected to display a much stronger curvature dependence than BmrA. Testing our predictions on other proteins and on their physiological function is indeed an exciting perspective, but it is beyond the objective of the current manuscript.
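
      To give a sense of scale for an energetic contribution of "less than a kT", the sketch below uses a generic two-state Boltzmann picture. It is not the theoretical model of the manuscript, and the function name and energy values are assumptions chosen purely for illustration. In such a picture, shifting the free-energy difference between two conformations by at most 1 kT changes their occupancy ratio by at most a factor of e (about 2.7).

```python
from math import exp


def closed_fraction(delta_g_kt):
    """Equilibrium closed-state occupancy in a generic two-state model.

    delta_g_kt is G_closed - G_open expressed in units of kT, so negative
    values favor the closed state.
    """
    return 1.0 / (1.0 + exp(delta_g_kt))


# Hypothetical free-energy differences spanning a shift of 1 kT.
for delta_g in (0.0, -0.5, -1.0):
    print(delta_g, round(closed_fraction(delta_g), 3))
```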

      Reviewer #2 (Public review):

      (1) Although this study may be considered as a purely biophysical investigation of the sensitivity of an ABC transporter to mechanical perturbation of the membrane, the impact would be strengthened if a physiological rationale for this mode of regulation were discussed. Many factors, including temperature, pH, ionic strength, or membrane potential, are likely to affect flux through the transport cycle to some extent, without justifying describing BmrA as a sensor for changes in any of these. Indeed, a much stronger dependence on temperature than on membrane curvature was measured. It is not clear what radii of curvature BmrA would normally be exposed to, and whether this range of curvatures corresponds to the range at which modulation of transport activity could occur. Similarly, it is not clear what biological condition would involve a substantial change to membrane curvature or tension that would necessitate altered BmrA activity.

      Reviewers 1 and 2 both stressed that we showed that activity and conformational changes are mechanosensitive, not that the function of the protein is to be a mechanosensor. This will be corrected.

      Regarding the physiological relevance of the mechanosensitivity of BmrA, we have addressed this point in the manuscript (bottom of page 10 and top of page 11). This discussion was positively appreciated by Reviewer #3. We stress that we have used BmrA as a model system, but considering our results and the theoretical model, we can predict the parameters that are relevant for future studies on the sensitivity of other transmembrane proteins to membrane mechanical properties. And, as stated by the reviewer, "mechanosensitivity of proteins is an understudied phenomenon".

      (2) The size distributions of vesicles were estimated by cryoEM. However, grid blotting leaves a very thin layer of vitreous ice that could sterically exclude large vesicles, leading to a systematic underestimation of the vesicle size distribution.

      We used Lacey carbon grids with large mesh size ranges for our cryoEM images, and we blot on the backside, precisely to measure the largest size range accessible to cryoEM. In our hands, this was not the case when using Quantifoil or C-Flat grids with uniform hole sizes and a large fraction of carbon where the vesicles adhere. With our grids, we are able to image vesicles from 20 to 200 nm in diameter and the precision on the diameter is high, but the statistics might not be as good as with DLS or other diffusion-based methods. DLS is an indirect method (as compared to cryoEM) for measuring vesicle size distribution, which may overestimate the fraction of large objects and underestimate the small ones. We will perform DLS experiments for comparison.

      (3) The relative difference in ATP turnover rates for BmrA in small versus large vesicles is modest (~2-fold) and could arise from different success rates of functional reconstitution with the different protocols.

      The ATPase activity is sensitive to several parameters. We thus carefully characterized our reconstituted samples, including ATPase activity, yield of incorporation and orientation of proteins that are often reported. In addition, we showed by cryo-EM the unilamellarity of the proteoliposomes and their stability during the experiments, which were never reported. The ATPase activities of our samples reconstituted in liposomes at 20°C and at 4°C are high, among the highest reported for BmrA, and less sensitive to errors as compared to the low activities in micelles of detergent.

      We would also like to stress that, with our protocol, we prepared a single batch of lipid/protein mixture that we split in two for reconstitution at 4°C and at 20°C, respectively. Both preparations contain the same amount of detergent. The only difference is that we include more BioBeads for the preparation at 4°C to account for the difference in adsorption of the detergent on the beads at low temperature (D. Lévy, A. Bluzat, M. Seigneuret, J.L. Rigaud Biochim. Biophys. Acta. 179 (1990)), but we also showed that the proteins do not adsorb on the BioBeads (J.-L. Rigaud, B. Pitard, D. Levy, Biochim. Biophys. Acta 1231, 223 (1995)). In addition, the activity of the protein at 37°C is high and comparable to those reported in the literature (E. Steinfels et al., Biochemistry 43, 7491 (2004); W. Mi et al., Nature 549, 233 (2017)), which speaks for a good functional reconstitution. Finally, our results are consistent between the smFRET experiments, where we have at most one protein per vesicle, and the activity measurements, where the amount of protein is higher.

      We also performed reconstitution from molar LPR= 1:13600 to 1:1700 and found the same activity per protein, confirming that the proteins are functional, independently of their surface fraction. We will add these data in the revision.

      Altogether, these data suggest that we correctly estimate the rate of functional reconstitution in our experiments.

      Nevertheless, we will design additional experiments to further compare the activity of the proteins before and after reconstitution.

      (4) The conformational state of the NBDs of BmrA was measured by smFRET imaging. Several aspects of these investigations could be improved or clarified. Firstly, the inclusion and exclusion criteria for individual molecules should be more quantitatively described in the methods. Secondly, errors were estimated by bootstrapping. Given the small differences in state occupancies between conditions, true replicates and statistical tests would better establish confidence in their significance. Thirdly, it is concerning that very few convincing dynamic transitions between states were observed. This may in part be due to fast photobleaching compared to the rate of isomerization, but this could be overcome by reducing the imaging frequency and illumination power. Alternatively, several labs have established the ability to exchange solution during imaging to thereby monitor the change in FRET distribution as a ligand is delivered or removed. Visualizing dynamic and reversible responses to ligands would greatly bolster confidence in the condition-dependent changes in FRET distributions. Such pre-steady state experiments would also allow direct comparison of the kinetics of isomerization from the inward-facing to the outward-facing conformation on delivery of ATP between small and large vesicles.

      (a) We will better detail the inclusion and exclusion criteria.

      (b) For the smFRET, we have performed N=3 true replicates. We will add statistical tests on our graphs.

      (c) We will describe in more detail how we optimized our illumination protocol, considering the signal-to-noise ratio and photobleaching. In practice, we cannot add ATP to the sealed observation chamber on our TIRF system to detect dynamic changes in our immobilized liposomes. The experiment suggested by the reviewer would require building a flow chamber, compatible with TIRF microscopy, to exchange the medium around immobilized liposomes. This is an excellent idea, which has been achieved only recently (S. N. Lefebvre, M. Nijland, I. Maslov, D. J. Slotboom, Nat. Commun. 16, 4448 (2025)). It will require a full new study to optimize both the flow chamber and the dyes to track the smFRET changes over long periods of time.

      Nevertheless, we would like to stress that our objective is not to study the dynamics of the conformational changes, and that we expect it to be slow for BmrA, even at 33°C.

      (5) A key observation is that BmrA was more prone to isomerize ATP- or AMP-PNP-dependently to the outward-facing conformations in large vesicles. Surprisingly, the same was not observed with vanadate-trapping, although the sensitivity of state occupancy to membrane curvature would be predicted to be greatest when state occupancies of both inward- and outward-facing states are close to 50%. It is argued that this was due to irreversibility of vanadate-trapping, but both vanadate and AMP-PNP should work fully reversibly on ABC transporters (see e.g. PMID: 7512348 for vanadate). Further, if trapping were fully irreversible, a quantitative shift to the outward-facing condition would be predicted.

      See the global answer about ATP-analogs (above)

      Reviewer #3 (Public review):

      (1) The authors say that the protein activity is irreversibly inhibited by orthovanadate, but 50% of the proteins are still in open conformation, while being accessible to the analogue (Table 2). It is unclear what this means in the context of activity vs. conformation.

      See the global answer about ATP-analogs (above)

      (2) The difference in the fraction of proteins in closed conformation is quite similar between LV and SV treated with AMP-PNP at 20°C (Figure 2B), and it is not clear if the difference is significant. The presence of a much higher FRET tail in the plots of smFRET experiment in SVs at 20°C or 33°C in the apo conformation of the protein (Figure 3A-B) is a cause of some concern since one would not expect BmrA to access the closed states more frequently in the Apo conformation especially when incorporated in the SV. This is because the subtraction of the higher fraction of closed states in the Apo conformation contributes directly to enhancing the bias between the closed states in SV versus LV membrane bilayers.

      We have consistently observed, both at 20°C and at 33°C, a fraction of proteins with a high FRET signal in our measurements, higher in SV (about 15% and 17%) than in LV (about 10% and 6%). We have quantified the fraction of proteins with NBDs facing inside the liposomes (page 5), 20% in LV and 23.85% in SV. Considering the inverted curvature of the membrane, this orientation could favor the closed conformation, even in the absence of ATP, more for SV than LV. The fraction with inverted orientation could explain our higher fraction of high FRET signal in SV.

      Moreover, part of this signal may be due to a fraction of proteins with non-specific labeling, which would produce a higher FRET signal. We will add data with Cys-less mutants showing that less than 4% are labeled.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #3 (Public review):

      To summarize: The authors' overfilling hypothesis depends crucially on the premise that the very quickly reverting paired-pulse depression seen after unusually short rest intervals of << 50 ms is caused by depletion of release sites whereas Dobrunz and Stevens (1997) concluded that the cause was some other mechanism that does not involve depletion. The authors now include experiments where switching extracellular Ca2+ from 1.2 to 2.5 mM increases synaptic strength on average, but not by as much as at other synapse types. They contend that the result supports the depletion hypothesis. I didn't agree because the model used to generate the hypothesis had no room for any increase at all, and because a more granular analysis revealed a mixed population with a subset where: (a) synaptic strength increased by as much as at standard synapses; and yet (b) the quickly reverting depression for the subset was the same as the overall population.

      The authors raise the possibility of additional experiments, and I do think this could clarify things if they pre-treat with EGTA as I recommended initially. They've already shown they can do this routinely, and it would allow them to elegantly distinguish between pv and pocc explanations for both the increases in synaptic strength and the decreases in the paired pulse ratio upon switching Ca2+ to 2.5 mM. Plus/minus EGTA pre-treatment trials could be interleaved and done blind with minimal additional effort.

      Showing reversibility would be a great addition too, because, in our experience, this does not always happen in whole-cell recordings in ex-vivo tissue even when electrical properties do not change. If the goal is to show that L2/3 synapses are less sensitive to changes in Ca2+ compared to other synapse types - which is interesting but a bit off point - then I would additionally include a positive control, done by the same person with the same equipment, at one of those other synapse types using the same kind of presynaptic stimulation (i.e. ChRs).

      Specific points (quotations are from the Authors' rebuttal)

      (1) Regarding the Author response image 1, I was instead suggesting a plot of PPR in 1.2 mM Ca2+ versus the relative increase in synaptic strength in 2.5 versus in 1.2 mM. This continues to seem relevant.

      Complying with your suggestion, we studied the effects of external [Ca<sup>2+</sup>] ([Ca<sup>2+</sup>]<sub>o</sub>) after pre-incubating the slice in aCSF containing 50 μM EGTA-AM, and added the results as Figure 3—figure supplement 3C-D. Elevation of [Ca<sup>2+</sup>]<sub>o</sub> from 1.3 to 2.5 mM produced no significant change in either baseline EPSC amplitude or PPR, supporting the view that p<sub>v</sub> is already saturated at 1.3 mM [Ca<sup>2+</sup>]<sub>o</sub> and implying that the modest Ca<sup>2+</sup> dependence of baseline EPSCs and PPR in the absence of EGTA (Figure 3—figure supplement 3A-B) is mediated by the change in baseline vesicular occupancy of release sites (p<sub>occ</sub>) rather than fusion probability of docked vesicles (p<sub>v</sub>).

      We found some correlation of the high Ca<sup>2+</sup>-induced relative increase in synaptic strength with the PPR at low Ca<sup>2+</sup> (Author response image 1-A). However, this correlation was also abolished by pre-incubating the slices in EGTA-AM (Author response image 1-B). It should be noted that high PPR does not always mean low p<sub>v</sub>. For example, when the replenishment is equal between high and low baseline p<sub>occ</sub> synapses, the PPR would be higher at low p<sub>occ</sub> synapses than at high p<sub>occ</sub> synapses, even if p<sub>v</sub> is close to unity. Therefore, high baseline release probability (Pr), whether it is attributed to high p<sub>v</sub> or high p<sub>occ</sub>, can result in low PPR, considering that Pr = p<sub>occ</sub> x p<sub>v</sub>.

      As we have already mentioned in our previous letter, the relationship of PPR with refilling rate is complicated and can be bidirectional, whereas an increase in p<sub>v</sub> always results in a reduction of PPR. For example, PPR can be reduced by both a decrease and an increase in the refilling rate (Figure 2—figure supplement 1 and Lin et al., 2025). Therefore, the PPR analysis alone is insufficient to differentiate the contributions of p<sub>v</sub> and p<sub>occ</sub>. Thanks to your suggestion, we could resolve this ambiguity by the EGTA-AM pre-incubation study (Figure 3—figure supplement 3C-D).
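      The dependence of PPR on baseline occupancy can be illustrated with a minimal two-pulse sketch. The model below is an assumption made purely for illustration and is not the model used in the manuscript: each release site is occupied with probability p_occ, an occupied site releases with probability p_v, a fixed fraction of empty sites is refilled during the inter-stimulus interval, and p_v is unchanged at the second pulse (no facilitation). The names `ppr`, `p_occ`, `p_v`, and `refill`, and the numerical values, are hypothetical.

```python
def ppr(p_occ: float, p_v: float, refill: float) -> float:
    """Paired-pulse ratio in a simple site-occupancy model (illustrative assumption)."""
    # First pulse: release probability per site, Pr = p_occ * p_v.
    pr1 = p_occ * p_v
    # Occupancy remaining immediately after the first pulse.
    occ_after = p_occ * (1.0 - p_v)
    # A fraction `refill` of the empty sites is refilled before the second pulse.
    occ2 = occ_after + refill * (1.0 - occ_after)
    # Second pulse: same p_v, i.e. no facilitation is assumed.
    pr2 = occ2 * p_v
    return pr2 / pr1


# Same p_v (close to unity) and same refilling, different baseline occupancy.
low_occ = ppr(p_occ=0.3, p_v=0.95, refill=0.2)
high_occ = ppr(p_occ=0.9, p_v=0.95, refill=0.2)
print(f"PPR at low p_occ:  {low_occ:.2f}")
print(f"PPR at high p_occ: {high_occ:.2f}")
```

      Under these assumptions, the synapse with lower baseline occupancy shows the higher PPR even though p_v is identical and near unity in both cases, so PPR alone does not identify which factor sets Pr.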

      Author response image 1.

      Plot of PPR at low [Ca<sup>2+</sup>]<sub>o</sub> (1.3 mM) as a function of the baseline EPSC at high [Ca<sup>2+</sup>]<sub>o</sub> (2.5 mM) normalized to that at low [Ca<sup>2+</sup>]<sub>o</sub> measured at recurrent excitatory synapses in L2/3 of the prelimbic cortex under the conditions without EGTA-AM (A) and after pre-incubating the slices in EGTA-AM (50 μM) (B)

      (2) "Could you explain in detail why two-fold increase implies pv < 0.2?"

      (a) start with power((2.5/(1 + (2.5/K1) + 1/2.97)),4) = 2*power((1.3/(1 + (1.3/K1) + 1/2.97)),4);

      (b) solve for K1 (this turns out to be 0.48);

      (c) then implement the premise that pv -> 1.0 when Ca2+ is high by calculating Max = power((C/(1 + (C/K1) + 1/2.97)),4) where C is [Ca] -> infinity.

      (d) pv when [Ca] = 1.3 mM must then be power((1.3/(1 + (1.3/K1) + 1/2.97)),4)/Max, which is <0.2. Note that modern updates of Dodge and Rahamimoff typically include a parameter that prevents pv from approaching 1.0; this is the gamma parameter in the versions from Neher group.
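      Steps (a) to (d) can be reproduced numerically with the short sketch below. It uses only the expression and constants quoted in the steps; the function names (`fourth_power`, `solve_k1`) and the name `extra_term` for the 1/2.97 term are hypothetical labels introduced here. Taking the fourth root of step (a) gives an equation that is linear in 1/K1, so no iterative solver is needed.

```python
EXTRA_TERM = 1.0 / 2.97  # the constant 1/2.97 term quoted in step (a)


def fourth_power(ca: float, k1: float, extra_term: float = EXTRA_TERM) -> float:
    """Expression from steps (a)-(d): (C / (1 + C/K1 + 1/2.97))**4."""
    return (ca / (1.0 + ca / k1 + extra_term)) ** 4


def solve_k1(ca_low: float, ca_high: float, fold: float,
             extra_term: float = EXTRA_TERM) -> float:
    """Solve fourth_power(ca_high) = fold * fourth_power(ca_low) for K1."""
    a = 1.0 + extra_term
    q = fold ** 0.25          # fourth root of the fold increase
    r = ca_high / ca_low
    # Fourth root of step (a): r * (a + ca_low*x) = q * (a + ca_high*x), x = 1/K1.
    x = a * (r - q) / (q * ca_high - r * ca_low)
    return 1.0 / x


k1 = solve_k1(ca_low=1.3, ca_high=2.5, fold=2.0)   # step (b)
# Step (c): as C -> infinity, C / (1 + C/K1 + 1/2.97) -> K1, so Max = K1**4.
max_value = k1 ** 4
# Step (d): pv at 1.3 mM relative to the saturating value.
pv_low = fourth_power(1.3, k1) / max_value

print(f"K1 = {k1:.2f}")
print(f"pv at 1.3 mM = {pv_low:.3f}")
```

      The result depends entirely on the two premises stated in the steps, namely the two-fold increase between 1.3 and 2.5 mM and pv approaching 1.0 at saturating Ca2+.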

      Thank you very much for your kind explanation. This interpretation, however, is based on the premises that p<sub>v</sub> is not saturated at low [Ca<sup>2+</sup>]<sub>o</sub> and that Pr = p<sub>v</sub>. In the present study, we presented multiple convergent lines of evidence supporting that p<sub>v</sub> is already saturated at 1.3 mM [Ca<sup>2+</sup>]<sub>o</sub>, as follows: (1) little effect of EGTA-AM on the baseline EPSCs (Figure 2—figure supplement 1); (2) high double failure rates (Figure 3—figure supplement 2); (3) little effect of high [Ca<sup>2+</sup>]<sub>o</sub> on baseline EPSC (Figure 3—figure supplement 3). Therefore, our results suggest that the classical Dodge-Rahamimoff fourth-power relationship cannot be applied to estimate p<sub>v</sub> at the L2/3 recurrent excitatory synapses.

      (3) "If so, we can not understand why depletion-dependent PPD should lead to PPF." When PPD is caused by depletion and pv < 0.2, the number of occupied release sites should not be decreased by more than one-fifth at the second stimulus so, without facilitation, PPR should be > 0.8. The EGTA results then indicate there should be strong facilitation, driving PPR to something like 1.2 with conservative assumptions. And yet, a value of < 0.4 is measured, which is a large miss.
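      The arithmetic behind this bound can be written out as a small sketch, assuming pure depletion with no refilling between the two stimuli. The function name `ppr_pure_depletion` and the multiplicative `facilitation` factor are hypothetical; the factor of 1.5 is chosen only because it reproduces the "something like 1.2" figure and is not a value taken from the data.

```python
def ppr_pure_depletion(p_v: float, facilitation: float = 1.0) -> float:
    """PPR when depletion is the only change between pulses (no refilling).

    `facilitation` is a hypothetical multiplicative factor applied to p_v
    at the second pulse; 1.0 means no facilitation.
    """
    # Fraction of initially occupied sites that are still occupied after pulse 1.
    remaining = 1.0 - p_v
    # Ratio of second to first response: occ*(1 - p_v)*(p_v*f) / (occ*p_v).
    return remaining * facilitation


for p_v in (0.05, 0.1, 0.2):
    print(f"p_v = {p_v:.2f}: PPR without facilitation = {ppr_pure_depletion(p_v):.2f}")

# Illustrative facilitation factor (an assumption, not a measured value).
print(f"p_v = 0.20, facilitation 1.5: PPR = {ppr_pure_depletion(0.2, 1.5):.2f}")
```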

      As mentioned above, the framework used for inferring that p<sub>v</sub> < 0.2, the Dodge-Rahamimoff equation, is not applicable to our experimental system. Consequently, the subsequent deduction, that depletion-dependent PPD should logically lead to PPF, is based on a model that is not compatible with the aforementioned multiple convergent lines of evidence, which support high p<sub>v</sub> rather than the low-p<sub>v</sub> facilitation model.

      (4) Despite the authors' suggestion to the contrary, I continue to think there is a substantial chance that Ca2+-channel inactivation is the mechanism underlying the very quickly reverting paired-pulse depression. However, this is only one example of a non-depletion mechanism among many, with the main point being that any non-depletion mechanism would undercut the reasoning for overfilling. And, this is what Dobrunz and Stevens claimed to show; that the mechanism - whatever it is - does not involve depletion. The most effective way to address this would be affirmative experiments showing that the quickly reverting depression is caused by depletion after all. Attempting to prove that Ca2+-channel inactivation does not occur does not seem like a worthwhile strategy because it would not address the many other possibilities.

      We have systematically ruled out alternative possibilities that may underlie the strong PPD observed at our synapses and demonstrated that it arises from high p<sub>v</sub>-induced vesicle depletion through multiple independent lines of evidence. First, we excluded (1) AMPAR desensitization or saturation (Figure 1—figure supplement 5), (2) Ca<sup>2+</sup> channel inactivation (Figure 2—figure supplement 2), (3) channelrhodopsin inactivation (Figure 1—figure supplement 2), (4) artificial bouton stimulation (Figure 1—figure supplement 4), and (5) transient vesicle undocking (Figure 5; addressed in our previous rebuttal). Second, EGTA-AM experiments (Figure 2, Figure 2—figure supplement 1) revealed that release sites are tightly coupled to Ca<sup>2+</sup> channels, and that EGTA further exacerbates PPD. Third, we validated high baseline p<sub>v</sub> through analysis of double failure rates (Figure 3—figure supplement 2). Fourth, the minimal increase in baseline EPSCs upon elevation of external [Ca<sup>2+</sup>] (Figure 3—figure supplement 3) further supports that baseline p<sub>v</sub> is already saturated at low [Ca<sup>2+</sup>]<sub>o</sub>. Additionally, to further validate our hypothesis, we performed the specific experiment suggested by the reviewer. We have now added EGTA pre-incubation experiments (Figure 3—figure supplement 3C-D) and have revised the manuscript. Specifically, when slices were pre-incubated with 50 μM EGTA-AM, elevation of extracellular [Ca<sup>2+</sup>] from 1.3 to 2.5 mM produced no significant change in either baseline EPSC amplitude or PPR, strongly supporting that the high [Ca<sup>2+</sup>]<sub>o</sub> effects in the absence of EGTA are primarily mediated by changes in p<sub>occ</sub> rather than p<sub>v</sub>.

      (5) True that Kusick et al. observed morphological re-docking, but then vesicles would have to re-prime and Mahfooz et al. (2016) showed that re-priming would have to be slower than 110 ms (at least during heavy use at calyx of Held).

      As previously discussed, Kusick et al. (2020) demonstrated that the transient destabilization of the docked vesicle pool recovers very rapidly, within 14 ms after stimulation. This implies that any post-stimulation undocking events are likely recovered before the 20 ms ISI used in our PPR experiments. Consequently, transient undocking/re-docking events are unlikely to significantly influence the PPR measured at this interval. Furthermore, regarding the slow re-priming kinetics (>100 ms) reported by Mahfooz et al. (2016) and Kusick et al. (2020), our 20 ms ISI effectively falls into a time window that avoids the potential confounds of both processes: it is long enough for the rapid morphological recovery (~14 ms) of docked vesicles to occur, yet too short for the slow re-priming process to make a substantial contribution. In addition, Vevea et al. (2021) showed that post-stimulus undocking is facilitated in synaptotagmin-7 (Syt7) knockout synapses. In our study, however, Syt7 knockdown did not affect PPR at 20 ms ISI, suggesting that the undocking process described in Kusick et al. (2020) is not a major contributor to the PPD observed at 20 ms intervals in our experiments. Therefore, we conclude that the 20 ms ISI used in our experiments falls within a time window that is influenced neither by the rapid undocking (<14 ms) reported nor by the slow re-priming process (>100 ms).

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The revised manuscript presents an interesting and technically competent set of experiments exploring the role of the infralimbic cortex (IL) in extinction learning. The inclusion of histological validation in the supplemental material improves the transparency and credibility of the results, and the overall presentation has been clarified. However, several key issues remain that limit the strength of the conclusions.

      We thank the Reviewer for their positive assessment of our revised manuscript. We discuss the issues raised by the Reviewer below.

      The behavioral effects reported are modest, as evident from the trial-by-trial data included in the supplemental figures. Although the authors interpret their findings as evidence that IL stimulation facilitates extinction only after prior inhibitory learning, this conclusion is not directly supported by their data. The experiments do not include a condition in which IL stimulation is delivered during extinction training alone, without prior inhibitory experience. Without this control, the claim that prior inhibitory memory is necessary for facilitation remains speculative.

      The manuscript provides evidence across five experiments (Figures 2-6) that IL stimulation fails to facilitate extinction training in the absence of prior inhibitory experience. We therefore remain confident that the data support our conclusion: prior inhibitory learning enables IL stimulation to facilitate subsequent inhibitory learning.

      The electrophysiological example provided shows that IL stimulation induces a sustained inhibition that outlasts the stimulation period. This prolonged suppression could potentially interfere with consolidation processes following tone presentation rather than facilitating them. The authors should consider and discuss this alternative interpretation in light of their behavioral data.

      The possibility that IL stimulation exerted its effects by interfering with consolidation processes is inconsistent with the literature. Disrupting consolidation processes in the IL impairs extinction learning (1), even when animals have prior inhibitory learning experience (2). Yet our experiments found that IL stimulation failed to interfere with initial extinction learning but instead facilitated subsequent learning. Furthermore, the electrophysiological example demonstrates that the inhibitory effect is transient: the cell returned to firing properties similar to those observed pre-stimulation, making it unlikely that inhibition persists during the consolidation window.

      It is unfortunate that several animals had to be excluded after histological verification, but the resulting mismatch between groups remains a concern. Without a power analysis indicating the number of subjects required to achieve reliable effects, it is difficult to determine whether the modest behavioral differences reflect genuine biological variability or insufficient statistical power. Additional animals may be needed to properly address this imbalance.

      As noted in the revised manuscript, we are confident about the reliability of the findings reported. The manuscript provides evidence across five experiments that IL stimulation fails to facilitate brief extinction in the absence of prior inhibitory experience, replicating previous findings (3, 4). The manuscript also replicates these prior studies by demonstrating that experience with either fear or appetitive extinction enables IL stimulation to facilitate subsequent fear extinction. Furthermore, the present experiments replicate the facilitative effects of IL stimulation following fear or appetitive backward conditioning.

      Overall, while the manuscript is improved in clarity and methodological detail, the behavioral effects remain weak, and the mechanistic interpretation requires stronger experimental support and consideration of alternative explanations.

      We respectfully disagree with the assertion that the reported results are weak. The manuscript replicates all main findings internally or reproduces findings from previously published studies. While alternative explanations cannot be entirely excluded, we are not aware of any competing account that predicts the pattern of results reported here.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning and its involvement is independent of the motivational value associated with the unconditioned stimulus.

      We thank the Reviewer for their positive assessment.

      Strengths to highlight:

      (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures

      We thank the Reviewer for their positive assessment.

      (2) Very clear representation of groups and experimental design for each figure

      We thank the Reviewer for their positive assessment.

      (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.

      We thank the Reviewer for their positive assessment.

      (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.

      We thank the Reviewer for their positive assessment.

      Reviewer #3 (Public review):

      Summary:

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, also are considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition. The authors have addressed the prior reviews. I still think it is unfortunate that the groups were not properly balanced in some of the figures (as noted by the authors, they were matched appropriately in real time, but some animals had to be dropped after histology, which caused some balancing issues). I think the overall pattern of results is compelling enough that more subjects do not need to be added, but it would still be nice to see more acknowledgement and statistical analyses of how these pre-existing differences may have impacted test performance.

      We thank the Reviewer for their positive assessment of our revised manuscript. We discuss the comments regarding group balancing below.

      Strengths:

      The experimental designs are very rigorous with an unusual level of behavioral sophistication.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      The various group differences in Figure 2 prior to any manipulation are still problematic. There was a reliable effect of subsequent group assignment in Figure 2 (p<0.05, described as "marginal" in multiple places). Then there are differences in extinction (nonsignificant at p=.07). The test difference between ReExt OFF/ON is identical to the difference at the end of extinction and the beginning of Forward 2, in terms of absolute size. I really don't think much can be made of the test result. The authors state in their response that this difference was not evident during the forward phase, but there clearly is a large ordinal difference on the first trial. I think it is appropriate to only focus on test differences when groups are appropriately matched, but when there are pre-existing differences (even when not statistically significant) then they really need to be incorporated into the statistical test somehow.

      We carefully considered the Reviewer's suggestion, but it is not possible to adjust the statistical analyses at test because these analyses do not directly compare the two ReExt groups. Any scaling of performance would require including the two Ext groups, which is not feasible since these groups did not receive initial extinction. Moreover, the analyses provide no conclusive evidence of pre-existing differences between the two ReExt groups: the difference was not significant during initial extinction and was absent during the Forward 2 stage. We acknowledge that closer performance between the two ReExt groups during initial extinction would have been preferable. However, we remain confident in the results obtained because they replicate previous experiments in which the two ReExt groups displayed identical performance during initial extinction.

      The same problem is evident in Figure 4B, but here the large differences in the Same groups are opposite to the test differences. It's hard to say how those large differences ultimately impacted the test results. I suppose it is good that the differences during Forward conditioning did not ultimately predict test differences, but this really should have been addressed with more subjects in these experiments. The authors explore the interactions appropriately but with n=6 in the various subgroups, it's not surprising that some of these effects were not detected statistically.

      As the Reviewer noted, the unexpected differences in Figure 4B are opposite in direction to the test differences. Importantly, Figure 4B replicates the main findings from Figure 3, which did not show these unexpected differences.

      It is useful to see the trial-by-trial test data now presented in the supplement. I think the discussion does a good job of addressing the issues of retrieval, but the ideas of Estes about session cues that the authors bring up in their response haven't really held up over the years (e.g., Robbins, 1990, who explicitly tested this; other demonstrations of within-session spontaneous recovery), for what it's worth.

      We thank the Reviewer for bringing our attention to Robbins’ work on session cues. We understand that the issue of retrieval is important but as we noted before, our manuscript and its conclusions do not claim to differentiate retrieval from additional learning.

      References

      (1) K. E. Nett, R. T. LaLumiere, Infralimbic cortex functioning across motivated behaviors: Can the differences be reconciled? Neurosci Biobehav Rev 131, 704–721 (2021).

      (2) V. Laurent, R. F. Westbrook, Inactivation of the infralimbic but not the prelimbic cortex impairs consolidation and retrieval of fear extinction. Learn Mem 16, 520–529 (2009).

      (3) N. W. Lingawi, R. F. Westbrook, V. Laurent, Extinction and Latent Inhibition Involve a Similar Form of Inhibitory Learning that is Stored in and Retrieved from the Infralimbic Cortex. Cereb Cortex 27, 5547–5556 (2017).

      (4) N. W. Lingawi, N. M. Holmes, R. F. Westbrook, V. Laurent, The infralimbic cortex encodes inhibition irrespective of motivational significance. Neurobiol Learn Mem 150, 64–74 (2018).


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript reports a series of experiments designed to test whether optogenetic activation of infralimbic (IL) neurons facilitates extinction retrieval and whether this depends on animals' prior experience. In Experiment 1, rats underwent fear conditioning followed by either one or two extinction sessions, with IL stimulation given during the second extinction; stimulation facilitated extinction retrieval only in rats with prior extinction experience. Experiments 2 and 3 examined whether backward conditioning (CS presented after the US) could establish inhibitory properties that allowed IL stimulation to enhance extinction, and whether this effect was specific to the same stimulus or generalized to different stimuli. Experiments 5 - 7 extended this approach to appetitive learning: rats received backward or forward appetitive conditioning followed by extinction, and then fear conditioning, to determine whether IL stimulation could enhance extinction in contexts beyond aversive learning and across conditioning sequences. Across studies, the key claim is that IL activation facilitates extinction retrieval only when animals possess a prior inhibitory memory, and that this effect generalizes across aversive and appetitive paradigms.

      Strengths:

      (1) The design attempts to dissect the role of IL activity as a function of prior learning, which is conceptually valuable.

      We thank the Reviewer for their positive assessment.

      (2) The experimental design of probing different inhibitory learning approaches to probe how IL activation facilitates extinction learning was creative and innovative.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) Non-specific manipulation.

      ChR2 was expressed in IL without distinction between glutamatergic and GABAergic populations. Without knowing the relative contribution of these cell types or the percentage of neurons affected, the circuit-level interpretation of the results is unclear.

      ChR2 was intentionally expressed in the infralimbic cortex (IL) without distinction between local neuronal populations for two reasons. First, the primary aim of this study was to uncover some of the features characterizing the encoding of inhibitory memories in the IL, and this encoding likely engages interactions among various neuronal populations within the IL. Second, the hypotheses tested in the manuscript derived from findings obtained by indiscriminately stimulating the IL using the GABA<sub>A</sub> receptor antagonist picrotoxin, which is best mimicked by the approach taken. We agree that it is also important to determine the respective contributions of distinct IL neuronal populations to inhibitory encoding; however, the global approach implemented in the present experiments represents a necessary initial step. These matters have been incorporated in the Discussion of the revised manuscript.

      (2) Extinction retrieval test conflates processes

      The retrieval test included 8 tones. Averaging across this many tone presentations conflates extinction retrieval/expression (early tones) with further extinction learning (later tones). A more appropriate analysis would focus on the first 2-4 tones to capture retrieval only. As currently presented, the data do not isolate extinction retrieval.

      It is unclear when retrieval of what has been learned across extinction ceases and additional extinction learning occurs. In fact, it is only the first stimulus presentation that unequivocally permits a distinction between retrieval and additional extinction learning, as the conditions for this additional learning have not been fulfilled at that presentation. However, confining evidence for retrieval to the first stimulus presentation introduces concerns that other factors could influence performance. For instance, processing of the stimulus present at the start of the session may differ from that present at the end of the previous session, thereby affecting what is retrieved. Such differences between the stimuli present at the start and end of an extinction session have been long recognized as a potential explanation for spontaneous recovery (Estes, 1955). More importantly, whether the test data presented confound retrieval and additional extinction learning or not, the interpretation remains the same with respect to the effects of a prior history of inhibitory learning on enabling the facilitative effects of IL stimulation. Finally, it is unclear how these facilitative effects could occur in the absence of the subjects retrieving the extinction memory formed under the stimulation. Nevertheless, the revised manuscript now provides the trial-by-trial performance (see Supplemental Figure 3) during the post-extinction retrieval tests and addresses this issue in the Discussion.

      (3) Under-sampling and poor group matching.

      Sample sizes appear small, which may explain why groups are not well matched in several figures (e.g., 2b, 3b, 6b, 6c) and why there are several instances of unexpected interactions (protocol, virus, and period). This baseline mismatch raises concerns about the reliability of group differences.

      Efforts were made to match group performance upon completion of each training stage and before IL stimulation. Unfortunately, these efforts were not completely successful due to exclusions following post-mortem analyses. This has been made explicit in the revised manuscript (Materials and Methods, Subjects section). However, we acknowledge that the unexpected interactions deserve further discussion, and this has been incorporated into the revised manuscript (see also comment from Reviewer 2). Although we cannot exclude the possibility that sample sizes may have contributed to some of these interactions, we remain confident about the reliability of the main findings reported, especially given their replication across the various protocols. Overall, the manuscript provides evidence that IL stimulation does not facilitate brief extinction in the absence of prior inhibitory experience in five different experiments, replicating previous findings (Lingawi et al., 2018; Lingawi et al., 2017). It also replicates these previous findings by showing that prior experience with either fear or appetitive extinction enables IL stimulation to facilitate subsequent fear extinction. Furthermore, the facilitative effects of such stimulation following fear or appetitive backward conditioning are replicated in the present manuscript. This is discussed in the Discussion of the revised manuscript.

      (4) Incomplete presentation of conditioning data

      Figure 3 only shows a single conditioning session despite five days of training. Without the full dataset, it is difficult to evaluate learning dynamics or whether groups were equivalent before testing.

      We apologize, as we incorrectly labeled the X axis for the backward conditioning data in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. This error has been corrected in the revised manuscript (see also second comment from Reviewer 2).

      (5) Interpretation stronger than evidence.

      The authors conclude that IL activation facilitates extinction retrieval only when an inhibitory memory has been formed. However, given the caveats above, the data are insufficient to support such a strong mechanistic claim. The results could reflect nonspecific facilitation or disruption of behavior by broad prefrontal activation. Moreover, there is compelling evidence that optogenetic activation of IL during fear extinction does facilitate subsequent extinction retrieval without prior extinction training (DoMonte et al 2015, Chen et al 2021), which the authors do not directly test in this study.

      As noted above, the interpretations of the main findings stand whether the test data confounds retrieval with additional extinction learning or not. The revised manuscript also clarifies the plotting of the data for the backward conditioning stages. We do agree that further discussion of the unexpected interactions is necessary, and this has been incorporated into the revised manuscript. However, the various replications of the core findings provide strong evidence for their reliability and the interpretations advanced in the original manuscript. The proposal that the results reflect non-specific facilitation or disruption of behavior seems highly unlikely. Indeed, the present experiments and previous findings (Lingawi et al., 2018; Lingawi et al., 2017) provide multiple demonstrations that IL stimulation fails to produce any facilitation in the absence of prior inhibitory experience with the target stimulus. Although these demonstrations appear inconsistent with previous studies (Do-Monte et al., 2015; Chen et al., 2021), this inconsistency is likely explained by the fact that these studies manipulated activity in specific IL neuronal populations. Previous work has already revealed differences between manipulations targeting discrete IL neuronal populations as opposed to general IL activity (Kim et al., 2016). Importantly, as previously noted, the present manuscript aimed to generally explore inhibitory encoding in the IL that is likely to engage several neuronal populations within the IL. Adequate statements on these matters have been included in the Discussion of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors examine the mechanisms by which stimulation of the infralimbic cortex (IL) facilitates the retention and retrieval of inhibitory memories. Previous work has shown that optogenetic stimulation of the IL suppresses freezing during extinction but does not improve extinction recall when extinction memory is probed one day later. When stimulation occurs during a second extinction session (following a prior stimulation-free extinction session), freezing is suppressed during the second extinction as well as during the tone test the following day. The current study was designed to further explore the facilitatory role of the IL in inhibitory learning and memory recall. The authors conducted a series of experiments to determine whether recruitment of IL extends to other forms of inhibitory learning (e.g., backward conditioning) and to inhibitory learning involving appetitive conditioning. Further, they assessed whether their effects could be explained by stimulus familiarity. The results of their experiments show that backward conditioning, another form of inhibitory learning, also enabled IL stimulation to enhance fear extinction. This phenomenon was not specific to aversive learning, as backward appetitive conditioning similarly allowed IL stimulation to facilitate extinction of aversive memories. Finally, the authors ruled out the possibility that IL facilitated extinction merely because of prior experience with the stimulus (e.g., reducing the novelty of the stimulus). These findings significantly advance our understanding of the contribution of IL to inhibitory learning. Namely, they show that the IL is recruited during various forms of inhibitory learning, and its involvement is independent of the motivational value associated with the unconditioned stimulus.

      Strengths:

      (1) Transparency about the inclusion of both sexes and the representation of data from both sexes in figures.

      We thank the Reviewer for their positive assessment.

      (2) Very clear representation of groups and experimental design for each figure.

      We thank the Reviewer for their positive assessment.

      (3) The authors were very rigorous in determining the neurobehavioral basis for the effects of IL stimulation on extinction. They considered multiple interpretations and designed experiments to address these possible accounts of their data.

      We thank the Reviewer for their positive assessment.

      (4) The rationale for and the design of the experiments in this manuscript are clearly based on a wealth of knowledge about learning theory. The authors leveraged this expertise to narrow down how the IL encodes and retrieves inhibitory memories.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) In Experiment 1, although not statistically significant, it does appear as though the stimulation groups (OFF and ON) differ during Extinction 1. It seems like this may be due to a difference between these groups after the first forward conditioning. Could the authors have prevented this potential group difference in Extinction 1 by re-balancing group assignment after the first forward conditioning session to minimize the differences in fear acquisition (the authors do report a marginally significant effect between the groups that would undergo one vs. two extinction sessions in their freezing during the first conditioning session)?

      Efforts were made daily to match group performance across the training stages, but these efforts were ultimately hampered by the necessary exclusions following postmortem analyses. This has been made explicit in the revised manuscript (Materials and Methods, Subjects section). Regarding freezing during Extinction 1, as noted by the Reviewer, the difference, which was not statistically significant, was absent across trials during the subsequent forward fear conditioning stage. Likewise, the protocol difference observed during the initial forward fear conditioning was absent in subsequent stages. We are therefore confident that these initial differences (significant or not) did not impact the main findings at test. Importantly, these findings replicate previous work using identical protocols in which no differences were present during the training stages. These considerations have been addressed in the revised manuscript (see Results for Experiment 1).

      (2) Across all experiments (except for Experiment 1), the authors state that freezing during the initial conditioning increased across "days". The figures that correspond to this text, however, show that freezing changes across trials. In the methods, the authors report that backward conditioning occurred over 5 days. It would be helpful to understand how these data were analyzed and collated to create the final figures. Was the freezing averaged across the five days for each trial for analyses and figures?

      We apologize, as noted above, for having incorrectly labeled the X axis across the backward conditioning data sets in Figures 3B, 4B, 4D and 5B. It should have indicated “Days” instead of “Trials”. The data shown in these Figures use the average of all trials on a given day. This has been clarified in the methods section of the revised manuscript (Statistical Analyses section). The labeling errors on the Figures have been corrected.

      (3) In Experiment 3, the authors report a significant Protocol X Virus interaction. It would be useful if the authors could conduct post-hoc analyses to determine the source of this interaction. Inspection of Figure 4B suggests that freezing during the two different variants of backward conditioning differs between the virus groups. Did the authors expect to see a difference in backward conditioning depending on the stimulus used in the conditioning procedure (light vs. tone)? The authors don't really address this confounding interaction, but I do think a discussion is warranted.

      We agree with the Reviewer that further discussion of the Protocol x Virus interaction that emerged during the backward conditioning and forward conditioning stages of Experiment 3 is warranted. This discussion has been provided in the revised manuscript (see Results section). Briefly, during both stages, follow-up analyses did not reveal any differences (main effects or interactions) between the two groups trained with the light stimulus (Diff-EYFP and Diff-ChR2). By contrast, the ChR2 group trained with the tone (Back-ChR2) froze more overall than the EYFP group (Back-EYFP), but there were no other significant differences between the two groups. Based on these analyses, the Protocol x Virus interaction appears to be driven by greater freezing in the ChR2 group trained with the tone rather than a difference in the backward conditioning performance based on stimulus identity. Consistent with this, the statistical analyses did not reveal a main effect of Protocol during either the backward conditioning stage or the stimulus trials during the forward conditioning stage. Nevertheless, during this latter stage, a main effect of Protocol emerged during baseline performance, but once again, this seems to be driven by the Back-ChR2 group. Critically, it is unclear how greater stimulus freezing in the Back-ChR2 group during forward conditioning would lead to lower freezing during the post-extinction retrieval test.

      We note that an unexpected Protocol x Period interaction was found during appetitive backward conditioning in Experiment 5. For consistency, we conducted additional analyses to determine the source of this interaction (see Results section). As previously noted, performance during appetitive backward conditioning is noisy and cannot be taken as a failure to generate inhibitory learning. It is therefore unlikely that this interaction implied a difference in such learning.

      (4) In this same experiment, the authors state that freezing decreased during extinction; however, freezing in the Diff-EYFP group at the start of extinction (first bin of trials) doesn't look appreciably different than their freezing at the end of the session. Did this group actually extinguish their fear? Freezing on the tone test day also does not look too different from freezing during the last block of extinction trials.

      We confirm that overall, there was a significant decline in freezing across the extinction session shown in Figure 4B. The Reviewer is correct to point out that this decline was modest (if not negligible) in the Diff-EYFP group, which was receiving its first inhibitory training with the target tone stimulus. It is worth noting that across all experiments, most groups that did not receive infralimbic stimulation displayed a modest decline in freezing during the extinction session since it was relatively brief, involving only 6 or 8 tone alone presentations. This was intentional, as we aimed for the brief extinction session to generate minimal inhibitory learning and thereby to detect any facilitatory effect of infralimbic stimulation. This has been clarified and explained in the revised version of the manuscript (see Results section, description of Experiment 1).

      (5) The Discussion explored the outcomes of the experiments in detail, but it would be useful for the authors to discuss the implications of their findings for our understanding of circuits in which the IL is embedded that are involved in inhibitory learning and memory. It would also be useful for the authors to acknowledge in the Discussion that although they did not have the statistical power to detect sex differences, future work is needed to explore whether IL functions similarly in both sexes.

      In line with the Reviewer’s suggestion (see also Reviewer 3), the Discussion section has been substantially altered in the revised manuscript. Among other things, it does mention that future studies will need to examine the role of additional brain regions in the effects reported and it acknowledges the need to further explore sex differences and IL functions.

      Reviewer #3 (Public review):

      Summary:

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition.

      Strengths:

      The experimental designs are very rigorous with an unusual level of behavioral sophistication.

      We thank the Reviewer for their positive assessment.

      Weaknesses:

      (1) More justification for parametric choices (number of days of backwards vs forwards conditioning) could be provided.

      All experimental parameters were based on previously published experiments showing the capacity of the backward conditioning protocols to generate inhibitory learning and the forward conditioning protocols to produce excitatory learning. Although this was mentioned in the methods section, we acknowledge that further explanation was required to justify the need for multiple days of backward training. This has been provided in the revised manuscript (see Results section and description of the backward parameters).

      (2) The current discussion could be condensed and could focus on broader implications for the literature.

      The Discussion has been substantially condensed, and broader implications are now discussed with respect to the existing literature on the neural circuitry underlying inhibitory learning.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Re-analyze extinction retrieval, focusing only on the first 2-4 tones to capture extinction expression.

      This recommendation corresponds to the second public comment made by the Reviewer, and we have replied to this comment.

      (2) Directly test whether activation of IL during fear extinction is insufficient to facilitate extinction retrieval without prior extinction training.

      The manuscript provides five separate demonstrations that the optogenetic approach to stimulate IL activity did not facilitate the initial brief extinction session. This reproduces what had been found with indiscriminate pharmacological stimulation in our previous research (Lingawi et al., 2018; Lingawi et al., 2017). We appreciate that other work that stimulated specific IL neuronal populations has observed facilitation of extinction, but the present manuscript focuses on the role of all IL neuronal populations in encoding inhibitory memories. The Reviewer’s request would imply contrasting the role of various neuronal populations, which is beyond the scope of this manuscript. Nevertheless, we have modified our discussion to indicate that future research should establish which IL neuronal population(s) contribute to the effects reported here.

      (3) Show the percentage of neurons that exhibit excitatory or inhibitory responses in IL after non-specific optogenetic activation to better understand how this manipulation is affecting IL circuitry.

      All electrophysiological recordings (n = 10 cells) are presented in Figure 1C. ChR2 excitation was substantial and overwhelming. Based on the physiological and morphological characteristics of the recorded cells, one was non-pyramidal and was excited by LED light delivery. The remaining 9 cells were pyramidal. One did not respond to LED delivery, but we cannot exclude the possibility that this was due to a lack of ChR2 expression in the somatic compartment. Another cell showed a mild reduction in activity following LED stimulation, while the remaining 7 cells displayed clear excitation upon LED stimulation. We have modified our manuscript to reflect these observations. We did not include percentages since only 10 recordings are shown.

      (4) Present data from all five conditioning sessions, not just one, to allow evaluation of learning history.

      This recommendation corresponds to the fourth public comment made by the Reviewer, and we have replied to this comment.

      (5) Address the issue of small and poorly matched groups, particularly in Figures 2b, 3b, 6b, and 6c.

      This recommendation corresponds to the third public comment made by the Reviewer, and we have replied to this comment.

      (6) Temper the conclusions to reflect the limitations of sampling, group matching, and the lack of specificity in the manipulation.

      We have modified our Discussion to address potential issues related to sampling and group matching. However, we are unsure how the lack of specificity of the IL stimulation has any impact on the interpretations made, since no statement is made about neuronal specificity. That said, as noted above, “we have modified our discussion to indicate that future research should establish which IL neuronal population(s) contribute to the effects reported here”.

      Reviewer #2 (Recommendations for the authors):

      Nothing additional to include beyond what is written for public view.

      Reviewer #3 (Recommendations for the authors):

      This is a really nice manuscript with different lines of evidence to show that the IL encodes inhibitory memories that can then be manipulated by optogenetic stimulation of these neurons during extinction. The behavioral designs are excellent, with converging evidence using extinction/re-extinction, backwards/forwards aversive conditioning, and backwards appetitive/forwards aversive conditioning. Additional factors, such as nonassociative effects of the CS or US, are also considered, and the authors evaluate the inhibitory properties of the CS with tests of conditioned inhibition. I only have a couple of comments that the authors may want to consider.

      We thank the Reviewer for their positive assessment.

      First, in Figure 2, it is unfortunate that there is a general effect of the LED assignment before the LED experience (p=.07 during that first extinction session). This is in the same direction as the difference during the test, so it is not clear that the test difference really reflects differences due to Extinction 2 treatment or to preexisting differences based on group assignments.

      The Reviewer’s comment is identical to the first public comment of Reviewer 2, which has been addressed.

      Second, it is notable that the backwards fear conditioning phase was conducted over 5 days, but the forward conditioning phase was conducted over one day. The rationale for these differences should be presented. There is an old idea going back to Konorski that backwards conditioning may lead to excitation initially, and it is only after more extensive trials that inhibitory conditioning occurs (a finding supported by Heth, 1976). Some discussion of the potential biphasic nature of backwards conditioning would be useful, especially for people who want to run this type of experiment but with only a single session of backwards conditioning.

      In line with the Reviewer’s suggestion, the revised manuscript (see Results section) provides an explanation for conducting backward conditioning across multiple days.

      Third, as written, each paragraph of the discussion is mostly a recapitulation of the findings from each experiment. This could be condensed significantly, and it would be nice to see more integration with the current literature and how these results challenge or suggest nuance in current thinking about IL function.

      We have significantly condensed the recapitulation of our findings in the Discussion of the revised manuscript. The Discussion now dedicates space to address comments from the other Reviewers and integrate the present findings with the current literature.

      References

      Chen, Y.-H., Wu, J.-L., Hu, N.-Y., Zhuang, J.-P., Li, W.-P., Zhang, S.-R., Li, X.-W., Yang, J.-M., & Gao, T.-M. (2021). Distinct projections from the infralimbic cortex exert opposing effects in modulating anxiety and fear. J Clin Invest, 131(14), e145692. https://doi.org/10.1172/JCI145692

      Do-Monte, F. H., Manzano-Nieves, G., Quiñones-Laracuente, K., Ramos-Medina, L., & Quirk, G. J. (2015). Revisiting the role of infralimbic cortex in fear extinction with optogenetics. J Neurosci, 35(8), 3607-3615. https://doi.org/10.1523/JNEUROSCI.3137-14.2015

      Estes, W. K. (1955). Statistical theory of spontaneous recovery and regression. Psychol Rev, 62(3), 145-154. https://doi.org/10.1037/h0048509

      Kim, H.-S., Cho, H.-Y., Augustine, G. J., & Han, J.-H. (2016). Selective Control of Fear Expression by Optogenetic Manipulation of Infralimbic Cortex after Extinction. Neuropsychopharmacology, 41(5), 1261-1273. https://doi.org/10.1038/npp.2015.276

      Lingawi, N. W., Holmes, N. M., Westbrook, R. F., & Laurent, V. (2018). The infralimbic cortex encodes inhibition irrespective of motivational significance. Neurobiol Learn Mem, 150, 64-74. https://doi.org/10.1016/j.nlm.2018.03.001

      Lingawi, N. W., Westbrook, R. F., & Laurent, V. (2017). Extinction and Latent Inhibition Involve a Similar Form of Inhibitory Learning that is Stored in and Retrieved from the Infralimbic Cortex. Cereb Cortex, 27(12), 5547-5556. https://doi.org/10.1093/cercor/bhw322

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #4 (Public review):

      Summary:

      The authors demonstrate a computational rational design approach for developing RNA aptamers with improved binding to the Receptor Binding Domain (RBD) of the SARS-CoV-2 spike protein. They demonstrate the ability of their approach to improve binding affinity using a previously identified RNA aptamer, RBD-PB6-Ta, which binds to the RBD. They also computationally estimate the binding energies of various RNA aptamers with the RBD and compare against RBD binding energies for a few neutralizing antibodies from the literature. Finally, experimental binding affinities are estimated by electrophoretic mobility shift assays (EMSA) for various RNA aptamers and a single commercially available neutralizing antibody to support the conclusions from computational studies on binding. The authors conclude that their computational framework, CAAMO, can provide reliable structure predictions and effectively support rational design of improved affinity for RNA aptamers towards target proteins. Additionally, they claim that their approach achieved design of high affinity RNA aptamer variants that bind to the RBD as well or better than a commercially available neutralizing antibody.

      Strengths:

      The thorough computational approaches employed in the study provide solid evidence of the value of their approach for computational design of high affinity RNA aptamers. The theoretical analysis using Free Energy Perturbation (FEP) to estimate relative binding energies supports the claimed improvement of affinity for RNA aptamers and provides valuable insight into the binding model for the tested RNA aptamers in comparison to previously studied neutralizing antibodies. The multimodal structure prediction in the early stages of the presented CAAMO framework, combined with the demonstrated outcome of improved affinity using the structural predictions as a starting point for rational design, provide moderate confidence in the structure predictions.

      We thank the reviewer for this accurate summary and for recognizing the strength of our integrated computational–experimental workflow in improving aptamer affinity.

      Weaknesses:

      The experimental characterization of RBD affinities for the antibody and RNA aptamers in this study presents serious concerns regarding the methods used and the data presented in the manuscript, which call into question the major conclusions regarding affinity towards the RBD for their aptamers compared to antibodies. The claim that structural predictions from CAAMO are reasonable is rational, but this claim would be significantly strengthened by experimental validation of the structure (i.e. by chemical footprinting or solving the RBD-aptamer complex structure).

      The conclusions in this work are somewhat supported by the data, but there are significant issues with experimental methods that limit the strength of the study's conclusions.

      (1) The EMSA experiments have a number of flaws that limit their interpretability. The uncropped electrophoresis images, which should include molecular size markers and/or positive and negative controls for bound and unbound complex components to support interpretation of mobility shifts, are not presented. In fact, a spliced image can be seen for Figure 4E, which limits interpretation without the full uncropped image.

      Thank you for your valuable comments and careful review.

      In response to your suggestion, we will provide all uncropped electrophoresis raw images corresponding to the results in the main figures and supplementary figures (Figure 2F, 3D, 3E, 4E, S9A and S10 of the original manuscript) in the revised version. Regarding the spliced image in Figure 4E, the uncropped raw gel image clearly shows that the two C23U samples were run on an adjacent lane of the same gel due to the total number of samples exceeding the well capacity of a single lane. All samples were electrophoresed and signal-detected under identical experimental conditions in one single experiment, ensuring the validity of direct signal intensity comparison across all samples. These complete uncropped raw images will be supplemented in the revised manuscript as Figure S12 (also see Author response image 1).

      Author response image 1.

      Uncropped electrophoresis images corresponding to Figures 2F, 3D, 3E, 4E, S9A and S10 of the original manuscript.

      Additionally, the volumes of EMSA mixtures are not presented when a mass is stated (e.g., for the methods used to create Figure 3D), which leaves the reader without the critical parameter, molar concentration, and therefore leaves in question the claim that the tested antibody is high affinity under the tested conditions.

      Thank you for your valuable comment on this oversight.

      For the EMSA assay in Figure 3D, the reaction mixture (10 μL total volume) contained 3 μg of RBD protein and 3 μg of antibody (40592-R001), either individually or in combination, and was incubated at room temperature for 20 minutes. Based on the molecular weights (35 kDa for RBD and 150 kDa for the IgG antibody), the corresponding molar concentrations in the mixture were calculated as 8.57 μM for RBD and 2 μM for the antibody. To ensure consistency and clarity, and to provide the critical molar concentration parameter, we will revise the legend of Figure 3D in the revised manuscript, replacing the mass values with the calculated molar concentrations as you suggested.
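      The conversion from mass to molar concentration described above can be checked with a short calculation. The following is a minimal sketch rather than part of the authors' analysis; the function name `molar_concentration_um` is hypothetical, and the inputs (3 μg, 35 kDa or 150 kDa, 10 μL) are the values stated in the response.

```python
def molar_concentration_um(mass_ug: float, mw_kda: float, volume_ul: float) -> float:
    """Return the molar concentration in µM for a mass (µg), molecular weight (kDa) and volume (µL)."""
    # µg divided by kDa gives nmol; nmol per µL is mM; multiply by 1000 to get µM.
    return mass_ug / mw_kda / volume_ul * 1000.0


if __name__ == "__main__":
    # Values taken from the response: 3 µg of each protein in a 10 µL reaction.
    rbd_um = molar_concentration_um(mass_ug=3.0, mw_kda=35.0, volume_ul=10.0)
    antibody_um = molar_concentration_um(mass_ug=3.0, mw_kda=150.0, volume_ul=10.0)
    print(f"RBD: {rbd_um:.2f} µM, antibody: {antibody_um:.2f} µM")
```

      Under these inputs the sketch reproduces the 8.57 μM and 2 μM figures quoted for RBD and the antibody, respectively.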

      Additionally, protein should be visualized in all gels as a control to ensure that lack of shifts is not due to absence/aggregation/degradation of the RBD protein. In the case of Figure 3E, for example, it can be seen that there are degradation products included in the RBD-only lane, introducing a reasonable doubt that the lack of a shift in RNA tests (i.e. Figure 2F) is conclusively due to a lack of binding.

      We sincerely appreciate your careful evaluation of our work, which helps us further clarify the experimental details and data reliability.

      First, we would like to clarify the nature of the gel electrophoresis in Figure 3E: the RBD protein was separated by native-PAGE rather than denaturing SDS-PAGE. The RBD protein used in all experiments was purchased from HUABIO (Cat. No. HA210064) with guaranteed quality, and its integrity and purity were independently verified in our laboratory via denaturing SDS-PAGE (see Author response image 2), which showed a single, intact band without any degradation products. The ladder-like bands observed in the RBD-only lane of the native-PAGE gel are not a result of protein degradation. Instead, they arise from two well-characterized properties of recombinant SARS-CoV-2 Spike RBD protein expressed in human cells: intrinsic conformational heterogeneity (the RBD domain exists in multiple dynamic conformations due to its structural flexibility) (Cai et al., Science, 2020; Wrapp et al., Science, 2020) and heterogeneity in N-glycosylation modification (variable glycosylation patterns at the conserved N-glycosylation sites of RBD) (Casalino et al., ACS Cent. Sci., 2020; Ives et al., eLife, 2024), both of which could cause distinct migration bands in native-PAGE under non-denaturing conditions.

      Second, to ensure the reliability of the RNA-binding results, the EMSA experiments for determining the binding affinity (K<sub>d</sub>) of RBD to Ta, Tc and Ta variants were performed with three independent biological replicates (the original manuscript includes all replicate data in Figure 2F and S9). Consistent results were obtained across all replicates, which effectively rules out false-negative outcomes caused by accidental absence or loss of functional RBD protein in the reaction system. In addition, our gel images (Figure 2F and S9 in the original manuscript) and uncropped raw images of all EMSA gels (see Author response image 1) show no significant signal accumulation in the sample wells, confirming the absence of RBD protein aggregation in the binding reactions—an issue that would otherwise interfere with RNA-protein interaction and band shift detection.

      New results for RBD analysis by denaturing SDS-PAGE, along with the associated discussion, will be added to the revised manuscript as Figure S10 (also see Author response image 2).

      Author response image 2.

      SDS-PAGE analysis of the SARS-CoV-2 Spike RBD protein, neutralizing antibody (40592-R001) and BSA reference. This gel validates the high purity and structural integrity of the commercially sourced RBD protein and neutralizing antibody used in this study.

      References

      Cai, Y. et al. Distinct conformational states of SARS-CoV-2 spike proteins. Science 369, 1586-1592 (2020).

      Casalino, L. et al. Beyond shielding: the roles of glycans in the SARS-CoV-2 spike protein. ACS Cent. Sci. 6, 1722-1734 (2020).

      Ives, C.M. et al. Role of N343 glycosylation on the SARS-CoV-2 S RBD structure and co-receptor binding across variants of concern. eLife 13, RP95708 (2024).

      Wrapp, D. et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260-1263 (2020).

      Finally, there is no control for nonspecific binding, such as BSA or another non-target protein, which fails to eliminate the possibility of nonspecific interactions between their designed aptamers and proteins in general. A nonspecific binding control should be included in all EMSA experiments.

      Thank you for this constructive comment.

      Following your recommendation, we are currently supplementing the EMSA assays with BSA as a non-target protein control to rigorously exclude potential non-specific binding between our designed aptamers (Ta and Ta variants) and exogenous proteins. These additional experiments are designed to directly assess whether the aptamers exhibit unintended interactions with unrelated proteins and to further validate the protein specificity of the RBD–aptamer interaction observed in our study.

      The resulting nonspecific binding control data will be formally incorporated into the revised manuscript as Figure S11, and the corresponding Results and Discussion sections will be updated accordingly to reflect this critical validation once the experiments are completed.

      (2) The evidence supporting claims of better binding to RBD by the aptamer compared to the commercial antibody is flawed at best. The commercial antibody product page indicates an affinity in low nanomolar range, whereas the fitted values they found for the aptamers in their study are orders of magnitude higher at tens of micromolar. Moreover, the methods section is lacking in the details required to appropriately interpret the competitive binding experiments. With a relatively short 20-minute equilibration time, the order of when the aptamer is added versus the antibody makes a difference in which is apparently bound. The issue with this becomes apparent with the lack of internal consistency in the presented results, namely in comparing Fig 3E (which shows no interference of Ta binding with 5uM antibody) and Fig 5D (which shows interference of Ta binding with 0.67-1.67uM antibody). The discrepancy between these figures calls into question the methods used, and it necessitates more details regarding experimental methods used in this manuscript.

      Thank you for your insightful comments, which have helped us refine the rigor of our study. We address each of your concerns in detail below:

      First, we agree with your observation that the commercial neutralizing antibody (Sino Biological, Cat# 40592-R001) is reported to bind Spike RBD with low nanomolar affinity on its product page. However, this discrepancy in affinity values (nanomolar vs. micromolar) stems from the use of distinct analytical methods. The product page affinity was determined via the Octet RED System, a technique analogous to Surface Plasmon Resonance (SPR) that offers high sensitivity for kinetic and affinity measurements. In contrast, our study employed EMSA, a method primarily optimized for semi-quantitative assessment of binding interactions. The inherent differences in sensitivity and principle between these two techniques—with Octet RED System enabling real-time monitoring of biomolecular interactions and EMSA relying on gel separation—account for the observed variation in affinity values.

      Second, regarding the competitive binding experiments, we appreciate your note on the critical role of reagent addition order and equilibration time. To eliminate potential biases from sequential addition, we clarify that Cy3-labeled RNAs, RBD proteins, and the neutralizing antibody were added simultaneously to the reaction system. We will revise the Methods section in the revised manuscript to provide a detailed protocol for the EMSA experiments, to ensure full reproducibility and appropriate interpretation of the results.

      Third, we acknowledge and apologize for a critical error in the legend of Figure 3E: the concentrations reported (5 μM aptamer and antibody 40592-R001) refer to stock solutions, not the final concentrations in the EMSA reaction mixture. The correct final concentrations are 0.5 μM for aptamer Ta and 0.5 μM for the antibody. This correction resolves the apparent inconsistency between Figure 3E and Figure 5D, as the final antibody concentration in Figure 3E is now consistent with the concentration range used in Figure 5D. We will update the legend of Figure 3E and revise the Methods section to explicitly distinguish between stock and final reaction concentrations, ensuring clarity and internal consistency of the results.

      We sincerely thank you for highlighting these issues, which will prompt important revisions to improve the clarity, accuracy, and rigor of our manuscript.

      (3) The utility of the approach for increasing affinity of RNA aptamers for their targets is well supported through computational and experimental techniques demonstrating relative improvements in binding affinity for their G34C variant compared to the starting Ta aptamer. While the EMSA experiments do have significant flaws, the observations of relative relationships in equilibrium binding affinities among the tested aptamer variants can be interpreted with reasonable confidence, given that they were all performed in a consistent manner.

      We sincerely appreciate your valuable concerns and constructive feedback, which have greatly facilitated the improvement of our manuscript. Regarding the flaws of the EMSA experiments you pointed out, we have provided a detailed response to clarify the related issues and supplemented necessary experimental details to enhance the rigor and reproducibility of our work (see corresponding response above). It is worth noting that EMSA remains a classic and widely used technique for studying biomolecular interactions, and its reliability in qualitative and semi-quantitative analysis of binding events has been well recognized in the field. Furthermore, we fully agree with and are grateful for your view that, since all tested aptamer variants were analyzed using a consistent experimental protocol, the observations on the relative relationships of their equilibrium binding affinities can be interpreted with reasonable confidence. This recognition reinforces the validity of the relative affinity improvements we observed for the G34C variant compared to the parental Ta aptamer, which is a key finding of our study.

      (4) The claim that the structure of the RBD-Aptamer complex predicted by the CAAMO pipeline is reliable is tenuous. The success of their rational design approach based on the structure predicted by several ensemble approaches supports the interpretation of the predicted structure as reasonable, however, no experimental validation is undertaken to assess the accuracy of the structure. This is not a main focus of the manuscript, given the applied nature of the study to identify Ta variants with improved binding affinity, however the structural accuracy claim is not strongly supported without experimental validation (i.e. chemical footprinting methods).

      We thank the reviewer for this comment and agree that experimental validation would be required to establish the structural accuracy of the predicted RBD–aptamer complex. We note, however, that the primary aim of this study is not structural determination, but the development of a general computational framework for aptamer affinity maturation. In most practical applications, experimentally resolved structures of aptamer–protein complexes are unavailable. Accordingly, CAAMO is designed to operate under such conditions, using computationally generated binding models as working hypotheses to guide rational optimization rather than as definitive structural descriptions. In this context, the predicted structure is evaluated by its utility for affinity improvement, rather than by direct structural validation. We will revise the manuscript accordingly to further clarify this scope.

      (5) Throughout the manuscript, the phrasing of "all tested antibodies" was used, despite there being only one tested antibody in experimental methods and three distinct antibodies in computational methods. While this concern is focused on specific language, the major conclusion that their designed aptamers are as good or better than neutralizing antibodies in general is weakened by testing only three antibodies through computational binding measurements and a fourth single antibody for experimental testing. The contact residue mapping furthermore lacks clarity in the number of structures that were used, with a vague description of structures from the PDB that provides neither accession numbers nor the number of distinct antibodies included for contact residue mapping.

      We thank the reviewer for this important comment regarding language precision, experimental scope, and clarity of the antibody dataset used in this study. We agree that the phrase “all tested antibodies” was imprecise and could lead to overgeneralization. We will carefully revise the manuscript to use more accurate and explicit wording throughout, clearly distinguishing between experimentally tested antibodies, computationally analyzed antibodies, and antibody structures used for large-scale contact analysis.

      Specifically, the experimental comparison in this study was performed using one commercially available SARS-CoV-2 neutralizing antibody, whereas free energy–based computational analyses were conducted on three representative neutralizing antibodies with available structural data. We will revise the manuscript to explicitly state these distinctions and avoid general statements referring to neutralizing antibodies as a class.

      Importantly, the residue-level contact frequency analysis was not based solely on these individual antibodies. Instead, this analysis leveraged a comprehensive set of experimentally resolved SARS-CoV-2 RBD–antibody complex structures curated from the Coronavirus Antibody Database (CoV-AbDab), a publicly available and actively maintained resource developed by the Oxford Protein Informatics Group. CoV-AbDab aggregates all published coronavirus-binding antibodies with associated PDB structures and provides a systematic and unbiased structural foundation for antibody–RBD interaction analysis. All available high-resolution RBD–antibody complex structures indexed in CoV-AbDab at the time of analysis were included to compute contact residue frequencies across the structural ensemble. We will explicitly state this data source, clarify the number and nature of structures used, and add the appropriate citation (Raybould et al., Bioinformatics, 2021, doi: 10.1093/bioinformatics/btaa739).

      Finally, we will revise the conclusions to avoid claims that extend beyond the scope of the data. The comparison between aptamers and antibodies is now framed in terms of representative antibodies and consensus interaction patterns derived from a large structural ensemble, rather than as a general statement about all neutralizing antibodies. These revisions will improve the clarity, rigor, and reproducibility of the manuscript, while preserving the core conclusion that the CAAMO framework enables effective structure-guided affinity maturation of RNA aptamers.

      Overall, the manuscript by Yang et al presents a valuable tool for rational design of improved RNA aptamer binding affinity toward target proteins, which the authors call CAAMO. Notably, the method is not intended for de novo design, but rather as a tool for improving aptamers that have been selected for binding affinity by other methods such as SELEX. While there are significant issues in the conclusions made from experiments in this manuscript, the relative relationships of observed affinities within this study provide solid evidence that the CAAMO framework provides a valuable tool for researchers seeking to use rational design approaches for RNA aptamer affinity maturation.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, the authors attempt to devise general rules for aptamer design based on structure and sequence features. The main system they are testing is an aptamer targeting a viral sequence.

      Strengths:

      The method combines a series of well-established protocols, including docking, MD, and a lot of system-specific knowledge, to design several new versions of the Ta aptamer with improved binding affinity.

      We thank the reviewer for this accurate summary and for recognizing the strength of our integrated computational–experimental workflow in improving aptamer affinity.

      Weaknesses:

      The approach requires a lot of existing knowledge and, importantly, an already known aptamer, which presumably was found with SELEX. In addition, although the aptamer may have a stronger binding affinity, it is not clear if any of it has any additional useful properties such as stability, etc.

      Thanks for these critical comments.

      (1) On the reliance on a known aptamer: We agree that our CAAMO framework is designed as a post-SELEX optimization platform rather than a tool for de novo discovery. Its primary utility lies in rationally enhancing the affinity of existing aptamers that may not yet be sequence-optimal, thereby complementing experimental technologies such as SELEX. The following has been added to “Introduction” of the revised manuscript. (Page 5, line 108 in the revised manuscript)

      ‘Rather than serving as a de novo aptamer discovery tool, CAAMO is designed as a post-SELEX optimization platform that rationally improves the binding capability of existing aptamers.’

      (2) On stability and developability: We also appreciate the reviewer’s important reminder that affinity alone is not sufficient for therapeutic development. We acknowledge that the present study has focused mainly on affinity optimization, and properties such as nuclease resistance, structural stability, and overall developability were not evaluated. The following has been added to “Discussion and conclusion” of the revised manuscript. (Page 25, line 595 in the revised manuscript)

      ‘While the present study primarily focused on affinity optimization, we acknowledge that other key developability traits—such as nuclease resistance, structural and thermodynamic stability, and in vivo persistence—are equally critical for advancing aptamers toward therapeutic applications. These properties were not evaluated here but will be systematically addressed in future iterations of the CAAMO framework to enable comprehensive optimization of aptamer candidates.’

      Reviewer #2 (Public review):

      Summary:

      This manuscript proposes a workflow for discovering and optimizing RNA aptamers, with application in the optimization of a SARS-CoV-2 RBD. The authors took a previously identified RNA aptamer, computationally docked it into one specific RBD structure, and searched for variants with higher predicted affinity. The variants were subsequently tested for RBD binding using gel retardation assays and competition with antibodies, and one was found to be a stronger binder by about three-fold than the founding aptamer.

      Overall, this would be an interesting study if it were performed with truly high-affinity aptamers, and specificity was shown for RBD or several RBD variants.

      Strengths:

      The computational workflow appears to mostly correctly find stronger binders, though not de novo binders.

      We thank the reviewer for the clear summary and for acknowledging that our workflow effectively prioritizes stronger binders.

      Weaknesses:

      (1) Antibody competition assays are reported with RBD at 40 µM, aptamer at 5 µM, and a titration of antibody between 0 and 1.2 µg. This approach does not make sense. The antibody concentration should be reported in µM. An estimation of the concentration is 0-8 pmol (from 0-1.2 µg), but that's not a concentration, so it is unknown whether enough antibody molecules were present to saturate all RBD molecules, let alone whether they could have displaced all aptamers.

      Thanks for your insightful comment. We have calculated that 0–1.2 µg antibody corresponds to a final concentration range of 0–1.6 µM (see Author response image 1). In practice, 1.2 µg was the maximum amount of commercial antibody that could be added under the conditions of our assay. In the revised manuscript, all antibody amounts previously reported in µg have been converted to their corresponding molar concentrations in Fig. 1F and Fig. 5D. In addition, the exact antibody concentrations used in the EMSA assays are now explicitly stated in the Materials and Methods section under “EMSA experiments.” The following has been added to “EMSA experiments” of the revised manuscript. (Page 30 in the revised manuscript)

      ‘For competitive binding experiments, 40 μM of RBP proteins, 5 μM of annealed Cy3-labelled RNAs and increasing concentrations of SARS-CoV-2 neutralizing antibody 40592-R001 (0–1.67 μM) were mixed in the EMSA buffer and incubated at room temperature for 20 min.’

      Author response image 1.

      Estimation of antibody concentration. Assuming a molecular weight of 150 kDa, dissolving 1.2 µg of antibody in a 5 µL reaction volume results in a final concentration of 1.6 µM.
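      The estimate in this caption, and the reviewer's question of how the antibody compares with the 40 µM RBD in the same reaction, can be laid out as explicit arithmetic. This is an illustrative sketch only; the helper names are hypothetical, the 150 kDa molecular weight and 5 µL volume are the assumptions stated in the caption, and the ratio is a plain quotient of the two stated concentrations, not a saturation analysis.

```python
def amount_pmol(mass_ug: float, mw_kda: float) -> float:
    """Amount of substance in pmol for a mass in µg and a molecular weight in kDa."""
    # µg divided by kDa gives nmol; multiply by 1000 to get pmol.
    return mass_ug / mw_kda * 1000.0


def concentration_um(pmol: float, volume_ul: float) -> float:
    """Concentration in µM, since 1 pmol per µL equals 1 µM."""
    return pmol / volume_ul


if __name__ == "__main__":
    antibody_pmol = amount_pmol(mass_ug=1.2, mw_kda=150.0)           # caption assumption: 150 kDa
    antibody_um = concentration_um(antibody_pmol, volume_ul=5.0)     # caption assumption: 5 µL
    rbd_um = 40.0                                                    # RBD concentration stated in the review
    print(f"antibody: {antibody_pmol:.1f} pmol, {antibody_um:.2f} µM")
    print(f"antibody:RBD molar ratio = {antibody_um / rbd_um:.3f}")
```

      With these inputs the maximum antibody amount corresponds to 8 pmol, or 1.6 µM, which is well below the stated RBD concentration.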

      As shown in Figure 5D, the purpose of the antibody–aptamer competition assay was not to achieve full saturation but rather to compare the relative competitive binding of the optimized aptamer (Ta<sup>G34C</sup>) versus the parental aptamer (Ta). Molecular interactions at this scale represent a dynamic equilibrium of binding and dissociation. While the antibody concentration may not have been sufficient to saturate all available RBD molecules, the experimental results clearly reveal the competitive binding behavior that distinguishes the two aptamers. Specifically, two consistent trends emerged:

      (1) Across all antibody concentrations, the free RNA band for Ta was stronger than that of Ta<sup>G34C</sup>, while the RBD–RNA complex band of the latter was significantly stronger, indicating that Ta<sup>G34C</sup> bound more strongly to RBD.

      (2) For Ta, increasing antibody concentration progressively reduced the RBD–RNA complex band, consistent with antibody displacing the aptamer. In contrast, for Ta<sup>G34C</sup>, the RBD–RNA complex band remained largely unchanged across all tested antibody concentrations, suggesting that the antibody was insufficient to displace Ta<sup>G34C</sup> from the complex.

      Together, these observations support the conclusion that Ta<sup>G34C</sup> exhibits markedly stronger binding to RBD than the parental Ta aptamer, in line with the predictions and objectives of our CAAMO optimization framework.

      (2) These are not by any means high-affinity aptamers. The starting sequence has an estimated (not measured, since the titration is incomplete) K<sub>d</sub> of 110 µM. That's really the same as non-specific binding for an interaction between an RNA and a protein. This makes the title of the manuscript misleading. No high-affinity aptamer is presented in this study. If the docking truly presented a bound conformation of an aptamer to a protein, a sub-micromolar K<sub>d</sub> would be expected, based on the number of interactions that they make.

      In fact, our starting sequence (Ta) is a high-affinity aptamer, so the optimized sequences with enhanced affinity (such as Ta<sup>G34C</sup>) are high-affinity aptamers as well. We explain this in detail below:

      (1) Origin and prior characterization of Ta. The starting aptamer Ta (referred to as RBD-PB6-Ta in the original publication by Valero et al., PNAS 2021, doi:10.1073/pnas.2112942118) was selected through multiple positive rounds of SELEX against SARS-CoV-2 RBD, together with counter-selection steps to eliminate non-specific binders. In that study, Ta was reported to bind RBD with an IC₅₀ of ~200 nM as measured by biolayer interferometry (BLI), supporting its high affinity and specificity. The following has been added to “Introduction” of the revised manuscript. (Page 4 in the revised manuscript)

      ‘This aptamer was originally identified through SELEX and subsequently validated using surface plasmon resonance (SPR) and biolayer interferometry (BLI), which confirmed its high affinity (sub-nanomolar) and high specificity toward the RBD. Therefore, Ta provides a well-characterized and biologically relevant starting point for structure-based optimization.’

      (2) Methodological differences between EMSA and BLI measurements. We acknowledge that the discrepancy between our obtained binding affinity (K<sub>d</sub> = 110 µM) and the previously reported one (IC<sub>50</sub> ~ 200 nM) for the same Ta sequence arises primarily from methodological and experimental differences between EMSA and BLI. Namely, different experimental measurement methods can yield varied binding affinity values. While EMSA may have relatively low measurement precision, its relatively simple procedures were the primary reason for its selection in this study. Particularly, our framework (CAAMO) is designed not as a tool for absolute affinity determination, but as a post-SELEX optimization platform that prioritizes relative changes in binding affinity under a consistent experimental setup. Thus, the central aim of our work is to demonstrate that CAAMO can reliably identify variants, such as Ta<sup>G34C</sup>, that bind more strongly than the parental sequence under identical assay conditions. The following has been added to “Discussion and conclusion” of the revised manuscript. (Page 24 in the revised manuscript)

      ‘Although the absolute K<sub>d</sub> values determined by EMSA cannot be directly compared with surface-based methods such as SPR or BLI, the relative affinity trends remain highly consistent. While EMSA provides semi-quantitative affinity estimates, the close agreement between experimental EMSA trends and FEP-calculated ΔΔG values supports the robustness of the relative affinity changes reported here. In future studies, additional orthogonal biophysical techniques (e.g., filter-binding, SPR, or BLI) will be employed to further validate and refine the protein–aptamer interaction models.’

      (3) Evidence of specific binding in our assays. We emphasize that the binding observed in our EMSA experiments reflects genuine aptamer–protein interactions. As shown in Figure 2G, a control RNA (Tc) exhibited no detectable binding to RBD, whereas Ta produced a clear binding curve, confirming that the interaction is specific rather than non-specific.

      (3) The binding energies estimated from calculations and those obtained from the gel-shift experiments are vastly different, as calculated from the K<sub>d</sub> measurements, making them useless for comparison, except for estimating relative affinities.

      We thank the reviewer for raising this important point. CAAMO was developed as a post-SELEX optimization tool with the explicit goal of predicting relative affinity changes (ΔΔG) rather than absolute binding free energies (ΔG). Empirically, CAAMO correctly predicted the direction of affinity change for 5 out of 6 designed variants (ΔΔG < 0 indicates stronger binding relative to WT); such predictive power for relative ranking is highly valuable for prioritizing candidates for experimental testing. Our prior work on RNA–protein interactions likewise supports the reliability of relative affinity predictions (see: Nat Commun 2023, doi:10.1038/s41467-023-39410-8). The following has been added to “Discussion and conclusion” of the revised manuscript. (Page 24 in the revised manuscript)

      ‘While EMSA provides semi-quantitative affinity estimates, the close agreement between experimental EMSA trends and FEP-calculated ΔΔG values supports the robustness of the relative affinity changes reported here.’
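      To make the distinction between absolute and relative binding free energies concrete, the standard thermodynamic relations ΔG = RT ln(K<sub>d</sub>/1 M) and ΔΔG = RT ln(K<sub>d,variant</sub>/K<sub>d,reference</sub>) can be evaluated directly. The sketch below is not the authors' FEP protocol; it assumes a temperature of 298.15 K and a 1 M standard state, uses the 110 µM K<sub>d</sub> quoted for Ta, and takes the "about three-fold" improvement from Reviewer #2's summary purely as an illustrative ratio rather than a reported K<sub>d</sub>.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Absolute binding free energy in kcal/mol from a Kd in mol/L (1 M standard state)."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)


def relative_binding_free_energy(kd_variant: float, kd_reference: float,
                                 temperature_k: float = 298.15) -> float:
    """ddG in kcal/mol; negative values mean the variant binds more tightly."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_variant / kd_reference)


if __name__ == "__main__":
    kd_ta = 110e-6               # Kd of Ta from the EMSA fit quoted in the review (mol/L)
    kd_variant = kd_ta / 3.0     # illustrative: a three-fold tighter binder, not a reported value
    print(f"dG(Ta)  = {binding_free_energy(kd_ta):.2f} kcal/mol")
    print(f"ddG     = {relative_binding_free_energy(kd_variant, kd_ta):.2f} kcal/mol")
```

      Under these assumptions a 110 µM K<sub>d</sub> corresponds to roughly -5.4 kcal/mol, and a three-fold improvement to roughly -0.65 kcal/mol. The relative quantity depends only on the ratio of the two K<sub>d</sub> values, so it is unaffected by a systematic offset that scales both measurements equally.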

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors)

      (1) Overall, the paper is well-written and, in the opinion of this reviewer, could remain as it is.

      We thank the reviewer for the positive evaluation and supportive comments regarding our manuscript. We are grateful for the endorsement of its quality and suitability for publication.

      Reviewer #2 (Recommendations for the authors)

      (1) All molecules present in experiments need to be reported with their final concentrations (not µg).

      We thank the reviewer for raising this important point. In the revised manuscript, all antibody amounts previously reported in µg have been converted to their corresponding molar concentrations in Fig. 1F and Fig. 5D. In addition, the exact antibody concentrations used in the EMSA assays are now explicitly stated in the Materials and Methods section under “EMSA experiments.” The following has been added to “EMSA experiments” of the revised manuscript. (Page 30 in the revised manuscript)

      ‘For competitive binding experiments, 40 μM of RBP proteins, 5 μM of annealed Cy3-labelled RNAs and increasing concentrations of SARS-CoV-2 neutralizing antibody 40592-R001 (0–1.67 μM) were mixed in the EMSA buffer and incubated at room temperature for 20 min.’

      (2) An independent K<sub>d</sub> measurement, for example, using a filter binding assay, would greatly strengthen the results.

      We thank the reviewer for this constructive suggestion and agree that an orthogonal biophysical measurement (e.g., a filter-binding assay, SPR or BLI) would further strengthen confidence in the reported dissociation constants. Unfortunately, all available SARS-CoV-2 RBD protein used in this study has been fully consumed and, due to current supply limitations, we were unable to perform new orthogonal binding experiments for the revised manuscript. We regret this limitation and have documented it in the Discussion as an item for future work.

      Importantly, although we could not perform a new filter-binding experiment at this stage, we have multiple independent lines of evidence that support the reliability of the EMSA-derived affinity trends reported in the manuscript:

      (1) Rigorous EMSA design and reproducibility. All EMSA binding curves reported in the manuscript (e.g., Figs. 2F–G, 4E–F, 5A and Fig. S9) are derived from three independent biological replicates and include standard deviations; the measured binding curves show good reproducibility across replicates.

      (2) Appropriate positive and negative controls. Our gel assays include clear internal controls. The literature-reported strong binder Ta forms a distinct aptamer–RBD complex band under our conditions, whereas the negative-control aptamer Tc shows no detectable binding under identical conditions (see Fig. 2F). These controls demonstrate that the EMSA system discriminates specific from non-binding sequences with high sensitivity.

      (3) Orthogonal computational validation (FEP) that agrees with experiment. The central strength of the CAAMO framework is the integration of rigorous physics-based calculations with experiments. We performed FEP calculations for the selected single-nucleotide mutations and computed ΔΔG values for each mutant. The direction and rank order of binding changes predicted by FEP are in good agreement with the EMSA measurements: five of six FEP-predicted improved mutants (Ta<sup>G34C</sup>, Ta<sup>G34U</sup>, Ta<sup>G34A</sup>, Ta<sup>C23A</sup>, Ta<sup>C23U</sup>) were experimentally confirmed to have stronger apparent affinity than wild-type Ta (see Fig. 4D–F, Table S2), yielding a success rate of 83%. The concordance between an independent, rigorous computational method and our experimental measurements provides strong mutual validation.

      (4) Independent competitive binding experiments. We additionally performed competitive EMSA assays against a commercial neutralizing monoclonal antibody (40592-R001). These competition experiments show that Ta<sup>G34C</sup>–RBD complexes are resistant to antibody displacement under conditions that partially displace the wild-type Ta–RBD complex (see Fig. 5D). This result provides an independent, functionally relevant line of evidence that Ta<sup>G34C</sup> binds RBD with substantially higher affinity and specificity than WT Ta under our assay conditions.

      Given these multiple, independent lines of validation (rigorous EMSA replicates and controls, FEP agreement, and antibody competition assays), we are confident that the relative affinity improvements reported in the manuscript are robust, even though the absolute K<sub>d</sub> values measured by EMSA are not directly comparable to surface-based methods (EMSA typically reports larger apparent K<sub>d</sub> values than SPR/BLI due to methodological differences). The following has been added to “Discussion and conclusion” of the revised manuscript. (Page 24 in the revised manuscript)

      ‘Although the absolute K<sub>d</sub> values determined by EMSA cannot be directly compared with surface-based methods such as SPR or BLI, the relative affinity trends remain highly consistent. While EMSA provides semi-quantitative affinity estimates, the close agreement between experimental EMSA trends and FEP-calculated ΔΔG values supports the robustness of the relative affinity changes reported here. In future studies, additional orthogonal biophysical techniques (e.g., filter-binding, SPR, or BLI) will be employed to further validate and refine the protein–aptamer interaction models.’

      (3) The project would really benefit from a different aptamer-target system. Starting with a 100 µM aptamer is really not adequate.

      We thank the reviewer for this important suggestion and for highlighting the value of testing the CAAMO framework in additional aptamer–target systems.

      First, we wish to clarify the rationale for selecting the Ta–RBD system as the proof-of-concept. The Ta aptamer is not an arbitrary or weak binder: it was originally identified by independent SELEX experiments and subsequently validated by rigorous biophysical assays (SPR and BLI) (see: Proc. Natl. Acad. Sci. 2021, doi: 10.1073/pnas.2112942118). That study confirmed that Ta exhibits high-affinity and high-specificity binding to the SARS-CoV-2 RBD, which is why it serves as a well-characterized and biologically relevant system for method validation and optimization. We have added a brief clarification to the “Introduction” to emphasize these points. The following has been added to “Introduction” of the revised manuscript. (Page 4 in the revised manuscript)

      ‘This aptamer was originally identified through SELEX and subsequently validated using surface plasmon resonance (SPR) and biolayer interferometry (BLI), which confirmed its high affinity and high specificity toward the RBD. Therefore, Ta provides a well-characterized and biologically relevant starting point for structure-based optimization.’

      Second, we agree that apparent discrepancies in absolute K<sub>d</sub> values can arise from different experimental platforms. Surface-based methods (SPR/BLI) and gel-shift assays (EMSA) have distinct measurement principles; EMSA yields semi-quantitative, solution-phase, apparent K<sub>d</sub> values that are not directly comparable in absolute magnitude to surface-based measurements. Crucially, however, our study focuses on relative affinity change. EMSA is well suited for parallel, comparative measurements across multiple variants when all samples are assayed under identical conditions, and thus provides a reliable readout for ranking and validating designed mutations. We have added a short statement in the “Discussion and conclusion”. The following has been added to “Discussion and conclusion” of the revised manuscript. (Page 24 in the revised manuscript)

      ‘Although the absolute K<sub>d</sub> values determined by EMSA cannot be directly compared with surface-based methods such as SPR or BLI, the relative affinity trends remain highly consistent. While EMSA provides semi-quantitative affinity estimates, the close agreement between experimental EMSA trends and FEP-calculated ΔΔG values supports the robustness of the relative affinity changes reported here. In future studies, additional orthogonal biophysical techniques (e.g., filter-binding, SPR, or BLI) will be employed to further validate and refine the protein–aptamer interaction models.’

      Third, and importantly, CAAMO is inherently generalizable. In addition to the Ta–RBD application presented here, we have already begun applying CAAMO to other aptamer–target systems. In particular, we have successfully deployed the framework in preliminary optimization studies of RNA aptamers targeting the epidermal growth factor receptor (EGFR) (see: Gastroenterology 2021, doi: 10.1053/j.gastro.2021.05.055) (see Author response image 2). These preliminary results support the transferability of the CAAMO pipeline beyond the SARS-CoV-2 RBD system. We have added a short statement in the “Discussion and conclusion”. The following has been added to “Discussion and conclusion” of the revised manuscript. (Page 259 in the revised manuscript)

      ‘In addition to the Ta–RBD system, the CAAMO framework itself is inherently generalizable. More work is currently underway to apply CAAMO to optimize aptamers targeting other therapeutically relevant proteins, such as the epidermal growth factor receptor (EGFR) [45], in order to further explore its potential for broader aptamer engineering.’

      Author response image 2.

      Overview of the predicted binding model of the EGFR–aptamer complex generated using the CAAMO framework.

      (4) Several RBD variants should be tested, as well as other proteins, for specificity. At such weak affinities, it is likely that these are non-specific binders.

      We thank the reviewer for this important concern. Below we clarify the basis for selecting Ta and its engineered variants, summarize the experimental controls that address specificity, and present the extensive in silico variant analysis we performed to assess sensitivity and breadth of binding.

      (1) Origin and validation of Ta. As noted in our response to “Comment (3)”, the Ta aptamer was not chosen arbitrarily. Ta was identified by independent SELEX with both positive and negative selection and subsequently validated using surface-based biophysical assays (SPR and BLI), which reported low-nanomolar affinity and high specificity for the SARS-CoV-2 RBD. Thus, Ta is a well-characterized, experimentally validated starting lead for method development and optimization.

      (2) Experimental specificity controls. We appreciate the concern that weak apparent affinities can reflect non-specific binding. As noted in our response to “Comment (2)”, we applied multiple experimental controls that argue against non-specificity: (i) a literature-reported weak binder (Tc) was used as a negative control and produced no detectable complex under identical EMSA conditions (see Figs. 2F–G), demonstrating the assay’s ability to discriminate non-binders from specific binders; (ii) competitive EMSA assays with a commercial neutralizing monoclonal antibody (40592-R001) show that both Ta and Ta<sup>G34C</sup> engage the same or overlapping RBD site as the antibody, and that Ta<sup>G34C</sup> is substantially more resistant to antibody displacement than WT Ta (see Figs. 3D–E, 5D). Together, these wet-lab controls support that the observed aptamer-RBD bands reflect specific interactions rather than general, non-specific adsorption.

      (3) Variant and specificity analysis by rigorous FEP calculations. To address the reviewer’s request to evaluate variant sensitivity, we performed extensive free energy perturbation calculations combined with Hamiltonian replica-exchange molecular dynamics (FEP/HREX), which improves convergence, and used extended simulation times to estimate the relative binding free energy changes (ΔΔG) of both WT Ta and the optimized Ta<sup>G34C</sup> against a panel of RBD variants. Results are provided in Tables S4 and S5. Representative findings are as follows. For WT Ta versus early lineages, FEP reproduces the experimentally observed trends: Alpha (B.1.1.7; N501Y) yields ΔΔG<sub>FEP</sub> = −0.42 ± 0.07 kcal/mol (ΔΔG<sub>exp</sub> = −0.24), while Beta (B.1.351; K417N/E484K/N501Y) gives ΔΔG<sub>FEP</sub> = 0.64 ± 0.25 kcal/mol (ΔΔG<sub>exp</sub> = 0.36) (see Table S4). The agreement between the computational and experimental results supports the fidelity of our computational model for variant assessment. For the engineered Ta<sup>G34C</sup>, calculations across a broad panel of variants indicate that Ta<sup>G34C</sup> retains or improves binding (ΔΔG < 0) for the majority of tested variants, including Alpha, Beta, Gamma and many Omicron sublineages. Notable examples are BA.1 (ΔΔG = −3.00 ± 0.52 kcal/mol), BA.2 (ΔΔG = −2.54 ± 0.60 kcal/mol), BA.2.75 (ΔΔG = −5.03 ± 0.81 kcal/mol), XBB (ΔΔG = −3.13 ± 0.73 kcal/mol) and XBB.1.5 (ΔΔG = −2.28 ± 0.96 kcal/mol). A minority of other Omicron sublineages (e.g., BA.4 and BA.5) show modest positive ΔΔG values (2.11 ± 0.67 and 2.27 ± 0.68 kcal/mol, respectively), indicating a predicted reduction in affinity for those specific backgrounds. Overall, these data indicate that the designed Ta<sup>G34C</sup> aptamer is predicted to maintain binding to most SARS-CoV-2 variants, showing potential for broad-spectrum antiviral activity (see Table S5). The following has been added to “Results” of the revised manuscript. (Page 22 in the revised manuscript)

      ‘2.6 Binding performance of Ta and Ta<sup>G34C</sup> against SARS-CoV-2 RBD variants

      To further evaluate the binding performance and specificity of the designed aptamer Ta<sup>G34C</sup> toward various SARS-CoV-2 variants [39], we conducted extensive free energy perturbation combined with Hamiltonian replica-exchange molecular dynamics (FEP/HREX) [40–42] for both the wild-type aptamer Ta and the optimized Ta<sup>G34C</sup> against a series of RBD mutants. The representative variants include the early Alpha (B.1.1.7) and Beta (B.1.351) lineages, as well as a panel of Omicron sublineages (BA.1–BA.5, BA.2.75, BQ.1, XBB, XBB.1.5, EG.5.1, HK.3, JN.1, and KP.3) carrying multiple mutations within the RBD region (residues 333–527). For each variant, mutations within 5 Å of the bound aptamer were included in the FEP to accurately estimate the relative binding free energy change (ΔΔG).

      For the wild-type Ta aptamer, the FEP-predicted binding affinities toward the Alpha and Beta RBD variants were consistent with the previous experimental results, further validating the reliability of our model (see Table S4). Specifically, Ta maintained comparable or slightly enhanced binding to the Alpha variant and showed only marginally reduced affinity for the Beta variant.

      In contrast, the optimized aptamer Ta<sup>G34C</sup> exhibited markedly improved and broad-spectrum binding capability toward most tested variants (see Table S5). For early variants such as Alpha, Beta, and Gamma, Ta<sup>G34C</sup> maintained enhanced affinities (ΔΔG < 0). Notably, for multiple Omicron sublineages—including BA.1, BA.2, BA.2.12.1, BA.2.75, XBB, XBB.1.5, XBB.1.16, XBB.1.9, XBB.2.3, EG.5.1, XBB.1.5.70, HK.3, BA.2.86, JN.1 and JN.1.11.1—the calculated binding free energy changes ranged from −1.89 to −7.58 kcal/mol relative to the wild-type RBD, indicating substantially stronger interactions despite the accumulation of multiple mutations at the aptamer–RBD interface. Only in a few other Omicron sublineages, such as BA.4, BA.5, and KP.3, a slight reduction in binding affinity was observed (ΔΔG > 0).

      These computational findings demonstrate that the Ta<sup>G34C</sup> aptamer not only preserves high affinity for the RBD but also exhibits improved tolerance to the extensive mutational landscape of SARS-CoV-2. Collectively, our results suggest that Ta<sup>G34C</sup> holds promise as a high-affinity and potentially cross-variant aptamer candidate for targeting diverse SARS-CoV-2 spike protein variants, showing potential for broad-spectrum antiviral activity.’
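
      To give a feel for what the ΔΔG values quoted above mean in affinity terms, the following minimal sketch converts a relative binding free energy into an implied fold change in K<sub>d</sub>. It assumes the standard relation ΔΔG = RT ln(K<sub>d,variant</sub>/K<sub>d,reference</sub>) and a temperature of 298.15 K; neither the temperature nor the function name `kd_fold_change` comes from the manuscript, and the ΔΔG inputs are simply the central values listed in this response, without their uncertainties.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def kd_fold_change(ddg_kcal_per_mol, temperature_k=298.15):
    """Return Kd(variant) / Kd(reference) implied by ddG = RT * ln(ratio).

    A ratio below 1 means tighter predicted binding to the variant.
    The default temperature is an assumption, not a value from the study.
    """
    return math.exp(ddg_kcal_per_mol / (R_KCAL * temperature_k))


# Central ddG values (kcal/mol) quoted in the response for Ta-G34C.
quoted_ddg = {
    "BA.1": -3.00,
    "BA.2": -2.54,
    "BA.2.75": -5.03,
    "XBB": -3.13,
    "XBB.1.5": -2.28,
    "BA.4": 2.11,
    "BA.5": 2.27,
}

for variant, ddg in quoted_ddg.items():
    ratio = kd_fold_change(ddg)
    print(f"{variant}: ddG = {ddg:+.2f} kcal/mol, implied Kd ratio = {ratio:.3g}")
```

      Because the conversion is exponential, the reported uncertainties of roughly 0.5 to 1 kcal/mol correspond to several-fold uncertainty in the implied ratio, so these numbers are best read as a ranking aid rather than as predicted K<sub>d</sub> values.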

      The following has been added to “Materials and Methods” of the revised manuscript. (Page 29 in the revised manuscript)

      ‘4.7 FEP/HREX

      To evaluate the binding sensitivity of the optimized aptamer Ta<sup>G34C</sup> toward SARS-CoV-2 RBD variants, we employed free energy perturbation combined with Hamiltonian replica-exchange molecular dynamics (FEP/HREX) simulations for enhanced sampling efficiency and improved convergence. The relative binding free energy changes (ΔΔG) upon RBD mutations were estimated as:

      ΔΔG = ΔG<sub>bound</sub> − ΔG<sub>free</sub>

      where ΔG<sub>bound</sub> and ΔG<sub>free</sub> represent the free energy changes induced by the RBD mutations in the complexed and unbound states, respectively. All simulations were performed using GROMACS 2021.5 with the Amber ff14SB force field. For each mutation, dual-topology structures were generated in a pmx-like manner, and 32 λ-windows (0.0, 0.01, 0.02, 0.03, 0.06, 0.09, 0.12, 0.16, 0.20, 0.24, 0.28, 0.32, 0.36, 0.40, 0.44, 0.48, 0.52, 0.56, 0.60, 0.64, 0.68, 0.72, 0.76, 0.80, 0.84, 0.88, 0.91, 0.94, 0.97, 0.98, 0.99, 1.0) were distributed non-uniformly between 0.0 and 1.0, with denser spacing near the end states. To ensure sufficient sampling, each window was simulated for 5 ns, with five independent replicas initiated from distinct velocity seeds. Replica exchange between adjacent λ states was attempted every 1 ps to enhance phase-space overlap and sampling convergence. The van der Waals and electrostatic transformations were performed simultaneously, employing a soft-core potential (α = 0.3) to avoid singularities. For each RBD variant system, this setup resulted in an accumulated simulation time of approximately 1600 ns (5 ns × 32 windows × 5 replicas × 2 states). The GROMACS bar analysis tool (gmx bar) was used to estimate the binding free energy changes.’

      Tables S4 and S5 have been added to Supplementary Information of the revised manuscript.
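
      The bookkeeping in the FEP/HREX protocol quoted above can be checked with a short script. This is only an illustrative sketch: the λ values and the sampling parameters are copied from the Methods text, while the variable names and the helper `relative_binding_free_energy` are hypothetical and do not reproduce any analysis code used in the study.

```python
# Lambda schedule as listed in the quoted Methods text.
lambdas = [
    0.0, 0.01, 0.02, 0.03, 0.06, 0.09, 0.12, 0.16,
    0.20, 0.24, 0.28, 0.32, 0.36, 0.40, 0.44, 0.48,
    0.52, 0.56, 0.60, 0.64, 0.68, 0.72, 0.76, 0.80,
    0.84, 0.88, 0.91, 0.94, 0.97, 0.98, 0.99, 1.0,
]

NS_PER_WINDOW = 5   # simulation length per lambda window (ns)
N_REPLICAS = 5      # independent replicas per window
N_STATES = 2        # complexed (bound) and unbound (free) legs

n_windows = len(lambdas)

# Distinct gaps between neighbouring lambda values (rounded to 2 decimals).
spacings = sorted({round(b - a, 2) for a, b in zip(lambdas, lambdas[1:])})

# Accumulated sampling per RBD variant system.
total_ns = NS_PER_WINDOW * n_windows * N_REPLICAS * N_STATES


def relative_binding_free_energy(dg_bound, dg_free):
    """ddG = dG_bound - dG_free, both in kcal/mol (hypothetical helper)."""
    return dg_bound - dg_free


print(f"windows: {n_windows}")
print(f"distinct lambda spacings: {spacings}")
print(f"accumulated simulation time: {total_ns} ns")
```

      Running the check confirms 32 windows and 1600 ns per variant system, and shows that the gaps between neighbouring λ values are 0.01 near the end states and 0.03 to 0.04 elsewhere.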

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The temporal regulation of neuronal specification and its molecular mechanisms are important problems in developmental neurobiology. This study focuses on Kenyon cells (KCs), which form the mushroom body in Drosophila melanogaster, in order to address this issue. Building on previous findings, the authors examine the role of the transcription factor Eip93F in the development of late-born KCs. The authors revealed that Eip93F controls the activity of flies at night through the expression of the calcium channel Ca-α1T. Thus, the study clarifies the molecular machinery that controls temporal neuronal specification and animal behavior.

      Strengths:

      The convincing results are based on state-of-the-art molecular genetics, imaging, and behavioral analysis.

      Weaknesses:

      Temporal mechanisms of neuronal specification are found in many nervous systems. However, the relationship between the temporal mechanisms identified in this study and those in other systems remains unclear.

      We have discussed the temporal mechanisms between different nervous systems at the beginning of the Discussion section.

      Reviewer #2 (Public review):

      Summary:

      Understanding the mechanisms of neural specification is a central question in neurobiology. In Drosophila, the mushroom body (MB), which is the associative learning region in the brain, consists of three major cell types: γ, α'/β', and α/β Kenyon cells (KCs). These classes can be further subdivided into seven subtypes, together comprising ~2000 KCs per hemi-brain. Remarkably, all of these neurons are derived from just four neuroblasts in each hemisphere. Therefore, considerable effort has gone into understanding how neurons are specified in the fly MB.

      Over the past decade, studies have revealed that MB neuroblasts employ a temporal patterning mechanism, producing distinct neuronal types at different developmental stages. Temporal identity is conveyed through transcription factor expression in KCs. High levels of Chinmo, a BTB-zinc finger transcription factor, promote γ-cell fate (Zhu et al., Cell, 2006). Reduced Chinmo levels trigger expression of mamo, a zinc finger transcription factor that specifies α'/β' identity (Liu et al., eLife, 2019). However, the specification of α/β neurons remains poorly understood. Some evidence suggests that microRNAs regulate the transition from α'/β' to α/β fate (Wu et al., Dev Cell, 2012; Kucherenko et al., EMBO J, 2012). One hypothesis even proposes that α/β represents a "default" state of MB neurons, which could explain the difficulty in identifying dedicated regulators.

      The study by Chung et al. challenges this hypothesis. By leveraging previously published RNA-seq datasets (Shih et al., G3, 2019), they systematically screened BAC transgenic lines to selectively label MB subtypes. Using these tools, they analyzed the consequences of manipulating E93 expression and found that E93 is required for α/β specification. Furthermore, loss of E93 impairs MB-dependent behaviors, highlighting its functional importance.

      Strengths:

      The authors conducted a thorough analysis of E93 manipulation phenotypes using LexA tools generated from the Janelia Farm and Bloomington collections. They demonstrated that E93 knockdown reduces expression of Ca-α1T, a calcium channel gene identified as an α/β marker. Supporting this conclusion, one LexA line driven by a DNA fragment near EcR (R44E04) showed consistent results. Conversely, overexpression of E93 in γ and α'/β' Kenyon cells led to downregulation of their respective subtype markers.

      Another notable strength is the authors' effort to dissect the genetic epistasis between E93 and previously known regulators. Through MARCM and reporter analyses, they showed that Chinmo and Mamo suppress E93, while E93 itself suppresses Mamo. This work establishes a compelling molecular model for the regulatory network underlying MB cell-type specification.

      Weaknesses:

      The interpretation of E93's role in neuronal specification requires caution. Typically, two criteria are used to establish whether a gene directs neuronal identity:

      (1) gene manipulation shifts the neuronal transcriptome from one subtype to another, and

      (2) gene manipulation alters axonal projection patterns.

      The results presented here only partially satisfy the first criterion. Although markers are affected, it remains possible that the reporter lines and subtype markers used are direct transcriptional targets of E93 in α/β neurons, rather than reflecting broader fate changes. Future studies using single-cell transcriptomics would provide a more comprehensive assessment of neuronal identity following E93 perturbation.

      We do plan to conduct multi-omics experiments to provide a more comprehensive assessment of neuronal identity upon loss of function of E93. However, these experiments and their analysis take time, so the results will be reported in a future manuscript.

      With respect to the second criterion, the evidence is also incomplete. While reporter patterns were altered, the overall morphology of the α/β lobes appeared largely intact after E93 knockdown. Overexpression of E93 in γ neurons produced a small subset of cells with α/β-like projections, but this effect warrants deeper characterization before firm conclusions can be drawn. While these results may reflect an intrinsic property of KC types in flies, the data should be interpreted with more caution, and the authors should note this in the main text.

      We have toned down our description of the effect of E93 (especially in the loss-of-function experiments) in specifying α/β-specific cell identity and discussed whether unidentified regulators might work together with E93 in α/β neural fate specification.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Changes in nighttime activity in flies upon knocking down Ca_α1T and Eip93F are interesting (Fig. 2C). However, examining the morphological changes in the mushroom body under these conditions would be essential.

      We did not detect morphological changes in the mushroom body lobes by Fas2 staining (shown in Figure S8D).

      (2) Temporal mechanisms of neuronal specification have been identified in various nervous systems, including the embryonic central nervous system (CNS), the optic lobe of Drosophila, and the nervous systems of other organisms. The Discussion section should address the relationship between the temporal mechanisms identified in this study and those identified in other systems.

      We have discussed the temporal mechanisms between different nervous systems at the beginning of the Discussion section.

      (3) Eip93F is an Ecdysone-induced protein. In the Discussion section, the authors should discuss the relationship between the ecdysone signal and the roles of Eip93F.

      We have added a discussion of the relationship between the ecdysone signal and the roles of Eip93F.

      Reviewer #2 (Recommendations for the authors):

      (1) The behavioral effect of Ca-α1T knockdown is pretty interesting. But how the downregulation of Ca-α1T in the mushroom body can affect locomotion is puzzling. Even though the mushroom body is known to suppress locomotion (Martin et al., Learn Mem, 1998), the results here are the opposite. Can the authors give further explanation in the Discussion? Also, the behavioral experiments are hard to interpret, given that Figure 2C(1) and Figure 2C(3), which serve as controls, also vary a lot. Since the behavioral experiments don't affect the main conclusion of the paper, I would suggest removing that part or adding more explanation in the Discussion.

      First, we have discussed the apparent discrepancy in the MB's influence on locomotion between the previous study using tetanus toxin light chain (TeNT-Ln) and our Ca-α1T knockdown result. The different effects may arise because TeNT-Ln acts in MB axons, whereas Ca-α1T acts in MB dendrites. Second, we have repeated the behavioral experiments using a new α/β driver (13F02-AD/70F05-DBD) to replace our initial behavioral results (obtained with c739-GAL4, which causes abnormal wings when driving E93 RNAi expression; see S8C(2) Fig). The current results (now in Fig 2I) are more consistent across control groups.

      (2) In the manuscript, the authors use "subtype" to describe γKC, α'/β'KC and α/βKC in the fly MB. However, in most of the literature, people use "main types" to summarize these three types, and "subtype" is mostly about the difference in γd, γm, α'/β'ap, α'/β'm, α/βp, α/βs and α/βc KC (Shih et al., G3, 2019). Replacing "subtypes" with "main types" will help to increase the clarity.

      We have replaced "KC subtypes" with "main KC types" or just “KC types”.

      (3) The authors have identified a lot of new markers for the KC cell types, and some of them are used in this manuscript. It will be helpful if they can have a figure to summarize the markers they used in this study and what cell types they labeled.

      We have summarized expression patterns of these markers in Supplemental table 1.

      (4) In the method, the authors mentioned that only females were selected for analysis of Ca-α1T-GFSTF. Could the authors explain the reasons in more detail?

      Because homozygous Ca-α1T-GFSTF females and hemizygous Ca-α1T-GFSTF males are somewhat unhealthy and hard to collect, we used heterozygous Ca-α1T-GFSTF females in our experiments. We have added this description to the Materials and Methods section.

      (5) Figure S1: The legend of magenta fluorescence is missing. Please add which protein is shown in magenta.

      We have added the legend of magenta fluorescence, which is Trio.

      (6) The detailed genotypes of Figure 2C and Figure S7 are missing in Supplementary Table 1. Please include that, so that readers can know the genetic background.

      We have added genotypes of Figure 2I (previously Figure 2C) and Figure S8 (previously as Figure S7) in Supplementary Table 2.

      (7) Figure 2D-G: It will be helpful if the authors can outline the lobe (γ, α'/β', and α/β) in the figure, which will help readers to understand the images.

      We have outlined α, α', β, β' and γ lobes in Figure 2C-F (previously as Figure 2D-G).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This is a rigorous data-driven modeling study, extending the authors' previous model of spinal locomotor central pattern generator (CPG) circuits developed for the mouse spinal cord and adapted here to the rat to explore potential circuit-level changes underlying altered speed-dependent gaits, due to asymmetric (lateral) thoracic spinal hemisection and symmetric midline contusion. The model reproduces key features of the rat speed-dependent gait-related experimental data before injury and after recovery from these two different thoracic spinal cord injuries and suggests injury-specific mechanisms of circuit reorganization underlying functional recovery. There is much interest in the mechanisms of locomotor behavior recovery after spinal cord injury, and data-driven behaviorally relevant circuit modeling is an important approach. This study represents an important advance in the authors' previous experimental and modeling work on locomotor circuitry and in the motor control field.

      Strengths:

      (1) The authors use an advanced computational model of spinal locomotor circuitry to investigate potential reorganization of neural connectivity underlying locomotor control following recovery from symmetrical midline thoracic contusion and asymmetrical (lateral) hemisection injuries, based on an extensive dataset for the rat model of spinal cord injury.

      (2) The rat dataset used is from an in vivo experimental paradigm involving challenging animals to perform overground locomotion across the full range of speeds before and after the two distinct spinal cord injury models, enabling the authors to more completely reveal injury-specific deficits in speed-dependent interlimb coordination and locomotor gaits.

      (3) The model reproduces the rat gait-related experimental data before injury and after recovery from these two different thoracic spinal cord injuries, which exhibit roughly comparable functional recovery, and suggests injury-specific, compensatory mechanisms of circuit reorganization underlying recovery.

      (4) The model simulations suggest that recovery after lateral hemisection mechanistically involves partial functional restoration of descending drive and long propriospinal pathways. In contrast, recovery following midline contusion relies on reorganization of sublesional lumbar circuitry combined with altered descending control of cervical networks.

      (5) These observations suggest that symmetrical (contusion) and asymmetrical (lateral hemisection) injuries induce distinct types of plasticity in different spinal cord regions, suggesting that injury symmetry partly dictates the location and type of neural plasticity supporting recovery.

      (6) The authors suggest that therapeutic strategies may be more effective by targeting specific circuits according to injury symmetry.

      Weaknesses:

      The recovery mechanisms implemented in the model involve circuit connectivity/connection weights adjustment based on assumptions about the structures involved and compensatory responses to the injury. As the authors acknowledge, other factors affecting locomotor patterns and compensation, such as somatosensory afferent feedback, neurochemical modulator influences, and limb/body biomechanics, are not considered in the model.

      We appreciate the positive review and critical comments. We added a dedicated Limitations and future directions section (see our response to the recommendations below). Further, we also performed a sensitivity analysis: while the model still relies on a set of hypothesized connectivity changes, this analysis quantifies how robust our conclusions are to these parameter choices and indicates which pathways most strongly affect the recovered locomotor pattern.

      Reviewer #1 (Recommendations for the authors):

      The authors have used an advanced model of rodent spinal locomotor CPG circuits, adapted to the rat spinal cord, which remarkably reproduces the key features of the rat speed-dependent gait-related experimental data before injury and after recovery from the two different thoracic spinal cord injuries studied. Importantly, they have exploited the extensive dataset for the in vivo rat spinal cord injury model involving overground locomotion across the full range of speeds before and after the two distinct spinal cord injuries, enabling the authors to more completely reveal injury-specific deficits in speed-dependent interlimb coordination and locomotor gaits. The paper is well-written and well-illustrated.

      (1) My only general suggestion is that the authors include a section that succinctly summarizes the limitations of the modeling and points to elaborations of the model and experimental data required for future studies. Some important caveats are dispersed throughout the Discussion, but a more consolidated section would be useful.

      We added a dedicated Limitations and future directions section (page XX) that consolidates shortcomings and broadly outlines potential next steps in terms of modeling and experimental data. Specifically, we highlight the lack of afferent feedback connections in the model, the lack of consideration of biomechanical mechanisms, and the restriction of the model to beneficial plasticity. To resolve these issues, we need neuromechanical models (integrating the neural circuits with a model of the musculoskeletal system), experimental data validating our predictions, and data to constrain future models so that they can distinguish between beneficial and maladaptive plasticity.

      (2) Please correct the Figure 11 legend title to indicate recovery after contusion (not hemisection). 

      Done. Thanks for noticing.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors present a detailed computational model and experimental data concerning overground locomotion in rats before and after recovery from spinal cord injury. They are able to manually tune the parameters of this physiologically based, detailed model to reproduce many aspects of the observed animals' locomotion in the naive case and in two distinct injury cases.

      Strengths:

      The strengths are that the model is driven to closely match clean experimental data, and the model itself has detailed correspondence to proposed anatomical reality. As such, this makes the model more readily applicable to future experimental work. It can make useful suggestions. The model reproduces a large number of conditions across frequencies, and with the model structure changed by injury and recovery. The model is extensive and is driven by known structures, with links to genetic identities, and has been extensively validated across multiple experiments and manipulations over the years. It models a system of critical importance to the field, and the tight coupling to experimental data is a real strength.

      Weaknesses:

      A downside is that, scientifically, here, the only question tackled is one of sufficiency. By manually tuning parameters in a manner that aligns with the field's understanding from experimental work, the detailed model can accurately reproduce the experimental findings. One of the benefits of computational models is that the counterfactual can be tested to provide evidence against alternative hypotheses. That isn't really done here. I'm fairly certain that there are competing theories regarding what happens during recovery from a hemi-section injury and a contusion injury. The model could be used to make predictions for some alternative hypotheses, supporting or rejecting theories of recovery. This may be part of future plans. Here, the focus is on showing that the model is capable of reproducing the experimental results at all, for any set of parameters, however tuned.

      We agree with the reviewer that the present study focuses on sufficiency, and we now explicitly acknowledge this in the revised limitations section. We also added a sensitivity analysis (for details, see our response to Reviewer 3) that provides an initial assessment of robustness to the assumed connectivity changes. We note that the model reproduces a broad set of experimentally observed features across the full range of locomotor frequencies (including loss and emergence of specific gaits, reduced maximum stepping frequency, and altered variability of interlimb phase differences) using only a small set of hypothesized circuit reorganizations that have been experimentally observed but previously only correlated with recovery. Our results therefore suggest that this limited set of changes is indeed sufficient to account for the complex pattern of recovered locomotor behavior.

      Finally, although exploring alternative solutions is of interest, we believe such efforts will be most informative once afferent feedback is incorporated, which we see as the logical next step in our studies.

      Reviewer #2 (Recommendations for the authors):

      The paper could be strengthened with some more scientific interpretation and future directions. What are some novel predictions that can be made with the model, now that it has shown sufficiency here, that could guide future experimental work? Does it contradict in any way theories of CPG structure or neuronal plastic recovery?

      The sensitivity analysis that we performed in response to reviewer 3’s suggestion expanded our interpretation/conclusions by showing that, although injury symmetry (contusion vs. lateral hemisection) influences which pathways reorganize, recovered locomotion across injury types depends most strongly on restored activation of lumbar rhythm-generating circuits and on the strengths of lumbar commissural circuits.

      Interestingly, this sensitivity analysis also showed that variations in the strengths of long propriospinal pathways (ascending, descending, spared, injured-and-recovered) have a much smaller, almost negligible effect compared to variations of drive to lumbar rhythm generators or lumbar commissural interneuron connection weights in the same range (see Fig 13, 13-supplement 1 and 2). This is in accordance with our initial model suggestion that, after contusion, LPN connection weights had to be lowered to values substantially lower than expected from the severity of the injury. It is also corroborated by our anatomical findings that, in parallel to recovery from contusion, the number of synaptic connections made by LAPNs to the cervical enlargement was reduced, and that silencing of LPNs post-contusion improves locomotion. These surprising findings are extensively discussed in the Discussion section.

      Together, these findings suggest that experimental characterization of reorganization of the lumbar circuitry with a specific focus on commissural interneurons and inputs to the lumbar circuitry that could restore activation of sublesional lumbar rhythm generators is a crucial next step for understanding post-injury plasticity and recovery of locomotor function. This is now clearly discussed.

      Finally, we note that a key contribution of this work is that the model demonstrates a plausible mechanistic link between specific circuit reorganizations and recovered locomotor function, a relationship previously supported mainly by correlative evidence.

      Reviewer #3 (Public review):

      Summary:

      This study describes a computational model of the rat spinal locomotor circuit and how it could be reconfigured after lateral hemisection or contusion injuries to replicate gaits observed experimentally.

      The model suggests the emergence of detour circuits after lateral hemisection, whereas after a midline contusion, the model suggests plasticity of left-right and sensory inputs below the injury.

      Strengths:

      The model accurately models many known connections within and between forelimb and hindlimb spinal locomotor circuits.

      The simulation results closely mirror gait parameters observed experimentally. Many gait parameters were studied, as well as variability in these parameters in intact versus injured conditions.

      Weaknesses:

      The study could provide some sense of the relative importance of the various modified connectivities after injury in setting the changes in gait seen after the two types of injuries.

      We performed a local sensitivity analysis of the hemisection and contusion models to identify which connectivity changes most strongly influence post-injury locomotor behavior. Key parameters (descending drive to sublesional rhythm generators and the strength of selected commissural and propriospinal pathways) were perturbed within 80–125% of their baseline values, and for each perturbation we quantified changes in model output using the Earth Mover’s Distance between baseline and perturbed simulations in a 7-dimensional space (six interlimb phase differences plus locomotor frequency). We then trained a surrogate model and computed Sobol first- and total-order sensitivity indices, which quantify how much each parameter and its interactions contribute to variability in this distance measure. This analysis showed that, across both injuries, variations in drive to sublesional lumbar rhythm generators and in lumbar V0/V3 commissural connectivity have the largest impact on recovered gait expression, whereas other pathways had comparatively minor effects within the tested range.
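
      The Sobol first- and total-order indices mentioned above can be illustrated with a minimal Monte Carlo sketch. It is not the pipeline used in the study: the Earth Mover's Distance and the surrogate model are replaced here by a hypothetical additive function `toy_distance`, and the names `sobol_indices`, `n_samples`, `low`, and `high` are assumptions introduced for illustration. Only the 80–125% perturbation range is taken from the text. The estimators are the commonly used Saltelli (first-order) and Jansen (total-order) forms.

```python
import numpy as np


def sobol_indices(model, n_params, n_samples=4096, low=0.8, high=1.25, seed=0):
    """Estimate first- and total-order Sobol indices by Monte Carlo sampling.

    `model` maps an (n_samples, n_params) array of parameter scaling
    factors to a 1-D array of scalar outputs.
    """
    rng = np.random.default_rng(seed)
    # Two independent sample matrices of scaling factors (80-125% of baseline).
    a = rng.uniform(low, high, size=(n_samples, n_params))
    b = rng.uniform(low, high, size=(n_samples, n_params))
    f_a, f_b = model(a), model(b)
    # Centre the outputs to reduce the variance of the first-order estimator.
    mean = np.mean(np.concatenate([f_a, f_b]))
    f_a, f_b = f_a - mean, f_b - mean
    variance = np.var(np.concatenate([f_a, f_b]))
    first = np.empty(n_params)
    total = np.empty(n_params)
    for i in range(n_params):
        # Matrix `a` with column i taken from `b`: only parameter i changes.
        ab = a.copy()
        ab[:, i] = b[:, i]
        f_ab = model(ab) - mean
        first[i] = np.mean(f_b * (f_ab - f_a)) / variance      # Saltelli form
        total[i] = 0.5 * np.mean((f_a - f_ab) ** 2) / variance  # Jansen form
    return first, total


def toy_distance(x):
    """Hypothetical stand-in for the distance between baseline and perturbed runs."""
    # Parameter 0 dominates, parameter 1 is minor, parameter 2 has no effect.
    return 4.0 * x[:, 0] + 1.0 * x[:, 1] + 0.0 * x[:, 2]


first_order, total_order = sobol_indices(toy_distance, n_params=3)
ranking = np.argsort(total_order)[::-1]  # most influential parameter first
print("first-order:", np.round(first_order, 3))
print("total-order:", np.round(total_order, 3))
print("ranking:", ranking)
```

      For an additive function such as this one, first- and total-order indices coincide up to sampling noise; a gap between them would indicate that a parameter acts mainly through interactions with others.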

      The sensitivity analysis further refined our conclusions by showing that, although injury symmetry (contusion vs. lateral hemisection) influences which pathways reorganize, effective recovery in both cases depends on re-engaging lumbar rhythm-generating and commissural circuits, highlighting these networks as key therapeutic targets.

      Overall, the authors achieved their aims, and the model provides solid support for the changes in connectivity after the two types of injuries were modelled. This work emphasizes specific changes in connectivity after lateral hemisection or after contusion that could be investigated experimentally. The model is available for public use and could serve as a tool to analyze the relative importance of various highlighted or previously undiscovered changes in connectivity that may underlie the recovery of locomotor function in spinalized rats.

      Reviewer #3 (Recommendations for the authors):

      (1) It would be useful to study the sensitivity of the injured models to small changes in the connectivity changes to determine which ones play a greater role in the gait after injury.

      See response above on the added sensitivity analysis.

      (2) Was there any tissue analysis from the original experiments with the contusion experiments, as contusion experiments can be variable, so it would be good to know the level of variability in the injuries?

      Unfortunately, we were unable to complete tissue analysis of the injury epicenters for these animals because the tissue was not handled appropriately for histology. However, in the past, comparable animals with T10 12.5g-cm contusion injuries delivered by the NYU (MASCIS) Impactor had variability of up to ~30% of the mean (spared white matter, e.g. see Smith et al., 2006). It is also worth noting that spared white matter at the epicenter, at least in our hands, is generally well-correlated with BBB overground locomotor scale scores.

      (3) There is more variability in phase difference in rats than in the model after lateral hemisection. Is there any way to figure out which of the connectivity changes is most responsible for that variability?

      We agree that the variability of phase differences after lateral hemisection is larger in rats than in the model. One possible contributor to this discrepancy is the strength of spared long propriospinal neuron (LPN) pathways, which we kept fixed at pre-injury levels in the model. As an exploratory analysis, we varied the weights of these spared LPN connections and quantified the circular standard deviation of the phase differences (Author response image 1). Decreasing spared LPN weights increased the variability of all phase differences. This suggests that plasticity of spared LPNs (potentially reducing their effective connectivity and partly compensating for the asymmetry introduced by the lesion) could contribute to the higher variability seen in vivo. However, because these results remain speculative, we chose to include them in this response only and not in the main manuscript.

      Author response image 1.

      Variability of phase differences as a function of spared long propriospinal neuron connection weights (hemisection model).
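
      The circular standard deviation used above to quantify phase-difference variability can be sketched as follows. This assumes the standard definition based on the mean resultant length R, namely sqrt(-2 ln R), and assumes phases normalized to the interval [0, 1); the exact convention used in the study is not stated in this response. The function name `circular_std` and the example values are illustrative only.

```python
import numpy as np


def circular_std(phases):
    """Circular standard deviation of normalized phases (in fractions of a cycle)."""
    angles = 2.0 * np.pi * np.asarray(phases, dtype=float)
    # Mean resultant length R: 1 for identical phases, near 0 for uniform spread.
    resultant = np.hypot(np.mean(np.cos(angles)), np.mean(np.sin(angles)))
    # Guard against log(0) and against rounding slightly above 1.
    resultant = np.clip(resultant, 1e-300, 1.0)
    # sqrt(-2 ln R) is in radians; divide by 2*pi to express it in cycles.
    return np.sqrt(-2.0 * np.log(resultant)) / (2.0 * np.pi)


# Illustrative values only, not experimental data.
tight = [0.48, 0.50, 0.52]    # consistent alternation around 0.5
wrapped = [0.98, 0.00, 0.02]  # same spread, but straddling the 0/1 boundary
loose = [0.30, 0.50, 0.70]    # more variable coupling

for name, values in [("tight", tight), ("wrapped", wrapped), ("loose", loose)]:
    print(name, float(circular_std(values)))
```

      Unlike an ordinary standard deviation, this measure gives the same value for the `tight` and `wrapped` examples, which is why it is suited to phase differences that wrap around at 0 and 1.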

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Monziani and Ulitsky present a large and exhaustive study on the lncRNA EPB41L4A-AS1 using a variety of genomic methods. They uncover a rather complex picture of an RNA transcript that appears to act via diverse pathways to regulate the expression of large numbers of genes, including many snoRNAs. The activity of EPB41L4A-AS1 seems to be intimately linked with the protein SUB1, via both direct physical interactions and direct/indirect regulation of SUB1 mRNA expression.

      The study is characterised by thoughtful, innovative, integrative genomic analysis. It is shown that EPB41L4A-AS1 interacts with SUB1 protein and that this may lead to extensive changes in SUB1's other RNA partners. Disruption of EPB41L4A-AS1 leads to widespread changes in non-polyA RNA expression, as well as local cis changes. At the clinical level, it is possible that EPB41L4A-AS1 plays disease-relevant roles, although these seem to be somewhat contradictory with evidence supporting both oncogenic and tumour suppressive activities.

      A couple of issues could be better addressed here. Firstly, the copy number of EPB41L4A-AS1 is an important missing piece of the puzzle. It is apparently highly expressed in the FISH experiments. To get an understanding of how EPB41L4A-AS1 regulates SUB1, an abundant protein, we need to know the relative stoichiometry of these two factors. Secondly, while many of the experiments use two independent Gapmers for EPB41L4A-AS1 knockdown, the RNA-sequencing experiments apparently use just one, with one negative control (?). Evidence is emerging that Gapmers produce extensive off-target gene expression effects in cells, potentially exceeding the amount of on-target changes arising through the intended target gene. Therefore, it is important to estimate this through the use of multiple targeting and non-targeting ASOs, if one is to get a true picture of EPB41L4A-AS1 target genes. In this Reviewer's opinion, this casts some doubt over the interpretation of RNA-seq experiments until that work is done. Nonetheless, the Authors have designed thorough experiments, including overexpression rescue constructs, to quite confidently assess the role of EPB41L4A-AS1 in snoRNA expression.

      It is possible that EPB41L4A-AS1 plays roles in cancer, either as an oncogene or a tumour suppressor. However, it will in the future be important to extend these observations to a greater variety of cell contexts.

      This work is valuable in providing an extensive and thorough analysis of the global mechanisms of an important regulatory lncRNA and highlights the complexity of such mechanisms via cis and trans regulation and extensive protein interactions.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Monziani et al. identified long noncoding RNAs (lncRNAs) that act in cis and are coregulated with their target genes located in close genomic proximity. The authors mined the GeneHancer database, and this analysis led to the identification of four lncRNA-target pairs. The authors decided to focus on lncRNA EPB41L4A-AS1.

      They thoroughly characterised this lncRNA, demonstrating that it is located in the cytoplasm and the nuclei, and that its expression is altered in response to different stimuli. Furthermore, the authors showed that EPB41L4A-AS1 regulates EPB41L4A transcription, leading to a mild reduction in EPB41L4A protein levels. This was not recapitulated with siRNA-mediated depletion of EPB41L4A-AS1. RNA-seq in EPB41L4A-AS1-depleted cells with a single LNA revealed 2364 DEGs linked to pathways including the cell cycle, cell adhesion, and inflammatory response. To understand the mechanism of action of EPB41L4A-AS1, the authors mined the ENCODE eCLIP data and identified SUB1 as an lncRNA interactor. The authors also found that the loss of EPB41L4A-AS1 and SUB1 leads to the accumulation of snoRNAs, and that SUB1 localisation changes upon the loss of EPB41L4A-AS1. Finally, the authors showed that EPB41L4A-AS1 deficiency did not change the steady-state levels of SNORA13 nor RNA modification driven by this RNA. The phenotype associated with the loss of EPB41L4A-AS1 is linked to increased invasion and EMT gene signature.

      Overall, this is an interesting and nicely done study on the versatile role of EPB41L4A-AS1 and the multifaceted interplay between SUB1 and this lncRNA, but some conclusions and claims need to be supported with additional experiments. My primary concerns are using a single LNA gapmer for critical experiments, increased invasion, and nucleolar distribution of SUB1 in EPB41L4A-AS1-depleted cells. These experiments need to be validated with orthogonal methods.

      Strengths:

      The authors used complementary tools to dissect the complex role of lncRNA EPB41L4A-AS1 in regulating EPB41L4A, which is highly commendable. There are few papers in the literature on lncRNAs at this standard. They employed LNA gapmers, siRNAs, CRISPRi/a, and exogenous overexpression of EPB41L4A-AS1 to demonstrate that the transcription of EPB41L4A-AS1 acts in cis to promote the expression of EPB41L4A by ensuring spatial proximity between the TAD boundary and the EPB41L4A promoter. At the same time, this lncRNA binds to SUB1 and regulates snoRNA expression and nucleolar biology. Overall, the manuscript is easy to read, and the figures are well presented. The methods are sound, and the expected standards are met.

      Weaknesses:

      The authors should clarify how many lncRNA-target pairs were included in the initial computational screen for cis-acting lncRNAs and why MCF7 was chosen as the cell line of choice. Most of the data uses a single LNA gapmer targeting EPB41L4A-AS1 lncRNA (eg, Fig. 2c, 3B, and RNA-seq), and the critical experiments should be using at least 2 LNA gapmers. The specificity of SUB1 CUT&RUN is lacking, as well as direct binding of SUB1 to lncRNA EPB41L4A-AS1, which should be confirmed by CLIP qPCR in MCF7 cells. Finally, the role of EPB41L4A-AS1 in SUB1 distribution (Figure 5) and cell invasion (Figure 8) needs to be complemented with additional experiments, which should finally demonstrate the role of this lncRNA in nucleolus and cancer-associated pathways. The use of MCF7 as a single cancer cell line is not ideal.

      Reviewer #3 (Public review):

      Summary:

      In this paper, the authors made some interesting observations that EPB41L4A-AS1 lncRNA can regulate the transcription of both the nearby coding gene and genes on other chromosomes. They started by computationally examining lncRNA-gene pairs by analyzing co-expression, chromatin features of enhancers, TF binding, HiC connectome, and eQTLs. They then zoomed in on four pairs of lncRNA-gene pairs and used LNA antisense oligonucleotides to knock down these lncRNAs. This revealed EPB41L4A-AS1 as the only one that can regulate the expression of its cis-gene target EPB41L4A. By RNA-FISH, the authors found this lncRNA to be located in all three parts of a cell: chromatin, nucleoplasm, and cytoplasm. RNA-seq after LNA knockdown of EPB41L4A-AS1 showed that this increased >1100 genes and decreased >1250 genes, including both nearby genes and genes on other chromosomes. They later found that EPB41L4A-AS1 may interact with SUB1 protein (an RNA-binding protein) to impact the target genes of SUB1. EPB41L4A-AS1 knockdown reduced the mRNA level of SUB1 and altered the nuclear location of SUB1. Later, the authors observed that EPB41L4A-AS1 knockdown caused an increase of snRNAs and snoRNAs, likely via disrupted SUB1 function. In the last part of the paper, the authors conducted rescue experiments that suggested that the full-length, intron- and SNORA13-containing EPB41L4A-AS1 is required to partially rescue snoRNA expression. They also conducted SLAM-Seq and showed that the increased abundance of snoRNAs is primarily due to their hosts' increased transcription and stability. They end with data showing that EPB41L4A-AS1 knockdown reduced MCF7 cell proliferation but increased its migration, suggesting a link to breast cancer progression and/or metastasis.

      Strengths:

      Overall, the paper is well-written, and the results are presented with good technical rigor and appropriate interpretation. The observation that a complex lncRNA EPB41L4A-AS1 regulates both cis and trans target genes, if fully proven, is interesting and important.

      Weaknesses:

      The paper is a bit disjointed as it started from cis and trans gene regulation, but later it switched to a partially relevant topic of snoRNA metabolism via SUB1. The paper did not follow up on the interesting observation that there are many potential trans target genes affected by EPB41L4A-AS1 knockdown and there was limited study of the mechanisms as to how these trans genes (including SUB1 or NPM1 genes themselves) are affected by EPB41L4A-AS1 knockdown. There are discrepancies in the results upon EPB41L4A-AS1 knockdown by LNA versus by CRISPR activation, or by plasmid overexpression of this lncRNA.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Copy number:

      Perhaps I missed it, but it seems that no attempt is made to estimate the number of copies of EPB41L4A-AS1 transcripts per cell. This should be possible given RNAseq and FISH. At least an order of magnitude estimate. This is important for shedding light on the later observations that EPB41L4A-AS1 may interact with SUB1 protein and regulate the expression of thousands of mRNAs.

      We thank the reviewer for the insightful suggestion. We agree that an estimate of EPB41L4A-AS1 copy number might further strengthen the hypotheses presented in the manuscript. Therefore, we analyzed the smFISH images and calculated the copy number per cell of this lncRNA, as well as that of GAPDH as a comparison.

      Because segmenting MCF-7 cells proved to be difficult due to the extent of the cell-cell contacts they establish, we imaged multiple (n = 14) fields of view, extracted the number of EPB41L4A-AS1/GAPDH molecules in each field and divided them by the number of cells (as assessed by DAPI staining, 589 cells in total). We detected an average of 33.37 ± 3.95 EPB41L4A-AS1 molecules per cell, in contrast to 418.27 ± 61.79 GAPDH molecules. As a comparison, within the same qPCR experiment the average Ct values of these two RNAs are about 22.3 and 17.5, the FPKMs in the polyA+ RNA-seq are ~35.6 and ~2479.4, and the FPKMs in the rRNA-depleted RNA-seq are ~19.3 and ~3549.9, respectively. Thus, our estimates of the EPB41L4A-AS1 copy number in MCF-7 cells fit well with these observations.
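
      As a rough cross-check of the figures quoted above, the GAPDH to EPB41L4A-AS1 abundance ratio implied by each assay can be computed directly. This is a back-of-the-envelope sketch, not part of the published analysis. It assumes ideal qPCR amplification (a doubling per cycle, so the fold difference is 2 raised to the Ct difference) and assumes that the larger FPKM value in each pair belongs to GAPDH. The helper `ratio` is introduced here for illustration.

```python
LNC, REF = "EPB41L4A-AS1", "GAPDH"

# Values quoted in the response; FPKM assignment to each RNA is an assumption.
smfish_copies = {LNC: 33.37, REF: 418.27}   # molecules per cell
ct_values = {LNC: 22.3, REF: 17.5}          # RT-qPCR cycle thresholds
fpkm_polya = {LNC: 35.6, REF: 2479.4}       # polyA+ RNA-seq
fpkm_ribo = {LNC: 19.3, REF: 3549.9}        # rRNA-depleted RNA-seq


def ratio(values):
    """Abundance of GAPDH relative to EPB41L4A-AS1 for a linear-scale measure."""
    return values[REF] / values[LNC]


fold_difference = {
    "smFISH": ratio(smfish_copies),
    # Lower Ct means higher abundance; assumes 100% amplification efficiency.
    "RT-qPCR": 2.0 ** (ct_values[LNC] - ct_values[REF]),
    "polyA+ RNA-seq": ratio(fpkm_polya),
    "rRNA-depleted RNA-seq": ratio(fpkm_ribo),
}

for assay, fold in fold_difference.items():
    print(f"{assay}: GAPDH / EPB41L4A-AS1 ~ {fold:.1f}")
```

      Under these assumptions, every assay places GAPDH between roughly one and two orders of magnitude above EPB41L4A-AS1. The assays are not expected to agree exactly, since FPKM depends on transcript length and library preparation, and Ct values depend on primer efficiency.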

      The question of whether an average of ~35 molecules per cell is sufficient to affect the expression of thousands of genes is somewhat more difficult to ascertain. As discussed below, it is unlikely that the genes dysregulated following the KD of EPB41L4A-AS1 are all direct targets of this lncRNA, and indeed SUB1 depletion affects an order of magnitude fewer genes. It has been shown that lncRNAs can affect the behavior of interacting RNAs and proteins in a substoichiometric fashion (Unfried & Ulitsky, 2022), but whether this applies to EPB41L4A-AS1 remains to be addressed in future studies. Nonetheless, this copy number appears to be sufficient for a trans-acting function of this lncRNA, on top of its cis-regulatory role in regulating EPB41L4A. We added this information in the text as follows:

      “Using single-molecule fluorescence in-situ hybridization (smFISH) and subcellular fractionation we found that EPB41L4A-AS1 is expressed at an average of 33.37 ± 3.95 molecules per cell, and displays both nuclear and cytoplasmic localization in MCF-7 cells (Fig. 1D), with a minor fraction associated with chromatin as well (Fig. 1E).”

      We have updated the methods section as well:

      “To visualize the subcellular localization of EPB41L4A-AS1 in vivo, we performed single-molecule fluorescence in situ hybridization (smFISH) using HCR™ amplifiers. Probe sets (n = 30 unique probes) targeting EPB41L4A-AS1 and GAPDH (positive control) were designed and ordered from Molecular Instruments. We followed the Multiplexed HCR v3.0 protocol with minor modifications. MCF-7 cells were plated in 8-well chambers (Ibidi) and cultured O/N as described above. The next day, cells were fixed with cold 4% PFA in 1X PBS for 10 minutes at RT and then permeabilized O/N in 70% ethanol at -20°C. Following permeabilization, cells were washed twice with 2X SSC buffer and incubated at 37°C for 30 minutes in hybridization buffer (HB). The HB was then replaced with a probe solution containing 1.2 pmol of EPB41L4A-AS1 probes and 0.6 pmol of GAPDH probes in HB. The slides were incubated O/N at 37°C. To remove excess probes, the slides were washed four times with probe wash buffer at 37°C for 5 minutes each, followed by two washes with 5X SSCT at RT for 5 minutes. The samples were then pre-amplified in amplification buffer for 30 minutes at RT and subsequently incubated O/N in the dark at RT in amplification buffer supplemented with 18 pmol of the appropriate hairpins. Finally, excess hairpins were removed by washing the slides five times in 5X SSCT at RT. The slides were mounted with ProLong™ Glass Antifade Mountant (Invitrogen), cured O/N in the dark at RT, and imaged using a Nikon CSU-W1 spinning disk confocal microscope. In order to estimate the RNA copy number, we imaged multiple distinct fields, extracted the number of EPB41L4A-AS1/GAPDH molecules in each field using the “Find Maxima” tool in ImageJ/Fiji, and divided them by the number of cells (as assessed by DAPI staining).”

      (2) Gapmer results:

      Again, it is quite unclear how many and which Gapmer is used in the genomics experiments, particularly the RNA-seq. In our recent experiments, we find very extensive off-target mRNA changes arising from Gapmer treatment. For this reason, it is advisable to use both multiple control and multiple targeting Gapmers, so as to identify truly target-dependent expression changes. While I acknowledge and commend the latter rescue experiments, and experiments using multiple Gapmers, I'd like to get clarification about how many and which Gapmers were used for RNAseq, and the authors' opinion on the need for additional work here.

      We agree with the Reviewer that GapmeRs are prone to off-target and unwanted effects (Lai et al., 2020; Lee & Mendell, 2020; Maranon & Wilusz, 2020). Early in our experiments, we found out that LNA1 triggers a non-specific CDKN1A/p21 activation (Fig. S5A-C), and thus, we have initially performed some experiments such as RNA-seq with only LNA2.

      Nonetheless, other experiments were performed using both GapmeRs, such as multiple RT-qPCRs, UMI-4C, SUB1 and NPM1 imaging, and the in vitro assays, among others, and consistent results were obtained with both LNAs.

      To accommodate the request by this and the other reviewers, we have now performed another round of polyA+ RNA-seq following EPB41L4A-AS1 knockdown using LNA1 or LNA2, as well as the previously used and an additional control GapmeR. The FPKMs of the control samples are highly-correlated both within replicates and between GapmeRs (Fig. S6A). More importantly, the fold-changes to control are highly correlated between the two on-target GapmeRs LNA1 and LNA2, regardless of the GapmeR used for normalization (Fig. S6B), thus showing that the bulk of the response is shared and likely the direct result of the reduction in the levels of EPB41L4A-AS1. Notably, key targets NPM1 and MTREX (see discussion, Fig. S12A-C and comments to Reviewer 3) were found to be downregulated by both LNAs (Fig. S6C).

      However, we acknowledge that some of the dysregulated genes are observed only when using one GapmeR and not the other, likely due to a combination of indirect, secondary and non-specific effects, and as such it is difficult to infer the direct response. Supporting this, LNA2 yielded a total of 1,069 DEGs (617 up and 452 down) and LNA1 2,493 DEGs (1,328 up and 1,287 down), with the latter triggering a stronger response most likely as a result of the previously mentioned CDKN1A/p21 induction. Overall, 45.1% of the upregulated genes following LNA2 transfection were shared with LNA1, in contrast to only 24.3% of the downregulated ones.

      We have now included these results in the Results section (see below) and in Supplementary Figure (Fig. S6).

      “Most of the consequences of the depletion of EPB41L4A-AS1 are thus not directly explained by changes in EPB41L4A levels. An additional trans-acting function for EPB41L4A-AS1 would therefore be consistent with its high expression levels compared to most lncRNAs detected in MCF-7 (Fig. S5G). To strengthen these findings, we have transfected MCF-7 cells with LNA1 and a second control GapmeR (NT2), as well as the previous one (NT1) and LNA2, and sequenced the polyadenylated RNA fraction as before. Notably, the expression levels (in FPKMs) of the replicates of both control samples are highly correlated with each other (Fig. S6A), and the global transcriptomic changes triggered by the two EPB41L4A-AS1-targeting LNAs are largely concordant (Fig. S6B and S6C). Because of this concordance and the cleaner (i.e., no CDKN1A upregulation) readout in LNA2-transfected cells, we focused mainly on these cells for subsequent analyses.”

      (3) Figure 1E:

      Can the authors comment on the unusual (for a protein-coding mRNA) localisation of EPB41L4A, with a high degree of chromatin enrichment?

      We acknowledge that mRNAs from protein-coding genes displaying nuclear and chromatin localizations are quite unusual. The nuclear and chromatin localization of some mRNAs are often due to their low expression, length, time that it takes to be transcribed, repetitive elements and strong secondary structures (Bahar Halpern et al., 2015; Didiot et al., 2018; Lubelsky & Ulitsky, 2018; Ly et al., 2022).

      We now briefly mention this in the text:

      “In contrast, both EPB41L4A and SNORA13 were mostly found in the chromatin fraction (Fig. 1E), the former possibly due to the length of its pre-mRNA (>250 kb), which would require substantial time to transcribe (Bahar Halpern et al., 2015; Didiot et al., 2018; Lubelsky & Ulitsky, 2018; Ly et al., 2022).”

      Supporting our results, analysis of the ENCODE MCF-7 RNA-seq data of the cytoplasmic, nuclear and total cell fractions indeed shows a nuclear enrichment of the EPB41L4A mRNA (Author response image 1), in line with what we observed in Fig. 1E by RT-qPCR. 

      Author response image 1.

      The EPB41L4A transcript is nuclear-enriched in the MCF-7 ENCODE subcellular RNA-seq dataset. Scatterplot of gene length versus cytoplasm/nucleus ratio (as computed by DESeq2) in MCF-7 cells. Each dot represents a unique gene, color-coded according to whether its DESeq2 adjusted p-value is < 0.05 and its absolute log2FC is > 0.41 (33% enrichment or depletion). GAPDH and MALAT1 are shown as representative cytoplasmic and nuclear transcripts, respectively. Data from ENCODE.

      (4) Annotation and termini of EPB41L4A-AS1:

      The latest Gencode v47 annotations imply an overlap of the sense and antisense, different from that shown in Figure 1C. The 3' UTR of EPB41L4A is shown to extensively overlap EPB41L4A-AS1. This could shed light on the apparent regulation of the former by the latter that is relevant for this paper. I'd suggest that the authors update their figure of the EPB41L4A-AS1 locus organisation with much more detail, particularly evidence for the true polyA site of both genes. What is more, the authors might consider performing RACE experiments for both RNAs in their cells to definitely establish whether these transcripts contain complementary sequence that could cause their Watson-Crick hybridisation, or whether their two genes might interfere with each other via some kind of polymerase collision.

      We thank the reviewer for pointing this out. In previous GENCODE annotations too, multiple isoforms were reported, some of which overlap the 3’ UTR of EPB41L4A. In the EPB41L4A-AS1 locus image (Fig. 1C), we report at the bottom the different transcript isoforms currently annotated, and a schematic of the one that is clearly the most abundant in MCF-7 cells based on RNA-seq read coverage. This is supported by both the polyA(+) and ribo(-) RNA-seq data, which are strand-specific, as shown in the figure.

      We now also examined the ENCODE/CSHL MCF-7 RNA-seq data from whole cell, cytoplasm and nucleus fractions, as well as 3P-seq data (Jan et al., 2011) (unpublished data from human cell lines), reported in Author response image 2. All these data support the predominant use of the proximal polyA site in human cell lines. This shorter isoform does not overlap EPB41L4A.

      Author response image 2.

      Most EPB41L4A-AS1 transcripts end before the 3’ end of EPB41L4A. UCSC genome browser view showing tracks from 3P-seq data in different cell lines and neural crest (top, with numbers representing the read counts, i.e. how many times that 3’ end has been detected), and stranded ENCODE subcellular RNA-seq (bottom).

      Based on these data, the large majority of cellular transcripts of EPB41L4A-AS1 terminate at the earlier polyA site and don’t overlap with EPB41L4A. There is a small fraction that appears to be restricted to the nucleus that terminates later at the annotated isoform. 3' RACE experiments are not expected to provide substantially different information beyond what is already available.

      (5) Figure 3C:

      There is an apparent correlation between log2FC upon EPB41L4A-AS1 knockdown, and the number of clip sites for SUB1. However, I expect that the clip signal correlates strongly with the mRNA expression level, and that log2FC may also correlate with the same. Therefore, the authors would be advised to more exhaustively check that there really is a genuine relationship between log2FC and clip sites, after removing any possible confounders of overall expression level.

      As the reviewer suggested, there is a correlation between the baseline expression level and the strength of SUB1 binding in the eCLIP data. To address this issue, we built expression-matched controls for each group of SUB1 interactors and checked the fold-changes following EPB41L4A-AS1 KD, similarly to what we have done in Fig. 3C. The results are now presented in Supplementary Figure 7 (Fig. S7C).

      Based on this analysis, while there is a tendency of increased expression with increased SUB1 binding, when controlling for expression levels the effect of down-regulation of SUB1-bound RNAs upon lncRNA knockdown remains, suggesting that it is not merely a confounding effect. We have updated the text as follows:

      “We hypothesized that loss of EPB41L4A-AS1 might affect SUB1, either via the reduction in its expression or by affecting its functions. We stratified SUB1 eCLIP targets into confidence intervals, based on the number, strength and confidence of the reported binding sites. Indeed, eCLIP targets of SUB1 (from HepG2 cells profiled by ENCODE) were significantly downregulated following EPB41L4A-AS1 KD in MCF-7, with more confident targets experiencing stronger downregulation (Fig. 3C). Importantly, this still holds true when controlling for gene expression levels (Fig. S7C), suggesting that this negative trend is not due to differences in their baseline expression.”
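
      One common way to build expression-matched controls of the kind described above is nearest-neighbour matching on log-scale baseline expression, sampling without replacement. The sketch below illustrates that idea under stated assumptions; it is not necessarily the procedure used for Fig. S7C. The function name `expression_matched_controls`, the pseudocount of 1, and the example values are hypothetical.

```python
import numpy as np


def expression_matched_controls(expression, is_target, seed=0):
    """Pick one non-target gene per target gene with the closest baseline expression.

    Returns indices of the selected control genes (each used at most once).
    """
    log_expr = np.log2(np.asarray(expression, dtype=float) + 1.0)
    is_target = np.asarray(is_target, dtype=bool)
    targets = np.flatnonzero(is_target)
    pool = [int(i) for i in np.flatnonzero(~is_target)]
    rng = np.random.default_rng(seed)
    chosen = []
    # Visit targets in random order so no target is systematically favoured.
    for t in rng.permutation(targets):
        if not pool:
            break  # fewer candidate controls than targets
        nearest = int(np.argmin(np.abs(log_expr[pool] - log_expr[t])))
        chosen.append(pool.pop(nearest))  # remove so it cannot be reused
    return np.array(chosen, dtype=int)


# Illustrative baseline expression values (e.g. FPKM), not real data.
expression = [1.0, 10.0, 100.0, 1000.0, 1.1, 9.0, 120.0, 900.0, 5000.0]
is_target = [True, True, True, True, False, False, False, False, False]

controls = expression_matched_controls(expression, is_target)
print(sorted(controls.tolist()))
```

      Fold-changes of the target group can then be compared with those of the matched controls, so that any remaining difference is less likely to reflect baseline expression alone.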

      (6) The relation to cancer seems somewhat contradictory, maybe I'm missing something. Could the authors more clearly state which evidence is consistent with either an Oncogene or a Tumour Suppressive function, and discuss this briefly in the Discussion? It is not a problem if the data are contradictory, however, it should be discussed more clearly.

      We acknowledge this apparent contradiction. Cancer cells are characterized by a multitude of hallmarks depending on the cancer type and stage, including high proliferation rates and enhanced invasive capabilities. The notion that cells with reduced EPB41L4A-AS1 levels exhibit lower proliferation, yet increased invasion is compatible with a function as an oncogene. Cells undergoing EMT may reduce or even completely halt proliferation/cell division, until they revert back to an epithelial state (Brabletz et al., 2018; Dongre & Weinberg, 2019). Notably, downregulated genes following EPB41L4A-AS1 KD are enriched in GO terms related to cell proliferation and cell cycle progression (Fig. 2I), whereas those upregulated are enriched for terms linked to EMT processes. Thus, while we cannot rule out a potential function as a tumor suppressor gene, our data better fit the notion that EPB41L4A-AS1 promotes invasion, and thus, primarily functions as an oncogene. We now address this point in the discussion:

      “The notion that cells with reduced EPB41L4A-AS1 levels exhibit lower proliferation (Fig. 8C), yet increased invasion (Fig. 8A and 8B) is compatible with a function as an oncogene by promoting EMT (Fig. 8D and 8E). Cells undergoing this process may reduce or even completely halt proliferation/cell division, until they revert back to an epithelial state (Brabletz et al., 2018; Dongre & Weinberg, 2019). Notably, downregulated genes following EPB41L4A-AS1 KD are enriched in GO terms related to cell proliferation and cell cycle progression (Fig. 2I), whereas those upregulated are enriched for terms linked to EMT processes. Thus, while we cannot rule out a potential function as a tumor suppressor gene, our data better fit the idea that this lncRNA promotes invasion, and thus, primarily functions as an oncogene.”

      Reviewer #2 (Recommendations for the authors):

      Below are major and minor points to be addressed. We hope the authors find them useful.

      (1) Figure 1:

      Where are LNA gapmers located within the EPB41L4A-AS1 gene? Are they targeting exons or introns of the EPB41L4A-AS1? Please clarify or include in the figure.

      We now report the location of the two GapmeRs in Fig. 1C. LNA1 targets the intronic region between SNORA13 and exon 2, and LNA2 the terminal part of exon 1.

      (2) Figure 2B:

      Why is a single LNA gapmer used for EPB41L4A Western? In addition, are the qPCR data in Figure 2B the same as in Figure 1B? Please clarify.

      The Western Blot was performed after transfecting the cells with either LNA1 or LNA2. We now have replaced Fig. 2C with the full Western Blot image, in order to show both LNAs. With respect to the qPCRs in Fig. 1B and 2B, they represent the results from two independent experiments.

      (3) Figure 2F:

      2364 DEGs for a single LNA is a lot of deregulated genes in RNA-seq data. How do the authors explain such a big number in DEGs? Is that because this LNA was intronic? Additional LNA gapmer would minimise the "real" lncRNA target and any potential off-target effect.

      We agree with the Reviewer that GapmeRs are prone to off-target and unwanted effects (Lai et al., 2020; Lee & Mendell, 2020; Maranon & Wilusz, 2020). Early in our experiments, we found that LNA1 triggers a non-specific CDKN1A/p21 activation (Fig. S5A-C), and thus, we initially performed some experiments, such as RNA-seq, with only LNA2.

      Nonetheless, other experiments were performed using both GapmeRs, such as multiple RT-qPCRs, UMI-4C, SUB1 and NPM1 imaging, and the in vitro assays, among others, and consistent results were obtained with both LNAs.

      To accommodate the request by this and the other reviewers, we have now performed another round of polyA+ RNA-seq following EPB41L4A-AS1 knockdown using LNA1 or LNA2, as well as the previously used and an additional control GapmeR. The FPKMs of the control samples are highly correlated both within replicates and between GapmeRs (Fig. S6A). More importantly, the fold-changes to control are highly correlated between the two on-target GapmeRs LNA1 and LNA2, regardless of the GapmeR used for normalization (Fig. S6B), thus showing that despite significant GapmeR-specific effects, the bulk of the response is shared and likely the direct result of the reduction in the levels of EPB41L4A-AS1. Notably, key targets NPM1 and MTREX (see discussion, Fig. S12A-C and comments to Reviewer 3) were found to be downregulated by both LNAs (Fig. S6C).

      However, we acknowledge that some of the dysregulated genes are observed only when using one GapmeR and not the other, likely due to a combination of indirect, secondary and non-specific effects, and as such it is difficult to infer the direct response. Supporting this, LNA2 yielded a total of 1,069 DEGs (617 up and 452 down) and LNA1 2,493 DEGs (1,328 up and 1,287 down), with the latter triggering a stronger response most likely as a result of the previously mentioned CDKN1A/p21 induction. Overall, 45.1% of the upregulated genes following LNA2 transfection were shared with LNA1, in contrast to only 24.3% of the downregulated ones.
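      The shared percentages quoted above are directional: they express the overlap as a fraction of the LNA2 gene set. A minimal sketch of that calculation follows, assuming the DEG lists are available as plain collections of gene identifiers; the function name and the toy gene labels are hypothetical.

```python
def shared_fraction(query, reference):
    """Fraction of genes in `query` that are also present in `reference`."""
    query, reference = set(query), set(reference)
    if not query:
        raise ValueError("query gene set is empty")
    return len(query & reference) / len(query)

# Toy example (hypothetical gene labels): 2 of the 4 query genes are shared.
lna2_up = {"g1", "g2", "g3", "g4"}
lna1_up = {"g2", "g4", "g9"}
fraction_of_lna2 = shared_fraction(lna2_up, lna1_up)
fraction_of_lna1 = shared_fraction(lna1_up, lna2_up)
```

      Because the two sets differ in size, the same overlap gives a different percentage depending on which set is used as the denominator.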

      We have now included these results in the Results section (see below) and in Supplementary Figure (Fig. S6).

      “Most of the consequences of the depletion of EPB41L4A-AS1 are thus not directly explained by changes in EPB41L4A levels. An additional trans-acting function for EPB41L4A-AS1 would therefore be consistent with its high expression levels compared to most lncRNAs detected in MCF-7 (Fig. S5G). To strengthen these findings, we have transfected MCF-7 cells with LNA1 and a second control GapmeR (NT2), as well as the previous one (NT1) and LNA2, and sequenced the polyadenylated RNA fraction as before. Notably, the expression levels (in FPKMs) of the replicates of both control samples are highly correlated with each other (Fig. S6A), and the global transcriptomic changes triggered by the two EPB41L4A-AS1-targeting LNAs are largely concordant (Fig. S6B and S6C). Because of this concordance and the cleaner (i.e., no CDKN1A upregulation) readout in LNA2-transfected cells, we focused mainly on these cells for subsequent analyses.”

      (4) Figure 3B: Does downregulation of SUB1 and NPM1 reflect at the protein level with both LNA gapmers? The authors should show a heatmap and metagene profile for SUB1 CUT & RUN. How did the author know that SUB1 binding is specific, since CUT & RUN was not performed in SUB1-depleted cells?

      As requested by both Reviewer #2 and #3, we have performed WB for SUB1, NPM1 and FBL following EPB41L4A-AS1 KD with two targeting (LNA1 and LNA2) and the previous control GapmeRs. Interestingly, we did not detect significant downregulation of any of these proteins (Author response image 3), although this might be the result of the high variability observed in the control samples. Moreover, the short timeframe in which the experiments have been conducted (that is, transient transfections for 3 days) might not be sufficient for the existing proteins to be degraded, and thus, the downregulation is more evident at the RNA (Fig. 3B and Supplementary Figure 6C) than at the protein level.

      Author response image 3.

      EPB41L4A-AS1 KD has only marginal effects on the levels of nucleolar proteins. (A) Western Blots for the indicated proteins after the transfection for 3 days of the control and targeting GapmeRs. (B) Quantification of the protein levels from (A).  All experiments were performed in n=3 biological replicates, with the error bars in the barplots representing the standard deviation. ns - P>0.05; * - P<0.05; ** - P<0.01; *** - P<0.001 (two-sided Student’s t-test).

      Following the suggestion by the Reviewer, we now show both the SUB1 CUT&RUN metagene profile (previously available as Fig. 3F) and the heatmap (now Fig. 3G) around the TSS of all genes, stratified by their expression level. Both graphs are reported.

      We show that the antibody signal is responsive to SUB1 depletion via siRNAs in both WB (Fig. S8F) and IF (Fig. 5E) experiments. As mentioned below, this and the absence of non-specific signals make us confident in the CUT&RUN data. CUT&RUN in SUB1-depleted cells would be difficult to interpret, as perturbations are typically not complete, and so the remaining protein can still bind the same regions. Since there isn’t a clear way to add spike-ins to CUT&RUN experiments, it is very difficult to show specificity of binding by CUT&RUN in siRNA-knockdown cells.

      (5) Figure 3D: The MW for the depicted proteins are lacking. Why is there no SUB1 protein in the input? Please clarify. Since the authors used siRNA to deplete SUB1, it would be good to know if the antibody is specific in their CUT & RUN (see above)

      We apologize for the lack of the MW in Fig. 3D. As shown in Fig. S8F, SUB1 is ~18 kDa and the antibody signal is responsive to SUB1 depletion via siRNAs in both WB (Fig. S8F) and IF (Fig. 5E) experiments. Thus, given its 1) established specificity in those two settings and 2) the lack of generalized signal at most open chromatin regions, which is typical of nonspecific CUT&RUN experiments, we are confident in the specificity of the CUT&RUN results.

      We now mention the MW of SUB1 in Fig. 3D as well and we provide in Author response image 4 the full SUB1 WB picture, enhancing the contrast to highlight the bands. We agree that the SUB1 band in the input is weak, likely reflecting the low abundance in that fraction and the detection difficulty due to its low MW (see Fig. S8F).

      Author response image 4.

      Western blot for SUB1 following RIP using either a SUB1 or IgG antibody. IN - input, SN - supernatant/unbound, B - bound.

      (6) Supplementary Figure 6C:

      The validation of lncRNA EPB41L4A-AS1 binding to SUB1 should be confirmed by CLIP qPCR, since native RIP can lead to reassociation of RNA-protein interactions (PMID: 15388877). Additionally, the eclip data presented in Figure 3a were from a different cell line and not MCF7.

      We acknowledge that the SUB1 eCLIP data was generated in a different cell line, as we mentioned in the text:

      “Indeed, eCLIP targets of SUB1 (from HepG2 cells profiled by ENCODE) were significantly downregulated following EPB41L4A-AS1 KD in MCF-7, with more confident targets experiencing stronger downregulation (Fig. 3C). Importantly, this still holds true when controlling for gene expression levels (Fig. S7C), suggesting that this negative trend is not due to differences in their baseline expression. To obtain SUB1-associated transcripts in MCF-7 cells, we performed a native RNA immunoprecipitation followed by sequencing of polyA+ RNAs (RIP-seq) (Fig. 3D, S7D and S7E).”

      Because of this, we resorted to native RIP, in order to get binding information in our experimental system. Given that we show independent evidence for binding using both eCLIP and RIP, and given the substantial challenge of establishing the CLIP method, which has not been successfully used in our group, we respectfully argue that further validations are out of the scope of this study. We nonetheless agree that several genes that are nominally significantly enriched in our RIP data are likely not direct targets of SUB1, especially given that it is difficult to set a threshold that perfectly discriminates between bound and unbound RNAs.

      We now additionally mention this at the beginning of the paragraph as well:

      “In order to identify potential factors that might be associated with EPB41L4A-AS1, we inspected protein-RNA binding data from the ENCODE eCLIP dataset(Van Nostrand et al., 2020). The exons of the EPB41L4A-AS1 lncRNA were densely and strongly bound by SUB1 (also known as PC4) in both HepG2 and K562 cells (Fig. 3A).”

      (7) Figure 3G:

      Can the authors distinguish whether loss of EPB41L4A-AS1 affects SUB1 chromatin binding or its activity as RBP? Please discuss.

      Distinguishing between altered SUB1 chromatin and RNA binding is challenging, as this protein likely does not interact directly with chromatin and exhibits rather promiscuous RNA binding properties (Ray et al., 2023). In particular, SUB1 (also known as PC4) interacts with and regulates the activity of all three RNA polymerases, and was reported to be involved in transcription initiation and elongation, response to DNA damage, chromatin condensation (Conesa & Acker, 2010; Das et al., 2006; Garavís & Calvo, 2017; Hou et al., 2022) and telomere maintenance (Dubois et al., 2025; Salgado et al., 2024).

      Based on our data, genes whose promoters are occupied by SUB1 display marginal, yet highly significant changes in their steady-state expression levels upon lncRNA perturbations. We also show that upon EPB41L4A-AS1 KD, SUB1 acquires a stronger nucleolar localization (Fig. 5A), which likely affects its RNA interactome as well. However, further elucidating these activities would require performing RIP-seq and CUT&RUN in lncRNA-depleted cells, which we argue is out of the scope of the current study. We note that KD of SUB1 with siRNAs has milder effects than that of EPB41L4A-AS1 (Fig. S8G), suggesting that additional players and effects shape the observed changes. Therefore, it is highly likely that the loss of this lncRNA affects both the chromatin binding profile and the RNA binding activity of SUB1, with the latter likely resulting in the increased snoRNA abundance.

      (8) Figure 4: Can the authors show that a specific class of snorna is affected upon depletion of SUB1 and EPB41L4A-AS1? Can they further classify the effect of their depletion on H/ACA box snoRNAs, C/D box snoRNAs, and scaRNAs?

      Such potential distinct effect on the different classes of snoRNAs was considered, and the results are available in Fig. S8B and S8H (boxplots, after EPB41L4A-AS1 and SUB1 depletion), as well as Fig. 4F and S9F (scatterplots between EPB41L4A-AS1 and SUB1 depletion, and EPB41L4A-AS1 and GAS5 depletion, respectively). We see no preferential effect on one group of snoRNAs or the other.

      (9) Figure 5: From the representative images, it looks to me that LNA 2 targeting EPB41L4A-AS1 has a bigger effect on nucleolar staining of SUB1. To claim that EPB41L4A-AS1 depletion "shifts SUB1 to a stronger nucleolar distribution", the authors need to perform IF staining for SUB1 and Fibrillarin, a known nucleolar marker. Also, how does this data fit with their qPCR data shown in Figure 3B? It is instrumental for the authors to demonstrate by IF or Western blotting that SUB1 levels decrease in one fraction and increase specifically in the nucleolus. They could perform Western blot for SUB1 and Fibrillarin in EPB41L4A-AS1-depleted cells and isolate cytoplasmic, nuclear, and nucleolar fractions. This experiment will strengthen their finding. The scale bar is missing for all the images in Figure 5. The authors should also show magnified images of a single representative cell at 100x.

      We apologize for the confusion regarding the scale bars. As mentioned here and elsewhere, the scale bars are present in the top-left image of each panel only, in order to avoid overcrowding the panel. All the images are already at 100X, with the exception of Fig. 5E (IF for SUB1 upon siSUB1 transfection), which is at 60X in order to better show the lack of signal. We acknowledge, however, that the images are sometimes confusing, due to the PNG features once imported into the document. In any case, in the submission we have also provided the original images in high-quality PDF and .ai formats. The suggested experiment would require establishing a nucleolar fractionation protocol, which we currently don’t have available, and we argue that it is out of the scope of the current study.

      (10) Additionally, is rRNA synthesis affected in SUB1- and EPB41L4A-AS1-depleted cells? The authors could quantify newly synthesised rRNA levels in the nucleoli, which would also strengthen their findings about the role of this lncRNA in nucleolar biology.

      We acknowledge that there are many aspects of the role of EPB41L4A-AS1 in nucleolar biology that remain to be explored, as well as in nucleolar biology itself, but given the extensive experimental data we already provide in this and other subjects, we respectfully suggest that this experiment is out of scope of the current work. We note that a recent study has shown that SUB1 is required for Pol I-mediated rDNA transcription in the nucleolus (Kaypee et al., 2025). In the presence of nucleolar SUB1, rDNA transcription proceeds as expected, but when SUB1 is depleted or its nucleolar localization is affected—by either sodium butyrate treatment or inhibition of KAT5-mediated phosphorylation at its lysine 35 (K35)—the levels of the 47S pre-rRNA are significantly reduced. In our settings, SUB1 enriches into the nucleolus following EPB41L4A-AS1 KD; thus, we might expect to see a slightly increased rDNA transcription or no effect at all, given that SUB1 localizes in the nucleolus in baseline conditions as well. We now mention this novel role of SUB1 both in the results and discussion.

      “SUB1 interacts with all three RNA polymerases and was reported to be involved in transcription initiation and elongation, response to DNA damage, chromatin condensation(Conesa & Acker, 2010; Das et al., 2006; Garavís & Calvo, 2017; Hou et al., 2022), telomere maintenance(Dubois et al., 2025; Salgado et al., 2024) and rDNA transcription(Kaypee et al., 2025). SUB1 normally localizes throughout the nucleus in various cell lines, yet staining experiments show a moderate enrichment for the nucleolus (source: Human Protein Atlas; https://www.proteinatlas.org/ENSG00000113387-SUB1/subcellular)(Kaypee et al., 2025).”

      “Several features of the response to EPB41L4A-AS1 resemble nucleolar stress, including altered distribution of NPM1(Potapova et al., 2023; Yang et al., 2016). SUB1 was shown to be involved in many nuclear processes, including transcription(Conesa & Acker, 2010), DNA damage response(Mortusewicz et al., 2008; Yu et al., 2016), telomere maintenance(Dubois et al., 2025), and nucleolar processes including rRNA biogenesis(Kaypee et al., 2025; Tafforeau et al., 2013). Our results suggest a complex and multi-faceted relationship between EPB41L4A-AS1 and SUB1, as SUB1 mRNA levels are reduced by the transient (72 hours) KD of the lncRNA (Fig. 3B), the distribution of the protein in the nucleus is altered (Fig. 5A and 5C), while the protein itself is the most prominent binder of the mature EPB41L4A-AS1 in ENCODE eCLIP data (Fig. 3A). The most striking connection between EPB41L4A-AS1 and SUB1 is the similar phenotype triggered by their loss (Fig. 4). We note that a recent study has shown that SUB1 is required for Pol I-mediated rDNA transcription in the nucleolus(Kaypee et al., 2025). In the presence of nucleolar SUB1, rDNA transcription proceeds as expected, but when SUB1 is depleted or its nucleolar localization is affected—by either sodium butyrate treatment or inhibition of KAT5-mediated phosphorylation at its lysine 35 (K35)—the levels of the 47S pre-rRNA are significantly reduced. In our settings, SUB1 enriches into the nucleolus following EPB41L4A-AS1 KD; thus, we might expect to see a slightly increased rDNA transcription or no effect at all, given that SUB1 localizes in the nucleolus in baseline conditions as well. It is however difficult to determine which of the connections between these two genes is the most functionally relevant and which may be indirect and/or feedback interactions. 
For example, it is possible that EPB41L4A-AS1 primarily acts as a transcriptional regulator of SUB1 mRNA, or that its RNA product is required for proper stability and/or localization of the SUB1 protein, or that EPB41L4A-AS1 acts as a scaffold for the formation of protein-protein interactions of SUB1.”

      (11) Figure 8: The scratch assay alone cannot be used as a measure of increased invasion, and this phenotype must be confirmed with a transwell invasion or migration assay. Thus, I highly recommend that the authors conduct this experiment using the Boyden chamber. Do the authors see upregulation of N-cadherin, Vimentin, and downregulation of E-cadherin in their RNA-seq?

      We agree with the reviewer that those phenotypes are complex and normally require multiple in vitro, as well as in vivo assays to be thoroughly characterized. However, we respectfully consider those as out of scope of the current work, which is more focused on RNA biology and the molecular characterization and functions of EPB41L4A-AS1.

      Nevertheless, in Fig. 8D we show that the canonical EMT signature (taken from MSigDB) is upregulated in cells with reduced expression of EPB41L4A-AS1. Notably, EMT has been found not to possess a unique gene expression program; rather, it involves distinct and partially overlapping gene signatures (Youssef et al., 2024). In Fig. 8D, the most upregulated gene is TIMP3, a matrix metallopeptidase inhibitor linked to a particular EMT signature that is less invasive and more profibrotic (EMT-T2; Youssef et al., 2024). Interestingly, we observed a strong upregulation of other genes linked to EMT-T2, such as TIMP1, FOSB, SOX9, JUNB, JUN and KLF4, whereas MPP genes (linked to EMT-T1, which is highly proteolytic and invasive) are generally downregulated or not expressed. With regard to N- and E-cadherin, the former does not pass our cutoff to be considered expressed, and the latter does not change significantly. Vimentin is also not significantly dysregulated. All these examples are now reported in Fig. 8E.

      The text has also been updated accordingly:

      “These findings suggest that proper EPB41L4A-AS1 expression is required for cellular proliferation, whereas its deficiency results in the onset of more aggressive and migratory behavior, likely linked to the increase of the gene signature of epithelial to mesenchymal transition (EMT) (Fig. 8D). Because EMT is not characterized by a unique gene expression program and rather involves distinct and partially overlapping gene signatures (Youssef et al., 2024), we checked the expression level of marker genes linked to different types of EMTs (Fig. 8E). The most upregulated gene in Fig. 8D is TIMP3, a matrix metallopeptidase inhibitor linked to a particular EMT signature that is less invasive and more profibrotic (EMT-T2) (Youssef et al., 2024). Interestingly, we observed a stark upregulation of other genes linked to EMT-T2, such as TIMP1, FOSB, SOX9, JUNB, JUN and KLF4, whereas MPP genes (linked to EMT-T1, which is highly proteolytic and invasive) are generally downregulated or not expressed. This suggests that the downregulation of EPB41L4A-AS1 is primarily linked to a specific EMT program (EMT-T2), and future studies aimed at uncovering the exact mechanisms and relevance will shed light upon a possible therapeutic potential of this lncRNA.”

      (12) Minor points:

      (a) What could be the explanation for why only the EPB41L4A-AS1 locus has an effect on the neighbouring gene?

      There might be multiple reasons why EPB41L4A-AS1 is able to modulate the expression of the neighboring genes. First, it is expressed from a TAD boundary exhibiting physical contacts with several genes in the two flanking TADs (Fig. 1F and 2A), placing it in the right spot to regulate their expression. Second, it is highly expressed when compared to most of the genes nearby, with transcription having been linked to the establishment and maintenance of TAD boundaries (Costea et al., 2023). Accordingly, the (partial) depletion of EPB41L4A-AS1 via GapmeRs transfection slightly reduces the contacts between the lncRNA and EPB41L4A loci (Fig. 2E and S4J), although this effect could also be determined by a premature transcription termination triggered by the GapmeRs. 

      There are a multitude of mechanisms by which lncRNAs with regulatory functions modulate the expression of one or more target genes in cis (Gil & Ulitsky, 2020), and our data do not unequivocally point to one of them. Distinguishing between these possibilities is a major challenge in the field and would be difficult to address in the context of this one study. It could be that the processive RNA polymerases at the EPB41L4A-AS1 locus are recruited to the neighboring loci, facilitated by the close proximity in the 3D space. It could also be possible that chromatin remodeling factors are recruited by the nascent RNA, and then promote and/or sustain the opening of chromatin at the target site. The latter possibility is intriguing, as this mechanism is proposed to be widespread among lncRNAs (Gil & Ulitsky, 2020; Oo et al., 2025) and we observed a significant reduction of H3K27ac levels at the EPB41L4A promoter region (Fig. 2D). Future studies combining chromatin profiling (e.g., CUT&RUN and ATAC-seq) and RNA pulldown experiments will shed light upon the exact mechanisms by which this lncRNA regulates the expression of target genes in cis and its interacting partners.

      (b) The scale bar is missing on all the images in the Supplementary Figures as well.

      The scale bars are present in the top-left figure of each panel. We acknowledge that due to the export as PNG, some figures (including those with microscopy images) display abnormal font sizes and aspect ratio. All images were created using consistent fonts, sizes and ratio, and are provided as high-quality PDF in the current submission.

      (13) Methods:

      The authors should double-check if they used siRNA and LNA gapmers at 25 and 50 µM concentrations, as that is a huge dose. Most papers used these reagents in the range of 5-50 nM maximum.

      We apologize for the typo; the text has been fixed. We performed the experiments at 25 and 50 nM, respectively, as suggested by the manufacturer’s protocol.

      (14) Discussion:

      Which cell lines were used in reference 27 (Cheng et al., 2024 Cell) to study the role of SNORA13? It may be useful to include this in the discussion.

      We already mentioned the cell system in the discussion, and now we edited to include the specific cell line that was used:

      “A recent study found that SNORA13 negatively regulates ribosome biogenesis in TERT-immortalized human fibroblasts (BJ-HRAS<sup>G12V</sup>), by decreasing the incorporation of RPL23 into the maturing 60S ribosomal subunits, eventually triggering p53-mediated cellular senescence(Cheng et al., 2024).”

      Reviewer #3 (Recommendations for the authors):

      Major comments on weaknesses:

      (1) The paper is quite disjointed:

      (a) Figures 1/2 studied the cis- and potential trans target genes altered by EPB41L4A-AS1 knockdown. They also showed some data about EPB41L4A-AS1 overlaps a strong chromatin boundary.

      (b) Figures 3/4/5 studied the role of SUB1 - as it is altered by EPB41L4A-AS1 knockdown - in affecting genes and snoRNAs, which may partially underlie the gene/snoRNA changes after EPB41L4A-AS1 knockdown.

      (c) Figure 6 showed that EPB41L4A-AS1 knockdown did not directly affect SNORA13, the snoRNA located in the intron of EPB41L4A-AS1. Thus, the upregulation of many snoRNAs is not due to SNORA13.

      (d) Figure 7 studied whether the changes of cis genes or snoRNAs are due to transcriptional stability.

      (e) Figure 8 studied cellular phenotypes after EPB41L4A-AS1 knockdown.

      These points are overly spread out and this dilutes the central theme of these results, which this Reviewer considered to be on cis or trans gene regulation by this lncRNA. The title of the paper implies EPB41L4A-AS1 knockdown affected trans target genes, but the paper did not focus on studying cis or trans effects, except briefly mentioning that many genes were changed in Figure 2. The many changes of snoRNAs are suggested to be partially explained by SUB1, but SUB1 itself is affected (>50%, Figure 3B) by EPB41L4A-AS1 knockdown, so it is unclear if these are mostly secondary changes due to SUB1 reduction. Given the current content of the paper, the authors do not have sufficient evidence to support that the changes of trans genes are due to direct effects or indirect effects. And so they are encouraged to revise their title to be more on snoRNA regulation, as this area took the majority of the efforts in this paper.

      We respectfully disagree with the reviewer. We show that the effects on the proximal genes are cis-acting, as they are not rescued by exogenous expression, whereas the majority of the changes observed in the RNA-seq datasets appear to be indirect. The snoRNA changes, which might indeed be indirect and need not involve direct interaction partners of the lncRNA such as SUB1, appear to be trans-regulated, as they can be partially rescued by exogenous expression of the lncRNA. We also show that KD of the main cis-regulated gene, EPB41L4A, results in a much milder transcriptional response, further solidifying the contribution of trans-acting effects. While we agree that the snoRNA effects are interesting, we do not consider them to be the main result, as they are accompanied by many additional changes in gene expression and by changes in the subnuclear distribution of key nucleolar proteins. It is therefore difficult for us to claim that EPB41L4A-AS1 is specifically relevant to the snoRNAs rather than to nucleolar biology more broadly, and we prefer not to mention snoRNAs specifically in the title.

      (2) EPB41L4A-AS1 knockdown caused ~2,364 gene changes. This is a very large amount of change on par with some transcriptional factors. It thus needs more scrutiny. First, on Page 9, second paragraph, the authors used |log2Fold-change| >0.41 to select differential genes, which is an unusual cutoff. What is the rationale? Often |log2Fold-change| >1 is more common. How many replicates are used? To examine how many gene changes are likely direct target genes, can the authors show how many of the cis-genes that are changed by EPB41L4A-AS1 knockdown have direct chromatin contacts with EPB41L4A-AS1 in HiC data? Is there any correlation between HiC contact with their fold changes? Without a clear explanation of cis target genes as direct target genes, it is more difficult to establish whether any trans target genes are directly affected by EPB41L4A-AS1 knockdown.

      A |log<sub>2</sub>Fold-change| >0.41 equals a change of 33% or more, which together with an adjusted P < 0.05 is a threshold that has been used in the past. All RNA-seq experiments have been performed in triplicates, in line with the standards in the field. While it is possible that EPB41L4A-AS1 establishes multiple contacts in trans (a process that has been observed for at least one other lncRNA, namely Firre, although involving its mature RNA product), we believe this to be less likely than the alternative, namely that the > 2,000 DEGs predominantly result from secondary changes rather than being genes directly regulated by EPB41L4A-AS1 contacts.
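      The correspondence between the 0.41 cutoff and a 33% change follows directly from the definition of the log2 fold-change. A short arithmetic sketch (the function name is hypothetical) also makes clear that the cutoff is symmetric on the log scale, so the same threshold corresponds to a decrease of roughly 25% on the linear scale:

```python
def log2fc_to_percent(log2fc):
    """Convert a log2 fold-change into a percent change relative to control."""
    return (2 ** log2fc - 1) * 100

increase_at_cutoff = log2fc_to_percent(0.41)    # about +32.9%
decrease_at_cutoff = log2fc_to_percent(-0.41)   # about -24.7%
doubling = log2fc_to_percent(1.0)               # +100%, the |log2FC| > 1 cutoff
```

      For comparison, the |log2Fold-change| > 1 cutoff mentioned by the reviewer requires at least a doubling or a halving of expression.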

      In any case, we have inspected our UMI-4C data to identify other genes exhibiting higher contact frequencies than background levels, and thus, potentially regulated in cis. To this end, we calculated the UMI-4C coverage in a 10 kb window centered around the TSS of the genes located on chromosome 5, which we subsequently normalized based on the distance from EPB41L4A-AS1, in order to account for the intrinsically higher DNA recovery closer to the target DNA sequence. However, in our UMI-4C experiment we have employed baits targeting three different genes (EPB41L4A-AS1, EPB41L4A and STARD4), and therefore such an approach assumes that the lncRNA locus has the most regulatory features in this region. As expected, we detected a strong negative correlation between the normalized coverage and the distance from the EPB41L4A-AS1 locus (ρ = -0.51, p-value < 2.2e-16), and the genes in the two neighboring TADs exhibited the strongest association with the bait region (Author response image 5). The genes that we see are down-regulated in the adjacent TADs, namely NREP, MCC and MAN2A1 (Fig. 2F), show substantially higher contacts than background with the EPB41L4A-AS1 gene, thus potentially constituting additional cis-regulated targets of this lncRNA. We note that both SUB1 and NPM1 are located on chromosome 5 as well, albeit at distances exceeding 75 and 50 Mb, respectively, and they do not exhibit any striking association with the lncRNA locus.

      Author response image 5.

      UMI-4C coverage over the TSS of the genes located on chromosome 5. (A) Correlation between the normalized UMI-4C coverage over the TSS (± 5kb) of chromosome 5 genes and the absolute distance (in megabases, Mb) from EPB41L4A-AS1. (B) Same as in (A), but with the x axis showing the relative distance from EPB41L4A-AS1. In both cases, the genes in the two flanking TADs are colored in red and their names are reported.
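
The two generic steps of the analysis described above, counting coverage in a window around each TSS and rank-correlating it with the distance from the bait, can be sketched as follows. This is a toy illustration under stated assumptions, not the pipeline used for the figure: the coordinates, gene names and fragment positions are invented, the function names are hypothetical, and the distance-based normalization is deliberately left out because its exact form is not given in this response.

```python
from statistics import mean

def window_coverage(fragment_positions, tss, half_window=5_000):
    """Count fragments that fall within +/- half_window bp of a TSS."""
    return sum(1 for p in fragment_positions if abs(p - tss) <= half_window)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical toy data: one bait and three genes on the same chromosome
bait = 1_000_000
tss = {"geneA": 1_050_000, "geneB": 1_500_000, "geneC": 3_000_000}
fragments = [1_048_000, 1_051_000, 1_052_500, 1_054_999,
             1_498_000, 1_503_000, 2_999_000]

coverage = [window_coverage(fragments, pos) for pos in tss.values()]
distance = [abs(pos - bait) for pos in tss.values()]
rho = spearman(coverage, distance)
print(coverage, distance, rho)
```

In a real analysis a library routine for rank correlation would normally replace the hand-written helpers; they are spelled out here only to keep the sketch free of dependencies.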

      To increase the confidence in our RNA-seq data, we have now performed another round of polyA+ RNA-seq following EPB41L4A-AS1 knockdown using LNA1 or LNA2, as well as the previously used control GapmeR and an additional one. The FPKMs of the control samples are highly correlated both within replicates and between GapmeRs (Fig. S6A). More importantly, the fold-changes to control are highly correlated between the two on-target GapmeRs LNA1 and LNA2, regardless of the GapmeR used for normalization (Fig. S6B), thus showing that despite significant GapmeR-specific effects, the bulk of the response is shared and likely the direct result of the reduction in the levels of EPB41L4A-AS1. Notably, key targets NPM1 and MTREX (see discussion, Fig. S12A-C and comments to Reviewer 3) were found to be downregulated by both LNAs (Fig. S6C).

      However, we acknowledge that some of the dysregulated genes are observed only when using one GapmeR and not the other, likely due to a combination of indirect, secondary and non-specific effects, and as such it is difficult to infer the direct response without short time-course experiments (Much et al., 2024). Supporting this, LNA2 yielded a total of 1,069 DEGs (617 up and 452 down) and LNA1 2,493 DEGs (1,328 up and 1,287 down), with the latter triggering a stronger response most likely as a result of the previously mentioned CDKN1A/p21 induction. Overall, 45.1% of the upregulated genes following LNA2 transfection were shared with LNA1, in contrast to only 24.3% of the downregulated ones.
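
The shared percentages quoted above are fractions of the LNA2 gene sets that are also called in LNA1. A minimal sketch of that calculation, using placeholder gene identifiers rather than the actual DEG lists, is:

```python
def shared_fraction(reference, other):
    """Fraction of genes in `reference` that are also present in `other`."""
    if not reference:
        raise ValueError("reference set is empty")
    return len(reference & other) / len(reference)

# Placeholder identifiers; the real inputs would be the up-regulated (or
# down-regulated) DEG sets obtained with each GapmeR.
lna2_up = {"G1", "G2", "G3", "G4"}
lna1_up = {"G2", "G3", "G9"}

fraction = shared_fraction(lna2_up, lna1_up)
print(f"{fraction:.1%} of LNA2 up-regulated genes are shared with LNA1")
```

Note that the measure is not symmetric: taking LNA1 as the reference set would generally give a different percentage, because the two GapmeRs yield different numbers of DEGs.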

      We have now included these results in the Results section (see below) and in Supplementary Figure (Fig. S6).

      “Most of the consequences of the depletion of EPB41L4A-AS1 are thus not directly explained by changes in EPB41L4A levels. An additional trans-acting function for EPB41L4A-AS1 would therefore be consistent with its high expression levels compared to most lncRNAs detected in MCF-7 (Fig. S5G). To strengthen these findings, we have transfected MCF-7 cells with LNA1 and a second control GapmeR (NT2), as well as the previous one (NT1) and LNA2, and sequenced the polyadenylated RNA fraction as before. Notably, the expression levels (in FPKMs) of the replicates of both control samples are highly correlated with each other (Fig. S6A), and the global transcriptomic changes triggered by the two EPB41L4A-AS1-targeting LNAs are largely concordant (Fig. S6B and S6C). Because of this concordance and the cleaner (i.e., no CDKN1A upregulation) readout in LNA2-transfected cells, we focused mainly on these cells for subsequent analyses.”

      Figure 3B, SUB1 mRNA is reduced >half by EPB41L4A-AS1 KD. How much did SUB1 protein reduce after EPB41L4A-AS1 KD? Similarly, how much is the NPM1 protein reduced? If these two important proteins were affected by EPB41L4A-AS1 KD simultaneously, it is important to determine how many of the 2,364 genes that changed after EPB41L4A-AS1 KD are due to the protein changes of these two key proteins. For SUB1, Figures S7E,F,G provided some answers. But NPM1 KD is also needed to fully understand this. Related to this, there are many other proteins perhaps changed in addition to SUB1 and NPM1, which raises the concern of how many of the EPB41L4A-AS1 KD-induced changes are directly caused by this RNA. In addition to the suggested study of cis targets, the alternative mechanism needs to be fully discussed in the paper as it remains difficult to fully conclude direct versus indirect effect due to such changes of key proteins or ncRNAs (such as snoRNAs or histone mRNAs).

      As requested by both Reviewer #2 and #3, we have performed WB for SUB1, NPM1 and FBL following EPB41L4A-AS1 KD with two targeting (LNA1 and LNA2) and the previous control GapmeRs. Interestingly, we did not detect any significant downregulation of any of these proteins (Author response image 3), although this might be the result of the high variability observed in the control samples. Moreover, the short timeframe in which the experiments have been conducted (that is, transient transfections for 3 days) might not be sufficient for the existing proteins to be degraded, and thus the downregulation is more evident at the RNA (Fig. 3B and Supplementary Figure 6C) than at the protein level.

      We acknowledge that many proteins might change simultaneously, and pinpointing which ones act upstream of the plethora of indirect changes is extremely challenging when considering such large-scale changes in gene expression. In the case of SUB1 and NPM1, which were prioritized for their predicted binding to the lncRNA (Fig. 3A), we show that the depletion of the former affects the latter in a way similar to the depletion of the lncRNA (Fig. 5F). Moreover, snoRNAs are also similarly affected (as the reviewer pointed out, Fig. 4F), suggesting that at least this phenomenon is predominantly mediated by SUB1. Other effects might also be indirect consequences of cellular responses, such as the decrease in histone mRNAs (Fig. 4A) that might reflect the decrease in cellular replication (Fig. 8C) and cell cycle genes (Fig. 2I) (although a link between SUB1 and histone mRNA expression has been described (Brzek et al., 2018)).

      Supporting the notion that additional proteins might be involved in driving the observed phenotypes, one of the genes most consistently affected by EPB41L4A-AS1 KD with GapmeRs is MTREX (also known as MTR4), which becomes downregulated at both the RNA and protein levels (now presented in the main text as Supplementary Figure 12). MTREX is part of the NEXT and PAXT complexes (Contreras et al., 2023), which target several short-lived RNAs for degradation, and the depletion of either MTREX or other complex members leads to the upregulation of such RNAs, which include PROMPTs, uaRNAs and eRNAs, among others. Given the gaps in our understanding of snoRNA biogenesis from introns in mammalian systems (Monziani & Ulitsky, 2023), it is tempting to hypothesize a role for MTREX-containing complexes in trimming and degrading those introns and releasing the mature snoRNAs.

      We updated the discussion section to include these observations:

      “Beyond its site of transcription, EPB41L4A-AS1 associates with SUB1, an abundant protein linked to various functions, and these two players are required for proper distribution of various nuclear proteins. Their dysregulation results in large-scale changes in gene expression, including up-regulation of snoRNA expression, mostly through increased transcription of their hosts, and possibly through a somewhat impaired snoRNA processing and/or stability. To further hinder our efforts in discerning between these two possibilities, the exact molecular pathways involved in snoRNA biogenesis, maturation and decay are still not completely understood. One of the genes most consistently affected by EPB41L4A-AS1 KD with GapmeRs is MTREX (also known as MTR4), which becomes downregulated at both the RNA and protein levels (Fig. S12A-C). Interestingly, MTREX is part of the NEXT and PAXT complexes (Contreras et al., 2023), which target several short-lived RNAs for degradation, and the depletion of either MTREX or other complex members leads to the upregulation of such RNAs, which include PROMPTs, uaRNAs and eRNAs, among others. It is therefore tempting to hypothesize a role for MTREX-containing complexes in trimming and degrading those introns, and releasing the mature snoRNAs. Future studies specifically aimed at uncovering novel players in mammalian snoRNA biology will conclusively elucidate whether MTREX is indeed involved in these processes.”

      With regards to the changes in gene expression between the two LNAs, we provide a more detailed answer above and to the other reviewers as well.

      (3) A Strong discrepancy of results by different approaches of knockdown or overexpression:

      (a) CRISPRa versus LNA knockdown: Figure S4 - CRISPRa of EPB41L4A-AS1 did not affect EPB41L4A expression (Figure S4B). The authors should discuss how to interpret this result. Did CRISPRa not work to increase the nuclear/chromatin portion of EPB41L4A-AS1? Did CRISPRa of EPB41L4A-AS1 affect the gene in the upstream, the STARD4? Did CRISPRa of EPB41L4A-AS1 also affect chromatin interactions between EPB41L4A-AS1 and the EPB41L4A gene? If so, this may argue that chromatin interaction is not necessary for cis-gene regulation.

      There are indeed several possible explanations. The most parsimonious is that, since the lncRNA is already very highly transcribed, the relatively modest effect of additional transcription mediated by CRISPRa is not sufficient to elicit a measurable effect. For this reason, we did not check by UMI-4C the contact frequency between the lncRNA and EPB41L4A upon CRISPRa.

      CRISPRa augments transcription at target loci, and thus, the nuclear and chromatin retention of EPB41L4A-AS1 are not expected to be affected. We did not check the expression of STARD4, because we focused on EPB41L4A which appears to be the main target locus according to Hi-C (Fig. 2A), UMI-4C (Fig. 2E and S4J) and GeneHancer (Fig. S1). 

      We already provide extensive evidence of cis-regulation of EPB41L4A by EPB41L4A-AS1, and show that EPB41L4A is lowly expressed and likely has a limited role in our experimental settings. Thus, we respectfully propose that an in-depth exploration of the mechanism of action of this regulatory axis is out of scope of the current study, which instead focuses on the global effects of EPB41L4A-AS1 perturbation.

      (b) Related to this, while CRISPRa alone did not show an effect, upon LNA knockdown of EPB41L4A-AS1, CRISPRa of EPB41L4A-AS1 can increase EPB41L4A expression. It is perplexing as to why, upon LNA treatment, CRISPRa will show an effect (Figure S4H)? Actually, Figures S4H and I are very confusing in the way they are currently presented. They will benefit from being separated into two panels (H into 2 and I into two). And for Ectopic expression, please show controls by empty vector versus EPB41L4A-AS1, and for CRISPRa, please show sgRNA pool versus sgRNA control.

      The results are consistent with the parsimonious assumption mentioned above that the high transcription of the lncRNA at baseline is sufficient for maximal positive regulation of EPB41L4A, and that upon KD, the reduced transcription and/or RNA levels are no longer at saturating levels, and so CRISPRa can have an effect. We now mention this interpretation in the text:

      “Levels of EPB41L4A were not affected by increased expression of EPB41L4A-AS1 from the endogenous locus by CRISPR activation (CRISPRa), nor by its exogenous expression from a plasmid (Fig. S4B and S4C). The former suggests that endogenous levels of EPB41L4A-AS1—that are far greater than those of EPB41L4A—are sufficient to sustain the maximal expression of this target gene in MCF7 cells.”

      We apologize for the confusion regarding the control used in the rescue experiments in Fig. S4H and S4I. The “-” in the Ectopic overexpression and CRISPRa correspond to the Empty Vector and sgControl, respectively, and not the absence of any vector. We changed the text in the figure legends:

      “(H) Changes in EPB41L4A-AS1 expression after rescuing EPB41L4A-AS1 with an ectopic plasmid or CRISPRa following its KD with GapmeRs. In both panels (Ectopic OE and CRISPRa) the “-” samples represent those transfected with the Empty Vector or sgControl. Asterisks indicate significance relative to the –/– control (transfected with both the control GapmeR and vector). (I) Same as in (H), but for changes in EPB41L4A expression.”

      (c) siRNA versus LNA knockdown: Figure S3A showed that siRNA KD of EPB41L4A-AS1 does not affect EPB41L4A expression. How to understand this data versus LNA?

      As explained in the text, siRNA-mediated KD presumably affects mostly the cytoplasmic pool of EPB41L4A-AS1 and not the nuclear one, which we assume explains the different effects of the two perturbations, as observed for other lncRNAs (e.g., (Ntini et al., 2018)). However, we acknowledge that we do not know which aspect of the nuclear RNA biology is relevant, be it the nascent EPB41L4A-AS1 transcription, premature transcriptional termination or even the nuclear pool of this lncRNA, and this can be elucidated further in future studies.

      (d) EPB41L4A-AS1 OE versus LNA knockdown: Figure 6F showed that EPB41L4A-AS1 OE caused reduction of EPB41L4A mRNA, particularly at 24hr. How to interpret that both LNA KD and OE of EPB41L4A-AS1 reduce the expression of EPB41L4A mRNA?

      We do not believe that the OE of EPB41L4A-AS1, and in particular the one elicited by an ectopic plasmid, affects EPB41L4A RNA levels. In the experiment in Fig. 6F, EPB41L4A relative expression at 24h is ~0.65 (please note the log<sub>2</sub> scale in the graph), which is significant as reported. However, throughout this study (and as shown in Fig. S4C for the ectopic and Fig. S4B for the CRISPRa overexpression, respectively), we observed no such behavior, suggesting that the effect reported in Fig. 6F is the result of that particular setting and is unlikely to reflect a general phenomenon.

      (e) Did any of the effects on snoRNAs or trans target genes after EPB41L4A-AS1 knockdown still appear by CRISPRa?

      As mentioned above, we did a limited number of experiments after CRISPRa, prompted by the fact that endogenous levels of EPB41L4A-AS1 are already high enough to sustain its functions. Pushing the expression even higher will likely result in no or artifactual effects, which is why we respectfully propose such experiments are not essential in this current work, which instead mostly relies on loss-of-function experiments.

      For issue 3, extensive data repetition using all these methods may be unrealistic, but key data discrepancy needs to be fully discussed and interpreted.

      Other comments on weakness:

      (1) This manuscript will benefit from having line numbers so comments from Reviewers can be made more specifically.

      We added line numbers as suggested by the reviewer.

      (2) Figure 2G, to distinguish if any effects of EPB41L4A-AS1 come from the cytoplasmic or nuclear portion of EPB41L4A-AS1, an siRNA KD RNA-seq will help to filter out the genes affected by EPB41L4A-AS1 in the cytoplasm, as siRNA likely mainly acts in the cytoplasm.

      This experiment would be difficult to interpret: while siRNAs mostly deplete the cytoplasmic pool of their target, they can have some effects in the nucleus as well (e.g., (Sarshad et al., 2018)), and so siRNA knockdown will not necessarily report strictly on the cytoplasmic functions.

      (3) Figure 2H, LNA knockdown of EPB41L4A should check the protein level reduction, is it similar to the change caused by knockdown of EPB41L4A-AS1?

      As suggested by reviewer #2, we have now replaced the EPB41L4A Western Blot with one that shows the results with both LNA1 and LNA2. Please note that the previous Fig. 2C was a subset of this, i.e., we had previously cropped the results obtained with LNA1. Unfortunately, we did not have sufficient antibody to check for EPB41L4A protein reduction following LNA KD of EPB41L4A in a timely manner.

      (4) There are two LNA Gapmers used by the paper to knock down EPB41L4A-AS1, but some figures used LNA1, some used LNA2, preventing a consistent interpretation of the results. For example, in Figures 2A-D, LNA2 was used. But in Figures 2E-H, LNA1 was used. How consistent are the two in changing histone H3K27ac (like in Figure 2D) versus gene expression in RNA-seq? The changes in chromatin interaction appear to be weaker by LNA2 (Figure S4J) versus LNA1 (Figure 2E).

      As explained above and in response to Reviewer #1, we now provide more RNA-seq data for LNA1 and LNA2. We note that besides the unwanted and/or off-target effects, these two GapmeRs might not be equally effective in knocking down EPB41L4A-AS1, which could explain why LNA1 seems to have a stronger effect on chromatin than LNA2. Nonetheless, whenever we employed both, we obtained similar and consistent results (e.g., Fig. 5A-D and 8A-C), suggesting that these and the other effects are indeed on-target effects due to EPB41L4A-AS1 depletion.

      (5) It will be helpful if the authors provide information on how long they conducted EPB41L4A-AS1 knockdown for most experiments to help discern direct or indirect effects.

      The length of all perturbations was indicated in the Methods section, and we now mention it in the Results as well. Unless specified otherwise, perturbations were carried out for 72 hours. We agree with the reviewer that time-course experiments can have added value, but due to the extensive effort that they would require, we suggest that they are out of scope of the current study.

      (6) In Figures 1C and F, the authors showed results about EPB41L4A-AS1 overlapping a strong chromatin boundary. But these are not mentioned anymore in the later part of the paper. Does this imply any mechanism? Does EPB41L4A-AS1 knockdown or OE, or CRISPRa affect the expression of genes near the other interacting site, STARD4? Do genes located in the two adjacent TADs change more strongly as compared to other genes far away?

      We discuss this point in the Discussion section:

      “At the site of its own transcription, which overlaps a strong TAD boundary, EPB41L4A-AS1 is required to maintain expression of several adjacent genes, regulated at the level of transcription. Strikingly, the promoter of EPB41L4A-AS1 ranks in the 99.8th percentile of the strongest TAD boundaries in human H1 embryonic stem cells (Open2C et al., 2024; Salnikov et al., 2024). It features several CTCF binding sites (Fig. 2A), and in MCF-7 cells, we demonstrate that it blocks the propagation of the 4C signal between the two flanking TADs (Fig. 1F). Future studies will help elucidate how EPB41L4A-AS1 transcription and/or the RNA product regulate this boundary. So far, we found that EPB41L4A-AS1 did not affect CTCF binding to the boundary, and while some peaks in the vicinity of EPB41L4A-AS1 were significantly affected by its loss, they did not appear to be found near genes that were dysregulated by its KD (Fig. S11C). We also found that KD of EPB41L4A-AS1, which depletes the RNA product but may also affect the nascent RNA transcription (Lai et al., 2020; Lee & Mendell, 2020), reduces the spatial contacts between the TAD boundary and the EPB41L4A promoter (Fig. 2E). Further elucidation of the exact functional entity needed for the cis-acting regulation will require detailed genetic perturbations of the locus, which are difficult to carry out in the polyploid MCF-7 cells without affecting other functional elements of this locus or cell survival, as we were unable to generate deletion clones despite several attempts.”

      As mentioned in the text (pasted below) and in Fig. 2F, most genes in the two flanking TADs become downregulated following EPB41L4A-AS1 KD. While STARD4 – which was chosen because it had spatial contacts above background with EPB41L4A-AS1 – did not reach statistical significance, others did and are highlighted. Those included NREP, which we also discuss:

      “Consistent with the RT-qPCR data, KD of EPB41L4A-AS1 reduced EPB41L4A expression, and also reduced expression of several, but not all, other genes in the TADs flanking the lncRNA (Fig. 2F). Based on these data, EPB41L4A-AS1 is a significant cis-acting activator according to TransCistor (Dhaka et al., 2024) (P=0.005 using the digital mode). The cis-regulated genes reduced by EPB41L4A-AS1 KD included NREP, a gene important for brain development, whose homolog was downregulated by genetic manipulations of regions homologous to the lncRNA locus in mice (Salnikov et al., 2024). Depletion of EPB41L4A-AS1 thus affects several genes in its vicinity.”

      (7) Related to the description of SUB1 regulation of genes at DNA and RNA levels: "Of these genes, transcripts of only 56 genes were also bound by SUB1 at the RNA level, suggesting largely distinct sets of genes targeted by SUB1 at both the DNA and the RNA levels." SUB1 binding to chromatin by Cut&Run only indicates that it is close to DNA/chromatin, and this interaction with chromatin may still likely be mediated by RNAs. The authors used SUB1 binding sites in eCLIP-seq to suggest whether it acts via RNAs, but these binding sites are often from highly expressed gene mRNAs/exons. Standard analysis may not have examined low-abundance RNAs close to the gene promoters, such as promoter antisense RNAs. The authors can examine whether, for the promoters with Cut&Run peaks of SUB1, SUB1 eCLIP-seq shows binding to the low-abundance nascent RNAs near these promoters.

      In response to a related comment by Reviewer 1, we now show that when considering expression level–matched control genes, knockdown of EPB41L4A-AS1 still significantly affects expression of SUB1 targets over controls. The results are presented in Supplementary Figure 7 (Fig. S7C).

      Based on this analysis, while there is a tendency of increased expression with increased SUB1 binding, when controlling for expression levels the effect of down-regulation of SUB1-bound RNAs upon lncRNA knockdown remains, suggesting that it is not merely a confounding effect. We have updated the text as follows:

      “We hypothesized that loss of EPB41L4A-AS1 might affect SUB1, either via the reduction in its expression or by affecting its functions. We stratified SUB1 eCLIP targets into confidence intervals, based on the number, strength and confidence of the reported binding sites. Indeed, eCLIP targets of SUB1 (from HepG2 cells profiled by ENCODE) were significantly downregulated following EPB41L4A-AS1 KD in MCF-7, with more confident targets experiencing stronger downregulation (Fig. 3C). Importantly, this still holds true when controlling for gene expression levels (Fig. S7C), suggesting that this negative trend is not due to differences in their baseline expression.”
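
The idea of an expression-matched control set can be illustrated with a small sketch. It assumes a simple greedy nearest-neighbour matching on baseline expression without replacement; this is one common way to build such a control set and is not necessarily the procedure used for Fig. S7C. All gene names and values below are invented.

```python
from statistics import median

def match_controls(targets, candidates, expression):
    """For each target, pick the unused candidate with the closest baseline expression."""
    pool = set(candidates)
    if len(pool) < len(targets):
        raise ValueError("not enough candidate genes to match every target")
    matched = {}
    for gene in sorted(targets, key=lambda g: expression[g]):
        # Ties are broken alphabetically so the result is deterministic
        best = min(pool, key=lambda c: (abs(expression[c] - expression[gene]), c))
        matched[gene] = best
        pool.remove(best)
    return matched

# Invented baseline expression (e.g. FPKM) and log2 fold-changes after knockdown
expression = {"T1": 10.0, "T2": 50.0, "C1": 9.0, "C2": 48.0, "C3": 200.0}
log2fc = {"T1": -0.8, "T2": -0.5, "C1": -0.1, "C2": 0.05, "C3": 0.3}

targets = ["T1", "T2"]            # stand-ins for protein-bound transcripts
candidates = ["C1", "C2", "C3"]   # stand-ins for unbound genes

matched = match_controls(targets, candidates, expression)
target_median = median(log2fc[g] for g in targets)
control_median = median(log2fc[c] for c in matched.values())
print(matched, target_median, control_median)
```

If the targets remain more strongly downregulated than controls chosen in this way, the difference cannot be attributed to baseline expression alone, which is the logic of the comparison described above.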

      (8) Figure 8, the cellular phenotype is interesting. As EPB41L4A-AS1 is quite widely expressed, did it affect the phenotypes similarly in other breast cancer cells? MCF7 is not a particularly relevant metastasis model. Can a similar phenotype be seen in commonly used metastatic cell models such as MDA-MB-231?

      We agree that further expanding the models in which EPB41L4A-AS1 affects cellular proliferation, migration and any other relevant phenotype is of potential interest before considering targeting this lncRNA as a therapeutic approach. However, given that 1) others have already identified similar phenotypes upon the modulation of EPB41L4A-AS1 in a variety of different systems (see Results and Discussion), and 2) we were most interested in the molecular consequences following the loss of this lncRNA, we respectfully suggest that these experiments are out of scope of the current study.

      References

      Bahar Halpern, K., Caspi, I., Lemze, D., Levy, M., Landen, S., Elinav, E., Ulitsky, I., & Itzkovitz, S. (2015). Nuclear Retention of mRNA in Mammalian Tissues. Cell Reports, 13(12), 2653–2662.

      Brabletz, T., Kalluri, R., Nieto, M. A., & Weinberg, R. A. (2018). EMT in cancer. Nature Reviews. Cancer, 18(2), 128–134.

      Brzek, A., Cichocka, M., Dolata, J., Juzwa, W., Schümperli, D., & Raczynska, K. D. (2018). Positive cofactor 4 (PC4) contributes to the regulation of replication-dependent canonical histone gene expression. BMC Molecular Biology, 19(1), 9.

      Cheng, Y., Wang, S., Zhang, H., Lee, J.-S., Ni, C., Guo, J., Chen, E., Wang, S., Acharya, A., Chang, T.-C., Buszczak, M., Zhu, H., & Mendell, J. T. (2024). A non-canonical role for a small nucleolar RNA in ribosome biogenesis and senescence. Cell, 187(17), 4770–4789.e23.

      Conesa, C., & Acker, J. (2010). Sub1/PC4 a chromatin associated protein with multiple functions in transcription. RNA Biology, 7(3), 287–290.

      Contreras, X., Depierre, D., Akkawi, C., Srbic, M., Helsmoortel, M., Nogaret, M., LeHars, M., Salifou, K., Heurteau, A., Cuvier, O., & Kiernan, R. (2023). PAPγ associates with PAXT nuclear exosome to control the abundance of PROMPT ncRNAs. Nature Communications, 14(1), 6745.

      Costea, J., Schoeberl, U. E., Malzl, D., von der Linde, M., Fitz, J., Gupta, A., Makharova, M., Goloborodko, A., & Pavri, R. (2023). A de novo transcription-dependent TAD boundary underpins critical multiway interactions during antibody class switch recombination. Molecular Cell, 83(5), 681–697.e7.

      Das, C., Hizume, K., Batta, K., Kumar, B. R. P., Gadad, S. S., Ganguly, S., Lorain, S., Verreault, A., Sadhale, P. P., Takeyasu, K., & Kundu, T. K. (2006). Transcriptional coactivator PC4, a chromatin-associated protein, induces chromatin condensation. Molecular and Cellular Biology, 26(22), 8303–8315.

      Dhaka, B., Zimmerli, M., Hanhart, D., Moser, M. B., Guillen-Ramirez, H., Mishra, S., Esposito, R., Polidori, T., Widmer, M., García-Pérez, R., Julio, M. K., Pervouchine, D., Melé, M., Chouvardas, P., & Johnson, R. (2024). Functional identification of cis-regulatory long noncoding RNAs at controlled false discovery rates. Nucleic Acids Research, 52(6), 2821–2835.

      Didiot, M.-C., Ferguson, C. M., Ly, S., Coles, A. H., Smith, A. O., Bicknell, A. A., Hall, L. M., Sapp, E., Echeverria, D., Pai, A. A., DiFiglia, M., Moore, M. J., Hayward, L. J., Aronin, N., & Khvorova, A. (2018). Nuclear Localization of Huntingtin mRNA Is Specific to Cells of Neuronal Origin. Cell Reports, 24(10), 2553–2560.e5.

      Dongre, A., & Weinberg, R. A. (2019). New insights into the mechanisms of epithelial-mesenchymal transition and implications for cancer. Nature Reviews. Molecular Cell Biology, 20(2), 69–84.

      Dubois, J.-C., Bonnell, E., Filion, A., Frion, J., Zimmer, S., Riaz Khan, M., Teplitz, G. M., Casimir, L., Méthot, É., Marois, I., Idrissou, M., Jacques, P.-É., Wellinger, R. J., & Maréchal, A. (2025). The single-stranded DNA-binding factor SUB1/PC4 alleviates replication stress at telomeres and is a vulnerability of ALT cancer cells. Proceedings of the National Academy of Sciences of the United States of America, 122(2), e2419712122.

      Garavís, M., & Calvo, O. (2017). Sub1/PC4, a multifaceted factor: from transcription to genome stability. Current Genetics, 63(6), 1023–1035.

      Gil, N., & Ulitsky, I. (2020). Regulation of gene expression by cis-acting long non-coding RNAs. Nature Reviews. Genetics, 21(2), 102–117.

      Hou, Y., Gan, T., Fang, T., Zhao, Y., Luo, Q., Liu, X., Qi, L., Zhang, Y., Jia, F., Han, J., Li, S., Wang, S., & Wang, F. (2022). G-quadruplex inducer/stabilizer pyridostatin targets SUB1 to promote cytotoxicity of a transplatinum complex. Nucleic Acids Research, 50(6), 3070–3082.

      Jan, C. H., Friedman, R. C., Ruby, J. G., & Bartel, D. P. (2011). Formation, regulation and evolution of Caenorhabditis elegans 3’UTRs. Nature, 469(7328), 97–101.

      Kaypee, S., Ochiai, K., Shima, H., Matsumoto, M., Alam, M., Ikura, T., Kundu, T. K., & Igarashi, K. (2025). Positive coactivator PC4 shows dynamic nucleolar distribution required for rDNA transcription and protein synthesis. Cell Communication and Signaling : CCS, 23(1), 283.

      Lai, F., Damle, S. S., Ling, K. K., & Rigo, F. (2020). Directed RNase H Cleavage of Nascent Transcripts Causes Transcription Termination. Molecular Cell, 77(5), 1032–1043.e4.

      Lee, J.-S., & Mendell, J. T. (2020). Antisense-Mediated Transcript Knockdown Triggers Premature Transcription Termination. Molecular Cell, 77(5), 1044–1054.e3.

      Lubelsky, Y., & Ulitsky, I. (2018). Sequences enriched in Alu repeats drive nuclear localization of long RNAs in human cells. Nature, 555(7694), 107–111.

      Ly, S., Didiot, M.-C., Ferguson, C. M., Coles, A. H., Miller, R., Chase, K., Echeverria, D., Wang, F., Sadri-Vakili, G., Aronin, N., & Khvorova, A. (2022). Mutant huntingtin messenger RNA forms neuronal nuclear clusters in rodent and human brains. Brain Communications, 4(6), fcac248.

      Maranon, D. G., & Wilusz, J. (2020). Mind the Gapmer: Implications of Co-transcriptional Cleavage by Antisense Oligonucleotides. Molecular Cell, 77(5), 932–933.

      Monziani, A., & Ulitsky, I. (2023). Noncoding snoRNA host genes are a distinct subclass of long noncoding RNAs. Trends in Genetics : TIG, 39(12), 908–923.

      Mortusewicz, O., Roth, W., Li, N., Cardoso, M. C., Meisterernst, M., & Leonhardt, H. (2008). Recruitment of RNA polymerase II cofactor PC4 to DNA damage sites. The Journal of Cell Biology, 183(5), 769–776.

      Much, C., Lasda, E. L., Pereira, I. T., Vallery, T. K., Ramirez, D., Lewandowski, J. P., Dowell, R. D., Smallegan, M. J., & Rinn, J. L. (2024). The temporal dynamics of lncRNA Firre-mediated epigenetic and transcriptional regulation. Nature Communications, 15(1), 6821.

      Ntini, E., Louloupi, A., Liz, J., Muino, J. M., Marsico, A., & Ørom, U. A. V. (2018). Long ncRNA A-ROD activates its target gene DKK1 at its release from chromatin. Nature Communications, 9(1), 1636.

      Oo, J. A., Warwick, T., Pálfi, K., Lam, F., McNicoll, F., Prieto-Garcia, C., Günther, S., Cao, C., Zhou, Y., Gavrilov, A. A., Razin, S. V., Cabrera-Orefice, A., Wittig, I., Pullamsetti, S. S., Kurian, L., Gilsbach, R., Schulz, M. H., Dikic, I., Müller-McNicoll, M., … Leisegang, M. S. (2025). Long non-coding RNAs direct the SWI/SNF complex to cell type-specific enhancers. Nature Communications, 16(1), 131.

      Open2C, Abdennur, N., Abraham, S., Fudenberg, G., Flyamer, I. M., Galitsyna, A. A., Goloborodko, A., Imakaev, M., Oksuz, B. A., Venev, S. V., & Xiao, Y. (2024). Cooltools: Enabling high-resolution Hi-C analysis in Python. PLoS Computational Biology, 20(5), e1012067.

      Potapova, T. A., Unruh, J. R., Conkright-Fincham, J., Banks, C. A. S., Florens, L., Schneider, D. A., & Gerton, J. L. (2023). Distinct states of nucleolar stress induced by anticancer drugs. https://doi.org/10.7554/eLife.88799.

      Ray, D., Laverty, K. U., Jolma, A., Nie, K., Samson, R., Pour, S. E., Tam, C. L., von Krosigk, N., Nabeel-Shah, S., Albu, M., Zheng, H., Perron, G., Lee, H., Najafabadi, H., Blencowe, B., Greenblatt, J., Morris, Q., & Hughes, T. R. (2023). RNA-binding proteins that lack canonical RNA-binding domains are rarely sequence-specific. Scientific Reports, 13(1), 5238.

      Salgado, S., Abreu, P. L., Moleirinho, B., Guedes, D. S., Larcombe, L., & Azzalin, C. M. (2024). Human PC4 supports telomere stability and viability in cells utilizing the alternative lengthening of telomeres mechanism. EMBO Reports, 25(12), 5294–5315.

      Salnikov, P., Korablev, A., Serova, I., Belokopytova, P., Yan, A., Stepanchuk, Y., Tikhomirov, S., & Fishman, V. (2024). Structural variants in the Epb41l4a locus: TAD disruption and Nrep gene misregulation as hypothetical drivers of neurodevelopmental outcomes. Scientific Reports, 14(1), 5288.

      Sarshad, A. A., Juan, A. H., Muler, A. I. C., Anastasakis, D. G., Wang, X., Genzor, P., Feng, X., Tsai, P.-F., Sun, H.-W., Haase, A. D., Sartorelli, V., & Hafner, M. (2018). Argonaute-miRNA Complexes Silence Target mRNAs in the Nucleus of Mammalian Stem Cells. Molecular Cell, 71(6), 1040–1050.e8.

      Tafforeau, L., Zorbas, C., Langhendries, J.-L., Mullineux, S.-T., Stamatopoulou, V., Mullier, R., Wacheul, L., & Lafontaine, D. L. J. (2013). The complexity of human ribosome biogenesis revealed by systematic nucleolar screening of Pre-rRNA processing factors. Molecular Cell, 51(4), 539–551.

      Unfried, J. P., & Ulitsky, I. (2022). Substoichiometric action of long noncoding RNAs. Nature Cell Biology, 24(5), 608–615.

      Van Nostrand, E. L., Freese, P., Pratt, G. A., Wang, X., Wei, X., Xiao, R., Blue, S. M., Chen, J.-Y., Cody, N. A. L., Dominguez, D., Olson, S., Sundararaman, B., Zhan, L., Bazile, C., Bouvrette, L. P. B., Bergalet, J., Duff, M. O., Garcia, K. E., Gelboin-Burkhart, C., … Yeo, G. W. (2020). A large-scale binding and functional map of human RNA-binding proteins. Nature, 583(7818), 711–719.

      Yang, K., Wang, M., Zhao, Y., Sun, X., Yang, Y., Li, X., Zhou, A., Chu, H., Zhou, H., Xu, J., Wu, M., Yang, J., & Yi, J. (2016). A redox mechanism underlying nucleolar stress sensing by nucleophosmin. Nature Communications, 7, 13599.

      Youssef, K. K., Narwade, N., Arcas, A., Marquez-Galera, A., Jiménez-Castaño, R., Lopez-Blau, C., Fazilaty, H., García-Gutierrez, D., Cano, A., Galcerán, J., Moreno-Bueno, G., Lopez-Atalaya, J. P., & Nieto, M. A. (2024). Two distinct epithelial-to-mesenchymal transition programs control invasion and inflammation in segregated tumor cell populations. Nature Cancer, 5(11), 1660–1680.

      Yu, L., Ma, H., Ji, X., & Volkert, M. R. (2016). The Sub1 nuclear protein protects DNA from oxidative damage. Molecular and Cellular Biochemistry, 412(1-2), 165–171.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      In their previous publication (Dong et al. Cell Reports 2024), the authors showed that citalopram treatment reduced tumor size by binding to the E380 site of GLUT1, rather than the classical citalopram receptor, and inhibiting the glycolytic metabolism of HCC cells. Given that C5aR1 was also identified as a potential receptor of citalopram in the previous report, the authors focused on exploring the potential immune-dependent anti-tumor effect of citalopram via C5aR1. C5aR1 was found to be expressed on tumor-associated macrophages (TAMs), and citalopram administration showed potential to improve the stability of C5aR1 in vitro. Through macrophage depletion and adoptive transfer approaches in HCC mouse models, the data demonstrated the potential importance of C5aR1-expressing macrophages in the anti-tumor effect of citalopram in vivo. Mechanistically, their in vitro data suggested that citalopram may regulate the phagocytosis potential and polarization of macrophages through C5aR1. Next, they tried to investigate the direct link between citalopram and CD8+ T cells by including an additional MASH-associated HCC mouse model. Their data suggest that citalopram may upregulate the glycolytic metabolism of CD8+ T cells, probably via GLUT3- but not GLUT1-mediated glucose uptake. Lastly, as the systemic 5-HT level is down-regulated by citalopram, the authors analyzed the association between low 5-HT and superior CD8+ T cell function against tumors. Although the data are informative, the rationale for working on additional mechanisms and the logical link among the different parts are not clear. In addition, some of the conclusions are not fully supported by the current data.

      Strengths: 

      The idea of repurposing drugs already in clinical use shows great potential for immediate clinical translation. The data here suggest that the antidepressant citalopram plays an immune-regulatory role in TAMs via a new target, C5aR1, in HCC.

      Comments on revised version: 

      The authors have addressed most of my concerns about the paper.

      We thank the reviewer. We appreciate the reviewer’s constructive suggestions, which helped improve the clarity and robustness of the study.

      Reviewer #2 (Public review):

      Summary: 

      Dong et al. present a thorough investigation into the potential of repurposing citalopram, an SSRI, for hepatocellular carcinoma (HCC) therapy. The study highlights the dual mechanisms by which citalopram exerts anti-tumor effects: reprogramming tumor-associated macrophages (TAMs) toward an anti-tumor phenotype via C5aR1 modulation and suppressing cancer cell metabolism through GLUT1 inhibition, while enhancing CD8+ T cell activation. The findings emphasize the potential of drug repurposing strategies and position C5aR1 as a promising immunotherapeutic target.

      Strengths:

      It provides detailed evidence of citalopram's non-canonical action on C5aR1, demonstrating its ability to modulate macrophage behavior and enhance CD8+ T cell cytotoxicity. The use of DARTS assays, in silico docking, and gene signature network analyses offers robust validation of drug-target interactions. Additionally, the dual focus on immune cell reprogramming and metabolic suppression presents a comprehensive strategy for HCC therapy. By highlighting the potential for existing drugs like citalopram to be repurposed, the study also emphasizes the feasibility of translational applications. During revision, the authors experimentally demonstrated that TAM has lower GLUT1, which further strengthens their claim of C5aR1 modulation-dependent TAM improvement for tumor therapy.

      Weaknesses:

      The authors proposed that CD8+ T cells have a TAM-independent role upon citalopram treatment. However, this claim requires further investigation to confirm that the effect is truly "TAM independent".

      We appreciate the reviewer’s insightful comment regarding the interpretation of CD8<sup>+</sup> T cell roles. In this study, in vitro analyses show that citalopram directly enhances CD8<sup>+</sup> T cell activity, as evidenced by increased CFSE proliferation, upregulation of activation markers, and cytotoxic effector readouts (Figures S10A–E). Accordingly, we infer that citalopram can activate CD8<sup>+</sup> T cells independently of TAMs in vitro.

      Our in vivo data indicate that the primary anti-tumor mechanism of citalopram involves targeting C5aR1<sup>+</sup> TAMs, which subsequently enhances CD8<sup>+</sup> T cell immunity. This conclusion is supported by the near-complete ablation of citalopram’s therapeutic effect upon TAM depletion with clodronate liposomes (Figure S5). Additionally, citalopram reduces serum serotonin (5-HT) levels (Figure 4E), recapitulating the serotonergic state of Tph1<sup>−/−</sup> mice. Notably, the anti-tumor effect and CD8<sup>+</sup> T cell activation induced by citalopram exceed those observed in Tph1<sup>−/−</sup> mice (Figures 4G–I), suggesting that 5-HT reduction contributes to CD8<sup>+</sup> T cell activation but operates alongside other mechanisms in vivo, prominently including TAM targeting. As suggested, we further tested CD8<sup>+</sup> T cell activity in the context of macrophage depletion. The result showed that citalopram did not further enhance CD8<sup>+</sup> T cell cytotoxicity after macrophage depletion, indicating that TAM-dependent pathways are central to CD8<sup>+</sup> T cell–mediated anti-tumor immunity and largely underlie the anti-tumor effects of citalopram.

      To accurately reflect our main findings, we have made several revisions to the manuscript. First, we have revised the title to “Citalopram exhibits immune-dependent anti-tumor effects by modulating C5aR1<sup>+</sup> TAMs”. In the Results section, the Conclusions have been updated to: “These data not only corroborate recent reports that SSRIs modulate CD8<sup>+</sup> T cell function via serotonergic-dependent mechanism, but also reveals additional in vivo regulatory avenues by which citalopram affects CD8<sup>+</sup> T cells, such as its ability to reprogram C5aR1<sup>+</sup> TAMs. Notably, in the context of macrophage depletion, CD8<sup>+</sup> T cell cytotoxicity was not further enhanced by citalopram, indicating that TAM-dependent pathways are central to CD8<sup>+</sup> T cell-mediated anti-tumor immunity and largely underlie the anti-tumor effects of citalopram”. In the Discussion section, we have included the following content: “Although citalopram directly stimulates CD8<sup>+</sup> T cells in vitro, the TAM-independent activation is not evident in vivo within the complex TME, as CD8<sup>+</sup> T cell responses are abolished by macrophage depletion, indicating that the in vivo effects of citalopram on CD8<sup>+</sup> T cells and tumor growth are largely TAM-dependent”.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Fig S5 and Fig 3: To improve clarity regarding the roles of TAMs and CD8+ T cells, can the authors experimentally demonstrate the macrophage-independent function of CD8+ T cells? An experiment in Fig 3J using or not using Clodro-Liposome to deplete TAMs would be more informative.

      We thank the reviewer for the insightful suggestion. In this study, in vitro analyses show that citalopram directly enhances CD8<sup>+</sup> T cell activity, as evidenced by increased CFSE proliferation, upregulation of activation markers, and cytotoxic effector readouts (Figures S10A–E). We therefore conclude that citalopram can activate CD8<sup>+</sup> T cells independently of TAMs in vitro. Previously, in Figure S5, we analyzed the therapeutic effect of citalopram after macrophage depletion by clodronate liposomes and also probed the immune profiles. The results showed that CD8<sup>+</sup> T cell cytotoxic activities were not significantly affected by citalopram in this context (Figure S5E), indicating that the TAM-dependent pathway is central to CD8<sup>+</sup> T cell-mediated anti-tumor immunity and to the anti-tumor effects of citalopram. We have incorporated this result into the revised manuscript.

      Fig S4: The figure panel showing sample/treatment annotations is missing.

      Thank you for pointing this out. We have updated Fig. S4 to include explicit sample identifiers, treatment group labels, and drug concentrations.

      Since Glut3 is vital in both TAMs and CD8+ T cells, the authors should discuss the interaction between Glut3 and Citalopram. Additionally, include details about the structural homology between Glut1 and Glut3 in the discussion.

      Thank you for the suggestion. Citalopram was docked into the GLUT1 substrate-binding pocket, with the best poses showing an electrostatic interaction centered on E380 accompanied by hydrophobic contacts within the pocket (our previous publication, Dong et al. Cell Reports 2024). Although GLUT1 and GLUT3 share a highly conserved core substrate-binding pocket, isoform-specific regulation arises from features outside the canonical site. Structural homology between GLUT1 and GLUT3 is high in the transmembrane core, but regulatory features, such as the cytosolic Sugar Porter (SP) motif network, the conserved A motif, lipid interfaces, and gating dynamics, differ between the two isoforms (PMID: 33536238). These regulatory differences can alter pocket accessibility, coupling to conformational transitions, and allosteric communication with the cytosol, such that a ligand binding GLUT1 in the inward-facing state may not stabilize a GLUT3 conformation that yields appreciable transport inhibition. Consistently, functional experiments have indicated robust GLUT1 engagement in cancer cells (Dong et al. Cell Reports 2024), while equivalent GLUT3 inhibition has not been observed in TAMs (Figure S8), suggesting isoform-selective targeting by citalopram. We have included this discussion in the revised manuscript.

      Fig 3O: Please clarify the statement regarding the requirements of CD8 T cells for the pro-tumor phenotype of C5aR1+ TAMs. Specify whether this relates to a pro- or anti-tumor effect of CD8 T cells.

      Thanks. As suggested, we have improved the statement as follows: “depletion of CD8<sup>+</sup> T cells abrogated the C5aR1<sup>+</sup> TAM-mediated enhancement of tumor growth (Figure 3O), suggesting that the anti-tumor effects of CD8<sup>+</sup> T cells are required for the pro-tumor phenotype of C5aR1<sup>+</sup> TAMs”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work presents an interesting circuit dissection of the neural system allowing a ctenophore to keep its balance and orientation in its aquatic environment by using a fascinating structure called the statocyst. By combining serial-section electron microscopy with behavioral recordings, the authors found a population of neurons that exists as a syncytium and could associate these neurons with specific functions related to controlling the beating of cilia located in the statocyst. The type A ANN neurons participate in arresting cilia beating, and the type B ANN neurons participate in resuming cilia beating and increasing their beating frequency.

      Moreover, the authors found that bridge cells are connected with the ANN neurons, giving them the role of rhythmic modulators.

      From these observations, the authors conclude that the control is coordination instead of feedforward sensory-motor function, a hypothesis that had been put forth in the past but could not be validated until now. They also compare it to the circuitry implementing a similar behavior in a species that belongs to a different phylum, where the nervous system is thought to have evolved separately.

      Therefore, this work significantly advances our knowledge of the circuitry implementing the control of the cilia that participate in statocyst function, which ultimately allows the animal to correct its orientation. It represents an example of systems neuroscience explaining how the nervous system allows an animal to solve a specific problem and puts it in an evolutionary perspective, showing a convincing case of convergent evolution.

      Strengths:

      The evidence for how the circuitry is connected is convincing. Pictures of synapses showing the direction of connectivity are clear, and there are good reasons to believe that the diagram inferred is valid, even though we can always expect that some connections are missing.

      The evidence for how the cilia change their beating frequency is also convincing, and the paradigm and recording methods seem pretty robust.

      The authors achieved their aims, and the results support their conclusions. This work impacts its field by presenting a mechanism by which ctenophores correct their balance, which will provide a template for comparison with other sensory systems.

      Thank you very much for these comments.

      Weaknesses:

      The evidence supporting the claim that the neural circuitry presented here controls the cilia beating is more correlational because it only relies on the fact that the location of the two types of ANN neurons coincides with the quadrants that are affected in the behavioral recordings. Discussing ways by which causality could be established might be helpful.

      We have now added further discussion in a new “Future Directions” section, explaining that, for example, calcium imaging or targeted neuron ablations could be used in future work to establish causality. This would require the development of genetic delivery techniques, e.g. to introduce a GCaMP calcium sensor or transgenic reporters.

      The explanation of the relevance of this work could be improved. The conclusion that the work hints at coordination instead of feedforward sensory-motor control is explained over only a few lines. The authors could provide a more detailed explanation of how the two models compete (coordination vs feedforward sensory-motor control), and why choosing one option over the other could provide advantages in this context.

      We added a more detailed explanation about the two types of model and why we believe that a coordination model is more compatible with our connectome data.

      “An alternative model for the function of the nerve net would be a feedforward sensory-motor system, in which balancer cells provide mechanosensory input to motor effectors via the nerve net, similar to a reflex arc. None of our observations support such a sensory-motor model. There are no synaptic pathways from balancer cells or any other sensory cells to the nerve net. The only synaptic input to ANNs comes from the bridge cells (discussed below) and from each other. The three synaptically interconnected ANNs may generate endogenous rhythm that controls balancer cilia and is influenced by bridge input. ANNs may also be influenced by neuropeptides secreted by other aboral organ neurons. Such chemical inputs may underlie the flexibility of gravitaxis and its modulation by other cues (e.g. light). Overall, the coordination model parsimoniously explains both the ANN wiring topology and the observed dynamics, whereas a simple feedforward reflex does not.”

      Since the fact that the ANN neurons form a syncytium is an important finding of this study, it would be useful to have additional illustrations of it. For instance, pictures showing anastomosing membranes could typically be added in Figure 2.

      We have now included a movie (Video 3) showing a volumetric reconstruction of a segment of an ANN neuron, which highlights the anastomosing morphology in greater detail than static images.

      “Video 3. Volumetric reconstruction of a single ANN Q1-4 neuron showing syncytial soma (cyan) and nuclei (magenta). The rotating view highlights the anastomosing morphology, although not all fine details could be reconstructed due to data limitations.”

      Also, to better establish the importance of the study, it could be useful to explain why the balancers’ cilia spontaneously beat in the first place (instead of being static and just acting as stretch sensors).

      We have discussed in more detail why it may be important for the balancer cilia to beat.

      “The observation that balancer cilia beat spontaneously, even in the absence of external tilt, suggests that they are active sensory oscillators rather than static stretch sensors. Their spontaneous beating could set a dynamic baseline of sensitivity, which can then be modulated by ANN inputs or sensory changes during tilt. Such a dynamic system may be more sensitive to small deflections and be more responsive [@Lowe1997]. Thus, the regulated beating of balancer cilia should not be seen as noise, but as an adaptive feature that enables flexible and robust graviceptive responses. The ctenophore balancer may thus use active ciliary oscillations for enhanced sensorimotor integration similar to other sensory systems [@Wan_2023].”

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors describe the production of a high-resolution connectome for the statocyst of a ctenophore nervous system. This study is of particular interest because of the apparent independent evolution of the ctenophore nervous system. The statocyst is a component of the aboral organ, which is used by ctenophores to sense gravity and regulate the activity of the organ’s balancer cilia. The EM reconstruction of the aboral organ was carried out on a five-day-old larva of the model ctenophore Mnemiopsis leidyi. To place their connectome data in a functional context, the authors used high-speed imaging of ciliary beating in immobilized larvae. With these data, the authors were able to model the circuitry used for gravity sensing in a ctenophore larva.

      Strengths:

      Because it is apparently the sister phylum to all other metazoans, Ctenophora is a particularly important group for studies of metazoan evolution. Thus, this work has much to tell us about how animals evolved. Added to that is the apparent independent evolution of the ctenophore nervous system. This study provides the first high-resolution connectomic analysis of a portion of a ctenophore nervous system, extending previous studies of the ctenophore nervous system carried out by Sid Tamm. As such, it establishes the methodology for high-resolution analysis of the ctenophore nervous system. While the generation of a connectome is in and of itself an important accomplishment, the coupling of the connectome data with analysis of the beating frequency of balancer cell cilia provides a functional context for understanding how the organization of the neural circuitry in the aboral organ carries out gravity sensing. In addition, the authors identified a new type of syncytial neuron in Mnemiopsis. Interestingly, the authors show that the neural circuitry controlling cilia beating in Mnemiopsis shares features with the circuitry that controls ciliary movement in the annelid Platynereis, suggesting convergent evolution of this circuitry in the two organisms. The data in this paper are of high quality, and the analyses have been thoroughly and carefully done.

      Weaknesses:

      The paper has no obvious weaknesses.

      We thank the reviewer for these comments.

      Reviewer #3 (Public review):

      Summary:

      It has been a long time since I enjoyed reviewing a paper as much as this one. In it, the authors generate an unprecedented view of the aboral organ of a 5-day-old ctenophore. They proceed to derive numerous insights by reconstructing the populations and connections of cell types, with up to 150 connections from the main Q1-4 neuron.

      Strengths:

      The strengths of the analysis are the sophisticated imaging methods used, the labor-intensive reconstruction of individual neurons and organelles, and especially the mapping of synapses. The synaptic connections to and from the main coordinating neurons allow the authors to create a polarized network diagram for these components of the aboral organ. These connections give insight into the potential functions of the major neurons. This also gives some unexpected results, particularly the lack of connections from the balancer system to the coordinating system.

      Thank you for these positive comments on the paper.

      Weaknesses:

      There were no significant weaknesses in the paper - only a slate of interesting unanswered questions to motivate future studies.

      Recommendations for the authors:

      Reviewing Editor Comments:

      In consultation, the reviewers recommend that improving the evidence to “exceptional” would require additional perturbation experiments (e.g., ablation of specific neurons), as Reviewer 1 suggests. They also recommend adding a “Future Directions” section to the manuscript, because it opens up so many new experimental directions.

      We have added a new “Future Directions” section at the end of the Discussion. Carrying out the proposed perturbation or calcium imaging experiments would require significant additional work and method development. We are actively working on establishing mRNA and DNA injection into ctenophore zygotes to enable live imaging, cell labelling or ablations in the future.

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improved or additional experiments, data, or analyses:

      To establish causality (neurons control balancer cilia), an important experiment would be to manipulate each of these neuronal populations (e.g., by ablating them) and measure the effect of these ablations on the beating frequency of the balancer cilia of the four quadrants. Moreover, direct observation of neuronal activity (e.g., by using calcium imaging) would also provide more compelling evidence for neuronal control.

      We agree with the reviewer that such perturbation experiments would be needed to establish causality. Such experiments are not yet possible in ctenophores and would require significant technology development. We discuss such experiments in the “Future Directions” section and also place this in the context of the currently available techniques in ctenophores. We are actively working on this, but waiting for such technological breakthroughs and new experiments would significantly delay the publication of a version of record of the paper.

      Recommendations for improving the writing and presentation:

      ANN neurons are described in great detail, though SNN neurons are described more loosely. Perhaps a more detailed description of SNN neurons would be helpful.

      We added the information on SNNs to show that these cells are distinct from the ANN neurons. Since our focus is on the aboral organ, we did not aim for a comprehensive reconstruction of SNNs. Several of the processes of the SNNs are also truncated and outside our EM volume. We have nevertheless added additional details about the morphology and connectivity of SNN neurons.

      “Near the periphery of the aboral organ, we identified four further anastomosing nerve-net neurons. These resembled the previously reported syncytial subepithelial nerve net (SNN) neurons in the body wall of Mnemiopsis (Figure 2–figure supplement 1C–G) and were clearly distinct from the ANN neurons (both in location and morphology). SNN neurons show a blebbed morphology and contain dense core vesicles [@Burkhardt2023] but no synapses.”

      Minor corrections to the text and figures:

      (1) Figure 2 C): “mitochondia” instead of “mitochondria”.

      corrected

      (2) Figure 3. Title: “balancer and and bridge”.

      corrected

      (3) Figure 3.C) “shown in xxx color”

      corrected

      Reviewer #2 (Recommendations for the authors):

      Clearer usage of the terms statocyst, aboral organ, aboral nerve net, statolith, dome, and lithocytes would be helpful. For readers not familiar with ctenophore anatomy, things can get a bit confusing. A single schematic with all of these terms would be helpful. In Figure 1E, there is a label “dc”. Should this be “do”?

      We have added an annotated schematic to Figure 1, explaining these terms.

      Figure 1C “The statocyst is a cavity-like organ enclosed by the dome cilia (do), which contains the statolith formed by lithocytes (li) and supported by the balancer cilia (bal).”

      Reviewer #3 (Recommendations for the authors):

      My comments are numerous, but mostly minor suggestions for improving the clarity.

      [Suggested insertions/changes are indicated by square brackets]

      (1) [It would be much easier to review this if there were line numbers, or with a double-spaced manuscript that was more accommodating for markup.]

      Thank you for this comment. We have increased the line spacing in the revised version. (We set the CSS line-height property on the html ‘body’ element to 2em).

      (2) The terms statolith, statocyst, and lithocytes can be confusing, so it would be nice to have an upfront definition of how they relate to each other.

      We have now explained these terms in the Introduction and have also improved the annotation of Figure 1.

      Figure 1C “The statocyst is a cavity-like organ enclosed by the dome cilia (do), which contains the statolith formed by lithocytes (li) and supported by the balancer cilia (bal).”

      (3) Statolith is spelled as statolyth in the early pages, but statolith in the later pages. I think -lith is more common, but in any case, these should be standardized.

      corrected to ‘statolith’

      ABSTRACT:

      (1) Differential load[s] on the balancer cilia [lead] to altered

      changed

      (2) We used volume electron microscopy (vEM) to image the aboral organ.

      changed

      (3) also form reciprocal connections with the bridge cells.

      corrected

      INTRODUCTION:

      (1) “identify conserved neuronal markers in ctenophores” - confusing - does this mean conserved across ctenophores, or conserved in ctenophores and other animals?

      changed to “classical neuronal markers”

      (2) “either increase or decrease their [ciliary] activity, indicating” - otherwise it sounds like the balancers are increasing activity.

      changed to “balancer cells may either increase or decrease their ciliary activity”

      (3) after “matches the setup used in high-speed imagine experiments”, it might be nice to add a statement like “Future studies could potentially investigate activity in the inverted orientation, when the statolith is suspended below the cilia, to see if the response differs.”

      In this sentence we referred to the orientation of the animals in our figures. There is a consensus among ctenophore researchers that when depicting ctenophores, the aboral organ should face downwards. However, for this paper we chose the opposite orientation to better match our experiments and help interpret the results. We changed the text to: “In this study, we represent ctenophores with their aboral organ facing upwards (“balancer-up” posture), as this configuration facilitates intuitive interpretation of balance-like functions and matches the setup used in high-speed imaging experiments.”

      We added the sentences “Future experiments could also explore how orientation affects the response of balancer cilia. For example, when the statolith is suspended below the cilia (the “balancer-down” posture), ciliary beating patterns may differ from what we observed here in the “balancer-up” configuration.” to the “Future Directions” section.

      (4) “abolished by calcium[-]channel inhibitors”

      corrected

      (5) “By functional imaging, we uncovered” - It is not clear what functional imaging is. Maybe a few-word definition here, and be sure to explain in the methods.

      changed to “By high-speed ciliary imaging”. The details of the imaging are explained in the Methods section under “Imaging the Activity of Balancer Cilia”.

      RESULTS:

      (1) “five-day-old” - is it worth saying post-fertilization here?

      Thank you for pointing this out. In accordance with Presnell et al. (2022), we use post-hatching as the reference. We have revised the text in the Materials and Methods section to read: “5-day-old (5 days post-hatching)”

      (2) “We classified these cells into cell types [based on …]” - specify a bit about how you classified them based on morphology, the presence of organelles, etc.

      We added a clarification. “Our classification was based on i) ultrastructural features (e.g. number of cilia), ii) cell morphology (e.g. nerve net or bridge cells), iii) unique organelles (e.g. lamellate body, plumose cells), and iv) similarities to cell types previously described by EM. Our classification agrees with the cell types identified in the 1-day-old larva [@ferraioli2025].”

      (3) “CATMAID only supports [bifurcating] skeleton trees” - Correct?

      yes, a node in CATMAID cannot be fused to another node of the same skeleton to represent anastomoses
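
      The constraint described above can be illustrated with a toy graph check. This is a minimal sketch under stated assumptions and does not use CATMAID's actual data model or API: the node numbers, the edge lists, and the helper `is_tree` are hypothetical. It only shows that a branching neurite satisfies the tree property, whereas adding one fusion edge between two nodes of the same skeleton (an anastomosis) closes a loop and breaks it.

      ```python
      # Illustrative only: hypothetical toy skeleton, not CATMAID's data model or API.

      def is_tree(nodes, edges):
          """Return True if the undirected graph is connected and has no loop."""
          parent = {n: n for n in nodes}  # union-find forest, one set per node

          def find(n):
              # Follow parent pointers to the set representative (with path halving).
              while parent[n] != n:
                  parent[n] = parent[parent[n]]
                  n = parent[n]
              return n

          for a, b in edges:
              ra, rb = find(a), find(b)
              if ra == rb:
                  # a and b are already connected, so this edge would close a loop.
                  return False
              parent[ra] = rb
          # A loop-free graph is connected exactly when it has n - 1 edges.
          return len(edges) == len(nodes) - 1


      # A neurite that branches at node 1 into two arms: 1-2-3 and 1-4-5.
      nodes = [1, 2, 3, 4, 5]
      branching = [(1, 2), (2, 3), (1, 4), (4, 5)]

      # The same neurite with the two arm tips fused (a hypothetical anastomosis).
      anastomosing = branching + [(3, 5)]

      print(is_tree(nodes, branching))     # branching skeleton: a valid tree
      print(is_tree(nodes, anastomosing))  # fused arms: no longer a tree
      ```

      In this sketch the fused morphology can only be stored as a tree if the fusion edge is omitted, which is why an anastomosing cell has to be annotated by other means.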

      FIGURE 1:

      (1) It is not worth redrawing and renumbering everything, but I wish the lateral view in A matched the rotated aboral view in B, instead of having to do two rotations to get the alignment to coincide. (Rotating panel B 90° clockwise would make them match, but then it wouldn’t coincide with all the subsequent figures.)

      Thank you for the suggestion. We have replaced panel A with a lateral view that now matches panel B.

      (2) The labels on Figure 1 are a mix of two typefaces (Helvetica and Myriad?). They should be standardized to all use one typeface (preferably Helvetica).

      we have changed the font to Helvetica

      (3) Panel C legend: arrows are not really arrows. Say “Eye icons” or something like that. Can you show the location of the anal pores in the DIC image?

      Changed to ‘eye icons’. The anal pores are usually closed and open only briefly, so it is not clear exactly where they would be, and indicating their position would be misleading.

      (4) Panel F, I cannot see the lines mentioned in the legend at all, except for maybe a tiny wisp in a couple of places. Either omit or make visible.

      changed to “The spheres indicate the position of nuclei in the reconstructed cells.”

      (5) Panel G. “Cells are color coded according to quadrants”… but unfortunately, the color scale is 90° off of what is presented in the rest of the panels and the paper. Q1 and Q3 have been blue, but now Q2+4 are blue/purple, while Q1+3 are orange/yellow. Again, it seems like too much work to recolor panel G, but in future, it would be nice to maintain that consistency, especially since other panels specifically mention the consistent colors.

      We have changed the color code in panels B, C and E to match G and the subsequent panels/figures.

      RESULTS: Aboral synaptic nerve net

      (1) “We reconstructed three aboral nerve-net (ANN) neurons” - out of how many total? Were these three just the first ones traced, or are they likely to be all of the multi-domain neurons? One can’t tell if these are the top 3 (out of X), or if there are other multi-quad neurons that were not traced. Are there any Q1Q4 or Q2Q3 neurons? Specify overall composition.

      There are only three ANN neurons in the aboral organ. These are all completely reconstructed and contained within the volume. We have clarified this in the text. “We identified and reconstructed three aboral nerve-net (ANN) neurons, each exhibiting a syncytial morphology characterized by anastomosing membranes and multiple nuclei (ranging from two to five) (Figure 2A and B, Figure 2–figure supplement 1C). These three neurons are the only fully reconstructed ANN neurons contained within the volume. Several small ANN-like fragments were also observed at the periphery of the aboral organ, but their connectivity to the main ANN remains uncertain.”

      FIGURE 2:

      (1) Panel C: “N > 2 cells for each cell type” - is that supposed to say “N > 2 mitochondria”? More than 2 cells in all the types shown in the graph.

      It is the number of cells for each cell type.

      (2) Panel D: Is this the wrong caption? I can only see green and black circles, not red, yellow, or blue. Make them larger or “flat” (circled, not shaded spheres) if they are supposed to be visible

      Thank you for pointing this out. The caption was incorrect and has been corrected to match the figure.

      (3) Panel E: Amazing to see the cross-network connections!

      Thank you

      (4) Again, it is great to see the three ANN mapped out, but … are there other connections that weren’t mapped in this study? Other high-level coordinating neurons? ANN_Q1Q4 or Q2Q3?

      The reconstruction is complete and there are no other neurons or connections. Given the large size of ctenophore synapses, we are confident that we identified all or most synapses and their connections.

      RESULTS: Synaptic connectome

      (1) “displaying rotational symmetry” - This is one of the things I am most curious about. Where is the evidence of rotational symmetry in the network diagram? Is it the larger number of connections to Q2 and Q4? Any evidence of rotational symmetry, like Q1 and Q3 connect to Q2 and Q4 respectively, but not the other way around?

      Changed to “displaying biradial symmetry”. We do not consider the slight difference in synapse number from ANN Q1-4 to the Q1-Q3 vs. Q2-Q4 balancers as significant or strong enough evidence for a single rotational symmetry (i.e. 180 degrees rotation).

      (2) “Surprisingly” - this *was* really surprising. There have to be some afferent neurons connecting from the balancers, don’t there? I can’t remember the connections to the SNN, but is there a tertiary set of ANNs that connect between the balancers and the top 3 ANNs? I would like a little more discussion about this.

      Indeed, this is why this is so surprising. Most people would have expected some output connections from the balancer to the nerve net or elsewhere. There are none. We have the complete balancer network and all balancer cells are ‘sink nodes’ (inputs only) (Figure 3–figure supplement 1).

      We added a short statement at the beginning of the Bridge Cells as Feedback Regulators of Ciliary Rhythms section noting that no direct connections from the balancers to the ANN were found and that all balancer cells act as sink nodes (inputs only; Figure 3–figure supplement 1). This highlights that bridge cells are indeed the sole neuronal input to the ANN circuit.
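
      As an illustration of what ‘sink node’ means in a directed connectivity graph, a minimal sketch follows. The cell labels and synapse counts are placeholder values chosen for the example and are not taken from the reconstructed connectome; the function name is hypothetical.

```python
from collections import defaultdict


def find_sink_nodes(edges):
    """Return nodes that receive synapses but make none (inputs only).

    `edges` is an iterable of (presynaptic, postsynaptic, synapse_count).
    """
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    for pre, post, count in edges:
        out_degree[pre] += count
        in_degree[post] += count
    nodes = set(in_degree) | set(out_degree)
    return sorted(
        node for node in nodes
        if in_degree.get(node, 0) > 0 and out_degree.get(node, 0) == 0
    )


# Placeholder graph: labels and counts are illustrative only.
toy_edges = [
    ("bridge", "ANN", 5),
    ("ANN", "balancer_Q1", 3),
    ("ANN", "balancer_Q2", 4),
]
sinks = find_sink_nodes(toy_edges)
```

      In this toy graph only the two balancer entries qualify as sinks, because they have incoming but no outgoing synapses.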

      Figure 3:

      (1) As you know, during development, the diagonally opposite cells have a shared heritage and shared functionality. Are there neuronal signatures that correspond to the rotational symmetry that we see, for example, in the position of the anal pores?

      We did not find any evidence in neuronal complement for a diagonal symmetry, suggesting that neuronal organization does not simply mirror the organism’s rotational body symmetry.

      (2) Do you have the information to say whether there are any diagonal or asymmetric connections? Can’t tell if those would have shown up in the mapping efforts or if you focused on the major ones only.

      Based on our complete mapping, we did not find evidence for a diagonal pattern. The connectivity instead shows a biradial organization.

      (3) “extending across opposite quadrant regions” - to me, opposite would be diagonally opposite, but this looks like a set of cells between Q1 and Q2 is connecting to a sister-set in Q3+Q4. I wonder if, in a more detailed view, you could see whether this is a rotational correspondence, rather than a reflection. There are some subtle hints of this in the aboral view, with some cells on the right of the blue cluster and the left of the magenta cluster.

      Changed to “extending across tentacular-axis-symmetric quadrant regions” for clarity.

      (4) As with Figure 2, I do not see any circles/spheres that are yellow, red, or blue! There are some traces of what appear to be other neurons that have these colors, but nothing that would suggest the localization of mitochondria.

      Thank you for pointing this out. We have corrected the caption to match the figure, as in the previous item.

      (5) The connectivity map is very cool, but the caption does not seem to correspond to the version included in the manuscript. I don’t see any hexagons; all arrows seem to have the same thickness.

      Changed to: “Complete connectivity map of the gravity-sensing neural circuit. Cells belonging to the same group are shown as diamonds, and the number of cells is added to their labels. The number of synapses is shown on the arrows.”

      RESULTS: Dynamics of balancer cilia

      (1) The orientation of the stage+larvae is a bit hard to follow. Maybe say the sagittal or tentacular plane is parallel to the sample stage and the gravity vector?

      We added “Larvae were oriented with their sagittal or tentacular plane parallel to the sample stage.”

      (2) “We could simultaneously image Q1(3) and Q2(4).” The meaning of the numbers in () is not clear. Either way that I try to interpret it does not match the diagrams. Should this say viewing the tentacular plane, you can image Q1 and 4 or Q2 and 3?

      Thank you for spotting this mistake. We have changed the text to: “In larvae with their sagittal plane facing the objective, we could compare balancer-cilia movements between Q1 vs. Q2 or Q3 vs. Q4. In other larvae oriented in the tentacular plane, we could simultaneously image Q1 and Q4 or Q2 and Q3.”

      (3) Typo: episod[e]s were excluded

      Corrected

      DISCUSSION:

      This section is quite clean. Maybe mention some future directions:

      We have added a “Future Directions” section

      (1) Do these networks change during development? Five-days-old is still quite undeveloped - what would it look like in an adult specimen? Would you expect a larger version of the same or more diverse connections?

      As far as we know from work on aboral organs in adult ctenophores, the same structures and cells can be found. We do not know how the network will develop. We know that at 5 days the balancer is fully functional and the animals can orient and their behaviour is coordinated. So the wiring may not change extensively later in development. In the 1-day-old larva, Ferraioli et al. did not distinguish ANN neurons as a separate population, as these were merged with SNNs in their dataset. This suggests that significant cellular and circuit maturation likely occurs between 1 and 5 days.

      METHODS: Imaging the Activity of Balancer Cilia

      (1) “we selected only larvae whose aboral-oral axis was oriented nearly perpendicular to the gravitational vector”. Shouldn’t this be “nearly parallel to the gravity vector” not perpendicular?

      Thank you for spotting this, corrected.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The study by Luden et al. seeks to elucidate the molecular functions of AHL15, a member of the AT-HOOK MOTIF NUCLEAR LOCALIZED (AHL) protein family, whose overexpression has been shown to extend plant longevity in Arabidopsis. To address this question, the authors conducted genome-wide ChIP-sequencing analyses to identify AHL15 binding sites. They further integrated these data with RNA-sequencing and ATAC-sequencing analyses to compare directly bound AHL15 targets with genes exhibiting altered expression and chromatin accessibility upon ectopic AHL15 overexpression.

      The analyses indicate that AHL15 preferentially associates with regions near transcription start sites (TSS) and transcription end sites (TES). Notably, no clear consensus DNA-binding motif was identified, suggesting that AHL15 binding may be mediated through interactions with other regulatory factors rather than through direct sequence recognition. The authors further show that AHL15 predominantly represses its direct target genes; however, this repression appears to be largely independent of detectable changes in chromatin accessibility.

      In addition to the AHL protein family, the globular H1 domain-containing high-mobility group A (GH1-HMGA) protein family also harbors AT-hook DNA-binding domains. Recent studies have shown that GH1-HMGA proteins repress FLC, a key regulator of flowering time, by interfering with gene-loop formation. The observed enrichment of AHL15 at both TSS and TES regions, therefore, raises the intriguing possibility that AHL15 may also participate in regulating gene-loop architecture. Consistent with this idea, the authors report that several direct AHL15 target genes are known to form gene loops.

      Overall, the conclusions of this study are well supported by the presented data and provide new mechanistic insights into how AHL family proteins may regulate gene expression.

      However, it is important to note that the genome-wide analyses in this study rely predominantly on ectopic overexpression of AHL15 at developmental stages when the gene is not usually expressed. Moreover, loss-of-function phenotypes for AHL15 have not been reported, leaving unresolved whether AHL15 plays a physiological role in regulating plant longevity under native conditions. It therefore remains possible that longevity control is mediated by other AHL family members rather than by AHL15 itself. In this regard, the manuscript's title would benefit from more accurately reflecting this broader implication.

      The ahl15 loss-of-function phenotype has previously been described in Karami et al., 2020 (Nat. Plants), Rahimi et al., 2022a (New Phyt.), and Rahimi et al., 2022b (Curr. Biol.), showing that ahl15 loss-of-function results in, among other phenotypes, accelerated vegetative phase change and flowering, a reduced number of leaves produced by axillary meristems in short-day-grown plants, and reduced secondary growth in the inflorescence stem. The dominant-negative ahl15 delta-G allele, expressing a mutant protein lacking the conserved G motif in the PPC domain, shows these phenotypes more clearly in the heterozygous ahl15 +/- background, and is embryo lethal in the homozygous ahl15 background (Karami et al., 2021, Nature Comm.). In addition, we recently showed that leaf senescence is significantly accelerated in the ahl15 loss-of-function mutant (Luden et al., 2025, bioRxiv). These results show that AHL15 is involved in several aspects of ageing in Arabidopsis, and we will adjust the introduction to discuss these previous findings more explicitly.

      I agree with reviewer 1 on the possibility that multiple AHLs could have an effect on longevity, which is partially supported by the delayed flowering time observed in the AHL20, AHL27, or AHL29 overexpression lines (Karami et al., 2020, Street et al., 2008). However, the induction of the AHL15-GR fusion alone by DEX shows a clear delay of developmental phase transitions and the aging process in general, indicating that AHL15 by itself is able to extend longevity as other AHLs are not affected by DEX treatment (proven by the fact that their expression is not significantly changed in our RNA-seq analysis of DEX-treated 35S:AHL15-GR seedlings).

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Luden et al. investigates the molecular function and DNA-binding modes of AHL15, a transcription factor with pleiotropic effects on plant development. The results contribute to our understanding of AHL15 function in development, specifically, and transcriptional regulation in plants, more broadly.

      Strengths:

      The authors developed a set of genetic tools for high-resolution profiling of AHL15 DNA binding and provided exploratory analyses of chromatin accessibility changes upon AHL15 overexpression. The generated data (ChIP-Seq, ATAC-Seq and RNA-Seq) is a valuable resource for further studies. The data suggest that AHL15 does not operate as a pioneer TF, but is likely involved in gene looping.

      Weaknesses:

      While the overall message is conveyed clearly and convincingly, I see one major issue concerning motif discovery and interpretation. The authors state that because HOMER detected highly enriched motifs at frequencies below 1%, they conclude that "a true DNA binding motif would be present in a large portion of the AHL15 peaks (targets) and would be rare in other regions of the genome (background)."

      I agree that the frequency below 1% is unexpectedly low; however, this more likely reflects problems in data preprocessing or motif discovery rather than intrinsic biological properties of a transcription factor that possesses a DNA-binding domain and is known to bind AT-rich motifs. As it is, Figure 2 cannot serve as a main figure in the manuscript: it rather suggests that the generated ChIP-Seq peakset is dominated by noise (or motif discovery was done improperly) than that AHL15 binds nonspecifically.

      Since key methodological details on the HOMER workflow are missing in the M&M section, it is not possible to determine what went wrong. Looking at other results, i.e. the reasonably structured peak distribution around TSS/TTS and consistent overlap of the peaks between the replicates, I assume that the motif discovery step was done improperly.

      Therefore, I recommend redoing the motif analysis, for example, by restricting the search to the top-ranked peaks (e.g. TOP1000) and by using an appropriate background set (HOMER can generate good backgrounds, but it was not documented in the manuscript how the authors did it). If HOMER remains unsuccessful, the authors should consider complementary methods such as STREME or MEME, similar to the approach used for GH1-HMGA (https://pmc.ncbi.nlm.nih.gov/). If the peakset is of good quality, I would expect the analysis to identify an AT-rich motif with a frequency substantially higher than 1%, more likely in the range of at least 30%. If such a motif is detected, it should be reported clearly, ideally with positional enrichment information relative to TSS or TTS. It would also be informative to compare the recovered motif with known GH1-HMGA motifs.

      If de novo motif discovery remains inconclusive, the authors should, at a minimum, assess enrichment of known AHL binding motifs using available PWMs (e.g. from JASPAR). As it stands, the claim that "our ChIP-seq data show that AHL15 binds to AT-rich DNA throughout the Arabidopsis genome with limited sequence specificity (Figure 2A, Figure S2-S4)" is not convincingly supported.

      Another point concerns the authors' hypothesis regarding the role of AHL15 in gene looping. While I like this hypothesis and it is good to discuss it in the discussion section, the data presented are not sufficient to support the claim, stated in the abstract, that AHL15 "regulates 3D genome organization," as such a conclusion would require additional, dedicated experiments.

      The motifs discovered by HOMER are ranked by their enrichment over background, of which the highest-scoring motifs are very rare in the AHL15-bound targets, but even rarer in the background, which is why they score highly on the percent enrichment score. As expected by reviewer 2, we identified AT-rich motifs that were present in a larger percentage of AHL15 targets (found in 3-18% of targets, depending on the motif, see for example motif #5 in figure S4A), which can be seen at the right tail of the histograms shown in figures 2B-C and figures S2-S4B-C. However, these motifs were also common in the background and were therefore not considered as significantly enriched in the AHL15-bound regions, with a target:background ratio of <2. As most of these motifs were flagged by HOMER as possible false-positives, and to limit the size of the (supplemental) figures, we did not show each of the motifs identified by HOMER in table form. We can include the full tables of de novo motifs identified by HOMER, including possible false-positive results for clarification.

      Although the identification of AT-rich motifs shows that AHL15 (and very likely most other AHL proteins as well) binds AT-rich regions, it does not sufficiently explain the binding of AHL15 to its target genes, as these motifs are found at almost equal frequencies in non-AHL15-bound regions. In addition, a sequence found at this frequency in the genomic background is, in our view, too unspecific to be considered as a transcription factor binding site. Based on this, we concluded that AHL15 lacks a specific binding motif that can define the genes it binds.
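
      The distinction drawn above, between a rare motif with a high target:background ratio and a common motif with a ratio below 2, can be made concrete with a small calculation. This is a sketch only: the counts are hypothetical illustration values, the function name is an assumption, and the ratio shown is plain arithmetic, not HOMER's own scoring or statistics.

```python
def motif_enrichment(target_hits, n_targets, background_hits, n_background):
    """Return (fraction of targets, fraction of background, ratio) for one motif."""
    target_fraction = target_hits / n_targets
    background_fraction = background_hits / n_background
    ratio = target_fraction / background_fraction
    return target_fraction, background_fraction, ratio


# Hypothetical counts, for illustration only (not values from the manuscript).
# A rare motif: found in under 1% of targets, but even rarer in the background.
rare = motif_enrichment(target_hits=8, n_targets=1000,
                        background_hits=10, n_background=10000)

# A common AT-rich motif: found in many targets, but also frequent in the background.
common = motif_enrichment(target_hits=180, n_targets=1000,
                          background_hits=1200, n_background=10000)

for name, (t_frac, b_frac, ratio) in {"rare": rare, "common": common}.items():
    print(f"{name}: targets={t_frac:.1%} background={b_frac:.1%} ratio={ratio:.1f}")
```

      Under these assumed counts, the rare motif ranks first by ratio despite covering few peaks, whereas the common motif covers many peaks but falls below a ratio of 2.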

      We will update the methods section to include more details on the HOMER analysis, and will also run the analysis in the top1000 shared peaks as suggested by reviewer 2.

      Reviewer #3 (Public review):

      Summary:

      This study investigated the role of AHL15 in the regulation of gene expression using AHL15 overexpression lines. Their results do show that more genes are downregulated when AHL15 is upregulated, and its binding does not affect the chromatin accessibility. Further, they found that AHL15 binds in regions depleted in histone modifications and other epigenetic signatures. Subsequently, they investigated the presence of AHL15 in the gene chromatin loops. They found overlaps with both upregulated and downregulated genes. The methods are appropriately described, but could be improved to include the analysis of self-looping gene boundaries.

      Strengths:

      Their study clearly showed a lack of any specific sequence enrichment in the AHL15 binding sites, other than these being AT-rich, suggesting that AHL proteins do not recognize a specific DNA sequence but are recruited to their AT-rich target sites in another way. The study does suggest significant enrichment of AHL15 binding sites at TSS and TES, and AHL15 sites are depleted of any histone marks. They also identified that AHL15 binding sites overlap with self-looping gene boundaries.

      Weaknesses:

      The claim that AHL15 acts as a repressor and genes regulated by it are downregulated needs to be investigated based on AHL15 binding sites, to show enrichment/ depletion of AHL15 binding sites in overexpressing genes and repressed genes. The authors should provide data to support plant longevity with AHL15 overexpression using the DEX-induced system to support the claims in the title. Calculation of the enrichment score of AHL15 peaks in the self-looping genes that are upregulated or downregulated, and discussion about the different effects of AHL15 binding on self-looping regions to regulate gene expression may be helpful to understand the significance of the study. Motif enrichment in upregulated and downregulated genes separately to identify binding sequence preferences may be useful. It is not clear how the overlap of AHL15 peaks with self-looping genes has been carried out.

      A metagene plot of AHL15 binding around genes that are differentially expressed upon DEX treatment can be found in Figure 3F. This analysis shows that AHL15 binding near differentially expressed genes is more pronounced compared to all AHL15-bound genes, and that AHL15 binding near the TSS is especially enriched for upregulated genes.

      As also suggested by reviewer 2, we will run a motif enrichment analysis on the differentially expressed genes that are bound by AHL15 to see if any motifs are enriched compared to the background and overrepresented in the AHL15-bound genes.

      Plant longevity in 35S:AHL15-GR plants treated with DEX has been shown by Karami et al. (2020; Nature Plants): DEX treatment extended vegetative development after flowering in Arabidopsis and tobacco, enhanced overall biomass in Arabidopsis and tobacco, and re-initiated vegetative growth in senescent tobacco. Recently, we also showed that it delays leaf senescence in Arabidopsis (Luden et al., 2025, bioRxiv). All these observations will be discussed in more detail in the text. In addition, we show that 35S:AHL15-GR plants treated a single time with DEX at 10 days after germination show a significantly delayed flowering time in figure 4C-D of this manuscript.

      The enrichment of AHL15 ChIP-seq peaks in self-looping genes will be analyzed as suggested and compared to a random set of genes as a control, and the methods section will be updated to clarify how the analyses on self-looping genes were carried out.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      The authors present exciting new experimental data on the antigenic recognition of 78 H3N2 strains (from the beginning of the 2023 Northern Hemisphere season) against a set of 150 serum samples. The authors compare protection profiles of individual sera and find that the antigenic effect of amino acid substitutions at specific sites depends on the immune class of the sera, differentiating between children and adults. Person-to-person heterogeneity in the measured titers is strong, specifically in the group of children's sera. The authors find that the fraction of sera with low titers correlates with the inferred growth rate using maximum likelihood regression (MLR), a correlation that does not hold for pooled sera. The authors then measure the protection profile of the sera against historical vaccine strains and find that it can be explained by birth cohort for children. Finally, the authors present data comparing pre- and post- vaccination protection profiles for 39 (USA) and 8 (Australia) adults. The data shows a cohort-specific vaccination effect as measured by the average titer increase, and also a virus-specific vaccination effect for the historical vaccine strains. The generated data is shared by the authors and they also note that these methods can be applied to inform the bi-annual vaccine composition meetings, which could be highly valuable.

      We appreciate the reviewer’s clear summary of our work.

      Thanks to the authors for the revised version of the manuscript. A few concerns remain after the revision:

      (1) We appreciate the additional computational analysis the authors have performed on normalizing the titers with the geometric mean titer for each individual, as shown in the new Supplemental Figure 6. We agree with the authors' statement that, after averaging again within specific age groups, "there are no obvious age group-specific patterns." A discussion of this should be added to the revised manuscript, for example in the section "Pooled sera fail to capture the heterogeneity of individual sera," referring to the new Supplemental Figure 6.

      However, we also suggested that after this normalization, patterns might emerge that are not necessarily defined by birth cohort. This possibility remains unexplored and could provide an interesting addition to support potential effects of substitutions at sites 145 and 275/276 in individuals with specific titer profiles, which as stated above do not necessarily follow birth cohort patterns.

      The reviewer is correct that there remains heterogeneity among the serum titers to different strains that we cannot easily explain via age group, and suggests that additional patterns could emerge. We certainly agree that explaining this heterogeneity remains an interesting goal, but as described in the manuscript we have analyzed the possible causes of the heterogeneity as exhaustively as possible given the available metadata. At this point, the most we can say is that the strain-specific neutralization titers are highly heterogeneous in a way that cannot be completely explained by birth cohort. We agree that further analysis of the cause is an area for future work, and have made all of our data available so that others can continue to explore additional hypotheses. It may be that these questions can only be answered by experiments on sera from newer cohorts where more detailed metadata on infection and vaccination history are available.

      (2) Thank you for elaborating further on the method used to estimate growth rates in your reply to the reviewers. To clarify: the reason that we infer from Fig. 5a that A/Massachusetts has a higher fitness than A/Sydney is not because it reaches a higher maximum frequency, but because it seems to have a higher slope. The discrepancy between this plot and the MLR inferred fitness could be clarified by plotting the frequency trajectories on a log-scale.

      For the MLR, we understand that the initial frequency matters in assessing a variant's growth. However, when starting points of two clades differ in time (i.e., in different contexts of competing clades), this affects comparability, particularly between A/Massachusetts and A/Ontario, as well as for other strains. We still think that mentioning these time-dependent effects, which are not captured by the MLR analysis, would be appropriate. To support this, it could be helpful to include the MLR fits as an appendix figure, showing the different starting and/or time points used.

      Multinomial logistic regression is a widely used technique to estimate viral growth rates from sequencing counts (PLoS Computational Biology, 20:e1012443; Nature, 597:703-708; Science, 376:1327-1332). As the reviewer points out, it does assume that the relative viral growth rates are constant over the time period analyzed. However, most of the patterns mentioned by the reviewer are not deviations from this assumption, but rather just due to the fact that frequencies are plotted on a linear scale. More specifically, our multinomial logistic regression implementation defines two parameters per variant: the initial frequency and the growth rate. The absolute variant growth rate is effectively the slope of the logit-transformed variant frequencies. Each variant's relative fitness depends on that variant's growth rate relative to a predefined baseline variant. Plotting frequencies on a logit scale does help emphasize the importance of the slope by showing exponential growth as a linear trajectory. We have added a new Supplemental Figure 9 that plots the frequencies from Figure 5A on a logit scale. As can be seen the frequency trajectories are closer to linear on the logit scale.
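
      The relationship between a fixed growth rate and a straight line on a log-ratio scale can be sketched numerically. The example below assumes the standard softmax parameterization of multinomial logistic regression (one intercept and one growth rate per variant, with the baseline variant fixed at zero); the parameter values and function names are hypothetical and this is not our fitting code.

```python
import math


def mlr_frequencies(intercepts, growth_rates, t):
    """Variant frequencies at time t under a softmax (multinomial logistic) model."""
    scores = [a + r * t for a, r in zip(intercepts, growth_rates)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def log_ratio_to_baseline(freqs, baseline=0):
    """log(f_i / f_baseline) for every variant."""
    return [math.log(f / freqs[baseline]) for f in freqs]


# Hypothetical parameters: variant 0 is the baseline (intercept 0, growth rate 0).
intercepts = [0.0, -3.0, -1.0]
growth_rates = [0.0, 0.10, -0.02]  # per day, relative to the baseline
times = [0, 10, 20, 30]

ratios = [log_ratio_to_baseline(mlr_frequencies(intercepts, growth_rates, t))
          for t in times]

# Slope of each variant's log-ratio trajectory between the first and last time point.
slopes = [(ratios[-1][i] - ratios[0][i]) / (times[-1] - times[0])
          for i in range(len(intercepts))]
```

      In this sketch the recovered slopes match the assumed relative growth rates regardless of the maximum frequency each variant reaches, which is why trajectories that look different on a linear scale can still reflect constant relative growth.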

      We have updated the results text to clarify the nature of the fixed relative growth rates per strain and to refer to this new supplemental figure as follows:

      To estimate the evolutionary success of different human H3N2 influenza strains during 2023, we used multinomial logistic regression, which uses sequence counts to estimate fixed strain growth rates relative to a baseline strain for the entire analysis time period (in this case, 2023) [50–52]. Relative growth rates estimated by multinomial logistic regression represent relative fitnesses of strains over that time period. There were sufficient sequencing counts to reliably estimate growth rates in 2023 for 12 of the HAs for which we measured titers using our sequencing-based neutralization assay libraries (Figure 5a,b and Supplemental Figure 9). We estimated strain growth rates relative to the baseline strain of A/Massachusetts/18/2022. Note that these growth rates estimate how rapidly each strain grows relative to the baseline strain, rather than the absolute highest frequency reached by each strain. Each strain’s absolute growth rate corresponds to the slope of the strain’s logit-transformed frequencies at the end of the analysis time period (Supplemental Figure 9).

      As the reviewer notes, the multinomial logistic regression implementation assumes a fixed growth rate for each strain over the time period being analyzed. This limitation causes the inferred growth rates to emphasize the latest trends in the analysis time period. For example, at the end of December 2023 in Figure 5A, the A/Ontario/RV00796/2023 strain is growing rapidly and replacing all other variants. Correspondingly, the multinomial logistic regression infers a high growth rate for that Ontario strain relative to the A/Massachusetts/18/2022 baseline strain. However, the A/Massachusetts/18/2022 strain was growing relative to other strains in the first half of 2023 since it has a higher growth rate than they do. There are nonetheless modest deviations from linearity on the logit scale shown in the added supplementary figure, likely because the assumption of a fixed set of relative growth rates over the analyzed time period is an approximation.

      We have added the following text to the discussion to highlight this limitation of the multinomial logistic regression:

      Our comparisons of the neutralization titers to the growth rates of different H3N2 strains were limited by the fact that only a modest number of strains had adequate sequence data to estimate their growth rates. Strains with more sequencing counts tend to be those with moderate-to-high fitness, which therefore limited the dynamic range of growth rates across strains we were able to analyze. Relatedly, the multinomial logistic regression infers a single fixed growth rate per strain for the entire analysis time period of 2023, and cannot represent changes in relative fitness of strains over that relatively short time period. Additionally, because the strains for which we estimated growth rates are phylogenetically related, it is difficult to assess the statistical significance of the correlation [53], so it will be important for future work to reassess the correlations with new neutralization data against the dominant strains in future years.

      (3) Regarding my previous suggestion to test an older vaccine strain than A/Texas/50/2012 to assess whether the observed peak in titer measurements is virus-specific: We understand that the authors want to focus the scope of this paper on the relative fitness of contemporary strains, and that this additional experimental effort would go beyond the main objectives outlined in this manuscript. However, the authors explicitly note that "Adults across age groups also have their highest titers to the oldest vaccine strain tested, consistent with the fact that these adults were first imprinted by exposure to an older strain." This statement gives the impression that imprinting effects increase titers for older strains, whereas this does not seem to be true from their results, but only true for A/Texas. It should be modified accordingly.

      We agree with the reviewer’s suggestion that the specific language describing the potential trend of adults having the highest titers to the oldest strain tested could be further caveated. To this end, we have made the following edits to the portion of the main text that they highlighted:

      Adults across age groups also have their highest titers to the oldest vaccine strain tested (Figure 6), consistent with the fact that these adults were likely first imprinted by exposure to an older strain more antigenically similar to A/Texas/50/2012 (the oldest strain tested here) than more recent strains. Note that a similar trend towards adult sera having higher titers to older vaccine strains was also observed in a more recent study we have performed using the same methodology described here [60].

      Notably, this trend of adults across age groups having the highest titers to the oldest vaccine strains tested has held true in subsequent work we have performed with H1N1 viruses (Kikawa et al., 2025 Virus Evolution, DOI: https://doi.org/10.1093/ve/veaf086). In that more recent study, we again saw that adults (cohorts EPIHK, NIID, and UWMC) tended to have their highest titers to the oldest cell-passaged strain tested (A/California/07/2009), whereas children (cohort SCH) had more similar neutralization titers across strains. These additional data therefore support the idea that adults tend to have their highest titers to older vaccine strains, a finding that is also consistent with substantial prior work (e.g., Science, 346:996-1000).

      Reviewer #2 (Public review):

      This is an excellent paper. The ability to measure the immune response to multiple viruses in parallel is a major advancement for the field, that will be relevant across pathogens (assuming the assay can be appropriately adapted). I only had a few comments, focused on maximising the information provided by the sera. These concerns were all addressed in the revised paper.

      We thank this reviewer for the summary of our work and their helpful comments in the first revision.

      Reviewer #3 (Public review):

      The authors use high throughput neutralisation data to explore how different summary statistics for population immune responses relate to strain success, as measured by growth rate during the 2023 season. The question of how serological measurements relate to epidemic growth is an important one, and I thought the authors present a thoughtful analysis tackling this question, with some clear figures. In particular, they found that stratifying the population based on the magnitude of their antibody titres correlates more with strain growth than using measurements derived from pooled serum data. The updated manuscript has a stronger motivation, and there is substantial potential to build on this work in future research.

      Comments on revisions:

      I have no additional recommendations. There are several areas where the work could be further developed, which were not addressed in detail in the responses, but given this is a strong manuscript as it stands, it is fine that these aspects are for consideration only at this point.

      We appreciate this reviewer’s summary of our work, and we are glad they feel the motivation is stronger in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      Summary: 

      Overall, this is a well-designed and carefully executed study that delivers clear and actionable guidance on the sample size and representative demographic requirements for robust normative modelling in neuroimaging. The central claims are convincingly supported. 

      Strengths: 

      The study has multiple strengths. First, it offers a comprehensive and methodologically rigorous analysis of sample size and age distribution, supported by multiple complementary fit indices. Second, the learning-curve results are compelling and reproducible and will be of immediate utility to researchers planning normative modelling projects. Third, the study includes both replication in an independent dataset and an adaptive transfer analysis from UK Biobank, highlighting both the robustness of the results and the practical advantages of transfer learning for smaller clinical cohorts. Finally, the clinical validation ties the methodological work back to clinical application.  

      We are grateful for the reviewer’s positive overall evaluation and for the constructive feedback, which has helped us refine and clarify the manuscript.

      Weaknesses: 

      There are two minor points for consideration: 

      (1) Calibration of percentile estimates could be shown for the main evaluation (similar to that done in Figure 4E). Because the clinical utility of normative models often hinges on identifying individuals outside the 5th or 95th percentiles, readers would benefit from visual overlays of model-derived percentile curves on the curves from the full training data and simple reporting of the proportion of healthy controls falling outside these bounds for the main analyses (i.e., 2.1. Model fit evaluation). 

      We thank the reviewer for this helpful point. To address this, we implemented two complementary analyses that evaluate the accuracy of percentile estimates in the main evaluation (Section 2.1, Model fit evaluation).

      (a) Percentage of healthy controls (HC) outside the extreme centiles (added to the main figure)

      For each sampling strategy and sample size, we now report the proportion of healthy controls falling outside the predicted 2.5th and 97.5th percentiles, to remain consistent with the 1.96 threshold used throughout the study. Under perfect calibration, this proportion should be close to 2.5%. This metric was computed for every ROI, model run, sample size, and sampling condition. The results are now shown in the main model-fit figure alongside MSLL, EV, Rho, SMSE, and ICC, and the corresponding statistics have been added throughout. This directly quantifies how well the centile estimates capture tail behavior, which is essential for the clinical interpretation of normative deviations. See the plots added to Figure 2 and Figure 3 (see also Tables 2-3 in the revised main manuscript, and the replication in AIBL and the transfer learning experiments in Supplementary Materials Figures S1, S10-11, S18-19, S28-29 and Tables S1-2, S5-6, S9-10).
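
      The tail-proportion check described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `tail_fractions` and the simulated z-scores are assumptions made for the example, and the sketch reports each tail separately so that a value near 2.5% per tail corresponds to well-calibrated deviations under a standard normal reference.

```python
import numpy as np


def tail_fractions(z_scores, threshold=1.96):
    """Return the fractions of z-scores below -threshold and above +threshold.

    Hypothetical helper: z_scores stands in for the deviation scores of
    healthy controls for one ROI, model run, and sampling condition.
    """
    z = np.asarray(z_scores, dtype=float)
    z = z[~np.isnan(z)]  # ignore missing deviation scores
    lower = float(np.mean(z < -threshold))
    upper = float(np.mean(z > threshold))
    return lower, upper


# Illustrative data only: perfectly calibrated deviations are standard normal.
rng = np.random.default_rng(0)
simulated_z = rng.standard_normal(100_000)
lower, upper = tail_fractions(simulated_z)
print(f"below 2.5th centile: {lower:.3%}, above 97.5th centile: {upper:.3%}")
```

      Fractions well above the nominal level would suggest centiles that are too narrow, and fractions well below it would suggest centiles that are too wide.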

      (b) Centile curve overlays (added to the Supplementary Figures)

      To visually demonstrate calibration, we now include additional overlays of model-derived percentile curves against those obtained using the full training set. These are shown for key ROIs, multiple sample sizes and different sampling strategies in Supplementary Materials (Figure S9 and S27). These overlays illustrate where centile estimation diverges, particularly at age extremes. 

      Together, these additions provide both quantitative and qualitative evidence of percentile calibration across sampling regimes and sample sizes.

      (2) The larger negative effect of left-skewed sampling likely reflects a mismatch between the younger training set and the older test set; accounting explicitly for this mismatch would make the conclusions more generalizable. 

      We agree with the reviewer that the large negative effect of left-skewed training reflects a mismatch between the training and test age distributions. 

      To characterize the expected age distributions produced by each sampling strategy, we simulated the procedures used in the main analyses by repeatedly drawing training samples under all sampling conditions (representative, left-skewed, right-skewed, and the predefined sex-ratio settings). Simulations were performed at a fixed sample size (n = 200), generating 1000 samples per condition, and the resulting age distributions were summarized separately for males and females (Supplementary Materials section 5.1). These simulated distributions show that left-skewed sampling produces a more pronounced shift toward younger ages than the corresponding shift toward older ages under right-skewed sampling, particularly in OASIS-3, with smaller differences observed in AIBL (Tables S14–S15).

      To further quantify how these sampling-induced age profiles align with the empirical age structure of the test cohorts, we computed an age-bin coverage metric based on distribution intersection. Age was discretized into 20 quantile-based bins using the full training set of each dataset (OASIS-3 and AIBL) as reference.

      For each sampling strategy (Representative, Left-skewed, Right-skewed), sample size, and dataset, we generated 1000 independent training samples using the same sampling procedures as in the main analyses. For each sampled training set, age-bin count distributions were computed and compared to the corresponding HC test-set age-bin counts.

      Coverage was defined as:

      where i indexes age bins, and n_train and n_test are the numbers of individuals in bin i in the sampled training set and HC test set, respectively. This metric quantifies the fraction of the test-set age distribution that is “covered” by the sampled training set and ranges from 0 (no test-set ages covered) to 1 (complete coverage of the test-set age distribution). For each condition, the mean and standard deviation of the coverage across repetitions were computed.
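
      As an illustration of a coverage metric of this kind, the sketch below assumes a standard histogram-intersection form, Coverage = Σ_i min(n_train,i, n_test,i) / Σ_i n_test,i, which is consistent with the description above (0 when no test-set ages are covered, 1 when the test-set distribution is fully covered). The exact normalization used by the authors is not reproduced here, so this form, the function names, and the toy ages are all assumptions.

```python
import numpy as np


def quantile_interior_edges(reference_ages, n_bins=20):
    """Interior edges of n_bins quantile-based age bins from a reference set."""
    probs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(np.asarray(reference_ages, dtype=float), probs)


def bin_counts(ages, interior_edges):
    """Count individuals per bin; ages outside the reference range fall in the outer bins."""
    idx = np.searchsorted(interior_edges, np.asarray(ages, dtype=float), side="right")
    return np.bincount(idx, minlength=len(interior_edges) + 1)


def age_coverage(train_ages, test_ages, interior_edges):
    """Assumed histogram-intersection coverage of the test-set age bins."""
    n_train = bin_counts(train_ages, interior_edges)
    n_test = bin_counts(test_ages, interior_edges)
    if n_test.sum() == 0:
        raise ValueError("test set is empty")
    return float(np.minimum(n_train, n_test).sum() / n_test.sum())


# Toy example (illustrative ages only, not study data).
reference_ages = np.arange(60, 100)
edges = quantile_interior_edges(reference_ages, n_bins=20)
full = age_coverage(reference_ages, reference_ages, edges)
none = age_coverage(np.arange(60, 70), np.arange(90, 100), edges)
half = age_coverage([61, 95], [61, 61, 95, 95], edges)
print(full, none, half)
```

      Repeating `age_coverage` over many sampled training sets and taking the mean and standard deviation would give per-condition summaries of the kind described above.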

      We show that under left-skewed sampling, age coverage remains markedly reduced across all sample sizes in OASIS-3 compared with the AIBL dataset (see Figure S37). This suggests that the poorer performance observed with left-skewed training may stem from reduced coverage of the test age range. We added the following to the Discussion (page 27):

      “The left-skewed sampling had an overall greater effect than right-skewed sampling in both model evaluation and clinical validation, likely due to (1) the dataset’s original bias toward older individuals, making younger-skewed samples less representative, and (2) the older age structure of the AD population, which exacerbates mismatch when younger HC are used to calibrate models in the clinical population. This asymmetry is also reflected in the coverage analysis, where left-skewed sampling resulted in poorer age coverage of the target population at the same sample size (Supplementary Materials section 5.4).”

      Reviewer #2:

      Summary: 

      The authors test how sample size and demographic balance of reference cohorts affect the reliability of normative models in ageing and Alzheimer's disease. Using OASIS-3 and replicating in AIBL, they change age and sex distributions and number of samples and show that age alignment is more important than overall sample size. They also demonstrate that models adapted from a large dataset (UK Biobank) can achieve stable performance with fewer samples. The results suggest that moderately sized but demographically well-balanced cohorts can provide robust performance. 

      Strengths: 

      The study is thorough and systematic, varying sample size, age, and sex distributions in a controlled way. Results are replicated in two independent datasets with relatively large sample sizes, thereby strengthening confidence in the findings. The analyses are clearly presented and use widely applied evaluation metrics. Clinical validation (outlier detection, classification) adds relevance beyond technical benchmarks. The comparison between within-cohort training and adaptation from a large dataset is valuable for real-world applications. 

      The work convincingly shows that age alignment is crucial and that adapted models can reach good performance with fewer samples. However, some dataset-specific patterns (noted above) should be acknowledged more directly, and the practical guidance could be sharper. 

      We are grateful for the reviewer’s positive overall evaluation and for the constructive comments that guided our revisions and strengthened the manuscript.

      Weaknesses: 

      The paper uses a simple regression framework, which is understandable for scalability, but limits generalization to multi-site settings where a hierarchical approach could better account for site differences. This limitation is acknowledged; a brief sensitivity analysis (or a clearer discussion) would help readers weigh trade-offs. 

      We thank the reviewer for this insightful point. We agree that hierarchical Bayesian regression provides clear advantages in multi-site settings, particularly when site-level variability is substantial or when federated learning is required. In our case, both OASIS-3 and AIBL include only a small number of sites, and the primary aim of the study was to isolate the effects of sample size and covariate composition rather than to model site-related structure. For these reasons, implementing HBR was beyond the scope of the present work, but we fully acknowledge its relevance for studies with larger or more heterogeneous site configurations. To clarify this distinction, we added a dedicated paragraph in the Discussion (page 28) that situates warped BLR and HBR within different data scenarios and outlines the circumstances under which each approach is preferable.

      “From a methodological perspective, the choice between warped BLR and HBR should primarily be guided by the structure of site effects and by computational constraints. HBR explicitly models site-level variation through hierarchical random effects, enabling information sharing across sites and supporting federated-learning implementations in which site-specific updates can be combined without sharing raw data (Bayer et al., 2022; Kia et al., 2021; Maccioni et al., 2025). This structure provides more stable estimates when site-specific sample sizes are small or acquisition differences are substantial. In contrast, warped BLR treats site as a fixed-effect covariate when site adjustment is required and does not implement hierarchical pooling, but offers simpler inference and substantially lower computational cost while accommodating non-Gaussian data distributions through the warping transformation (C. J. Fraza et al., 2021). These properties make warped BLR practical in settings where site heterogeneity is limited or adequately controlled, whereas HBR may be preferable in strongly multi-site contexts or when federated learning is required for privacy-preserving data integration.”

      Other than that, there are some points that are not fully explained in the paper: 

      (1) The replication in AIBL does not fully match the OASIS results. In AIBL, left-skewed age sampling converges with other strategies as sample size grows, unlike in OASIS. This suggests that skew effects depend on where variability lies across the age span. 

      Recommendation: Replication differences across datasets (age skew): 

      In OASIS, left-skewed (younger-heavy) training harms performance and does not fully recover with more data; in AIBL, performance under left-skew appears to converge toward the other conditions as training size grows. Given AIBL's smaller size and older age range, please explain this discrepancy. Does this imply that the effect of skew depends on where biological variability is highest across the age span (e.g., more variability from ~45-60 in OASIS vs ≥60 in AIBL), rather than on "skew" per se? If so, the paper should say explicitly that skewness must be interpreted relative to the age-variability profile of the target population, not just counts. 

      We thank the reviewer for this thoughtful comment. To examine whether differences in age-related variability could explain the replication patterns, we quantified how regional variance changed with age by computing age-binned variance profiles in the HC training sets of OASIS-3 and AIBL. Age was discretized into 10 quantile-based bins for each dataset separately. For each ROI and each age bin, we calculated the sample variance of the ROI values within that bin. The bin center was defined as the mean age of individuals in the corresponding bin. We then summarized variance across ROIs by computing, for each age bin, the median variance and its interquartile range (25th–75th percentile). These summary profiles (median and IQR across ROIs as a function of bin-centered age) are shown in Author response image 1. As shown in this plot, OASIS-3 and AIBL display comparable levels of variance across their respective age ranges, and the profiles do not suggest pronounced shifts in variability that would account for the divergent behavior of the left-skewed models.

      Author response image 1.

      Median ROI variance across age bins for OASIS-3 and AIBL. Shaded areas represent variability across regions within each age bin.
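
      The age-binned variance profile described above can be sketched as follows. This is a hedged illustration under stated assumptions rather than the authors' implementation: the function name, the array layout (subjects by ROIs), the use of the unbiased sample variance, and the simulated data are all choices made for the example.

```python
import numpy as np


def age_binned_variance_profile(ages, roi_values, n_bins=10):
    """Summarize ROI variance across quantile-based age bins.

    ages: shape (n_subjects,); roi_values: shape (n_subjects, n_rois).
    Returns bin centers (mean age per bin) plus the median and the
    25th/75th percentiles of the per-ROI variance in each bin.
    """
    ages = np.asarray(ages, dtype=float)
    roi_values = np.asarray(roi_values, dtype=float)
    interior = np.quantile(ages, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bin_index = np.searchsorted(interior, ages, side="right")

    centers, medians, q25, q75 = [], [], [], []
    for b in range(n_bins):
        mask = bin_index == b
        if mask.sum() < 2:  # a sample variance needs at least two subjects
            continue
        var_per_roi = roi_values[mask].var(axis=0, ddof=1)
        low, high = np.percentile(var_per_roi, [25, 75])
        centers.append(ages[mask].mean())
        medians.append(np.median(var_per_roi))
        q25.append(low)
        q75.append(high)
    return np.array(centers), np.array(medians), np.array(q25), np.array(q75)


# Simulated stand-in data: 1000 subjects, 50 ROIs with unit variance at all ages.
rng = np.random.default_rng(1)
sim_ages = rng.uniform(45, 95, size=1000)
sim_rois = rng.standard_normal((1000, 50))
centers, medians, q25, q75 = age_binned_variance_profile(sim_ages, sim_rois)
```

      Plotting `medians` against `centers`, with the band between `q25` and `q75` shaded, gives a profile of the kind shown in the response image; a flat profile, as in this simulation, indicates no age-dependent change in variance.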

      Instead, the coverage analysis recommended by the reviewer in comment #5 and introduced in our response to Reviewer 1, comment #2 indicates that the replication differences between OASIS-3 and AIBL are primarily driven by the age coverage of the sampled training sets relative to the test cohorts. In AIBL, which has a narrower and predominantly older age range, left-skewed sampling shows slightly lower coverage than right-skewed sampling, but coverage increases steadily with sample size, and the strategies converge as n grows. In contrast, OASIS-3 spans a broader lifespan and is itself skewed toward older ages; under left-skewed sampling, coverage of the test-set age range increases more slowly and remains comparatively lower even at large n. This slower recovery of age coverage explains why left-skewed performance does not recover in OASIS-3 and why the discrepancies between left- and right-skewed sampling are more pronounced in this dataset. The corresponding age-coverage curves are reported in Supplementary Figure S37. 

      Furthermore, this difference is also reflected in the expected age distributions obtained from repeated simulations of the sampling procedures (Supplementary Materials section 5.1, Tables S14–S15), where left-skewed sampling induces a larger shift toward younger ages than right-skewed sampling induces toward older ages, especially in OASIS-3, with smaller differences observed in AIBL. 

      For more details on both analyses see also our response to Reviewer 1, comment #2.

      (2) Sex imbalance effects are difficult to interpret, since sex is included only as a fixed effect, and residual age differences may drive some errors. 

      Recommendation: Sex effects may be confounded with age:

      Because sex is treated only as a fixed effect, it is unclear whether errors under sex-imbalance scenarios partly reflect residual age differences between female and male subsets. Please report (or control for) age distributions within each sex-imbalance condition, and clarify whether the observed error changes are truly attributable to sex composition rather than age composition. 

      To address the concern that sex-imbalance effects could be driven by residual age differences we now explicitly report the age distributions by sex for the original training and test datasets, as well as the expected age distributions induced by each sampling condition, obtained by repeated simulation of the sampling procedure (Supplementary Materials section 5.1, Tables S13-15). Table S13 shows very similar distributions of age for HC train and test sets across sexes within each dataset. Tables S14–S15 further show that, within each sampling strategy, the age distributions of females and males are highly similar, including under sex-imbalanced conditions. These summaries confirm that the sampling procedures do not introduce systematic age-structure differences between sexes.

      In addition, we extended the statistical models for tOC and MSE to explicitly include age, sex, and all higher-order interactions with the diagnosis, sample size, and sex-ratio sampling (Supplementary Materials section 5.2, Table S17 for direct training and Table S19 for transferred models). For completeness, we also included age and sex in the age-sampling models (Supplementary Table S16 for direct training and Table S18 for transferred models). These analyses revealed no significant main effects of age under sex-imbalanced sampling and only very small effect sizes in isolated higher-order interactions. Together, these results indicate that age did not introduce residual confounding in our analyses.

      We now report in the Results section (page 15) the following: 

      “Supplementary analysis (Tables S17 and S19) also showed that the main effect of age was not significant for either MSE or tOC, and no significant age × sex-ratio interactions were observed. While some higher-order interactions involving age, diagnosis, and sex-ratio reached statistical significance, all associated effect sizes were very small and inconsistent across outcomes, indicating that the observed error changes are not driven by residual age confounding.”

      And in the Methods section (page 36): 

      “Age distributions were summarized separately for males and females in the original training and test sets (Supplementary Table S13) and the expected age distributions resulting from the skewed-age sampling and the sex-imbalance sampling procedures were obtained by repeated simulations at a fixed sample size and are reported in Supplementary Tables S14–S15.”

      (3) In Figure 3, performance drops around n≈300 across conditions. This consistent pattern raises the question of sensitivity to individual samples or sub-sampling strategy. 

      Recommendation: Instability around n ≈ 300 (Figure 3):

      Several panels show a consistent dip in performance near n=300. What drives this? Is the model sensitive to particular individuals being included/excluded at that size, or does it reflect an interaction with the binning/selection scheme? A brief ablation (e.g., alternative sub-sampling seeds or bins) would help rule out artefacts. 

      We thank the reviewer for highlighting this point. To assess whether the observed dip at n = 300 reflected sensitivity to the specific individuals selected or to the sub-sampling scheme, we re-ran the analysis at n = 300 using 20 independent random seeds (Supplementary Materials section 5.3). This ablation showed no systematic decrease in performance across repetitions, indicating that the original effect was driven by stochastic sampling variability rather than a systematic model instability or binning interaction. We now report this control analysis in the Supplementary Materials (Figure S36). We have clarified this point in the Results (page 10):

      “A consistent dip in performance was observed around n = 300 for the left-skewed sampling condition in the original analysis (Figure 3). To assess whether this reflected sensitivity to the specific subsampling or stochastic sampling variability, we repeated the analysis for this specific sample size using 20 independent random seeds (Figure S36); the absence of a consistent effect across repetitions indicates that the original pattern was driven by sampling variability rather than a systematic model artifact.”

      (4) The total outlier count (tOC) analysis is interesting but hard to generalize. For example, in AIBL, left-skew sometimes performs slightly better despite a weaker model fit. Clearer guidance on how to weigh model fit versus outlier detection would strengthen the practical message. 

      Recommendation: Interpreting total outlier count (tOC): 

      The tOC findings are interesting but hard to operationalize. In AIBL, even for n>40, left-skewed training sometimes yields slightly better tOC discrimination and other strategies plateau. Does this mean that a better model fit on the reference cohort does not necessarily produce better outlier-based case separation? Please add a short practical rule-set: e.g., when optimizing for deviation mapping/outlier detection, prioritize coverage of the patient-relevant age band over global fit metrics; report both fit and tOC sensitivity to training-set age coverage. 

      We thank the reviewer for this important point. Apparent improvements in tOC-based separation under left-skewed training should not be interpreted as indicating a better model or superior deviation mapping. In particular, in AIBL, left-skew can sometimes yield slightly larger group differences in tOC despite weaker overall model fit. This reflects an inflation of deviation magnitude in AD rather than improved separation per se. Crucially, relative ranking between HC and AD remains preserved across sampling strategies, as shown by the classification analysis in the main manuscript (Figure 5C), indicating that enhanced tOC contrast under left-skew does not translate into improved case discrimination. Instead, it reflects a systematic shift in deviation scale due to age-mismatched training.

      We now clarify this distinction in the Discussion of the main manuscript on page 26:

      “Importantly, apparent increases in HC–AD separation in total outlier count should not be interpreted as evidence of superior model quality. Age-mismatched training can rescale deviation magnitudes and inflate tOC in specific subgroups without improving true case–control separability, as shown by the classification task (Figure 5C). Model fit metrics and outlier-based measures therefore capture complementary but distinct aspects of normative model behavior and should be interpreted jointly rather than in isolation.”

      (5) The suggested plateau at n≈200 seems context dependent. It may be better to frame sample size targets in relation to coverage across age bins rather than as an absolute number. 

      Recommendation: "n≈200" as a plateau is context-dependent: 

      The suggested threshold for stable fits (about 200 people) likely depends on how variable the brain features are across the covered ages. Rather than an absolute number, consider reporting a coverage-aware target, such as a minimum per-age-bin coverage or an effective sample size relative to the age range. This would make the guidance transferable to cohorts with different age spans. 

      We agree that the observed performance plateau around n≈200 is context dependent and may shift with the covered age range, anatomical variability, and feature of interest. In the present study, this stabilization was evaluated within the specific datasets and age spans considered, and extending it to broader lifespan ranges or different biological contexts will require dedicated future work.

      To clarify this point, we added an explicit age-coverage analysis in the Supplementary Materials (section 5.4), as introduced in our response to Reviewer 1, comment #2. This analysis shows that, under representative sampling, the point at which age coverage becomes complete closely coincides with the saturation of model fit and stability metrics. At the same time, we note that normative models operate in continuous covariate space, such that reliable interpolation can still be achieved even when intermediate age ranges are less densely sampled, provided that surrounding age ranges are sufficiently represented. This makes rigid minimum per-bin requirements difficult to define in a generalizable way.

      Rather than proposing a universal sample-size threshold, we now emphasize that both learning-curve analyses and age-coverage assessments offer a more transferable way to identify when performance approaches saturation for a given dataset. This clarification is now included in the Discussion on page 25:

      “This is further supported by the coverage analysis reported in the Supplementary Materials (section 5.4), which shows that under representative sampling, the point of full age coverage closely coincides with the saturation of model fit and stability metrics. Rather than proposing a universal sample size threshold, we therefore encourage readers to perform learning-curve analyses, complemented by age coverage assessments, in their own datasets to empirically assess when performance approaches saturation for their specific age range and population.”

      We also address it in the Limitations (page 29): 

      “In addition, the observed stabilization of model performance around 200–300 participants was evaluated within the specific age ranges and cohorts examined here and may shift in broader lifespan settings or in populations with different sources of biological variability.”

      (6) Minor inconsistency in training-set size: 

      The manuscript mentions 691 in Methods, but the figures/scripts label is 692. Please correct for consistency. 

      Thank you for pointing out this inconsistency; the error in the Methods section has been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study provides insights into the role of Pten mutations in SHH-medulloblastoma, by using mouse models to resolve the effects of heterozygous vs homozygous mutations on proliferation and cell death throughout tumorigenesis. The experiments presented are convincing, with rigorous quantifications and orthogonal experimentation provided throughout, and the models employing sporadic oncogene induction, rather than EGL-wide genetic modifications, represent an advancement in experimental design. However, the study remains incomplete, such that the biological conclusions do not extend greatly from those in the extant literature; this could be addressed with additional experimentation focused on cell cycle kinetic changes at early stages, as well as greater characterization of macrophage phenotypes (e.g., microglia vs circulating monocytes). The work will be of interest to medical biologists studying general cancer mechanisms, as the function of Pten may be similar across tumor types.

      We appreciate the summary of the importance of our work and agree that it provides a foundation for future experiments addressing underlying mechanisms, including the role of macrophages in tumor progression/regression.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper investigates how Pten loss influences the development of medulloblastoma using mouse models of Shh-driven MB. Previous studies have shown that Pten heterozygosity can accelerate tumorigenesis in models where the entire GNP compartment has MB-promoting mutations, raising questions about how Pten levels and context interact, especially when cancer-causing mutations are more sporadic. Here, the authors create an allelic series combining sporadic, cell-autonomous induction of SmoM2 with Pten loss in granule neuron progenitors. In their models, Pten heterozygosity does not significantly impact tumor development, whereas complete Pten loss accelerates tumour onset. Notably, Pten-deficient tumours accumulate differentiated cells, reduced cell death, and decreased macrophage infiltration. At early stages, before tumour establishment, they observe EGL hyperplasia and more pre-tumour cells in S phase, leading them to suggest that Pten loss initially drives proliferation but later shifts towards differentiation and accumulation of death-resistant, postmitotic cells. Overall, this is a well-executed and technically elegant study that confirms and extends earlier findings with more refined models. The phenotyping is strong, but the mechanistic insight is limited, especially with respect to dosage effects and macrophage biology.

      Strengths:

      The work is carefully executed, and the models, which use sporadic oncogene induction rather than EGL-wide genetic manipulations, represent an advance in experimental design. The deeper phenotyping, including single-cell RNA-seq and target validation, adds rigor.

      Weaknesses:

      The biological conclusions largely confirm findings from previous studies (Castellino et al, 2010; Metcalf et al, 2013), showing that germline or conditional Pten heterozygosity accelerates tumorigenesis, generates tumors with a very similar phenotype, including abundant postmitotic cells, and reduced cell death.

      We respectfully point out that we have added new insights not covered in the previous, more abbreviated studies. First, we are the first to show that in a sporadic model, heterozygous loss of Pten does not lead to accelerated or more aggressive disease. This is an important finding, since this is the case for many patients and only germline PTEN mutant humans are likely to have more aggressive tumors. Also, the previous studies did not examine tumor progression by analyzing neonatal stages, nor did they analyze spinal cord metastasis. We found a different phenotype at some early stages than at end stage, which provides new insights. Our study is also the only one to apply a mosaic analysis to study cell behaviors at early stages of progression, including proliferation and differentiation/survival. We are also the first to demonstrate a reduction in macrophages in Pten mutant SHH-MB.

      The second stated goal - to understand why Pten dosage might matter - remains underdeveloped. The difference between earlier models using EGL-wide SmoA1 or Ptch loss versus sporadic cell-autonomous SmoM2 induction and Pten loss in this study could reflect model-specific effects or non-cell-autonomous contributions from Pten-deficient neighbouring cells in the EGL, for example. However, the study does not explore these possibilities. For instance, examining germline Pten loss in the sporadic SmoM2 context could have provided insight into whether dosage effects are cell-autonomous or dependent on the context.

      We thank the reviewer for suggesting this experiment and agree it would be an informative one for other groups to perform as a follow-up to our work, allowing a direct comparison of mosaic vs germline loss of Pten in the same sporadic SHH-MB model. We would also like to point out that we do show a dosage effect of lowering vs removing Pten when only sporadic GCPs also have an activating mutation in SMO. Please see the comments above for the additional new mechanistic insight we have provided.

      The observations on macrophages are intriguing but preliminary. The reduction in Iba1+ cells could reflect changes in microglia, barrier-associated macrophages, or infiltrating peripheral macrophages, but these populations are not distinguished. Moreover, the functional relevance of these immune changes for tumor initiation or progression remains unexplored.

      We agree, further studies of the influence of Pten mutations on macrophage phenotypes will be interesting.

      Reviewer #2 (Public review):

      The authors sought to answer several questions about the role of the tumor suppressor PTEN in SHH-medulloblastoma formation. Namely, whether Pten loss increases metastasis, understanding why Pten loss accelerates tumor growth, and the effect of single-copy vs double-copy loss on tumorigenesis. Using an elegant mouse model, the authors found that Pten mutations do not increase metastasis in a SmoM2-driven SHH-medulloblastoma mouse model, based on extensive characterization of the presence of spinal cord metastases. Upon examining the cellular phenotype of Pten-null tumors in the cerebellum, the authors made the interesting and puzzling observation that Pten loss increased the differentiation state of the tumor, with fewer cycling cells, seemingly in contrast to the higher penetrance and decreased latency of tumor growth.

      The authors then examined the rate of cell death in the tumor. Interestingly, Pten-null tumors had fewer dying cells, as assessed by TUNEL. In addition, the tumors expressed differentiation markers NeuN and SyP, which are rare in SHH-MB mouse models. This reduction in dying cells is also evident at earlier stages of tumor growth. By looking shortly after Pten-loss induction, the authors found that Pten loss had an immediate impact on increasing the proliferative state of GCPs, followed by enhancing the survival of differentiated cells. These two pro-tumor features together account for the increased penetrance and decreased latency of the model. While heterozygous loss of Pten also promoted proliferation, it did not protect against cell death.

      Interestingly, loss of Pten alone in GCPs caused an increase in cerebellar size throughout development. The authors suggest that Pten normally constrains GCP proliferation, although they did not check whether reduced cell death is also contributing to cerebellum size.

      Lastly, the authors examined macrophage infiltration and found that there was less macrophage infiltration in the Pten-null tumors. Using scRNA-seq, they suggest that the observed reduction in macrophages might be due to an immunosuppressive tumor microenvironment.

      This mouse model will be of high relevance to the medulloblastoma community, as current models do not reflect the heterogeneity of the disease. In addition, the elegant experimentation into Pten function may be relevant to cancer biologists outside of the medulloblastoma field.

      Strengths:

      The in-depth characterisation of the mouse model is a major strength of the study, including multiple time points and quantifications. The single-cell sequencing adds a nice molecular feature, and this dataset may be relevant to other researchers with specific questions of Pten function.

      Weaknesses:

      One weakness of the study was the examination of the macrophage phenotype, which did not include quantification (only single images), so it is difficult to assess whether this reduction of macrophages holds true across multiple samples. Future studies will also be needed to assess whether Pten-mutated patient medulloblastomas also have a differentiation phenotype, but this is difficult to assess given the low number of samples worldwide.

      We thank the reviewer for highlighting the importance of our sporadic mutant approach and new findings. As stated above, we agree that further studies of the influence of Pten mutations on macrophage phenotypes will be interesting, as will studies of human samples once large numbers can be obtained. All conclusions about macrophages are based on analyzing 3 independent tumors per genotype, which was stated in the Figure legends. For all end stage tumors, the sections were collected from one lateral edge of the tumor to the midline, and for earlier stages from one side of the brain to the other; thus we believe the reported phenotypes are consistent within tumors and stages.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor points 

      (1) The authors should state explicitly that early EGL analyses sample the same cerebellar region across animals (e.g., matched lobule or distance from the midline) because position-dependent effects are possible. 

      We agree this is an important aspect of the rigor of the study and are sorry this was not clear enough. We had stated in the legends to Figures 4 and 5 that midline sections were analyzed, and where the entire EGL was not quantified, the region analyzed was shown. We now include more details in all relevant Figure legends and in the Methods section. 

      (2) It is not clear from Figure 3i-k that TUNEL density in Syp-high regions differs between Pten+/- and Pten-/- tumors. 

      We have added a new graph as Figure 3 Supplemental Figure 1D with this direct comparison. Indeed, there is no difference between the Syp-high regions of Pten+/- and Pten-/- tumors as these regions of Pten+/- tumors have no detectable PTEN protein and thus have the same behavior as Pten-/- tumors (reduced cell death).

      (3) The authors interpret the increase in the %EdU+ GFP+ cells in the EGL as evidence of a faster cell cycle. However, EdU labeling alone does not demonstrate altered cell cycle kinetics; this would require a dedicated assay. It would also be informative to combine EdU with Ki67 staining. This could clarify whether the effect reflects changes in differentiation (for example, if a higher proportion of GFP+ pre-tumor cells remain Ki67+) or whether the increase in EdU simply reflects a greater fraction of cells being in cycle. Such an analysis might even reveal no change in cycling if the proliferation index in controls is lower. 

      We are sorry we did not make our analysis sufficiently clear in Figure 5 and Figure 6. The quantification of EdU+ cells was restricted to the outer EGL (region defined by containing GFP+ and EdU+ cells) where all cells should be Ki67+.  We cannot perform co-staining of Ki67 and GFP, since antigen retrieval for Ki67 removes the epitope for our GFP antibody. We have revised the wording in the figure legends and results sections.  

      (4) Some of the stains are unconvincing. For example, in Figure 2E,F the p27 staining is difficult to distinguish from the background, and in Figure 7G,E the CD31+ blood vessels are difficult to see. 

      As requested, in Fig. 2A, B, E, F we used Photoshop to adjust the level of the green color for p27 to reduce the background. In Fig. 7G, H we adjusted the level of the green color for CD31 to reduce the background.  

      (5) Line 158: "unlike a SmoA2 model with germline or broad deletion of Pten in the cerebellum, where heterozygous deletion is sufficient..." That paper refers to the Neuro-D2SmoA1 mouse model. So this statement should be clarified.  

      We have made this edit.

      Reviewer #2 (Recommendations for the authors): 

      (1) I find the final discussion paragraph about Kmt2d does not add much to the study, as it seems obvious that the mechanisms of tumor formation would differ between two different tumor suppressor genes, but this is only my opinion. 

      We respectfully think it is interesting, even if expected, so have left it in the Discussion.

      (2) There is also a typo on line 342 that changes the meaning of the sentence: mTORC1 signaling is significantly 'unregulated'; 

      We thank the reviewer for noticing this mistake. We have changed 'unregulated' to ‘upregulated’.

      (3) Figure 9Q,R mislabeled: not mTORC1, but instead UPR  

      Asns is included in the mTOR pathway in Hallmark MTOR1 signaling as well as in the Unfolded Protein Response gene list. We have made a note of this in the Figure legend.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Dixit and colleagues investigate the role of FRG1 in modulating nonsense-mediated mRNA decay using human cell lines and zebrafish embryos. They present data from experiments that test the effect of normal, reduced or elevated levels of FRG1 on NMD of a luciferase-based NMD reporter and on endogenous mRNA substrates of NMD. They also carry out experiments to investigate FRG1's influence on UPF1 mRNA and protein levels, with a particular focus on the possibility that FRG1 regulates UPF1 protein levels through ubiquitin-mediated proteolysis of UPF1. The experiments described also test whether DUX4's effect on UPF1 protein levels and NMD could be mediated through FRG1. Finally, the authors also present experiments that test for physical interaction between UPF1, the spliceosome and components of the exon junction complex.

      Strengths:

      A key strength of the work is its focus on an intriguing model of NMD regulation by FRG1, which is of particular interest as FRG1 is positively regulated by DUX4, which has been previously implicated in subjecting UPF1 to proteasome-mediated degradation and thereby causing NMD inhibition. The data showing that the DUX4-mediated effect on UPF1 levels is diminished upon FRG1 depletion suggest that DUX4's regulation of NMD could be mediated by FRG1.

      Weaknesses:

      A major weakness and concern is that many of the key conclusions drawn by the authors are not supported by the data, and there are also some significant concerns with experimental design. More specific comments below describe these issues:

      (1) Multiple issues lower the confidence in the experiments testing the effect of FRG1 on NMD.

      (a) All reporter assays presented in the manuscript are based on quantification of luciferase activity, and in most cases, the effect on luciferase activity is quite small. This assay is the key experimental approach throughout the manuscript. However, no evidence is provided that the effect captured by this assay is due to enhanced degradation of the mRNA encoding the luciferase reporter, which is what is implied in the interpretation of these experiments. Crucially, there is also no control for the reporter that can account for the effects of experimental manipulations on transcriptional versus post-transcriptional effects. A control reporter lacking a 3'UTR intron is described in Barid et al, where the authors got their NMD reporter from. Due to small effects observed on luciferase activity upon FRG1 depletion, it is necessary to not only measure NMD reporter mRNA steady state levels, but it will be equally important to ascertain that the effect of FRG1 on NMD is at the level of mRNA decay and not altered transcription of NMD substrates. This can be accomplished by testing decay rates of the beta-globin reporter mRNA.

      We thank the reviewer for raising these points and for the careful evaluation of our experimental approach. Here we provide our response to comment (a) in three parts:

      Reliance on luciferase-based reporter assays

      While luciferase-based NMD reporter assays represent an important experimental component of this study, our conclusions do not rely exclusively on this approach. The reporter-based findings are independently supported by RNA sequencing analyses of FRG1-perturbed cells, which demonstrate altered abundance of established PTC-containing NMD target transcripts. This genome-wide analysis provides an unbiased and physiologically relevant validation of FRG1 involvement in NMD regulation.

      All reporter assays presented in the manuscript are based on quantification of luciferase activity, and in most cases, the effect on luciferase activity is quite small.

      We respectfully disagree with the comment that the magnitude of the luciferase effects is low. Increased expression of FRG1, which leads to reduced UPF1 levels, results in a ~3.5-fold increase in relative luciferase activity (Fig. 1C), indicating a robust effect. Furthermore, in the in vivo zebrafish model, FRG1 knockout causes a pronounced decrease in relative luciferase activity (Fig. 1H), consistent with elevated UPF1 levels and enhanced NMD activity.

      It is also important to note that FRG1 functions as a negative regulator of UPF1; therefore, its depletion is expected to increase UPF1 levels. However, excessive elevation of UPF1 is likely constrained by additional regulatory mechanisms, which may limit the observable effects of FRG1 knockdown or knockout. In line with this, our previous study (1) demonstrated that FRG1 positively regulates multiple NMD factors while exerting an inverse regulatory effect on UPF1. This dual role suggests that FRG1 may act as a compensatory modulator of the NMD machinery, which likely explains the relatively subtle net effects observed in FRG1 knockdown/knockout conditions in vitro (Fig. 1A and 1B). This interpretation is explicitly discussed in the manuscript (Discussion, paragraph 4).

      However, no evidence is provided that the effect captured by this assay is due to enhanced degradation of the mRNA encoding the luciferase reporter, which is what is implied in the interpretation of these experiments. Crucially, there is also no control for the reporter that can account for the effects of experimental manipulations on transcriptional versus post-transcriptional effects. A control reporter lacking a 3'UTR intron is described in Barid et al, where the authors got their NMD reporter from. Due to small effects observed on luciferase activity upon FRG1 depletion, it is necessary to not only measure NMD reporter mRNA steady state levels, but it will be equally important to ascertain that the effect of FRG1 on NMD is at the level of mRNA decay and not altered transcription of NMD substrates. This can be accomplished by testing decay rates of the beta-globin reporter mRNA.

      Thank you for your suggestion. We will test decay rates of the beta-globin reporter mRNA.
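
      As an illustration of how a decay-rate measurement of this kind is commonly quantified, the sketch below fits a first-order decay model to a transcription shut-off time course and reports the mRNA half-life. It is a minimal sketch under stated assumptions, not the authors' protocol: the function name, the sampling times, and the example values are hypothetical, and the response does not specify how the beta-globin reporter decay will be measured or analyzed.

```python
import math

def estimate_half_life(times_h, remaining_fraction):
    """Estimate mRNA half-life (hours) assuming first-order decay.

    Fits ln(fraction remaining) = intercept - k * t by ordinary least squares
    and returns ln(2) / k. Inputs are hypothetical: time points after
    transcription shut-off and reporter mRNA levels normalized to t = 0.
    """
    if len(times_h) != len(remaining_fraction) or len(times_h) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(f) for f in remaining_fraction]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # equals -k for a decaying transcript
    if slope >= 0:
        raise ValueError("no decay detected in this time course")
    return math.log(2) / -slope

# Made-up illustration values, not data from the study.
example_times = [0, 1, 2, 4]
example_fraction = [1.0, 0.5, 0.25, 0.0625]
example_half_life = estimate_half_life(example_times, example_fraction)
```

      Comparing half-lives estimated this way between control and FRG1-perturbed cells would address decay directly, whereas steady-state mRNA or luciferase levels cannot separate decay from transcription.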

      (b) It is unusual to use luciferase enzymatic activity as a measurement of RNA decay status. Such an approach can at least be justified if the authors can test how many-fold the luciferase activity changes when NMD is inhibited using a chemical inhibitor (e.g., SMG1 inhibitor) or knockdown of a core NMD factor.

      We respectfully disagree that the use of luciferase enzymatic activity as a readout for NMD is unusual. Multiple prior studies have successfully employed identical or closely related luciferase-based/fluorescence-based reporters to quantify NMD activity (2–5). Importantly, the goal of our study was not to measure RNA decay kinetics per se, but rather to assess how altered FRG1 levels influence the functional efficiency of the NMD pathway. Given that FRG1 is a structural component of the spliceosome C complex (6) and has previously been indirectly linked to NMD regulation (1,7), this approach was well-suited to address our central question.

      As suggested by the reviewer, we will also assess luciferase activity following pharmacological inhibition of NMD to further validate the reporter system's responsiveness.

      (c) The concern about the direct effect of FRG1 on NMD is further amplified by the small effects of FRG1 knockout on steady-state levels of endogenous NMD targets (Figure 1A and B: ~20% reduction in reporter mRNA in MCF7 cells; Figure 1M, only 18 endogenous NMD targets shared between FRG1_KO and FRG1_KD).

      The modest changes observed upon FRG1 loss do not preclude a direct role in NMD. As detailed in our response to comment (a) and discussed in paragraph 4 of the Discussion, limited effects on steady-state levels of endogenous NMD targets are expected given the buffering capacity of the NMD pathway and the contribution of compensatory regulatory mechanisms.

      (d) The question about transcriptional versus post-transcriptional effects is also important in light of the authors' previous work that FRG1 can act as a transcriptional regulator.

      We agree that distinguishing between transcriptional and post-transcriptional effects is important, particularly in light of our previous work demonstrating that FRG1 can function as a transcriptional regulator of multiple NMD genes (1). Consistent with this, the current manuscript shows that FRG1 influences the transcript levels of UPF1. In addition, we demonstrate that FRG1 regulates UPF1 at the protein level. We therefore conclude that FRG1 regulates UPF1 at both the transcriptional and post-transcriptional levels, supporting a dual role for FRG1 in the regulation of NMD.

      This conclusion is further supported by prior studies indicating post-transcriptional functions of FRG1. FRG1 is a nucleocytoplasmic shuttling protein (8), interacts with the NMD factor ROD1 (7), and has been identified as a component of the spliceosomal C complex (6). FRG1 has also been reported to associate with the hnRNPK family of proteins (8), which participate in extensive protein–protein interaction networks. Collectively, these observations are consistent with a role for FRG1 in regulating NMD components at multiple levels.

      (2) In the experiments probing the relationship between DUX4 and FRG1 in NMD regulation, there are some inconsistencies that need to be resolved.

      (a) Figure 3 shows that the inhibition of NMD reporter activity caused by DUX4 induction is reversed by FRG1 knockdown. Although levels of FRG1 and UPF1 in DUX4 uninduced and DUX4 induced + FRG1 knockdown conditions are similar (Figure 5A), why is the reporter activity in DUX4 induced + FRG1 knockdown cells much lower than DUX4 uninduced cells in Figure 3?

      We appreciate the reviewer’s comment. Figures 3 and 5A represent independent experiments in which FRG1 knockdown was achieved by transient transfection. As such, variability in transfection efficiency is expected and likely accounts for the quantitative difference. We want to highlight that, compared to the DUX4_induced lane (Fig. 5A, lane 2), knocking down FRG1 on the DUX4_induced background produces a clear increase in the UPF1 level (Fig. 5A, lane 3). We will add one more replicate to Figure 5A with more efficient FRG1_KD transfection.

      (b) In Figure 3, it is important to know the effect of FRG1 knockdown in DUX4 uninduced conditions.

      We thank the reviewer for this thoughtful suggestion. The effect of FRG1 knockdown under DUX4-uninduced conditions is presented in Figure 1A, where FRG1 levels are reduced without altering DUX4 expression. In contrast, Figure 3 is specifically designed to assess the rescue effect—namely, how reduction of FRG1 expression under DUX4-induced conditions influences NMD efficiency. Therefore, inclusion of an FRG1 knockdown–only group in Figure 3 was not relevant to the objective of this experiment.

      (c) On line 401, the authors claim that MG132 treatment leads to "time-dependent increase in UPF1 protein levels" in Figure 5C. However, upon proteasome inhibition, UPF1 levels significantly increase only at 8h time point, while the change at 12 and 24 hours is not significantly different from the control.

      We thank the reviewer for this observation and agree that the statement of a “time-dependent increase in UPF1 protein levels” was inaccurate. A significant increase is observed only at the 8 h time point following MG132 treatment, with no significant changes at 12 h or 24 h. The text will be revised accordingly to reflect Figure 5C.

      (3) There are multiple issues with experiments investigating ubiquitination of UPF1:

      (a) Ubiquitin blots in Figure 6 are very difficult to interpret. There is no information provided either in the text or figure legends as to which bands in the blots are being compared, or about what the sizes of these bands are, as compared to UPF1. Also, the signal for Ub in most IP samples looks very similar to or even lower than the input.

      We agree that the ubiquitin blots in Figure 6 require clearer presentation. In the revised figure, we will annotate the ubiquitin immunoblots to indicate the region corresponding to UPF1 (~140 kDa), which is the relevant molecular weight for interpretation. Because UPF1 is polyubiquitinated, ubiquitinated species are expected to appear as multiple bands rather than a single discrete signal; therefore, ubiquitination was assessed across the full blot. Importantly, interpretation is based on comparisons between UPF1 immunoprecipitated samples within each panel (Fig. 6C–F), rather than between input and IP lanes. Specifically, in Figure 6C, UPF1 IP FRG1_KD is compared to UPF1 IP FRG1_Ex; in Figure 6D, UPF1 IP FRG1_WT is compared to UPF1 IP FRG1_KO; in Figure 6E, UPF1 IP FRG1_KO is compared to UPF1 IP FRG1_KO+FRG1_Ex; and in Figure 6F, UPF1 IP FRG1_Ex is compared to UPF1 IP FRG1_Ex+MG132 TRT.

      (b) Western blot images in Figure 6D appear to be adjusted for brightness/contrast to reduce background, but are done in such a way that pixel intensities are not linearly altered. This image appears to be the most affected, although some others have also similar patterns (e.g., Figure 5C).

      We thank the reviewer for raising this point. The appearance noted in Figure 6D was not due to non-linear alteration of pixel intensities, but rather resulted from the poor quality of the ubiquitin antibody, which required prolonged exposure times. To address this, we replaced the antibody and repeated the ubiquitin immunoblots shown in Figures 6D, 6E, and 6F.

      For Figure 5C, only uniform contrast adjustment was applied for clarity. Importantly, all adjustments were performed linearly and applied to the entire image. Raw, unprocessed images for all blots are provided in the Supplementary Information. Updated versions of Figures 5 and 6 will be included in the revised manuscript.

      (4) The experiments probing physical interactions of FRG1 with UPF1, spliceosome and EJC proteins need to consider the following points:

      (a) There is no information provided in the results or methods section on whether immunoprecipitations were carried out in the absence or presence of RNases. Each RNA can be bound by a plethora of proteins that may not be functionally engaged with each other. Without RNase treatment, even such interactions will lead to co-immunoprecipitation. Thus, experiments in Figure 6 and Figure 7A-D should be repeated with and without RNase treatment.

      We thank the reviewer for this important point. The co-immunoprecipitation experiments shown in Figures 6 and 7A–D were performed in the absence of RNase treatment; this information was inadvertently omitted and will be added to the Methods section and the relevant figure legends. To directly assess whether the observed interactions are RNA-dependent, we will repeat the key co-immunoprecipitation experiments in the presence of RNase treatment and include these results in the revised manuscript.

      (b) Also, the authors' claim that FRG1 is a "structural component" of EJC and NMD complexes seems to be an overinterpretation. As noted in the previous comment, these interactions could be mediated by a connecting RNA molecule.

      We thank the reviewer for this insightful comment. As noted, previous studies have suggested that FRG1 interacts with components of the EJC and NMD machinery. Specifically, Bertram et al. (6) identified FRG1 as a component of the spliceosomal C complex via Cryo-EM structural analysis, and pull-down studies have shown direct interaction between FRG1 and ROD1, a known EJC component (7). These findings support a protein-protein interaction rather than one mediated solely by RNA. To further address the reviewer’s concern, we will perform key co-immunoprecipitation experiments in the presence of RNase treatment to distinguish RNA-dependent from RNA-independent interactions.

      (c) A negative control (non-precipitating protein) is missing in Figure 7 co-IP experiments.

      We agree that including a non-precipitating protein as a negative control is important, and we will perform the co-IP experiment incorporating this control.

      (d) Polysome analysis is missing important controls. FRG1 and EIF4A3 co-sedimentation with polysomes could simply be due to their association with another large complex (e.g., spliceosome), which will also co-sediment in these gradients. This possibility can at least be tested by Western blotting for some spliceosome components across the gradient fractions. More importantly, a puromycin treatment control needs to be performed to confirm that FRG1 and EIF4A3 are indeed bound to polysomes, which are separated into ribosome subunits upon puromycin treatment. This leads to a shift of the signal for ribosomal proteins and any polysome-associated proteins to the left.

      As recommended, we will examine the distribution of a spliceosome component across the gradient fractions to assess potential co-sedimentation. Additionally, we will perform a puromycin treatment control to confirm that FRG1 and EIF4A3 are genuinely associated with polysomes.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Palo et al present a novel role for FRG1 as a multifaceted regulator of nonsense-mediated mRNA decay (NMD). Through a combination of reporter assays, transcriptome-wide analyses, genetic models, protein-protein interaction studies, ubiquitination assays, and ribosome-associated complex analyses, the authors propose that FRG1 acts as a negative regulator of NMD by destabilizing UPF1 and associating with spliceosomal, EJC, and translation-related complexes. Overall, the data, while consistent with the authors' central conclusions, are undermined by several claims, particularly regarding structural roles and mechanistic exclusivity. To really make the claims presented, further experimental evidence would be required.

      Strengths:

      (1) The integration of multiple experimental systems (zebrafish and cell culture).

      (2) Attempts to reach a mechanistic understanding of the relationship between FRG1 and UPF1.

      Weaknesses:

      (1) Overstatement of FRG1 as a structural NMD component.

      Although FRG1 interacts with UPF1, eIF4A3, PRP8, and CWC22, core spliceosomal and EJC interactions (PRP8-CWC22 and eIF4A3-UPF3B) remain intact in FRG1-deficient cells. This suggests that, while FRG1 associates with these complexes, this interaction is not required for their assembly or structural stability. Without further functional or reconstitution experiments, the presented data are more consistent with an interpretation of FRG1 acting as a regulatory or accessory factor rather than a core structural component.

      We thank the reviewer for this clarification. We would like to emphasize that we do not claim FRG1 to be a core structural component of either the spliceosome or the EJC. Consistent with the reviewer’s interpretation, our data indicate that FRG1 deficiency does not disrupt the structural integrity of these complexes. Our intended conclusion is that FRG1 functions as a regulatory or accessory factor in NMD rather than being required for complex assembly or stability. We will carefully revise the manuscript to remove any language that could be interpreted as an overstatement. In addition, we are currently performing further experiments to better define the association of FRG1 with the EJC.

      (2) Causality between UPF1 depletion and NMD inhibition is not fully established.

      While reduced UPF1 levels provide a plausible explanation for decreased NMD efficiency, the manuscript does not conclusively demonstrate that UPF1 depletion drives all observed effects. Given FRG1's known roles in transcription, splicing, and RNA metabolism, alterations in transcript isoform composition and apparent NMD sensitivity may arise from mechanisms independent of UPF1 abundance. To directly link UPF1 depletion to altered NMD efficiency, rescue experiments testing whether UPF1 re-expression restores NMD activity in FRG1-overexpressing cells would be important.

      As suggested, to directly test causality, we will perform rescue experiments to determine whether UPF1 re-expression restores NMD activity in FRG1-overexpressing MCF7 cells.

      (3) Mechanism of FRG1-mediated UPF1 ubiquitination requires clarification.

      The ubiquitination assays support a role for FRG1 in promoting UPF1 degradation; however, the mechanism underlying this remains unexplored. The relationship between FRG1 and UPF1, and the role FRG1 plays in UPF1 ubiquitination, are unclear (does FRG1 function as an adaptor, recruit an E3 ubiquitin ligase, or influence UPF1 ubiquitination indirectly through transcriptional or signaling pathways?).

      We agree with the reviewer that the precise mechanism by which FRG1 promotes UPF1 ubiquitination remains to be defined. Our ubiquitination assays support a role for FRG1 in facilitating UPF1 degradation; however, whether FRG1 functions directly as an adaptor or E3 ligase, or instead influences UPF1 stability indirectly, is currently unclear. Notably, a prior study by Geng et al. reported that DUX4 expression alters the expression of numerous genes involved in protein ubiquitination, including multiple E3 ubiquitin ligases (9), and FRG1 itself has been reported to be upregulated upon DUX4 expression in muscle cells. We will expand the Discussion to address these potential mechanisms and place our findings in the context of indirect transcriptional or signaling pathways that may regulate UPF1 proteolysis. A detailed mechanistic dissection of FRG1-mediated ubiquitination is beyond the scope of the present study.

      (4) Limited transcriptome-wide interpretation of RNA-seq data.

      The RNA-seq data analysis relies heavily on a small subset of "top 10" genes. Additionally, the criteria used to define NMD-sensitive isoforms are unclear. A more comprehensive transcriptome-wide summary, indicating how many NMD-sensitive isoforms are detected and how many are significantly altered, would substantially strengthen the analysis.

      We thank the reviewer for this comment and agree that the current presentation may place a disproportionate emphasis on a limited subset of genes. These genes were selected as illustrative examples from an isoform-level analysis performed using IsoformSwitchAnalyzeR (ISAR) (10); however, we acknowledge that this approach does not fully convey the transcriptome-wide scope of the analysis.

      Using quantified RNA-seq data, ISAR was employed to identify significant isoform switches and transcripts predicted to be NMD-sensitive. Isoforms were annotated using GENCODE v47, and NMD sensitivity was assigned based on the established 50-nucleotide rule, as described in the Materials and Methods. To address the reviewer’s concern, we will revise the Results section to include a transcriptome-wide summary derived from the ISAR analysis.
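
      For readers unfamiliar with the 50-nucleotide rule mentioned above, the following sketch shows the general logic: an isoform is predicted to be NMD-sensitive when its stop codon lies more than about 50 nt upstream of the last exon-exon junction. This is a simplified illustration, not the ISAR implementation. The function name, the coordinate convention (1-based transcript coordinates, distance measured from the last nucleotide of the stop codon), and the example transcript are assumptions made here for clarity.

```python
def is_predicted_nmd_sensitive(exon_lengths, stop_codon_end, threshold=50):
    """Apply a simplified 50-nucleotide rule to one spliced transcript.

    exon_lengths: exon lengths in 5'-to-3' transcript order (hypothetical input).
    stop_codon_end: 1-based transcript coordinate of the last nucleotide
        of the stop codon.
    Returns True when the stop codon ends more than `threshold` nt upstream
    of the last exon-exon junction.
    """
    if len(exon_lengths) < 2:
        # A single-exon transcript has no exon-exon junction.
        return False
    # Coordinate of the last nucleotide before the final exon begins.
    last_junction = sum(exon_lengths[:-1])
    return (last_junction - stop_codon_end) > threshold

# Hypothetical three-exon transcript; the last junction sits at position 350.
exons = [200, 150, 300]
ptc_call = is_predicted_nmd_sensitive(exons, stop_codon_end=250)     # 100 nt upstream
normal_call = is_predicted_nmd_sensitive(exons, stop_codon_end=500)  # stop in last exon
```

      Counting the isoforms for which such a call is positive, and how many of those change significantly between conditions, gives the kind of transcriptome-wide summary the reviewer requests.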

      (5) Clarification of NMD sensor assay interpretation.

      The logic underlying the NMD sensor assay should be explained more clearly early in the manuscript, as the inverse relationship between luciferase signal and NMD efficiency may be counterintuitive to readers unfamiliar with this reporter system. Inclusion of a schematic or brief explanatory diagram would improve accessibility.

      We agree with the reviewer and will provide a schematic as well as an experimental setup diagram to improve accessibility for readers.

      (6) Potential confounding effects of high MG132 concentration.

      The MG132 concentration used (50 µM) is relatively high and may induce broad cellular stress responses, including inhibition of global translation (it is known that proteasome inhibition shuts down translation). Controls addressing these secondary effects would strengthen the conclusion that UPF1 stabilization specifically reflects proteasome-dependent degradation.

      We acknowledge the reviewer’s concern regarding the relatively high concentration of MG132 used in this study. While proteasome inhibition can indeed induce global translation inhibition, our interpretation is based on the specific stabilization of UPF1 observed under these conditions. Since inhibition of global translation would generally reduce protein levels rather than cause selective accumulation, the observed increase in UPF1 is unlikely to result from translational effects. To address this point, we plan to repeat selected experiments using a lower MG132 concentration to further confirm that UPF1 stabilization reflects proteasome-dependent degradation.

      (7) Interpretation of polysome co-sedimentation data.

      While the co-sedimentation of FRG1 with polysomes is intriguing, this approach does not distinguish between direct ribosomal association and co-migration with ribosome-associated complexes. This limitation should be explicitly acknowledged in the interpretation.

      We acknowledge that polysome co-sedimentation alone cannot definitively distinguish between direct ribosomal binding and co-migration with ribosome-associated complexes. Importantly, our interpretation does not rely solely on this assay; when combined with co-immunoprecipitation and proximity ligation assay results, the data consistently support an association of FRG1 with the exon junction complex. We are also conducting additional experiments with appropriate controls to further validate the specificity of FRG1’s association with ribosomes and to address the possibility of nonspecific co-migration.

      (8) Limitations of PLA-based interaction evidence.

      The PLA data convincingly demonstrate close spatial proximity between FRG1 and eIF4A3; however, PLA does not provide definitive evidence of direct interaction and is known to be susceptible to artefacts. Moreover, a distance threshold of ~40 nm still allows for proteins to be in proximity without being part of the same complex. These limitations should be clearly acknowledged, and conclusions should be framed accordingly.

      We thank the reviewer for highlighting this important point. We agree that PLA indicates close spatial proximity but does not constitute definitive evidence of direct interaction and can be susceptible to artefacts. We will explicitly acknowledge this limitation in the revised manuscript. Importantly, our conclusions are not solely based on PLA data; they are supported by complementary co-immunoprecipitation and polysome co-sedimentation assays, which provide biochemical evidence consistent with an association between FRG1 and eIF4A3.

      Reviewer #3 (Public review):

      The manuscript by Palo and colleagues demonstrates identification of FRG1 as a novel regulator of nonsense-mediated mRNA decay (NMD), showing that FRG1 inversely modulates NMD efficiency by controlling UPF1 abundance. Using cell-based models and a frg1 knockout zebrafish, the authors show that FRG1 promotes UPF1 ubiquitination and proteasomal degradation, independently of DUX4. The work further positions FRG1 as a structural component of the spliceosome and exon junction complex without compromising its integrity. Overall, the manuscript provides mechanistic insight into FRG1-mediated post-transcriptional regulation and expands understanding of NMD homeostasis. The authors should address the following issues to improve the quality of their manuscript.

      (1) In Figure 7A-D, appropriate positive controls for the nuclear fraction (e.g., Histone H3) and the cytoplasmic fraction (e.g., GAPDH or α-tubulin) should be included to validate the efficiency and purity of the subcellular fractionation.

      We thank the reviewer for the suggestion. We will include appropriate positive controls for the nuclear fraction (Histone H3) and the cytoplasmic fraction (GAPDH or α-tubulin) in Figure 7A–D to validate the efficiency and purity of the subcellular fractionation.

      (2) To strengthen the conclusion that FRG1 broadly impacts the NMD pathway, qRT-PCR analysis of additional core NMD factors (beyond UPF1) in the frg1⁻/⁻ zebrafish at 48 hpf would be informative.

      We appreciate the reviewer’s insightful comment. We will perform qRT-PCR analysis of additional core NMD factors in the frg1⁻/⁻ zebrafish at 48 hpf to further strengthen the conclusion that FRG1 broadly impacts the NMD pathway.

      (3) Figure labels should be standardized throughout the manuscript (e.g., consistent use of "Ex" instead of mixed terms such as "Oex") to improve clarity and readability.

      We thank the reviewer for noticing the inconsistency. We will ensure that all figure labels are standardized throughout the manuscript (e.g., using “Ex” consistently) to improve clarity and readability.

      (4) The methods describing the generation of the frg1 knockout zebrafish could be expanded to include additional detail, and a schematic illustrating the CRISPR design, genotyping workflow, and validation strategy would enhance transparency and reproducibility.

      We appreciate the reviewer’s suggestion and will expand the Methods section to provide additional detail on the generation of the frg1 knockout zebrafish. A schematic illustrating the CRISPR design, genotyping workflow, and validation strategy will also be included to enhance transparency and reproducibility.

      (5) As FRG1 is a well-established tumor suppressor, additional cell-based functional assays under combined FRG1 and UPF1 perturbation (e.g., proliferation, migration, or survival assays) could help determine whether FRG1 influences cancer-associated phenotypes through modulation of the NMD pathway.

      We thank the reviewer for this thoughtful and constructive suggestion. While FRG1 is indeed a well-established tumor suppressor, incorporating additional cell-based functional assays under combined FRG1 and UPF1 perturbation would significantly broaden the scope of the current study. The present work is focused on elucidating the molecular relationship between FRG1 and the NMD pathway. Investigation of downstream cancer-associated phenotypes represents an important and interesting direction for future studies, but is beyond the scope of the current manuscript.

      (6) Given the claim that FRG1 inversely regulates NMD efficacy via UPF1, an epistasis experiment such as UPF1 overexpression in an FRG1-overexpressing background followed by an NMD reporter assay would provide stronger functional validation of pathway hierarchy.

      We agree with the reviewer’s suggestion. To strengthen the functional validation of the proposed pathway hierarchy, we will perform an epistasis experiment by overexpressing UPF1 in an FRG1-overexpressing background and assess NMD activity using an established NMD reporter assay. The results of this experiment will be included in the revised manuscript.

      References

      (1) Palo A, Patel SA, Shubhanjali S, Dixit M. Dynamic interplay of Sp1, YY1, and DUX4 in regulating FRG1 transcription with intricate balance. Biochim Biophys Acta Mol Basis Dis. 2025 Mar;1871(3):167636.

      (2) Sato H, Singer RH. Cellular variability of nonsense-mediated mRNA decay. Nat Commun. 2021 Dec 10;12(1):7203.

      (3) Baird TD, Cheng KCC, Chen YC, Buehler E, Martin SE, Inglese J, et al. ICE1 promotes the link between splicing and nonsense-mediated mRNA decay. eLife. 2018 Mar 12;7:e33178.

      (4) Chu V, Feng Q, Lim Y, Shao S. Selective destabilization of polypeptides synthesized from NMD-targeted transcripts. Mol Biol Cell. 2021 Dec 1;32(22):ar38.

      (5) Udy DB, Bradley RK. Nonsense-mediated mRNA decay uses complementary mechanisms to suppress mRNA and protein accumulation. Life Sci Alliance. 2022 Mar;5(3):e202101217.

      (6) Bertram K, El Ayoubi L, Dybkov O, Agafonov DE, Will CL, Hartmuth K, et al. Structural Insights into the Roles of Metazoan-Specific Splicing Factors in the Human Step 1 Spliceosome. Mol Cell. 2020 Oct 1;80(1):127-139.e6.

      (7) Brazão TF, Demmers J, van IJcken W, Strouboulis J, Fornerod M, Romão L, et al. A new function of ROD1 in nonsense-mediated mRNA decay. FEBS Lett. 2012 Apr 24;586(8):1101–10.

      (8) Sun CYJ, van Koningsbruggen S, Long SW, Straasheijm K, Klooster R, Jones TI, et al. Facioscapulohumeral muscular dystrophy region gene 1 is a dynamic RNA-associated and actin-bundling protein. J Mol Biol. 2011 Aug 12;411(2):397–416.

      (9) Geng LN, Yao Z, Snider L, Fong AP, Cech JN, Young JM, et al. DUX4 activates germline genes, retroelements, and immune mediators: implications for facioscapulohumeral dystrophy. Dev Cell. 2012 Jan 17;22(1):38–51.

      (10) Vitting-Seerup K, Sandelin A. The Landscape of Isoform Switches in Human Cancers. Mol Cancer Res. 2017 Sep;15(9):1206–20.

    1. Author response:

      eLife Assessment 

      This study presents a valuable finding on maternal SETDB1 as a key chromatin repressor that shuts down the 2C gene program and enables normal mouse embryonic development. The evidence supporting the claims of the authors is solid, although the inclusion of a causality test, a mechanistic understanding of SETDB1 targeting, and phenotypic quantification would have greatly strengthened the study. The work will be of broad interest to biologists working on embryonic development, stem cells and gene regulation.

      Thank you for this positive evaluation of our work. Please find the point-by-point responses to the Reviewers’ comments below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      During the earliest stages of mouse development, the zygote and 2-cell (2C) embryo are totipotent, capable of generating all embryonic and extra-embryonic lineages, and they transiently express a distinctive set of "2C-stage" genes, many driven by MERVL long terminal repeat (LTR) promoters. Although activation of these transcripts is a normal feature of totipotency, they must be rapidly silenced as development proceeds to the 4-cell and 8-cell stages; failure to shut down the 2C program results in developmental arrest. This study examines the role of maternal SETDB1, a histone H3K9 methyltransferase, in suppressing the 2C transcriptional network. Using an oocyte-specific conditional knockout that removes maternal Setdb1 while leaving the paternal allele intact, the authors demonstrate that embryos lacking maternal SETDB1 arrest during cleavage, with very few progressing beyond the 8-cell stage and no morphologically normal blastocysts forming. Transcriptomic analyses reveal persistent expression of MERVL-LTR-driven transcripts and other totipotency markers, indicating a failure to terminate the totipotent state. Together, the data demonstrate that maternally deposited SETDB1 is required to silence the MERVL-driven 2C program and enable the transition from totipotency to pluripotency. More broadly, the work identifies maternal SETDB1 as a key chromatin repressor that deposits repressive H3K9 methylation to shut down the transient 2C gene network and to permit normal preimplantation development. 

      Strengths: 

      (1) Closes a key knowledge gap. 

      The study tackles a central open question - how embryos exit the totipotent 2-cell (2C) state - and provides direct in vivo evidence that epigenetic repression is required to terminate the 2C program for development to proceed. By identifying maternal SETDB1 as the responsible factor, the work substantially advances our understanding of the maternal-to-zygotic transition and early lineage specification. 

      (2) Clean genetics paired with rigorous genomics. 

      An oocyte-specific Setdb1 knockout cleanly isolates a maternal-effect requirement, ensuring that early phenotypes arise from loss of maternal protein. The resulting cleavage-stage arrest is unambiguous (most embryos stall before or around the 8-cell stage). State-of-the-art single-embryo RNA-seq across stages - well-matched to low-cell-number constraints - captures genome-wide mis-expression, including persistent 2C transcripts in mutants, strongly supporting the conclusions. 

      (3) Compelling molecular linkage to phenotype. 

      Transcriptome data show that without maternal SETDB1, embryos fail to repress a suite of 1-cell/2C-specific genes by the 8-cell stage. The tight correlation between continued activation of the MERVL-driven totipotency network and developmental arrest provides a specific molecular explanation for the observed failure to progress. 

      (4) Mechanistic insight grounded in chromatin biology. 

      SETDB1, a H3K9 methyltransferase classically linked to heterochromatin and transposon repression, targets MERVL LTRs and MERVL-driven chimeric transcripts in early embryos. Bioinformatic evidence indicates that these loci normally acquire H3K9me3 during the 2C→4C transition. The data articulate a coherent mechanism: maternal SETDB1 deposits repressive H3K9me3 at 2C gene loci to shut down the totipotency network, extending observations from ESC systems to bona fide embryos. 

      (5) Broad implications for development and stem-cell biology. 

      By pinpointing a maternal gatekeeper of the totipotent-to-pluripotent transition, the work suggests that some cases of cleavage-stage arrest (e.g., in IVF) may reflect faulty epigenetic silencing of transposon-driven genes. It also informs stem-cell efforts to control totipotent-like states in vitro (e.g., 2C-like cells), linking epigenetic reprogramming, transposable-element regulation, and developmental potency.

      We thank Reviewer 1 for recognizing the strengths in our work and for the suggestions below.

      Weaknesses: 

      (1) Causality not directly demonstrated. 

      The link among loss of SETDB1, persistence of 2C transcripts, and developmental arrest is compelling but remains correlative. No rescue experiments test whether dampening the 2C/MERVL program restores development. Targeted interventions (e.g., knocking down key 2C drivers such as Dux, or pharmacologically curbing MERVL-linked transcription in maternal Setdb1 mutants) would strengthen the claim that unchecked 2C activity is causal rather than a by-product of other SETDB1 functions.

      We agree that rescue experiments might strengthen causality. Those experiments, however, would be extremely challenging technically because the knockdowns would need to be precisely timed to follow (and not prevent) the wave of 2c-specific activation. Knocking down 2c drivers in the zygote, for example, may prevent switching on the totipotency program. In addition, while sustained MERVL expression—such as that induced by forced DUX expression—disrupts totipotency exit and embryo development (1, 2), derepression of transcription is very broad in Setdb1<sup>mat-/+</sup> embryos and knocking down individual 2C drivers may not be sufficient to rescue development or restore the exit from totipotency.

      (2) Limited mechanistic resolution of SETDB1 targeting. 

      The study establishes a requirement for maternal SETDB1 but does not define how it is recruited to MERVL loci. Given SETDB1's canonical cooperation with TRIM28/KAP1 and KRAB-ZNFs, upstream sequence-specific factors and/or pre-existing chromatin features likely guide targeting. Direct occupancy and mark-placement evidence (e.g., SETDB1/TRIM28 CUT&RUN or ChIP, and H3K9me3 profiling at MERVL LTRs during the 2C→4C window) would convert inferred mechanisms into demonstrated ones.

      We do show H3K9me3 patterns at MERVL LTRs during the early2c-late2c-2c-4c-8c-morula window from a published dataset. Please see the genome browser images in Figures 4C, 4D, 4E, 6D, 6E and Figure S6. We agree that mapping SETDB1/TRIM28 to those locations would strengthen the mechanistic insight. However, ChIP-seq or CUT&RUN of those proteins in preimplantation embryos is not technically feasible. We do provide genetic evidence for the collaboration between SETDB1 and DUXBL, a DNA-binding factor, by showing that DUXBL cannot switch off its top targets without SETDB1 (Figure 6). Future studies will characterize the molecular mechanisms underlying this (likely indirect) collaboration. We do not think that DUXBL and SETDB1 directly interact, because such an interaction was not detected by DUXBL IP-MS (3).

      (3) Narrow scope on MERVL; broader epigenomic consequences underexplored. 

      Maternal SETDB1 may restrain additional repeat classes or genes beyond the 2C network. A systematic repeatome analysis (LINEs/SINEs/ERV subfamilies) would clarify specificity versus a general loss of heterochromatin control. Moreover, potential effects on imprinting or DNA methylation balance are not examined; perturbations there could also contribute to arrest. Bisulfite-based DNA methylation maps at imprinted loci and allele-specific expression analyses would help rule in/out these mechanisms.

      We did examine genes and repeat elements beyond the 2c network. We evaluated gene and TE expression changes using four-way comparisons. Please find the results regarding gene expression in Figure 1C-J, Figure S2, Figure S3, Figure S4, Table S2, Table S3, and Table S4. Please find the results on TE expression in Figure S5, Table S6, Table S7, and Table S8, and in the text. We agree that DNA methylation may be altered in Setdb1<sup>mat-/+</sup> embryos. In our hands, evaluating this possibility using bisulfite sequencing requires a larger number of embryos than we can feasibly obtain (the number of obtained mutant embryos is very small). Regarding imprinted gene expression, one cannot fully assess and interpret imprinted gene expression in preimplantation-stage embryos before the maternally deposited transcripts are gone. We reported earlier that clear somatic parental-specific patterns of imprinted gene expression may only start later in development, around 8.5 dpc (4).

      (4) Phenotype quantitation and transcriptomic breadth could be clearer. 

      The developmental phenotype is described qualitatively ("very few beyond 8-cell") without precise stage-wise arrest rates or representative morphology. Tabulated counts (2C/4C/8C/blastocyst), images, and statistics would increase clarity. On the RNA-seq side, the narrative emphasizes known 2C markers; reporting novel/unannotated misregulated transcripts, as well as downregulated pathways (e.g., failure to activate normal 8-cell programs, metabolism, or early lineage markers), would present a fuller portrait of the mutant state.

      Tabulated counts are displayed in Figure 1A, and morphology is shown in Figure S1A. We do state that 4% of Setdb1<sup>mat-/+</sup> embryos reached the 8-cell stage by 2.5 dpc. We recovered zero Setdb1<sup>mat-/+</sup> blastocysts at 4.5 dpc (not shown). On the RNA-seq side, we do report a more global assessment of the transcription of genes and TEs (please see point 3 above), including novel chimeric transcripts (Table S6). Developmental pathways are shown in Figure S3 and Figure S4. Metabolic pathways are displayed in Figure S2.

      Reviewer #2 (Public review): 

      Zeng et al. report that Setdb1-/- embryos fail to extinguish the 1- and 2-cell embryo transcriptional program and have permanent expression of MERVL transposable elements. The manuscript is technically sound and well performed, but, in my opinion, the results lack conceptual novelty.

      (1) The manuscript builds on previous observations that: 1, Setdb1 is necessary for early mouse development, with knockout embryos rarely reaching the 8-cell stage; 2, SETDB1 mediates H3K9me3 deposition at transposable elements in mouse ESCs; 3, SETDB1 silences MERVLs to prevent 2CLC-state acquisition in mouse ESCs. The strength of the current work is the demonstration that this is not due to a general transcriptional collapse; but otherwise, the findings are not surprising. The well-known (several Nature papers of years ago) crosstalk between m6A RNA modification and H3K9me3 in preventing 2CLC generation also partly compromises the novelty of this work.

      We thank the Reviewer for appreciating the technical quality of our work. Regarding novelty, please consider that prior work in ES cells included contradictory findings (please see our Introduction). Prior embryology work (please see our Introduction) did not explain the preimplantation-stage phenotype. We highly appreciate those earlier works. Our work here answers the expectations drawn from prior studies and unequivocally shows that SETDB1 carries out the developmentally essential function of suppressing MERVLs and the 2-cell program in the mouse embryo.

      (2) The conclusions regarding H3K9me3 deposition are inferred based on previously reported datasets, but there is no direct demonstration.

      Dynamic H3K9me3 deposition is displayed at MERVL LTRs during the early2c-late2c-2c-4c-8c-morula window (Figures 4C, 4D, 4E, 6D, 6E and Figure S6) from a published work that has very high-quality data. We agree that demonstrating loss of H3K9me3 in Setdb1<sup>mat-/+</sup> embryos would confirm that the H3K9me3 histone methyltransferase function of SETDB1 (as opposed to any, yet unidentified, non-HMT specific activity of SETDB1) is responsible for shutting down MERVL LTRs. However, ChIP-seq, CUT&RUN, or similar assays are not feasible due to the rarity of Setdb1<sup>mat-/+</sup> embryos.

      (3) The detection of chimeric transcripts is somewhat unreliable using short-read sequencing.

      We used single-embryo total RNA-seq, which is considered more reliable than mRNA-seq for detecting chimeric transcripts because many of them are not polyadenylated, and we report the chimeric transcripts detected (Table S6). We acknowledge, however, that long-read sequencing, which has recently become available but is still very expensive, is currently the most powerful method for detecting chimeric transcripts. This, however, does not affect the major conclusions or the significance of our work.

    1. Author response:

      The following is the authors’ response to the original reviews

      Comment to both reviewers:

      We are very grateful for the thoughtful and constructive comments from both reviewers. During the revision, and in direct response to these comments, we performed additional control experiments for the cellular fluorescence measurements. These new data revealed that the weak increase in green fluorescence reported in our original submission does not depend on retron-expressed Lettuce RT-DNA or the DFHBI-1T fluorophore, but instead reflects stress-induced autofluorescence of E. coli (e.g. upon inducer and antibiotic treatment).

      We also benchmarked the fluorogenic properties of Lettuce against the RNA FLAP Broccoli and found that Lettuce is ~100-fold less fluorogenic under optimal in vitro conditions. Consequently, with the currently available Lettuce variants, which were optimized in vitro but not in vivo, intracellular fluorescence cannot be reliably detected by microscopy or flow cytometry. We have therefore removed the original flow cytometry and in-culture fluorescence data and no longer claim detectable intracellular Lettuce fluorescence.

      In the revised manuscript, we now directly demonstrate, using a gel-based fluorophore-binding assay, that retron-produced Lettuce RT-DNA can be purified from cells and remains functional ex vivo. Together, these data clarify the current limitations of DNA-based FLAPs for in vivo imaging, while still establishing retrons as a viable platform for intracellular production of functional DNA aptamers.

      Reviewer #1 (Public Review):

      Summary:

      The authors use an interesting expression system called a retron to express single-stranded DNA aptamers. Expressing DNA as a single-stranded sequence is very hard - DNA is naturally double-stranded. However, the successful demonstration by the authors of expressing Lettuce, which is a fluorogenic DNA aptamer, allowed visual demonstration of both expression and folding. This method will likely be the main method for expressing and testing DNA aptamers of all kinds, including fluorogenic aptamers like Lettuce and future variants/alternatives.

      Strengths:

      This has an overall simplicity which will lead to ready adoption. I am very excited about this work. People will be able to express other fluorogenic aptamers or DNA aptamers tagged with Lettuce with this system.

      We thank the reviewer for their thoughtful assessment and appreciate their encouraging remarks.

      Weaknesses:

      Several things are not addressed/shown:

      (1) How stable are these DNA in cells? Half-life?

      We thank the reviewer for this insightful question.

      Retron RT-DNA forms a phage surveillance complex with the associated RT and effector protein[1-4]. Moreover, considering the unique ‘closed’ structure of RT-DNA[5] (with the ends of msr and msd held together by a 2’-5’ linkage and a base-paired region) and its noncoding function, we hypothesized that the RT-DNA must be exceptionally stable. Nevertheless, we attempted to determine the half-life of the Eco2 RT-DNA using qPCR. To this end, we designed an assay in which we first induce RT-DNA expression and then use the induced cells to start a fresh culture without the inducers. We then take aliquots from this fresh culture at different timepoints and determine RT-DNA abundance by qPCR.

      We induced RT-DNA expression of retron Eco2 in BL21AI cells as described in the Methods. After overnight induction, cells were washed to remove IPTG and arabinose, diluted to OD<sub>600</sub> = 0.2 into fresh LB without inducers, and grown at 37°C. At the indicated time points, aliquots corresponding to OD<sub>600</sub> = 0.1 were boiled (95°C, 5 min), and 1 µL of the lysate was used as template in 20 µL qPCR reactions (see revised Methods for details).

      Assuming that RT-DNA is lost through active degradation (nuclease-mediated) and through dilution (cell growth and division), we determined the rate of degradation by the following equation

      in which one parameter is the degradation rate constant and the ratio is the dilution factor, which accounts for dilution by cell division. OD<sub>600</sub>(t) was determined by fitting the OD<sub>600</sub> measurements to the following equation describing logistic growth:

      which yields the plots shown in Figure 2–figure supplement 1.

      After substituting OD<sub>600</sub>(t) by the function in equation (2), we fit the experimental data for the fold-change of the RT-DNA to equation (1). Interestingly, the best fit (red) was obtained with a degradation rate constant converging towards zero, suggesting that the half-life of the RT-DNA is beyond the detection limit of our assay. To showcase typical half-lives of RNA, which are in the range of minutes in growing E. coli cells[6], we refitted the data using fixed half-lives of 15 and 30 minutes. In both cases, the simulated curve deviated significantly from the experimental data, further confirming that the half-life of the RT-DNA is probably orders of magnitude higher than the doubling time of E. coli under these optimal conditions. We cannot exclude that RT-DNA is still produced as a result of promoter leakiness, but we expect this effect to be low because the expression of RT-DNA in E. coli BL21AI cells requires the presence of both IPTG and arabinose, which were thoroughly removed before inoculating the growth media with the starter culture. Overall, our data therefore argue for an exceptional stability of the RT-DNA in growing bacterial cells.
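
      The model described in this answer can be sketched as follows. The functional forms are assumptions consistent with the description (first-order degradation multiplied by a dilution ratio OD<sub>600</sub>(0)/OD<sub>600</sub>(t), and a standard logistic growth curve). They are not necessarily the exact equations (1) and (2) of the manuscript. All function names and the numerical growth parameters are hypothetical and chosen only for illustration.

```python
import math


def logistic_od(t_min, od0, capacity, growth_rate):
    """OD600 at time t_min (minutes) under logistic growth.

    od0: OD600 at t = 0; capacity: plateau OD600; growth_rate: per minute.
    """
    return capacity / (1.0 + (capacity - od0) / od0 * math.exp(-growth_rate * t_min))


def rt_dna_fold_change(t_min, k_deg, od0, capacity, growth_rate):
    """Relative RT-DNA abundance per cell at time t_min (1.0 at t = 0).

    Assumed form: first-order degradation exp(-k_deg * t) multiplied by the
    dilution factor OD600(0) / OD600(t).
    """
    dilution = od0 / logistic_od(t_min, od0, capacity, growth_rate)
    return math.exp(-k_deg * t_min) * dilution


def rate_from_half_life(half_life_min):
    """Convert a half-life in minutes to a first-order rate constant (per minute)."""
    return math.log(2.0) / half_life_min


# Hypothetical growth parameters, chosen only for illustration.
OD0, CAPACITY, GROWTH_RATE = 0.2, 4.0, 0.03

for label, k in [("no degradation", 0.0),
                 ("half-life 30 min", rate_from_half_life(30.0)),
                 ("half-life 15 min", rate_from_half_life(15.0))]:
    curve = [rt_dna_fold_change(t, k, OD0, CAPACITY, GROWTH_RATE) for t in (0, 30, 60, 120)]
    print(label, [round(v, 4) for v in curve])
```

      With a rate constant of zero the curve reduces to pure dilution by cell division, which is the limiting case that the best fit approached. Curves computed with a fixed half-life of 15 or 30 minutes fall off much faster than dilution alone.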

      We have now included this new experimental data in the supplementary information.

      (2) What concentration do they achieve in cells/copy numbers? This is important since it relates to the total fluorescence output and, if the aptamer is meant to bind a protein, it will reveal if the copy number is sufficient to stoichiometrically bind target proteins. Perhaps the gels could have standards with known amounts in order to get exact amounts of aptamer expression per cell?

      The copy number of RT-DNA can be estimated based on the qPCR experiments. We use a pET28a plasmid, which is low-copy with a typical copy number of 15-20 per cell[7]. We determined the abundance of RT-DNA over plasmid/RT-DNA, upon induction, to be 8-fold, thereby indicating a copy number of Eco2 RT-DNA of roughly 100-200. Assuming an average aqueous volume of E. coli of 1 femtoliter[6], the concentration of RT-DNA is ~250-500 nM. We have added this information to the revised version of the manuscript.
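
      The conversion from copies per cell to molar concentration used in this estimate can be written out as a short calculation. The function name is hypothetical, and the cell volume is an input assumption rather than a measured value. With 100-200 copies in exactly 1 fL the sketch gives roughly 170-330 nM, the same order of magnitude as the range quoted above. The result scales inversely with the assumed aqueous volume.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_to_nanomolar(copies_per_cell, cell_volume_fl=1.0):
    """Convert molecules per cell to a concentration in nM.

    cell_volume_fl is the assumed aqueous cell volume in femtoliters.
    """
    volume_liters = cell_volume_fl * 1e-15
    molar = copies_per_cell / (AVOGADRO * volume_liters)
    return molar * 1e9


# Plasmid copy number (15-20 per cell) times the 8-fold RT-DNA excess.
low_copies, high_copies = 15 * 8, 20 * 8
print(low_copies, high_copies)

for copies in (100, 200):
    print(copies, round(copies_to_nanomolar(copies), 1), "nM")
```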

      (3) Microscopic images of the fluorescent E. coli - why are these not shown (unless I missed them)? It would be good to see that cells are fluorescent rather than just showing flow sorting data.

      In the original submission, we used flow cytometry as an orthogonal method to quantify the fluorescence output of intracellularly expressed Lettuce aptamer, anticipating that it would provide high-throughput, quantitative information on a large population of cells. During the revision, additional controls revealed that the weak increase in fluorescence we had previously attributed to Lettuce expression was in fact a stress-induced autofluorescence signal that occurred independently of retron RT-DNA and DFHBI-1T. We have therefore removed these data from the manuscript and no longer claim detectable intracellular Lettuce fluorescence.

      To understand this limitation, we compared the in vitro fluorescence of Lettuce with that of the RNA FLAP Broccoli, which is commonly used for RNA live-cell imaging. Under optimal in vitro conditions, Lettuce shows ~100-fold lower fluorescence output than Broccoli (new Figure 3–figure supplement 5). Given this poor fluorogenicity and the low intracellular concentration of retron RT-DNA (now derived from the qPCR experiments), we conclude that the current Lettuce variants are below the detection threshold for in vivo imaging in our system. We now explicitly discuss this limitation and the need for further (in vivo) evolution of DNA-based FLAPs in the revised manuscript.

      (4) I would appreciate a better Figure 1 to show all the intermediate steps in the RNA processing, the subsequent beginning of the RT step, and then the final production of the ssDNA. I did not understand all the processing steps that lead to the final product, and the role of the 2'OH.

      We thank the referee for this comment. We have now made changes to Figure 1, showing the intermediate steps as well as a better illustration of the 2’-5’ linkage.

      (5) I would like a better understanding or a protocol for choosing insertion sites into MSD for other aptamers - people will need simple instructions.

      We thank the reviewer for bringing up this important point. We simulated the ssDNA structure using Vienna RNA fold with DNA parameters. Based on the resulting structure, we inserted the Lettuce sequence into single-stranded and/or loop regions to minimise interference with the native msd fold. We have now included this information in the description of Figure 3.
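
      One way to turn this selection step into a simple procedure is to scan the predicted secondary structure for unpaired stretches. The sketch below assumes the folding tool returns the structure in dot-bracket notation, where "." marks an unpaired nucleotide. The function name, the minimum-length cutoff, and the example structure are hypothetical and do not represent the actual msd fold.

```python
def unpaired_regions(dot_bracket, min_length=3):
    """Return (start, end) intervals of consecutive unpaired positions.

    Intervals are 0-based and half-open; only runs of at least
    min_length unpaired nucleotides are reported.
    """
    regions = []
    start = None
    for i, symbol in enumerate(dot_bracket):
        if symbol == ".":
            if start is None:
                start = i  # a new unpaired run begins here
        else:
            if start is not None and i - start >= min_length:
                regions.append((start, i))
            start = None
    # Handle an unpaired run that extends to the 3' end.
    if start is not None and len(dot_bracket) - start >= min_length:
        regions.append((start, len(dot_bracket)))
    return regions


# Hypothetical structure: one hairpin with a 4-nt loop and a 3-nt 3' tail.
example = "((((....))))..."
print(unpaired_regions(example))
```

      Each reported interval is a candidate insertion site. Refolding the construct with the aptamer inserted and checking that the flanking stems are preserved would be a sensible follow-up check.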

      (6) Can the gels be stained with DFHBI/other dyes to see the Lettuce as has been done for fluorogenic RNAs?

      Yes. We have now included experiments where we performed in-gel staining with DFHBI-1T for both chemically synthesized Eco2-Lettuce surrogates as well as the heterologously expressed Eco2-Lettuce RT-DNA. We have added this data to the revised Figure 3 (panel C and E).

      (7) Sometimes FLAPs are called fluorogenic RNA aptamers - it might be good to mention both terms initially since some people use fluorogenic aptamer as their search term.

      We thank the referee for this useful suggestion. We have now included both terms in the introduction of the revised version.

      (8) What E coli strains are compatible with this retron system?

      Experimental and bioinformatic analyses have shown that retron abundance varies drastically across different strains of E. coli[8-10]. For example, in an experimental investigation of 113 independent clinical isolates of E. coli, only 7 strains contained RT-DNA[8]. In our experiments, we found that the BL21AI strain is compatible with plasmid-borne Eco2. The fact that this strain has a native retron system (Eco1) allowed us to use it as an internal standard. However, we were also able to express Eco2 RT-DNA in conventional lab strains such as E. coli Top 10 (data not shown), indicating that the ncRNA and the RT alone are sufficient for intracellular RT-DNA synthesis.

      (9) What steps would be needed to use in mammalian cells?

      We appreciate the reviewer’s thoughtful inquiry. Expression of retrons has been demonstrated in mammalian cells by Mirochnitchenko et al[11] and Lopez et al[12]. For example, Lopez et al demonstrate expression of retrons in mammalian cell lines using the Lipofectamine 3000 transfection protocol (Invitrogen) and a PiggyBac transposase system[12]. We also mention this in the discussion section of the revised manuscript. Expression of retron-encoded DNA aptamers in mammalian cells should be possible with these systems.

      (10) Is the conjugated RNA stable and does it degrade to leave just the DNA aptamer?

      We are grateful to the reviewer for their perceptive question. This usually depends on the specific retron system. For example, in case of certain retron systems such as retron Sen2, Eco4 and Eco7, the RNA is cleaved off, leaving behind just the ssDNA. In our case, with retron Eco2, the RNA remains stably bound to the ssDNA, thereby maintaining a stable hybrid RNA-DNA structure[10,13]. During the extraction of RT-DNA, the conjugated RNA is degraded during the RNase digestion step, and therefore is not visible in the gel images.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores a DNA fluorescent light-up aptamer (FLAP) with the specific goal of comparing activity in vitro to that in bacterial cells. In order to achieve expression in bacteria, the authors devise an expression strategy based on retrons and test four different constructs with the aptamer inserted at different points in the retron scaffold. They only observe binding for one scaffold in vitro, but achieve fluorescence enhancement for all four scaffolds in bacterial cells. These results demonstrate that aptamer performance can be very different in these two contexts.

      Strengths:

      Given the importance of FLAPs for use in cellular imaging and the fact that these are typically evolved in vitro, understanding the difference in performance between a buffer and a cellular environment is an important research question.

      The retron strategy utilized by the authors is thoughtful and well-described.

      The observation that some aptamers fail to show binding in vitro but do show enhancement in cells is interesting and surprising.

      We appreciate the reviewer’s thorough assessment.

      Weaknesses:

      This study hints toward an interesting observation, but would benefit from greater depth to more fully understand this phenomenon. Particularly challenging is that FLAP performance is measured in vitro by affinity and in cells by enhancement, and these may not be directly proportional. For example, it may be that some constructs have much lower affinity but a greater enhancement and this is the explanation for the seemingly different performance.

      We thank the reviewer for this insightful comment. In response, we conducted a series of additional control experiments to better understand the apparent discrepancy between the in vitro and in vivo data. These experiments revealed that the previously reported increase in intracellular green fluorescence is independent of retron-expressed Lettuce RT-DNA and DFHBI-1T, and instead reflects stress-induced autofluorescence of E. coli upon inducer and antibiotic treatment. Our original negative controls (empty wild-type Eco2, uninduced cells in the presence of DFHBI-1T) were therefore not sufficient to rule out this effect.

      As a consequence, we have removed the earlier FACS data from the manuscript and no longer claim detectable intracellular Lettuce fluorescence. The reviewer’s comment prompted us to re-examine the fluorogenicity of our constructs in vitro. We found that the 4Lev4 construct folds poorly and produces very low signal in in-gel staining assays with DFHBI-1T. In contrast, the 8LE variant (8-nt P1 stem at position v4) shows the highest fluorescence in these in-gel assays (new Figure 3C). Nevertheless, even this construct remains 100-fold less fluorogenic than the RNA-based FLAP Broccoli (new Figure 3–figure supplement 5), and we were unable to detect its intracellular fluorescence above background (new Figure 3–figure supplement 4).

      To still directly demonstrate that retron-embedded Lettuce domains that are synthesized under intracellular conditions are functional, we modified our strategy in the revision and purified the expressed RT-DNA from E. coli, followed by in-gel staining with DFHBI-1T (new Figure 3E). Despite the challenge of obtaining sufficient amounts of ssDNA, this ex vivo approach clearly shows that the retron-produced Lettuce RT-DNA retains fluorogenic activity.

      The authors only test enhancement at one concentration of fluorophore in cells (and this experimental detail is difficult to find and would be helpful to include in the figure legend). This limits the conclusions that can be drawn from the data and limits utility for other researchers aiming to use these constructs.

      We appreciate this excellent suggestion. In the original experiments, the DFHBI-1T concentration in cells was chosen based on published conditions for live-cell imaging of the Broccoli RNA aptamer[14], which is substantially more fluorogenic than Lettuce. Motivated by the reviewer’s comment, we explored different fluorophore concentrations and additional controls to optimize the in vivo readout. These experiments showed that the weak intracellular fluorescence signal is dominated by stress-induced autofluorescence[15] (possibly due to the weaker antitoxin activity of the modified msd) and does not depend on the presence of Lettuce RT-DNA or DFHBI-1T.

      Given the combination of low Lettuce fluorogenicity and low intracellular RT-DNA levels, we concluded that varying the fluorophore concentration alone does not provide a meaningful way to deconvolute these confounding factors in cells. Instead, we shifted our focus to a more direct assessment of Lettuce activity: we now demonstrate that retron-produced Lettuce RT-DNA can be purified from E. coli and retains fluorogenic activity in an in-gel staining assay with DFHBI-1T (new Figure 3E). We believe this revised strategy provides a clearer and more quantitative characterization of the system’s capabilities and limitations than the initial in vivo fluorescence measurements.

      The FLAP that is used seems to have a relatively low fluorescence enhancement of only 2-3 fold in cells. It would be interesting to know if this is also the case in vitro. This is lower than typical FLAPs and it would be helpful for the authors to comment on what level of enhancement is needed for the FLAP to be of practical use for cellular imaging.

      In the revised manuscript, we directly address this point by comparing the in vitro fluorescence of Lettuce (DNA) and Broccoli (RNA) under optimized buffer conditions. These experiments show that Broccoli is nearly two orders of magnitude more fluorogenic than Lettuce (new Figure 3-figure supplement 5). Thus, the low enhancement observed for Lettuce in cells is consistent with its intrinsically poor fluorogenicity in vitro.

      Based on this comparison and on reported properties of RNA FLAPs such as Broccoli, we conclude that robust cellular imaging typically requires substantially higher fluorogenicity and dynamic range than currently provided by DNA-based Lettuce. In other words, under our conditions, Lettuce is close to or below the practical detection limit for in vivo imaging, whereas Broccoli performs well. We now explicitly state in the Discussion that further evolution and optimization of DNA FLAPs will be required to achieve fluorescence enhancements that are suitable for routine cellular imaging, and we position our work as a first demonstration that functional DNA aptamers can be produced in cells via retrons, while also delineating the current sensitivity limits.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Addgene accession numbers are not listed - how is this plasmid obtained?

      The sequence was obtained from Millman et al[16], and ordered as gblock from IDT. The gblock was then cloned into a pET28a vector by Gibson assembly. We have now included this in the methods section.

      Reviewer #2 (Recommendations For The Authors):

      Page 2, line 40 - FLAPS should be FLAPs

      We have corrected this typo in the revised version.

      References

      (1) Rousset, F. & Sorek, R. The evolutionary success of regulated cell death in bacterial immunity. Curr. Opin. Microbiol. 74, 102312; 10.1016/j.mib.2023.102312 (2023).

      (2) Gao, L. et al. Diverse enzymatic activities mediate antiviral immunity in prokaryotes. Science 369, 1077–1084; 10.1126/science.aba0372 (2020).

      (3) Carabias, A. et al. Retron-Eco1 assembles NAD+-hydrolyzing filaments that provide immunity against bacteriophages. Mol. Cell 84, 2185-2202.e12; 10.1016/j.molcel.2024.05.001 (2024).

      (4) Wang, Y. et al. DNA methylation activates retron Ec86 filaments for antiphage defense. Cell Rep. 43, 114857; 10.1016/j.celrep.2024.114857 (2024).

      (5) Wang, Y. et al. Cryo-EM structures of Escherichia coli Ec86 retron complexes reveal architecture and defence mechanism. Nat. Microbiol. 7, 1480–1489; 10.1038/s41564-022-01197-7 (2022).

      (6) Milo, R. & Phillips, R. Cell biology by the numbers (Garland Science Taylor & Francis Group, New York NY, 2016).

      (7) Sathiamoorthy, S. & Shin, J. A. Boundaries of the origin of replication: creation of a pET-28a-derived vector with p15A copy control allowing compatible coexistence with pET vectors. PLOS ONE 7, e47259; 10.1371/journal.pone.0047259 (2012).

      (8) Sun, J. et al. Extensive diversity of branched-RNA-linked multicopy single-stranded DNAs in clinical strains of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 86, 7208–7212; 10.1073/pnas.86.18.7208 (1989).

      (9) Rice, S. A. & Lampson, B. C. Bacterial reverse transcriptase and msDNA. Virus Genes 11, 95–104; 10.1007/BF01728651 (1995).

      (10) Simon, A. J., Ellington, A. D. & Finkelstein, I. J. Retrons and their applications in genome engineering. Nucleic Acids Res. 47, 11007–11019; 10.1093/nar/gkz865 (2019).

      (11) Mirochnitchenko, O., Inouye, S. & Inouye, M. Production of single-stranded DNA in mammalian cells by means of a bacterial retron. J. Biol. Chem. 269, 2380–2383; 10.1016/S0021-9258(17)41956-9 (1994).

      (12) Lopez, S. C., Crawford, K. D., Lear, S. K., Bhattarai-Kline, S. & Shipman, S. L. Precise genome editing across kingdoms of life using retron-derived DNA. Nat. Chem. Biol. 18, 199–206; 10.1038/s41589-021-00927-y (2022).

      (13) Lampson, B. C. et al. Reverse transcriptase in a clinical strain of Escherichia coli: production of branched RNA-linked msDNA. Science 243, 1033–1038; 10.1126/science.2466332 (1989).

      (14) Filonov, G. S., Moon, J. D., Svensen, N. & Jaffrey, S. R. Broccoli: rapid selection of an RNA mimic of green fluorescent protein by fluorescence-based selection and directed evolution. J. Am. Chem. Soc. 136, 16299–16308; 10.1021/ja508478x (2014).

      (15) Renggli, S., Keck, W., Jenal, U. & Ritz, D. Role of autofluorescence in flow cytometric analysis of Escherichia coli treated with bactericidal antibiotics. J. Bacteriol. 195, 4067–4073; 10.1128/jb.00393-13 (2013).

      (16) Millman, A. et al. Bacterial Retrons Function In Anti-Phage Defense. Cell 183, 1551-1561.e12; 10.1016/j.cell.2020.09.065 (2020).

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Zhang and colleagues examine neural representations underlying abstract navigation in entorhinal cortex (EC) and hippocampus (HC) using fMRI. This paper replicates a previously identified hexagonal modulation of abstract navigation vectors in abstract space in EC in a novel task involving navigating in a conceptual Greeble space. In HC, the authors identify a three-fold signal of the navigation angle. They also use a novel analysis technique (spectral analysis) to look at spatial patterns in these two areas and identify phase coupling between HC and EC. Interestingly, the three-fold pattern identified in the hippocampus explains quirks in participants' behavior where navigation performance follows a three-fold periodicity. Finally, the authors propose an EC-HPC PhaseSync Model to understand how the EC and HC construct cognitive maps. The wide array and creativity of the techniques used is impressive, but because of their unique nature, the paper would benefit from more details on how some of these techniques were implemented.

      Comments on revisions:

      Most of my concerns were adequately addressed, and I believe the paper is greatly improved. I have two more points. I noticed that the legend for Figure 4 still refers to some components of the previous figure version, this should be updated to reflect the current version of the figure. I also think the paper would benefit from more details regarding some of the analyses.

      Specifically, the phase-amplitude coupling analysis should have a section in the methods which should be sure to clarify how the BOLD signals were reconstructed.

      (1)“…I noticed that the legend for Figure 4 still refers to some components of the previous figure version, this should be updated to reflect the current version of the figure…”.

      Thank you for pointing this out. We have revised the legend of Figure 4 by removing the significance notation “***: p < 0.001”, which referred to elements from a previous version of the figure.

      (2)“…I also think the paper would benefit from more details regarding some of the analyses. Specifically, the phase-amplitude coupling analysis should have a section in the methods which should be sure to clarify how the BOLD signals were reconstructed”.

      We agree and appreciate the reviewer’s helpful suggestion. We have added a dedicated subsection entitled “Phase–amplitude coupling” to the Materials and Methods, in which we provide a detailed description of how the EC and HPC BOLD signals were reconstructed and how the coupling analysis was implemented. Correspondingly, we refined the description of this analysis in the Results section under “Phase synchronization between the HPC and EC activity”. The revised sections have been included below for your convenience. 

      Materials and Methods: Phase–amplitude coupling

      To quantify the spatial peak relationship between EC and HPC BOLD activity, we implemented a cross-frequency amplitude–phase coupling analysis in the directional space (Canolty et al., 2006). Rather than analyzing raw BOLD signals, we reconstructed 6-fold EC activity and 3-fold HPC activity in each voxel using sinusoidal modulation weights (β<sub>sine</sub> and β<sub>cosine</sub>) estimated from the raw BOLD signals. Specifically, activity was modeled as β<sub>cosine</sub>cos(kθ) + β<sub>sine</sub>sin(kθ), where k denotes the rotational symmetry. This approach selectively captures the hypothesized spatial symmetries of neural activity (e.g., 6-fold or 3-fold periodicity) as a function of movement direction. For this coupling analysis, we used participants’ original movement directions (i.e., without applying orientation calibration). The reconstructed 6-fold EC and 3-fold HPC activity were then converted into analytic representations using the Hilbert transform, yielding the instantaneous phase of the HPC (ϕ<sub>HPC</sub>) and the amplitude envelope of the EC (A<sub>ERC</sub>). HPC phases were classified into nine bins. The composite analytic signal, defined as z = A<sub>ERC</sub>e<sup>iϕ<sub>HPC</sub></sup>, was used to compute the modulation index M (Canolty et al., 2006), defined as the absolute value of the mean of z values, quantifying the scalar coupling strength between EC amplitude and HPC phase within each bin. A surrogate dataset, a null distribution of the modulation indices (M<sup>-</sup>), was generated by spatially offsetting the EC amplitude relative to the HPC phase across all possible spatial lags. The mean of this surrogate distribution was used as the baseline reference against which the observed coupling strength was compared.
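      The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: all function names are hypothetical, the β weights are set to arbitrary values, the directional space is sampled at 180 evenly spaced angles, the nine bins are assumed to tile [−π, π] so that the middle bin is centred on phase 0, and "spatial offsetting" is read as a circular shift. The analytic signal is computed with a direct DFT to keep the example dependency-free; a library Hilbert transform would normally be used instead.

```python
import cmath
import math


def reconstruct(thetas, beta_cos, beta_sin, k):
    """k-fold periodic activity: beta_cos*cos(k*theta) + beta_sin*sin(k*theta)."""
    return [beta_cos * math.cos(k * t) + beta_sin * math.sin(k * t) for t in thetas]


def analytic_signal(x):
    """Analytic signal via a direct DFT (O(n^2); fine for a few hundred samples)."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
                for f in range(n)]
    # Keep DC (and Nyquist for even n), double positive frequencies, drop negatives.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        upper = n // 2
    else:
        upper = (n + 1) // 2
    for f in range(1, upper):
        gain[f] = 2.0
    return [sum(spectrum[f] * gain[f] * cmath.exp(2j * math.pi * f * t / n)
                for f in range(n)) / n
            for t in range(n)]


def modulation_index(amplitude, phase):
    """M = |mean(A * exp(i*phi))|."""
    z = [a * cmath.exp(1j * p) for a, p in zip(amplitude, phase)]
    return abs(sum(z) / len(z))


def binned_modulation(amplitude, phase, n_bins=9):
    """M computed separately within each phase bin on [-pi, pi]."""
    width = 2 * math.pi / n_bins
    bins = [[] for _ in range(n_bins)]
    for a, p in zip(amplitude, phase):
        b = min(int((p + math.pi) / width), n_bins - 1)
        bins[b].append(a * cmath.exp(1j * p))
    return [abs(sum(z) / len(z)) if z else float("nan") for z in bins]


def surrogate_mean(amplitude, phase):
    """Mean M over all non-zero circular shifts of the amplitude."""
    n = len(amplitude)
    values = [modulation_index(amplitude[lag:] + amplitude[:lag], phase)
              for lag in range(1, n)]
    return sum(values) / len(values)


# Toy inputs: arbitrary beta weights, 180 evenly spaced movement directions.
thetas = [2 * math.pi * i / 180 for i in range(180)]
ec = reconstruct(thetas, 1.0, 0.0, k=6)
hpc = reconstruct(thetas, 1.0, 0.0, k=3)
amp_ec = [abs(z) for z in analytic_signal(ec)]
phase_hpc = [cmath.phase(z) for z in analytic_signal(hpc)]

m_observed = modulation_index(amp_ec, phase_hpc)
m_baseline = surrogate_mean(amp_ec, phase_hpc)
m_per_bin = binned_modulation(amp_ec, phase_hpc)
```

      The toy inputs only exercise the code path: a single pure sinusoid has a flat envelope, so the resulting values carry no information about coupling in real data.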

      Results: Phase synchronization between the HPC and EC activity

      To examine whether the spatial phase structure in one region could predict that in another, we tested whether the orientations of the 6-fold EC and 3-fold HPC periodic activities, estimated from odd-numbered sessions using sinusoidal modulation with rotationally symmetric parameters, were correlated across participants. A cross-participant circular correlation was conducted between the spatial phases of the two areas to quantify the spatial correspondence of their activity patterns (EC: purple dots; HPC: green dots) (Jammalamadaka & Sengupta, 2001). The analysis revealed a significant circular correlation (Fig. 4a; r = 0.42, p < 0.001), as reflected by the continuous color progression across the participants (i.e., the colored lines connecting each pair of the EC and HPC dots in Fig. 4a), suggesting that participants with smaller hippocampal phases (green, outer ring) tended to have smaller entorhinal phases (purple, inner ring), and vice versa.
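      For readers unfamiliar with the statistic, the circular correlation coefficient of Jammalamadaka and SenGupta can be sketched as follows. This is an illustrative implementation, not the code used in the study; the function names and the toy angles are hypothetical, and the phases are assumed to be expressed in radians on the full circle.

```python
import math


def circular_mean(angles):
    """Mean direction of a list of angles in radians."""
    return math.atan2(sum(math.sin(a) for a in angles),
                      sum(math.cos(a) for a in angles))


def circular_correlation(a, b):
    """Circular-circular correlation: the Pearson form applied to the sines
    of the deviations from each sample's circular mean."""
    mean_a, mean_b = circular_mean(a), circular_mean(b)
    dev_a = [math.sin(x - mean_a) for x in a]
    dev_b = [math.sin(y - mean_b) for y in b]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    return numerator / denominator


# Toy phases (radians), one pair per hypothetical participant.
ec_phase = [0.1, 0.5, 1.0, 1.5, 2.0]
hpc_phase = [p + 0.3 for p in ec_phase]  # a constant offset keeps the correlation perfect
r = circular_correlation(ec_phase, hpc_phase)
```

      A constant phase offset between the two regions leaves the coefficient unchanged, which is why the statistic measures correspondence of phases across participants rather than their absolute agreement.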

      In addition to the across-participant phase correlation, we further examined the spatial alignment between the 6-fold EC and 3-fold HPC activity patterns. Given that the spatial phase of the HPC is hypothesized to depend on EC projections, particularly along the three primary axes of the hexagonal code, we examined whether the periodic activities of the EC and HPC were spatially peak-aligned. Notably, unlike previous studies that focused on temporal coherence of neural oscillations (Buzsaki, 2006; Maris et al., 2011; Friese et al., 2013), our analysis focused on periodic coupling between brain areas in the directional space. To test spatial peak alignment between EC and HPC, a cross-frequency spatial coupling analysis (adapted from the amplitude–phase coupling framework; Canolty et al., 2006) was employed to identify at which HPC phase the EC exhibited maximal amplitude modulation. If the activities of both areas were peak-aligned (i.e., no peak offset), a strong coupling at phase 0 of the HPC would be expected, as shown by the one-cycle-based schema in Fig. 4b. In doing so, the instantaneous phase of the HPC and the amplitude envelope of the EC were extracted from the reconstructed activity using the Hilbert transform (see Methods for details). HPC phases were classified into nine bins, and the modulation index (M), quantifying the scalar coupling strength between EC amplitude and HPC phase, was computed within each bin. As a result, significant coupling was observed in the bin centered at phase 0 of the HPC (Fig. 4c; t(32) = 2.57, p = 0.02, Bonferroni-corrected across tests; Cohen’s d = 0.45). In contrast, no significant coupling was found in other bins (p > 0.05). To rule out the possibility that the observed coupling was driven by a potential harmonic (integer multiple) relationship between the 3-fold and 6-fold periodicities, we additionally conducted control analyses using 9-fold and 12-fold EC components. However, no significant coupling was observed in these controls (Fig. 4c; p > 0.05). Together, these results confirmed selective alignments of spatial peaks between the 6-fold EC and 3-fold HPC periodicity in the conceptual direction domain.

      Reviewer #2 (Public review):

      The authors report results from behavioral data, fMRI recordings, and computer simulations during a conceptual navigation task. They report 3-fold symmetry in behavioral and simulated model performance, 3-fold symmetry in hippocampal activity, and 6-fold symmetry in entorhinal activity (all as a function of movement directions in conceptual space). The analyses seem thoroughly done, and the results and simulations are very interesting.

      We thank the reviewer for the positive assessment of our work.

      We thank both reviewers again for their constructive and insightful feedback, which has substantially strengthened the manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "Ω-Loop mutations control dynamics of the active site by modulating the hydrogen-bonding network in PDC-3 β-lactamase", Chen and coworkers provide a computational investigation of the dynamics of the enzyme Pseudomonas-derived cephalosporinase 3 (PDC3) and some mutants associated with increased antibiotic resistance. After an initial analysis of the enzyme dynamics provided by RMSD/RMSF, the authors conclude that the mutations alter the local dynamics within the omega loop and the R2 loop. The authors show that the network of hydrogen bonds is disrupted in the mutants. Constant pH calculations showed that the mutations also change the pKa of the catalytic lysine 67 and pocket volume calculations showed that the mutations expand the catalytic pocket. Finally, time-independent component analysis (tiCA) showed different profiles for the mutant enzyme as compared to the wild type.

      Strengths:

      The scope of the manuscript is definitely relevant. Antibiotic resistance is an important problem and, in particular, Pseudomonas aeruginosa resistance is associated with an increasing number of deaths. The choice of the computational methods is also something to highlight here. Although I am not familiar with Adaptive Bandit Molecular Dynamics (ABMD), the description provided in the manuscript suggests that this simulation strategy is well suited for the problem under evaluation.

      Weaknesses:

      In the revised version, the authors addressed my concerns regarding their use of the MSM, and in my view, their conclusions are now much more robust and well-supported by the data. While it would be very interesting to see a quantitative correlation between the effects of the mutations observed in the MD data and relevant experimental findings, I understand that this may be beyond the scope of the manuscript.

      Thank you for the careful evaluation and constructive comments. Regarding the suggestion of a more quantitative correlation with experimental observables, we agree that this would be valuable, and we have noted it as an important direction for future work.

      Reviewer #3 (Public review):

      Summary:

      This manuscript aims to explore how mutations in the PDC-3 β-lactamase alter its ability to bind and catalyse reactions of antibiotic compounds. The topic is interesting and the study uses MD simulations to provide hypotheses about how the size of the binding site is altered by mutations that change the conformation and flexibility of two loops that line the binding pocket. Some greater consideration of the uncertainties and how the method choice affects the ability to compare equilibrium properties would strengthen the quantitative conclusions. While many results appear significant by eye, quantifying this and ensuring convergence would strengthen the conclusions.

      Strengths:

      The significance of the problem is clearly described, and the relationship to prior literature is discussed extensively.

      Comments on revised version:

      I am concerned that the authors state in the response to reviews that it is not possible to get error bars on values due to the use of the AB-MD protocol that guides the simulations to unexplored basins. Yet the authors want to compare these values between the WT and mutants. This relates to RMSD, RMSF, % H-bond and volume calculations. I don't accept that you cannot calculate an uncertainty on a time averaged property calculated across the entire simulation. In these cases you can either run repeat simulations to get multiple values on which to do statistical analysis, or you can break the simulation into blocks and check both convergence and calculate uncertainties.

      We thank the reviewer for raising this point. We would like to clarify that we did not intend to state that error bars are impossible to obtain under AB-MD. In fact, we reported error bars for several quantities derived from the AB-MD trajectories (we also broke the trajectories into blocks and calculated uncertainties for RMSF in our first-round response as you suggested). However, these data are closely related to your concern about comparing quantitative information without an appropriate reweighting of the ensemble. Therefore, in the revised manuscript, we removed quantitative analyses that were calculated directly from the raw AB-MD trajectories. Instead, the quantitative comparisons are now obtained from MSM analysis. We report pocket volumes and key interaction metrics for MSM metastable states, with corresponding error bars for these MSM-based quantities (Figure 6 and its supplementary figure).

      I note that the authors do provide error bars on the volumes, but the statistics given for these need closer scrutiny (I can't test this without the raw data). For example, the authors have p<0.0001 for the following pair of volumes, 1072 ± 158 and 1115 ± 242, or for SASA p<0.0001 is given for 2 identical numbers, 155 ± 3.

      Thank you for this comment. As noted above, we have removed the table from the manuscript, and the pocket-volume results together with their error bars are now shown in Figure 6. To address the concern raised here and to avoid making the same mistake in future analyses, we re-examined how the statistics were computed. We believe the very small p-values were caused by treating per-frame MD values as independent observations in two-sample t-tests. Because consecutive MD frames are strongly time-correlated, they do not satisfy the independence assumption, which can greatly overestimate the effective sample size and lead to artificially small p-values. For the SASA, a p < 0.0001 is reported even though both values are shown as 155 ± 3. This is due to rounding, which can hide subtle underlying differences.
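      The effect of time correlation on the apparent uncertainty can be illustrated with a small synthetic example. The sketch below is not derived from the simulation data: it generates a correlated series from a first-order autoregressive process (a stand-in for a per-frame observable, with an arbitrary correlation coefficient of 0.95) and compares the naive standard error, which treats every frame as independent, with a block-averaged standard error. All names and parameters are hypothetical.

```python
import math
import random
import statistics


def ar1_series(n, phi, seed):
    """Synthetic correlated series: x[t] = phi * x[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def naive_sem(x):
    """Standard error assuming every sample is independent."""
    return statistics.stdev(x) / math.sqrt(len(x))


def block_sem(x, n_blocks):
    """Standard error from the scatter of block means.

    Blocks must be much longer than the correlation time so that the
    block means are approximately independent.
    """
    size = len(x) // n_blocks
    means = [statistics.fmean(x[i * size:(i + 1) * size]) for i in range(n_blocks)]
    return statistics.stdev(means) / math.sqrt(n_blocks)


frames = ar1_series(20000, phi=0.95, seed=0)
sem_naive = naive_sem(frames)
sem_block = block_sem(frames, n_blocks=20)
print(f"naive SEM: {sem_naive:.4f}, block SEM: {sem_block:.4f}")
```

      Because the naive estimate underestimates the uncertainty of the mean, a two-sample t-test on per-frame values will report p-values that are too small, which is consistent with the explanation given above.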

      I also remain concerned about comparisons between simulations run with the AB-MD scheme. While each simulation is an equilibrium simulation run without biasing forces, new simulations are seeded to expand the conformational sampling of the system. This means that by definition the ensemble of simulations does not represent an equilibrium ensemble. For example, the frequency at which conformations are sampled would not be the same as in a single much longer equilibrium simulation. While you may be able to see trends in the differences between conditions run in this way, I still don't understand how you can compare quantitative information without some method of reweighting the ensemble. It is not clear that such a reweighting exists for this method, in which case I advise some more caution in the wording of the comparisons made from this data.

      At this stage I don't feel the revision has directly addressed the main comments I raised in the earlier review, although there is a stronger response to the comments of Reviewer #2.

      We thank the reviewer for reiterating this important point, and we agree with the underlying concern. Although AB-MD generates unbiased trajectories, the ensemble of simulations does not represent an equilibrium ensemble. As a result, statistics computed by simply concatenating all AB-MD trajectories should not be used for quantitative comparisons. In the original version, we acknowledge that we reported several quantitative descriptors directly from concatenated AB-MD frames, including (i) distributions of χ1 torsions, (ii) mean pocket volumes and SASA, and (iii) percentages of some key interactions. We agree that this was not appropriate given the adaptive sampling protocol. In the revised manuscript, we have removed these quantitative analyses.

      We retained RMSD and RMSF analyses, but we have revised their wording and clarified their purpose. RMSD and RMSF are used only to summarize the structural variability and residue-level mobility observed across the collected trajectory segments and to motivate the selection of structural features for MSM construction. The manuscript now states: “Because AB-MD adaptively seeds new unbiased trajectories to expand conformational sampling, RMSD and RMSF are used here to summarize the structural variability and per-residue mobility observed across the collected trajectories.”

      Regarding the reviewer’s question about reweighting, the Markov state model (MSM) provides a principled framework to obtain the stationary distribution π from the transition probability matrix T<sub>τ</sub>. The resulting π<sub>i</sub> gives the equilibrium weight of each microstate i, and the corresponding discrete free energy can be written as F<sub>i</sub> = −k<sub>B</sub>T ln(π<sub>i</sub>). PCCA then coarse-grains the microstate space into a small number of metastable states. In the revised manuscript, quantitative comparisons are therefore derived from the MSM at the level of these metastable states, rather than from unweighted counts of concatenated AB-MD frames.
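      The reweighting idea can be made concrete with a toy example. The sketch below is illustrative only and does not reproduce the MSM built in the study: it uses a hypothetical three-state transition matrix, obtains the stationary distribution by power iteration, converts it to free energies in units of k<sub>B</sub>T (shifted so that the most populated state is zero, which is a presentation choice made here), and computes a stationary-weighted average of a made-up per-state observable. PCCA coarse-graining is not shown.

```python
import math


def stationary_distribution(T, n_iter=100000, tol=1e-12):
    """Left eigenvector of a row-stochastic matrix T with eigenvalue 1,
    found by repeatedly applying pi <- pi T until it stops changing."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def free_energies(pi, kT=1.0):
    """F_i = -kT * ln(pi_i), shifted so the lowest value is zero."""
    F = [-kT * math.log(p) for p in pi]
    lowest = min(F)
    return [f - lowest for f in F]


def weighted_mean(values, pi):
    """Equilibrium average of a per-state observable."""
    return sum(v * p for v, p in zip(values, pi))


# Hypothetical 3-state transition matrix (each row sums to 1).
T = [
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.20, 0.80],
]
pi = stationary_distribution(T)
F = free_energies(pi)

# Made-up per-state pocket volumes, for illustration only.
volumes = [1000.0, 1100.0, 1250.0]
mean_volume = weighted_mean(volumes, pi)
```

      The weighted mean differs from a plain average over frames whenever the adaptive seeding has over- or under-sampled some states relative to their equilibrium weights.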

      Accordingly, we have revised the sections “E219K and Y221A mutations facilitate proton transfer” and “Substitutions enlarge the active-site pocket to accommodate bulkier R1 and R2 groups of β-lactams”, and we have added new figures in Figure 6 and its figure supplement. The adjustments to the quantitative analyses do not affect our original conclusions.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript uses adaptive sampling simulations to understand the impact of mutations on the specificity of the enzyme PDC-3 β-lactamase. The authors argue that mutations in the Ω-loop can expand the active site to accommodate larger substrates.

      Strengths:

      The authors simulate an array of variants and perform numerous analyses to support their conclusions. The use of constant pH simulations to connect structural differences with likely functional outcomes is a strength.

      Weaknesses:

      I would like to have seen more error bars on quantities reported (e.g., % populations reported in the text and Table 1).

      We appreciate this point. Here, the population we analyze is intended to showcase conformational differences across variants rather than to estimate equilibrium occupancies. Although each system includes 100 trajectories, they were generated using an adaptive-bandit protocol. The protocol deliberately guides sampling towards underexplored basins; conformational heterogeneity between trajectories is therefore expected by design. For example, in E219K the MSM decomposition shows that in states 1, 6, and 7 the K67(NZ)–S64(OG) distance is almost entirely > 6 Å, whereas in states 2 and 3 it is almost entirely < 3.5 Å (Figure 5–figure supplement 12). These distances suggest that the hydrogen bond fraction is approximately zero in states 1, 6, and 7, and close to one in states 2 and 3. In addition, the mean first passage time of the Markov state models suggests that the formation and disruption of this hydrogen bond occur on the microsecond timescale, which is far longer than the length of each individual trajectory (300 ns). Consequently, across the 100 replicas, some trajectories exhibit very low fractions, while others display the opposite trend. Under such bimodal, protocol-induced heterogeneity, computing an error bar across trajectories mainly visualizes the protocol’s dispersion and risks being misread as thermodynamic uncertainty, which is not central to our aim of comparing conformational differences between wild-type PDC-3 and variants. We therefore do not include the error bars.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "Ω-Loop mutations control dynamics of the active site by modulating the hydrogen-bonding network in PDC-3 β-lactamase", Chen and coworkers provide a computational investigation of the dynamics of the enzyme Pseudomonas-derived cephalosporinase 3 (PDC3) and some mutants associated with increased antibiotic resistance. After an initial analysis of the enzyme dynamics provided by RMSD/RMSF, the author concludes that the mutations alter the local dynamics within the omega loop and the R2 loop. The authors show that the network of hydrogen bonds is disrupted in the mutants. Constant pH calculations showed that the mutations also change the pKa of the catalytic lysine 67, and pocket volume calculations showed that the mutations expand the catalytic pocket. Finally, time-independent component analysis (tiCA) showed different profiles for the mutant enzyme as compared to the wild type.

      Strengths:

      The scope of the manuscript is definitely relevant. Antibiotic resistance is an important problem, and, in particular, Pseudomonas aeruginosa resistance is associated with an increasing number of deaths. The choice of the computational methods is also something to highlight here. Although I am not familiar with Adaptive Bandit Molecular Dynamics (ABMD), the description provided in the manuscript suggests that this simulation strategy is well-suited for the problem under evaluation.

      Weaknesses:

      In the description of many of their results, the authors do not provide enough information for a deep understanding of the biochemistry/biophysics involved. Without these issues addressed, the strength of the evidence is of concern.

      We thank the reviewer for pointing out the need for deeper discussion of the biochemical and biophysical implications of our results. In our manuscript, we begin by examining basic structural metrics (e.g., RMSD and RMSF) which clearly indicate that the major conformational changes occur in the Ω-loop and the R2 loop. We have now added a paragraph to describe the importance of the Ω-loop and highlighted it in the revised manuscript on lines 142-166 of page 6. This observation guided our subsequent focus on these regions, as well as on the catalytic site. Our analysis revealed notable alterations in the hydrogen bonding network, especially in interactions involving the K67-S64, K67-N152, K67-G220, Y150-A292, and N287-N314 pairs. These observations led us to conclude that:

      (1) Mutations E219K and Y221A facilitate the proton transfer of catalytic residues. This is consistent with prior experimental data showing that these substitutions produce the most pronounced increase in sensitivity to cephalosporin antibiotics (lines 210-212 in page 8 of the revised manuscript). 

      (2) Substitutions enlarge the active-site pocket to accommodate bulkier R1 and R2 groups of β-lactams. This is in line with MIC measurements reported by Barnes et al. (2018), which showed that mutants with larger active-site pockets exhibit markedly greater sensitivity to cephalosporins with bulky side chains than others (lines 249-259 on page 10).

      Furthermore, we applied Markov state models (MSMs) to explore the timescales of the transitions between these different conformational states. We believe that these methodological steps support our conclusions.

      Reviewer #3 (Public review):

      Summary:

      This manuscript aims to explore how mutations in the PDC-3 β-lactamase alter its ability to bind and catalyse reactions of antibiotic compounds. The topic is interesting, and the study uses MD simulations to provide hypotheses about how the size of the binding site is altered by mutations that change the conformation and flexibility of two loops that line the binding pocket. However, the study doesn't clearly describe the way the data is generated. While many results appear significant by eye, quantifying this and ensuring convergence would strengthen the conclusions.

      Strengths:

      The significance of the problem is clearly described, and the relationship to prior literature is discussed extensively.

      Weaknesses:

      The methods used to gain the results are not explained clearly, meaning it was hard to determine exactly how some data was obtained. The convergence and uncertainties in the data were not adequately quantified. The text is also a little long, which obscures the main findings.

      We thank the reviewer for the suggestion. We respectfully ask the reviewer to specify which aspects of the data-generation methods are unclear so that we can include the necessary details in the next revision. Moreover, all statistics reported in the manuscript were obtained from extensive analyses of 300,000 simulation frames. The Markov state models have been validated by implied-timescale (ITS) plots and the Chapman-Kolmogorov (CK) test. Two-sample t-tests were also carried out for the volume and SASA.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1D focus on the PDC3 catalytic site. However, the authors mentioned before that the enzyme has two domains, an alpha domain and an alpha/beta domain. The reader would benefit from a more detailed description of the enzyme, its active site, AND the location of the mutants under investigation in the figure.

      We have updated Figure 1D and marked the positions of all mutations (V211A/G, G214A/R, E219A/G/K and Y221A/H), which have now been highlighted as spheres.

      (2) Since in the journal format, the results come before the methods. It would be interesting to add a brief description of where the results came from. For example, in the first section of the results, the authors describe the flexibility of the omega loop and the R2 loop. However, the reader won't know what kind of simulation was used and for how long, for example. A sentence would add the required context for a deeper understanding here.

      At the beginning of the Results and Discussion section we now state: “To investigate how the mutations in the Ω-loop affect PDC-3 dynamics, adaptive-bandit molecular dynamics (AB-MD) simulations were carried out for each system. 100 trajectories of 300 ns each (totaling 30 μs per system) were run.”

      (3) Still in the same section, the authors don't define what change in RMSF is considered significant. For example, I can't see a relevant change in the RMSF for the omega loop between the wt enzyme and the E219 mutants in Figure 2D. A more objective definition would be of benefit here.

      Our analysis reveals that while the wild-type PDC-3 and the G214A, G214R, E219G, and Y221A variants exhibit an average per-residue RMSF of around 4 Å in the Ω-loop, the V211A and V211G variants show markedly lower values (around 1.5 Å), and the E219K and Y221H variants exhibit intermediate values between 2 and 2.5 Å. In addition, the fluctuations around the binding site should be considered together with the fluctuations in the R2-loop. Importantly, we urge the reviewer to focus on the MDLovofit analysis in Figure 2C, where the dynamic differences between the core and the fluctuating loops are clearly evident.

      (4) In line 138, the authors state that "Therefore, the flexibility of these proteins is mainly caused by the fluctuations in the Ω-loops and R2-loop". This is quite a bold statement to be drawn at this point. First of all, there is no mention of it in the manuscript, but is there any domain movement? Figure 2C clearly shows that there is some mobility in omega and R2 loops. But there is no evidence shown in the manuscript that shows that "the flexibility of these proteins is mainly caused by the fluctuations in the" loops. Please consider rephrasing this sentence or adding more data, if available.

      We have revised the wording to take the reviewer’s concern into account. The sentence now states: “Therefore, flexibility of PDC-3 is predominantly localized to the Ω- and R2-loops, whereas the remainder of the structure is comparatively rigid.” To explain further, β-lactamase enzymes are fairly rigid structures in which no large-scale domain motions occur. Instead, the enzyme communicates structurally via cross-correlation of loop dynamics (https://doi.org/10.7554/eLife.66567).

      (5) I guess, the most relevant question for the scope of the paper is not answered in this section. The authors show that the mobility of the omega- and R2-loops is altered by some mutations. Why is that? I wish I could see a figure showing where the mutations are and where the loops are. This question will come back in other sections.

      We have updated Figure 1D to mark the positions of all mutations (V211A/G, G214A/R, E219A/G/K and Y221A/H) as spheres. The Ω- and R2-loops are also highlighted. All mutations map to the Ω-loop, indicating that these substitutions directly perturb this region. Notably, K67 forms a hydrogen bond with the backbone of G220 within the Ω-loop and another with the phenolic hydroxyl of Y150. Y150, in turn, hydrogen-bonds with A292 in the R2 loop. Together, the residue interaction network (G220–K67–Y150–A292) suggests a pathway by which Ω-loop mutations propagate their effects to the R2 loop.

      (6) The authors then analyze the network of polar residues in the active site and the hydrogen bonds observed there. For the K67-N152 hydrogen bond, for example, there is a reduction in the occupancy from ~70% in the wild-type enzyme to ~30% and 40% in the mutants E219K and Y221, respectively. This finding is interesting. The question that remains is "why is that"? From the structural point of view, how does the replacement of E219 with a Lysine alter the hydrogen bond formation between K67 and N152? Is it due to direct competition? Solvent rearrangement? The reader is left without a clue in this section. Also, Figure 3B won't help the reader, since the mutated residues are not shown there. Please consider adding some information about why the authors believe that the mutations are disrupting the active site hydrogen bond network and showing it in Figure 3B.

      We appreciate the comment and have updated Figures 1D and 3B to highlight the mutation sites. The change from ~70% in the wild type to ~30–40% in the E219K and Y221A variants reported in Table 1 refers to the S64–K67 hydrogen bond. In the wild type, K67 forms an additional hydrogen bond with G220 on the Ω-loop, which helps anchor the K67 side chain in a geometry that favors the S64–K67 interaction. In the variants, the mutations reshape the Ω-loop and frequently disrupt the K67–G220 contact. The loss of this local anchor increases the conformational dispersion of K67, which is consistent with the observed reduction of the S64–K67 occupancy. Furthermore, our observation that the mutations disrupt the active-site hydrogen-bond network is a data-driven conclusion rather than a subjective inference. Across ten systems, our AB-MD simulations provided 30 µs of sampling per system. Saving one frame every nanosecond yielded 30,000 conformations per system and 300,000 in total. All hydrogen-bond and salt-bridge statistics were computed over this full ensemble. Thus, the conclusion that the mutations disrupt the active-site hydrogen-bond network follows directly from these ensemble statistics.
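
      The sampling figures quoted above follow from simple bookkeeping, which can be checked with a short sketch. The constants are the values stated in the responses (ten systems, 100 trajectories of 300 ns each, one saved frame per nanosecond); the function and key names are hypothetical.

```python
# Sampling bookkeeping using the numbers quoted in the author response.
N_SYSTEMS = 10        # wild type plus nine variants
N_TRAJ = 100          # trajectories per system
TRAJ_NS = 300         # length of each trajectory, in ns
STRIDE_NS = 1         # one saved frame per nanosecond

def sampling_summary(n_systems=N_SYSTEMS, n_traj=N_TRAJ,
                     traj_ns=TRAJ_NS, stride_ns=STRIDE_NS):
    """Return aggregate simulation time and frame counts."""
    total_ns = n_traj * traj_ns                  # aggregate time per system
    frames_per_system = total_ns // stride_ns    # saved conformations per system
    return {
        "us_per_system": total_ns / 1000,
        "frames_per_system": frames_per_system,
        "frames_total": frames_per_system * n_systems,
    }

print(sampling_summary())
```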

      (7) The pKa calculations and the pocket volume calculations show that the mutations expand the volume of the catalytic site and alter the microenvironment. Is there any change in the solvation associated with these changes? If the volume expands and the environment becomes more acidic, are there more water molecules in the mutants as compared to the wt enzyme? If so, can changes in solvation be associated with the changes in the hydrogen bond network? Would a simulation in the presence of a substrate be meaningful here? ( I guess it would!).

      Regarding solvation, we observe a modest increase in transient water occupancy associated with the increase in volume of the pocket. The conserved deacylation water molecule is the most important and is always present throughout the simulation. Additional waters enter and leave the pocket but do not form persistent interactions that measurably perturb the hydrogen-bond network of the Ω- and R2-loops. We agree that simulations with a bound substrate would be informative. However, our study focuses on how Ω-loop mutations modulate the active site of apo PDC-3 and its variants. Within this scope, we find: (i) Amino acid substitutions change the flexibility of Ω-loops and R2-loops; (ii) E219K and Y221A mutations facilitate the proton transfer; (iii) Substitutions enlarge the active-site pocket to accommodate bulkier R1 and R2 groups of β-lactams.

      (8) I have some concerns regarding the Markov State Modeling as shown here. After a time-independent component analysis, the authors show the projections on the components, which are different between the wild-type enzyme and the mutants, and draw some conclusions from these changes. For example, the authors state that "From the metastable state results, we observe that E219K adopts a highly stable conformation in which all the tridentate hydrogen-bonding interactions (K67(NZ)-S64(OG), K67(NZ)-N152(OD1) and K67(NZ)-G220(O)) mentioned above are broken". This conclusion is very difficult to draw from Figure 5 alone. Unless the macrostates observed in the MSM can be shown (their structures) and could confirm the broken interactions, I really don't believe that the reader can come to the same conclusion as drawn by the authors here. I would recommend the authors to map the macrostates back to the coordinates and show them (what structure corresponds to what macrostate). After showing that, it makes sense to discuss what macrostate is being favored by what mutation. Taking conclusions from tiCA projections only is not recommended. I very strongly suggest that the authors revisit this entire section, adding more context so that the reader can draw conclusions from the data that is shown.

      We appreciate the reviewer’s concern. In the Markov state modeling section, our objective is to quantify the timescales (via mean first passage times) associated with the formation and disruption of the critical hydrogen bonds (K67(NZ)-S64(OG), K67(NZ)-N152(OD1), K67(NZ)-G220(O), Y150(N)-A292(O), N287(ND2)-N314(OD1)) mentioned above. Representative structures illustrating these interactions are shown in Figures 3B and 4A. We agree that the main Figure 5 alone does not convey structural information. Accordingly, we provide Figure 5—figure supplements 12–16. Together, Figure 5B and Figure 5—figure supplements 12–16 map structures to metastable states, whereas Figures 3B and 4A supply atomistic detail of the interactions. Author response image 1 presents selected subplots from Figure 5—figure supplements 12–14. Together with the free-energy landscape in Figure 5A, these data indicate that E219K adopts a highly stable conformation in which all three K67-centered hydrogen bonds (K67(NZ)–S64(OG), K67(NZ)–N152(OD1), and K67(NZ)–G220(O)) are broken.

      Author response image 1.

      TICA plots illustrating the distribution of E219K, with the colour indicating the K67(NZ)-S64(OG), K67(NZ)-N152(OD1) and K67(NZ)-G220(O) distances.

      (9) As a very minor issue, there are a few typos in the manuscript text. The authors might want to take some time to revisit their entire text. Examples in lines 70, 197, etc.

      Thank you for your comment. We have corrected these typos.

      Reviewer #3 (Recommendations for the authors):

      This manuscript aims to explore how mutations in the PDC-3 β-lactamase alter its ability to bind and catalyse reactions of antibiotic compounds. The topic is interesting, and the study uses MD simulations to provide hypotheses about how the size of the binding site is altered by mutations that change the conformation and flexibility of two loops that line the binding pocket.

      However, the study doesn't clearly describe the way the data is generated and potentially lacks statistical rigour, which makes it uncertain if the key results are significant. As such, it is difficult to judge if the conclusions made are supported by data.

      All necessary data-acquisition methods are described in the Methods section. The Markov state models have been validated by the ITS plots and the Chapman-Kolmogorov (CK) test (Figure 5—figure supplements 2–11). Two-sample t-tests were also carried out for the volume and SASA (Table 2).

      The results section jumps straight to reporting RMSD and RMSF values; however, it is not clear what simulations are used to generate this information. Indeed, the main text does not mention the simulations themselves at all. The methods section mentions that 10 independent MD simulations were set up for each system, but no information is given as to how long these were run or the equilibration protocol used. Then it says that AB-MD simulations were run, but it is not clear what starting coordinates were used for this or how the 10 replicates were fed into these simulations. Most importantly, are the RMSD and RMSF calculations and later distance distribution information derived from the equilibrium MD runs or from the AB-MD simulations?

      Thank you for pointing this out. We have added “To investigate how the mutations in the Ω-loop affect PDC-3 dynamics, adaptive-bandit molecular dynamics (AB-MD) simulations were carried out for each system. 100 trajectories of 300 ns each (totaling 30 μs per system) were run.” to the Results and Discussion section. We did not run 10 independent MD simulations per system, and we regret the typo in the Methods section that confused the reviewer. The sentence should have read: ‘All-atom MD simulations of wild-type PDC-3 and its variants were performed.’ Each system was equilibrated for 5 ns at a pressure of 1 atm using the Berendsen barostat. AB-MD simulations were initiated from these equilibrated structures. All analyses, apart from CpHMD, are based on the AB-MD trajectories.

      If these are taken from the equilibrium simulations, then it is critical that the reproducibility and statistical significance of the simulations is established. This can be done by calculating the RMSD and RMSF values independently for each replicate and determining the error bars. From this, the significance of differences between WT and mutant simulations can be determined. Without this, I have no data to judge if the main conclusions are supported or not. If these are derived from the AB-MD simulations, then I want to know how the independent simulations were combined and reweighted to generate overall RMSD, RMSF, and distance distributions. Unless I misunderstand the approach, the individual simulations no longer sample all regions of conformational space the same relative amount you would see in a standard MD simulation - specific conformational regions are intentionally run more to enhance sampling, then the overall conformational distributions cannot be obtained from these simulations without some form of reweighting scheme. But no such scheme is described. In addition, convergence of the data is required to ensure that the RMSD, RMSF, and distances have reached stable values. It is possible that I am misunderstanding the approach here. But in that case, I hope the authors can clarify the method and provide a means of ensuring that the data presented is converged. Many of the differences are clear by eye, but it is important to know they are not random differences between simulations and rather reflect differences between them.

      Thank you for raising this important point. In our AB-MD workflow, the adaptive bandit is used only for starting-structure selection (adaptive seeding). After each epoch, it chooses new starting snapshots from previously sampled conformations and launches the next runs. Each trajectory itself is standard, unbiased MD with no biasing potentials and no modification of the Hamiltonian. In other words, AB decides where we start, but does not alter the physics or sampling dynamics within an individual trajectory. In addition, our goal in this work is to compare variants under the same adaptive-bandit (AB) protocol, rather than to estimate equilibrium (Boltzmann) populations. Hence, we did not apply equilibrium reweighting to RMSD, RMSF, or distance distributions. However, the MSM section provides reweighted reference results based on the MSM stationary distribution.

      In the response to reviews, the authors state that the "RMSF is a statistical quantity derived from averaging the time series of atomic displacements, resulting in a fixed value without an inherent error bar." But normally we would run multiple replicates and get an error bar from the different values in each. To dismiss the request for uncertainties and error bars seems to miss the point. I strongly agree with the prior reviewer that comparisons between RMSF or other values should be accompanied by uncertainties and estimates of statistical significance.

      Regarding the reviewers’ suggestion to present the data as a bar graph with error bars, we would like to note that RMSF is calculated as the time average of the fluctuations of each residue’s Cα atom over the entire simulation. As such, RMSF is a statistical quantity derived from averaging the time series of atomic displacements, resulting in a fixed value without an inherent error bar. We believe that our current presentation clearly and accurately reflects the local flexibility differences among the variants. Nearly all published studies report RMSF in this way, as indicated by the following examples:

      Figure 3a in DOI: https://doi.org/10.1021/jacsau.2c00077

      Figure 2 in DOI: https://doi.org/10.1021/acs.jcim.4c00089

      Supplementary Fig. 1, 2, 5, 9, 12, 20, 22, 24, and 26 in DOI: https://doi.org/10.1038/s41467-022-29331-3

      However, in response to the reviewers’ strong request, we present RMSF plots with error bars in our response letter. 

      Author response image 2.

      The root-mean-square fluctuation (RMSF) profiles of wild-type PDC-3 and its variants. Blue lines show the mean RMSF across 100 independent MD trajectories for each system; red translucent bands denote the standard deviation across trajectories. The Ω-loop (residues G183 to S226) is highlighted in yellow, and the R2-loop (residues L280 to Q310) is highlighted in blue.
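
      One way to obtain a mean and standard deviation of this kind is to compute the RMSF separately for each trajectory and then summarize across trajectories. The sketch below is a minimal illustration for a single atom, assuming the coordinates have already been aligned to a common reference; the function names and toy coordinates are hypothetical and this is not the analysis script used for the figure.

```python
import math
import statistics

def rmsf(positions):
    """RMSF of one atom from a list of aligned (x, y, z) positions."""
    n = len(positions)
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    # Mean squared displacement from the time-averaged position.
    msd = sum(sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions) / n
    return math.sqrt(msd)

def rmsf_mean_std(trajectories):
    """Per-trajectory RMSF, then mean and sample std across trajectories."""
    values = [rmsf(t) for t in trajectories]
    return statistics.mean(values), statistics.stdev(values)

# Toy coordinates (hypothetical): one rigid-like and one flexible-like trajectory.
traj_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]   # RMSF = 1.0
traj_b = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)]   # RMSF = 3.0
mean_rmsf, std_rmsf = rmsf_mean_std([traj_a, traj_b])
print(mean_rmsf, std_rmsf)
```

      Repeating this for every Cα atom gives a mean profile with a band, as in the image described above.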

      It was good to see that convergence of the constant-pH simulations was shown. While it can be challenging to get absolute pH values from the implicit solvent-based simulations, the differences between the systems are large and the trends appear significant. I was not clear how the starting coordinates were chosen for these simulations. Is the end point of the classical simulations, or is a representative snapshot chosen somehow?

      To ensure comparison, all systems used the X-ray crystal structure (PDB ID: 4HEF) with T79A substitution as the initial structure. The E219K and Y221A mutants were generated in silico using the ICM mutagenesis module. We have added the clarification in Methods section: “The starting structures were identical to those used for AB-MD.”

      Significant figures: Throughout the text and tables, the authors present data with more figures than are significant. 1071.81 ± 157.55 should be reported as 1100 ± 160 or 1070 ± 160. See the eLife guidelines for advice on this.

      Thank you for your suggestion. We have amended these now. 
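
      The rounding convention the reviewer describes can be sketched as follows: round the uncertainty to a chosen number of significant figures, then round the value to the same decimal place. The helper name and the default of two significant figures are assumptions made for illustration, not a statement of the journal's rule.

```python
import math

def round_to_uncertainty(value, error, sig=2):
    """Round error to `sig` significant figures and value to the same place."""
    if error <= 0:
        raise ValueError("error must be positive")
    exponent = math.floor(math.log10(error))   # order of magnitude of the error
    decimals = sig - 1 - exponent              # negative: round left of the decimal point
    return round(value, decimals), round(error, decimals)

print(round_to_uncertainty(1071.81, 157.55))   # (1070.0, 160.0)
```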

      The manuscript is very long for the results presented, and I feel that a clearer story would come across if the authors shortened the text so that the main conclusions and results were not lost.

      We appreciate the suggestion. We examined the twenty most recent research articles published in eLife and found that they are either longer than or comparable in length to our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors devote significant effort to characterizing the physical interaction between Bicc1 and Pkd2. However, the study does not examine or discuss how this interaction relates to Bicc1's well-established role in posttranscriptional regulation of Pkd2 mRNA stability and translation efficiency.

      The reviewer is correct that the present study has not addressed the downstream consequences of this interaction, considering that Bicc1 is a posttranscriptional regulator of Pkd2 (and potentially Pkd1). We think that the complex of Bicc1/Pkd1/Pkd2 retains Bicc1 in the cytoplasm and thus restricts its activity in participating in posttranscriptional regulation (see Author response image 1). We, however, do not yet have data to support this and thus have not included this model in the manuscript. Yet, we have updated the discussion of the manuscript to further elaborate on the potential mechanism of the Bicc1/Pkd1/Pkd2 complex.

      We have updated the discussion to address the potential consequences for posttranscriptional regulation by Bicc1.

      Author response image 1.

      Model of BICC1, PC1 and PC2 self-regulation. In this model Bicc1 acts as a positive regulator of PKD gene expression. In the presence of ‘sufficient’ amounts of PC1/PC2 complex, it is tethered to the complex and remains biologically inactive (Fig. 1A). However, once the levels of the PC1/PC2 complex are reduced, Bicc1 is now present in the cytoplasm to promote expression of the PKD proteins, thereby raising their levels (Fig. 4B), which then in turn will ‘shutdown’ Bicc1 activity by again tethering it to the plasma membrane.

      (2) Bicc1 inactivation appears to downregulate Pkd1 expression, yet it remains unclear whether Bicc1 regulates Pkd1 through direct interaction or by antagonizing miR-17, as observed in Pkd2 regulation. This should be further examined or discussed.

      This is a very interesting comment. Vishal Patel published that PKD1 is regulated by a miR-17 binding site in its 3’UTR (PMID: 35965273). We, however, have not evaluated whether BICC1 participates in this regulation. A definitive answer would require the mice described in the above reference, which is beyond the scope of this manuscript. We, however, have revised the discussion to elaborate on this potential mechanism.

      We have updated the discussion to include a statement on the potential direct regulation of Pkd1 mRNA by Bicc1.

      (3) The evidence supporting Bicc1 and ADPKD gene cooperativity, particularly with Pkd1, in mouse models is not entirely convincing, likely due to substantial variability and the aggressive nature of Bpk/Bpk mice. Increasing the number of animals or using a milder Bicc1 strain, such as jcpk heterozygotes, could help substantiate the genetic interaction.

      We initially performed the analysis using the Bicc1 complete knockout that we previously reported (PMID: 20215348), focusing on compound heterozygotes. Yet, similar to the Pkd1/Pkd2 compound heterozygotes (PMID: 12140187), no cyst development was observed when we sacrificed the mice as late as P21. Our strain is similar to the above-mentioned jcpk, which is characterized by a short, abnormal transcript thought to result in a null allele (PMID: 12682776). We thank the reviewer for pointing us to the reference showing that heterozygous mice exhibit glomerular cysts as adults (PMID: 7723240). This suggestion is an interesting idea we will investigate. In general, we agree with the reviewer that a better understanding of the contribution of Bicc1 to the adult PKD phenotype will be critical. To this end, we are currently generating a floxed allele of Bicc1 that will allow us to address the cooperativity in the adult kidney when crossed, e.g., to the Pkd1<sup>RC/RC</sup> mice. Yet, these experiments are beyond the timeframe for this revision.

      No changes were made in the revised manuscript. 

      Reviewer #2 (Public review):

      (1) These results are potentially interesting, despite the limitation, also recognized by the authors, that BICC1 mutations seem exceedingly rare in PKD patients and may not "significantly contribute to the mutational load in ADPKD or ARPKD". The manuscript has several intrinsic limitations that must be addressed. 

      As mentioned above, the study was designed to explore whether there is an interaction between BICC1 and PKD1/PKD2 and whether this interaction is functionally important. How this translates into clinical relevance will require additional studies (and we have addressed this in the discussion of the manuscript).

      (2) The manuscript contains factual errors, imprecisions, and language ambiguities. This has the effect of making this reviewer wonder how thorough the research reported and analyses have been. 

      We respectfully disagree with the reviewer on the latter interpretation. The study was performed with rigor. We have carefully assessed the critiques raised by the reviewer. As presented below, most of the criticisms raised by the reviewer have been easily addressed in the revised version of the manuscript. Yet, none of the critiques seems to directly impact the overall interpretation of the data. 

      Reviewer #1 (Recommendations for the authors):

      (1) The manuscript requires further editing. For example, figure panels and legends are mismatched in Figure 1

      We have corrected the labeling of Figure 1. 

      (2) Y-axis units and values are inconsistent in Figures 4b-4g, Supplementary Figures S2e and S2f are not referenced in the text, genotypes are missing in Supplementary Figure S3f, and numerous typographical errors are present.

      With respect to the y-axis in Figure 4b-g, the scale differs between panels; this is intentional, as the differences would be lost if all panels were scaled identically. We now mention this in the figure legend to make the reader aware of it. With respect to Supplementary Figure S2e,f, we included the panels in the description of the mutant BICC1 lines, but unfortunately forgot to reference them. This has now been done.

      We have updated the labeling of the Y-axis for the cystic indices adding “[%]” as the unit and updated the figure legend of Figure 4. We have included the genotypes in Supplementary Figure S3f. The Supplementary Figure S2e,f is now mentioned in the supplemental material (page 9, 2<sup>nd</sup> paragraph). 

      Reviewer #2 (Recommendations for the authors):

      (1) "Previous data from mouse, Xenopus, and zebrafish suggest a crucial role for the RNA-binding protein Bicc1 in the pathogenesis of PKD, although BICC1 mutations in human PKD have not been previously reported." The cited sources (and others that were not cited) link Bicc1 mutations to renal cysts, similar to a report by Kraus (PMID: 21922595) that the authors cite later. However, a more direct link to PKD was reported by Lian and colleagues using whole Pkd1 mice (PMID: 20219263) and by Gamberi and colleagues using Pkd1 kidneys and human microarrays (PMID: 28406902). Although relevant, neither is cited here, and only the former is cited later in the manuscript.

      Thanks for pointing this out. We have added these three citations.

      We have added these three citations (PMID: 21922595, PMID: 20219263 and PMID: 28406902) in the indicated sentence.

      (2) In Figure 1B, the lanes do not seem to correspond among panels, particularly evident in the panel with myc-mBicc1. Hence, it is difficult to agree with the presented conclusions.

      We have corrected the labeling of the lanes in Figure 1b.

      (3) In the Figure 1 legend: "(g) Western blot analysis following co-IP experiments, using an anti-mouse Bicc1 or anti-goat PC2 antibody as bait, identified protein interactions between endogenous PC2 and BICC1 in UCL93 cells. Non-immune goat and mouse IgG were included as a negative control." There is no mention of panel H, although this reviewer can imagine what the authors meant. The capitalization differs in the figure and legend. More troublingly, in panel G, a non-defined star indicates a strong band present in both immune and non-immune control.

      We have corrected the figure legend of Figure 1 and clarified the non-specific band in the figure legend.

      (4) In Figure 4, the authors do not show the matched control for the Bicc1 Pkd1 interaction in panel d, nor do they show a scale bar in either a) or d). Thus, the phenotypic severity cannot be properly assessed.

      Thanks for pointing out the missing scale bars, which have now been added. The two kidneys shown in Figure 4d are from littermates and illustrate the kidney size, in agreement with the cumulative data shown in Figure 4e. Unfortunately, this litter did not have a wildtype control. As the data analysis in Figure 4e is based on littermates, mixing and matching kidneys of different litters does not seem appropriate. Thus, we have omitted showing a wildtype control in this panel. However, the size of the wildtype kidney can be seen in Figure 4a.

      We have added the scale bar to both panels and have updated the figure legend to emphasize that the kidneys shown are from littermates and that no wildtype littermate was present in this litter.

      (5) "Surprisingly, an 8-fold stronger interaction was observed between full-length PC1 and myc-mBicc1-ΔKH compared to myc-mBicc1 or myc-mBicc1-ΔSAM." Assuming all the controls for protein folding and expression levels have been carried out and not shown/mentioned, this sentence seems to contradict the previous statement that Bicc1deltaSAM reduced the interaction with PC1 by 55%. Because the full length and SAM deletion have different interaction strengths, the latter sentence makes no sense.

      The reduction in PC1 binding of myc-mBicc1-ΔSAM compared to wildtype myc-mBicc1 was not significant. We have clarified this in the text.

      We have corrected the sentence and modified the Figure accordingly. 

      (6) Imprecise statements make a reader wonder how to interpret the data: "More than three independent experiments were analyzed." Stating the sample size or including it in the figure would save space and improve confidence in the data presented.

      We have stated the exact number of animals per condition above each of the bars.

      (7) "Next, we performed a similar mouse study for Pkd1 by reducing the gene dose of Pkd1 postnatally in the collecting ducts using a Pkhd1-Cre as previously described40" What did the authors mean?

      The reference was included to cite the mouse strain, but we realize that it can be misinterpreted as stating that the exact experiment has been performed previously. We have clarified this in the text.

      We have reworded the sentence to avoid misinterpretation. 

      (8) The authors examined the additive effects of knocking down Bicc1, Pkd1, and Pkd2 with morpholinos in Xenopus and, genetically, in mice. While the Bicc1[+/-] Pkd1 or 2[+/-] double heterozygote mice did not show phenotypes, the authors report that the Bicc1[-/-] Pkd1 or 2 [+/-] did instead show enlarged kidneys. What is the phenotype of a Bicc1[+/-] Pkd1 or 2 [-/-]? What we learn from the author's findings among the PKD population suggests that the latter situation would be potentially translationally relevant.

      The mouse experiments were designed to address a cooperativity between Bicc1 and either Pkd1 or Pkd2 and whether removal of one copy of Pkd1 or Pkd2 would further worsen the Bicc1 cystic kidney phenotype. Thus, the parental crosses were chosen to maximize the number of animals obtained for these genotypes. Unfortunately, these crosses did not yield the genotypes requested by the reviewer. To address the contribution of Bicc1 towards the PKD population, we will need to perform a different cross, where we eliminate Pkd1 or Pkd2 in a floxed background of Bicc1 postnatally in adult mice. While we are gearing up to perform such an experiment, this is timewise beyond the scope of the manuscript. In addition, please note that we have addressed the question about the translation towards the PKD population already in the discussion of the original submission (page 13/14, last/first paragraph).

      No changes have been made to the revised version of the manuscript.

      (9) How do the authors interpret the milder effects of the Bicc1[-/-] Pkd1[+/-] compared to Bicc1[-/-] Pkd2[+/-] relative to the respective protein-protein interactions?

      The milder effects are due to the nature of the crosses. While the Pkd2 mutant is a germline mutation, the Pkd1 mutant is a conditional allele eliminating Pkd1 only in the collecting ducts of the kidney. As such, we spare other nephron segments such as the proximal tubules, which also significantly contribute to the cyst load. As such these mouse data support the interaction between Pkd1 and Pkd2 with Bicc1, but do not allow us to directly compare the outcomes. While this was mentioned in the previous version of the manuscript, we have expanded on this in the revised version of the manuscript.

      We have expanded the results section in the revised version of the manuscript highlighting that the two different approaches cannot be directly compared.

      (10) How do the authors interpret that the strong Bicc1[Bpk] Pkd1 or Pkd2 double heterozygote mice did not have defects and "kidneys from Bicc1+/-:Pkd2+/- did not exhibit cysts (data not shown)", when the VEO PKD patients and - although not a genetic reduction - also the morpholino-treated Xenopus did?

      VEO PKD patients are characterized by a loss of function of PKD1 or PKD2 and, as we propose in this manuscript, BICC1 further aggravates the phenotype. Yet, we do not address in either the mouse or the Xenopus experiments whether BICC1 is a genetic modifier. We are simply addressing whether the two genes show a genetic interaction. In the mouse studies, we eliminate one copy of Pkd1 or Pkd2 in the background of a hypomorphic allele of Bicc1. Similarly, in the Xenopus experiments, we employ suboptimal doses of the morpholino oligomers, i.e., concentrations that did not yield a phenotypic change, and then asked whether removing both together shows cooperativity. It is important to state that this is based on a biological readout and not defined based on the amount of protein. While we have described this already in the original manuscript (page 7, first paragraph), we have amended our description of the Xenopus experiment to make this even clearer.

      Finally, we agree with the reviewer that if we were to address whether Bicc1 is a modifier of the PKD phenotype in mouse, we would need to reduce Bicc1 function in Pkd1 or Pkd2 mutants. Yet, we have recognized this already in the initial version of the manuscript in the discussion (page 14, first paragraph).

      We have expanded the results section when discussing the suboptimal amounts of the morpholino oligos (Page 6, 1<sup>st</sup> paragraph).

      (11) Unclear: "While variants in BICC1 are very rare, we could identify two patients with BICC1 variants harboring an additional PKD2 or PKD1 variant in trans, respectively." Shortly after, the authors state in apparent contradiction that "the patients had no other variants in any of other PKD genes or genes which phenocopy PKD including PKD1, PKD2, PKHD1, HNF1s, GANAB, IFT140, DZIP1L, CYS1, DNAJB11, ALG5, ALG8, ALG9, LRP5, NEK8, OFD1, or PMM2."

      The reviewer is correct. This should have been phrased differently. We have now added “Besides the variants reported below” to clarify this more adequately.

      The sentence was changed to start with “Besides the variants reported below, […].”

      (12) "The demonstrated interaction of BICC1, PC1, and PC2 now provides a molecular mechanism that can explain some of the phenotypic variability in these families." How do the authors reconcile this statement with their reported ultra-rare occurrence of the BICC1 mutations?

      As mentioned in the manuscript and also in response to the other two reviewers, Bicc1 has been shown to regulate Pkd2 gene expression in mice and frogs via an interaction with the miR-17 family of microRNAs. Moreover, the miR-17 family has been demonstrated to be critical in PKD (PMID: 30760828, PMID: 35965273, PMID: 31515477). In fact, both other reviewers have pointed out that we should stress this more since Bicc1 is part of this regulatory pathway. Future experiments are needed to address whether Bicc1 contributes to the variability in ADPKD onset/severity. Yet, this is beyond the scope of this study.

      Based on the comments of the two other reviewers we have further addressed the Bicc1/miR-17 interaction.

      (13) The manuscript should use correct genetic conventions of italicization and capitalization. This is an issue affecting the entire manuscript. Some exemplary instances are listed below.

      (a) "We also demonstrate that Pkd1 and Pkd2 modifies the cystic phenotype in Bicc1 mice in a dose-dependent manner and that Bicc1 functionally interacts with Pkd1, Pkd2 and Pkhd1 in the pronephros of Xenopus embryos." Genes? Proteins?

      The data presented in this section show that a hypomorphic allele of Bicc1 in mouse and a knockdown in Xenopus yields this. As both affect the proteins, the spelling should reflect the proteins.

      No changes have been made in the revised manuscript.

      (b) The sentence seems to use both the human and mouse genetic capitalization, although it refers to experiments in the mouse system “to define the Bicc1 interacting domains for PC2 (Fig. 2d,e). Full-length PC2 (PC2-HA) interacted with full-length myc-mBICC1.”

      We agree with the reviewer that stating the species of the molecules used is critical. We have adopted a spelling of Bicc1 in which BICC1 is the human homologue, mBicc1 is the mouse homologue and xBicc1 is the Xenopus one.

      We have highlighted the species spelling in the methods section and labeled the species accordingly throughout the manuscript and figures. 

      (14) “Together these data supported our biochemical interaction data and demonstrated that BICC1 cooperated with PKD1 and PKD2.” Are the authors implying that these results in mice will translate to the human protein?

      We agree that we have not formally shown that the same applies to the human proteins. Thus, we have changed the spelling accordingly.

      We have revised the capitalization of the proteins. 

      (15) The text is often unclear, terse, or inconsistent.

      (a) “These results suggested that the interaction between PC1 and Bicc1 involves the SAM but not the KH/KHL domains (or the first 132 amino acids of Bicc1). It also suggests that the N-terminus could have an inhibitory effect on PC1-BICC1 association.” How do the authors define the N-terminus? The first 132 aa? KH/KHL domains?

      This was illustrated in the original Figure 2A. The ΔKH constructs lack the first 351 amino acids.

      To make this more evident, we have specified this in the text as well.

      (b) Similarly, the authors state below, "Unlike PC1, PC2 interacted with myc-mBICC1-ΔSAM, but not myc-mBICC1-ΔKH suggesting that PC2 binding is dependent on the N-terminal domains but not the SAM domain." It is unclear if the authors refer to the KH/KHL domains or others. Whatever the reference to the N-terminal region, it should also be consistent with the section above.

      This is now specified in the text.

      (c) Unclear: "We have previously demonstrated that Pkd2 levels are reduced in a complete Bicc1 null mice,22 performing qRT-PCR of P4 kidneys (i.e. before the onset of a strong cystic phenotype), revealed that Bicc1, Pkd1 and Pkd2 were statistically significantly down-regulated (Fig. 4h-j)".

      We have changed the text to clarify this. 

      (d) “Utilizing recombinant GST domains of PC1 and PC2, we demonstrated that BICC1 binds to both proteins in GST-pulldown assays (Fig. 1a, b)." GST-tagged domains? Fusions?

      We have changed the text to clarify this. 

      (e) "To study the interaction between BICC1, PKD1 and PKD2 we combined biochemical approaches, knockout studies in mice and Xenopus, genetic engineered human kidney cells" > genetically engineered.

      We have changed the text to clarify this.

      (f) Capitalization (e.g., see Figure S3, ref. the Bpk allele) and annotation (e.g., Gly821Glu and G821E) are inconsistent.

      We have homogenized the labeling of the capitalization and annotations throughout the manuscript. 

      (g) What do the authors mean by "homozygous evolutionarily well-conserved missense variant"?

      We have changed this in the revised version of the manuscript.

      Reviewer #3 (Public review/Recommendations to the authors):

      (1) A further study in HUREC cells investigating the critical regulatory role of BICC1 and potential interaction with mir-17 may yet lead to a modifiable therapeutic target.

      (2) This study should ideally include experiments in HUREC material obtained from patients/families with BICC1 mutations and studying its effects on the PKD1/2 complex in primary cell lines.

      This is an excellent suggestion. We agree with the reviewer that it would have been interesting to analyze HUREC material from the affected patients. Unfortunately, besides DNA and the phenotypic analysis described in the manuscript, neither human tissue nor primary patient-derived cells were collected once the two patients with the BICC1 p.Ser240Pro variant passed away.

      No changes to the revised manuscript have been made to address this point.

      (3) Please remove repeated words in the following sentence in paragraph 2 of the introduction: "BICC1 encodes an evolutionarily conserved protein that is characterized by 3 K-homology (KH) and 2 KH-like (KHL) RNA-binding domains at the N-terminus and a SAM domain at the C-terminus, which are separated by a by a disordered intervening sequence (IVS).23-28".

      This has been changed.

    1. Author response:

      Reviewer #1 (Public review):

      The authors analysed large-scale brain-state dynamics while humans watched a short video. They sought to identify the role of thalamocortical interactions.

      Major concerns

      (1) Rationale for using the naturalistic stimulus

      In terms of brain state dynamics, previous studies have already reported large-scale neural dynamics by applying some data-driven analyses, like energy landscape analysis and Hidden Markov Model, to human fMRI/EEG data recorded during resting/task states. Considering such prior work, it'd be critical to provide sufficient biological rationales to perform a conceptually similar study in a naturalistic condition, i.e., not just "because no previous work has been done". The authors would have to clarify what type of neural mechanisms could be missed in conventional resting-state studies using, say, energy landscape analysis, but could be revealed in the naturalistic condition.

      We appreciate your insightful comments regarding the need for a biological rationale in our study. As you mentioned, there are similar studies: Meer et al. utilized Hidden Markov Models to identify various activation modes of brain networks that included subcortical regions[1], and Song et al. linked brain states to narrative understanding and attentional dynamics[2, 3]. These studies motivate our use of naturalistic stimulus datasets. Moreover, there is evidence suggesting that the thalamus plays a crucial role in processing information in more naturalistic contexts, pointing to a vital role for thalamocortical communication[4, 5]. We therefore aimed to bridge thalamic activity and cortical state transitions using the energy landscape description.

      To address these gaps in conventional resting-state studies, we explored an alternative method—maximum entropy modeling based on the energy landscape. This allowed us to validate how the thalamus responds to cortical state transitions. To enhance clarity, we will update our introduction to emphasize the motivations behind our research and the significance of examining these neural mechanisms in a naturalistic setting.

      (2) Effects of the uniqueness of the visual stimulus and reproducibility

      One of the main drawbacks of the naturalistic condition is the unexpected effects of the stimuli. That is, this study looked into the data recorded from participants who were watching Sherlock, but what would happen to the results if we analyzed the brain activity data obtained from individuals who were watching different movies? To ensure the generalizability of the current findings, it would be necessary to demonstrate qualitative reproducibility of the current observations by analysing different datasets that employed different movie stimuli. In fact, it'd be possible to find such open datasets, like www.nature.com/articles/s41597-023-02458-8.

      We appreciate your concern regarding the reproducibility of our findings. The dataset from the "Sherlock" study is of high quality and has shown good generalizability in various research contexts. We acknowledge the importance of validating our results with different datasets to enhance the robustness of our conclusions. While we are open to exploring additional datasets, we intend to pursue this validation once we identify a suitable alternative. Currently, we are considering a comparison with the dataset from "Forrest Gump" as part of our initial plan.

      (3) Spatial accuracy of the "Thalamic circuit" definition

      One of the main claims of this study heavily relies on the accuracy of the localization of two different thalamic architectures: matrix and core. Given the conventional or relatively low spatial resolution of the fMRI data acquisition (3x3x3 mm^3), it appears to be critically essential to demonstrate that the current analysis accurately distinguished fMRI signals between the matrix and core parts of the thalamus for each individual.

      We acknowledge the importance of accurately localizing the different thalamic architectures, specifically the matrix and core regions. To address this, we downsampled the atlas of matrix and core cell populations from the previous study from a resolution of 2x2x2 mm<sup>3</sup> to 3x3x3 mm<sup>3</sup>, which aligns with our fMRI data acquisition. We will report the atlas in the Supplementary Figures of our revision.

      (4) More detailed analysis of the thalamic circuits

      In addition, if such thalamic localisation is accurate enough, it would be greatly appreciated if the authors perform similar comparisons not only between the matrix and core architectures but also between different nuclei. For example, anterior, medial, and lateral groups (e.g., pulvinar group). Such an investigation would meet the expectations of readers who presume some microscopic circuit-level findings.

      We appreciate your suggestion regarding a more detailed analysis of thalamic circuits. We have touched upon this in the discussion section as a forward-looking consideration. However, we believe that performing nuclei segmentation with 3T fMRI may not be ideal due to well-documented concerns regarding signal-to-noise ratio and spatial resolution. That said, we are interested in exploring these nuclei-pathway connections to cortical areas in future studies with a proper 7T fMRI naturalistic dataset.

      (5) Rationale for different time window lengths

      The authors adopted two different time window lengths to examine the neural dynamics. First, they used a 21-TR window for signal normalisation. Then, they narrowed down the window length to 13-TR periods for the following statistical evaluation. Such a seemingly arbitrary choice of the shorter time window might be misunderstood as a measure to relax the threshold for the correction of multiple comparisons. Therefore, it'd be appreciated if the authors stuck to the original 21-TR time window and performed statistical evaluations based on the setting.

      Thank you for your valuable feedback regarding the choice of time window lengths. We aimed to maintain consistency in window lengths across our analyses. In light of your comments and suggestions from other reviewers, we plan to test our results using different time window lengths and report findings that generalize across these variations. Should the results differ significantly, we will discuss the implications of this variability in our revised manuscript.

      (6) Temporal resolution

      After identifying brain states with energy landscape analysis, this study investigated the brain state transitions by directly looking into the fMRI signal changes. This manner seems to implicitly assume that no significant state changes happen in one TR (=1.5sec), which needs sufficient validation. Otherwise, like previous studies, it'd be highly recommended to conduct different analyses (e.g., random-walk simulation) to address and circumvent this problem.

      Thank you for raising this important point regarding temporal resolution. Many fMRI studies, such as those examining event boundaries during movie watching, operate under similar assumptions concerning state changes within one TR. For example, Barnett et al. processed the dynamic functional connectivity (dFC) with a window of 20 TRs (24.4s). We therefore regard this less as a limitation specific to our study than as a general issue tied to fMRI scanning parameters. To strengthen our analysis of state transitions and ensure they are not merely coincidental, we plan to conduct random-walk simulations, as suggested, to validate our findings in accordance with methodologies used in previous research.
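
The random-walk simulation mentioned above can be sketched as follows. This is a minimal illustration only, assuming a pairwise maximum entropy model with a single-flip Metropolis update, which is one common formulation in the energy landscape literature and not necessarily the procedure the authors will adopt. The parameters `h` and `J` and the function names are hypothetical toy values, not fitted to the Sherlock data.

```python
import itertools
import math
import random


def energy(state, h, J):
    """Energy of a +1/-1 state under a pairwise model (upper triangle of J only)."""
    n = len(state)
    field = sum(h[i] * state[i] for i in range(n))
    pair = sum(J[i][j] * state[i] * state[j]
               for i in range(n) for j in range(i + 1, n))
    return -field - pair


def boltzmann(h, J):
    """Exact Boltzmann probability of every state (feasible for small n)."""
    states = list(itertools.product((-1, 1), repeat=len(h)))
    weights = {s: math.exp(-energy(s, h, J)) for s in states}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


def random_walk(h, J, n_steps, seed=0):
    """Single-flip Metropolis walk; returns visit counts per state."""
    rng = random.Random(seed)
    n = len(h)
    state = [rng.choice((-1, 1)) for _ in range(n)]
    e_current = energy(state, h, J)
    visits = {}
    for _ in range(n_steps):
        i = rng.randrange(n)          # propose flipping one network
        state[i] = -state[i]
        e_new = energy(state, h, J)
        if rng.random() < min(1.0, math.exp(e_current - e_new)):
            e_current = e_new         # accept the move
        else:
            state[i] = -state[i]      # reject: undo the flip
        key = tuple(state)
        visits[key] = visits.get(key, 0) + 1
    return visits


# Hypothetical toy parameters for three networks.
h = [0.2, -0.1, 0.0]
J = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.1],
     [-0.3, 0.1, 0.0]]

n_steps = 50000
probs = boltzmann(h, J)
visits = random_walk(h, J, n_steps, seed=1)
freqs = {s: visits.get(s, 0) / n_steps for s in probs}
for s in sorted(probs):
    print(s, round(probs[s], 3), round(freqs[s], 3))
```

Because the walk moves on the fitted landscape rather than on the sampled BOLD time series, its transition statistics do not depend on the TR. That independence is why such a simulation is suggested as a complement to reading transitions directly from the data.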

      Reviewer #2 (Public review):

      Summary:

      In this study, Liu et al. investigated cortical network dynamics during movie watching using an energy landscape analysis based on a maximum entropy model. They identified perception- and attention-oriented states as the dominant cortical states during movie watching and found that transitions between these states were associated with inter-subject synchronization of regional brain activity. They also showed that distinct thalamic compartments modulated distinct state transitions. They concluded that cortico-thalamo-cortical circuits are key regulators of cortical network dynamics.

      Strengths:

      A mechanistic understanding of cortical network dynamics is an important topic in both experimental and computational neuroscience, and this study represents a step forward in this direction by identifying key cortico-thalamo-cortical circuits. The analytical strategy employed in this study, particularly the LASSO-based analysis, is interesting and would be applicable to other data types, such as task- and resting-state fMRI.

      Thank you for this comment and encouragement.

      Weaknesses:

      Due to issues related to data preprocessing, support for the conclusions remains incomplete. I also believe that a more careful interpretation of the "energy" derived from the maximum entropy model would greatly clarify what the analysis actually revealed.

      Thank you for your valuable suggestions, and we apologize for any misunderstandings regarding the interpretation of the energy landscape in our study. To address this issue, we will include a dedicated paragraph in both the methods and results sections to clarify our use of the term "energy" derived from the maximum entropy model. This addition aims to eliminate any ambiguity and provide a clearer understanding of what our analysis reveals.

      (1) I think the method used for binarization of BOLD activity is problematic in multiple ways.

      a) Although the authors appear to avoid using global signal regression (page 4, lines 114-118), the proposed method effectively removes the global signal. According to the description on page 4, lines 117-122, the authors binarized network-wise ROI signals by comparing them with the cross-network BOLD signal (i.e., the global signal): at each time point, network-wise ROI signals above the cross-network signal were set to 1, and the rest were set to −1. If I understand the binarization procedure correctly, this approach forces the cross-network signal to be zero (up to some noise introduced by the binarization of network-wise signals), which is essentially equivalent to removing the global signal. Please clarify what the authors meant by stating that "this approach maintained a diverse range of binarized cortical states in data where the global signal was preserved" (page 4, lines 121-122).

      Thank you for highlighting the potential issue with our binarization method. We appreciate your insights regarding the comparison of network-wise ROI signals with the cross-network BOLD signal, as this may inadvertently remove the global signal. To address this, we will conduct a comparative analysis of results obtained from both our current approach and the original pipeline. If we decide to retain our current method, we will carefully reconsider the rationale and rephrase our descriptions to ensure clarity regarding the preservation of the global signal and the diversity of binarized cortical states.

      b) The authors might argue that they maintained a diverse range of cortical states by performing the binarization at each time point (rather than within each network). However, I believe this introduces another problem, because binarizing network-wise signals at each time point distorts the distribution of cortical states. For example, because the cross-network signal is effectively set to zero, the network cannot take certain states, such as all +1 or all −1. Similarly, this binarization biases the system toward states with similar numbers of +1s and −1s, rather than toward unbalanced states such as (+1, −1, −1, −1, −1, −1). These constraints and biases are not biological in origin but are simply artifacts of the binarization procedure. Importantly, the energy landscape and its derivatives (e.g., hard/easy transitions) are likely to be affected by these artifacts. I suggest that the authors try a more conventional binarization procedure (i.e., binarization within each network), which is more robust to such artifacts.

      Related to this point, I have a question regarding Figure S1, in which the authors plotted predicted versus empirical state probabilities. As argued above, some empirical state probabilities should be zero because of the binarization procedure. However, in Figure S1, I do not see data points corresponding to these states (i.e., there should be points on the y-axis). Did the authors plot only a subset of states in Figure S1? I believe that all states should be included. The correlation coefficient between empirical and predicted probabilities (and the accuracy) should also be calculated using all states.

      Thank you for your thoughtful examination of our data processing pipeline. We agree that a comparison between the conventional binarization method and our current approach is warranted, and we appreciate your suggestion. Upon reviewing Figure S1, we discovered that there was indeed an error related to the plotting style set to "log10." As you correctly pointed out, the data should reflect that the probabilities for states where all networks are either activated or deactivated are zero. We are very interested in exploring the state distributions obtained from both the original and current approaches, as your comments highlight important considerations. We sincerely appreciate your insightful feedback and will make sure to address these points thoroughly in our first revision.
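
The reviewer's argument about excluded states can be checked on toy data. The sketch below is an illustration under stated assumptions, not the manuscript's pipeline: the simulated signals (a shared fluctuation plus small network-specific noise) and all function names are hypothetical. It compares thresholding at each time point against the cross-network mean with thresholding each network against its own temporal mean.

```python
import random


def binarize_per_timepoint(signals):
    """Threshold each time point at the mean across networks."""
    out = []
    for row in signals:
        mean = sum(row) / len(row)
        out.append(tuple(1 if v > mean else -1 for v in row))
    return out


def binarize_per_network(signals):
    """Threshold each network at its own mean over time."""
    n_t = len(signals)
    n_net = len(signals[0])
    means = [sum(row[i] for row in signals) / n_t for i in range(n_net)]
    return [tuple(1 if row[i] > means[i] else -1 for i in range(n_net))
            for row in signals]


def simulate(n_t=2000, n_net=6, seed=0):
    """Toy data: a shared (global) fluctuation plus small independent noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_t):
        g = rng.gauss(0.0, 1.0)
        data.append([g + 0.1 * rng.gauss(0.0, 1.0) for _ in range(n_net)])
    return data


def count_uniform_states(states):
    """Number of time points at which all networks share the same sign."""
    return sum(1 for s in states if len(set(s)) == 1)


signals = simulate()
per_timepoint = binarize_per_timepoint(signals)
per_network = binarize_per_network(signals)
print("uniform states, per-time-point threshold:",
      count_uniform_states(per_timepoint))
print("uniform states, per-network threshold:",
      count_uniform_states(per_network))
```

With the per-time-point threshold, the largest value always exceeds the mean and the smallest never does, so all-(+1) and all-(−1) states cannot occur for continuous data. With the per-network threshold they appear whenever the shared fluctuation is large.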

      c) The current binarization procedure likely inflates non-neuronal noise and obscures the relationship between the true BOLD signal and its binarized representation. For example, consider two ROIs (A and B): both (+2%, +1%) and (+0.01%, −0.01%) in BOLD signal changes would be mapped to (+1, −1) after binarization. This suggests that qualitatively different signal magnitudes are treated identically. I believe that this issue could be alleviated if the authors were to binarize the signal within each network, rather than at each time point.

      Thank you for your important observation regarding the potential inflation of non-neuronal noise in our current binarization procedure. We recognize that this process could lead to qualitatively different signal magnitudes being treated similarly after binarization, as you illustrated with your example. While we acknowledge your point, we believe that conventional binarization pipelines may also encounter this issue, albeit by comparing signals to a network's temporal mean activity. To address this concern and maintain consistency with previous studies, we will discuss this limitation in our revised manuscript. Additionally, if deemed necessary, we will explore implementing a percentile-based threshold above the baseline to further refine our binarization approach. Your suggestion provides a valuable perspective, and we appreciate your insights.
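
The reviewer's two-ROI example can be worked through directly. The sketch below assumes the threshold is the mean across ROIs at a single time point; the function name is hypothetical.

```python
def binarize_at_timepoint(values):
    """Map each value to +1 if it exceeds the mean across ROIs, else -1."""
    mean = sum(values) / len(values)
    return tuple(1 if v > mean else -1 for v in values)


# Percent BOLD signal change in ROI A and ROI B at two different time points.
large_change = (2.0, 1.0)
small_change = (0.01, -0.01)

state_large = binarize_at_timepoint(large_change)
state_small = binarize_at_timepoint(small_change)
print(state_large, state_small)
```

Both time points map to the same binarized state even though the underlying signal changes differ by two orders of magnitude, which is the loss of magnitude information the reviewer describes.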

      (2) As the authors state (page 5, lines 145-148), the "energy" described in the energy landscape is not biological energy but rather a statistical transformation of probability distributions derived from the Boltzmann distribution. If this is the case, I believe that Figure 2A is potentially misleading and should be removed. This type of schematic may give the false impression that cortical state dynamics are governed by the energy landscape derived from the maximum entropy model (which is not validated).

      Thank you for your valuable feedback regarding Figure 2A. We apologize for any confusion it may have created. While we recognize that similar figures are commonly used in literature involving energy landscapes (maximum entropy model), we agree that Figure 2A may mislead readers into thinking that cortical state dynamics are directly governed by the energy landscape derived from the maximum entropy model, which has not been validated. In light of your comments, we will remove Figure 2A and instead emphasize the analytical strategy presented in Figure 2B. Additionally, we will provide a simplified line graph as an illustrative example to clarify the concepts without the potential for misinterpretation.
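
To make the statistical meaning of "energy" concrete, the sketch below assumes the standard pairwise maximum entropy form, E(s) = -sum_i h_i s_i - sum_{i<j} J_ij s_i s_j with P(s) proportional to exp(-E(s)). The parameter values and function names are hypothetical, and the manuscript's exact parameterization is not reproduced here.

```python
import itertools
import math


def energy(state, h, J):
    """Energy of a +1/-1 state; only the upper triangle of J is used."""
    n = len(state)
    field = sum(h[i] * state[i] for i in range(n))
    pair = sum(J[i][j] * state[i] * state[j]
               for i in range(n) for j in range(i + 1, n))
    return -field - pair


def boltzmann(h, J):
    """Return (energies, probabilities, log partition function) for all states."""
    states = list(itertools.product((-1, 1), repeat=len(h)))
    energies = {s: energy(s, h, J) for s in states}
    z = sum(math.exp(-e) for e in energies.values())
    probs = {s: math.exp(-energies[s]) / z for s in states}
    return energies, probs, math.log(z)


# Hypothetical toy parameters for three networks.
h = [0.2, -0.1, 0.0]
J = [[0.0, 0.5, -0.3],
     [0.5, 0.0, 0.1],
     [-0.3, 0.1, 0.0]]

energies, probs, log_z = boltzmann(h, J)
for s in sorted(energies, key=energies.get):
    print(s, round(energies[s], 3), round(probs[s], 3))
```

Under this form, E(s) = -log P(s) - log Z, so a state's energy is its negative log-probability up to a constant. "Low energy" therefore means "frequently observed" and carries no metabolic interpretation.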

      Reviewer #3 (Public review):

      Summary:

      In this study, Liu et al. analyze fMRI data collected during movie watching, applying an energy landscape method with pairwise maximum entropy models. They identify a set of brain states defined at the level of canonical functional networks and quantify how the brain transitions between these states. Transitions are classified as "easy" or "hard" based on changes in the inferred energy landscape, and the authors relate transition probabilities to inter-subject correlation. A major emphasis of the work is the role of the thalamus, which shows transition-linked activity changes and dynamic connectivity patterns, including differential involvement of parvalbumin- and calbindin-associated thalamic subdivisions.

      Strengths:

      The study is methodologically complex and technically sophisticated. It applies advanced analytical methods to high-dimensional fMRI data. The application of energy landscape analysis to movie-watching data appears to be novel as well. The finding that the thalamus is involved in energy state transitions provides a strong link to several theories of thalamic control functions, which is a notable strength.

      Thanks for your comments on the novelty of our study.

      Weaknesses:

      The main weakness is the conceptual clarity and advances that this otherwise sophisticated set of analyses affords. A central conceptual ambiguity concerns the energy landscape framework itself. The authors note that the "energy" in this model is not biological energy but a statistical quantity derived from the Boltzmann distribution. After multiple reads, I still have major trouble mapping this measure onto any biological and cognitive operations. BOLD signal is a measure of oxygenation as a proxy of neural activity, and correlated BOLD (functional connectivity) is thought to measure the architecture of information communication of brain systems. The energy framework described in the current format is very difficult for most readers to map onto any neural or cognitive knowledge base on the structure and function of brain systems. Readers unfamiliar with maximum entropy models may easily misinterpret energy changes as reflecting metabolic cost, neural effort, or physiological variables, and it is just very unclear what that measure is supposed to reflect. The manuscript does not clearly articulate what conceptual and mechanistic advances the energy formalism provides beyond a mathematical and statistical report. In other words, beyond mathematical description, it is very hard for most readers to understand the process and function of what this framework is supposed to tell us in regards to functional connectivity, brain systems, and cognition. The brain is not a mathematical object; it is a biological organ with cognitive functions. The impact of this paper is severely limited until connections can be made.

      Thank you for your insightful and constructive comments regarding the conceptual clarity of our energy landscape framework. We appreciate your perspective on the challenges of mapping the statistical measure of "energy" derived from the Boltzmann distribution onto biological and cognitive operations. To address these concerns, we will revise our manuscript to clarify our expressions surrounding "energy" and emphasize its probabilistic nature. Additionally, we will incorporate a series of analyses that explicitly relate the features of the energy landscape to cognitive processes and key parameters, such as brain integration and functional connectivity. We believe these changes will help bridge the gap between our mathematical framework and its relevance to understanding brain systems and cognitive functions.
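      To make the probabilistic meaning of "energy" concrete, the following minimal sketch assumes a pairwise maximum entropy (Ising-type) model over binarized network activity, a common choice in energy landscape analysis. The parameters `h` and `J` are hypothetical values chosen for illustration and are not fitted to any data. Low energy simply means high probability under the Boltzmann distribution, and a "valley" is a pattern that is more probable than every pattern differing from it by one network.

```python
import itertools
import numpy as np

def energy(state, h, J):
    """Statistical energy of a +/-1 activity pattern under a pairwise model."""
    s = np.asarray(state, dtype=float)
    return -h @ s - 0.5 * s @ J @ s

def landscape(h, J):
    """Enumerate all patterns with their energies and Boltzmann probabilities."""
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    energies = np.array([energy(s, h, J) for s in states])
    weights = np.exp(-energies)
    probs = weights / weights.sum()  # P(state) = exp(-E) / Z
    return states, energies, probs

def local_minima(states, energies):
    """Indices of patterns with lower energy than all single-flip neighbours."""
    index = {tuple(s): i for i, s in enumerate(states)}
    minima = []
    for i, s in enumerate(states):
        is_minimum = True
        for k in range(len(s)):
            flipped = s.copy()
            flipped[k] *= -1
            if energies[index[tuple(flipped)]] <= energies[i]:
                is_minimum = False
                break
        if is_minimum:
            minima.append(i)
    return minima

# Hypothetical parameters for three networks (illustration only).
h = np.array([0.1, -0.2, 0.05])
J = np.array([[0.0, 0.8, 0.3],
              [0.8, 0.0, 0.5],
              [0.3, 0.5, 0.0]])

states, energies, probs = landscape(h, J)
minima = local_minima(states, energies)
```

      In this toy example the most probable pattern is, by construction, the one with the lowest energy, which is the only sense in which the term is used.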

      Relatedly, the use of metaphors such as "valleys," "hills," and "routes" in multidimensional measures lacks grounding. Valleys and hills of what is not intuitive to understand. Based on my reading, these features correspond to local minima and barriers in a probability distribution over binarized network activation patterns, but similar to the first point, the manuscript does not clearly explain what it means conceptually, neurobiologically, or computationally for the brain to "move" through such a landscape. The brain is not computing these probabilities; they are measurement tools of "something". What is it? To advance beyond mathematical description, these measurements must be mapped onto neurobiological and cognitive information.

      Thank you for your valuable feedback. In our revisions, we will aim to link the concept of rapid transition routes in the energy landscape to cognitive processes, such as narrative understanding and related features. By exploring these connections, we hope to provide a clearer context for how our framework can enhance understanding of cognitive functions and their neural correlates.

      This conceptual ambiguity goes back to the Introduction. At the level of motivation, the purpose and deliverables of the study are not defined in the Introduction. The stated goal is "Transitions between distinct cortical brain states modulate the degree of shared neural processing under naturalistic conditions". I do not know if readers will have a clear answer to this question at the end. Is the claim that state transitions cause changes in inter-subject correlation, that they index moments of narrative alignment, or that they reflect changes in attentional or cognitive mode? This level of explanation is largely dissociated from the methods in their current form.

      Thank you for highlighting this important point regarding the conceptual clarity in our Introduction. We appreciate your feedback about the motivation and objectives of the study. To clarify the stated goal of investigating how transitions between distinct cortical brain states modulate shared neural processing under naturalistic conditions, we will revise the manuscript to explicitly define the specific claims we aim to address. We will ensure that these explanations are closely tied to the methods employed in our study, providing a clearer framework for our readers.

      Several methodological choices can use clarification. The use of a 21-TR window centered on transition offsets is unusually long relative to the temporal scale of fMRI dynamics and to the hypothesized rapidity of state transitions. On a related note, what is the temporal scale of state transition? Is it faster than 21 TRs?

      Thank you for your insightful questions regarding our methodological choices. Our focus on specific state transitions necessitated the use of a 21-TR window. While it’s true that other transitions may occur within this window, averaging across the same transitions at different times allows us to identify distinctive thalamic BOLD patterns that precede cortical state transitions. This methodology enables us to capture relevant dynamics while ensuring that we focus on the transitions of interest. We appreciate your feedback, and this clarification will be included in our revised manuscript. We will also add a figure that describes the dwell time of cortical states.
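      The averaging logic described above can be sketched as follows. This is a generic event-locked average, assuming a one-dimensional thalamic time series and a list of transition time points; the function name, the edge-handling rule, and the toy data are assumptions made for illustration, not the pipeline used in the study.

```python
import numpy as np

def event_locked_average(signal, event_trs, half_window=10):
    """Average a 1-D series over windows of 2*half_window+1 TRs centered on events."""
    signal = np.asarray(signal, dtype=float)
    epochs = []
    for t in event_trs:
        start, stop = t - half_window, t + half_window + 1
        if start < 0 or stop > len(signal):
            continue  # skip events whose window runs past the edges of the run
        epochs.append(signal[start:stop])
    if not epochs:
        raise ValueError("no event has a complete window")
    return np.mean(epochs, axis=0)

# Toy data: a unit bump placed 3 TRs before each hypothetical transition.
toy_signal = np.zeros(100)
toy_events = [20, 50, 80, 95]  # the last event is too close to the end
for t in toy_events:
    toy_signal[t - 3] = 1.0

average = event_locked_average(toy_signal, toy_events)  # 21 values, center at index 10
```

      Because unrelated transitions fall at different positions within each window, they tend to average out, whereas activity that consistently precedes the transition of interest is preserved.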

      The choice of movie-watching data is a strength. But, many of the analyses performed here, energy landscape estimation, clustering of states, could in principle be applied to resting-state data. The manuscript does not clearly articulate what is gained, mechanistically or cognitively, by using movie stimuli beyond the availability of inter-subject correlation.

      Thank you for your question, which closely aligns with a concern raised by Reviewer #1. Our core hypothesis posits that naturalistic stimuli yield a broader set of brain states compared to those observed during resting-state conditions. To support this assertion, we will clearly articulate the findings from previous studies that relate to this hypothesis. Additionally, if appropriate, we will provide a comparative analysis between our data and resting-state data to highlight the differences and emphasize the uniqueness of the brain states elicited by naturalistic stimuli.

      Because of the above issues, a broader concern throughout the results is the largely descriptive nature of the findings. For example, the LASSO analysis shows that certain state transitions predict ISC in a subset of regions, with respectable R² values. While statistically robust, the manuscript provides little beyond why these particular transitions should matter, what computations they might reflect, or how they relate to known cognitive operations during movie watching. Similar issues arise in the clustering analyses. Clustering high-dimensional fMRI-derived features will almost inevitably produce structure, whether during rest, task, or naturalistic viewing. What is missing is an explanation of why these specific clusters are meaningful in functional or mechanistic terms.

      Thank you for your questions. In our revisions, we will perform additional analyses aimed at linking state transitions to cognitive processes more explicitly. Regarding clustering, we will provide a thorough discussion in the revised manuscript.

      Finally, the treatment of the thalamus, while very exciting, could use a bit more anatomical and circuit-level specificity. The manuscript largely treats the thalamus as a unitary structure, despite decades of work demonstrating big functional and connectivity differences across thalamic nuclei. A whole-thalamus analysis without more detailed resolution is increasingly difficult to justify. The subsequent subdivision into PVALB- and CALB-associated regions partially addresses this, but these markers span multiple nuclei with overlapping projection patterns.

      This suggestion aligns with the feedback from Reviewer #1. We believe that performing nuclei segmentation with 3T fMRI may not be ideal due to well-documented concerns regarding signal-to-noise ratio and spatial resolution. Therefore, investigating core and matrix cell projections across different thalamic nuclei using 7T fMRI presents a promising avenue for further study.

      (1) Van Der Meer J N, Breakspear M, Chang L J, et al. Movie viewing elicits rich and reliable brain state dynamics [J]. Nature Communications, 2020, 11(1): 5004.

      (2) Song H, Park B Y, Park H, et al. Cognitive and Neural State Dynamics of Narrative Comprehension [J]. Journal of Neuroscience, 2021, 41(43): 8972-8990.

      (3) Song H, Shim W M, Rosenberg M D. Large-scale neural dynamics in a shared low-dimensional state space reflect cognitive and attentional dynamics [J]. Elife, 2023, 12.

      (4) Shine J M, Lewis L D, Garrett D D, et al. The impact of the human thalamus on brain-wide information processing [J]. Nature Reviews Neuroscience, 2023, 24(7): 416-430.

      (5) Yang M Y, Keller D, Dobolyi A, et al. The lateral thalamus: a bridge between multisensory processing and naturalistic behaviors [J]. Trends in Neurosciences, 2025, 48(1): 33-46.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      In this study, Acosta-Bayona et al. aim to better understand how environmental conditions could have influenced specific gene functions that may have been selected for during the domestication of teosinte parviglumis into domesticated maize. The authors are particularly interested in identifying the initial phenotypic changes that led to the original divergence of these two subspecies. They selected heavy metal (HM) stress as the condition to investigate. While the justification for this choice remains speculative, paleoenvironmental data would add value; the authors hypothesize that volcanic activity near the region of origin could have played a role.

      The choice to investigate the effects of heavy metal stress is not speculative. As now mentioned in the Abstract, the elucidation of the genome of the Palomero toluqueño maize landrace revealed heavy metal effects during domestication (Vielle-Calzada et al., Science 2009). Our aim was to test the hypothesis that heavy metal (HM) stress influenced the evolutionary transition of teosinte parviglumis to maize.

      (1) Although the paper presents some interesting findings, it is difficult to distinguish which observations are novel versus already known in the literature regarding maize HM stress responses. The rationale behind focusing on specific loci is often lacking. For example, a statistically significant region identified via LOD score on chromosome 5 contains over 50 genes, yet the authors focus on three known HM-related genes without discussing others in the region. It is unclear why ZmHMA1 was selected for mutagenesis over ZmHMA7 or ZmSKUs5.

      We appreciate the depth and value of this comment.

      Maize phenotypic responses to sublethal concentrations of heavy metals, copper (Cu) and cadmium (Cd) in particular, are well characterized and published, and are in agreement with our results. In the first section of the Results (pgs 7 and 8), we added pertinent references to clearly show which observations are already known. By contrast, teosinte parviglumis responses are in all cases novel. To our knowledge, this is the first study to analyze in detail the phenotypic response of teosinte to sublethal concentrations of heavy metals, specifically Cu and Cd. We have now emphasized the novelty of these observations (pg 8).

      To address the fact that we only focused on three known HM-related genes without discussing others in the statistically significant region identified via LOD score on chr.5, we have added a full section that reads as follows (pgs. 11 to 13 of the new version):

      “Large-scale genomic and transcriptomic comparisons indicate that many HM response genes were positively selected across the maize genome.

      To expand the results well beyond the analysis of the three genes previously described, we performed a detailed analysis of genetic diversity across the 11.47 Mb genomic region comprised between ZmSKUs5 and ZmHMA1. This additional analysis reveals general tendencies in the quantity and nature of loci that were affected by positive selection during the teosinte parviglumis to maize transition in a region identified via LOD score on chr.5. We compared nucleotide variability by using 100 bp bins covering loci composed of two 30 Kb segments up and downstream of coding sequences, respectively, and the coding sequence itself, for 173 genes present within the genomic region comprised between ZmSKUs5 and ZmHMA1 (Figure S1 and Supplementary File 6). Two types of statistical tests (ANOVA and Wilcoxon) were applied to nucleotide variability comparisons using the entirety of each locus. The Benjamini-Hochberg procedure allowed an estimation of the false discovery rate (FDR<0.05) to limit type I errors (false positives). Although some individual loci appear as differently classified depending on the statistical test applied (22 out of 173 loci), the general differences in nucleotide variability are consistently maintained within the subregions described below. We found that 166 out of 173 loci show signatures of positive selection and are roughly organized in five independent subregions of variable length. The first six loci are consecutively ordered in a 402 Kb subregion that includes ZmSKUs5. A second group of 13 consecutive loci expands over a 1.44 Mb subregion that contains NRAMP ALUMINUM TRANSPORTER1, also involved in HM response through uptake of divalent ions. A third group of 17 consecutive loci expands over 1.28 Mb; eleven contain genes encoding for uncharacterized proteins. The fourth group is composed of 57 consecutive loci expanding over 3.22 Mb and contains genes encoding for DEFECTIVE KERNEL55, AUXIN RESPONSE FACTOR16, and peroxidases involved in responses to oxidative stress. The fifth group contains 12 consecutive loci expanding over 713 Kb and contains ZmHMA1. An additional segment of approximately 1.17 Mb and containing 25 consecutive loci that were positively selected expands away from the ZmSKUs5-ZmHMA1 segment; it also contains several genes encoding for peroxidases. Although multiple loci include genes that could be involved in abiotic stress and oxidative responses, these results suggest that multiple factors other than HM stress could have played a role in the evolutionary mechanisms that affected the genetic diversity of chr.5 during the teosinte parviglumis to maize transition.

      To further analyze the possibility that HM response could have played a role in maize emergence and subsequent domestication, we analyzed large scale transcriptomic data corresponding to independent experiments aiming at understanding the response of maize roots to HM stress. Six available transcriptomes were selected for in-depth analysis because they presented a fold change strictly higher than 1, and their results were supported by false discovery rates (FDR<0.05). These six transcriptomes (Table S5) included HM response datasets corresponding to growth conditions that not only incorporated Cu, but also lead (Pb) and chromium (Cr) that were not included in the substrate of our experiments. Transcriptional profiles were obtained from roots of plants at different stages: maize seedlings (Shen et al., 2012; Gao et al., 2015; Zhang et al., 2024a), three-week-old plantlets (Yang et al., 2023), and plants at V2 stage (Zhang et al., 2024b; Fengxia et al., 2025). A total of 120 genes shared by all six transcriptomes were found to be differentially expressed under HM stress conditions (66 upregulated and 54 downregulated; Figure S3), including ZmSKUs5, ZmHMA1 and ZmHMA7; 52 of them (43.3%) are located in maize loci showing less than 70% of the nucleotide variability found in teosinte parviglumis, suggesting that they were affected by positive selection (Yamasaki et al., 2005; Supplementary File 7). Of the 18 that map to chr.5, twelve are within the 82 cM that fractionates into multiple QTLs under selection during the parviglumis to maize transition. Interestingly, five additional loci containing HM response genes completely lack SNPs within their total length in both parviglumis and maize, and 19 additional loci lack SNPs in at least one 30 Kb segment or their coding region (Supplementary File 7), suggesting the frequent presence of ultraconserved genomic regions in many loci containing HM response genes. When this same analysis was conducted in a set of loci comprising 63 genes previously identified as differentially expressed in response to abiotic stress not directly related to HM responses (hypoxia; nutritional deficiency; soil alkalinity; drought; soil salinity), 18 loci (28.6%) showed less than 70% of the nucleotide variability found in teosinte parviglumis. Only one of them maps in chr.5 and none contained segments or coding regions lacking SNPs in parviglumis or maize. These results suggest that in contrast to other types of abiotic stress response genes, loci comprising a large set of genes that unambiguously respond to HM stress caused by chemical elements of diverse nature were affected by positive selection during the parviglumis to maize transition, irrespective of their position in the genome.”

      The detailed analysis of genetic diversity across 11.47 Mb of chr.5 in the genomic region comprised between ZmSKUs5 and ZmHMA1 is presented as Supplementary File 6.
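      For readers less familiar with false discovery rate control, the Benjamini-Hochberg step-up procedure named in the quoted section can be sketched as follows. This is a generic textbook implementation, not the script used to produce Supplementary File 6, and the p-values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                          # indices that sort p ascending
    ranked = p[order]
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * rank / m
    passing = np.nonzero(ranked <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passing.size:
        # Step-up rule: reject every hypothesis up to the largest passing rank.
        reject[order[: passing[-1] + 1]] = True
    return reject

# Hypothetical per-locus p-values (illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
example_reject = benjamini_hochberg(example_p, alpha=0.05)
```

      The step-up rule means that a locus can be retained even if its own p-value misses its rank threshold, provided a larger p-value further down the ranking meets its threshold.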

      The analysis of genetic diversity in loci encompassing heavy metal response genes shared by six transcriptomes and in abiotic stress controls is described in Supplementary File 7.
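      The two proportions reported in the quoted section follow directly from the locus counts given there (52 of 120 HM response loci and 18 of 63 control loci). The short sketch below reproduces that arithmetic and shows one way the 70% variability criterion could be expressed; the helper `is_low_diversity` and its handling of loci without SNPs are assumptions made for illustration only.

```python
def percent(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)

def is_low_diversity(pi_maize, pi_teosinte, threshold=0.70):
    """Flag a locus whose maize variability is below `threshold` of teosinte's."""
    if pi_teosinte == 0:
        return False  # loci lacking SNPs in teosinte are treated separately
    return pi_maize / pi_teosinte < threshold

hm_percent = percent(52, 120)      # HM response loci, counts from the text
control_percent = percent(18, 63)  # other abiotic stress loci, counts from the text
print(hm_percent, control_percent)  # 43.3 28.6
```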

      In the Discussion (pgs. 21 and 22), we added a paragraph that reads as follows:

      “Although loss of genetic diversity is usually the result of human selection during domestication, it can also represent a consequence of natural selective pressures favoring fitness of specific teosinte parviglumis allelic variants better adapted to environmental changes and subsequently affected by human selection during the domestication process. This possibility is reflected by widely spread selective sweeps affecting a large portion of chr.5 that contains hundreds of genes showing signatures of positive selection. The analysis of 11.47 Mb covering the ZmHMA1-ZmSKUs5 segment confirms the presence of large but discrete genomic subregions that were positively selected during the teosinte parviglumis to maize transition. Although several contain genes involved in HM response and oxidative stress, the diversity of gene functions does not necessarily favor abiotic stress over other factors that could be at the origin of selective forces affecting these regions. By contrast, a large scale transcriptomic survey indicates that genes consistently responding to HMs (Cu, Cd, Pb and Cr) show signatures of positive selection at unusually high frequencies (43.3%) as compared to loci containing genes responding to other types of abiotic stress (28.6%). Our identification of HM response genes affected by positive selection is far from being exhaustive. Nevertheless, it agrees with the expected effects of a widespread selective sweep caused by environmental changes that influenced the parviglumis to maize transition at the genetic level. Of intriguing interest are 24 loci that partially or completely lack SNPs in both teosinte parviglumis and maize, suggesting that genetic bottlenecks possibly occurred before the teosinte to maize transition. Examples of other edaphological factors driving genetic divergence either in the teosintes or maize include local adaptation to phosphorus concentration in mexicana and parviglumis (Aguirre-Liguori et al. 2019), and fast maize adaptation to changing iron availability through the action of genes involved in its mobilization, uptake, and transport (Benke and Stich 2011). Our results reveal a teosinte parviglumis environmental plasticity that could be related to the function of HM response genes positively selected during the teosinte parviglumis to maize transition. Previous studies have demonstrated that transposable elements (TEs) contribute to activation of maize genes in response to abiotic stress, affecting up to 20% of the genes upregulated in response to abiotic stress, and as many as 33% of genes that are only expressed in response to stress (Makarevitch et al., 2015). It is therefore possible that the HM response of some specific genes that influenced maize emergence or domestication could be mediated by TEs influencing or driving their transcriptional regulation.”

      The mutagenic analysis of ZmHMA7 and ZmSKUs5 will be included in a different publication.

      (2) The idea that HM stress impacted gene function and influenced human selection during domestication is of interest. However, the data presented do not convincingly link environmental factors with human-driven selection or the paleoenvironmental context of the transition. While lower nucleotide diversity values in maize could suggest selective pressure, it is not sufficient to infer human selection and could be due to other evolutionary processes. It is also unclear whether the statistical analysis was robust enough to rule out bias from a narrow locus selection. Furthermore, the addition of paleoclimate records (Paleoenvironmental Data Sources as a starting point) or conducting ecological niche modeling or crop growth models incorporating climate and soil scenarios would strengthen the arguments.

      We think that the detailed analysis of genetic diversity across 11.47 Mb covering the ZmSKUs5 to ZmHMA1 genomic segment, together with its statistical validation, provides a precise understanding of the selective sweep dimensions in chr.5.

      We do agree that lower nucleotide diversity values in maize are not sufficient to infer human selection. Because many HM response loci show unusually low nucleotide variability in teosinte parviglumis (see the results of the transcriptomic analysis presented above), we cannot discard the possibility that natural selection forces related to environmental changes could have affected native populations of teosinte parviglumis.

      To further explore the link between environmental factors, natural or human-driven selection, and the paleoenvironmental context of the parviglumis to maize transition, we reviewed paleoenvironmental and geological records and added results in two sections that read as follows (pgs. 17 to 20):

      “Paleoenvironmental studies reveal periods of climatic instability in the presumed region of maize emergence during the early Holocene.

      It is well accepted that temperature fluctuations, volcanism and anthropogenic impact shaped the distribution and abundance of plant species in the Transmexican Volcanic Belt (TMVB) during the last 14,000 years (Torrescano-Valle et al. 2019). The TMVB has produced close to 8000 volcanic structures (Ferrari et al., 2011), transforming the relief multiple times, and causing hydrographic and soil changes that actively modified the distribution and composition of plant communities in Central Mexico. Detailed paleoenvironmental data for the Pleistocene and Holocene are available for several lacustrine zones located within the 50 to 100 km range of the region currently considered the cradle of maize domestication (Matsuoka et al. 2002; Figure 5a). In Lake Zirahuén (102°44′ W; 19°26′ N and approximately 2075 meters above sea level; index [i] in Figure 5a), pollen, microcharcoal and magnetic susceptibility analyses of two sedimentary sequences reveal three periods of major ecological change during the early and middle Holocene.

      Between 9500 and 9000 calibrated years before present (cal yr BP), pine forests seem to have been associated with summer insolation increases. A second peak of forest change occurred at around 8200 cal yr BP, coinciding with cold oscillations documented in the North Atlantic. Finally, events that occurred between 7500 and 7100 cal yr BP show an abrupt change in the plant community related to humid Holocene climates and a presumed volcanic event (Lozano-García et al., 2013). The environmental history of the central Balsas watershed has also been documented by pollen, charcoal, and sedimentary analysis conducted in three lakes and a swamp of the Iguala valley (Piperno et al. 2007). Paleoecological records of lake Ixtacyola (18°20N, 99°35W and approximately 720 meters above sea level; index [ii] in Figure 5a) and lake Ixtapa (18°21N, 99°26W) indicate that an important increase in temperature and precipitation occurred between 13000 and 10000 cal yr BP. The pollen record of Ixtacyola showed that members of the genus Zea were already part of the vegetation coverage by 12900 to 13000 cal yr BP, suggesting that some teosintes, likely including parviglumis, were commonly found at elevation areas where they do not presently occur. Lake Almoloya (also named Chignahuapan; 19°05N, 99°20W and approximately 2575 meters above sea level; index [iii] in Figure 5a) in the upper Lerma basin is only 20 Km from the crater of the Nevado de Toluca that is responsible for creating the late Pleistocene Upper Toluca Pumice layer over which the Lerma basin is deposited. Pollen records indicate the presence of Zea species by 11080 to 10780 cal yr BP. As for other locations, an important period of climatic instability prevailed between 11500 and 8500 cal yr BP (Ludlow-Wiechers et al., 2005). Humidity fluctuations occurred until 8000 cal yr BP, with a stable temperate climate between 8500 and 5000 cal yr BP. Although pollen and diatom studies are often difficult to interpret at a regional scale, the overall results presented above suggest that Zea plants were consistently present during periods of environmental and climatic instability that correlate with the history of volcanic activity during the early Holocene, as described in the next section.

      Temporal and geographical convergence between volcanic eruptions and maize emergence during the Holocene.

      Current evidence indicates that the emergence and domestication of maize initiated in Mesoamerica some time around 9,000 yr BP (Matsuoka et al. 2002). The teosinte parviglumis populations that are phylogenetically most closely allied with maize are currently distributed in a region located between the Michoacan-Guanajuato Volcanic Field (MGVF) at their northwest, and the Nevado de Toluca and Popocatépetl volcanoes at their east and northeast (Figure 5a; Matsuoka et al. 2002). Precise records of field data indicate that ten accessions were collected in the Balsas river drainage near Teloloapan and Sierra de Huautla (Guerrero), at approximately 100 km south of the Nevado de Toluca crater. Three other accessions were collected near Tejupilco de Hidalgo and Zacazonapan (Estado de México), at approximately 50 to 60 km from the Nevado de Toluca crater (8762, JSG y LOS-161, and JSG-391). Four other accessions were located in Michoacan, at a location within the MGVF (accession 8763), or at mid-distance between the MGVF and the Nevado de Toluca crater (accessions JSG y LOS-130, 8761, and 8766).

      The most important source of HMs in ancient soils of Mesoamerica is TMVB-dependent volcanic activity through short- and long-term effects related to lava deposits, ores, hydrothermal flow, and ash (Torrescano-Valle et al. 2019). The Nevado de Toluca volcano produced one of the most powerful eruptions from central Mesoamerica in the Holocene, giving rise to the Upper Toluca Pumice deposit at 12621 to 12025 cal yr BP (Arce et al., 2003; Figure 5b). The pumice fallout blanketed the Lerma and Mexico basins with 40 cm of coarse ash (Bloomfield and Valastro 1977; Arce et al. 2003). A second eruption dated by 36Cl exposure occurred at 9700 cal yr BP (Arce et al. 2003; Figure 5b), and the most recent eruption occurred at 3580 to 3831 cal yr BP (Macías et al. 1997). During the early and middle Holocene, the Popocatépetl volcano produced at least four eruptions dated 13037-12060, 10775-9564, 8328-7591, and 6262-5318 cal yr BP (Siebe et al. 1997); three other important eruptions occurred during the late Holocene, between 2713 and 733 cal yr BP (Siebe and Macías, 2006). In addition, the MGVF is a monogenetic volcanic field for which 23 independent eruptions have been documented during the Holocene, 21 of them located towards the southern part of the field, in close proximity to the region harboring some of the teosinte parviglumis populations most closely related to maize. Three of these eruptions occurred in the early Holocene (El Huanillo 1130 to 9688 cal yr BP; La Taza 10649 to 10300 cal yr BP; Cerro Grande 10173 to 9502 cal yr BP; Figure 5b), and three others during the initial period of the middle Holocene, between 8400 and 7696 cal yr BP (La Mina, Los Caballos, and Cerro Amarillo; Figure 5b). On average, a new volcano forms every ~435 years in the MGVF (Macías and Arce, 2019). No less than 16 other eruptions occurred between 7159 cal yr BP and the present time (Figure 5b). Soils of volcanic origin (andosols) are currently distributed in regions north-west from the Nevado de Toluca and Popocatépetl craters, in close proximity with teosinte parviglumis populations most closely related to maize (Figure S5). Although the modern distribution of teosinte populations may differ from their distribution around 9000 yr BP, and unknown populations more closely related to maize may yet be discovered, these data indicate that the date and region where maize emerged are convergent with the dates and locations of several volcanic eruptions that occurred during the Holocene in that same region.”

      (3) Despite the interest in examining HM stress in maize and the presence of a pleiotropic phenotype, the assessment of the impact of gene expression is limited. The authors rely on qPCR for two ZmHMA genes and the locus tb1, known to be associated with maize architecture. A transcriptomic analysis would be necessary to 1- strengthen the proposed connection and 2- identify other genes with linked QTLs, such as those in the short arm of chromosome 5.

      Real-time qPCR is an accurate and reliable approach to assess the expression of specific genes such as ZmHMA1 and Tb1, but we agree that our results do not allow us to establish a direct regulatory link between the function of Tb1, the pleiotropic parviglumis phenotype under HM stress, and the function of ZmHMA1. We also concede that the large transcriptional analysis of HM response in maize (presented above) does not allow us to elucidate a possible connection between these two genes. We have substantially toned down our conclusion in this section by modifying the end of the section on pg. 17, which now reads:

      “These results do not allow us to directly link the regulation of ZmHMA1 expression to the function of Tb1; however, they open an opportunity to further investigate the possibility that under HM stress, the formation of secondary ramifications in teosinte parviglumis could be repressed by transcription factors of the TCP family, including Tb1.”

      This is also emphasized in the Discussion (pg 21) as follows:

      “Under HM stress, we also show that Tb1 is overexpressed in the apical meristem of teosinte parviglumis, suggesting that formation of secondary ramifications is repressed by Tb1 function under HM stress, as in extant maize. At this stage we cannot discard the possibility that Tb1 upregulation in parviglumis reflects a more generalized response to abiotic stress; however, the expression of ZmHMA1 is downregulated in W22 wild-type maize meristems in the presence of HMs but upregulated in teosinte parviglumis meristems, suggesting that a specific regulatory shift relating HM responses and ZmHMA1 function occurred during the teosinte parviglumis to maize transition.”

      On the other hand, the transcriptional analysis led to the identification of 52 additional HM response genes showing signatures of positive selection that occurred during the parviglumis to maize transition; 12 of them map to the region of the short arm of chr.5 that contains linked QTLs. So far, genes involved in HM response and oxidative stress represent the most prevalent class of genes identified within the genomic region showing pleiotropic effects on domestication and multiple linked QTLs in chr.5.

      Reviewer #2 (Public review):

      Summary:

      This work explores the phenotypic developmental traits associated with Cu and Cd responses in teosinte parviglumis, a species evolutionary related to extant maize crops. Cu and Cd could serve as a proxy for heavy metals present in the soils. The manuscript explores potential genetic loci associated with heavy metal responses and domestication identified in previous studies. This includes heavy metal transporters, which are unregulated during stress. To study that, the authors compare the plant architecture of maize defective in ZmHMA1 and speculate on its association with domestication.

      Strengths:

      Very few studies covered the responses of teosintes to heavy metal stress. The physiological function of ZmHMA1 in maize also gives some novelty in this study. The idea and speculation section is interesting and well-implemented.

      Weaknesses:

      The authors explored Cu/Cd stress but not a more comprehensive panel of heavy metals, making the implications of this study quite narrow. Some techniques used, such as end-point RT-PCR and qPCR, are substandard for the field. The phenotypic changes explored are not clearly connected with the potential genetic mechanisms associated with them, with the exception of nodal roots. If teosintes in response to heavy metal have phenotypic similarity with modern landraces of maize, then heavy metal stress might have been a confounding factor in the selection of maize and not a potential driving factor. Similar to the positive selection of ZmHMA1 and its phenotypic traits. In that sense, there is no clear hypothesis of what the authors are looking for in this study, and it is hard to make conclusions based on the provided results to understand its importance. The authors do not provide any clear data on the potential influence of heavy metals in the field during the domestication of maize. The potential role of Tb-1 is not very clear either.

      Thank you for these comments. We have now emphasized our hypothesis in the abstract and the last paragraph of the Introduction (pg. 6):

      “To test the hypothesis that heavy metal (HM) stress influenced the evolutionary transition of teosinte to maize, we exposed both subspecies to sublethal concentrations of copper and cadmium etc…”

      A comprehensive panel of heavy metals would not be more accurate in terms of simulating the composition of soils evolving across 9,000 years in the region where maize presumably emerged. Copper (Cu) and cadmium (Cd) each correspond to a different affinity group for proteins of the ZmHMA family. ZmHMA1 has preferential affinity for Cu and Ag (silver), whereas ZmHMA7 has preferential affinity for Cd, Zn (zinc), Co (cobalt), and Pb (lead). Since these P1b-ATPase transporters mediate the movement of divalent cations, their function remains consistent regardless of the specific metal tested, provided it belongs to the respective affinity group. By applying sublethal concentrations of Cd (16 mg/kg) and Cu (400 mg/kg), we caused a measurable physiological response while allowing plants to complete their life cycle, including the reproductive phase, facilitating a comprehensive analysis of metal stress adaptation. Whereas higher doses impair flowering or are lethal, lower Cu/Cd concentrations do not consistently show conventional phenotypic responses such as reduced plant growth (AbdElgawad et al. 2020; Atta et al., 2023).

      Based on comments by both reviewers, we now present a large transcriptional analysis that incorporates HM responses to lead (Pb) and chromium (Cr), in addition to Cu. Results show that many genes responding to Pb and Cr were also positively selected across the maize genome, suggesting that HM stress led to a ubiquitous rather than a specific evolutionary response to heavy metals (please see our response to Reviewer #1 and sections in pgs. 11 to 13).

      Real-time qPCR is an accurate and reliable approach to assess the expression of specific genes such as ZmHMA1 and Tb1, but we agree that our results do not allow us to establish a direct regulatory link between the function of Tb1, the pleiotropic parviglumis phenotype under HM stress, and the function of ZmHMA1. We also concede that the large transcriptional analysis of HM response in maize (presented above) does not allow us to elucidate a possible connection between these two genes. Therefore, we have substantially downplayed our conclusion in this section by modifying the end of the section in pg. 17, which now reads:

      “These results do not allow to directly link the regulation of ZmHMA1 expression to the function of Tb1; however, they open an opportunity to further investigate the possibility that under HM stress, the formation of secondary ramifications in teosinte parviglumis could be repressed by transcription factors of the TCP family, including Tb1.”

      There are two phenotypic changes clearly connected with the genetic mechanisms involved in the parviglumis to maize transition: plant height and the number of seminal roots (not nodal roots). These changes have now been emphasized in the Abstract and the description of the results.

      Regarding the possibility that HM stress represents a confounding factor in the selection of maize and not a driving factor, we expanded the genomic analysis of genetic diversity well beyond the three genes under initial study, to cover a segment of 11.47 Mb between ZmSKUs5 and ZmHMA1. We compared nucleotide variability using 100 bp bins covering loci composed of two 30 kb segments, upstream and downstream of the coding sequence respectively, and the coding sequence itself, for 173 genes present within the genomic region between ZmSKUs5 and ZmHMA1 (Figure S1 and Supplementary File 6). The full analysis is presented in a new section in pgs. 11 and 12. We found that 166 out of 173 loci show signatures of positive selection and are roughly organized in five independent subregions of variable length. Four out of five subregions contain more than one HM or oxidative stress response gene within loci showing signatures of positive selection. Although multiple factors other than HM stress could have played a role in the evolutionary mechanisms that affected the genetic diversity of chr.5, large-scale transcriptomic data from independent experiments aimed at understanding the response of maize roots to HM stress allowed the identification of 49 additional HM response genes within loci showing positive selection across the genome, a proportion (43.3%) far greater than the proportion of loci containing response genes to other types of abiotic stress not related to HMs (28.6%). These results are described in detail in pgs. 12 and 13 (Figure S3 and Supplementary File 7). These results provide strong evidence in favor of HM stress, and not another factor, driving positive selection.
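
      As a rough illustration of what a binned comparison of nucleotide variability involves, the sketch below computes nucleotide diversity (π, the mean number of pairwise differences per site) in consecutive non-overlapping bins of an alignment. It is not the pipeline used for Supplementary File 6: the function names and the toy alignment are hypothetical, and gaps or missing data are assumed to be absent.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (pi) for equal-length aligned sequences."""
    length = len(seqs[0])
    if length == 0 or len(seqs) < 2:
        return 0.0
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / (len(pairs) * length)


def windowed_pi(seqs, window=100):
    """Pi for consecutive non-overlapping bins of `window` aligned positions."""
    length = len(seqs[0])
    return [
        nucleotide_diversity([s[i:i + window] for s in seqs])
        for i in range(0, length, window)
    ]


# Hypothetical toy alignment (three sequences, four sites), for illustration only.
toy = ["AAAA", "AAAT", "AATT"]
print(nucleotide_diversity(toy))
print(windowed_pi(toy, window=2))
```

      With real data, the same function would be applied with `window=100` to each population separately, and the resulting per-bin values compared between populations.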

      We now provide precise and pertinent paleoenvironmental data on the potential influence of heavy metals in the field. In sections pgs. 17 to 20 we review paleoenvironmental studies revealing periods of climatic instability in the presumed region of maize emergence during the early Holocene, and data indicating that the date and region of maize emergence coincide with the dates and locations of several volcanic eruptions that occurred during the early and middle Holocene in that same region. Please see responses to Reviewer #1 for details.

      We agree that our results do not allow us to establish a direct regulatory link between the function of Tb1, the pleiotropic parviglumis phenotype under HM stress, and the function of ZmHMA1. We also concede that the large transcriptional analysis of HM response in maize (presented above) does not allow us to elucidate a possible connection between these two genes. Therefore, we have substantially downplayed our conclusion in this section by modifying the end of the section in pg. 17, which now reads:

      “These results do not allow to directly link the regulation of ZmHMA1 expression to the function of Tb1; however, they open an opportunity to further investigate the possibility that under HM stress, the formation of secondary ramifications in teosinte parviglumis could be repressed by transcription factors of the TCP family, including Tb1.”

      This is also emphasized in the Discussion (pg 21) as follows:

      “Under HM stress, we also show that Tb1 is overexpressed in the apical meristem of teosinte parviglumis, suggesting that formation of secondary ramifications is repressed by Tb1 function under HM stress, as in extant maize. At this stage we cannot discard the possibility that Tb1 upregulation in parviglumis reflects a more generalized response to abiotic stress; however, the expression ZmHMA1 is downregulated in W22 wild-type maize meristems in the presence of HMs but upregulated in teosinte parviglumis meristems, suggesting that a specific regulatory shift relating HM responses and ZmHMA1 function occurred during the teosinte parviglumis to maize transition.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      While the dataset generated provides an interesting foundation for hypothesis testing on HM stress and domestication, the current data do not sufficiently support the conclusions of the manuscript.

      (1) The description of maize and teosinte architecture under HM stress is well presented.

      However, traits like shoot height, leaf size reduction, and biomass loss also occur under other environmental stresses such as drought and salinity. Additional evidence beyond shoot and root architecture would help validate the link between tb1 expression and specific ZmHMA genes under HM stress, or whether it reflects a more generalized stress response.

      We have already addressed this point in detail in the public response to Reviewer #1.

      (2) The nucleotide variability analysis is interesting, but I would have liked to see additional information to clarify the choice of the data selection and the strength of the conclusions with human selection.

      We have already addressed this point in detail in the public response to Reviewer #1.

      a) The choice of Tripsacum dactyloides as the outgroup to determine nucleotide variability seems to be distant, and I wonder whether other combinations with a closer outgroup or multiple outgroups were tried to provide a more accurate context.

      Nucleotide variability in Tripsacum dactyloides is used to graphically illustrate an external reference and not as an outgroup in the extended analysis of genetic diversity at the locus and genomic level. We did not use Tripsacum dactyloides as an outgroup in our statistical analysis. We could indeed have used a closer teosinte subspecies as an outgroup, but at this stage no data warrant that environmentally related selective pressures could have affected genetic diversity in other teosintes. This possibility is currently being investigated.

      b) Evolutionary differences not related to human influence could affect the results. The phrase "order of magnitude difference in π values" needs statistical validation (e.g., confidence intervals, p-values).

      We agree and have eliminated the sentence, as it is no longer relevant in light of the detailed genomic analysis of genetic diversity presented in Supplementary File 6.

      c) The comparison with ZmGLB1, a neutral control locus, suggests that domestication-related changes in nucleotide variability are specific to the three candidate genes. However, the concept of neutrality is complex, and while ZmGLB1 may be considered neutral in this case, the argument does not address the possibility of other factors, such as linked selection, that could influence variability in these genes. Referencing Hufford et al. is insufficient and would require a deeper argument.

      We also agree with this comment. We think that the influence and consequences of linked selection are now well documented for the 11.46 Mb analyzed in chr.5 (pgs. 11 and 12 in the main text, and Supplementary File 6).

      (3) The statement: "Our evidence indicates that HM stress revealed a teosinte parviglumis environmental plasticity that is directly related to the function of specific HM response genes that were affected by domestication through human selection" is not supported by the presented data. The rationale for the specific Cd/Cu dosage used is unclear. A dose-response gradient would better demonstrate the nature and strength of the plastic response.

      Previous reports support the rationale for the specific HM dosage used in this study. Cu/Cd dose-response gradients have been conducted in maize (AbdElgawad et al. 2020; Atta et al., 2023), but since no such studies have been conducted in teosinte, we reasoned that it was important to apply the same treatment to both subspecies. We have now emphasized this rationale by adding the following in pg XX: “Whereas higher doses impair flowering or are lethal, lower Cu/Cd concentrations do not consistently show conventional phenotypic responses such as reduced plant growth (AbdElgawad et al. 2020; Atta et al., 2023)”.

      We agree that the statement raised by the reviewer needed revision in light of our results. We revised the statement to accurately reflect our current evidence as follows: “Our results reveal a teosinte parviglumis environmental plasticity that is likely related to the function of HM response genes positively selected during the teosinte parviglumis to maize transition.”

      (4) In maize, TEs are known to influence gene expression under abiotic stress, including for tb1 (PMID: 25569788). Since the author appears to make a causative conclusion between ZmHMA1, TB1, and HM stress, I would have liked to see a whole-transcriptome analysis and not a curation of two genes to determine whether other factors, such as TEs, could lead to similar outcomes.

      We agree that this is definitely a possibility that we have not investigated at this stage. However, we added a paragraph to reflect this pertinent suggestion:

      “Previous studies have demonstrated that transposable elements (TEs) contribute to activation of maize genes in response to abiotic stress, affecting up to 20% of the genes upregulated in response to abiotic stress, and as many as 33% of genes that are only expressed in response to stress (Makarevitch et al., 2015). It is therefore possible that the HM response of some specific genes that influenced maize emergence or domestication could be mediated by TEs influencing or driving their transcriptional regulation.”

      (5) I would suggest that the authors carefully review the tables, figures, and the corresponding legends. For example:

      a) Table 2 is called before Table 1, I would therefore suggest changing the numbering to reflect the paragraph order.

      Thank you for your help; we changed the order of the tables in the new version.

      b) In Table 2, it is not clear whether the P value applies to the mean difference between WT and the mutant zmhma1, either in the presence or the absence of heavy metals. In addition, the authors need to use the P-value to estimate the differences between WT in the absence vs presence of HM, and WT in the absence of HM versus the mutant in the absence of HM (idem for presence).

      We addressed this issue in detail and added P-values and specific pairwise comparisons to that table (now Table 1). Data are presented as mean ± standard deviation and were tested with a paired Student’s t-test. When the effects were significant according to the t-test, the treatments were compared with the Welch two-sample t-test at P < 0.05.

      c) Table 1 and Table 2: Indicate what type of statistical test was used and the number of plants used for each experiment (n). Also, I recommend the use of scientific notation for the P-values.

      The statistical tests have now been indicated, scientific notation has been added to the P-values; the number of plants and biological replicates are indicated in the Methods section.

      d) Lines 202 and 204: I assume Table 1 should be called instead of Table 2.

      This error has been corrected.

      e) General: In the text, when significance is highlighted along with measurements, the p-value needs to be added.

      We have added the P-value along the measurement for all significant differences.

      f) In the text, it is also mentioned that "the expression of ZMHMA1 was significantly increased in the presence of HMs (Figure 3c)". We are looking here at an RT-PCR, which is qualitative and without a robust quantitative comparison and statistics, I cannot conclude this assessment based on the presented evidence. No statistical measure is indicated here.

      Panel 3c does not show end-point RT-PCR but real-time qPCR, with relative fold-change normalized to actin (three technical replicates for each of three biological replicates). We have added error bars (SD) and P-values represented by asterisks (calculated with Student's t statistic) to support significant differences (P<0.05 and P<0.01). ZmHMA1 expression was significantly increased in the presence of HMs only in teosinte; there was no significant difference in maize.
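
      For readers unfamiliar with how a relative fold-change normalized to a reference gene is typically obtained, the sketch below uses the common 2^-ΔΔCt calculation. The response does not state which quantification method was used, so this choice is an assumption; the function name and all Ct values are hypothetical and serve only as illustration.

```python
from statistics import mean


def relative_fold_change(target_treated, ref_treated, target_control, ref_control):
    """Fold change by the 2^-ddCt method.

    Each argument is a list of technical-replicate Ct values for one
    biological sample: target gene or reference gene (e.g. actin),
    in the treated or control condition.
    """
    d_ct_treated = mean(target_treated) - mean(ref_treated)
    d_ct_control = mean(target_control) - mean(ref_control)
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** -dd_ct


# Hypothetical Ct values (technical triplicates), not data from the study.
fold = relative_fold_change(
    target_treated=[22.0, 22.1, 21.9],
    ref_treated=[18.0, 18.0, 18.0],
    target_control=[24.0, 24.0, 24.0],
    ref_control=[18.0, 18.0, 18.0],
)
print(fold)
```

      In this toy case the target gene crosses threshold two cycles earlier in the treated sample after normalization, which corresponds to a fourfold increase. The calculation would be repeated for each biological replicate before computing the mean, SD, and test statistic.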

      g) Figure 3 should at least have the gene name in the figure to quickly understand the figure panel. The key conserved domains should also be identified.

      We agree and apologize for the omission. The gene names have been added adjacent to the structures.

      h) Sentence at lines 459-460 lacks words and punctuation.

      This unfortunate error has also been corrected.

      i) Figure S1, the reference Lemmon and Doebley, 2024 should be Lemmon and Doebley, 2014 to harmonize with the text.

      The correct year is 2014. We have corrected this error.

      Reviewer #2 (Recommendations for the authors):

      (1) The narrative should be clearer, starting with a clearer hypothesis that is later sustained or not in the results, and then discussed in the idea and speculation section.

      Thank you for the comment. We have clarified the hypothesis; it is included in the abstract and the last paragraph of the Introduction. We hope it is now clear that the evidence presented supports our hypothesis.

      (2) Focus more on traits that are relevant, for example, nodal and seminal roots.

      We modified the text to emphasize three relevant traits. In the case of teosinte under HM stress, these are the absence of tillering and an increase in the number of female inflorescences. In the case of the zmhma1 mutant under HM stress, these are differences in the number of nodal roots and differences in height.

      (3) RNA-seq in Cu/Cd stress could make the work much more useful and complete.

      As previously mentioned, we have incorporated a large scale transcriptional analysis on the basis of six transcriptomes statistically validated (Table S5). Please see sections pgs. 11 to 13 for details.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Lai and Doe address the integration of spatial information with temporal patterning and genes that specify cell fate. They identify the Forkhead transcription factor Fd4 as a lineage-restricted cell fate regulator that bridges transient spatial transcription factors to terminal selector genes in the developing Drosophila ventral nerve cord. The experimental evidence convincingly demonstrates that Fd4 is not only necessary for late-born NB7-1 neurons but also sufficient to transform other neural stem cell lineages toward the NB7-1 identity. This work addresses an important question that will be of interest to developmental neurobiologists: How can cell identities defined by initial transient developmental cues be maintained in the progeny cells, even if the molecular mechanism remains to be investigated? In addition, the study proposes a broader concept of lineage identity genes that could be utilized in other lineages and regions in the Drosophila nervous system and in other species.

      Thanks for the accurate summary and positive comments!

      While the spatial factors patterning the neuroepithelium to define the neuroblast lineages in the Drosophila ventral nerve cord are known, these factors are sometimes absent or not required during neurogenesis. In the current work, Lai and Doe identified Fd4 in the NB7-1 lineage that bridges this gap and explains how NB7-1 neurons are specified after Engrailed (En) and Vnd cease their expression. They show that Fd4 is transiently co-expressed with En and Vnd and is present in all nascent NB7-1 progenies. They further demonstrate that Fd4 is required for later-born NB7-1 progenies and sufficient for the induction of NB7-1 markers (Eve and Dbx) while repressing markers of other lineages when force-expressed in neural progenitors, e.g., in the NB5-6 lineage and in the NB7-3 lineage. They also demonstrate that, when Fd4 is ectopically expressed in NB7-3 and NB5-6 lineages, this leads to the ectopic generation of dorsal muscle-innervating neurons. The inclusion of functional validation using axon projections demonstrates that the transformed neurons acquire appropriate NB7-1 characteristics beyond just molecular markers. Quantitative analyses are thorough and well-presented for all experiments.

      Thanks for the positive comments!

      (1) While Fd4 is required and sufficient for several later-born NB7-1 progeny features, a comparison between early-born (Hb/Eve) and later-born (Run/Eve) appears missing for pan-progenitor gain of Fd4 (with sca-Gal4; Figure 4) and for the NB7-3 lineage (Figure 6). Having a quantification for both could make it clearer whether Fd4 preferentially induces later-born neurons or is sufficient for NB7-1 features without temporal restriction.

      We quantified the percentage of Hb+ and Runt+ cells among Eve+ cells with sca-gal4, and the results are shown in Figure 4-figure supplement 1. We found that the proportion of early-born cells is slightly reduced but the proportion of later-born cells remains similar. Interestingly, we also found a subset of Eve+ cells with a mixed fate (Hb+Runt+), but the reason remains unclear.

      (2) Fd4 and Fd5 are shown to be partially redundant, as Fd4 loss of function alone does not alter the number of Eve+ and Dbx+ neurons. This information is critical and should be included in Figure 3.

      Because every hemisegment in an fd4 single mutant is normal, we simply added the following text: “In fd4 mutants, we observe no change in the number of Eve+ neurons or Dbx+ neurons (n=40 hemisegments).”

      (3) Several observations suggest that lineage identity maintenance involves both Fd4-dependent and Fd4-independent mechanisms. In particular, the fact that the fd4-Gal4 reporter remains active in fd4/fd5 mutants even after Vnd and En disappear indicates that Fd4's own expression, a key feature of NB7-1 identity, is maintained independently of Fd4 protein. This raises questions about what proportion of lineage identity features require Fd4 versus other maintenance mechanisms, which deserves discussion.

      We agree, thanks for raising this point. We have added the following text to the Discussion: “Interestingly, the fd4 fd5 mutant maintains expression of fd4:gal4, suggesting that the fd4/fd5 locus may have established a chromatin state that allows “permanent” expression in the absence of Vnd, En, and Fd4/Fd5 proteins.”

      (4) Similarly, while gain of Fd4 induces NB7-1 lineage markers and dorsal muscle innervation in NB5-6 and NB7-3 lineages, drivers for the two lineages remain active despite the loss of molecular markers, indicating some regulatory elements retain activity consistent with their original lineage identity. It is therefore important to understand the degree of functional conversion in the gain-of-function experiments. Sparse labeling of Fd4 overexpressing NB5-6 and NB7-3 progenies, as was done in Seroka and Doe (2019), would be an option.

      We agree it is interesting that the NB7-3 and NB5-6 drivers remain on following Fd4 misexpression. To explore this, we used sca-gal4 to overexpress Fd4 and observed that Lbe expression persisted while Eg was largely repressed (Author response image 1). The results show that Lbe and Eg respond differently to Fd4. A non-mutually exclusive possibility is that the continued expression of lbe-Gal4 UAS-GFP or eg-Gal4 UAS-GFP may be due to the lengthy perdurance of both Gal4 and GFP.

      Author response image 1.

      (5) The less-penetrant induction of Dbx+ neurons in NB5-6 with Fd4-overexpression is interesting. It might be worth the authors discussing whether it is an Fd4 feature or an NB5-6 feature by examining Dbx+ neuron number in NB7-3 with Fd4-overexpression.

      In the NB7-3 lineages misexpressing Fd4, only 5 lineages generated Dbx+ cells (0.1±0.4, n=64 hemisegments), suggesting that the low penetrance of Dbx+ induction is an intrinsic feature of Fd4 rather than lineage context. We have added this information in the results section.

      (6) It is logical to hypothesize that spatial factors specify early-born neurons directly, so only late-born neurons require Fd4, but it was not tested. The model would be strengthened by examining whether Fd4-Gal4-driven Vnd rescues the generation of later-born neurons in fd4/fd5 mutants.

      When we used the en-gal4 driver to express UAS-vnd in the fd4/fd5 mutant background, we found an average of 7.4±2.2 Eve+ cells per hemisegment (n=36), significantly higher than in the fd4/fd5 mutant alone (3.9±0.8 cells, n=52, p=2.6×10⁻¹¹) (Figure 3J). In addition, 0.2±0.5 Eve+ cells were ectopic Hb+ (excluding U1/U2), indicating that Vnd-En integration is sufficient to generate both early-born and late-born Eve+ cells in the fd4/fd5 mutants. We have added the results to the text.
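
      The size of the difference reported above can be checked from the summary statistics alone. The sketch below computes a Welch (unequal-variance) t statistic and its Welch-Satterthwaite degrees of freedom from the reported means and sample sizes. Two assumptions are made that the response does not confirm: that the ± values are standard deviations, and that a Welch-type test is appropriate here; the function name is hypothetical.

```python
from math import sqrt


def welch_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Values reported in the response: en-gal4 > UAS-vnd in fd4/fd5 (7.4±2.2, n=36)
# versus fd4/fd5 mutant alone (3.9±0.8, n=52); ± assumed to be SD.
t_stat, df = welch_from_summary(7.4, 2.2, 36, 3.9, 0.8, 52)
print(round(t_stat, 2), round(df, 1))
```

      A two-sided p-value can then be read from a t distribution with the computed degrees of freedom, for example with a statistics package; that step is omitted here to keep the sketch dependency-free.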

      (7) It is mentioned that Fd5 is not sufficient for the NB7-1 lineage identity. The observation is intriguing in how similar regulators serve distinct roles, but the data are not shown. The analysis in Figure 4 should be performed for Fd5 as supplemental information.

      Thanks for the suggestion. Because the results are exactly the same as the wild type, we don’t think it is necessary to provide additional images or analysis as supplemental information.

      Reviewer #2 (Public review):

      Via a detailed expression analysis, they find that Fd4 is selectively expressed in embryonic NB7-1 and newly born neurons within this lineage. They also undertake a comprehensive genetic analysis to provide evidence that fd4 is necessary and sufficient for the identity of NB7-1 progeny.

      Thanks for the accurate summary!

      The analysis is both careful and rigorous, and the findings are of interest to developmental neurobiologists interested in molecular mechanisms underlying the generation of neuronal diversity. Great care was taken to make the figures clear and accessible. This work takes great advantage of years of painstaking descriptive work that has mapped embryonic neuroblast lineages in Drosophila.

      Thanks for the positive comments!

      The argument that Fd4 is necessary for NB7-1 lineage identity is based on a Fd4/Fd5 double mutant. Loss of fd4 alone did not alter the number of NB7-1-derived Eve+ or Dbx+ neurons. The authors clearly demonstrate redundancy between fd4 and fd5, and the fact that the LOF analysis is based on a double mutant should be better woven through the text. The authors generated an Fd5 mutant. I assume that Fd5 single mutants do not display NB7-1 lineage defects, but this is not stated. The focus on Fd4 over Fd5 is based on its highly specific expression profile and the dramatic misexpression phenotypes. But the LOF analysis demonstrates redundancy, and the conclusions in the abstract and through the results should reflect the existence of Fd5 in the conclusions of this manuscript.

      We agree, and have added new text to clarify the single mutant phenotypes (there are none) and the double mutant phenotype (loss of NB7-1 molecular and morphological features). The following text has been added to the manuscript: “Not surprisingly, we found that fd4 single mutants or fd5 single mutants had no phenotype (Eve+ neurons were all normal). Thus, to assess their roles, we generated a fd4 and fd5 double mutant. Because many Eve+ and Dbx+ cells are generated outside of NB7-1 lineage, it was also essential to identify the Eve+ or Dbx+ cells within NB7-1 lineage in wild type and fd4 mutant embryos. To achieve this, we replaced the open reading frame of fd4 with gal4 (called fd4-gal4) (see Methods); this stock simultaneously knocked out both fd4 and fd5 (called fd4/fd5 mutant hereafter) while specifically labeling the NB7-1 lineage. For the remainder of this paper we use the fd4/fd5 double mutant to assay for loss of function phenotypes.”

      It is notable that Fd4 overexpression can rewire motor circuits. This analysis adds another dimension to the changes in transcription factor expression and, importantly, demonstrates functional consequences. Could the authors test whether U4 and U5 motor axon targeting changes in the fd4/fd5 double mutant? To strengthen claims regarding the importance of fd4/fd5 for lineage identity, it would help to address terminal features of U motorneuron identity in the LOF condition.

      Thanks for raising this important point. We examined the axon targeting on body wall muscles in both wild type and in fd4/fd5 mutant background and added the results in Figure 3-figure supplement 2. We found that the axon targeting in the late-born neuron region (LL1) is significantly reduced, suggesting that the loss of late-born neurons in fd4/fd5 mutant leads to the absence of innervation of corresponding muscle targets.

      Reviewer #3 (Public review):

      The goal of the work is to establish the linkage between the spatial transcription factors (STFs) that function transiently to establish the identities of the individual NBs and the terminal selector genes (typically homeodomain genes) that appear in the newborn postmitotic neurons. How is the identity of the NB maintained and carried forward after the spatial genes have faded away? Focusing on a single neuroblast (NB 7-1), the authors present evidence that the fork-head transcription factor, fd4, provides a bridge linking the transient spatial cues that initially specified neuroblast identity with the terminal selector genes that establish and maintain the identity of the stem cell's progeny.

      Thanks for the positive comments!

      The study is systematic, concise, and takes full advantage of 40+ years of work on the molecular players that establish neuronal identities in the Drosophila CNS. In the embryonic VNC, fd4 is expressed only in the NB 7-1 and its lineage. They show that Fd4 appears in the NB while the latter is still expressing the Spatial Transcription Factors and continues after the expression of the latter fades out. Fd4 is maintained through the early life of the neuronal progeny but then declines as the neurons turn on their terminal selector genes. Hence, fd4 expression is compatible with it being a bridging factor between the two sets of genes.

      Thanks for the accurate summary!

      Experimental support for the "bridging" role of Fd4 comes from a set of loss-of-function and gain-of-function manipulations. The loss of function of Fd4, and the partially redundant gene Fd5, from lineage 7-1 does not affect the size of the lineage, but terminal markers of late-born neuronal phenotypes, like Eve and Dbx, are reduced or missing. By contrast, ectopic expression of fd4, but not fd5, results in ectopic expression of the terminal markers eve and Dbx throughout diverse VNC lineages.

      Thanks for the accurate summary!

      A detailed test of fd4's expression was then carried out using lineages 7-3 and 5-6, two well-characterized lineages in Drosophila. Lineage 7-3 is much smaller than 7-1 and continues to be so when subjected to fd4 misexpression. However, under the influence of ectopic Fd4 expression, the lineage 7-3 neurons lost their expected serotonin and corazonin expression and showed Eve expression as well as motoneuron phenotypes that partially mimic the U motoneurons of lineage 7-1.

      Thanks for the positive comments!

      Ectopic expression of Fd4 also produced changes in the 5-6 lineage. Expression of apterous, a feature of lineage 5-6, was suppressed, and expression of the 7-1 marker, Eve, was evident. Dbx expression was also evident in the transformed 5-6 lineages, but extremely restricted as compared to a normal 7-1 lineage. Considering the partial redundancy of fd4 and fd5, it would have been interesting to express both genes in the 5-6 lineage. The anatomical changes that are exhibited by motoneurons in response to Fd4 expression confirm that these cells do, indeed, show a shift in their cellular identity.

      We appreciate the positive comments. We agree double misexpression of Fd4 and Fd5 might give a stronger phenotype (as the reviewer says) but the lack of this experiment does not change the conclusions that Fd4 can promote NB7-1 molecular and morphological aspects at the expense of NB5-6 molecular markers.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The title of Figure 4 may be intended to include the term "Widespread", not "Wild spread". (Though the expansion of the Eve and Dbx with Fd4 is quite remarkable…).

      Done!

      Reviewer #3 (Recommendations for the authors):

      (1) Line 138. Is part of the sentence missing? Did the authors mean to say "that fd5 is coexpressed with fd4 in NB7-1 and its .....".

      Done!

      (2) ln 237: In trying to explain the "U-like" phenotype of the transformed motoneurons in lineage 7-3, the authors speculate that "perhaps their late birth did not give them time to extend to the most distant dorsal muscles ". It is very difficult to convince a motoneuron to stop growing in the absence of a target! An alternate possibility is that since there is only one or two U neurons made instead of the normal five, the growing motoneuron has enough information to direct them to the dorsal domain, but they lack the specification that allows them to recognize a specific muscle target.

      We agree there are additional possibilities, and now update the text to say: “We observed that these transformed neurons did not innervate the dorsal muscles, perhaps their late birth did not give them time to extend to the most distant dorsal muscles, or they were incompletely specified.”

      (3) In the References, I think that the Anderson et al. reference should also include "BioRxiv" before the DOI.

      Done!

      (4) Figure 6A for wild-type 7-3 lineage. The corazonin expression appears to be expressed in EW2 as well as EW3. This should be explained.

      We agree it looks that way, due to the 3D rotation used; we now replace it with a more representative image. Note that our quantification always shows a single Cor+ neuron per hemisegment.

      (5) Figure 7: Issues of terminology. The designation of "longitudinal" for muscles is traditionally in reference to the body axis, such as the Dorsal Longitudinal Muscles (DLM) of the adult thorax. The "longitudinal" muscles in the figure are really "transverse" muscles. I also suggest using "axon" or "neurites" rather than "filament". For the middle and bottom parts of E and F, are these lateral and ventral views? They should be designated as such.

      Thanks, we agree and have made the changes, using Axon instead of Filament, and labeling the views (lateral and ventro-lateral).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses:

      The technical approach is strong and the conceptual framing is compelling, but several aspects of the evidence remain incomplete. In particular, it is unclear whether the reported changes in connectivity truly capture causal influences, as the rank metrics remain correlational and show discrepancies with the manipulation results.

      We agree that our functional connectivity ranking analyses cannot establish causal influences. As discussed in the manuscript, besides learning-related activity changes, the functional connectivity may also be influenced by neuromodulatory systems and internal state fluctuations. In addition, the spatial scope of our recordings is still limited compared to the full network implicated in visual discrimination learning, which may bias the ranking estimates. In future, we aim to achieve broader region coverage and integrate multiple complementary analyses to address the causal contribution of each region.

      The absolute response onset latencies also appear slow for sensory-guided behavior in mice, and it is not clear whether this reflects the method used to define onset timing or factors such as task structure or internal state.

      We believe this may be primarily due to our conservative definition of onset timing. Specifically, we required the firing rate to exceed baseline (t-test, p < 0.05) for at least 3 consecutive 25-ms time windows. This might lead to later estimates than other studies, such as using the latency to the first spike after visual stimulus onset (Siegle et al., 2021) or the time to half-max response (Goldbach, Akitake, Leedy, & Histed, 2021).
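      The consecutive-window criterion described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the per-window p-values from the baseline t-test have already been computed, and it assumes the onset is reported as the start of the first window in the qualifying run. The function name, its parameters, and the example values are hypothetical.

```python
from typing import Optional, Sequence


def onset_latency_ms(
    p_values: Sequence[float],
    above_baseline: Sequence[bool],
    window_ms: float = 25.0,
    n_consecutive: int = 3,
    alpha: float = 0.05,
) -> Optional[float]:
    """Return the start time (ms after stimulus onset) of the first run of
    `n_consecutive` windows that are all significantly above baseline.

    Assumption: p_values[i] comes from a per-window t-test against baseline,
    and above_baseline[i] says whether the window mean exceeds the baseline.
    """
    run = 0
    for i, (p, above) in enumerate(zip(p_values, above_baseline)):
        if above and p < alpha:
            run += 1
            if run == n_consecutive:
                # Report the start of the first window in the run.
                return (i - n_consecutive + 1) * window_ms
        else:
            run = 0  # an isolated significant window does not count
    return None  # criterion never met


# Hypothetical per-window results for one unit (window 0 starts at stimulus onset).
p_values = [0.50, 0.01, 0.60, 0.03, 0.02, 0.001, 0.001]
above = [False, True, False, True, True, True, True]
print(onset_latency_ms(p_values, above))  # 75.0
```

      In this example the isolated significant window at 25 ms is skipped and the onset is placed at 75 ms, which shows how the criterion can yield later estimates than a first-spike or half-max definition.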

      The estimation of response onset latency in our study may also be affected by potential internal state fluctuations of the mice. We used the period before visual stimulus onset as the baseline. Since firing rates in this period could be affected by trial history, we acknowledge that this may increase baseline variability and thus make the response onset harder to detect statistically.

      Still, we believe these concerns do not affect the observation of the formation of compressed activity sequence in CR trials during learning.

      Furthermore, the small number of animals, combined with extensive repeated measures, raises questions about statistical independence and how multiple comparisons were controlled.

      We agree that a larger sample size would strengthen the robustness of the findings. However, as noted above, the current dataset has inherent limitations in both the number of recorded regions and the behavioral paradigm. Given the considerable effort required to achieve sufficient unit yields across all targeted regions, we wish to adjust the set of recorded regions, improve behavioral task design, and implement better analyses in future studies. This will allow us to both increase the number of animals and extract more precise insights into mesoscale dynamics during learning.

      The optogenetic experiments, while intended to test the functional relevance of rank increasing regions, leave it unclear how effectively the targeted circuits were silenced. Without direct evidence of reliable local inhibition, the behavioral effects or lack thereof are difficult to interpret.

      We appreciate this important point. Due to the design of the flexible electrodes and the implantation procedure, bilateral co-implantation of both electrodes and optical fibers was challenging, which prevented us from directly validating the inhibition effect in the same animals used for behavior. In hindsight, we could have conducted parallel validations using conventional electrodes, and we will incorporate such controls in future work to provide direct evidence of manipulation efficacy.

      Details on spike sorting are limited.

      We have provided more details on spike sorting in the Methods section, including the exact parameters used in the automated sorting algorithm and the subsequent manual curation criteria.

      Reviewer #2 (Public review):

      Weaknesses:

      I had several major concerns:

      (1) The number of mice was small for the ephys recordings. Although the authors start with 7 mice in Figure 1, they then reduce to 5 in panel F. And in their main analysis, they minimize their analysis to 6/7 sessions from 3 mice only. I couldn't find a rationale for this reduction, but in the methods they do mention that 2 mice were used for fruitless training, which I found no mention in the results. Moreover, in the early case, all of the analysis is from 118 CR trials taken from 3 mice. In general, this is a rather low number of mice and trial numbers. I think it is quite essential to add more mice.

      We apologize for the confusion. As described in the Methods section, 7 mice (Figure 1B) were used for behavioral training without electrode array or optical fiber implants to establish learning curves, and an additional 5 mice underwent electrophysiological recordings (3 for visual-based decision-making learning and 2 for fruitless learning).

      As we noted in our response to Reviewer #1, the current dataset has inherent limitations in both the number of recorded regions and the behavioral paradigm. Given the considerable effort required to achieve high-quality unit yields across all targeted regions, we wish to adjust the set of recorded regions, improve behavioral task design, and implement better analyses in future studies. These improvements will enable us to collect data from a larger sample size and extract more precise insights into mesoscale dynamics during learning.

      (2) Movement analysis was not sufficient. Mice learning a go/no-go task establish a movement strategy that is developed throughout learning and is also biased towards Hit trials. There is an analysis of movement in Figure S4, but this is rather superficial. I was not even sure that the 3 mice in Figure S4 are the same 3 mice in the main figure. There should be also an analysis of movement as a function of time to see differences. Also for Hits and FAs. I give some more details below. In general, most of the results can be explained by the fact that as mice gain expertise, they move more (also in CR during specific times) which leads to more activation in frontal cortex and more coordination with visual areas. More needs to be done in terms of analysis, or at least a mention of this in the text.

      Due to limitations in the experimental design and implementation, movement tracking was not performed during the electrophysiological recordings, and the 3 mice shown in Figure S4 (now S5) were from a separate group. We have carefully examined the temporal profiles of mouse movements and found that they did not fully match the rank dynamics for all regions, and we have added these results and related discussion to the revised manuscript. However, we acknowledge that the observed motion energy pattern could explain some of the functional connection dynamics; for example, the decrease in face and pupil motion energy could explain the reduction in ranks for the striatum.

      Without synchronized movement recordings in the main dataset, we cannot fully disentangle movement-related neural activity from task-related signals. We have made this limitation explicit in the revised manuscript and discuss it as a potential confound, along with possible approaches to address it in future work.

      (3) Most of the figures are over-detailed, and it is hard to understand the take-home message. Although the text is written succinctly and rather short, the figures are mostly overwhelming, especially Figures 4-7. For example, Figure 4 presents 24 brain plots! For rank input and output rank during early and late stim and response periods, for early and expert and their difference. All in the same colormap. No significance shown at all. The Δrank maps for all cases look essentially identical across conditions. The division into early and late time periods is not properly justified. But the main take home message is positive Δrank in OFC, V2M, V1 and negative Δrank in ThalMD and Str. In my opinion, one trio map is enough, and the rest could be bumped to the Supplementary section, if at all. In general, the figure in several cases do not convey the main take home messages. See more details below.

      We thank the reviewer for this valuable critique. The statistical significance corresponding to the brain plots (Figure 4 and Figure 5) was presented in Figure S3 and S5 (now Figure S5 and S7 in the revised manuscript), but we agree that the figure can be simplified to focus on the key results.

      In the revised manuscript, we have condensed these figures to focus on the most important comparisons to make the visual presentation more concise and the take-home message clearer.

      (4) The analysis is sometimes not intuitive enough. For example, the rank analysis of input and output rank seemed a bit over complex. Figure 3 was hard to follow (although a lot of effort was made by the authors to make it clearer). Was there any difference between the output and input analysis? Also, the time period seems redundant sometimes. Also, there are other network analysis that can be done which are a bit more intuitive. The use of rank within the 10 areas was not the most intuitive. Even a dimensionality reduction along with clustering can be used as an alternative. In my opinion, I don't think the authors should completely redo their analysis, but maybe mention the fact that other analyses exist

      We appreciate the reviewer’s comment. In brief, the input- and output-rank analyses yielded largely similar patterns across regions in CR trials, although some differences were observed in certain areas (e.g., striatum) in Hit trials, where the magnitude of rank change was not identical between input and output measures. We have condensed the figures to show only averaged rank results, and the colormap was updated to better convey the message.

      We did explore dimensionality reduction applied to the ranking data. However, the results were not intuitive either and required additional interpretation, which did not bring more insights. Still, we acknowledge that other analysis approaches might provide complementary insights.

      Reviewer #3 (Public review):

      Weaknesses:

      The weakness is also related to the strength provided by the method. It is demonstrated in the original method that this approach in principle can track individual units for four months (Luan et al, 2017). The authors have not shown chronically tracked neurons across learning. Without demonstrating that and taking advantage of analyzing chronically tracked neurons, this approach is not different from acute recording across multiple days during learning. Many studies have achieved acute recording across learning using similar tasks. These studies have recorded units from a few brain areas or even across brain-wide areas.

      We appreciate the reviewer’s important point. We did attempt to track the same neurons across learning in this project. However, due to the limited number of electrodes implanted in each brain region, the number of chronically tracked neurons in each region was insufficient to support statistically robust analyses. Concentrating probes in fewer regions would allow us to obtain enough units tracked across learning in future studies to fully exploit the advantages of this method.

      Another weakness is that major results are based on analyses of functional connectivity that is calculated using the cross-correlation score of spiking activity (TSPE algorithm). Functional connection strength across areas is then ranked 1-10 based on relative strength. Without ground truth data, it is hard to judge the underlying caveats. I'd strongly advise the authors to use complementary methods to verify the functional connectivity and to evaluate the mesoscale change in subnetworks. Perhaps the authors can use one key information of anatomy, i.e. the cortex projects to the striatum, while the striatum does not directly affect other brain structures recorded in this manuscript.

      We agree that the functional connectivity measured in this study relies on statistical correlations rather than direct anatomical connections. We plan to test the functional connection data with shorter cross-correlation delay criteria to see whether the results are consistent with anatomical connections and whether the original findings still hold.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The small number of mice, each contributing many sessions, complicates the interpretation of the data. It is unclear how statistical analyses accounted for the small sample size, repeated measures, and non-independence across sessions, or whether multiple comparisons were adequately controlled.

      We recognize the limitation imposed by the small number of animals; the difficulty of achieving sufficient unit yields across all regions in the same animal restricted our sample size. We agree that a larger sample size would strengthen the robustness of the findings. However, as noted below, the current dataset has inherent limitations in both the scope of recorded regions and the behavioral paradigm.

      Given the considerable effort required to achieve sufficient unit yields across all targeted regions, we wish to adjust the set of recorded regions, improve behavioral task design, and implement better analyses in future studies. This will allow us to both increase the number of animals and extract more precise insights into mesoscale dynamics during learning.

      (2) The ranking approach, although intuitive for visualizing relative changes in connectivity, is fundamentally descriptive and does not reflect the magnitude or reliability of the connections. Converting raw measures into ordinal ranks may obscure meaningful differences in strength and can inflate apparent effects when the underlying signal is weak.

      We agree with this important point. As stated in the manuscript, our motivation in taking the ranking approach was that differences in firing rates might bias cross-correlation between spike trains, making raw counts of significant neuron pairs difficult to compare across conditions, but we acknowledge that the ranking measures might obscure meaningful differences or inflate weak effects in the data.

      We added the limitations of the ranking approach to the Discussion section and emphasized the need in future studies for better analysis approaches that could provide a more accurate assessment of functional connection dynamics without bias from firing rates.
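      The concern about ordinal ranks can be illustrated with a short sketch. This is not the analysis code from the manuscript: the region names are taken from the text, but the connection counts are invented, and the convention that a larger rank means more connections is an assumption made only for this example.

```python
from typing import Dict


def rank_regions(strength: Dict[str, float]) -> Dict[str, int]:
    """Convert raw connection measures into ordinal ranks (1 = weakest).

    Assumption: a larger rank means a stronger region. Ties are broken by
    insertion order, because Python's sort is stable.
    """
    ordered = sorted(strength, key=lambda region: strength[region])
    return {region: i + 1 for i, region in enumerate(ordered)}


# Hypothetical counts of significant neuron pairs per region.
early = {"V1": 40.0, "OFC": 10.0, "Str": 25.0}
# A uniform halving of connectivity, e.g. from a global drop in correlation.
late = {region: value / 2 for region, value in early.items()}

print(rank_regions(early))  # {'OFC': 1, 'Str': 2, 'V1': 3}
print(rank_regions(early) == rank_regions(late))  # True
```

      The halved dataset produces identical ranks, so a global change in connection strength is invisible after ranking. Conversely, a very small difference between two regions is enough to swap their ranks, which is how weak effects can appear inflated.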

      (3) The absolute response onset latencies also appear quite slow for sensory-guided behavior in mice, and it remains unclear whether this reflects the method used to determine onset timing or factors such as task design, sensorimotor demands, or internal state. The approach for estimating onset latency by comparing firing rates in short windows to baseline using a t-test raises concerns about robustness, as it may be sensitive to trial-to-trial variability and yield spurious detections.

      We agree this may be primarily due to our conservative definition of onset timing. Specifically, we required the firing rate to exceed baseline (t-test, p < 0.05) for at least 3 consecutive 25-ms time windows. This might lead to later estimates than other studies, such as using the latency to the first spike after visual stimulus onset (Siegle et al., 2021) or the time to half-max response (Goldbach, Akitake, Leedy, & Histed, 2021).

      The estimation of response onset latency in our study may also be affected by potential internal state fluctuations of the mice. We used the period before visual stimulus onset as the baseline. Since firing rates in this period could be affected by trial history, we acknowledge that this may increase baseline variability and thus make the response onset harder to detect statistically.

      Still, we believe these concerns do not affect the observation of the formation of compressed activity sequence in CR trials during learning.

      (4) Details on spike sorting are very limited. For example, defining single units only by an interspike interval threshold above one millisecond may not sufficiently rule out contamination or overlapping clusters. How exactly were neurons tracked across days (Figure 7B)?

      We have added more details on spike sorting, including the processing steps and important parameters used in the automated sorting algorithm. Only the clusters well isolated in feature space were accepted in manual curation.

      We attempted to track the same neurons across learning in this project. However, due to the limited number of electrodes implanted in each brain region, the number of chronically tracked neurons in each region was insufficient to support statistically robust analyses.

      This is now stated more clearly in the discussion section.

      (5) The optogenetic experiments, while designed to test the functional relevance of rank-increasing regions, also raise questions. The physiological impact of the inhibition is not characterized, making it unclear how effectively the targeted circuits were actually silenced. Without clearer evidence that the manipulations reliably altered local activity, the interpretation of the observed or absent behavioral effects remains uncertain.

      We appreciate this important point. Due to the design of the flexible electrodes and the implantation procedure, bilateral co-implantation of both electrodes and optical fibers was challenging, which prevented us from directly validating the inhibition effect in the same animals used for behavior. In hindsight, we could have conducted parallel validations using conventional electrodes, and we will incorporate such controls in future work to provide direct evidence of manipulation efficacy. 

      (6) The task itself is relatively simple, and the anatomical coverage does not include midbrain or cerebellar regions, limiting how broadly the findings can be generalized to more flexible or ethologically relevant forms of decision-making.

      We appreciate this advice and have expanded the existing discussion to more explicitly state that the relatively simple task design and anatomical coverage might limit the generalizability of our findings.

      (7) The abstract would benefit from more consistent use of tense, as the current mix of past and present can make the main findings harder to follow. In addition, terms like "mesoscale network," "subnetwork," and "functional motif" are used interchangeably in places; adopting clearer, consistent terminology would improve readability.

      We have changed several verbs in the abstract to the past tense, and we have adopted more consistent terminology by replacing “functional motif” with “subnetwork”. We still feel that “mesoscale network” and “subnetwork” emphasize different aspects of the results depending on the context, so these terms are kept.

      (8) The discussion could better acknowledge that the observed network changes may not reflect task-specific learning alone but could also arise from broader shifts in arousal, attention, or motivation over repeated sessions.

      We have expanded the existing discussion to better acknowledge the possible effects from broader shifts in arousal, attention, or motivation over repeated sessions.

      (9) The figures would also benefit from clearer presentation, as several are dense and not straightforward to interpret. For example, Figure S8 could be organized more clearly to highlight the key comparisons and main message.

      We have simplified the over-detailed brain plots in Figure 4-5, and the plots in Figure 6 and S8 (now S10 in the revised manuscript).

      (10) Finally, while the manuscript notes that data and code are available upon request, it would strengthen the study's transparency and reproducibility to provide open access through a public repository, in line with best practices in the field.

      The spiking data, behavior data, and code for the core analyses in the manuscript are now shared in a public repository (Dryad), and we have changed the description in the Data Availability section accordingly.

      Reviewer #2 (Recommendations for the authors):

      (A) Introduction:

      (1) "Previous studies have implicated multiple cortical and subcortical regions in visual task learning and decision-making". No references here, and also in the next sentence.

      The references were in the following introduction and we have added those references here as well.

      We also added one review on cortical-subcortical neural correlates in goal-directed behavior (Cruz et al., 2023).

      (2) Intro: In general, the citation of previous literature is rather minimal, too minimal. There are many studies using large-scale recordings during learning, not necessarily visual tasks. An example of a brain-wide learning study in subcortical areas is Sych et al. 2022 (Cell Reports). And for wide-field imaging there are several papers from the Helmchen and Komiyama labs, also for multi-area cortical imaging.

      We appreciate this advice. We included mainly visual task learning literature to keep a more focused scope around the regions and task we actually explored in this study. We fear if we expand the intro to include all the large-scale imaging/recording studies in learning field, the background part might become too broad.

      We have included (Sych, Fomins, Novelli, & Helmchen, 2022) for its relevance and importance in the field.

      (3) In the intro, there is only a mention of a recording of 10 brain regions, with no mention of which areas, along with their relevance to learning. This is mentioned in the results, but it will be good in the intro.

      The area names are now added in intro.

      (B) Results:

      (1) Were you able to track the same neurons across the learning profile? This is not stated clearly.

      We did attempt to track the same neurons across learning in this project. However, due to the limited number of electrodes implanted in each brain region, the number of chronically tracked neurons in each region was insufficient to support statistically robust analyses.

      We now state this more clearly in the discussion section.

      (2) Figure 1 starts with 7 mice, but only 5 mice are in the last panel. Later it goes down to 3 mice. This should be explained in the results and justified.

      We apologize for the confusion. As described in the Methods section, 7 mice (Figure 1B) were used for behavioral training without electrode array or optical fiber implants to establish learning curves, and an additional 5 mice underwent electrophysiological recordings (3 for visual-based decision-making learning and 2 for fruitless learning).

      (3) I can't see the electrode tracks in Figure 1d. If they are flexible, how can you make sure they did not bend during insertion? I couldn't find a description of this in the methods also.

      The electrode shanks were ultra-thin (1-1.5 µm) and it was usually difficult to recover observable tracks or electrodes in section.

      The ultra-flexible probes could not penetrate brain on their own (since they are flexible), and had to be shuttled to position by tungsten wires through holes designed at the tip of array shanks. The tungsten wires were assembled to the electrode array before implantation; this was described in the section of electrode array fabrication and assembly. We also included the description about the retraction of the guiding tungsten wires in the surgery section to avoid confusion.

      As a further attempt to verify the accuracy of implantation depth, we also measured the repeatability of implantation in a group of mice and found a tendency for the arrays to end in a slightly deeper location in cortex (142.1 ± 55.2 μm, n = 7 shanks), and a slightly shallower location in subcortical structures (-122.6 ± 71.7 μm, n = 7 shanks). We added these results as new Figure S1 to accompany Figure 1.

      (4) In the spike raster in 1E, there seems to be ~20 cells in V2L, for example, but in 1F, the number of neurons doesn't go below 40. What is the difference here?

      We checked Figure 1F; the plotted dots do go below 40, to ~20. Perhaps the file that the reviewer received was not displaying correctly?

      (5) The authors focus mainly on CR, but during learning, the number of CR trials is rather low (because they are not experts). This can also be seen in the noisier traces in Figure 2a. Do the authors account for that (for example by taking equal trials from each group)?

      We accounted for this by reconstructing bootstrap-resampled datasets with only 5 trials for each session in both the early stage and the expert stage. The mean trace of the 500 datasets again showed an overall decrease in CR trial firing rate during task learning, with highly similar temporal dynamics to the original data.

      The figure is now added to supplementary materials (as Figure S3 in the revised manuscript).
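      The trial-matched resampling described above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: trials are assumed to be drawn with replacement, all trials are assumed to share the same number of time bins, and the function name and data layout (a list of sessions, each a list of per-trial firing-rate traces) are hypothetical.

```python
import random
from statistics import fmean
from typing import List

Trace = List[float]  # firing rate per time bin for one trial


def bootstrap_mean_trace(
    sessions: List[List[Trace]],
    n_trials: int = 5,
    n_datasets: int = 500,
    seed: int = 0,
) -> Trace:
    """Average firing-rate trace over `n_datasets` resampled datasets,
    each built from `n_trials` trials drawn from every session."""
    rng = random.Random(seed)
    n_bins = len(sessions[0][0])
    dataset_traces = []
    for _ in range(n_datasets):
        # Draw the same number of trials from every session (with replacement).
        sampled = [
            trial
            for session in sessions
            for trial in rng.choices(session, k=n_trials)
        ]
        dataset_traces.append(
            [fmean(trial[b] for trial in sampled) for b in range(n_bins)]
        )
    # Mean across resampled datasets, bin by bin.
    return [fmean(d[b] for d in dataset_traces) for b in range(n_bins)]


# Hypothetical data: two sessions with unequal trial counts, two time bins.
sessions = [
    [[2.0, 4.0], [4.0, 6.0]],
    [[1.0, 3.0], [3.0, 5.0], [2.0, 4.0], [2.0, 4.0]],
]
trace = bootstrap_mean_trace(sessions)
```

      Because every session contributes exactly 5 trials to each resampled dataset, sessions with many CR trials no longer dominate the average, so early and expert stages are compared on equal trial counts.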

      (6) From Figure 2a, it is evident that Hit trials increase response when mice become experts in all brain areas. The authors have decided to focus on the response onset differences in CRs, but the Hit responses display a strong difference between naïve and expert cases.

      Judging from the learning curve in this task, the mice learned to inhibit their licking when the No-Go stimuli appeared, which is the main reason we focused on these types of trials.

      The movement effects and potential licking artefacts in Hit trials also restricted our interpretation of these trials.

      (7) Figure 3 is still a bit cumbersome. I wasn't 100% convinced of why there is a need to rank the connection matrix. I mean when you convert to rank, essentially there could be a meaningful general reduction in correlation, for example during licking, and this will be invisible in the ranking system. Maybe show in the supp non-ranked data, or clarify this somehow.

      We agree with this important point. As stated in the manuscript and in our response to Reviewer #1, our motivation in taking the ranking approach was that differences in firing rates could bias cross-correlation between spike trains, making raw counts of significant neuron pairs difficult to compare across conditions, but we acknowledge that the ranking measures might obscure meaningful differences or inflate weak effects in the data.

      We added the limitations of the ranking approach to the Discussion section and emphasized the need in future studies for better analysis approaches that could provide a more accurate assessment of functional connection dynamics without bias from firing rates.

      (8) Figure 4a x label is in manuscript, which is different than previous time labels,  which were seconds.

      We have now changed all time labels from Figure 2 onward to milliseconds.

      (9) Figure 4 input and output rank look essentially the same.

      We have compressed the brain plots in Figures 4-5 to better convey the take-home message.

      (10) Also, what is the late and early stim period? Can you mark each period in panel A? Early stim period is confusing with early CR period. Same for early response and late response.

      The definition of time periods was in figure legends. We now mark each period out to avoid confusion.

      (11) Looking at panel B, I don't see any differences between delta-rank in early stim, late stim, early response, and late response. Same for panel c and output plots.

      The rankings were indeed relatively stable across time periods. The plots are now compressed and show a mean rank value.

      (12) Panels B and C are just overwhelming and hard to grasp. Colors are similar both to regular rank values and delta-rank. I don't see any differences between all conditions (in general). In the text, the authors report only M2 to have an increase in rank during the response period. Late or early response? The figure does not go well with the text. Consider minimizing this plot and moving stuff to supplementary.

      The colormap has now been changed to avoid confusion, and the brain plots are now compressed.

      (13) In terms of a statistical test for Figure 4, a two-way ANOVA was done, but over what? What are the statistics and p-values for the test? Is there a main effect of time also? Is there a significant interaction? Was this done on all mice together? How many mice? If I understand correctly, the post-hoc statistics are presented in the supplementary, but from the main figure, you cannot know what is significant and what is not.

      For these figures we were mainly concerned with the post-hoc statistics which described the changes in the rankings of each region across learning.

      We have changed the description to “t-test with Sidak correction” to avoid confusion.

      (14) In the legend of Figure 4, it is reported that 610 expert CR trials from 6 sessions, instead of 7 sessions. Why was that? Also, like the previous point, why only 3 mice?

      Behavior data for all the sessions used are shown in Figure S1. Only 3 mice were used for the learning group; the difficulty of achieving sufficient unit yields across all regions in the same animal restricted our sample size.

      (15) Body movement analysis: was this done in a different cohort of mice? Only now do I understand why there was a division into early and late stim periods. In supp 4, there should be a trace of each body part in CR expert versus naïve. This should also be done for Hit trials as a sanity check. I am not sure that the brightness difference between consecutive frames is the best measure. Rather try to calculate frame-to-frame correlation. In general, body movement analysis is super important and should be carefully analyzed.

      Due to limitations in the experimental design and implementation, movement tracking was not performed during the electrophysiological recordings, and the 3 mice shown in Figure S4 (now S5) were from a separate group. We have carefully examined the temporal profiles of mouse movements and found that they did not fully match the rank dynamics for all regions, and we have added these results and related discussion in the revised manuscript. However, we acknowledge that the observed motion energy pattern could explain some of the functional connection dynamics; for example, the decrease in face and pupil motion energy could explain the reduction in ranks for striatum.

      Without synchronized movement recordings in the main dataset, we cannot fully disentangle movement-related neural activity from task-related signals. We have made this limitation explicit in the revised manuscript and discuss it as a potential confound, along with possible approaches to address it in future work.

      (16) For Hit trials, in the striatum, there is an increase in input rank around the response period, and from Figure S6 it is clear that this is lick-related. Other than that, the authors report other significant changes across learning and point out to Figure 5b,c. I couldn't see which areas and when it occurred.

      We did naturally expect the activity in striatum to be strongly related to movement.

      With Figure S6 (now S7) we wished to show that the observed rank increase for striatum could not simply be attributed to changes in time of lick initiation.

      As some readers may argue that during learning the mice might have learned to only intensely lick after response signal onset, causing the observed rise of input rank after response signal, we realigned the spikes in each trial to the time of the first lick, and a strong difference could still be observed between early training stage and expert training stage.

      We still cannot fully rule out effects of more subtle movement changes, as the face motion energy did increase in the early response period. This result and the related discussion have been added to the Results section of the revised manuscript.

      (17) Figure 6, again, is rather hard to grasp. There are 16 panels, spread over 4 areas, input and output, stim and response. What is the take home message of all this? Visually, it's hard to differentiate between each panel. For me, it seems like all the panels indicate that for all 4 areas, both in output and input, frontal areas increase in rank. This take-home message can be visually conveyed in much less tedious ways. This simpler approach is actually conveyed better in the text than in the figures themselves. Also, the whole explanation on how this analysis was done, was not clear from the text. If I understand it, you just divided and ranked the general input (or output) into individual connections? If so, then this should be better explained.

      We appreciate this advice, and we have compressed the figures to better convey the main message. The rankings for Figure 6 and Figure S8 (now Figure S9) were explained in the left panel of Figure 3C. Each non-zero element in the connection matrix was ranked from 1 to 10, with a value of 10 representing the 10% strongest non-zero elements in the matrix.

      We have updated the figure legends of Figure 3, and we have also updated the description in methods (Connection rank analyses) to give a clearer description of how the analyses were applied in subsequent figures.
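
      As an illustration only (this is not the analysis code used in the study), the 1-10 ranking of non-zero connection-matrix elements described above can be sketched as follows. The function name `decile_ranks`, the treatment of "strength" as the raw element value, and the handling of ties (tied values share the higher rank) are assumptions made for this sketch.

```python
from bisect import bisect_right

def decile_ranks(matrix):
    """Rank every non-zero entry from 1 to 10 by strength; zero entries stay 0 (unranked).

    Assumption: larger values mean stronger connections. A rank of 10 marks
    the strongest 10% of non-zero entries, 9 the next 10%, and so on.
    """
    nonzero = sorted(v for row in matrix for v in row if v != 0)
    n = len(nonzero)
    ranked = []
    for row in matrix:
        out = []
        for v in row:
            if v == 0:
                out.append(0)
            else:
                # Number of non-zero entries that are <= v.
                pos = bisect_right(nonzero, v)
                # Integer ceiling of 10 * pos / n, avoiding float rounding.
                out.append(-(-10 * pos // n))
        ranked.append(out)
    return ranked

# Hypothetical 5 x 5 connection matrix with 20 non-zero entries.
example = [
    [0, 1, 2, 3, 4],
    [5, 0, 6, 7, 8],
    [9, 10, 0, 11, 12],
    [13, 14, 15, 0, 16],
    [17, 18, 19, 20, 0],
]
for row in decile_ranks(example):
    print(row)
```

      With 20 non-zero entries, each rank value is shared by two elements, so the two strongest connections receive a rank of 10.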

      (18) Figure 7: Here, the authors perform a ROC analysis between go and no-go stimuli. They balance between choice, but there is still an essential difference between a hit and a FA in terms of movement and licks. That is maybe why there is a big difference in selective units during the response period. For example, during a Hit trial the mouse licks and gets a reward, resulting in more licking and excitement. In FAs, the mouse licks, but gets punished, which causes a reduction in additional licking and movements. This could be a simple explanation why the ROC was good in the late response period. Body movement analysis of Hit and FA should be done as in Figure S4.

      We appreciate this insightful advice.

      Though we balanced the numbers of basic trial types, we could not rule out a difference in the intrinsic amount of movement between FA trials and Hit trials, which is likely the reason for the large proportion of encoding neurons in the response period.

      We have added this point to both the Results and Discussion sections, along with the need for a more carefully designed behavioral paradigm to disentangle task information.
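
      For readers unfamiliar with this kind of selectivity measure, a generic area-under-the-ROC-curve computation for one unit is sketched below. It is not the procedure used in the manuscript, whose details (time windows, trial balancing, significance testing) are not given in this response; the function name `auroc` and the spike counts are hypothetical.

```python
def auroc(go_counts, nogo_counts):
    """Area under the ROC curve for separating Go from No-Go trials.

    Equivalent to the probability that a randomly chosen Go-trial spike
    count exceeds a randomly chosen No-Go-trial count (ties count as half).
    0.5 means no selectivity; values near 0 or 1 mean strong selectivity.
    """
    wins = 0.0
    for g in go_counts:
        for n in nogo_counts:
            if g > n:
                wins += 1.0
            elif g == n:
                wins += 0.5
    return wins / (len(go_counts) * len(nogo_counts))

# Hypothetical spike counts in one analysis window.
go = [7, 9, 6, 8]
nogo = [3, 6, 4, 5]
print(auroc(go, nogo))
```

      Because such a measure only compares spike-count distributions between trial types, any movement difference between Hit and FA trials would contribute to it in the same way as a stimulus-related difference.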

      (19) The authors also find selective neurons before stimulus onset, and refer to trial history effects. This can be directly checked, that is if neurons decode trial history.

      We attempted encoding analyses on trial history, but regrettably our dataset did not contain enough trials to construct a subset with fully balanced trial history, visual stimulus and behavioral choice.

      (20) Figure 7e. What is the interpretation for these results? That areas which peaked earlier had more input and output with other areas? So, these areas are initiating hubs? Would be nice to see ACC vs Str traces from B superimposed on each other. Having said this, the Str is the only area to show significant differences in the early stim period. But it also has the latest peak time. This is a bit of a discrepancy.

      We appreciate this important point.

      The limited anatomical coverage of brain regions restricts our interpretation of these findings. These areas could be initiating hubs, or early recipients of input from the true initiating hubs, which were not monitored in our study.

      The Str trace was in fact above the ACC trace, especially in the response period. This could be explained by the point raised in comment 18: since we could not rule out a difference in the intrinsic amount of movement between FA trials and Hit trials, and considering that striatum activity is strongly related to movement, the Str trace may largely reflect movement-related differences in spike counts between FA trials and Hit trials rather than differences related to the visual stimulus.

      This further shows the need for a more carefully designed behavioral paradigm to disentangle task information.

      In addition, the striatum trace did not in fact show a true double-peak form like the traces in other regions; it ramped up in the stimulus period and peaked only in the response period. This description is now added to the Results section.

      In the early stim period, the striatum did show significant differences in the average percentage of encoding neurons, as the percentage of encoding neurons was stably high in the expert stage. The striatum activity is more directly affected. Still, the percentage of neurons reached its peak only in the late stimulus period.

      (21) For the optogenetic silencing experiments, how many mice were trained for each group? This is not mentioned in the results section but only in the legend of Figure 8. This part is rather convincing in terms of the necessity for OFC and V2M.

      We have included the numbers of mice in the Results section as well.

      (C) Discussion

      (1) There are several studies linking sensory areas to frontal networks that should be mentioned, for example, Esmaeili et al., 2022, Matteucci et al., 2022, Guo et al., 2014, Gallero Salas et al., 2021, Jerry Chen et al., 2015. Sonja Hofer papers, maybe. Probably more.

      We appreciate this advice. We have now included one of the mentioned papers (Esmaeili et al., 2022) in the Results and Discussion sections for its direct characterization of the enhanced coupling between somatosensory and frontal (motor) regions during sensory learning. The other studies mentioned here seem to focus more on the differences in encoding properties between regions along specific cortical pathways than on functional connections or interregional activity correlations, and we feel they are not directly related to the observations discussed.

      (2) The reported reorganization of brain-wide networks with shifts in time is best described also in Sych et al. 2021.

      We regret that we did not include this important study, and we have now cited it in the Discussion section.

      (3) Regarding the discussion about more widespread stimulus encoding after learning, the results indicate that the striatum emerges first in decoding abilities (Figure 7c left panel), but this is not discussed at all.

      We briefly discussed this in the Results section. We tend to attribute this to a trial history signal in the striatum, but since the structure of our data could not support a direct encoding analysis of trial history, we felt it might be inappropriate to over-interpret the results.

      (4) An important issue which is not discussed is the contribution of movement which was shown to have a strong effect on brain-wide dynamics (Steinmetz et al 2019; Musall et al 2019; Stringer et al 2019; Gilad et al 2018). The authors do have some movement analysis, but this is not enough. At least a discussion of the possible effects of movement on learning-related dynamics should be added.

      We have included these studies in the Discussion section accordingly. Since the movement analyses were done in a separate cohort of mice, we have made this limitation explicit in the revised manuscript and discuss it as a potential confound, along with possible approaches to address it in future work.

      (D) Methods

      (1) How was the light delivery of the optogenetic experiments done? Via fiber implantation in the OFC? And for V2M? If the red laser was on the skull, how did it get to the OFC?

      The fibers were placed on the cortical surface for the V2M group and were implanted above the OFC for the OFC manipulation group. These procedures are described in the viral injection part of the Methods section.

      (2) No data given on how electrode tracking was done post hoc.

      As noted in our response to comment 3 on the Results section, the electrode shanks were ultra-thin (1-1.5 µm), and it was usually difficult to recover observable tracks or electrodes in sections.

      As an attempt to verify the accuracy of implantation depth, we measured the repeatability of implantation in a group of mice and found a tendency for the arrays to end in a slightly deeper location in cortex (142.1 ± 55.2 µm, n = 7 shanks) and a slightly shallower location in subcortical structures (-122.6 ± 71.7 µm, n = 7 shanks). We added these results as new Figure S1 to accompany Figure 1.

      Reviewer #3 (Recommendations for the authors):

      (1) The manuscript uses decision-making in the title, abstract and introduction. However, nothing is related to decision learning in the results section. Mice simply learned to suppress licking in no-go trials. This type of task is typically used to study behavioral inhibition. And consistent with this, the authors mainly identified changes related to network on no-go trials. I really think the title and main message is misleading. It is better to rephrase it as visual discrimination learning. In the introduction, the authors also reviewed multiple related studies that are based on learning of visual discrimination tasks.

      We do view the Go/No-Go task as a specific type of decision-making task, as previous literature has discussed it as a decision-making task under the framework of signal detection theory or the updating of item values (Carandini & Churchland, 2013; Veling, Becker, Liu, Quandt, & Holland, 2022).

      We do acknowledge the essential differences between the Go/No-Go task and the tasks that require the animal to choose between alternatives, and since we have now realized some readers may not accept this task as a decision task, we have changed the title to visual discrimination task as advised.

      (2) Learning induced a faster onset on CR trials. As the no-go stimulus was not presented to mice during early stages of training, this change might reflect the perceptual learning of relevant visual stimulus after repeated presentation. This further confirms my speculation, and the decision-making used in the title is misleading.

      We have changed the title to visual discrimination task accordingly.

      (3) Figure 1E, show one hit trial. If the second 'no-go stimulus' is correct, that trial might be a false alarm trial as mice licked briefly. I'd like to see whether continuous licking can cause motion artifacts in recording.

      We appreciate this important point. There were indeed licking artifacts with continuous licking in Hit trials, which was part of the reason we focused our analyses on CR trials. Opto-based lick detectors may help to reduce such artifacts in future studies.

      (4) What is the rationale for using a threshold of d' < 2 as the early-stage data and d' > 3 as expert stage data?

      The thresholds were chosen as a trade-off, based on the practical need to gather enough CR trials in the early training stage while keeping performance in that stage relatively low.

      Assuming the mice licked in 95% of Go-stimulus trials, d' < 2 corresponds to a performance level at which the mouse correctly rejected fewer than 63.9% of No-Go stimulus trials, and d' > 3 corresponds to a performance level at which the mouse correctly rejected more than 91.2% of No-Go stimulus trials.
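
      The correspondence between these d' thresholds and correct-rejection rates can be checked with a short sketch. It assumes the standard signal-detection definition d' = z(hit rate) - z(false-alarm rate), which is not spelled out in this response; the function name `correct_rejection_rate` is hypothetical.

```python
from statistics import NormalDist

def correct_rejection_rate(d_prime, hit_rate=0.95):
    """Correct-rejection rate implied by a given d' at a fixed hit rate.

    Assumes d' = z(hit rate) - z(false-alarm rate), where z is the inverse
    of the standard normal cumulative distribution function.
    """
    norm = NormalDist()  # standard normal: mean 0, sd 1
    z_false_alarm = norm.inv_cdf(hit_rate) - d_prime
    false_alarm_rate = norm.cdf(z_false_alarm)
    return 1.0 - false_alarm_rate

for d in (2.0, 3.0):
    print(f"d' = {d}: correct-rejection rate = {correct_rejection_rate(d):.1%}")
```

      Under these assumptions the two thresholds give approximately 63.9% and 91.2%, matching the values quoted above.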

      (5) Figure 2A, there is a change in baseline firing rates in V2M, MDTh, and Str. There is no discussion. But what can cause this change? Recording instability, problem in spike sorting, or learning?

      It is highly possible that the firing rates before visual stimulus onset are affected by the previous reward history and task engagement state of the mice. Notably, although recorded simultaneously in the same sessions, the changes in baseline firing rates seen in CR trials in the V2M region were not observed in Hit trials.

      Thus, though we cannot completely rule out recording instability, we see this as evidence that changes in trial history or task engagement during learning affect firing rates.

      References:

      Carandini, M., & Churchland, A. K. (2013). Probing perceptual decisions in rodents. Nat Neurosci, 16(7), 824-831. doi:10.1038/nn.3410

      Cruz, K. G., Leow, Y. N., Le, N. M., Adam, E., Huda, R., & Sur, M. (2023). Cortical-subcortical interactions in goal-directed behavior. Physiol Rev, 103(1), 347-389. doi:10.1152/physrev.00048.2021

      Esmaeili, V., Oryshchuk, A., Asri, R., Tamura, K., Foustoukos, G., Liu, Y., Guiet, R., Crochet, S., & Petersen, C. C. H. (2022). Learning-related congruent and incongruent changes of excitation and inhibition in distinct cortical areas. PLOS Biology, 20(5), e3001667. doi:10.1371/journal.pbio.3001667

      Goldbach, H. C., Akitake, B., Leedy, C. E., & Histed, M. H. (2021). Performance in even a simple perceptual task depends on mouse secondary visual areas. Elife, 10, e62156. doi:10.7554/eLife.62156

      Siegle, J. H., Jia, X., Durand, S., Gale, S., Bennett, C., Graddis, N., Heller, G., Ramirez, T. K., Choi, H., Luviano, J. A., Groblewski, P. A., Ahmed, R., Arkhipov, A., Bernard, A., Billeh, Y. N., Brown, D., Buice, M. A., Cain, N., Caldejon, S., Casal, L., Cho, A., Chvilicek, M., Cox, T. C., Dai, K., Denman, D. J., de Vries, S. E. J., Dietzman, R., Esposito, L., Farrell, C., Feng, D., Galbraith, J., Garrett, M., Gelfand, E. C., Hancock, N., Harris, J. A., Howard, R., Hu, B., Hytnen, R., Iyer, R., Jessett, E., Johnson, K., Kato, I., Kiggins, J., Lambert, S., Lecoq, J., Ledochowitsch, P., Lee, J. H., Leon, A., Li, Y., Liang, E., Long, F., Mace, K., Melchior, J., Millman, D., Mollenkopf, T., Nayan, C., Ng, L., Ngo, K., Nguyen, T., Nicovich, P. R., North, K., Ocker, G. K., Ollerenshaw, D., Oliver, M., Pachitariu, M., Perkins, J., Reding, M., Reid, D., Robertson, M., Ronellenfitch, K., Seid, S., Slaughterbeck, C., Stoecklin, M., Sullivan, D., Sutton, B., Swapp, J., Thompson, C., Turner, K., Wakeman, W., Whitesell, J. D., Williams, D., Williford, A., Young, R., Zeng, H., Naylor, S., Phillips, J. W., Reid, R. C., Mihalas, S., Olsen, S. R., & Koch, C. (2021). Survey of spiking in the mouse visual system reveals functional hierarchy. Nature, 592(7852), 86-92. doi:10.1038/s41586-020-03171-x

      Sych, Y., Fomins, A., Novelli, L., & Helmchen, F. (2022). Dynamic reorganization of the cortico-basal ganglia-thalamo-cortical network during task learning. Cell Rep, 40(12), 111394. doi:10.1016/j.celrep.2022.111394

      Veling, H., Becker, D., Liu, H., Quandt, J., & Holland, R. W. (2022). How go/no-go training changes behavior: A value-based decision-making perspective. Current Opinion in Behavioral Sciences, 47, 101206. doi:10.1016/j.cobeha.2022.101206

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors' goal was to arrest PsV capsids on the extracellular matrix using cytochalasin D. The cohort was then released, and interaction with the cell surface, specifically with CD151, was assessed.

      The model that fragmented HS associated with released virions mediates the dominant mechanism of infectious entry has only been suggested by research from a single laboratory and has not been verified in the 10+ years since publication. The authors are basing this study on the assumption that this model is correct, and these data are referred to repeatedly as the accepted model despite much evidence to the contrary.

      We stated in the introduction on line 65/66 ‘Two release mechanisms are discussed, that mutually are not exclusive’. This implies that we do not consider the shedding model as ‘the accepted model’. Nor do we state in the discussion that the shedding model is the preferred one. However, we referred to the shedding model in the discussion because we find HS associated with transferred PsVs, which is in line with this model.

      The discussion in lines 65-71 concerning virion and HSPG affinity changes is greatly simplified. The structural changes in the capsid induced by HS interaction and the role of this priming for KLK8 and furin cleavage have been well researched. Multiple laboratories have independently documented this. If this study aims to verify the shedding model, additional data need to be provided.

      Our findings are compatible with both models, and we aim neither to verify the shedding model nor to disprove the priming model. However, as we understand it, the referee wishes the priming model to be given more visibility. Therefore, using inhibitors previously used in the field, we tested whether inhibition of KLK8 or furin reduces PsV translocation to the cell body (after CytD wash off). Leupeptin blocks transport, while Furin inhibitor I still allows some initial translocation. We incorporated these new data as Figure 2 (line 265): ‘…we would expect that inhibition of L1 processing during the CytD incubation prevents the recovery of PsV translocation from the ECM to the cell body (Figure 2A and D). To test for this possibility, as employed in earlier studies, the protease inhibitor leupeptin was used to inhibit proteases including KLK8 which is required for L1 cleavage (Cerqueira et al. 2015). Employing this inhibitor, the PCC between PsV-L1 and F-actin staining remains negative after CytD removal, showing that for translocation indeed the action of proteases is required (Figure 2B and D). In contrast, inhibition of L2 cleavage by a furin specific inhibitor has no effect on the PCC (Figure 2C and D). However, it should be noted that we occasionally observe PsVs not completely translocating but accumulating at the border of the F-actin stained area (for example see Figure 2C (60 min)). This results in an increase of the PCC almost equal to complete translocation, explaining why the PCC remains unaffected despite a furin inhibitory effect. Hence, furin inhibition may have some effect on translocation that, however, is undetected in this type of analysis.’
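
      For orientation, the sign of the PCC (Pearson correlation coefficient) used in this analysis can be illustrated with a minimal sketch. It is not the image-analysis pipeline used for Figure 2; the function name `pearson` and the toy pixel intensities are hypothetical, and each channel is treated as a flat list of pixel values from the same region of interest.

```python
from math import sqrt

def pearson(channel_a, channel_b):
    """Pearson correlation coefficient between two equally sized pixel lists."""
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    return cov / sqrt(var_a * var_b)

# Toy line profile running from the ECM (left) into the cell body (right).
f_actin = [0, 0, 1, 5, 9, 9]
psv_on_ecm = [8, 9, 7, 1, 0, 0]        # PsV signal outside the F-actin area
psv_translocated = [0, 1, 2, 6, 8, 9]  # PsV signal overlapping F-actin

print(pearson(psv_on_ecm, f_actin))        # negative: signals are spatially separated
print(pearson(psv_translocated, f_actin))  # positive: signals overlap
```

      In this toy case, PsV signal that stays on the ECM gives a negative coefficient, whereas signal that overlaps the F-actin area gives a positive one.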

      Moreover, we have added a paragraph discussing how our data integrates into the established model of the HPV infection cascade (line 604): ‘HPV infection is the result of several steps, starting with the initial binding of virions via electrostatic and polar interactions (Dasgupta et al. 2011) to the primary attachment site HS (Richards et al. 2013), which induces capsid modification (Feng et al. 2024; Cerqueira et al. 2015) and HS cleavage (Surviladze et al. 2015), enabling the virion to be released from the ECM or the glycocalyx. Next, virions bind to the cell surface to a secondary receptor complex that forms over time, and become internalized via endocytosis, before they are trafficked to the nucleus (Ozbun and Campos 2021; Mikuličić et al. 2021). Regarding the transition from the primary attachment site to cell surface binding, as already outlined in the introduction, two models are discussed. In one model, proteases cleave the capsid proteins. After priming, the capsids are structurally modified and the virion can dissociate from its HS attachment site. It has been suggested that capsid priming is mediated by KLK8 (Cerqueira et al. 2015) and furin (Richards et al. 2006). In our system, KLK8 inhibition blocks PsV transport, while furin inhibition has some effect that, however, cannot be detected in this analysis (Figure 2) suggesting furin engagement at later steps in the infection cascade. This is in line with earlier in vitro studies on the role of cell surface furin (Surviladze et al. 2015; Day et al. 2008; Day and Schiller 2009). In any case, our results align with both models of ECM detachment: one involving HS cleavage (HS co-transfer) and another involving capsid modification (by e.g., KLK8).’

      The model should be fitted into established entry events,…

      Please see our reply above.

      or at minimum, these conflicting data, a subset of which is noted below, need to be acknowledged.

      (1) The Sapp lab (Richards et al., 2013) found that HSPG-mediated conformational changes in L1 and L2 allowed the release of the virus from primary binding and allowed secondary receptor engagement in the absence of HS shedding.

      (2) Becker et al. found that furin-precleaved capsids could infect cells independently of HSPG interaction, but this infection was still inhibited with cytochalasin D.

      (3) Other work from the Schelhaas lab showed that cytochalasin D inhibition of infection resulted in the accumulation of capsids in deep invaginations from the cell surface, not on the ECM.

      (4) Selinka et al., 2007, showed that preventing HSPG-induced conformational changes in the capsid surface resulted in noninfectious uptake that was not prevented with cytochalasin D.

      (5) The well-described capsid processing events by KLK8 and furin need to be mechanistically linked to the proposed model. Does inhibition of either of these cleavages prevent engagement with CD151?

      The authors need to consider an explanation for these discrepancies.

      We do not see any discrepancies; our observations are compatible with aspects of both the shedding and the priming model. That PsVs carry HS-cleavage products doesn’t imply that HS cleavage is sufficient or required for infection, or that the priming model would be wrong. We do not view our data as being in conflict with the priming model. Most of the above-mentioned papers are now cited.

      Altogether, we acknowledge that the study gains importance by directly testing the priming model within our experimental system. We are thankful for the above comments and have addressed this issue.

      Other issues:

      (1) Line 110-111. The statement about PsVs in the ECM being too far away from the cell surface to make physical contact with the cell surface entry receptors is confusing. ECM binding has not been shown to be an obligatory step for in vitro infection.

      Not obligatory, but strongly supportive (Bienkowska-Haba et al., PLoS Pathog., 2018; Surviladze et al., J. Gen. Virol., 2015). As recently published by the Sapp lab (Bienkowska-Haba et al., PLoS Pathog., 2018), ‘Direct binding of HPV16 to primary keratinocytes yields very inefficient infection rates for unknown reasons.’ Moreover, the paper shows that HaCaT cell ECM binding of PsVs increases the infection of NHEK by 10-fold and of HFK by almost 50-fold.

      This idea is referred to again on lines 158-159 and 199. The claim (line 158) that PsV does not interact with the cell within an hour needs to be demonstrated experimentally and seems at odds with multiple laboratories' data. PsV has been shown to directly interact with HSPG on the cell surface in addition to the ECM. Why are these PsVs not detected?

      The reviewing editor speculated that HaCaT cells may be a model system in which the in vivo relevant binding to the ECM can be studied better than in non-polarized cell types. This is because binding to the ECM cannot be bypassed by direct cell surface binding. The observation that only a few PsVs bind to the basal cell membrane indeed suggests restricted diffusional access of PsVs to binding receptors of the basal membrane. The reviewing editor asked for an experiment showing that more PsVs bind after cell detachment. We performed this experiment and indeed find more PsVs binding to the cell surface of detached cells. This point is very important for the understanding of the study, and we now mention it in several sections of the manuscript, as outlined in the following.

      Line 125: ‘Many PsVs that bind to the ECM may locate distal from the cell surface and are thus unable to establish direct contact with entry receptors. However, they are capable of migrating by an actin-dependent transport along cell protrusions towards the cell body (Smith et al. 2008; Schelhaas et al. 2008). We aimed for blocking this transport in HaCaT cells, a cell line that is widely used as a cell culture model for HPV infection. HaCaT cells closely resemble primary keratinocytes in key aspects: they are not virally transformed and produce large amounts of ECM that facilitates infection (Bienkowska-Haba et al. 2018; Gilson et al. 2020). In addition, HaCaT cells exhibit cellular polarity that enforces binding of virus particles to the ECM, as the virions cannot bind to receptors/entry components, such as CD151, Itgα6 and HSPGs that co-distribute on the basolateral membrane of polarized keratinocytes (Sterk et al. 2000; Cowin et al. 2006; Mertens et al. 1996), making them inaccessible by diffusion.’

      Line 205: ‘During the CytD incubation, PsVs bind to HSPGs of the basolateral membrane for 5 h. Still, in the cell body area hardly any PsVs are present (0.14 PsV/µm², Supplementary Figure 1B). In the control, the PsV density is several-fold larger (Supplementary Figure 1B). This is expected, as the PsVs bind to the ECM and translocate to the cell body. We wondered whether there are more binding sites at the basal membrane that remain inaccessible to PsVs by diffusion because of the insufficient space between glass-coverslip and basolateral membrane. For clarification, we incubated EDTA detached HaCaT cells in suspension with PsVs for 1 h at 4 °C, followed by re-attachment for 1 h. Under these conditions, we find a PsV density 12.4-fold larger than after 5 h of CytD incubation of adhered cells (Supplementary Figure 1B and D). However, it should be noted that these values cannot be directly compared. Aside from the different treatments, another difference lies in the size of the basal membrane, as re-attachment of cells is not complete after only 1 h (compare size of adhered membranes in Supplementary Figure 1A and C). Therefore, the imaged membranes are likely strongly ruffled, which results in the underestimation of the size of the adhered membrane. As a result, we overestimate the PsVs per µm² (please note that we cannot re-attach cells for longer times as we would then lose PsVs due to endocytosis). On the other hand, we would underestimate the PsV density at the basal membrane if after re-attachment we image in part also some apical membrane. In any case, the experiment suggests that PsVs bind more efficiently if membrane surface receptors are accessible by diffusion. This is in support of the above notion that the basal membrane may provide more entry receptors than one would expect from the low density of PsVs bound after 5 h CytD (Supplementary Figure 1B).
      This suggests that under our assay conditions, PsVs cannot easily bypass the translocation from the ECM to the cell body by diffusing directly to the basal membrane. Hence, the large majority of PsVs that enter the cell were previously bound to the ECM. Therefore, HaCaT cells serve as an ideal model for studying the transfer of ECM bound HPV particles to the cell surface, which is similar to in vivo infection of basal keratinocytes after binding to the basement membrane (Day and Schelhaas 2014; Kines et al. 2009; Schiller et al. 2010; Bienkowska-Haba et al. 2018).’

      Line 529: ‘Filopodia usage not only facilitates infection but also increases the likelihood of virions to reach their target cells during wound healing, namely the filopodia-rich basal dividing cells. In fact, several types of viruses exploit filopodia during virus entry (Chang et al. 2016), hinting at the possibility that for HPV and other types of viruses actin-driven virion transport may play a more important role than it is currently assumed. If this is the case, sub-confluent HaCaT cells, or even better single HaCaT cells, would be an ideal model system for the study of these very early infection steps that involve ECM attachment and subsequent filopodia-dependent transport. As shown in Supplementary Figure 1, HaCaT cells have many binding sites for the HPV16 PsVs. However, as they are polarized and the binding receptors are only at the basal membrane, they remain relatively inaccessible by diffusion. Therefore, the ECM binding that is also observed in vivo (Day and Schelhaas 2014) and subsequent transport via filopodia are used upon infection of HaCaT cells that locate at the periphery of cell patches. Here, PsVs bind to the ECM which strongly enhances infection of primary keratinocytes (Bienkowska-Haba et al. 2018). In contrast, HPV can readily bind to HSPGs on the cell surface of nonpolarized cells, and by this bypasses ECM mediated virus priming and the filopodia dependency. We propose that HaCaT cells are a valuable system for studying the very early events in HPV infection that allows for dissecting capsid interaction with ECM resident priming factors and cell surface receptors.’

      Finally, please note that in the previous version of the manuscript, we did not question that in many cellular systems PsVs interact with heparan sulfate proteoglycans (HSPGs) present on the cell surface, or both on the cell surface and the ECM. We stated on line 59 ´While in cell culture virions bind to HS of the cell surface and the ECM, it has been suggested that in vivo they bind predominantly to HS of the extracellular basement membrane (Day and Schelhaas, 2014; Kines et al., 2009; Schiller et al., 2010).´

      We hope that after adding the above explanations and the experiment requested by the reviewing editor it is now clear why only few PsVs bind directly (not via the ECM) to the cell surface. We appreciate the reviewer’s and the reviewing editor’s input that has significantly improved the manuscript.

      (2) The experiments shown in Figure 5 need to be better controlled. Why is there no HS staining of the cell surface at the early timepoints? This antibody has been shown to recognize N-sulfated glucosamine residues on HS and, therefore, detects HSPG on the ECM and cell surface.

      There is staining. However, as the staining at the periphery is stronger and images are shown at the same settings of brightness and contrast, the impression is given that the cell surface is not stained. We have added more images showing HS cell surface staining.

      (i) Supplementary Figure 4C shows an enlarged view of the CytD/0 min cell shown in Figure 6A. In the area stained by Itgα6, that marks the cell body, HS staining is present, although less abundant in comparison to the ECM.

      (ii) In Figure 8, CytD/30 min, a cell is shown with abundant HS in the cell body region (compare cyan and green LUT).

      (iii) In newly added Figure 3A, lower panel, another cell with HS in the cell body region is shown.

      Please note that the staining is highly variable. We indicate this by stating on Line 373: ‘The pattern of the HS staining (cyan LUT) and the overlap of HS with PsVs and Itgα6 are highly variable (Figure 6A).’

      Therefore, the conclusion that this confirms HS coating of PsV during release from the ECM (lines 430-431) is unfounded. How do the authors distinguish between "HS-coated virions" and HSPG-associated virions?

      The transient increase in the PCC at CytD/30 min can be interpreted as PsV/HS co-transport or as direct binding of PsVs to cell surface HSPGs. However, two arguments support co-transport.

      First, we find that CytD in the presence of PsVs increases the HS intensity (see newly added Figure 3, confirming old Figure 5 that is now Figure 6). We state on line 290: ‘… that without actin-dependent PsV translocation HS cleavage products are retained in the ECM, consistent with the hypothesis that cleaved HS remains associated with PsVs (Ozbun and Campos 2021).’

      Second, the distance between HS and Itgα6 (the cell body marker) decreases over time after CytD removal, which suggests movement of HS to the cell body (Supplementary Figure 8D). We state on line 422: ‘The movement of HS towards the cell body after removal of CytD, which indirectly demonstrates that PsVs are coated with HS, is suggested by a shortening of the HS-Itgα6 distance over time (Supplementary Figure 8D).’

      It is difficult to comprehend how the addition of 50 vge/cell of PsV could cause such a global change in HS levels.

      Some areas are covered with confluent cells, to which hardly any PsVs are bound, because accessing their basolateral membrane is nearly impossible, and PsVs do not bind to the exposed apical membrane either. We assume this is a major difference from cultures of unpolarized cells, where PsVs should distribute more or less equally over cells. This means that in our experiments the vge/cell is not a suitable parameter for relating the magnitude of an effect to a defined number of PsVs. In the ECM, the PsV density is very high, enabling one cell to collect, in theory, several hundred PsVs, much more than expected from the 50 vge/cell.

      We state on line 135: ‘Frequently, we observe patches of confluent cells which are common to HaCaT cells. Cells at the center of these patches are dismissed during imaging, because there are no anterogradely migrating PsVs at these cells. A second reason for our dismissal of these cells is that hardly any PsVs are bound to them, possibly because their basal membranes are inaccessible by diffusion. Instead, we focus on isolated HaCaT cells or cells at the periphery of cell patches. In these cells, we find more PsVs per cell than one would expect from the employed 50 viral genome equivalents (vge) per cell, indicating that PsVs are unequally distributed between the cells.’

      The claim that the HS levels are decreased in the non-cytochalasin-treated cells due to PsV-induced shedding needs to be demonstrated.

      We did not claim that PsVs induce shedding; rather, we believe they retain shed HS. Without PsVs, the shed HS is washed off from the ECM. We have reproduced the observation made in old Figure 5 (now Figure 6) in the newly added Figure 3, which also shows that PsVs affect the HS intensity only when present together with CytD, not alone. We state on line 277: ‘As outlined above, during the 5 h incubation with CytD, proteases in the ECM are expected to cleave HS chains. These cleavage products should be able to diffuse out of the ECM, unless they remain associated with nontranslocating PsVs. In the control, PsV associated HS cleavage products would leave the ECM through PsV translocation…. Using an antibody that reacts with an epitope in native heparan sulfate chains, only after CytD and if PsVs are present, the level of HS staining is significantly increased (Figure 3B). As shown in Figure 3A, stronger HS staining at PsVs (open arrows) and as well in PsV free areas (closed arrows) was observed… Collectively, our findings indicate that without actin-dependent PsV translocation HS cleavage products are retained in the ECM, consistent with the hypothesis that cleaved HS remains associated with PsVs (Ozbun and Campos 2021).’

      If HS is actually shed, staining of the cell periphery could increase with the antibody 3G10, which detects the HS neoepitope created following heparinase cleavage.

      We have tested this antibody, but it yields only very weak staining (Supplementary Figure 2), which does not allow us to differentiate between an increase in the cell periphery and one in the cell body area. We still include the experiment, as it suggests that CytD has no effect on HS processing. We state on line 286: ‘As additional control and shown in Supplementary Figure 2, we use an antibody that reacts with a HS neo-epitope generated by heparitinase-treated heparan sulfate chains (Yokoyama et al. 1999; for details see methods). This neo-epitope staining is independent of the presence of CytD and the incubation time, suggesting that CytD does not directly affect HS processing.’

      Reviewer #2 (Public review):

      Summary:

      Massenberg and colleagues aimed to understand how Human papillomavirus particles that bind to the extracellular matrix (ECM) transfer to the cell body for later uptake, entry, and infection. The binding to ECM is key for getting close to the virus's host cell (basal keratinocytes) after a wounding scenario for later infection in a mouse vaginal challenge model, indicating that this is an important question in the field.

      Strengths:

      The authors take on a conceptually interesting and potentially very important question to understand how initial infection occurs in vivo. The authors confirm previous work that actin-based processes contribute to virus transport to the cell body. The superresolution microscopy methods and data collection are state-of-the art and provide an interesting new way of analysing the interaction with host cell proteins on the cell surface in certain infection scenarios. The proposed hypothesis is interesting and, if substantiated, could significantly advance the field.

      Weaknesses:

      As a study design, the authors use infection of HaCaT keratinocytes, and follow virus localisation with and without inhibition of actin polymerisation by cytochalasin D (cytoD) to analyse transfer of virions from the ECM to the cell by filopodial structures using important cellular proteins for cell entry as markers.

      First, the data is mostly descriptive besides the use of cytoD, and does not test the main claim of their model, in which virions that are still bound to heparan sulfate proteoglycans are transferred by binding to tetraspanins along filopodia to the cell body.

      The study identifies a rapid translocation step from the ECM to CD151 assemblies. We have no data that demonstrates a physical interaction between PsVs and CD151. In the model figure, we draw CD151 as part of the secondary receptor complex. We are sorry for having raised the impression that PsVs would bind directly to CD151 and have modified the model Figure accordingly. In the new model figure (Figure 9), the first contact established is to a CD151 free receptor.

      Second, using cytoD is a rather broad treatment that not only affects actin retrograde flow, but also virus endocytosis and further vesicular transport in cells, including exocytosis. Inhibition of myosin II, e.g., by blebbistatin, would have been a better choice as it, for instance, does not interfere with endocytosis of the virus.

      As we focus on early events, we are not concerned that CytD also blocks late steps in the infection cascade, such as endocytosis. However, we agree that a comparison between CytD and blebbistatin would be very interesting. We added Figure 8, showing that blebbistatin only partially stops migration.

      Line 429: ‘Actin retrograde transport, which underlies the here observed virion transport, is the integrative result of three components (Smith et al. 2008; Schelhaas et al. 2008)…. As CytD broadly interferes with F-actin dependent processes, we investigated the effects upon inhibition of only one of the three components, namely the myosin II mediated retrograde movement towards the cell body. Instead of CytD, we employed in the 5 h preincubation the myosin II inhibitor blebbistatin. For the control (0 min), we show in Figure 8A one example of a cell with comparatively many PsVs at the periphery (as mentioned above, the PsV pattern is highly variable) to better illustrate the difference to the PsV pattern occasionally seen with blebbistatin. After blebbistatin treatment (0 min), PsVs are still distal to the cell body but less dispersed than after CytD treatment, seemingly as if translocation started but stopped in the midst of the pathway (Figure 8A, blebbistatin). The PCC between PsVs and HS, like after CytD (Figure 6C), is elevated after blebbistatin, albeit the effect is not significant (Figure 8C). The cell body PCC, is not at 30 min (CytD) but already at 0 min elevated (compare Figure 6D to Figure 8D), which can be explained by partial translocation. This is further supported by the fact that only 8% of PsVs are closely associated with HS (Figure 8E; blebbistatin, 0 min) compared to 15% after CytD treatment (Figure 6E; 0 min). Furthermore, after 0 min PsV incubation with blebbistatin we observe no effect on the HS intensity (compare Figure 8B to Figure 3B and Figure 6B). Hence, in contrast to CytD, blebbistatin does not trap the PsVs in the ECM where they associate with HS, but ongoing actin polymerization pushes actin filaments along with PsVs towards the cell body.’

      Third, the authors aim to study transfer from ECM to the cell body and the effects thereof. However, there are substantial, if not the majority of, viruses that bind to the cell body compared to ECM-bound viruses in close vicinity to the cells.

      Please see our detailed reply to referee #1, who raised the same issue. In brief, we agree that in multiple cell culture systems viruses bind preferentially to the cell surface directly. However, in HaCaT cells, the majority of PsVs do not bind directly to the basal membrane but get there after initial binding to the ECM. Thus, we believe our system appropriately models the physiologically relevant scenario of ECM-to-cell transfer, as also speculated by the reviewing editor, who suggested an experiment showing that more PsVs bind to detached cells (please see above).

      This is in part obscured by the small subcellular regions of interest that are imaged by STED microscopy, or by the use of plasma membrane sheets. As a consequence, the obtained data from time point experiments is skewed, and remains for the most part unconvincing due to the fact that the origin of virions in time and space cannot be taken into account. This is particularly important when interpreting association with HS, the tetraspanin CD151, and integrin alpha 6, as the low degree of association could originate from cell-bound and ECM-transferred virions alike.

      As already stated above, we observe massive binding of PsVs to the ECM, in contrast to very few PsVs that diffuse beneath the basolateral membrane of the polarized HaCaT cells and do bind directly to the cell surface. In other cellular systems, cells may hardly secrete ECM, are not polarized, and therefore virions can easily bypass ECM binding. Therefore, it is reasonable to assume that in HaCaT cells the large majority of PsVs found on the cell body originates from the ECM.

      Fourth, the use of fixed images in a time course series also does not allow for understanding the issue of a potential contribution of cell membrane retraction upon cytoD treatment due to destabilisation of cortical actin. Or, of cell spreading upon cytoD washout.

      The newly added blebbistatin experiment suggests that the initial translocation is exclusively dependent on retrograde actin flow. However, we agree that we are not able to unravel more details regarding the different possible contributions to the movement. Importantly, the lack of PCC increase after CytD/leupeptin removal (Figure 2D) suggests there is not much cell spreading into the area of accumulated PsVs. Please see our more detailed reply to the same issue raised by the same referee in the recommendations for the authors.

      The microscopic analysis uses an extension of a plasma membrane stain as a marker for ECM-bound virions, which may introduce a bias and skew the analysis.

      The dye TMA-DPH stains exclusively cellular membranes and not the ECM. The stain is actually used to delineate the cell body from the ECM area (please see Figure 1).

      Fifth, while the use of randomisation during image analysis is highly recommended to establish significance (flipping), it should be done using only ROIs that have a similar density of objects for which correlations are being established.

      We agree that how randomization is done is very important. Regarding the association of PsVs with CD151 and HS, we corrected for random background association, which is now explained in more detail in the Figure legend of Supplementary Figure 7: ‘On flipped images, we often find values more than half of the values of the original images, demonstrating that many PsVs have a distance ≤ 80 nm to CD151 merely by chance (background association)… (C) Each time point in (A) and (B) obtained from flipped images is the average of three biological replicates. We use these altogether 24 data points, plotting the fraction of closely associated PsVs against the CD151 maxima density. The fraction increases with the maxima density, as the chance of random association increases with the maxima density. The fitted linear regression line describes the dependence of the background association from the maxima density. As a result, the background association (y) can be calculated for any maxima density (x) in original images with the equation y = 2.04x. Please note that the CytD/0 min may be overcorrected as we subtract background association with reference to the CD151 maxima density of the entire ROI (for an example ROI see Supplementary Figure 6A), although the local maxima density at distal PsVs is lower. On the other hand, PsVs at the cell border may have a larger local CD151 maxima density and consequently are undercorrected.’
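
      The background correction described in this legend can be sketched as follows. This is only an illustration under stated assumptions: the fit is taken to be a least-squares line through the origin (as the intercept-free equation y = 2.04x suggests), the function names are hypothetical, and the data points are invented placeholders rather than measured values.

```python
def fit_slope_through_origin(densities, fractions):
    """Least-squares slope m of y = m * x (no intercept term)."""
    numerator = sum(x * y for x, y in zip(densities, fractions))
    denominator = sum(x * x for x in densities)
    return numerator / denominator


def corrected_fraction(observed_fraction, maxima_density, slope):
    """Subtract the density-dependent background association."""
    return observed_fraction - slope * maxima_density


# Hypothetical flipped-image data (maxima density, fraction of closely
# associated PsVs). Placeholder values, NOT the authors' measurements.
flipped_density = [1.0, 2.0, 3.0, 4.0]
flipped_fraction = [2.04, 4.08, 6.12, 8.16]

slope = fit_slope_through_origin(flipped_density, flipped_fraction)

# Correct a hypothetical original-image value measured at density 3.0.
corrected = corrected_fraction(15.0, 3.0, slope)
```

      Because the correction uses the maxima density of the whole ROI, a PsV sitting in a locally sparse region is corrected by more than its true background, which is the overcorrection caveat noted in the legend.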

      For instance, if one flips an image with half of the image showing the cell body, and half of the image ECM, it is clear that association with cell membrane structures will only be significant in the original.

      We are aware of this problem. For instance, it would produce ‘artificially’ low PCCs after flipping images of PsV/HS stainings (please see negative PCC value after flipping in Supplementary Figure 8). In this case, we do not use as argument that in flipped images the PCC is lower. Instead, we would argue that over time the PCC changes in the original images. We still provide the PCC values of flipped images, as additional information, showing that in most cases we obtain after flipping a PCC of zero, as expected.
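
      The flipping artifact discussed here can be reproduced on a toy example. The sketch below is not the analysis pipeline used in the study: it assumes a horizontal flip of one channel, uses tiny invented images, and the helper names are hypothetical. Both channels carry signal only in the left half of the image, mimicking an ROI that is half cell body and half ECM.

```python
import math


def pearson(image_a, image_b):
    """Pearson correlation coefficient between two equally sized 2D images."""
    xs = [value for row in image_a for value in row]
    ys = [value for row in image_b for value in row]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def flip_horizontal(image):
    """Mirror every row (assumed flipping scheme for this illustration)."""
    return [row[::-1] for row in image]


# Toy two-channel image: signal only in the left half (invented values).
psv_channel = [[1, 1, 0, 0],
               [1, 1, 0, 0]]
hs_channel = [[2, 2, 0, 0],
              [2, 2, 0, 0]]

pcc_original = pearson(psv_channel, hs_channel)
pcc_flipped = pearson(psv_channel, flip_horizontal(hs_channel))
```

      In this extreme case the original PCC is strongly positive and the flipped PCC is strongly negative, purely because of the half-and-half geometry. This is why a lower PCC after flipping is not, by itself, evidence of specific association for such ROIs.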

      Hence, we fully agree that careful controls in image analysis are required, and we used the above-described method for the correction of background association when the fraction of closely associated PsVs is analyzed. We do not use a lower PCC value in flipped images as an argument where this is not appropriate.

      I am rather convinced that using randomisation only on the plasma membrane ROIs will not establish any clear significance of the correlating signals.

      Figures 6D and 8D show the PCC specifically of the cell body (only of plasma membrane ROIs). In flipped images (not shown in the previous version for clarity), we obtain significantly lower PCCs (Supplementary Figure 8F/G and Supplementary Figure 10C/D). We propose that in this case it would be appropriate to use a lower PCC of flipped images as an argument for specific association. Still, in this experiment too, we argue with a change in the PCC over time, and not with a PCC of zero after flipping. As above, we still provide the PCC values of flipped images as additional information.

      Also, there should be a higher n for the measurements.

      One replicate is based on the average of 14-15 cells for each condition (more for figure 4). Hence, in a typical experiment (Control and CytD with 4 time points) about 120 cells are analyzed, which is a broad basis for the averages of one replicate.
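
      The cell count quoted above follows from simple arithmetic, sketched here under the assumption that 'Control and CytD' means two conditions, each imaged at four time points, with 14-15 cells per condition and time point. The variable names are illustrative only.

```python
conditions = 2                 # Control and CytD
time_points = 4                # time points per condition
cells_per_group = (14, 15)     # range stated in the response

# Cells analyzed per biological replicate at the low and high end of the range.
cells_low, cells_high = (conditions * time_points * c for c in cells_per_group)
```

      The upper end of this range corresponds to the 'about 120 cells' per replicate mentioned in the reply.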

      We realize that with three biological replicates we find significant effects only if we have strong effects or moderate effects with very low variance.

      Recommendations for the authors:

      Reviewing Editor:

      The focus on the events of HPV infection between ECM binding and keratinocyte-specific receptor binding is unique and interesting. However, I agree with the reviewers that some of the conclusions could use more experimental support, as detailed in their comments. The failure to detect direct binding of the PsV to HSPGs on the cell surface in in vitro assays contradicts much of the published literature. For example, others have found that HPV capsids bind cultured cell lines in suspension, i.e., in the absence of ECM. Do EDTA-suspended HaCaT cells bind PsV? Is the binding HSPG dependent? If the authors think that failure to detect direct cell binding of HaCaTs is an unusual feature of these cell lines or culture conditions, then it would be helpful to provide an explanation. However, it is worth noting that an in vitro system where the cells do not directly bind capsids through HSPG interactions would be a much better model for studying the stages of HPV infection that are the focus of this study, since there is no direct binding of keratinocytes in vivo.

      We are thankful for this comment that had a strong influence on the revision. The suggested experiment has been incorporated as new Supplementary Figure 1. It shows that many more PsVs bind to the cell surface of cells in suspension than to adhered cells. As suggested by the reviewing editor, we explain now that HaCaT cells are a suitable model system for studying the in vivo transport from the ECM to the cell body that in these cells, due to their polarization, cannot be bypassed (for more details please see our replies above addressing these issues).

      Because conclusions drawn regarding HS interactions are largely based on experiments using a single HS mAb, it is important that the specificity of this mAb is described in more detail, either based on the literature or further experimentation.

      We provide now detailed information about the HS antibodies used in the study. We state on line 282 ‘Using an antibody that reacts with an epitope in native heparan sulfate chains…’ and on line 286 ‘we use an antibody that reacts with a HS neo-epitope generated by heparitinase-treated heparan sulfate chains…’ and in the methods section ‘For Heparan sulfate (HS) a mouse IgM monoclonal antibody (1:200) (amsbio, cat# 370255-S) was used that reacts with an epitope in native heparan sulfate chains and not with hyaluronate, chondroitin or DNA, and poorly with heparin (mAb 10E4 (David et al., 1992)). For HS neo-epitope (Yokoyama et al., 1999) detection, a mouse monoclonal antibody (1:200) (amsbio, cat#370260-S) was used that reacts only with heparitinase-treated heparan sulfate chains, proteoglycans, or tissue sections, and not with heparinase treated HSPGs. The antibody recognizes desaturated uronic acid residues (mAb 3G10 (David et al., 1992)).’

      Reviewer #1 (Recommendations for the authors):

      (1) The phrase "tight association" or similar is repeatedly used and is not acceptable for microscopic studies; use "close association", which has no affinity connotations.

      This has been changed as suggested by the referee.

      (2) Why are lysine-coated coverslips used for microscopy? HaCaT cells adhere tightly to untreated glass, and this coating could affect the distribution of ECM and extracellular PsV.

      We believe a tight association of the basal cell membrane to its substrate, as in vivo, where the basal membrane is tightly adhered to other cells, is important in these experiments. In weakly adherent cells more PsVs may bind to the cell surface, bypassing the transport step. Hence, although HaCaT cells may not require the coat and would be able to adhere to glass, the association may not be tight enough to mimic in vivo conditions.

      (3) What is the reason to use detection of the pseudogenome for some of the experiments instead of L1 detection throughout? The process of EdU detection is sufficiently denaturing to affect some protein epitopes. The introduction of this potential artifact doesn't seem warranted for capsid detection experiments.

      The L1 and the Itgα6 antibodies are from the same species, which is why we used click-labeling of the reporter plasmid in Figures 4 and 6. We do not disagree with the referee's notion that EdU detection may denature the epitope of some proteins. For instance, we have observed a different staining pattern for CD151; for Itgα6 and HS we saw no obvious difference in the staining patterns. In double staining experiments using L1 antibody and click-labeling, both staining patterns overlapped very well, indicating that click-labeling is suitable to visualize PsVs.

      (4) What concentration of TMA-DPH was used?

      TMA-DPH is a poorly water-soluble dye that becomes strongly fluorescent upon insertion into a membrane. Because of its poor water solubility, a precise concentration cannot be given. We added 50 µl of a saturated TMA-DPH solution in PBS to 1 ml of PBS in the imaging chamber. We state this now in the methods section.

      (5) Line 419: This statement is misleading. Although PsV interaction with HSPG on the ECM is crucial for infectious transfer to cells, the majority of the PsV binding on the ECM has been attributed to interaction with laminin 332. Treatment of PsV with heparin causes sequestration to the ECM.

      We are sorry for the confusion and have removed the misleading statement.

      (6) Some reference choices are poor:

      Line 54: Ozbun and Campos, this is not the correct reference

      In the introduction of the review we cited, it is stated that PsVs establish infection via a break in the epithelial barrier. However, we have replaced this reference with a review that focuses more on epithelial wounding: ‘Ozbun, Michelle A. (2019): Extracellular events impacting human papillomavirus infections: Epithelial wounding to cell signaling involved in virus entry. In Papillomavirus research (Amsterdam, Netherlands) 7, pp. 188–192. DOI: 10.1016/j.pvr.2019.04.009.’

      Line 2012: Doorbar et al., this is not the correct reference.

      Thank you for pointing this out (we assume the referee refers to line 104 and not line 2012). We noticed this error during revision. As it is difficult to find a specialized review on this topic, we now cite Ozbun and Campos, 2021, which states that PsVs are ‘structurally and immunologically indistinguishable from lesion- and tissue-derived HPVs.’

      Minor issues:

      (1) It is difficult to appreciate the ECM and cell surface binding pattern from the provided images, which do not even contain an entire cell. We need to see a few representative field views with the ECM delineated with laminin 332 staining, as HS antibodies stain both the ECM and cell surface.

      We now provide overview images in Supplementary Figure 4. The only experiment requiring a clear delineation between ECM and cell surface is the experiment of Figure 4. Here, we do not use the HS as a reference staining because it stains both the ECM and the cell surface.

      (2) For Figure 1E, the cells were only infected for 24 hours. The half-time for infectious internalization of HaCaT cells was shown to be 8 hours for cell-associated PsV and closer to 20 hours for PsV that was associated with the ECM prior to cell association (Becker et al., 2018). Why was such a short infection time chosen?

      During assay establishment, we observed that the luciferase activity is optimal after 24 h.

      (3) Figure 5, the staining of uninfected cells +/- cyto treatment needs to be included.

      Now visible in new Figure 3.

      I am confused by lines 54-57. It seems as if the authors are claiming that HSPGs are not present on the ECM. This sentence, as written, is misleading.

      We agree, and state now on line 58 ‘Here, virions bind to the linear polysaccharide heparan sulfate (HS) that is present in the extracellular matrix (ECM) but as well on the plasma membrane surface. HS is attached to proteins forming so called heparan sulfate proteoglycans (HSPGs).’

      Reviewer #2 (Recommendations for the authors):

      There are further issues that are not pertaining to the study design that I find important.

      (1) It remains speculative whether the virions that are transferred from the ECM are actually structurally modified.

      The newly added Figure 2, showing that leupeptin blocks infection in our assay, suggests that virions indeed are primed.

      (2) The origin of HS correlated with virions on the cell body after transfer is also not clear: does the virus associate with cell surface HS, or does it bring HS from the ECM? Simply staining HS against N-sulfated moieties does not allow such conclusions.

      This issue has been already raised in the public review to which we replied above. In brief, we agree that the transient increase of the PCC between PsVs and HS in the cell body region can be also explained by PsVs coming from the ECM without HS and binding to cell surface HS, or from PsVs binding directly (not via the ECM) to cell surface HSPGs. However, there are two more arguments indicating that PsVs are coated with HS. Please see our detailed reply above.

      (3) Figure 1: There are few, if any, filopodia in untreated cells. It would be good to quantify their abundance to substantiate that resting HaCaT cells are indeed a good model for filopodial transport vs. membrane retraction / spreading. In HaCaT ECM, the virus also binds to laminin-332 for a good part. Would this not also confound the analysis?

      At first glance, the number of filopodia appears to be too low to account for such an efficient transport. However, please note that the formation of filopodia is very dynamic, and that they can form and disappear within minutes (see below). We also often observe many PsVs aligned at one filopodium. Moreover, not every cell periphery exhibits large accumulations of PsVs. Therefore, we believe it is in principle possible that filopodia are largely responsible for the transport. We cannot exclude that we overestimate the transport rate due to partial cell spreading after CytD removal. However, we consider this rather unlikely, as in Figure 2 we observe no increase in the PCC when leupeptin was present during the CytD incubation. Under these conditions, PsVs do not translocate but cells could spread, and this would increase the PCC between PsVs and F-actin if cells spread into the area of accumulated PsVs.

      We now state on line 304: ‘This suggests that the half-time of PsV translocation from the periphery to the cell body is about 15 min. In fact, the half-time maybe longer, as we cannot exclude that cell spreading after CytD removal contributes to less PsVs measured in the cell periphery.’ and on line 477 ‘As mentioned above, the half-time could be longer if cell spreading is in part responsible for the translocation of PsVs onto the cell body. However, we assume that this is rather unlikely, as cell spreading would increase the PCC between PsVs and F-actin under a condition where filopodia mediated transport is blocked but not cell spreading, which is not the case (Figure 2B and D, CytD/leupeptin).’

      (4) Figure 2: This would benefit from live cell analysis. There are considerable amounts of virions on the cell body, which partially contradicts statements from Figure 1.

      Does the referee refer to the images shown in Figure 4 (old Figure 2)? Please note that at CytD/0 min there are hardly any PsVs in the cell body region, the fluorescence (magenta LUT) is autofluorescence (this is explained in the results section). Only at later time points PsVs are in the cell body region.

      The fast transfer to the cell body after cyto D washout is based on the assumption that filopodia formation and transport along them (and not membrane extension) occur quickly. Is this reasonable?

      We are not experts on filopodia, but references suggest that they grow at rates of several µm per minute and have lifetimes between a few seconds and several minutes. Hence, within the 15 min we determine for the transport, cells may need a few minutes to recover from CytD, a few minutes to form filopodia that reach out into the ECM, and a few minutes for the transport itself. However, we agree that we cannot exclude membrane extension contributing to our observed transport, although we consider this rather unlikely (see above).

      (5) Figure 3: The rationale of claiming the existence of 'endocytic structures' needs to be better explained and quantified in the according supplementary figure.

      We now state in the legend ‘We propose that the agglomerated CD151 maxima close to PsVs feature the characteristics of endocytic structures, as CD151 has been shown to co-internalize with PsVs (Scheffer et al. 2013), and as these structures invaginate into the cell, like PsV filled tubular organelles previously described by electron microscopy (Schelhaas et al. 2012).’ For a proper quantification of these highly variable structures a much larger sample would be required.

      The formation of virus-filled tubules upon cytoD treatment has been previously reported. Are these viruses that come from the cell body or from the ECM?

      With the new data and explanations that have been added to the manuscript, it should be clear that it is reasonable to assume that they come largely from the ECM.

      (6) Figure 4: How are the subcellular ROIs chosen? Is there not a bias by not studying a full cell?

      We now explain better how we chose cells for analysis. We state on line 138 ‘Instead, we focus on isolated HaCaT cells or cells at the periphery of cell patches. In these cells, we find more PsVs per cell than one would expect from the employed 50 viral genome equivalents (vge) per cell, as PsVs are unequally distributed between the cells. Moreover, these PsVs usually are not homogenously distributed around the cell but concentrate at one region. We investigate the translocation of PsVs from these regions, defining ROIs for analysis that cover PsVs at the periphery and the cell body (see Supplementary Figures 6A and 8A).’

      (7) Figure 5/6: The data needs a better analysis on correlation by using randomisation as explained above.

      Please see our reply to the same point of the public review raised by the same referee.

      (8) Figure 7: This model involves CD151 being a mediator in transfer, but this has not been functionally shown. There are HaCaT CD151 KO cells available (from the Sonnenberg lab), it would be good to use those to test the model and whether transfer indeed involves CD151.

      As already stated above, we are sorry for having raised the impression that PsVs bind directly to CD151. The model Figure has been modified. Please see our reply above.

      (9) The manuscript would benefit from a number of experiments addressing the most crucial issues:

      (a) As mentioned before, the use of blebbistatin, which blocks myosin II function and arrests actin retrograde flow within seconds of addition, would be a good inhibitor to control for transfer in at least some of the most crucial experiments.

      In Figure 8 we have tested blebbistatin. Please see our reply above.

      (b) Live cell analysis would allow for monitoring of whether membrane retraction upon cytoD treatment would have to be taken into account for the analysis of the data. The same is true for the cytoD washouts, upon which most cells exhibit pronounced membrane spreading. The latter is important to support filopodial transport rather than membrane ruffling and spreading, leading to the clearance of extracellular virions from the ECM.

      We agree that this would be desirable. As replied above, we now discuss the issue of possible membrane spreading and reason why we consider it as rather unlikely.

      (c) To rid oneself of the issue of plasma membrane-bound virions as a confounding factor, one could use cells treated by sodium chlorate, which leads to undersulfation of HS on the cell surface, and seed them onto ECM with functional HSPGs. This would then indeed establish that the HS and virus are transferred together.

      We agree that this would be a smart experiment. As the main focus of our study is not clarifying whether PsVs are coated with HS or not, we gave other experiments priority.

      (10) The manuscript is, while carefully and thoughtfully worded on the issue of microscopy analysis, for a good part, extrapolating too strongly from the authors' data and unsubstantiated assumptions to conclude on their model. It would be good if the authors would support their claims with previous or their own experimental work. Just two examples of several: the assumption that cell-bound virions are negligible should be substantiated, as the literature would indicate otherwise.

      We determined the PsV density in adhered, CytD treated cells, and find around 0.14 per µm<sup>2</sup> (Supplementary figure 1B), which is 4 to 5-fold less when compared to the PsV density quantified in an area covering the cell body and the periphery (Figure 1B, see line 174 for PsVs/µm<sup>2</sup> values). Quantifying the PsV density only in the periphery would yield a severalfold larger difference. However, due to the limited resolution of the microscope we would strongly underestimate the PsV density in the accumulations. We prefer not to discuss this in detail, as exact numbers are difficult to obtain.

      Line 129: Cyto D should not inhibit the enzymes modifying HS or proteins (including virions). This is true, but cytoD may limit their secretion and abundance.

      We show in Figure 3 that CytD does not reduce HS staining, which suggests that it does not limit HS secretion in the way the referee proposes.

      We thank the referees and the reviewing editor for their helpful comments!

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to examine how the covariation between cognition (represented by a g-factor based on 12 features of 11 cognitive tasks) and mental health (represented by 133 diverse features) is reflected in MR-based neural markers of cognition, as measured through multimodal neuroimaging (structural, rsfMRI, and diffusion MR). To integrate multiple neuroimaging phenotypes across MRI modalities, they used a so-called stacking approach, which employs two levels of machine learning. First, they built a predictive model from each neuroimaging phenotype to predict a target variable. Next, in the stacking level, they used predicted values (i.e., cognition predicted from each neuroimaging phenotype) from the first level as features to predict the target variable. To quantify the contribution of the neural indicators of cognition explaining the relationship between cognition and mental health, they conducted commonality analyses. Results showed that when they stacked neuroimaging phenotypes within dwMRI, rsMRI, and sMRI, they captured 25.5%, 29.8%, and 31.6% of the predictive relationship between cognition and mental health, respectively. By stacking all 72 neuroimaging phenotypes across three MRI modalities, they enhanced the explanation to 48%. Age and sex shared substantial overlapping variance with both mental health and neuroimaging in explaining cognition, accounting for 43% of the variance in the cognition-mental health relationship.
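      As a rough illustration of the commonality analysis summarised above, the sketch below partitions explained variance between two predictor sets. The function names and the R² values are hypothetical, and dividing the shared variance by the mental-health R² is only an assumed reading of "proportion of the relationship captured"; it is not taken from the authors' code.

```python
def commonality_two_sets(r2_a, r2_b, r2_ab):
    """Partition R^2 for two predictor sets A and B.

    r2_a  : R^2 of the model using only set A (e.g. mental health)
    r2_b  : R^2 of the model using only set B (e.g. neuroimaging)
    r2_ab : R^2 of the model using A and B together
    """
    return {
        "unique_a": r2_ab - r2_b,       # variance explained by A alone
        "unique_b": r2_ab - r2_a,       # variance explained by B alone
        "common": r2_a + r2_b - r2_ab,  # variance shared by A and B
    }


def proportion_captured(r2_a, r2_b, r2_ab):
    """Share of A's explained variance that is also explained by B (assumed definition)."""
    return commonality_two_sets(r2_a, r2_b, r2_ab)["common"] / r2_a


# Hypothetical values, chosen only to make the arithmetic easy to follow.
parts = commonality_two_sets(r2_a=0.10, r2_b=0.20, r2_ab=0.25)
share = proportion_captured(r2_a=0.10, r2_b=0.20, r2_ab=0.25)
print(parts, round(share, 3))
```

      The three components sum to the joint R², which is a quick sanity check when applying the same partition to real model fits.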

      Strengths:

      (1) A big study population (UK Biobank with 14000 subjects).

      (2) The description of the methods (including Figure 1) is helpful in understanding the approach.

      (3) This revised manuscript is much improved compared to the previous version.

      Weaknesses:

      (1) Although the background and reason for the study are better described in this version of the manuscript, the relevance of the question is, in my opinion, still questionable. The authors aimed to determine whether neural markers of cognition explain the covariance between cognition and mental health and which of the 72 MRI-based features contribute to explaining most of the covariance. I would like to invite the authors to make a stronger case for the relevance, keeping the clinical and scientific relevance in mind (what would you explain to the clinician, what would you explain to the people with lived experience, and how can this knowledge contribute to innovation in mental health care?).

      Thank you for this insightful observation. We agree that establishing the real-world significance of fundamental research is paramount, and we have revised our manuscript to better articulate this relevance.

      For clinicians, our work (a) corroborates the link between cognition and mental health, confirming the transdiagnostic role of cognition, and (b) demonstrates that current neuroimaging tools can capture the neurobiology underlying this relationship. These findings offer several implications for clinical practice. First, they support the development of interventions aimed at enhancing cognitive functioning as a pathway to improving mental health. Second, our work introduces neuroimaging as a potential tool for assessing the neurobiological basis of the cognition–mental health connection. With further research, clinicians may be able to use neuroimaging to track cognitive changes at the neural level, which could help monitor treatment efficacy for interventions (e.g., stimulant medications for ADHD) designed to boost cognitive functioning.

      Following your suggestions, we have expanded the Discussion (Line 684) to include future directions and clinical perspectives on the findings.

      Line 684: “Neuroimaging offers a unique window into the biological mechanisms underlying cognition–mental health overlap – insights unattainable from behavioural data alone. Our findings validate brain-based neural markers as a core unit of analysis for cognitive functioning, advancing mental health research through the lens of cognition. Beyond this conceptual contribution, the study has clinical implications. First, by demonstrating a transdiagnostic link between cognition and mental health, we support interventions that enhance cognition as a pathway to improving mental health. Second, we show neuroimaging as an effective tool for assessing the neurobiological basis of this link. Quantifying neuroimaging’s capacity to capture this relationship is essential for future research integrating imaging with cognitive testing to monitor treatment-related neural changes. Such work could enable personalised interventions, using neuroimaging to track cognitive changes and treatment efficacy (e.g., stimulant medications for ADHD) aimed at boosting cognitive functioning.”

      (2) The discussion on the interpretation of the positive and negative PLSR loadings is not very convincing, and the findings are partly counterintuitive. For example (1) how to explain that distress has a positive loading and anxiety/trauma has a negative loading?; (2) how to explain that mental health features like wellbeing and happiness load in the same direction as psychosis and anxiety/trauma? From both a clinical and a neuroscientific perspective, this is hard to interpret.

      Thank you for pointing this out. We appreciate your concern regarding the interpretation of positive and negative PLSR loadings. To clarify:

      (1) The directions of PLSR loadings are broadly consistent with univariate correlations, suggesting that the somewhat counterintuitive relationships mentioned appear even when we apply simple univariate correlations. PLSR extends beyond univariate approaches by modelling multivariate relationships across features and outcomes. It constructs new components – linear combinations of predictors – that simultaneously explain variance in the predictors and their covariance with the response.

      (2) The positive loading of distress likely reflects cohort-specific questionnaire design in the UK Biobank, where feeling of distress was tied to seeking medical help. Individuals with higher cognition and socioeconomic status may be more likely to seek professional support, which explains the counterintuitive direction.

      (3) The negative loadings of wellbeing and happiness may also reflect cohort-specific effects, such as older age, and align with prior work linking excessive optimism to poorer reasoning and cognitive performance. This suggests that realism or pessimism may sometimes be associated with better cognition, particularly in older adults.

      These points are discussed in detail in the manuscript (Lines 493–545). We have emphasised that some of these findings may be cohort-specific and cited supporting literature, as seen below.

      (1) How to explain that distress has a positive loading and anxiety/trauma has a negative loading?

      Line 493: “The directions of PLSR loadings were broadly consistent with univariate correlations. PLSR extends beyond univariate approaches by modelling multivariate relationships across features and outcomes. Consistently, both univariate correlations and factor loadings derived from the PLSR model indicated that scores for mental distress, alcohol and cannabis use, and self-harm behaviours related positively, and the scores for anxiety, neurological and mental health diagnoses, unusual or psychotic experiences, happiness and subjective well-being, and negative traumatic events related negatively to the g-factor. Positive PLSR loadings of features related to mental distress may indicate greater susceptibility to or exaggerated perception of stressful events, psychological overexcitability, and predisposition to rumination in people with higher cognition [72]. On the other hand, these findings may be specific to the UK Biobank cohort and the way the questions for this mental health category were constructed. In particular, to evaluate mental distress, the UK Biobank questionnaire asked whether an individual sought or received medical help for or suffered from mental distress. In this regard, the estimate for mental distress may be more indicative of whether an individual experiencing mental distress had an opportunity or aspiration to visit a doctor and seek professional help [73]. Thus, people with better cognitive abilities and also with a higher socioeconomic status may indeed be more likely to seek professional help.”

      Line 529: “Consistent with previous studies, we showed that anxiety and negative traumatic experiences were inversely associated with cognitive abilities [90–93]. Anxiety may be linked to poorer cognitive performance via reduced working memory capacity, increased focus on negative thoughts, and attentional bias to threatening stimuli that hinder the allocation of cognitive resources to a current task [94–96]. Individuals with PTSD consistently showed impaired verbal and working memory, visual attention, inhibitory function, task switching, cognitive flexibility, and cognitive control [97–100]. Exposure to traumatic events that did not reach the PTSD threshold was also linked to impaired cognition. For example, childhood trauma is associated with worse performance in processing speed, attention, and executive function tasks in adulthood, and age at a first traumatic event is predictive of the rate of executive function decline in midlife [101,102]. In the UK Biobank cohort, adverse life events have been linked to lower cognitive flexibility, partially via depression level [103].”

      (2) How to explain that mental health features like wellbeing and happiness load in the same direction as psychosis and anxiety/trauma?

      Line 545: “Finally, both negative PLSR loadings and corresponding univariate correlations for features related to happiness and subjective well-being may be specific to the study cohort, as these findings do not agree with some previous research [107–109]. On the other hand, our results agree with the study linking excessive optimism or optimistic thinking to lower cognitive performance in memory, verbal fluency, fluid intelligence, and numerical reasoning tasks, and suggesting that pessimism or realism indicates better cognition [110]. The concept of realism/optimism as indicators of cognition is a plausible explanation for a negative association between the g-factor and friendship satisfaction, as well as a negative PLSR loading of feelings that life is meaningful, especially in older adults who tend to reflect more on the meaning of life [111]. The latter is supported by the study showing a negative association between cognitive function and the search for the meaning of life and a change in the pattern of this relationship after the age of 60 [112]. Finally, a UK Biobank study found a positive association of happiness with speed and visuospatial memory but a negative relationship with reasoning ability [113].”

      (3) The analysis plan has not been preregistered (e.g. at OSF).

      Note: the computational aspects of the methods fall beyond my expertise.

      Thank you for pointing this out. We acknowledge that the analysis plan was not preregistered, as our approach was primarily data‑driven rather than hypothesis‑driven. We essentially applied the machine learning approach to quantify the strength of the cognition-mental health relationship in relation to neuroimaging. To ensure transparency and reproducibility, we have made all analysis code and intermediate outputs publicly available on our GitHub repository (https://github.com/HAM-lab-Otago-University/UKBiobank/) within the constraints of UK Biobank’s ethical policy and provided a detailed description of each methodological step in the Supplementary Materials.

      Reviewer #2 (Public review):

      Summary:

      The goal of this manuscript was to examine whether neural indicators explain the relationship between cognition and mental health. The authors achieved this aim by showing that the combination of MRI markers better predicted the cognition-mental health covariation.

      Strengths:

      The evidence supporting the conclusions is compelling. There is a large sample (UK biobank data) and a clear description of advanced analyses.

      Weaknesses:

      In the previous version of the paper, it was not completely clear what it means to look at the overlap between cognition and mental health. The authors have addressed this in the current version.

      Thank you for your positive feedback and for recognizing the strengths of our work. We appreciate your comments and are happy that the revisions addressed your concerns.

    1. Author response:

      A more in-depth explanation of marker panel applications is needed. Specifically, how should users interpret gene panels where individual genes show only moderate or low expression levels, but the combination provides high specificity? Providing a concrete example, along with guidelines for interpreting such combinatorial signatures, would enhance the practical utility of the method.

      We appreciate the need to explain and demonstrate how to use the novel combinatorial gene marker sets that CellCover generates. To be clear, individual genes expressed at low levels and in small numbers of cells, in general, have high specificity (the ability to mark cells of a particular type without erroneously marking other cells as this type) and are often used in combinations by CellCover to achieve a panel of genes with high sensitivity (the ability to mark all cells of a particular type). Low or sparsely expressed genes of this type may represent poorly measured genes (i.e. zero inflation known to occur in single-cell data, where genes are measured as zero in cells which actually express the gene) or may represent genes which are truly expressed only in a subset of the annotated class. Because CellCover can borrow strength across genes, it can harness the true information in either class of genes, even if affected by zero inflation. Further investigation of structure within the cell class (and across other cell classes) using the CellCover gene marker panel, as well as other genes, is necessary to clarify this issue in any particular analysis. In the manuscript, we evaluate the expression of individual genes within and across classes in this manner to understand deeper structure in Figures 1A, S6 and S8.

      To demonstrate how CellCover selects individual genes with high specificity and low sensitivity, but which are complementary to one another, in order to achieve high collective sensitivity, here we consider a hypothetical dataset of many cells where we focus on one cell class that contains 100 cells composed of four subtypes.

      - Subtype A: cells 1–20

      - Subtype B: cells 21–30

      - Subtype C: cells 31–50

      - Subtype D: cells 51–100

      To illustrate how CellCover evaluates marker gene panels, in this example, the genes under investigation have very different weights (i.e. the ratio of a gene’s expression in the cell class of interest versus its expression in other cells). Suppose we have two candidate marker panels:

      Panel 1 (coarse markers).

      - Gene A: covers cells 1–30 (weight = 0.4)

      - Gene B: covers cells 30–60 (weight = 0.3)

      - Gene C: covers cells 60–100 (weight = 0.2)

      Each gene in this panel covers a relatively large portion of the population (> 30%), but their weights are comparatively high, indicating limited specificity to the focal cell type. Although the panel {A,B,C} attains full coverage, its markers are coarse and nonspecific.

      Panel 2 (fine-grained, combinatorial markers).

      - Gene A’: covers cells 1–20 (weight = 0.05)

      - Gene B’: covers cells 20–30 (weight = 0.10)

      - Gene C’: covers cells 30–50 (weight = 0.05)

      - Gene D’: covers cells 50–100 (weight = 0.10)

      Each marker is expressed in a smaller fraction of the population (individually low sensitivity), but the weights are substantially lower, reflecting strong subtype specificity. Importantly, these genes are complementary: their union covers all 100 cells (high combinatorial sensitivity), even though no single gene spans more than 20–50% of the cells.

      Under a strict covering requirement (e.g., α = 0, requiring 100% coverage, i.e. perfect sensitivity), both panels satisfy the constraint. However, CellCover selects the second panel because its total weight is smaller, indicating higher specificity. This preference reflects the design of the objective function: the method favors markers that are highly cell-type-specific, even if they individually cover only a subset of the population, as long as together they yield full coverage. As a result, CellCover can reveal refined subtype structure within what appears to be a single cell population.
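      The comparison of the two hypothetical panels can be checked with a few lines of code. The sketch below only scores the two panels listed above (coverage and total weight) and picks the lighter feasible one; it is not CellCover's actual optimization routine, and all function and variable names are illustrative assumptions.

```python
def cells(lo, hi):
    """Inclusive range of cell indices, e.g. cells(1, 20) -> {1, ..., 20}."""
    return set(range(lo, hi + 1))


# Each gene maps to (set of cells it covers, weight), as in the example above.
PANEL_1 = {
    "A": (cells(1, 30), 0.4),
    "B": (cells(30, 60), 0.3),
    "C": (cells(60, 100), 0.2),
}
PANEL_2 = {
    "A'": (cells(1, 20), 0.05),
    "B'": (cells(20, 30), 0.10),
    "C'": (cells(30, 50), 0.05),
    "D'": (cells(50, 100), 0.10),
}


def coverage(panel, n_cells=100):
    """Fraction of the class covered by at least one gene in the panel."""
    covered = set().union(*(covered_cells for covered_cells, _ in panel.values()))
    return len(covered & cells(1, n_cells)) / n_cells


def total_weight(panel):
    """Sum of gene weights; lower means more specific in this example."""
    return sum(weight for _, weight in panel.values())


def pick_panel(panels, alpha=0.0, n_cells=100):
    """Among panels covering at least a fraction 1 - alpha of cells, return the lightest.

    Raises ValueError if no panel satisfies the coverage constraint.
    """
    feasible = {name: p for name, p in panels.items()
                if coverage(p, n_cells) >= 1 - alpha}
    return min(feasible, key=lambda name: total_weight(feasible[name]))


candidates = {"panel_1": PANEL_1, "panel_2": PANEL_2}
for name, panel in candidates.items():
    print(name, coverage(panel), round(total_weight(panel), 2))
print("selected:", pick_panel(candidates))
```

      Both panels reach full coverage, so the choice is decided entirely by the summed weights (0.9 for Panel 1 versus 0.3 for Panel 2).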

      Interpretation guidelines. We explicitly note that CellCover marker panels should be interpreted as combinatorial signatures:

      - Individual genes may show localized, subtype-restricted expression.

      - The union of their expression defines the target cell type.

      - Low-weight genes are more specific; CellCover therefore prioritizes them whenever they provide complementary coverage.

      - The resulting panel may highlight latent heterogeneity or subpopulations within the cell type that express different subsets of the markers.

      In addition to these technical guidelines for interpreting gene panels, throughout the manuscript we use the transfer of CellCover marker gene panels to related datasets to assess the biological function of the gene sets. We propose this as a general tool in the examination of gene lists and have implemented methods to visualize the expression of any gene list (including gene lists uploaded by users) using the Projection Tool within NeMO Analytics.

      Further quantification of CellCover’s sensitivity in detecting rare cell subtypes or states would strengthen the evaluation of its performance. Additionally, it would be helpful to assess how CellCover performs under noisy conditions, such as low cell numbers or read depths, which are common challenges in scRNA-seq datasets.

      While CellCover is a method to define marker gene panels for cell classes that are already defined in a dataset, its performance on rare cell classes, small numbers of cells and low read depths is still a relevant issue. The analyses in the paper can speak to some of these concerns: The Telley dataset, which we use throughout the manuscript, used FlashTag labeling of cells prior to sequencing in order to ascertain the time since terminal division for each cell. This unique metadata linked to each cell’s expression data enabled many of the analyses we performed in the paper, but also limited the number of cells that were sequenced. For this reason, the number of cells in this dataset (total cells = 2756) is much lower than that seen in the vast majority of other single-cell sequencing studies, including those we use for the transfer of marker gene sets defined by CellCover in the Telley data. As a result, the cell classes for which we define marker gene panels in the paper contain relatively small numbers of cells. This is especially true in the 12-class analysis in Figures 4 and 5 where CellCover successfully defines gene panels for all 12 classes which transfer well to other datasets. Total cells per class range from 134 to 301. Figure S6 shows that the discriminative power of the 12 gene panels varied widely, with the most highly discriminative panel being from the E12.1H condition (with only 189 cells).

      In addition, we note that the behavior of CellCover on rare (or any) cell classes can be characterized deterministically under mild conditions. For a fixed cell class and a required covering rate of 1, a depth-k covering gene panel exists if and only if every cell in the class expresses at least k genes. Under this condition, CellCover is guaranteed to find a covering panel of depth k. Importantly, this guarantee does not impose any restriction on the panel size. Consequently, the compactness of the resulting panel reflects intrinsic properties of the data rather than algorithmic limitations: a small panel indicates that a subset of genes is robustly and consistently expressed across most cells in the class, even if the class itself is rare, whereas a large panel suggests highly heterogeneous expression patterns, where different genes are expressed in different cells. In this sense, the feasibility and structure of a covering panel are determined by the biological and technical characteristics of the dataset (e.g., read depth, expression sparsity, and the specificity of gene expression in the defined cell classes), rather than by the performance of CellCover itself.
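      The existence condition stated above is easy to check directly on binarized expression data. The following is a minimal sketch, assuming each cell is represented by the set of genes it expresses; the toy data and function names are hypothetical and are not part of the CellCover package.

```python
def depth_k_panel_exists(expr, k):
    """True if every cell expresses at least k genes (covering rate of 1).

    expr: dict mapping cell id -> set of genes detected in that cell.
    """
    return all(len(genes) >= k for genes in expr.values())


def covering_depth(expr, panel):
    """Smallest number of panel genes expressed in any single cell."""
    panel = set(panel)
    return min(len(genes & panel) for genes in expr.values())


# Toy class of three cells (hypothetical gene names).
toy_class = {
    "cell_1": {"g1", "g2"},
    "cell_2": {"g2", "g3"},
    "cell_3": {"g1", "g3", "g4"},
}

print(depth_k_panel_exists(toy_class, 2))                  # every cell has >= 2 genes
print(depth_k_panel_exists(toy_class, 3))                  # cell_1 and cell_2 have only 2
print(covering_depth(toy_class, {"g1", "g2", "g3", "g4"}))  # depth of the full gene set
print(covering_depth(toy_class, {"g2", "g3"}))              # depth of a smaller panel
```

      Taking all genes as the panel always attains the maximum feasible depth, which is why the condition constrains feasibility but says nothing about how compact the final panel will be.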

      It is intriguing and novel that CellCover analysis of the dataset from Telley et al. suggests cell-type-specific expression of ribosomal, mitochondrial, or tRNA genes. These findings would be significantly strengthened by additional validation. For example, the reported radial glia-specific expression of Rps18-ps3 and Rps10-ps1, as well as the postmitotic neuron-specific expression of mt-Tv and mt-Nd4l, should be corroborated using independent scRNA-seq or spatial transcriptomic datasets of the developing neocortex. Alternatively, these expression patterns could be directly examined through immunostaining or single-molecule FISH analysis.

      The main problem with such analysis is that most studies have omitted the expression of these genes (especially mitochondrial genes that are primarily viewed as QC metrics) from their datasets. We encourage researchers to retain the expression of these transcripts in their data so that their biological functions can be explored. Where available, the expression of these genes can be visualized in NeMO Analytics in the mouse where the enrichment of Rps18-ps3 expression in radial glia can be seen in the Di Bella 2021 dataset and in the human where the expression of mt-Tv can be seen in neurons in the Polioudakis 2019, Darmanis 2015, Camp 2015, and Liu 2016 datasets.

      Taking a broader perspective, a growing body of foundational work in developmental neurobiology supports the observation that mitochondrial state and metabolic programs undergo systematic changes during neuronal differentiation, consistent with our CellCover findings. For example, Khacho 2016 demonstrated that mitochondrial dynamics are essential regulators of neuronal fate commitment and that the maturation of the mitochondrial network is essential for the transition from the progenitor metabolic state to the neuronal state. Iwata 2020 further highlight cell type specific mitochondrial dynamics by showing that daughter cells with highly fragmented mitochondria tend to become neurons.

      The observation that outer radial glia (oRG) markers are expressed in neural progenitors before the emergence of gliogenic progenitors in primates and humans is compelling. This could be further supported by examining the temporal and spatial expression patterns of early oRG-specific markers versus gliogenic progenitor markers in recent human spatial transcriptomic datasets - such as the one published by Xuyu et al. (PMID: 40369074) or Wang et al. (PMID: 39779846).

      We have added the scRNA-seq data from Wang et al., as well as data from the Nano et al. 2025 meta-atlas, to the NeMO Analytics data collection. oRG markers from Liu et al. 2023 can now be visualized across the Wang, Nano and many more human in vivo datasets. In the Nano data, these oRG markers can be seen increasing in expression in the human neocortex from GW7-12, leading into peak neurogenesis and prior to gliogenesis. Although with lower age resolution, the peaking of oRG markers in the 2nd trimester (during peak neurogenesis) and their precipitous drop in the 3rd trimester (during peak gliogenesis) can also be seen in the Wang data. At NeMO Analytics, individual marker genes of oRGs can also be visualized in these datasets.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      MPRAs are a high-throughput and powerful tool for assaying the regulatory potential of genomic sequences. However, linking MPRA-nominated regulatory sequences to their endogenous target genes and identifying the more specific functional regions within these sequences can be challenging. MPRAs that tile a genomic region, and saturation mutagenesis-based MPRAs, can help to address these challenges. In this work, Tulloch et al. describe a streamlined MPRA system for the identification and investigation of the regulatory elements surrounding a gene of interest with high resolution. The use of BACs covering a locus of interest to generate MPRA libraries allows for an unbiased and high-coverage assessment of a particular region. Follow-up degenerate MPRAs, where each nucleotide in the nominated sequences is systematically mutated, can then point to key motifs driving their regulatory activity. The authors present this MPRA platform as straightforward, easily customizable, and less time- and resource-intensive than traditional MPRA designs. They demonstrate the utility of their design in the context of the developing mouse retina, where they first use the LS-MPRA to identify active regulatory elements for select retinal genes, followed by d-MPRA, which allowed them to dissect the functional regions within those elements and nominate important regulatory motifs. These assays were able to recapitulate some previously known cis-regulatory modules (CRMs), as well as identify some new potential regulatory regions. Follow-up experiments assessing co-localization of the gene of interest with the CRM-linked GFP reporter in the target cells, and CUT&RUN assays to confirm transcription factor binding to nominated motifs, provided support linking these CRMs to the genes of interest. Overall, this method appears flexible and could be an easy-to-implement tool for other investigators aiming to study their locus of interest with high resolution.

      Strengths:

      (1) The method of fragmenting BACs allows for high, overlapping coverage of the region of interest.

      (2) The d-MPRA method was an efficient way to identify key functional transcription factor motifs and nominate specific transcription factor-driven regulatory pathways that could be studied further.

      (3) Additional assays like co-expression analyses using the endogenous gene promoter, and use of the Notch inhibitor in the case of Olig2, helped correlate the activity of the CRMs to the expression of the gene of interest, and distinguish false positives from the initial MPRA.

      (4) The use of these assays across different time points, tissues, and even species demonstrated that they can be used across many contexts to identify both common and divergent regulatory mechanisms for the same gene.

      Weaknesses:

      The LS-MPRA assay most strongly identified promoters, which are not usually novel regulatory elements you would try to discover, and the signal-to-noise ratio for more TSS-distal, non-promoter regulatory elements was usually low, making it difficult to discriminate lower activity CRMs, like enhancers, from the background. For example, NR2 and NR3 in Figure 3 have very minimal activity peaks (NR3 seems non-existent). The ex vivo data in Figure 2 are similarly noisy. Is there a particular metric or calculation that was or could be used to quantitatively or statistically call a peak above the background? The authors mention in the discussion some adjustments that could reduce the noise, such as increased sequencing depth, which I think is needed to make these initial LS-MPRA results and the benchmarking of this assay more convincing and impactful.

      Much of the statistical and quantitative data asked for by the Reviewers has been provided in the Revision. However, it is important to note that the types of statistics using peak callers asked for regarding candidate choice will be of limited value. If one is testing a library in a single cell type in vitro, and/or running genome-wide assays, these statistics could aid in the choice of candidates. However, here we are electroporating a complex and dynamic set of cells, with each cell type present at what can be a very different frequency (e.g. Olig2-expressing cells are <2.4% of cells). This fact alone will give different apparent signal-to-noise values. In addition, at least for Olig2 and Ngn2, their expression is very transient, suggesting dynamic regulation by what are likely multiple positive and negative CRMs. An additional confound is that the level of expression of each gene that one might test is variable. All of these variables render a statistical prediction of candidates less valuable than one might hope, and might lead one to miss those CRMs of interest, particularly those in a small subset of cells. Instead, we suggest that one use one’s own level of interest and knowledge in choosing CRM candidates. We provide several examples of experimental, rather than purely statistical, approaches that might help in one’s choice of candidates. We used a functional read-out of CRM activity (Notch perturbation), carried out in the context of the entire LS-MPRA library, as one method. Co-expression in single cells of candidate regulators identified by the d-MPRA is another. One can of course use chromatin structure and sequence conservation, as used in many studies of regulatory regions, as other ways to narrow down candidates. The d-MPRA predictions also can be viewed in light of previous genetic studies, i.e. mutations in TFs that affect the cell type of interest or the regulation of the gene of interest, as we were able to do here for CRMs predicted to be regulated by Otx2.

      Reviewer #2 (Public review):

      Summary:

      In this study, Tulloch et al. developed two modified massively parallel reporter assays (MPRAs) and applied them to identify cis-regulatory modules (CRMs) - genomic regions that activate gene expression, controlling retinal gene expression. These CRMs usually function at specific developmental stages and in distinct cell types to orchestrate retinal development. Studying them provides insights into how retinal progenitor cells give rise to various retinal cell types.

      The first assay, named locus-specific MPRA (LS-MPRA), tests all genomic regions within 150-300 kb of the gene of interest, rather than relying on previously predicted candidate regulatory elements. This approach reduces potential bias introduced during candidate selection, lowers the cost of synthesizing a library of candidate sequences, and simplifies library preparation. The LS-MPRA libraries were electroporated into mouse retinas in vivo or ex vivo. To benchmark the method, the authors first applied LS-MPRA near stably expressed retinal genes (e.g., Rho, Cabp5, Grm6, and Vsx2), and successfully identified both known and novel CRMs. They then used LS-MPRA to identify CRMs in embryonic mouse retinas, near Olig2 and Ngn2, genes expressed in subsets of retinal progenitor cells. Similar experiments were conducted in chick retinas and postnatal mouse retinas, revealing some CRMs with conserved activity across species and developmental stages.

      Although the study identified CRMs with robust reporter activity in Olig2+ or Ngn2+ cells, the data do not provide sufficient evidence to support the claims that these CRMs regulate Olig2 or Ngn2, rather than other nearby genes, in a cell-type-specific manner. For example, the authors propose that three regions (NR1/2/3) regulate Olig2 specifically in retinal progenitor cells based on: (1) the three regions are close to Olig2, (2) increased Olig2 expression and NR1/2/3 activity upon Notch inhibition, and (3) reporter activity observed in Olig2+ cells (though also present in many Olig2- cells). While these are promising findings, they do not directly support the claims.

      The second assay, called degenerate MPRA (d-MPRA), introduces random point mutations into CRMs via error-prone PCR to assess the impact of sequence variations on regulatory activity. This approach was used on NR1/2/3 to identify mutations that alter CRM activity, potentially by influencing transcription factor binding. The authors inferred candidate transcription factors, such as Mybl1 and Otx2, through motif analysis, co-expression with Olig2 (based on single-cell RNA-seq), and CUT&RUN profiling. While some transcription factors identified in this way overlapped with the d-MPRA results, others did not. This raises questions about how well d-MPRA complements other methods for identifying transcriptional regulators.

      Strengths:

      (1) The study introduces two technically robust MPRA protocols that offer advantages over standard methods, such as avoiding reliance on predefined candidate regions, reducing cost and labor, and minimizing selection bias.

      (2) The identified regulatory elements and transcription factors contribute to our understanding of gene regulation in retinal development and may have translational potential for cell-type-specific gene delivery into developing retinas.

      Weaknesses:

      (1) The claims for gene-specific and cell type-specific CRMs would benefit from further validation using complementary approaches, such as CRISPR interference or Prime editing.

      The methods that we developed were meant to provide candidates for regulatory elements for a gene of interest. These candidates could be used to further understand the regulation of a gene, a complex and difficult task, especially for dynamically regulated genes in the context of development. These candidates could also, or instead, be used to drive gene expression specifically in a target cell of interest for applications such as gene therapy or perturbations that need this type of specificity. In the first case, to use the candidates to understand the regulation of a gene, one would need to validate the candidates using the types of methods typically employed for this purpose, most rigorously in the in vivo genomic context. We did not pursue this level of validation as it would encompass a great deal of work outside the scope of the current study. However, by initially testing loci which have been studied by several groups (as cited in the manuscript, Rho, Grm6, Vsx2, and Cabp5), we were able to show that LS-MPRA can identify known CRMs. In the cases of Rho and Vsx2, previous data have shown the CRMs to be relevant in the genomic context in vivo. In addition, two Vsx2 CRMs identified by LS-MPRA are located at -37 kb and -17 kb, and the Grm6 CRM identified by LS-MPRA is at -8 kb. These are the same CRM locations identified previously using classical methods. These data show that the method is capable of identifying distal elements. When one has only one or a few loci of interest, i.e. one does not need to use genome-wide approaches, LS-MPRA is accurate enough to be worth the relatively small effort to identify potential CRMs, even those at some distance from the TSS. However, it is apparent that our methods are not perfect and that the LS-MPRA does not pick up all CRMs. We do not know of a method that has been shown to do so.

      Reviewer #3 (Public review):

      Summary:

      Use of reporter assays to understand the regulatory mechanisms controlling gene expression moves beyond simple correlations of cis-regulatory sequence accessibility, evolutionary sequence conservation, and epigenetic status with gene expression, instead quantifying regulatory sequence activity for individual elements. Tulloch et al. provide a systematic characterization of two new reporter assay techniques (LS-MPRA and d-MPRA) to comprehensively identify cis-regulatory sequences contained within genomic loci of interest during retinal development. The authors then apply LS-MPRA and d-MPRA to identify putative cis-regulatory sequences controlling Olig2 and Ngn2 expression, including potential regulatory motifs that known retinal transcription factors may bind. Transcription factor binding to regulatory sequences is then assessed via CUT&RUN. The broader utility of the techniques is then highlighted by performing the assays across development, across species, and across tissues.

      Strengths:

      (1) The authors validate the reporter assays on retinal loci for which the regulatory sequences are known (Rho, Vsx2, Grm6, Cabp5) mostly confirming known regulatory sequence activity but highlighting either limitations of the current technology or discrepancies of previous reporter assays and known biology. The techniques are then applied to loci of interest (Olig2 and Ngn2) to better understand the regulatory sequences driving expression of these transcription factors across retinal development within subsets of retinal progenitor cells, identifying novel regulatory sequences through comprehensive profiling of the region.

      (2) LS-MPRA provides broad coverage of loci of interest.

      (3) d-MPRA identifies sequence features that are important for cis-regulatory sequence activity.

      (4) The authors take into account transcript and protein stability when determining the correlation of putative enhancer sequence activity with target gene expression.

      Weaknesses:

      (1) In its current form, the many important controls that are standard for other MPRA experiments are not shown or not performed, limiting the interpretations of the utility of the techniques. This includes limited controls for basal-promoter activity, limited information about sequence saturation and reproducibility of individual fragments across different barcode sequences, limitations in cloning and assay delivery, and sequencing requirements. Additional quantitative metrics, including locus coverage and number of barcodes/fragments, would be beneficial throughout the manuscript.

      We thank the reviewer for these comments and have provided detailed responses to the additional analyses in the subsequent Recommendations section.

      (2) There are no statistical metrics for calling a region/sequence 'active'. This is especially important given that NR3 for Olig2 seems to have a small 'peak' and has non-significant activity in Figure 4.

      See comments about peak calling in our response to Reviewer #1.

      (3) The authors present correlational data for identified cis-regulatory sequences with target gene expression. Additionally, the significance of transcription factor binding to the putative regulatory sequences is not currently tested, only correlated based on previous single-cell RNA-sequencing data. While putative regulatory sequences with potential mechanisms of regulation are identified/proposed, the lack of validation (and discrepancies with previous literature) makes it hard to decipher the utility of the techniques.

      See comments about further validation in our response to Reviewer #2.

      (4) While the interpretations that Olig2 mRNA/protein expression is dynamically regulated improved the proportions of cells that co-expressed CRM-regulated GFP and Olig2, alternate explanations (some noted) are just as likely. First, the electroporation isn't specific to Olig2+ progenitors. Also, the tested, short CRM fragments may have activating signals outside of Olig2 neurogenic cells because chromatin conformation, histone modifications, and DNA methylation are not present on plasmids to precisely control plasmid activity. Alternatively, repressive elements that control Olig2 expression are not contained in the reporter vectors.

      The electroporation of Olig2 minus and plus cells is an excellent way to determine if a CRM is active in all cells, or only a specific subset, and we therefore consider this the best way to answer the question of specificity. We agree that we were unable to show that all CRM active cells were indeed Olig2-expressing cells. As noted by the Reviewer, we went to some lengths to quantify RNA and protein co-expression, including of endogenous Olig2 protein and RNA. Even with the endogenous RNA and protein, there was a mismatch wherein one infrequently saw the two together in the same cell, which could be predicted from the short half-lives of these molecules. Regarding chromatin, etc., we are intrigued by the proper regulation that we have observed for CRMs that we have previously discovered by plasmid electroporation (e.g. Kim et al. 2008, Matsuda and Cepko, 2004, Wang et al. 2014, Emerson et al. 2013). It is indeed interesting that plasmids can recapitulate proper regulation, without the proper genomic context or chromatin modifications. We have expanded our discussion of these points in the Discussion.

      (5) It is unclear as to why the d-MPRA uses a different barcoding strategy, placing a second copy of the cis-regulatory sequence in the 3' UTR. As acknowledged by the author, this will change the transcript stability by changing the 3' UTR sequence. Because of this, comparisons of sequence activity between the LS-MPRA and d-MPRA should not be performed as the experiments are not equivalent.

      We had provided a rationale for the different strategies of barcoding in the original submission, and believe it is at the discretion of the experimenter to utilize either strategy for their specific purposes. We agree that comparing activity between different techniques would not be appropriate. The analysis of mutated CRMs using d-MPRA does not utilize data from the LS-MPRA, but is an analysis of relative activity among all mutated d-MPRA constructs.

      (6) Furthermore, details of the mutational burden in d-MPRA experiments are not provided, limiting the interpretations of these results.

      We have provided detailed responses to the additional analyses in the subsequent Recommendations section and included details of the mutational burden in Supplemental Document A.

      (7) Many figures are IGV screenshots that suffer from low resolution. Many figures could be consolidated.

      We have increased the resolution of all IGV genome tracks, but believe the content within all figures remains appropriate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Suggestions for improving the clarity of the results in the figures:

      (1) The pie charts used to show the percentage of overlapping cells in the colocalization analyses were not especially intuitive to read, and although the percentages and any statistical significance were often written in the text, it would've been helpful to have them written in the figures. I would suggest displaying the results in stacked bar plots, possibly like the one shown in Figure 6A, to demonstrate the data more clearly.

      We thank the reviewer for the suggestions. Though adding the percentages directly to the pie charts would make the relevant panels too confusing to interpret, we added supplemental tables (Tables S5-S9) with the percentages displayed in all pie charts for readers interested in the precise quantifications.

      (2) The scRNA-seq UMAPs showing co-expression of Olig2 with the TFs of interest - it is very hard to see the cells that co-express. I would recommend either having a window zoomed in on the Olig2-expressing cell population to be able to see the co-expression more clearly visually, and/or including a graph demonstrating the percentages of co-expressing cells. These numbers were written in the text, but would be useful to see in the figure.

      The resolution of the scRNA-Seq plot has been improved for the visualization of co-expressing cells, which were also brought forward in all UMAP plots to improve clarity. Because of the higher quality images, insets should no longer be necessary. We have also included percentages of co-expression in the figures (Figs. 8 and 8S) and thank the reviewer for the suggestion.

      Other minor suggestions/corrections:

      (3) Figures 6B and 10S are missing the overlap quantification (in bar or pie charts) like in the other figures.

      The quantification for the image in 6B (i.e., GFP fluorescence and GFP RNA) is displayed in 6D for the four Olig2 CRM plasmid constructs. In Fig. 10S, the experiments in early chick ventral neural tube delivered constructs to a very limited number of cells, and quantification of cells would not necessarily represent an accurate number of cells with CRM activity. We therefore decided to show only representative images of CRM activity in this population of cells rather than present a biased count or increase the number of experiments/samples to obtain a robust quantification.

      (4) On the second-to-last line of page 10, in the sentence "The d-MPRA approach provided a robust, high resolution method for functionally relevant TF binding sites....", I think you're missing a word between "for" and "functionally". For example, it might be "for identifying..." or "for nominating...".

      We have revised the sentence accordingly.

      Reviewer #2 (Recommendations for the authors):

      Minor suggestions:

      (1) Please indicate which mouse reference genome (e.g., mm10) was used in plots such as Figure 2.

      We have added text to the relevant sections in the Results (the reference genome was already mentioned in Methods).

      (2) In Figures 2 and 2S, the CRMs discussed in the text are not labeled or highlighted, making it unclear which regions are being referenced.

      We have labeled peaks with Roman numerals in the figures, legends, and text for clarity and thank the reviewer for the suggestion.

      (3) Consider listing the genomic coordinates for the CRMs mentioned in the text, as this information would be especially useful for readers interested in exploring these regions further.

      This information was included in Table 2S in the original submission, with all relevant coordinates provided therein.

      (4) The d-MPRA plots (e.g., Figure 7C-E) do not clearly show the effects of different nucleotide substitutions. A more informative visualization style can be found in Kircher et al (PMID: 31395865, Fig. 1D) or Deng et al (PMID: 38781390, Fig. 5F).

      Displaying the precise nucleotide substitutions would help visualize the effects of specific changes. However, we were more interested in how any nucleotide substitution influenced the CRM activity, in order to home in on relevant TFBS. We therefore believe the current visualization is the most appropriate to accomplish this. However, for some types of future applications, a more informative visualization as noted would be a valuable addition.

      (5) It would be extremely helpful to the community if the LS-MPRA data were uploaded to the UCSC genome browser and made accessible via a link.

      We have uploaded all LS-MPRA genome tracks to a Track Hub in the UCSC genome browser and provided the appropriate link to access the Hub (https://github.com/cattapre/ALAS00) in the methods section.

      Reviewer #3 (Recommendations for the authors):

      (1) The authors should address the following metrics to showcase the utility of the techniques:

      We thank the reviewer for requesting the detailed metrics outlined below. We have addressed all inquiries and included the majority of metrics in the resubmission.

      (a) Library size

      This should be shown for each library that is generated. It is acknowledged that the complete size of the library is limited by sequencing, and the comprehensiveness of the library will change every time the library is re-prepped. However, metrics of this are not currently provided in a robust manner for each library. "Libraries of at least 7x10^6 and as many as 9x10^7 fragments are made" - vague - how was library complexity established since this seems to be an estimation, how many reads were utilized to estimate library complexity?

      We created a new supplemental table (Table S3) that displays the complexity based on sequencing rather than the estimated complexity based on the serial dilutions prior to 3D culture (which was used for the estimates listed in the results). We updated the complexity range in the text as well and thank the reviewer for the suggestion.

      Does library size scale proportionally to the BACs of different sizes?

      The fragmentation of different BACs with differing sizes does not necessarily alter the size of the library. Library size is primarily determined by the library creation pipeline, with the size selection step of the fragmented BAC and the cloning step that inserts adapter-ligated fragments into the barcoded expression vector being the primary determinants of complexity of plasmid libraries.

      (b) Sequence saturation

      Can the authors please provide evidence that the libraries have been sequenced to saturation or estimates of the degree of under-sequencing? How many reads does it take to discover a new barcode associated with a new regulatory sequence?

      We have provided library characteristics for this in Table S3 and have also generated Sequence Saturation Curves for each association library in Supplemental Document A.
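
      As an illustration of what a saturation curve measures, the following is a minimal sketch and not the analysis used for Supplemental Document A. It assumes that each sequencing read has already been reduced to its barcode string; the function name `saturation_curve` and the toy data are hypothetical. The curve counts how many distinct barcodes are seen at increasing read depths; a curve that flattens suggests that additional reads would recover few new barcodes.

```python
import random


def saturation_curve(reads, depths, seed=0):
    """Return (depth, n_unique_barcodes) pairs for nested random subsamples.

    `reads` is a sequence of barcode strings, one per sequencing read.
    The reads are shuffled once and prefixes of increasing length are used,
    so the unique count can never decrease as depth increases.
    """
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)
    curve = []
    for depth in depths:
        depth = min(depth, len(shuffled))  # cannot sample more reads than exist
        curve.append((depth, len(set(shuffled[:depth]))))
    return curve


# Hypothetical toy data: 1,000 reads drawn at random from 50 possible barcodes.
toy_rng = random.Random(1)
toy_reads = [f"BC{toy_rng.randrange(50):02d}" for _ in range(1000)]

curve = saturation_curve(toy_reads, depths=[10, 100, 500, 1000])
for depth, n_unique in curve:
    print(depth, n_unique)
```

      With real data, the depth at which the curve plateaus gives a rough estimate of how far a library is from being sequenced to saturation.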

      (c) Barcode saturation

      How many barcodes are present for each fragment in the libraries? Are most fragments only covered by 1 barcode? The barcoding strategy doesn't prevent the same barcode from being assigned to multiple different fragments, as barcodes are random. What is the incidence of barcode collisions?

      We have provided library characteristics for this in Table S3 and have also generated Barcode Saturation Curves for each association library in Supplemental Document A.

      Additionally, we tested whether the omission of barcode collisions would affect the output of our LS-MPRA. We reanalyzed one barcode abundance library (one replicate following 12h Notch inhibitor) and filtered the barcodes so that only unique barcodes were analyzed. We were able to replicate all previously identified peaks. Though it is not necessary to filter out barcode collisions, there may be an improvement in signal-to-noise if the sequencing depth of libraries was sufficient (see Supplemental Document B).
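
      The filtering step described above can be sketched as follows. This is an illustrative example under stated assumptions, not the code used for Supplemental Document B: the association library is assumed to be available as (barcode, fragment) pairs, and the function name `unique_barcodes` and the toy pairs are hypothetical. A barcode is kept only if every read carrying it points to the same fragment.

```python
from collections import defaultdict


def unique_barcodes(associations):
    """Keep only barcodes that are linked to exactly one fragment.

    `associations` is an iterable of (barcode, fragment_id) pairs taken from
    an association library. Barcodes seen with more than one fragment
    (collisions) are dropped.
    """
    fragments_by_barcode = defaultdict(set)
    for barcode, fragment in associations:
        fragments_by_barcode[barcode].add(fragment)
    return {
        barcode: next(iter(fragments))
        for barcode, fragments in fragments_by_barcode.items()
        if len(fragments) == 1
    }


# Hypothetical toy association pairs; "GGTT" collides (frag2 and frag3).
toy_pairs = [
    ("AAAC", "frag1"),
    ("AAAC", "frag1"),
    ("GGTT", "frag2"),
    ("GGTT", "frag3"),
    ("CCTA", "frag3"),
]

kept = unique_barcodes(toy_pairs)
print(kept)
```

      Dropping collided barcodes trades read depth for cleaner barcode-to-fragment assignment, which is why the benefit depends on how deeply the libraries were sequenced.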

      (d) Normalization

      As performed, fragment activity is normalized by RNA expression compared to the presence of fragments in the library. While this is done for small libraries, for large libraries, this may not be appropriate. For large libraries, every sequence in the library will not be delivered to each cell, and many fragments contained in the library may not be electroporated at all. Ideally, the authors would have sequenced both the RNA and DNA from the electroporations to i) identify the fragment distribution of the library that was successfully electroporated and ii) provide an internal normalization factor across replicate samples. This is especially important if the libraries were ever re-prepped, as the jack-potting or asymmetries in fragment recovery can occur every time the library is re-derived.

      We agree with the reviewer’s comments about the variability in fragments delivered experimentally, though we also believe the normalization of the libraries is still appropriate. We never needed to re-prep the libraries as there was sufficient material for many more experiments than were performed. However, should one ever need to re-prep an LS-MPRA library, all experimental sequencing should be normalized to the respective sequenced association library to account for biased distributions, as the reviewer mentions.
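
      For readers unfamiliar with this type of normalization, a minimal sketch is given below. It assumes that per-fragment read counts are available from both the RNA (barcode abundance) library and the sequenced association library; the function name `normalized_activity`, the dictionary layout, and the toy counts are hypothetical, and the exact formula used in the study may differ. Activity is taken as the fragment's share of RNA reads divided by its share of library reads.

```python
def normalized_activity(rna_counts, library_counts):
    """Return per-fragment activity as (RNA fraction) / (library fraction).

    `rna_counts` and `library_counts` map fragment IDs to read counts.
    Only fragments present in the library with a nonzero count are scored;
    fragments absent from the RNA receive an activity of 0.0.
    """
    rna_total = sum(rna_counts.values())
    library_total = sum(library_counts.values())
    if rna_total == 0 or library_total == 0:
        raise ValueError("both count tables must contain at least one read")
    activity = {}
    for fragment, library_count in library_counts.items():
        if library_count == 0:
            continue  # fragment not observed in the library; cannot normalize
        rna_fraction = rna_counts.get(fragment, 0) / rna_total
        library_fraction = library_count / library_total
        activity[fragment] = rna_fraction / library_fraction
    return activity


# Hypothetical toy counts.
toy_library = {"frag1": 100, "frag2": 100, "frag3": 200}
toy_rna = {"frag1": 30, "frag2": 10}

activity = normalized_activity(toy_rna, toy_library)
print(activity)
```

      A value above 1 means a fragment is over-represented in the RNA relative to its abundance in the library. If a library were re-prepped, the denominator would need to come from the re-sequenced association library.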

      In the absence of these metrics (this would likely require the authors to repeat all experiments and is acknowledged to be outside the scope of revisions), the authors should provide information on the percentage of the library that is profiled in the RNA for each library.

      We have provided RNA profiles of all abundance libraries in Table S4. The overall fraction of fragments represented in the RNA pools was lower than that observed in other published MPRAs. This difference is expected given that most MPRA studies preselect fragments based on chromatin accessibility, transcription factor binding, sequence conservation, or bioinformatically predicted CRMs, thereby enriching for regulatory elements with high activity potential. Our locus-specific MPRA libraries, by contrast, include all fragments across the targeted genomic region, many of which are likely to be inactive in the tested context. Consequently, only a smaller proportion of fragments show measurable RNA expression.

      (e) Fragment sizes

      Please provide a density plot or something similar showcasing the size distribution of the libraries generated. Is there any correlation between sequence activity and the size of fragments?

      We have generated size distribution plots and correlations between fragment size and activity of all libraries and have included them in Supplemental Document A.

      (2) Questions about the statistical validity of results:

      (a) What threshold is utilized for calling a sequence as active? This is important as NR3 does not seem to be an element that has significant activity.

      See comments about peak calling in prior responses.

      (b) A Fisher's exact test using cells from single-cell RNA-sequencing as replicate samples is inappropriate as the cells are i) not from replicate experiments and ii) potentially in different cell states. The proportions of cells across replicate scRNA-seq datasets would be more appropriate.

      We thank the reviewer for raising this important point. While we agree that individual cells do not substitute for biological replicates, we believe Fisher’s exact test remains appropriate for testing whether gene expression is associated with Olig2 expression within a single scRNA-seq dataset. The test assesses co-occurrence at the level of individual cells, which is valid under the assumption that each cell represents an independent sampling of transcriptional states, even when it is possible that cells are in different states. We use this method as an exploratory tool to identify candidate genes associated with Olig2 expression in this dataset, and in the future, this could also be further validated by comparing the proportions of cells across replicate datasets, as the reviewer mentions.
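
      To make the cell-level test concrete, the sketch below computes a two-sided Fisher's exact test on a 2x2 table of cells classified by expression of a candidate gene and of Olig2. It is a self-contained illustration using only the Python standard library, not the code used in the study; the function name `fisher_exact_two_sided` and the cell counts are hypothetical. The two-sided p-value sums the probabilities of all tables that are no more likely than the observed one, given fixed margins.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: cells expressing both the candidate gene and Olig2
    b: cells expressing the candidate gene only
    c: cells expressing Olig2 only
    d: cells expressing neither
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(a + b + c + d, col1)

    def prob(x):
        # Hypergeometric probability of x double-positive cells.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical cell counts from a single scRNA-seq dataset.
p_value = fisher_exact_two_sided(30, 70, 20, 880)
print(p_value)
```

      As the reviewer notes, a small p-value here reflects co-occurrence among cells within one dataset and does not replace a comparison of proportions across replicate datasets.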

      (3) Discussion of the reporter/Olig2/Ngn2 RNA/protein disconnect needs to be expanded. Some simpler explanations for the presence of GFP in Olig2- and Ngn2- cells, as well as the presence of Olig2 or Ngn2 in GFP- cells, is that (i) these putative CRMs are being introduced to cells in plasmids, taking them out of their native genomic context where they may be inaccessible or repressed and allowing them to drive reporter expression even if their candidate target gene is not endogenously expressed, (ii) these putative CRMs may regulate genes besides just Olig2 or Ngn2, and (iii) Olig2 and Ngn2 are regulated by far more regulatory elements than the 3 or 4 being tested in each reporter assay, so their expression likely does not rely solely on the activity of the few putative CRMs tested.

      We have added these points in an expanded discussion in the text.

      (4) Problems with figures: Low resolution of many IGV genome tracks, pink 'co-expression' dots are completely indiscernible. Numbers should be listed with the pie charts. BFP expression should be shown since this is being quantified, especially since electroporation efficiency can change across age and/or tissue samples.

      We have reconfigured the IGV tracks so that they are higher resolution and have included supplemental tables for the numbers pertaining to the pie charts. For electroporation controls (BFP and RFP), BFP expression is shown in Figs 5S, 6, and 10S and the RFP electroporation control is shown in Fig. 11. Though BFP is sometimes used as a qualifier in the denominator of some of the quantification, displaying its expression, particularly in combination with three other signals that are already included in most images, provides limited utility.

      (5) More information is required to understand the utility of the d-MPRA. Detailed quantification of the number of mutations/fragments needs to be ascertained. When multiple mutations are present, how are the authors controlling for which mutation is affecting activity? What is the coverage of the loci of interest for mutational burden (ie, is every base pair mutated in at least one fragment?). For mutations that increase the activity of the element, are there specific sequence features that increase activity (new motifs generated)?

      The d-MPRA platform is a high-throughput assay that seeks to identify putative sub-regions within CRMs nominated by the LS-MPRA, or any other assay. It relies on deep mutational coverage to determine positive and negative regulatory sub-regions of the CRMs. While many reads have multiple mutations, they are broadly co-occurring across the entire fragment (see Supplemental Document A) so as not to create a false linkage between the sites. Every individual site is mutated many times with roughly even coverage across each fragment (see Supplemental Document A), thus allowing us to assess the requirement of each base in contributing to a putative CRM’s activity. Comparing d-MPRA plots using bulk fragments or fragments with singleton mutations (Supplemental Document A) yielded almost identical plots for two libraries, and similar plots for the third library. Any differences between analyses of fragments with one or more mutations are likely a result of either sequencing depth or the requirement of multiple bases for binding or CRM activation. Follow-up experiments investigating intra-CRM interactions would elucidate such variability. Whether new motifs are generated for any specific substitution is an interesting question, which could be followed up for a CRM of interest. The d-MPRA data that we provide would serve as the starting point for such follow-up experiments.

      (6) Transcription factors as regulators of CRM-activity.

      It is appreciated that the authors validated the binding of transcription factors to NR2. However, this correlative analysis should be further tested in follow-up experiments to highlight novel biology using systems already in place. Potential experiments that could be performed include the following (reagents in hand, or performed in a manner similar to experiments performed by the lab in previous publications):

      (a) over-expression of TF using LS-MPRA library.

      (b) over-expression of TF using d-MPRA library, showing that mutations in the putative TF binding site disrupt activity compared to non-mutated sequences.

      (c) performing TF over-expression using target CRMs, including sequences where the TF binding site is mutated (similar to a small MPRA).

      (d) the quantification of target gene expression when i) TF is over-expressed, ii) CRM is activated using CRISPRa, or iii) CRM is inhibited using CRISPRi.

      These are all valid follow-up experiments. Please see prior responses we have provided regarding further validation.

      Minor points

      (1) Please acknowledge that some distal regulatory sequences may be contained outside of the BAC regions. Also, the authors should emphasize the point that the assay is NOT cell-type-specific or specific to regulatory sequences for the gene of interest, but ALL regulatory sequences contained within the locus. The discussion of this with respect to Ift122 and Rpl32 is somewhat confusing.

      We have added a sentence in the Discussion addressing possible CRMs outside the BAC coverage. We believe it is implicitly understood that the assay only screens regulatory activity in the BAC, and believe we have addressed this in the manuscript.

      If one wishes to use a candidate CRM to drive gene expression in a targeted cell type, one needs to establish specificity. In particular, specificity needs to be established in the context of the vector that is being used. Non-integrated vs integrated vectors, different types of viral vectors with their own confounding regulatory sequences, different types of plasmids and methods of delivery, and copy number can all affect specificity. We provided a double in situ hybridization method for the examination of specificity for some of the novel candidate CRMs. It was quite difficult in the case of Olig2 and Ngn2 as their RNAs and proteins are unstable. We would need to provide further evidence should we wish to use these candidate CRMs for directing expression specifically in Olig2- or Ngn2-expressing cells. We suggest that an investigator can choose the vector and method for establishing specificity depending upon the goals of the application.

      (2) I am curious as to why low-resolution, pseudo-bulked single-nucleus ATAC was utilized instead of more comprehensive retina ATAC samples at similar time-points (for example, the E14, E17, P0, P3, P7, and P10 samples available in Al Diri et al., 2017).

      The use of pseudo-bulked single-nucleus ATAC-seq data provided a convenient and consistent comparison to our LS-MPRA results. We agree that incorporating higher-resolution datasets such as those from Al Diri et al. would be valuable for future analyses aimed at linking CRM activity with broader chromatin accessibility dynamics.

    1. Author response:

      eLife Assessment

      This study provides valuable mechanistic insight into the mutually exclusive distributions of the histone variant H2A.Z and DNA methylation by testing two hypotheses: (i) that DNA methylation destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin remodeling complexes. Through a series of well-designed and carefully executed experiments, findings are presented in support of both hypotheses. However, the evidence in support of either hypothesis is incomplete, so that the proposed mechanisms underlying the enrichment of H2A.Z on unmethylated DNA remain somewhat speculative.

      We would like to thank the editor and reviewers for their critical assessments of our manuscript. While we do acknowledge the limitations of our work, we believe that our results provide important mechanistic insights into the long-standing question of how H2A.Z is preferentially enriched in hypomethylated genomic DNA regions. First, our structural and biochemical data suggest that DNA methylation increases the openness and physical accessibility of H2A.Z, although the effect is relatively subtle and sequence-dependent. Second, using Xenopus egg extracts and synthetic DNA templates, we provide the first clear and direct evidence that DNA methylation-sensitive H2A.Z deposition is due to the H2A.Z chaperone SRCAP-C, corroborated by our discovery that SRCAP-C binding to DNA is suppressed by DNA methylation. Although the molecular details by which DNA methylation inhibits binding of SRCAP-C are an important area of future study, in our current manuscript we do provide evidence that directly links the presence of SRCAP-C to the establishment of the DNA methylation/H2A.Z antagonism in a physiological system. Thanks to criticisms by the reviewers, we realized that we did not clearly state in our Abstract that the impact of DNA methylation on intrinsic H2A.Z nucleosome stability is relatively subtle, although we did explain these observations and limitations in the main text. In our revised manuscript, we are willing to edit the text to better address the criticisms raised by the reviewers.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors considered the mechanism underlying previous observations that H2A.Z is preferentially excluded from methylated DNA regions. They considered two non-mutually exclusive mechanisms. First, they tested the hypothesis that nucleosomes containing both methylated DNA and H2A.Z might be intrinsically unstable due to their structural features. Second, they explored the possibility that DNA methylation might impede SRCAP-C from efficiently depositing H2A.Z onto these DNA methylated regions.

      Their structural analyses revealed subtle differences between H2A.Z-containing nucleosomes assembled on methylated versus unmethylated DNA. To test the second hypothesis, the authors allowed H2A.Z assembly on sperm chromatin in Xenopus egg extracts and mapped both H2A.Z localization and DNA methylation in this transcriptionally inactive system. They compared these data with corresponding maps from a transcriptionally active Xenopus fibroblast cell line. This comparison confirmed the preferential deposition or enrichment of H2A.Z on unmethylated DNA regions, an effect that was much more pronounced in the fibroblast genome than in sperm chromatin. Furthermore, nucleosome assembly on methylated versus unmethylated DNA, along with SRCAP-C depletion from Xenopus egg extracts, provided a means to test whether SRCAP-C contributes to the preferential loading of H2A.Z onto unmethylated DNA.

      Strengths:

      The strength and originality of this work lie in its focused attempt to dissect the unexplained observation that H2A.Z is excluded from methylated genomic regions.

      Weaknesses:

      The study has two weaknesses. First, although the authors identify specific structural effects of DNA methylation on H2A.Z-containing nucleosomes, they do not provide evidence demonstrating that these structural differences lead to altered histone dynamics or nucleosome instability. Second, building on the elegant work of Berta and colleagues (cited in the manuscript), the authors implicate SRCAP-C in the selective deposition of H2A.Z at unmethylated regions. Yet the role of SRCAP-C appears only partial, and the study does not address how the structural or molecular consequences of DNA methylation prevent efficient H2A.Z deposition. Finally, additional plausible mechanisms beyond the two scenarios the authors considered are not investigated or discussed in the manuscript.

      Although we acknowledge the limitations of our study and are willing to expand our discussion to address these points more thoroughly, we believe our manuscript provides several important mechanistic insights which this reviewer may not have fully appreciated.

      Our first conclusion that H2A.Z nucleosomes on methylated DNA are more open and accessible compared to their unmethylated counterparts is supported by both our cryo-EM study and the restriction enzyme accessibility assay. Although the physical effect of DNA methylation is relatively subtle and is likely sequence dependent, as we clearly noted within the manuscript, the difference does exist and is valuable information for the chromatin field at large to consider.

      The second major conclusion of our manuscript is that SRCAP-C exhibits preferential binding to unmethylated DNA over methylated DNA, and that SRCAP-C represents the major mechanism that can explain the biased deposition of H2A.Z to unmethylated DNA in Xenopus egg extracts. Furthermore, our experiments using Xenopus egg extract clearly demonstrated that H2A.Z is deposited by both DNA methylation-sensitive and -insensitive mechanisms. Depletion of SRCAP-C almost completely eliminated DNA methylation-sensitive H2A.Z deposition and reduced the total level of H2A.Z on chromatin to less than half of that seen in non-depleted extract. This result demonstrated that DNA methylation-sensitive H2A.Z loading is primarily regulated by SRCAP-C, at least in our experimental context where transcription, replication, and other epigenetic modifications are not involved. It is likely that additional mechanisms contribute further, as implicated by our sequencing experiments, particularly at regions with active transcription, and we have noted these possibilities and the rationale for their existence in the Discussion.

      Our study also suggests that a SRCAP-independent, DNA methylation-insensitive mechanism of H2A.Z loading exists, which we suspect to be mediated by Tip60-C. In line with this possibility, our data suggest that Tip60-C binds DNA in a DNA methylation-insensitive manner in Xenopus egg extract. Since antibodies to deplete Tip60-C from Xenopus egg extract are currently unavailable, we were unable to directly test that hypothesis and decided not to include Tip60-C in our final model, as we lacked experimental evidence for its role. However, whether or not Tip60-C is the complex responsible for the DNA methylation-insensitive pathway does not influence our final conclusion that SRCAP-C plays a major role in DNA methylation-sensitive H2A.Z loading. We are planning to edit our manuscript to more comprehensively discuss these points.

      Please note that while Berta et al. reported that DNA methylation increases at H2A.Z loci in tumors defective in SRCAP-C, they selected those regions based on where H2A.Z is typically enriched within normal tissues (Berta et al., 2021). They did not show data indicating whether H2A.Z is still retained specifically at those analyzed loci upon mutation of SRCAP-C subunits. Thus, although we greatly admire their work and are pleased that many of our findings align with theirs, their paper did not directly address whether SRCAP-C itself differentiates between DNA methylation states, or the impact that this has on H2A.Z and DNA methylation colocalization. In contrast, our Xenopus egg extract system, where de novo methylation is undetectable (Nishiyama et al., 2013; Wassing et al., 2024), offers a unique opportunity to examine the direct impact of DNA methylation on H2A.Z deposition using controlled synthetic DNA substrates. Together with our demonstration that DNA binding of SRCAP-C is suppressed by DNA methylation, we believe that our manuscript provides a specific mechanism that can explain the preferential deposition of H2A.Z at hypomethylated genomic regions.

      Reviewer #2 (Public review):

      This manuscript aims to elucidate the mechanistic basis for the long-standing observation that DNA methylation and the histone variant H2A.Z occupy mutually exclusive genomic regions. The authors test two hypotheses: (i) that DNA methylation intrinsically destabilizes H2A.Z nucleosomes, thereby preventing H2A.Z retention, and (ii) that DNA methylation suppresses H2A.Z deposition by ATP-dependent chromatin-remodelling complexes. However, neither hypothesis is rigorously addressed. There are experimental caveats, issues with data interpretation, and conclusions that are not supported by the data. Substantial revision and additional experiments, including controls, would be required before mechanistic conclusions can be drawn. Major concerns are as follows:

      We appreciate the critical assessment of our manuscript by this reviewer. Although we acknowledge the limitations of our study and will revise the manuscript to better describe them, we would like to respectfully argue against the statement that our "conclusions […] are not supported by the data".

      (1) The cryo-EM structure of methylated H2A.Z nucleosomes is insufficiently resolved to address the central mechanistic question: where the methylated CpGs are located relative to DNA-histone contact points and how these modifications influence H2A.Z nucleosome structure. The structure provides no mechanistic insights into methylation-induced destabilization.

      The fact that the DNA resolution in the methylated structure was not high enough to resolve the positions of methylated CpGs despite a high overall resolution of 2.78 Å implies 1) that the Sat2R-P DNA was not as stably registered as the 601L sequence, requiring us to create two alternative Sat2R-P atomic models to account for the variable positioning in our samples, and 2) that the presence of DNA methylation increases that positional variability. We understand that one may prefer to see highly resolved density around each methylation mark, but we believe that our inability to accomplish that is actually a feature rather than a weakness and has important biological implications. The decrease in local DNA resolution on the methylated Sat2R-P structure compared to its unmethylated counterpart is meaningful and suggests to us that DNA methylation weakens overall DNA wrapping and positioning on the nucleosome, supported by the increased flexibility seen at the linker DNA ends as well as an increase in the population of highly shifted nucleosomes amongst the methylated particles. Additionally, one major view in the DNA methylation/nucleosome stability field is that the presence of DNA methylation can make DNA stiffer and harder to bend, causing opening and destabilization of nucleosomes (Ngo et al., 2016). The increased opening of linker DNA ends and accessibility of methylated H2A.Z nucleosomes in our hands also aligns with such an idea, again suggesting decreased histone-DNA contact stability on methylated DNA substrates. We plan to revise the writing in our manuscript to better reflect these ideas.

      The experimental system also lacks physiological relevance. The template DNA sequence is artificial, despite the existence of well-characterised native genomic sequences for which DNA methylation is known to inhibit H2A.Z incorporation. Alternatively, there are a number of studies examining the effect of DNA methylation on nucleosome structure, stability, DNA unwrapping, and positioning. Choosing one of these DNA sequences would have at least allowed a direct comparison with a canonical nucleosome. Indeed, a major omission is the absence of a cryo-EM structure of a canonical nucleosome assembled on the same DNA template - this is essential to assess whether the observed effects are H2A.Z-specific.

      The reviewer raises a fair question about whether canonical H2A would experience the same DNA methylation-dependent structural effects. We had considered solving the H2A structures; however, we ultimately decided against it for a few reasons. First, there already exist crystal structures of canonical H2A nucleosomes using a DNA sequence highly similar to our Sat2R-P with and without the presence of DNA methylation (PDB: 5CPI and 5CPJ). The authors of this study did not see any physical differences present in their structures (Osakabe et al., 2015). Additionally, we had included canonical H2A conditions within our restriction enzyme accessibility assay and did not see a significant impact of DNA methylation on those samples (Fig 3). Because of the previous report and our own negative data, we expected that only limited additional insights would be obtained from the canonical H2A structures and decided not to pursue that analysis.

      One of the primary reasons we chose the Sat2R-P sequence was, as noted above, that there already was a published study examining how DNA methylation affects nucleosome structure using a variant of this sequence which we could compare to our results, as the reviewer has suggested. We did have to modify the sequence, namely by making it palindromic, in order to increase the final achievable resolution. We viewed the Sat2R-P sequence as an attractive candidate because it is physiologically relevant; the initial sequence was taken directly from human satellite II. Several modifications were made for technical reasons, including making the sequence palindromic as described above and also ensuring that each CpG is recognizable by a methylation-sensitive restriction enzyme so that we could be certain about the degree of methylation on our substrates. For us, these practical concerns outweighed the necessity of maintaining a strictly physiological sequence. However, we still believe the final Sat2R-P more closely mimics physiological sequences than Widom 601. Additionally, human satellite II is a highly abundant sequence in the human genome that is known to undergo large methylation changes at the onset of many disorders, like cancer, as well as during aging. Thus, there are interesting biological questions surrounding how the methylation state of this particular sequence affects chromatin structure. Furthermore, it has been reported that satellite II is devoid of H2A.Z (Capurso et al., 2012). Beyond those reasons, the satellite II sequence is generally interesting to our lab because we have been studying genes involved in ICF syndrome, where hypomethylation of satellite II sequences forms one of the hallmarks of this disorder (Funabiki et al., 2023; Jenness et al., 2018; Wassing et al., 2024). We understand that sequence context plays a large role in nucleosome wrapping and stability. This is why we strived to test multiple sequences in each of our assays. We do agree that it would be interesting to use DNA sequences where H2A.Z binding has already been described to be affected in a DNA methylation-dependent manner, forming an exciting future study to pursue.

      Furthermore, the DNA template is methylated at numerous random CpG sites. The authors' argument that only the global methylation level is relevant is inconsistent with the literature, which clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent. Not all CpG sites contribute equally to nucleosome stability or unwrapping, and this critical factor is not considered.

      We did not argue that only the global methylation level is relevant. We also would appreciate it if the reviewer could provide specific references that "clearly demonstrates that methylation effects on canonical nucleosomes are position-dependent". We are aware of a series of studies conducted by Chongli Yuan's group, including one testing the effect of placing methylated CpGs at different positions along the Widom 601 sequence. In that study (Jimenez-Useche et al., 2013), they did find that positioning of mCpGs has differential impacts on the salt resistance of the nucleosomes, with 5 tandem mCpG copies at the dyad causing the most dramatic nucleosome opening, whereas having mCpGs only at the DNA major grooves, but not elsewhere, increased nucleosome stability. However, they also found that methylation of the original Widom 601 sequence caused destabilization, albeit to a lesser degree, and another study by the same group (Jimenez-Useche et al., 2014) found that CpG methylation decreased nucleosome-forming ability for all tested variants of the Widom 601 sequence, regardless of CpG density or positioning.

      Other studies monitored how the distribution of methylated CpGs correlates with nucleosome positioning (Collings et al., 2013; Davey et al., 1997; Davey et al., 2004). However, these studies assessed the sequence-dependent effects specifically on nucleosome assembly during in vitro salt dialysis, which is a different physical process than the one our manuscript focuses on, especially when considering the fact that H2A.Z is deposited onto preassembled H2A nucleosomes. Our cryo-EM analysis examines the structural changes induced by DNA methylation on already formed nucleosomes rather than the process of formation. Thus, probing accessibility changes using a restriction enzyme was the more appropriate biochemical assay to verify our structures.

      We do very much agree that DNA context can influence nucleosome stability under different conditions. A study of molecular dynamics simulations concluded that the "combination of overall DNA geometrical and shape properties upon methylation" makes nucleosomes resistant to unwrapping (Li et al., 2022), while another modeling study suggests that DNA methylation impacts nucleosome stability in a manner dependent on DNA sequence, where "[s]trong binding is weakened and weak binding is strengthened" (Minary and Levitt, 2014). While G/C-dinucleotides are preferentially placed at major groove-inward positions in the nucleosomes in vivo (Chodavarapu et al., 2010; Segal et al., 2006) and G/C-rich segments are excluded from major groove-outward positions in Widom 601-like nucleosomes (Chua et al., 2012), methylated CpG dinucleotides are preferably, if not exclusively, located at major groove-outward positions in vivo. Mechanisms behind this biased mCpG positioning on the nucleosome remain speculative, likely caused by a combination of multiple factors, but the fact that we did not observe clear structural impacts using the Widom 601L sequence, where mCpGs are located at the major groove-outward and -inward positions (Chua et al., 2012, and our structure), deserves discussion. On the other hand, positioning of mCpG on satellite II-derived sequences that we used in this study was based on a physiological sequence, and thus it may not be appropriate to say that those CpGs are placed at multiple "random" positions. Although we decided not to discuss the position of 5mC on our Sat2R nucleosome structure due to ambiguous base assignments, neither of our two atomic models is consistent with the idea that DNA methylation repositions the CpG to the outward major grooves. As the potential contribution of how DNA methylation affects the nucleosome structure via modulating DNA stiffness has been extensively studied (Choy et al., 2010; Li et al., 2022; Ngo et al., 2016; Perez et al., 2012), we believe that it is appropriate to consider overall DNA properties along the whole DNA sequence, though we are willing to discuss potential positional effects in the revised manuscript.

      Perhaps one of the most important points that we did not emphasize enough in our original manuscript was that, in contrast to the subtle intrinsic effect of DNA methylation that was DNA sequence dependent, we observed SRCAP-dependent preferential H2A.Z deposition to unmethylated DNA over methylated DNA in both 601 and satellite II DNAs. In the revised manuscript, we will make clearer the value of comparing 601 and satellite II across these two distinct mechanisms.

      Finally, and most importantly, the reported increase in accessibility of the methylated H2A.Z nucleosome is negligible compared with the much larger intrinsic DNA accessibility of the unmethylated H2A.Z nucleosome. These data do not support the authors' hypothesis and contradict the manuscript's conclusions. Claims that methylated H2A.Z nucleosomes are "more open and accessible" must therefore be removed, and the title is misleading, given that no meaningful impact of DNA methylation on H2A.Z nucleosome stability is demonstrated.

      We respectfully disagree with this reviewer's criticism. We investigated the potential impact of DNA methylation on nucleosome stability to the best of our abilities through complementary assays and reported our observations. The effect of DNA methylation is smaller than the difference between H2A.Z and H2A, but we were able to see an effect. It is also not uncommon for small differences to have functional impacts in biological systems. We agree that further testing is required to determine whether this subtle effect is functionally important, and it remains the subject of future research due to the many technical challenges associated with addressing said question. We would like to note that 18 years have passed since Daniel Zilberman first reported the antagonistic relationship between H2A.Z and DNA methylation (Zilberman et al., 2008), but very few studies have since directly tested specific mechanistic hypotheses. We believe that our study lays the groundwork for exciting future investigation that better elucidates the pathways that contribute to this antagonism and will have meaningful impacts on the field in general. However, thanks to the reviewer's criticism, we realized that we did not clearly state in the Abstract the relatively subtle effect of DNA methylation on the intrinsic H2A.Z nucleosome stability. Therefore, we will accordingly revise the Abstract to make this point clearer.

      (2) The cryo-EM structures of methylated and unmethylated 601L H2A.Z nucleosomes show no detectable differences. As presented, this negative result adds little value. If anything, it reinforces the point that the positional context of CpG methylation is critical, which the manuscript does not consider.

      We believe the inclusion and factual reporting of negative data is important for the scientific community, as one of the major issues currently in biology research is biased omission of negative data. We considered eLife as a venue to publish this work for this reason. We understand that the reviewer believes our 601L structures may detract from the overall message of our manuscript. We believe these data rather emphasize the importance of DNA sequence context, something that the reviewer also rightfully notes. It is standard practice in the nucleosome field to use the Widom 601 sequence, along with its variants. Our experience has shown that use of an artificially strong positioning sequence may mask weaker physical effects that could play a physiological role. Thus, we were careful to validate all further assays with multiple DNA sequences and believed it important to report these sequence-dependent effects on nucleosome structure.

      (3) Very little H3 signal coincides with H2A.Z at TSSs in sperm pronuclei, yet this is neither explained nor discussed (Supplementary Figure 10D). The authors need to clarify this.

      Our H3 signal, which represents the global nucleosome population, is more broadly distributed across the genome than H2A.Z, which is known to localize at specific genomic sites. Since both histone types were sequenced to similar read depths, H3 peaks are generally shallower than H2A.Z and peak heights cannot be directly compared (i.e. they should be represented in separate appropriate data ranges).

      (4) In my view, the most conceptually important finding is that H2A.Z-associated reads in sperm pronuclei show ~43% CpG methylation. This directly contradicts the model of strict mutual exclusivity and suggests that the antagonism is context-dependent. Similarly, the finding that the depletion of SRCAP reduces H2A.Z deposition only on unmethylated templates is also very intriguing. Collectively, these results warrant further investigation (see below).

      (5) Given that H2A.Z is located at diverse genomic elements (e.g., enhancers, repressed gene bodies, promoters), the manuscript requires a more rigorous genomic annotation comparing H2A.Z occupancy in sperm pronuclei versus XTC-2 cells. The authors should stratify H2A.Z-DNA methylation relationships across promoters, 5′UTRs, exons, gene bodies, enhancers, etc., as described in Supplementary Figure 10A.

      (Below is our response to (4) and (5) together.)

      We agree that the substantial presence of co-localized H2A.Z and DNA methylation specifically in the sperm pronuclei samples and the changes in pattern between nuclear types are highly interesting and require further investigation. However, we faced technical challenges in our sequencing experiments that made us refrain from conducting a more detailed analysis for fear of over-interpreting potential artifacts. These challenges mainly stemmed from the difficulties in collecting enough material from Xenopus egg extracts and Tn5’s innate bias towards accessible regions of the genome. Because of this, open regions of the genome tend to be overrepresented in our data (as noted in our Discussion), making it challenging to rigorously compare methylation profiles and H2A.Z/H3 associated genomic elements.

      While the degree of separation seems to be dependent on nuclei type, we still believe the antagonism exists in both the sperm pronuclei and XTC-2 samples when comparing H2A.Z methylation profiles to the corresponding H3 condition. Our study also demonstrates that H2A.Z is preferentially deposited to hypomethylated DNA in a manner dependent on SRCAP-C (the loss of SRCAP only reduces H2A.Z on unmethylated substrates), but an additional methylation-insensitive H2A.Z deposition mechanism also exists. We realized that this interesting point was not clearly highlighted in the Abstract, so we will revise it accordingly.

      (6) Although H2A.Z accumulates less efficiently on exogenous methylated substrates in egg extract, substantial deposition still occurs (~50%). This observation directly challenges the strong antagonistic model described in the manuscript, yet the authors do not acknowledge or discuss it. Moreover, differences between unmethylated and methylated 601 DNA raise further questions about the biological relevance of the cryo-EM 601 structures.

      As depicted in Figure 6 and described in the Discussion, we clearly indicated that both methylation-sensitive and methylation-insensitive pathways exist to deposit H2A.Z within the genome. We also directly stated in our Discussion that a substantial proportion of H2A.Z colocalizes with DNA methylation both in our study as well as in previous reports, which is of major interest for future study. Additionally, we further discussed how the absence of transcription in Xenopus eggs is a likely reason for the more limited effect of DNA methylation restricting H2A.Z deposition in our egg extract system.

      As noted in our response to (2), the lack of a clear impact on our 601L structures is likely due to the extraordinarily strong artificial nucleosome positioning capacity of the 601 sequence and its variants. Since 601 is heavily used in chromatin biology, including within DNA methylation research, such negative data are still useful to include and publish.

      (7) The SRCAP depletion is insufficiently validated i.e., the antibody-mediated depletion of SRCAP lacks quantitative verification. A minimum of three biological replicates with quantification is required to substantiate the claims.

      We are willing to address this concern. However, please note that our data showed that methylation-dependent H2A.Z deposition is almost completely erased upon SRCAP depletion, indicating functionally effective depletion. The specificity of the custom antibody against Xenopus SRCAP was verified by mass spectrometry. Additionally, we have obtained the same effect using another commercially available SRCAP antibody, though we did not include this preliminary result in our original manuscript. Due to its relatively low abundance and high molecular weight, SRCAP western blot signals are weak, making it challenging to quantify the degree of depletion. We also believe that the value of quantification in this context, with the points noted above, is rather limited. In the past, our lab has published papers on depleting the H3T3 kinase Haspin from Xenopus egg extracts (Ghenoiu et al., 2013; Kelly et al., 2010), but we were never able to detect Haspin via western blot. This protein was only detected by mass spectrometry specifically on nucleosome array beads with H3K9me3 (Jenness et al., 2018). However, depletion of Haspin was readily monitored by erasure of H3T3ph, the enzymatic product of Haspin. In these experiments, it was impossible, and not critical, to quantitatively monitor the depletion of Haspin protein in order to investigate its molecular functions. Similarly, in this current study, the important fact is that depletion of SRCAP suppressed methylation-sensitive H2A.Z deposition, and quantifying the degree of SRCAP depletion would not have a major impact on this conclusion.

      (8) It appears that the role of p400-Tip60 has been completely overlooked. This complex is the second major H2A.Z deposition complex. Because p400 exhibits DNA methylation-insensitive binding (Supplementary Figure 14), it may account for the deposition of H2A.Z onto methylated DNA. This possibility is highly significant and must be addressed by repeating the key experiments in Figure 5 following p400-Tip60 depletion.

      We are aware that the Tip60 complex is a very likely candidate for mediating DNA methylation-insensitive H2A.Z deposition, which is why we tested whether DNA binding of p400 is methylation sensitive. Therefore, the reviewer's statement that we "completely overlooked" Tip60-C’s role does not fairly report on our efforts. We wished to test the potential contribution of Tip60-C, but, unfortunately, the antibodies we currently have available to us were not successful in depleting the complex from egg extract. Since we had no direct experimental evidence indicating the role Tip60-C plays, we decided to take a conservative approach to our model and leave the methylation-insensitive pathway as mediated by something still unidentified. While further investigating Tip60-C’s contribution to this pathway is of definite value, we do not believe that it impacts our major conclusion that SRCAP-C is the main mediator responsible for H2A.Z deposition on unmethylated DNA and thus remains a subject for future study.

      (9) The manuscript repeatedly states that H2A.Z nucleosomes are intrinsically unstable; however, this is an oversimplification. Although some DNA unwrapping is observed, multiple studies show that H3/H4 tetramer-H2A.Z/H2B interactions are more stable (important recent studies include the following: DOI: 10.1038/s41594-021-00589-3; 10.1038/s41467-021-22688-x; and reviewed in 10.1038/s41576-024-00759-1).

      We understand that the H2A.Z stability field is highly controversial. We have introduced the many conflicting reports that have been published in the field but can further expand on the controversies if desired. We also understand that the term “nucleosome stability” is broad and encompasses many physical aspects. As noted in a prior response, we will better specify our use of the term within the manuscript. In our assays, we are most focused on the DNA wrapping stability of the nucleosome and have consistently seen in our hands that H2A.Z nucleosomes are much more open and accessible compared to canonical H2A on satellite II-derived sequences, regardless of methylation status. However, we do understand that many groups have observed the opposite findings while others have obtained results similar to ours. We reported our findings on general H2A.Z stability in the hope of helping to clarify some of the field’s controversies.

      In summary, the current manuscript does not present a convincing mechanistic explanation for the antagonism between DNA methylation and H2A.Z. The observation that H2A.Z can substantially coexist with DNA methylation in sperm pronuclei, perhaps, should be the conceptual focus.

      We appreciate this reviewer’s advice. However, please note that the first author who led this project has already successfully defended their PhD thesis primarily based on this project, making it impractical and unrealistic to completely change the focus of this manuscript to include an entirely new avenue of research. We believe that our data provide important insights into the mechanisms by which H2A.Z is excluded from methylated DNA, particularly via the DNA methylation-sensitive binding of SRCAP-C, which has never been described before. We agree that many questions are still left unanswered, including the exact molecular mechanism behind how DNA methylation prevents SRCAP-C binding. We have preliminary data that suggest none of the known DNA-binding modules of SRCAP-C, including ZNHIT1, by themselves can explain this sensitivity. This implies that domain dissection in the context of the holo-SRCAP complex is required to fully address this question. We believe this represents a very exciting future avenue of study; however, it does not negate our finding that SRCAP-C itself is important for maintaining the DNA methylation/H2A.Z antagonism. Therefore, we respectfully disagree with this reviewer's summary statement, which misleadingly undermines the impact of our work.

      Reviewer #3 (Public review):

      Summary:

      Histone variant H2A.Z is evolutionarily conserved among various species. The selective incorporation and removal of histone variants on the genome play crucial roles in regulating nuclear events, including transcription. Shih et al. aimed to address antagonistic mechanisms between histone variant H2A.Z deposition and DNA methylation. To this end, the authors reconstituted H2A.Z nucleosomes in vitro using methylated or unmethylated human satellite II DNA sequence and examined how DNA methylation affects H2A.Z nucleosome structure and dynamics. The cryo-EM analysis revealed that DNA methylation induces a more open conformation in H2A.Z nucleosomes. Consistent with this, their biochemical assays showed that DNA methylation subtly increases restriction enzyme accessibility in H2A.Z nucleosomes compared with canonical H2A nucleosomes. The authors identified genome-wide profiles of H2A.Z and DNA methylation using genomic assays and found their unique distribution between Xenopus sperm pronuclei and fibroblast cells. Using Xenopus egg extract systems, the authors showed that the SRCAP complex, the chromatin remodeler for H2A.Z deposition, preferentially deposits H2A.Z on unmethylated DNA.

      Strengths:

      The study is solid, and most conclusions are well-supported. The experiments are rigorously performed, and interpretations are clear. The study presents a high-resolution cryo-EM structure of human H2A.Z nucleosome with methylated DNA. The discovery that the SRCAP complex senses DNA methylation is novel and provides important mechanistic insight into the antagonism between H2A.Z and DNA methylation.

      We are grateful that this reviewer recognizes the importance of our study.

      Weaknesses:

      The study is already strong, and most conclusions are well supported. However, it can be further strengthened in several ways.

      (1) It is difficult to interpret how DNA methylation alters the orientation of the H4 tail and leads to the additional density on the acidic patch. The data do not convincingly support whether DNA methylation enhances interactions with H2A.Z mono-nucleosomes, nor whether this effect is specific to methylated H2A.Z nucleosomes.

      The altered H4 tail orientation and extra density seen on the acidic patch were incidental findings that we thought could be interesting for the field to be aware of but decided not to follow up on, as there were other structural differences that were more directly related to our central question. We do believe that the above two differences are linked to each other because we used a highly purified and homogeneous sample for cryo-EM analysis and the H4 tail/acidic patch interaction is a well-characterized contact that mediates inter-nucleosome interactions. Additionally, other groups have reported that the presence of DNA methylation causes condensation of both chromatin and bare DNA (cited within our manuscript), though the mechanics behind this phenomenon remain to be elucidated. We believed that our structural data may also align with those findings. However, the reviewer is fair in pointing out that we do not provide further experimental evidence verifying the existence of these increased interactions. We can revise our writing to clarify that these points are currently hypotheses rather than validated results.

      (2) It remains unclear whether DNA methylation alters global H2A.Z nucleosome stability or primarily affects local DNA end flexibility. Moreover, while the authors showed locus-specific accessibility by HinfI digestion, an unbiased assay such as MNase digestion would strengthen the conclusions.

      We would like to thank the reviewer for bringing up these issues. Although our current data cannot explicitly distinguish between these possibilities, we favor the idea that DNA methylation specifically alters histone-DNA contacts and that this effect is felt globally across the entire nucleosome rather than only at specific locations. Because linker DNA ends are intrinsically flexible, that region tends to exhibit the greatest differences under different physical influences, hence our focus on characterizing it; the flexibility of a thread on a spool is most pronounced at the ends. However, we also found that the DNA backbone of the H2A.Z nucleosome on methylated DNA had a lower local resolution than its unmethylated counterpart, despite that structure having a higher global resolution, which suggested to us that DNA positioning along the nucleosome is overall weaker in the presence of DNA methylation. This is corroborated by the increased population of open/shifted structures in our classification analysis. The reviewer raises a fair point about the use of a specific restriction enzyme versus MNase. We agree that our accessibility assay is highly influenced by the position of the restriction site and have previously seen that moving the cut site too close to the linker DNA end will abolish any DNA methylation-dependent differences. We did initially attempt an MNase digestion-based assay, but the data were not as reproducible as with the use of a specific restriction enzyme. We do not know the reason behind this irreproducibility, though we believe that the processivity of MNase could make it difficult to capture subtle effects like those induced by DNA methylation on already highly accessible H2A.Z nucleosomes. Overall, while we believe that DNA methylation does exert a physical effect, its subtlety may explain the many contradictory studies present within the DNA methylation and nucleosome stability field.

      References

      Berta, D.G., H. Kuisma, N. Valimaki, M. Raisanen, M. Jantti, A. Pasanen, A. Karhu, J. Kaukomaa, A. Taira, T. Cajuso, S. Nieminen, R.M. Penttinen, S. Ahonen, R. Lehtonen, M. Mehine, P. Vahteristo, J. Jalkanen, B. Sahu, J. Ravantti, N. Makinen, K. Rajamaki, K. Palin, J. Taipale, O. Heikinheimo, R. Butzow, E. Kaasinen, and L.A. Aaltonen. 2021. Deficient H2A.Z deposition is associated with genesis of uterine leiomyoma. Nature. 596:398–403.

      Capurso, D., H. Xiong, and M.R. Segal. 2012. A histone arginine methylation localizes to nucleosomes in satellite II and III DNA sequences in the human genome. BMC Genomics. 13:630.

      Chodavarapu, R.K., S. Feng, Y.V. Bernatavichute, P.Y. Chen, H. Stroud, Y. Yu, J.A. Hetzel, F. Kuo, J. Kim, S.J. Cokus, D. Casero, M. Bernal, P. Huijser, A.T. Clark, U. Kramer, S.S. Merchant, X. Zhang, S.E. Jacobsen, and M. Pellegrini. 2010. Relationship between nucleosome positioning and DNA methylation. Nature. 466:388–392.

      Choy, J.S., S. Wei, J.Y. Lee, S. Tan, S. Chu, and T.H. Lee. 2010. DNA methylation increases nucleosome compaction and rigidity. J Am Chem Soc. 132:1782–1783.

      Chua, E.Y., D. Vasudevan, G.E. Davey, B. Wu, and C.A. Davey. 2012. The mechanics behind DNA sequence-dependent properties of the nucleosome. Nucleic Acids Res. 40:6338–6352.

      Collings, C.K., P.J. Waddell, and J.N. Anderson. 2013. Effects of DNA methylation on nucleosome stability. Nucleic Acids Res. 41:2918–2931.

      Davey, C., S. Pennings, and J. Allan. 1997. CpG methylation remodels chromatin structure in vitro. J Mol Biol. 267:276–288.

      Davey, C.S., S. Pennings, C. Reilly, R.R. Meehan, and J. Allan. 2004. A determining influence for CpG dinucleotides on nucleosome positioning in vitro. Nucleic Acids Res. 32:4322–4331.

      Funabiki, H., I.E. Wassing, Q. Jia, J.D. Luo, and T. Carroll. 2023. Coevolution of the CDCA7-HELLS ICF-related nucleosome remodeling complex and DNA methyltransferases. Elife. 12.

      Ghenoiu, C., M.S. Wheelock, and H. Funabiki. 2013. Autoinhibition and polo-dependent multisite phosphorylation restrict activity of the histone h3 kinase haspin to mitosis. Mol Cell. 52:734–745.

      Jenness, C., S. Giunta, M.M. Muller, H. Kimura, T.W. Muir, and H. Funabiki. 2018. HELLS and CDCA7 comprise a bipartite nucleosome remodeling complex defective in ICF syndrome. Proc Natl Acad Sci U S A. 115:E876–E885.

      Jimenez-Useche, I., J. Ke, Y. Tian, D. Shim, S.C. Howell, X. Qiu, and C. Yuan. 2013. DNA methylation regulated nucleosome dynamics. Sci Rep. 3:2121.

      Jimenez-Useche, I., D. Shim, J. Yu, and C. Yuan. 2014. Unmethylated and methylated CpG dinucleotides distinctively regulate the physical properties of DNA. Biopolymers. 101:517–524.

      Kelly, A.E., C. Ghenoiu, J.Z. Xue, C. Zierhut, H. Kimura, and H. Funabiki. 2010. Survivin reads phosphorylated histone H3 threonine 3 to activate the mitotic kinase Aurora B. Science. 330:235–239.

      Li, S., Y. Peng, D. Landsman, and A.R. Panchenko. 2022. DNA methylation cues in nucleosome geometry, stability and unwrapping. Nucleic Acids Res. 50:1864–1874.

      Minary, P., and M. Levitt. 2014. Training-free atomistic prediction of nucleosome occupancy. Proc Natl Acad Sci U S A. 111:6293–6298.

      Ngo, T.T., J. Yoo, Q. Dai, Q. Zhang, C. He, A. Aksimentiev, and T. Ha. 2016. Effects of cytosine modifications on DNA flexibility and nucleosome mechanical stability. Nat Commun. 7:10813.

      Nishiyama, A., L. Yamaguchi, J. Sharif, Y. Johmura, T. Kawamura, K. Nakanishi, S. Shimamura, K. Arita, T. Kodama, F. Ishikawa, H. Koseki, and M. Nakanishi. 2013. Uhrf1-dependent H3K23 ubiquitylation couples maintenance DNA methylation and replication. Nature. 502:249–253.

      Osakabe, A., F. Adachi, Y. Arimura, K. Maehara, Y. Ohkawa, and H. Kurumizaka. 2015. Influence of DNA methylation on positioning and DNA flexibility of nucleosomes with pericentric satellite DNA. Open Biol. 5.

      Perez, A., C.L. Castellazzi, F. Battistini, K. Collinet, O. Flores, O. Deniz, M.L. Ruiz, D. Torrents, R. Eritja, M. Soler-Lopez, and M. Orozco. 2012. Impact of methylation on the physical properties of DNA. Biophys J. 102:2140–2148.

      Segal, E., Y. Fondufe-Mittendorf, L. Chen, A. Thastrom, Y. Field, I.K. Moore, J.P. Wang, and J. Widom. 2006. A genomic code for nucleosome positioning. Nature. 442:772–778.

      Wassing, I.E., A. Nishiyama, R. Shikimachi, Q. Jia, A. Kikuchi, M. Hiruta, K. Sugimura, X. Hong, Y. Chiba, J. Peng, C. Jenness, M. Nakanishi, L. Zhao, K. Arita, and H. Funabiki. 2024. CDCA7 is an evolutionarily conserved hemimethylated DNA sensor in eukaryotes. Sci Adv. 10:eadp5753.

      Zilberman, D., D. Coleman-Derr, T. Ballinger, and S. Henikoff. 2008. Histone H2A.Z and DNA methylation are mutually antagonistic chromatin marks. Nature. 456:125–129.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The Reviewer structured their review such that their first two recommendations specifically concerned the two major weaknesses they viewed in the initial submission. For clarity and concision, we have copied their recommendations to be placed immediately following their corresponding points on weaknesses.

      Strengths:

      Studying prediction error from the lens of network connectivity provides new insights into predictive coding frameworks. The combination of various independent datasets to tackle the question adds strength, including two well-powered fMRI task datasets, resting-state fMRI interpreted in relation to behavioral measures, as well as EEG-fMRI.

      Weaknesses:

      Major:

      (R1.1) Lack of multiple comparisons correction for edge-wise contrast:

      The analysis of connectivity differences across three levels of prediction error was conducted separately for approximately 22,000 edges (derived from 210 regions), yet no correction for multiple comparisons appears to have been applied. Then, modularity was applied to the top 5% of these edges. I do not believe that this approach is viable without correction. It does not help that a completely separate approach using SVMs was FDR-corrected for 210 regions.

      [Later recommendation] Regarding the first major point: To address the issue of multiple comparisons in the edge-wise connectivity analysis, I recommend using the Network-Based Statistic (NBS; Zalesky et al., 2010). NBS is well-suited for identifying clusters (analogous to modules) of edges that show statistically significant differences across the three prediction error levels, while appropriately correcting for multiple comparisons.

      Thank you for bringing this up. We acknowledge that our modularity analysis does not evaluate statistical significance. Originally, the modularity analysis was meant to provide a connectome-wide summary of the connectivity effects, whereas the classification-based analysis was meant to address the need for statistical significance testing. However, as the reviewer points out, it would be better if significance were tested in a manner more analogous to the reported modules. As they suggest, we updated the Supplemental Materials (SM) to include the results of Network-Based Statistic analysis (SM p. 1-2):

      “(2.1) Network-Based Statistic

      Here, we evaluate whether PE significantly impacts connectivity at the network level using the Network-Based Statistic (NBS) approach.[1] NBS relied on the same regression data generated for the main-text analysis, whereby a regression is performed examining the effect of PE (Low = –1, Medium = 0, High = +1) on connectivity for each edge. This was done across the connectome, and for each edge, a z-score was computed. For NBS, we thresholded edges to |Z| > 3.0, which yielded one large network cluster, shown in Figure S3. The size of the cluster – i.e., number of edges – was significant (p < .05) per a permutation test using 1,000 random shuffles of the condition data for each participant, as is standard.[1] These results demonstrate that the network-level effects of PE on connectivity are significant. The main-text modularity analysis converts this large cluster into four modules, which are more interpretable and open the door to further analyses”.
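
      The NBS procedure quoted above can be sketched roughly as follows. This is a minimal, hedged illustration and not the authors' code: the function names, the `[low, medium, high]` data layout, and the use of a one-sample t-like statistic across participants (standing in for the z-score mentioned in the text) are all assumptions made for the example. It fits the per-edge PE slope, thresholds edges at |statistic| > 3.0, measures the largest connected cluster by edge count, and compares that size against a null distribution obtained by shuffling condition labels within each participant.

```python
import math
import random


def n_edges(n_regions):
    """Number of unique undirected edges among n_regions nodes."""
    return n_regions * (n_regions - 1) // 2


def pe_slopes(subject):
    """Per-edge slope of connectivity on PE coded Low=-1, Medium=0, High=+1.

    `subject` is [low, medium, high] (hypothetical layout), each a list with
    one connectivity value per edge. With this coding the least-squares
    slope reduces to (high - low) / 2.
    """
    low, _medium, high = subject
    return [(h - l) / 2 for l, h in zip(low, high)]


def edge_statistics(data):
    """One-sample t-like statistic of the slopes across participants, per edge.

    Assumption: this stands in for the per-edge z-score described in the text.
    """
    slopes = [pe_slopes(subject) for subject in data]
    n = len(slopes)
    stats = []
    for e in range(len(slopes[0])):
        vals = [s[e] for s in slopes]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / (n - 1)
        stats.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return stats


def largest_cluster(stats, edge_list, threshold=3.0):
    """Edge count of the largest connected component among suprathreshold edges."""
    supra = [edge_list[i] for i, z in enumerate(stats) if abs(z) > threshold]
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in supra:
        parent[find(a)] = find(b)
    sizes = {}
    for a, _b in supra:
        root = find(a)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values(), default=0)


def nbs_permutation_test(data, edge_list, threshold=3.0, n_perm=1000, seed=0):
    """Return (observed cluster size, permutation p-value)."""
    rng = random.Random(seed)
    observed = largest_cluster(edge_statistics(data), edge_list, threshold)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle the three condition labels independently within each participant.
        shuffled = [rng.sample(subject, k=3) for subject in data]
        null_size = largest_cluster(edge_statistics(shuffled), edge_list, threshold)
        if null_size >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

      As a sanity check on scale, `n_edges(210)` evaluates to 21,945, which matches the "approximately 22,000" edges mentioned in the reviewer's comment. Controlling the size of the largest cluster, rather than each edge separately, is what lets NBS avoid an edge-wise multiple-comparisons correction.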

      We updated the Results to mention these findings before describing the modularity analysis (p. 8-9):

      “After demonstrating that PE significantly influences brain-wide connectivity using Network-Based Statistic analysis (Supplemental Materials 2.1), we conducted a modularity analysis to study how specific groups of edges are all sensitive to high/low-PE information.”

      (R1.2) Lack of spatial information in EEG:

      The EEG data were not source-localized, and no connectivity analysis was performed. Instead, power fluctuations were averaged across a predefined set of electrodes based on a single prior study (reference 27), as well as across a broader set of electrodes. While the study correlates these EEG power fluctuations with fMRI network connectivity over time, such temporal correlations do not establish that the EEG oscillations originate from the corresponding network regions. For instance, the observed fronto-central theta power increases could plausibly originate from the dorsal anterior cingulate cortex (dACC), as consistently reported in the literature, rather than from a distributed network. The spatially agnostic nature of the EEG-fMRI correlation approach used here does not support interpretations tied to specific dorsal-ventral or anterior-posterior networks. Nonetheless, such interpretations are made throughout the manuscript, which overextends the conclusions that can be drawn from the data.

      [Later recommendation] Regarding the second major point: I suggest either adopting a source-localized EEG approach to assess electrophysiological connectivity or revising all related sections to avoid implying spatial specificity or direct correspondence with fMRI-derived networks. The current approach, which relies on electrode-level power fluctuations, does not support claims about the spatial origin of EEG signals or their alignment with specific connectivity networks.

      We thank the reviewer for this important point, which allows us to clarify the specific and distinct contributions of each imaging modality in our study. Our primary goal for Study 3 was to leverage the high temporal resolution of EEG to identify the characteristic frequency at which the fMRI-defined global connectivity states fluctuate. The study was not designed to infer the spatial origin of these EEG signals, a task for which fMRI is better suited and which we addressed in Studies 1 and 2.

      As the reviewer points out, fronto-central theta is generally associated with the dACC. We agree with this point entirely. We suspect that there is some process linking dACC activation to the identified network fluctuations – some type of relationship that does not manifest in our dynamic functional connectivity analyses – although this is only a hypothesis and one that is beyond the present scope.

      We updated the Discussion to mention these points and acknowledge the ambiguity regarding the correlation between network fluctuation amplitude (fMRI) and Delta/Theta power (EEG) (p. 24):

      “We specifically interpret the fMRI-EEG correlation as reflecting fluctuation speed because we correlated EEG oscillatory power with the fluctuation amplitude computed from fMRI data. Simply correlating EEG power with the average connectivity or the signed difference between posterior-anterior and ventral-dorsal connectivity yields null results (Supplemental Materials 6), suggesting that this is a very particular association, and viewing it as capturing fluctuation amplitude provides a parsimonious explanation. Yet, this correlation may be interpreted in other ways. For example, resting-state Theta is also a signature of drowsiness,[2] which may correlate with PE processing, but perhaps should be understood as some other mechanism. Additionally, Theta is widely seen as a sign of dorsal anterior cingulate cortex activity,[3] and it is unclear how to reconcile this with our claims about network fluctuations. Nonetheless, as we show with simulations (Supplemental Materials 5), a correlation between slow fMRI network fluctuations and fast EEG Delta/Theta oscillations is also consistent with a common global neural process oscillating rapidly and eliciting both measures.”

      Regarding source-localization, several papers have described known limitations of this strategy for drawing precise anatomical inferences,[4–6] and this seems unnecessary given that our fMRI analyses already provide more robust anatomical precision. We intentionally used EEG in our study for what it measures most robustly: millisecond-level temporal dynamics.

      (R1.2a) Examples of problematic language include:

      Line 134: "detection of network oscillations at fast speeds" - the current EEG approach does not measure networks.

      This is an important issue. We acknowledge that our EEG approach does not directly measure fMRI-defined networks. Our claim is inferential, designed to estimate the temporal dynamics of the large-scale fMRI patterns we identified. The correlation between our fMRI-derived fluctuation amplitude (|PA – VD|) and 3-6 Hz EEG power provides suggestive evidence that the transitions between these network states occur at this frequency, rather than being a direct measurement of network oscillations.

      To support the validity of this inference, we performed two key analyses (now in Supplemental Materials). First, a simulation study provides a proof-of-concept, confirming our method can recover the frequency of a fast underlying oscillator from slow fMRI and fast EEG data. Second, a specificity analysis shows the EEG correlation is unique to our measure of fluctuation amplitude and not to simpler measures like overall connectivity strength. These analyses demonstrate that our interpretation is more plausible than alternative explanations.
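
      The logic of such a proof-of-concept can be illustrated with a toy example. The sketch below is not the simulation reported in the Supplemental Materials; every name and parameter value in it is hypothetical, and it ignores the hemodynamic response entirely. It assumes a single latent oscillator whose strength varies slowly from window to window. The "EEG" measure is the mean squared signal within each window, and the "fMRI" measure is the absolute value of one slow snapshot per window, standing in for |PA – VD|. It only shows that the two measures can correlate across windows when they share a fast source; it does not attempt to recover the oscillation frequency.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_windows(n_windows=300, freq_hz=4.0, tr_s=2.0, eeg_rate_hz=100, seed=0):
    """Toy model: one fast oscillator observed by a fast and a slow measurement.

    All parameter values are illustrative assumptions, not values from the study.
    Returns (fmri_amplitude, eeg_power), one entry per window of length tr_s.
    """
    rng = random.Random(seed)
    samples_per_window = int(tr_s * eeg_rate_hz)
    fmri_amplitude, eeg_power = [], []
    for _ in range(n_windows):
        strength = rng.uniform(0.2, 2.0)        # slowly varying oscillation strength
        phase = rng.uniform(0.0, 2 * math.pi)   # unknown phase at the window start
        signal = [
            strength * math.sin(2 * math.pi * freq_hz * i / eeg_rate_hz + phase)
            for i in range(samples_per_window)
        ]
        # Fast measurement: band power as the mean squared signal in the window.
        eeg_power.append(sum(s * s for s in signal) / samples_per_window)
        # Slow measurement: a single snapshot per window, sign discarded.
        fmri_amplitude.append(abs(signal[0]))
    return fmri_amplitude, eeg_power


fmri_amplitude, eeg_power = simulate_windows()
r = pearson(fmri_amplitude, eeg_power)
```

      In this toy setting `r` is positive because both measures scale with the oscillator's strength, even though the slow measurement samples the oscillation far below its frequency. Whether the same holds once hemodynamic smoothing is included is exactly what a fuller simulation would need to test.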

      Overall, we have revised the manuscript to be more conservative in the language employed, such as presenting alternative explanations to the interpretations put forth based on correlative/observational evidence (e.g., our modifications above described in our response to comment R1.2). In addition, we have made changes throughout the report to state the issues related to reverse inference more explicitly and to better communicate that the evidence is suggestive – please see our numerous changes described in our response to comment R3.1. For the statement that the reviewer specifically mentioned here, we revised it to be more cautious (p. 7):

      “Although such speed outpaces the temporal resolution of fMRI, correlating fluctuations in dynamic connectivity measured from fMRI data with EEG oscillations can provide an estimate of the fluctuations’ speed. This interpretation of a correlation again runs up against issues related to reverse inference but would nonetheless serve as initial suggestive evidence that spontaneous transitions between network states occur rapidly.”

      (R1.2b) Line 148: "whether fluctuations between high- and low-PE networks occur sufficiently fast" - this implies spatial localization to networks that is not supported by the EEG analysis.

      Building on our changes described in our immediately prior response, we adjusted our text here to say our analyses searched for evidence consistent with the idea that the network fluctuations occur quickly rather than searching for decisive evidence favoring this idea (p. 7-8):

      “Finally, we examined rs-fMRI-EEG data to assess whether we find parallels consistent with the high/low-PE network fluctuations occurring at fast timescales suitable for the type of cognitive operations typically targeted by PE theories.”

      (R1.2c) Line 480: "how underlying neural oscillators can produce BOLD and EEG measurements" - no evidence is provided that the same neural sources underlie both modalities.

      As described above, these claims are based on the simulation study demonstrating that this is a possibility, and we have revised the manuscript overall to be clearer that this is our interpretation while providing alternative explanations.

      Reviewer #2 (Public review):

      Strengths:

      Clearly, a lot of work and data went into this paper, including 2 task-based fMRI experiments and the resting state data for the same participants, as well as a third EEG-fMRI dataset. Overall, well written with a couple of exceptions on clarity, as per below, and the methodology appears overall sound, with a couple of exceptions listed below that require further justification. It does a good job of acknowledging its own weakness.

      Weaknesses:

      (R2.1) The paper does a good job of acknowledging its greatest weakness, the fact that it relies heavily on reverse inference, but cannot quite resolve it. As the authors put it, "finding the same networks during a prediction error task and during rest does not mean that the networks' engagement during rest reflects prediction error processing". Again, the authors acknowledge the speculative nature of their claims in the discussion, but given that this is the key claim and essence of the paper, it is hard to see how the evidence is compelling to support that claim.

      We thank the reviewer for this comment. We agree that reverse inference is a fundamental challenge and that our central claim requires a particularly high bar of evidence. While no single analysis resolves this issue, our goal was to build a cumulative case that is compelling by converging on the same conclusion from multiple, independent lines of evidence.

      For our investigation, we initially established a task-general signature of prediction error (PE). By showing the same neural pattern represents PE in different contexts, we constrain the reverse inference, making it less likely that our findings are a task-specific artifact and more likely that they reflect the core, underlying process of PE. Building on this, our most compelling evidence comes from linking task and rest at the individual level. We didn't just find the same general network at rest; we showed that an individual’s unique anatomical pattern of PE-related connectivity during the task specifically predicts their own brain's fluctuation patterns at rest. This highly specific, person-by-person correspondence provides a direct bridge between an individual's task-evoked PE processing and their intrinsic, resting-state dynamics. Furthermore, these resting-state fluctuations correlate specifically with the 3-6 Hz theta rhythm—a well-established neural marker for PE.

      While reverse inference remains a fundamental limitation for many studies on resting-state cognition, we believe the aspects mentioned above provide suggestive evidence favoring our PE interpretation. Nonetheless, we have made changes throughout the manuscript to be more conservative in the language we use to describe our results, to make it clear what claims are based on correlative/observational evidence, and to put forth alternative explanations for the identified effects. Please find our numerous changes detailed in our response to comment R3.1.

      (R2.2) Given how uncontrolled cognition is during "resting-state" experiments, the parallel made with prediction errors elicited during a task designed for that effect is a little difficult to make. How often are people really surprised when their brains are "at rest", likely replaying a previously experienced event or planning future actions under their control? It seems to be more likely a very low prediction error scenario, if at all surprising.

      We (and some others) take a broad interpretation of PE and believe it is often more intuitive to think about PE minimization in terms of uncertainty rather than “surprise”; the word “surprise” usually implies a sudden emotive reaction from the violation of expectations, which is not useful here.

      When planning future actions, each step of the plan is spurred by uncertainty about the appropriate action given the scenario set up by prior steps. Each planned step erases some of that uncertainty. For example, you may be mentally simulating a conversation: what you will say and what another person will say. Each step creates uncertainty about “what is the appropriate response?” Each reasoning step addresses contingencies. While planning, you may also uncover more obvious forms of uncertainty, sparking memory retrieval to resolve them. A resting-state participant may think to cook a frozen pizza when they arrive home, but be uncertain about whether they have any frozen pizzas left, prompting episodic memory retrieval to address this uncertainty. We argue that every planning step or memory retrieval can be productively understood as being sparked by uncertainty/surprise (PE), and the subsequent cognitive response minimizes this uncertainty.

      We updated the Introduction to include a paragraph near the start providing this explanation (p. 3-4):

      “PE minimization may broadly coordinate brain functions of all sorts, including abstract cognitive functions. This includes the types of cognitive processes at play even in the absence of stimuli (e.g., while daydreaming). While it may seem counterintuitive to associate this type of cognition with PE – a concept often tied to external surprises – it has been proposed that the brain's internal generative model is continuously active.[12–14] Spontaneous thought, such as planning a future event or replaying a memory, is not a passive, low-PE process. Rather, it can be seen as a dynamic cycle of generating and resolving internal uncertainty. While daydreaming, you may be reminded of a past conversation, where you wish you had said something different. This situation contains uncertainty about what would have been the best thing to say. Wondering about what you wish you said can be viewed as resolving this uncertainty, in principle, forming a plan if the same situation ever arises again in the future. Each iteration of the simulated conversation repeatedly sparks and then resolves this type of uncertainty.”

      (R2.3) The quantitative comparison between networks under task and rest was done on a small subset of the ROIs rather than on the full network - why? Noting how small the correlation between task and rest is (r=0.021) and that's only for part of the networks, the evidence is a little tenuous. Running the analysis for the full networks could strengthen the argument.

      We thank the reviewer for this opportunity to clarify our method. A single correlation between the full, aggregated networks would be conceptually misaligned with what we aimed to assess. To test for a person-specific anatomical correspondence, it is necessary to examine the link between task and rest at a granular level. We therefore asked whether the specific parts of an individual's network most responsive to PE during the task are the same parts that show the strongest fluctuations at rest. Our analysis, performed iteratively across all 3,432 possible ROI subsets, was designed specifically to answer this question, which would be obscured by an aggregated network measure.

      We appreciate the reviewer's concern about the modest effect size (r = .021). However, this must be contextualized, as the short task scan has very low reliability (.08), which imposes a severe statistical ceiling on any possible task-rest correlation. Finding a highly significant effect (p < .001) in the face of such noisy data, therefore, provides robust evidence for a genuine task-rest correspondence.
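
      One common way to put a number on such a ceiling is Spearman's attenuation formula, under which an observed correlation equals the true correlation multiplied by the square root of the product of the two measures' reliabilities. The sketch below applies that formula to the values quoted here. It is an illustration rather than an analysis from the manuscript: the function names are hypothetical, and because the reliability of the resting-state measure is not given in this passage, it is set to 1.0 as an explicit assumption, which yields the most generous ceiling and the smallest correction.

```python
import math


def attenuation_ceiling(reliability_x, reliability_y=1.0):
    """Largest observable correlation if the true correlation were 1.0."""
    return math.sqrt(reliability_x * reliability_y)


def disattenuated_r(observed_r, reliability_x, reliability_y=1.0):
    """Observed correlation corrected for measurement unreliability."""
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Task reliability (.08) and observed r (.021) are taken from the response;
# the resting-state reliability of 1.0 is an assumption for illustration.
ceiling = attenuation_ceiling(0.08)
corrected = disattenuated_r(0.021, 0.08)
```

      With these inputs the ceiling is about .28 and the corrected correlation about .07. A lower assumed resting-state reliability would reduce the ceiling and raise the corrected value accordingly.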

      We updated the Discussion to discuss this point (p. 22-23):

      “A key finding supporting our interpretation is the significant link between individual differences in task-evoked PE responses and resting-state fluctuations. One might initially view the effect size of this correspondence (r = .021) as modest. However, this interpretation must be contextualized by the considerable measurement noise inherent in short task-fMRI scans; the split-half reliability of the task contrast was only .08. This low reliability imposes a severe statistical ceiling on any possible task-rest correlation. Therefore, detecting a highly significant (p < .001) relationship despite this constraint provides robust evidence for a genuine link. Furthermore, our analytical approach, which iteratively examined thousands of ROI subsets rather than one aggregated network, was intentionally granular. The goal was not simply to correlate two global measures, but to test for a person-specific anatomical correspondence – that is, whether the specific parts of an individual's network most sensitive to PE during the task are the same parts that fluctuate most strongly at rest. An aggregate analysis would obscure this critical spatial specificity. Taken together, this granular analysis provides compelling evidence for an anatomically consistent fingerprint of PE processing that bridges task-evoked activity and spontaneous resting-state dynamics, strengthening our central claim.”

      (R2.4) Looking at the results in Figure 2C, the four-quadrant description of the networks labelled for low and high PE appears a little simplistic. The authors state that this four-quadrant description omits some ROIs as motivated by prior knowledge. This would benefit from a more comprehensive justification. Which ROIs are excluded, and what is the evidence for exclusion?

      Our four-quadrant model is a principled simplification designed to distill the dominant, large-scale connectivity patterns from the complex modularity results. This approach focuses on coherent, well-documented anatomical streams while setting aside a few anatomically distant and disjoint ROIs that were less central to the main modules. This heuristic additionally unlocks more robust and novel analyses.

      The two low-PE posterior-anterior (PA) pathways are grounded in canonical processing streams. (i) The OC-ATL connection mirrors the ventral visual stream (the “what” pathway), which is fundamental for object recognition and is upregulated during the smooth processing of expected stimuli. (ii) The IPL-LPFC connection represents a core axis of the dorsal attention stream and the Fronto-Parietal Control Network (FPCN), reflecting the maintenance of top-down cognitive control when information is predictable; the IPL-LPFC module excludes ROIs in the middle temporal gyrus, which are often associated with the FPCN but are not covered here.

      In contrast, the two high-PE ventral-dorsal (VD) pathways reflect processes for resolving surprise and conflict. (i) The OC-IPL connection is a classic signature of attentional reorienting, where unexpected sensory input (high PE) triggers a necessary shift in attention; the OC-IPL module excludes some ROIs that are anterior to the occipital lobe and enter the fusiform gyrus and inferior temporal lobe. (ii) The ATL-LPFC connection aligns with mechanisms for semantic re-evaluation, engaging prefrontal control regions to update a mental model in the face of incongruent information.

      Beyond its functional/anatomical grounding, this simplification provides powerful methodological and statistical advantages. It establishes a symmetrical framework that makes our dynamic connectivity analyses tractable, such as our “cube” analysis of state transitions, which required overlapping modules. Critically, this model also offers a statistical safeguard. By ensuring each quadrant contributes to both low- and high-PE connectivity patterns, we eliminate confounds like region-specific signal variance or global connectivity. This design choice isolates the phenomenon to the pattern of connectivity itself (posterior-anterior vs. ventral-dorsal), making our interpretation more robust.

      We updated the end of the Study 1A results (p. 10-11):

      “Some ROIs appear in Figure 2C but are excluded from the four targeted quadrants (Figures 2C & 2D) – e.g., posterior inferior temporal lobe and fusiform ROIs are excluded from the OC-IPL module, and middle temporal gyrus ROIs are excluded from the IPL-LPFC modules. These exclusions, in favor of a four-quadrant interpretation, are motivated by existing knowledge of prominent structural pathways among these quadrants. This interpretation is also supported by classifier-based analyses showing connectivity within each quadrant is significantly influenced by PE (Supplemental Materials 2.2), along with analyses of single-region activity showing that these areas also respond to PE independently (Supplemental Materials 3). Hence, we proceeded with further analyses of these quadrants’ connections, which summarize PE’s global brain effects.

      “This four-quadrant setup also imparts analytical benefits. First, this simplified structure may better generalize across PE tasks, and Study 1B would aim to replicate these results with a different design. Second, the four quadrants mean that each ROI contributes to both the posterior-anterior and ventral-dorsal modules, which would benefit later analyses and rules out confounds such as PE eliciting increased/decreased connectivity between an ROI and the rest of the brain. An additional, less key benefit is that this setup allows more easily evaluating whether the same phenomena arise using a different atlas (Supplemental Materials Y).”

      (R2.5) The EEG-fMRI analysis claiming 3-6Hz fluctuations for PE is hard to reconcile with the fact that fMRI captures activity that is a lot slower, while some PEs are as fast as 150 ms. The discussion acknowledges this but doesn't seem to resolve it - would benefit from a more comprehensive argument.

      We thank the reviewer for raising this important point, which allows us to clarify the logic of our multimodal analysis. Our analysis does not claim that the fMRI BOLD signal itself oscillates at 3-6 Hz. Instead, it is based on the principle that the intensity of a fast neural process can be reflected in the magnitude of the slow BOLD response. It’s akin to using a long-exposure photograph to capture a fast-moving object; while the individual movements are blurred, the intensity of the blur in the photo serves as a proxy for the intensity of the underlying motion. In our case, the magnitude of the fMRI network difference (|PA – VD|) acts as the "blur," reflecting the intensity of the rapid fluctuations between states within that time window.

      Following this logic, we correlated this slow-moving fMRI metric with the power of the fast EEG rhythms, which reflects their amplitude. To bridge the different timescales, we averaged the EEG power over each fMRI time window and convolved it with the standard hemodynamic response function (HRF) – a crucial step to align the timing of the neural and metabolic signals. The resulting significant correlation specifically in the 3-6 Hz band demonstrates that when this rhythm is stronger, the fMRI data shows a greater divergence between network states. This allows us to infer the characteristic frequency of the underlying neural fluctuations without directly measuring them at that speed with fMRI, thus reconciling the two timescales.
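
      The timescale-bridging step described above can be illustrated with a minimal sketch. It is not the authors' code: the double-gamma HRF parameters, the TR of 2 s, the function names, and the synthetic data are all assumptions made for illustration. It averages nothing itself (band power is assumed to be already averaged per fMRI window), convolves that power with an HRF, and rank-correlates the result with the fluctuation amplitude |PA – VD|.

```python
import math
import numpy as np

def gamma_pdf(t, shape):
    """Gamma density with unit scale, evaluated at non-negative times t."""
    return t ** (shape - 1) * np.exp(-t) / math.gamma(shape)

def double_gamma_hrf(tr, duration=32.0):
    """Double-gamma HRF sampled every `tr` seconds (assumed, commonly used shape)."""
    t = np.arange(0.0, duration, tr)
    hrf = gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0  # peak minus late undershoot
    return hrf / hrf.sum()

def rank(x):
    """Ranks 0..n-1; assumes no tied values (true for continuous synthetic data)."""
    order = np.argsort(x)
    ranks = np.empty(len(x))
    ranks[order] = np.arange(len(x))
    return ranks

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def eeg_power_regressor(power_per_tr, tr):
    """Convolve window-averaged EEG band power with the HRF, trimmed to scan length."""
    hrf = double_gamma_hrf(tr)
    return np.convolve(power_per_tr, hrf)[: len(power_per_tr)]

# Synthetic demonstration (hypothetical values, not study data)
rng = np.random.default_rng(0)
tr, n_tr = 2.0, 300
theta_power = rng.gamma(shape=2.0, scale=1.0, size=n_tr)  # 3-6 Hz power per window
pa = rng.standard_normal(n_tr)   # time-varying posterior-anterior connectivity
vd = rng.standard_normal(n_tr)   # time-varying ventral-dorsal connectivity
fluct_amp = np.abs(pa - vd)      # fluctuation amplitude |PA - VD|
regressor = eeg_power_regressor(theta_power, tr)
rho = spearman(regressor, fluct_amp)
```

      In practice a library routine such as `scipy.stats.spearmanr` would handle tied values; the hand-written rank helper here only keeps the sketch dependency-free.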

      Reviewer #3 (Public review):

      Bogdan et al. present an intriguing and timely investigation into the intrinsic dynamics of prediction error (PE)-related brain states. The manuscript is grounded in an intuitive and compelling theoretical idea: that the brain alternates between high and low PE states even at rest, potentially reflecting an intrinsic drive toward predictive minimization. The authors employ a creative analytic framework combining different prediction tasks and imaging modalities. They shared open code, which will be valuable for future work.

      (R3.1) Consistency in Theoretical Framing

      The title, abstract, and introduction suggest inconsistent theoretical goals of the study.

      The title suggests that the goal is to test whether there are intrinsic fluctuations in high and low PE states at rest. The abstract and introduction suggest that the goal is to test whether the brain intrinsically minimizes PE and whether this minimization recruits global brain networks. My comments here are that a) these are fundamentally different claims, and b) both are challenging to falsify. For one, task-like recurrence of PE states during resting might reflect the wiring and geometry of the functional organization of the brain emerging from neurobiological constraints or developmental processes (e.g., experience), but showing that mirroring exists because of the need to minimize PE requires establishing a robust relationship with behavior or showing a causal effect (e.g., that interrupting intrinsic PE state fluctuations affects prediction).

      The global PE hypothesis-"PE minimization is a principle that broadly coordinates brain functions of all sorts, including abstract cognitive functions"-is more suitable for discussion rather than the main claim in the abstract, introduction, and all throughout the paper.

      Given the above, I recommend that the authors clarify and align their core theoretical goals across the title, abstract, introduction, and results. If the focus is on identifying fluctuations that resemble task-defined PE states at rest, the language should reflect that more narrowly, and save broader claims about global PE minimization for the discussion. This hypothesis also needs to be contextualized within prior work. I'd like to see if there is similar evidence in the literature using animal models.

      Thank you for bringing up this issue. We have made changes throughout the paper to address these points. First, we have omitted reference to a “global PE hypothesis” from the Abstract and Introduction, in favor of structuring the Introduction in terms of a falsifiable question (p. 4):

      “We pursued this goal using three studies (Figure 1) that collectively targeted a specific question: Do the task-defined connectivity signatures of high vs. low PE also recur during rest, and if so, how does the brain transition between exhibiting high/low signatures?”

      We made changes later in the Introduction to clarify that the investigation is based on correlative evidence and requires interpretations that may be debated (p. 5-7):

      “Although this does not entirely address the reverse inference dilemma and can only produce correlative evidence, the present research nonetheless investigates these widely speculated upon PE ideas more directly than any prior work.

      Although such speed outpaces the temporal resolution of fMRI, correlating fluctuations in dynamic connectivity measured from fMRI data with EEG oscillations can provide an estimate of the fluctuations’ speed. This interpretation of a correlation again runs up against issues related to reverse inference but would nonetheless serve as initial suggestive evidence that spontaneous transitions between network states occur rapidly.

      Second, we examined the recruitment of these networks during rs-fMRI, and although the problems related to reverse inference are impossible to overcome fully, we engage with this issue by linking rs-fMRI data directly to task-fMRI data of the same participants, which can provide suggestive evidence that the same neural mechanisms are at play in both.”

      We made changes throughout the Results now better describing the results as consistent with a hypothesis rather than demonstrating it (p. 12-19):

      “In other words, we essentially asked whether resting-state participants are sometimes in low PE states and sometimes in high PE states, which would be consistent with spontaneous PE processing in the absence of stimuli.

      These emerging states overlap strikingly with the previous task effects of PE, suggesting that rs-fMRI scans exhibit fluctuations that resemble the signatures of low- and high-PE states. 

      To be clear, this does not entirely dissuade concerns about reverse inference, which would require a type of causal manipulation that is difficult (if not impossible) to perform in a resting state scan. Nonetheless, these results provide further evidence consistent with our interpretation that the resting brain spontaneously fluctuates between high/low PE network states.

      These patterns are most consistent with a characteristic timescale near 3–6 Hz for the amplitude of the putative high/low-PE fluctuations. This is notably consistent with established links between PE and Delta/Theta and is further consistent with an interpretation in which these fluctuations relate to PE-related processing during rest.”

      We have also made targeted edits to the Discussion to present the findings in a more cautious way, more clearly state what is our interpretation, and provide alternative explanations (p. 19-26):

      “The present research conducted task-fMRI, rs-fMRI, and rs-fMRI-EEG studies to clarify whether PE elicits global connectivity effects and whether the signatures of PE processing arise spontaneously during rest. This investigation carries implications for how PE minimization may characterize abstract task-general cognitive processes. […] Although there are different ways to interpret this correlation, it is consistent with high/low PE states generally fluctuating at 3-6 Hz during rest. Below, we discuss these three studies’ findings.

      Our rs-fMRI investigation examined whether resting dynamics resemble the task-defined connectivity signatures of high vs. low PE, independent of the type of stimulus encountered. The resting-state analyses indeed found that, even at rest, participants’ brains fluctuated between strong ventral-dorsal connectivity and strong posterior-anterior connectivity, consistent with shifts between states of high and low PE. This conclusion is based on correlative/observational evidence and so may be controversial as it relies on reverse inference.

      These patterns resemble global connectivity signatures seen in resting-state participants, and correlations between fMRI and EEG data yield associations, consistent with participants fluctuating between high-PE (ventral-dorsal) and low-PE (posterior-anterior) states at 3-6 Hz. Although definitively testing these ideas is challenging, given that rs-fMRI is defined by the absence of any causal manipulations, our results provide evidence consistent with PE minimization playing a role beyond stimulus processing.”

      (R3.2) Interpretation of PE-Related Fluctuations at Rest and Its Functional Relevance. It would strengthen the paper to clarify what is meant by "intrinsic" state fluctuations. Intrinsic might mean task-independent, trait-like, or spontaneously generated. Which do the authors mean here? Is the key prediction that these fluctuations will persist in the absence of a prediction task?

      Of the three terms the reviewer mentioned, “spontaneous” and “task-independent” are the most accurate descriptors. We conceptualize these fluctuations as a continuous background process that persists across all facets of cognition, without requiring a task explicitly designed to elicit prediction error – although we, along with other predictive coding papers, would argue that all cognitive tasks are fundamentally rooted in PE mechanisms and thus anything can be seen as a “prediction task” (see our response to comment R2.2 for our changes to the Introduction that provide more intuition for this point). The proposed interactions can be seen as analogous to cortico-basal-thalamic loops, which are engaged across a vast and diverse array of cognitive processes.

      The prior submission only used the word “intrinsic” in the title. We have since revised it to “spontaneous,” which is more specific than “intrinsic,” and we believe clearer for a title than “task-independent” (p. 1): “Spontaneous fluctuations in global connectivity reflect transitions between states of high and low prediction error”

      We have also made tweaks throughout the manuscript to now use “spontaneously” throughout (it now appears 8 times in the paper).

      Regardless of the intrinsic argument, I find it challenging to interpret the results as evidence of PE fluctuations at rest. What the authors show directly is that the degree to which a subset of regions within a PE network discriminates high vs. low PE during task correlates with the magnitude of separation between high and low PE states during rest. While this is an interesting relationship, it does not establish that the resting-state brain spontaneously alternates between high and low PE states, nor that it does so in a functionally meaningful way that is related to behavior. How can we rule out brain dynamics of other processes, such as arousal, that also rise and fall with PE? I understand the authors' intention to address the reverse inference concern by testing whether "a participant's unique connectivity response to PE in the reward-processing task should match their specific patterns of resting-state fluctuation". However, I'm not fully convinced that this analysis establishes the functional role of the identified modules to PE because of the following:

      Theoretically, relating the activities of the identified modules directly to behavior would demonstrate a stronger functional role.

      (R3.2a) Across participants: Do individuals who exhibit stronger or more distinct PE-related fluctuations at rest also perform better on tasks that require prediction or inference? This could be assessed using the HCP prediction task, though if individual variability is limited (e.g., due to ceiling effects), I would suggest exploring a dataset with a prediction task that has greater behavioral variance.

      This is a good idea, but unfortunately difficult to test with our present data. The HCP gambling task used in our study was not designed to measure individual differences in prediction or inference and likely suffers from ceiling effects. Because the task outcomes are predetermined and not linked to participants' choices, there is very little meaningful behavioral variance in performance to correlate with our resting-state fluctuation measure.

      While we agree that exploring a different dataset with a more suitable task would be ideal, doing so falls beyond the scope of the present manuscript. Although such results would be informative, they would ultimately still not be a panacea for the reverse inference issues.

      Or even more broadly, does this variability in resting state PE state fluctuations predict general cognitive abilities like WM and attention (which the HCP dataset also provides)? I appreciate the inclusion of the win-loss control, and I can see the intention to address specificity. This would test whether PE state fluctuations reflect something about general cognition, but also above and beyond these attentional or WM processes that we know are fluctuating.

      This is a helpful suggestion, motivating new analyses: We measured the degree of resting-state fluctuation amplitude across participants and correlated it with the different individual differences measures provided with the HCP data (e.g., measures of WM performance). We computed each participant’s fluctuation amplitude measure as the average absolute difference between posterior-anterior and ventral-dorsal connectivity; this is the average of the TR-by-TR fMRI amplitude measure from Study 3. We correlated this individual difference score with all of the ~200 individual difference measures provided with the HCP dataset (e.g., measures of intelligence or personality). We measured the Spearman correlation between mean fluctuation amplitude with each of those ~200 measures, while correcting for multiple hypotheses using the False Discovery Rate approach.[18]

      We found a robust negative association with age, where older participants tend to display weaker fluctuations (r = -.16, p < .001). We additionally find a positive association with the age-adjusted score on the picture sequence task (r = .12, p<sub>corrected</sub> = .03) and a negative association with performance in the card sort task (r = -.12, p<sub>corrected</sub> = .046). It is unclear how to interpret these associations, without being speculative, given that fluctuation amplitude shows one positive association with performance and one negative association, albeit across entirely different tasks. We have added these correlation results as Supplemental Materials 8 (SM p. 11):

      “(8) Behavioral differences related to fluctuation amplitude 

      To investigate whether individual differences in the magnitude of resting-state PE-state fluctuations predict general cognitive abilities, we correlated our resting-state fluctuation measure with the cognitive and demographic variables provided in the HCP dataset.

      (8.1) Methods

      For each of the 1,000 participants, we calculated a single fluctuation amplitude score. This score was defined as the average absolute difference between the time-varying posterior-anterior (PA) and ventral-dorsal (VD) connectivity during the resting-state fMRI scan (the average of the TR-by-TR measure used for Study 3). We then computed the Spearman correlation between this score and each of the approximately 200 individual difference measures provided in the HCP dataset. We corrected for multiple comparisons using the False Discovery Rate (FDR) approach.

      (8.2) Results

      The correlations revealed a robust negative association between fluctuation amplitude and age, indicating that older participants tended to display weaker fluctuations (r = -.16, p<sub>corrected</sub> < .001). After correction, two significant correlations with cognitive performance emerged: (i) a positive association with the age-adjusted score on the Picture Sequence Memory Test (r = .12, p<sub>corrected</sub> = .03), (ii) a negative association with performance on the Card Sort Task (r = -.12, p<sub>corrected</sub> = .046). As greater fluctuation amplitude is linked to better performance on one task but worse performance on another, it is unclear how to interpret these findings.”
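
      The two computational steps in this supplemental analysis, the per-participant fluctuation amplitude score and the False Discovery Rate correction, can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, the Benjamini-Hochberg procedure is assumed as the specific FDR method, and the p-values in the example are made up. The Spearman p-values themselves are taken as given inputs (a routine such as `scipy.stats.spearmanr` would typically supply them).

```python
import numpy as np

def fluctuation_amplitude(pa, vd):
    """Mean absolute PA - VD difference per participant.

    pa, vd: arrays of shape (participants, timepoints) holding time-varying
    posterior-anterior and ventral-dorsal connectivity.
    """
    return np.mean(np.abs(np.asarray(pa) - np.asarray(vd)), axis=-1)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out

# Toy example with made-up numbers (not study data)
pa_demo = np.array([[1.0, 2.0, 3.0]])
vd_demo = np.array([[3.0, 2.0, 0.0]])
score = fluctuation_amplitude(pa_demo, vd_demo)   # one score per participant

raw_p = [0.01, 0.04, 0.03, 0.005]                 # hypothetical uncorrected p-values
adj_p = benjamini_hochberg(raw_p)
significant = adj_p < 0.05
```

      With roughly 200 measures, the same function would be applied once to the full vector of uncorrected p-values, so that the correction accounts for every test performed.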

      We updated the main text Methods to direct readers to this content (p. 39-40):

      “(4.4.3) Links between network fluctuations and behavior

      We considered whether the extent of PE-related network expression states during resting-state is behaviorally relevant. We specifically investigated whether individual differences in the overall magnitude of resting-state fluctuations could predict individual difference measures, provided with the HCP dataset. This yielded a significant association with age, whereby older participants tended to display weaker fluctuations. However, associations with cognitive measures were limited. A full description of these analyses is provided in Supplemental Materials 8.”

      (R3.2b) Within participants: Do momentary increases in PE-network expression during tasks relate to better or faster prediction? In other words, is there evidence that stronger expression of PE-related states is associated with better behavioral outcomes?

      This is a good question that probes the direct behavioral relevance of these network states on a trial-by-trial basis. We agree with the reviewer's intuition; in principle, one would expect a stronger expression of the low-PE network state on trials where a participant correctly and quickly gives a high likelihood rating to a predictable stimulus.

      Following this suggestion, we performed a new analysis in Study 1A to test this. We found that network expression was indeed linked to participants’ likelihood ratings: higher likelihood ratings correspond to stronger posterior-anterior connectivity, whereas lower ratings correspond to stronger ventral-dorsal connectivity (Connectivity-Direction × likelihood, β [standardized] = .28, p = .02). Yet, this is not a strong test of the reviewer’s hypothesis, and different exploratory analyses of response time yield null results (p > .05). We suspect that the effect is too subtle for our statistical power to detect. A comparable analysis was not feasible for Study 1B, as its design does not provide an analogous behavioral measure of trial-by-trial prediction success.

      (R3.3) A priori Hypothesis for EEG Frequency Analysis.

      It's unclear how to interpret the finding that fMRI fluctuations in the defined modules correlate with frontal Delta/Theta power, specifically in the 3-6 Hz range. However, in the EEG literature, this frequency band is most commonly associated with low arousal, drowsiness, and mind wandering in resting, awake adults, not uniquely with prediction error processing. An a priori hypothesis is lacking here: what specific frequency band would we expect to track spontaneous PE signals at rest, and why? Without this, it is difficult to separate a PE-based interpretation from more general arousal or vigilance fluctuations.

      This point gets to the heart of the challenge with reverse inference in resting-state fMRI. We agree that an interpretation based on general arousal or drowsiness is a potential alternative that must be considered. However, what makes a simple arousal interpretation challenging is the highly specific nature of our fMRI-EEG association. As shown in our confirmatory analyses (Supplemental Materials 6), the correlation with 3-6 Hz power was found exclusively with the absolute difference between our two PE-related network states (|PA – VD|)—a measure of fluctuation amplitude. We found no significant relationship with the signed difference (a bias toward one state) or the sum (the overall level of connectivity). This specificity presents a puzzle for a simple drowsiness account; it seems less plausible that drowsiness would manifest specifically as the intensity of fluctuation between two complex cognitive networks, rather than as a more straightforward change in overall connectivity. While we cannot definitively rule out contributions from arousal, the specificity of our finding provides stronger evidence for a structured cognitive process, like PE, than for a general, undifferentiated state. 

      We updated the Discussion to make the argument above and also to remind readers that alternative explanations, such as ones based on drowsiness, are possible (p. 24):

      “We specifically interpret the fMRI-EEG correlation as reflecting fluctuation speed because we correlated EEG oscillatory power with the fluctuation amplitude computed from fMRI data. Simply correlating EEG power with the average connectivity or the signed difference between posterior-anterior and ventral-dorsal connectivity yields null results (Supplemental Materials 6), suggesting that this is a very particular association, and viewing it as capturing fluctuation amplitude provides a parsimonious explanation. Yet, this correlation may be interpreted in other ways. For example, resting-state Theta is also a signature of drowsiness,[2] which may correlate with PE processing, but perhaps should be understood as some other mechanism.”

      (R3.4) Significance Assessment

      The significance of the correlation above and all other correlation analyses should be assessed through a permutation test rather than a single parametric t-test against zero. There are a few reasons: a) EEG and fMRI time series are autocorrelated, violating the independence assumption of parametric tests;

      Standard t-tests can underestimate the true null distribution's variance, because EEG-fMRI correlations often involve shared slow drifts or noise sources, which can yield spurious correlations and inflate false positives unless tested against an appropriate null.

      Building a null distribution that preserves the slow drifts, for example, would help us understand how likely it is for the two time series to be correlated when the slow drifts are still present, and how much better the current correlation is, compared to this more conservative null. You can perform this by phase randomizing one of the two time courses N times (e.g., N=1000), which maintains the autocorrelation structure while breaking any true co-occurrence in patterns between the two time series, and compute a non-parametric p-value. I suggest using this approach in all correlation analyses between two time series.

      This is an important statistical point to clarify, and the suggested analysis is valuable. The reviewer is correct that the raw fMRI and EEG time series are autocorrelated. However, because our statistical approach is a two-level analysis, we reasoned that non-independence at the correlation-level would not invalidate the higher-level t-test. The t-test’s assumption of independence applies to the individual participants' coefficients, which are independent across participants. Thus, we believe that our initial approach is broadly appropriate, and its simplicity allows it to be easily communicated.

      Nonetheless, the permutation-testing procedure that the Reviewer describes seems like an important analysis to test, given that permutation-testing is the gold standard for evaluating statistical significance, and it could guarantee that our above logic is correct. We thus computed the analysis as the reviewer described. For each participant, we phase-randomized the fMRI fluctuation amplitude time series. Specifically, we randomized the Fourier phases of the |PA–VD| series (within run), while retaining the original amplitude spectrum; inverse transforms yielded real surrogates with the same power spectrum. This was done for each participant once per permutation. Each participant’s phase-randomized data was submitted to the analysis of each oscillatory power band as originally, generating one mean correlation for each band. This was done 1,000 times.

      Across the five bands, we find that the grand mean correlation is near zero (M<sub>r</sub> = .0006) and the 97.5<sup>th</sup> percentile critical value of the null distribution is r = ~.025; this 97.5<sup>th</sup> percentile corresponds to the upper end of a 95% confidence interval for a band’s correlation, and the threshold differs minimally across bands (.024 < rs < .026). Our original correlation coefficients for Delta (M<sub>r</sub> = .042) and Theta (M<sub>r</sub> = .041), which our conclusions focused on, remained significant (p ≤ .002). We can also perform family-wise error-rate correction by taking the highest correlation across any band for a given permutation, and the Delta and Theta effects remain significant (p<sub>FWE-corrected</sub> ≤ .003); Reviewer comment R1.4c previously requested that we employ family-wise error correction.

      These correlations were previously reported in Table 1, and we updated the caption to note what effects remain significant when evaluated using permutation-testing and with family-wise error correction (p. 19):

      “The effects for Delta, Theta, Beta, and Gamma remain significant if significance testing is instead performed using permutation-testing and with family-wise error rate correction (p<sub>corrected</sub> < .05).”

      We updated the Methods to describe the permutation-testing analysis (p. 43):

      “To confirm the significance of our fMRI-EEG correlations with a non-parametric approach, we performed a group-level permutation-test. For each of 1,000 permutations, we phase-randomized the fMRI fluctuation amplitude time series. Specifically, we randomized the Fourier phases of the |PA–VD| series (within run), while retaining the original amplitude spectrum; inverse transforms yielded real surrogates with the same power spectrum. This procedure breaks the true temporal relationship between the fMRI and EEG data while preserving its structure. We then re-computed the mean Spearman correlation for each frequency band using this phase-randomized data. We evaluated significance using a family-wise error correction approach that accounts for us analyzing five oscillatory power bands. We thus create a null distribution composed of the maximum correlation value observed across all frequency bands from each permutation. Our observed correlations were then tested for significance against this distribution of maximums.”
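
      The permutation procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: function names are hypothetical, the data are synthetic, each participant is treated as a single run, the EEG regressors are assumed to be already HRF-convolved, and the rank helper assumes no tied values. Phase randomization keeps each series' amplitude spectrum (and hence its autocorrelation) while scrambling its timing, and the null distribution is built from the maximum mean correlation across bands.

```python
import numpy as np

def phase_randomize(x, rng):
    """Real surrogate with the amplitude spectrum of x but random Fourier phases."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    randomized = np.abs(spectrum) * np.exp(1j * phases)
    randomized[0] = spectrum[0]            # keep the mean (DC term) unchanged
    if n % 2 == 0:
        randomized[-1] = spectrum[-1]      # Nyquist term stays real for even n
    return np.fft.irfft(randomized, n=n)

def rank(x):
    """Ranks 0..n-1; assumes no tied values."""
    order = np.argsort(x)
    ranks = np.empty(len(x))
    ranks[order] = np.arange(len(x))
    return ranks

def spearman(x, y):
    return float(np.corrcoef(rank(x), rank(y))[0, 1])

def mean_band_correlations(fmri, eeg):
    """fmri: (participants, time); eeg: (participants, bands, time) -> mean r per band."""
    n_sub, n_band, _ = eeg.shape
    r = np.array([[spearman(fmri[s], eeg[s, b]) for b in range(n_band)]
                  for s in range(n_sub)])
    return r.mean(axis=0)

def max_stat_pvalues(fmri, eeg, n_perm, rng):
    """Family-wise corrected p-values from a max-across-bands null distribution."""
    observed = mean_band_correlations(fmri, eeg)
    null_max = np.empty(n_perm)
    for i in range(n_perm):
        surrogate = np.array([phase_randomize(ts, rng) for ts in fmri])
        null_max[i] = mean_band_correlations(surrogate, eeg).max()
    exceed = (null_max[:, None] >= observed[None, :]).sum(axis=0)
    return observed, (1 + exceed) / (n_perm + 1)

# Synthetic demonstration (hypothetical sizes, not study data)
rng = np.random.default_rng(0)
n_sub, n_band, n_tr = 5, 5, 120
fmri = np.abs(rng.standard_normal((n_sub, n_tr)))        # |PA - VD| per TR
eeg = rng.gamma(2.0, 1.0, size=(n_sub, n_band, n_tr))    # band-power regressors
observed, p_fwe = max_stat_pvalues(fmri, eeg, n_perm=200, rng=rng)
```

      Adding one to both the exceedance count and the number of permutations is a common convention that keeps the estimated p-value strictly above zero.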

      (R3.5) Analysis choices

      If I'm understanding correctly, the algorithm used to identify modules does so by assigning nodes to communities, but it does not itself restrict what edges can be formed from these modules. This makes me wonder whether the decision to focus only on connections between adjacent modules, rather than considering the full connectivity, was an analytic choice by the authors. If so, could you clarify the rationale? In particular, what justifies assuming that the gradient of PE states should be captured by edges formed only between nearby modules (as shown in Figure 2E and Figure 4), rather than by the full connectivity matrix? If this restriction is instead a by-product of the algorithm, please explain why this outcome is appropriate for detecting a global signature of PE states in both task and rest.

      We discuss this matter in our response to comment R2.(4).

      When assessing the correspondence across task-fMRI and rs-fMRI in section 2.2.2, why was the pattern during task calculated from selecting a pair of bilateral ROIs (resulting in a group of eight ROIs), and the resting state pattern calculated from posterior-anterior/ventral-dorsal fluctuation modules? Doesn't it make more sense to align the two measures? For example, calculating task effects on these same modules during task and rest?

      We thank the reviewer for this question, as it highlights a point in our methods that we could have explained more clearly. The reviewer is correct that the two measures must be aligned, and we can confirm that they were indeed perfectly matched.

      For the analysis in Section 2.2.2, both the task and resting-state measures were calculated on the exact same anatomical substrate for each comparison. The analysis iteratively selected a symmetrical subset of eight ROIs from our larger four quadrants. For each of these 3,432 iterations, we computed the task-fMRI PE effect (the Connectivity Direction × PE interaction) and the resting-state fluctuation amplitude (E[|PA – VD|]) using the identical set of eight ROIs. The goal of this analysis was precisely to test if the fine-grained anatomical pattern of these effects correlated within an individual across the task and rest states. We will revise the text in Section 2.2.2 to make this direct alignment of the two measures more explicit.

      Recommendations for authors:

      Reviewer #1 (Recommendations for authors):

      (R1.3) Several prior studies have described co-activation or connectivity "templates" that spontaneously alternate during rest and task states, and are linked to behavioral variability. While they are interpreted differently in terms of cognitive function (e.g., in terms of sustained attention: Monica Rosenberg; alertness: Catie Chang), the relationship between these previously reported templates and those identified in the current study warrants discussion. Are the current templates spatially compatible with prior findings while offering new functional interpretations beyond those already proposed in the literature? Or do they represent spatially novel patterns?

      Thank you for this suggestion. Broadly, we do not mean to propose spatially novel patterns but rather focus on how these are repurposed for PE processing. In the Discussion, we link our identified connectivity states to established networks (e.g., the FPCN). We updated this paragraph to mention that these patterns are largely not spatially novel (p. 20):

      “The connectivity patterns put forth are, for the most part, not spatially novel and instead overlap heavily with prior functional and anatomical findings.”

      Regarding the specific networks covered in the prior work by Rosenberg and Chang that the reviewer seems to be referring to,[7,8] this research has emphasized networks anchored heavily in sensorimotor, subcortical–cerebellar, and medial frontal circuits, which mostly do not overlap with the connectivity effects we put forth.

      (R1.4) Additional points:

      (R1.4a) I do not think that the logic for taking the absolute difference of fMRI connectivity is convincing. What happens if the sign of the difference is maintained?

      Thank you for pointing out this area that requires clarification. Our analysis targets the amplitude of the fluctuation between brain states, not the direction. We define high fluctuation amplitude as moments when the brain is strongly in either the PA state (PA > VD) or the VD state (VD > PA). The absolute difference |PA – VD| correctly quantifies this intensity, whereas a signed difference would conflate these two distinct high-amplitude moments. Our simulation study (Supplemental Materials, Section 5) provides the theoretical validation for this logic, showing how this absolute difference measure in slow fMRI data can track the amplitude of a fast underlying neural oscillator.
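
      The distinction between the signed and absolute measures can be illustrated with a toy example. This is only a sketch under simplifying assumptions: the PA minus VD difference is modeled as a pure sinusoid, and `simulate_difference` and its parameters are hypothetical rather than taken from the simulation in the Supplemental Materials.

```python
import math

def simulate_difference(amplitude, n=1000, cycles=25):
    """Hypothetical PA - VD series that alternates between the two states."""
    return [amplitude * math.sin(2 * math.pi * cycles * t / n) for t in range(n)]

def mean(values):
    return sum(values) / len(values)

strong = simulate_difference(1.0)   # brain swings strongly between PA and VD
weak = simulate_difference(0.2)     # brain swings only weakly

# Signed difference: positive and negative excursions cancel in both cases.
signed_strong = mean(strong)
signed_weak = mean(weak)

# Absolute difference: tracks how large the swings are, regardless of direction.
abs_strong = mean([abs(d) for d in strong])
abs_weak = mean([abs(d) for d in weak])

print(signed_strong, signed_weak, abs_strong, abs_weak)
```

      In this toy case the averaged signed difference is close to zero for both series, whereas the averaged absolute difference scales with the amplitude of the fluctuation.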

      When the analysis is run on the signed difference, as suggested by the Reviewer, the association between the fMRI data and EEG power is non-significant for every power band (ps<sub>uncorrected</sub> ≥ .47). We updated Supplemental Materials 6 to include these results. Previously, this section included the fluctuation amplitude (fMRI) × EEG power results while controlling for: (i) the signed difference between posterior-anterior and ventral-dorsal connectivity, (ii) the sum of posterior-anterior and ventral-dorsal connectivity, and (iii) the absolute value of the sum of posterior-anterior and ventral-dorsal connectivity. For completeness, we also now report the correlation between each EEG power band and each of those other three measures (SM, p. 9):

      “We additionally tested the relationship between each of those three measures and the five EEG oscillation bands. Across the 15 tests, there were no associations (ps<sub>uncorrected</sub>  ≥ .04); one uncorrected p-value was at p = .044, although this was expected given that there were 15 tests. Thus, the association between EEG oscillations and the fMRI measure is specific to the absolute difference (i.e., amplitude) measure.”
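
      The remark that a single p-value near .044 is expected across 15 tests can be checked with a short calculation. The sketch below assumes the 15 tests are independent, which the text does not state, and the function names are hypothetical.

```python
def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance of one or more p < alpha under the null, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def expected_false_positives(n_tests, alpha):
    """Expected count of p < alpha under the null."""
    return n_tests * alpha

family_wise = prob_at_least_one_false_positive(15, 0.05)
expected = expected_false_positives(15, 0.05)
print(round(family_wise, 3), expected)
```

      Under these assumptions, there is roughly a 54% chance that at least one of 15 null tests falls below .05, so one uncorrected p-value of .044 is unremarkable.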

      (R1.4b) Reasoning of focus on frontal and theta band is weak, and described as "typical" (line 359) based on a single study.

      Sorry about this. There is a rich literature on the link between frontal theta and prediction error,[3,9–11] and we updated the Introduction to include more references to this work (p. 18): “The analysis was first done using power averaged across frontal electrodes, as these are the typical focus of PE research on oscillations.[3,9–11]”

      We have also updated the Methods to cite more studies that motivate our electrode choice (p. 41): “The analyses first targeted five midline frontal electrodes (F3, F1, Fz, F2, F4; BioSemi64 layout), given that this frontal row is typically the focus of executive-function PE research on oscillations.[9–11]”

      (R1.4c) No correction appears to have been applied for the association between EEG power and fMRI connectivity. Given that 100 frequency bins were collapsed into 5 canonical bands, a correction for 5 comparisons seems appropriate. Notably, the strongest effects in the delta and theta bands (particularly at fronto-central electrodes) may still survive correction, but this should be explicitly tested and reported.

      Thanks for this suggestion. We updated the Table 1 caption to mention which results survive family-wise error rate correction. As the reviewer suggests, the Delta/Theta effects would survive Bonferroni correction for five tests, although per a later comment suggesting that we evaluate statistical significance with a permutation-testing approach (comment R3.4), we instead report family-wise error correction based on that. The revised caption is as follows (p. 19):

      “The effects for Delta, Theta, Beta, and Gamma remain significant if significance testing is instead performed using permutation-testing and with family-wise error rate correction (p<sub>corrected</sub> < .05).”

      (R1.4d) Line 135. Not sure I understand what you mean by "moods". What is the overall point here?

      The overall argument is that the fluctuations occur rapidly rather than slowly. By slow “moods” we refer to how a participant could enter a high anxiety state of >10 seconds, linked to high PE fluctuations, and then shift into a low anxiety state, linked to low PE fluctuations. We argue that this is not occurring. Regardless, we recognize that referring to lengths of time as short as 10 seconds or so is not a typical use of the word “mood” and is potentially ambiguous, so we have omitted this statement, which was originally on page 6: “Identifying subsecond fluctuations would broaden the relevance of the present results, as they rule out that the PE states derive from various moods.”

      (R1.4e) Line 100. "Few prior PE studies have targeted PE, contrasting the hundreds that have targeted BOLD". I don't understand this sentence. It's presumably about connectivity vs activity?

      Yes, sorry about this typo. The reviewer is correct, and that sentence was meant to mention connectivity. We corrected (p. 5): “Few prior PE studies have targeted connectivity, contrasting the hundreds that have targeted BOLD.”

      (R1.4f) Line 373: "0-0.5Hz" in the caption is probably "0-50Hz".

      Yes, this was another typo, thank you. We have corrected it (p. 19): “… every 0.5 Hz interval from 0-50 Hz.”

      Reviewer #2 (Recommendations for authors):

      (R2.6) (Page 3) When referring to the "limited" hypothesis of local PE, please clarify in what sense is it limited. That statement is unclear.

      Thank you for pointing out this text, which we now see is ambiguous. We originally used "limited" to refer to the hypothesis's constrained scope: namely, that PE is relevant to various low-level operations (e.g., sensory processing or rewards) but the minimization of PE does not guide more abstract cognitive processes. We edited this part of the Introduction to be clearer (p. 3):

      “It is generally agreed that the brain uses PE mechanisms at neuronal or regional levels,[15,16] and this idea has been useful in various low-level functional domains, including early vision [15] and dopaminergic reward processing.[17] Some theorists have further argued that PE propagates through perceptual pathways and can elicit downstream cognitive processes to minimize PE.”

      (R2.7) (Page 5) "Few prior PE have targeted PE"... this statement appears contradictory. Please clarify.

      Sorry about this typo, which we have corrected (p. 5):

      “Few prior PE studies have targeted connectivity, contrasting the hundreds that have targeted BOLD.”

      (R2.8) What happened to the data of the medium PE condition in Study 1A?

      The medium PE condition data were not excluded. We modeled the effect of prediction error on connectivity using a linear regression across the three conditions, coding them as a continuous variable (Low = -1, Medium = 0, High = +1). This approach allowed us to identify brain connections that showed a linear increase or decrease in strength as a function of increasing PE. This linear contrast is a more specific and powerful way to isolate PE-related effects than a High vs. Low contrast. We updated the Results slightly to make this clearer (p. 8-9):

      “In the fMRI data, we compared the three PE conditions’ beta-series functional connectivity, aiming to identify network-level signatures of PE processing, from low to high. […] For the modularity analysis, we first defined a connectome matrix of beta values, wherein each edge’s value was the slope of a regression predicting that edge’s strength from PE (coded as Low = -1, Medium = 0, High = +1; Figure 2A).”
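
      The regression described above can be sketched as an ordinary least-squares slope computed independently for each edge. This is an illustrative sketch only: the array layout (observations × ROIs × ROIs) and the function name `edge_slopes` are assumptions, and the actual beta-series pipeline may differ.

```python
import numpy as np

def edge_slopes(conn, pe_codes):
    """OLS slope of each edge's strength on the PE code.

    conn:     array of shape (n_obs, n_roi, n_roi), one connectivity matrix per observation
    pe_codes: array of shape (n_obs,), coded Low = -1, Medium = 0, High = +1
    """
    x = np.asarray(pe_codes, dtype=float)
    x_centered = x - x.mean()
    y_centered = conn - conn.mean(axis=0)
    # slope = cov(x, y) / var(x), evaluated for every edge at once
    return np.tensordot(x_centered, y_centered, axes=(0, 0)) / np.sum(x_centered ** 2)

# Toy data: two observations per condition, with a known slope on the off-diagonal edge.
codes = np.array([-1, 0, 1, -1, 0, 1])
baseline = np.array([[1.0, 0.5], [0.5, 1.0]])
known_slope = np.array([[0.0, 0.3], [0.3, 0.0]])
toy_conn = np.stack([baseline + c * known_slope for c in codes])
slope_matrix = edge_slopes(toy_conn, codes)
print(slope_matrix)
```

      The resulting slope matrix plays the role of the connectome of beta values mentioned in the quoted text: each entry is positive where connectivity rises with PE and negative where it falls.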

      (R2.9) (Page 15) The point about how the dots in 6H follow those in 6J better than those in 6I is a little subjective - can the authors provide an objective measure?

      Thank you for pointing out this issue. The visual comparison using Figure 6 was not meant as a formal analysis but rather to provide intuition. However, as the reviewer describes, this is difficult to convey. Our formal analysis is provided in Supplemental Materials 5, where we report correlation coefficients between a very large number of simulated fMRI data points and EEG data points corresponding to different frequencies. We updated this part of the Results to convey this (p. 16-17):

      “Notice how the dots in Figure 6H follow the dots in Figure 6J (3 Hz) better than the dots in Figure 6I (0.5 Hz) or Figure 6K (10 Hz); this visual comparison is intended for illustrative purposes only, and quantitative analyses are provided in Supplemental Materials 5.”

      References

      (1) Zalesky, A., Fornito, A. & Bullmore, E. T. Network-based statistic: identifying differences in brain networks. Neuroimage 53, 1197–1207 (2010).

      (2) Strijkstra, A. M., Beersma, D. G., Drayer, B., Halbesma, N. & Daan, S. Subjective sleepiness correlates negatively with global alpha (8–12 Hz) and positively with central frontal theta (4–8 Hz) frequencies in the human resting awake electroencephalogram. Neuroscience letters 340, 17–20 (2003).

      (3) Cavanagh, J. F. & Frank, M. J. Frontal theta as a mechanism for cognitive control. Trends in cognitive sciences 18, 414–421 (2014).

      (4) Grech, R. et al. Review on solving the inverse problem in EEG source analysis. Journal of neuroengineering and rehabilitation 5, 25 (2008).

      (5) Palva, J. M. et al. Ghost interactions in MEG/EEG source space: A note of caution on inter-areal coupling measures. Neuroimage 173, 632–643 (2018).

      (6) Koles, Z. J. Trends in EEG source localization. Electroencephalography and clinical Neurophysiology 106, 127–137 (1998).

      (7) Rosenberg, M. D. et al. A neuromarker of sustained attention from whole-brain functional connectivity. Nature neuroscience 19, 165–171 (2016).

      (8) Goodale, S. E. et al. fMRI-based detection of alertness predicts behavioral response variability. elife 10, e62376 (2021).

      (9) Cavanagh, J. F. Cortical delta activity reflects reward prediction error and related behavioral adjustments, but at different times. NeuroImage 110, 205–216 (2015).

      (10) Hoy, C. W., Steiner, S. C. & Knight, R. T. Single-trial modeling separates multiple overlapping prediction errors during reward processing in human EEG. Communications Biology 4, 910 (2021).

      (11) Neo, P. S.-H., Shadli, S. M., McNaughton, N. & Sellbom, M. Midfrontal theta reactivity to conflict and error are linked to externalizing and internalizing respectively. Personality neuroscience 7, e8 (2024).

      (12) Friston, K. J. The free-energy principle: a unified brain theory? Nature reviews neuroscience 11, 127–138 (2010).

      (13) Feldman, H. & Friston, K. J. Attention, uncertainty, and free-energy. Frontiers in human neuroscience 4, 215 (2010).

      (14) Friston, K. J. et al. Active inference and epistemic value. Cognitive neuroscience 6, 187–214 (2015).

      (15) Rao, R. P. & Ballard, D. H. Predictive coding in the visual cortex: a functional interpretation of some extraclassical receptive-field effects. Nature neuroscience 2, 79–87 (1999).

      (16) Walsh, K. S., McGovern, D. P., Clark, A. & O’Connell, R. G. Evaluating the neurophysiological evidence for predictive processing as a model of perception. Annals of the New York Academy of Sciences 1464, 242–268 (2020).

      (17) Niv, Y. & Schoenbaum, G. Dialogues on prediction errors. Trends in cognitive sciences 12, 265–272 (2008).

      (18) Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological) 57, 289–300 (1995).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      Summary

      We thank the reviewer for the constructive and thoughtful evaluation of our work. We appreciate the recognition of the novelty and potential implications of our findings regarding UPR activation and proteasome activity in germ cells.

      (1) The microscopy images look saturated, for example, Figure 1a, b, etc. Is this a normal way to present fluorescent microscopy?

      The apparent saturation was not present in the original images but likely arose from image compression during PDF generation. Although the EMA granule was still apparent, in the revised submission we will provide high-resolution TIFF files to ensure accurate representation of fluorescence intensity and will carefully optimize image display settings to avoid any saturation artifacts.

      (2) The authors should ensure that all claims regarding enrichment/lower vs. lower values have indicated statistical tests.

      We fully agree. In the revised version, we will correct any quantitative comparisons where statistical tests were not already indicated, with a clear statement of the statistical tests used, including p-values in figure legends and text.

      (a) In Figure 2f, the authors should indicate which comparison is made for this test. Is it comparing 2 vs. 6 cyst numbers?

      We acknowledge that the description was not sufficiently detailed. The test did not compare 2 vs. 6 cyst numbers. Instead, it considered all possible ways in which 8-cell cysts, or the larger cysts studied, could fragment randomly into two pieces, and asked whether such random breakage could produce 6-cell cysts by chance in 13 of the 15 observed examples. We will expand the legend and main text to clarify that a binomial test was used to determine that the proportion of cysts producing 6-cell fragments differed very significantly from chance.

      Revised text:

      “A binomial test was used to assess whether the observed frequency of 6-cell cyst products differed from random cyst breakage. Production of 6-cell cysts was strongly preferred (13/15 cysts; ****p < 0.0001).”
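
      A binomial test of this kind can be sketched as follows. The null probability of obtaining a 6-cell product under random breakage is not given in the text, so `p_chance = 0.25` below is a placeholder assumption, not the value the authors derived. The test is written as one-sided, which is also an assumption, and `binomial_upper_tail` is a hypothetical helper name.

```python
from math import comb

def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): a one-sided binomial test p-value."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))

observed_six_cell = 13   # cysts that produced a 6-cell product (from the text)
total_cysts = 15         # cysts examined (from the text)
p_chance = 0.25          # PLACEHOLDER: chance probability under random breakage

p_value = binomial_upper_tail(observed_six_cell, total_cysts, p_chance)
print(p_value)
```

      The actual p-value depends entirely on the chance probability implied by the enumeration of possible breakage patterns, so the placeholder should be replaced with that value before drawing any conclusion.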

      (b) Figures 4d and 4e do not have a statistical test indicated.

      We will include the specific statistical test used and report the corresponding p-values directly in the figure legends.

      (3) Because the system is developmentally dynamic, the major conclusions of the work are somewhat unclear. Could the authors be more explicit about these and enumerate them more clearly in the abstract?

      We will revise the abstract to better clarify the findings of this study. We will also replace the term Visham with mouse fusome to reflect its functional and structural analogy to the Drosophila and Xenopus fusomes, making the narrative more coherent and conclusive.

      (4) The references for specific prior literature are mostly missing (lines 184-195, for example).

      We appreciate this observation of a problem that occurred inadvertently when shortening an earlier version.  We will add 3–4 relevant references to appropriately support this section.

      (5) The authors should define all acronyms when they are first used in the text (UPR, EGAD, etc).

      We will ensure that all acronyms are spelled out at first mention (e.g., Unfolded Protein Response (UPR), Endosome and Golgi-Associated Degradation (EGAD)).

      (6) The jumping between topics (EMA, into microtubule fragmentation, polarization proteins, UPR/ERAD/EGAD, GCNA, ER, balbiani body, etc) makes the narrative of the paper very difficult to follow.

      We are not jumping between topics, but following a narrative relevant to the central question of whether female mouse germ cells develop using a fusome. EMA, microtubule fragmentation, polarization proteins, ER, and the Balbiani body are all topics with a known connection to fusomes. This is explained in the general introduction and in relevant subsections. We appreciate this feedback that further explanations of these connections would be helpful. In the revised manuscript, use of the unified term mouse fusome will also help connect the narrative across sections. UPR/ERAD/EGAD are processes that have been studied in repair and maintenance of somatic cells and in yeast meiosis. We show that the major regulator Xbp1 is found in the fusome, and that the fusome and these rejuvenation pathway genes are expressed and maintained throughout oogenesis, rather than only during limited late stages as suggested in previous literature.

      (7) The heading title "Visham participates in organelle rejuvenation during meiosis" in line 241 is speculative and/or not supported. Drawing upon the extensive, highly rigorous Drosophila literature, it is safe to extrapolate, but the claim about regeneration is not adequately supported.

      We believe this statement is accurate given the broad scope of the term "participates." It is supported by localization of the UPR regulator Xbp1 to the fusome. Xbp1 is the ortholog of Hac1, a key gene mediating UPR-mediated rejuvenation during yeast meiosis. We also showed that rejuvenation pathway genes are expressed throughout most of meiosis (not previously known) and expanded cytological evidence of stage-specific organelle rejuvenation later in meiosis, such as mitochondrial-ER docking, in regions enriched in fusome antigens. However, we recognize the current limitations of this evidence in the mouse, and want to appropriately convey this, without going to what we believe would be an unjustified extreme of saying there is no evidence.

      Reviewer #2 (Public review):

      We thank the reviewer for the comprehensive summary and for highlighting both the technical achievement and biological relevance of our study. We greatly appreciate the thoughtful suggestions that have helped us refine our presentation and terminology.

      (1) Some titles contain strong terms that do not fully match the conclusions of the corresponding sections.

      (1a) Article title “Mouse germline cysts contain a fusome-like structure that mediates oocyte development”

      We will change the statement to: “Mouse germline cysts contain a fusome that supports germline cyst polarity and rejuvenation.”

      (1b) Result title “Visham overlaps centrosomes and moves on microtubules”

      We acknowledge that “moves” implies dynamics. We will include additional supplementary images showing small vesicular components of the mouse fusome on spindle-derived microtubule tracks.

      (1c) Result title “Visham associates with Golgi genes involved in UPR beginning at the onset of cyst formation”

      We will revise this title to: “The mouse fusome associates with the UPR regulatory protein Xbp1 beginning at the onset of cyst formation” to reflect the specific UPR protein that was immunolocalized.

      (1d) Result title “Visham participates in organelle rejuvenation during meiosis”

      We will revise this to: “The mouse fusome persists during organelle rejuvenation in meiosis.”

      (2) The authors aim to demonstrate that Visham is a fusome-like structure. I would suggest simply referring to it as a "fusome-like structure" rather than introducing a new term, which may confuse readers and does not necessarily help the authors' goal of showing the conservation of this structure in Drosophila and Xenopus germ cells. Interestingly, in a preprint from the same laboratory describing a similar structure in Xenopus germ cells, the authors refer to it as a "fusome-like structure (FLS)" (Davidian and Spradling, BioRxiv, 2025).

      We appreciate the reviewer’s insightful comment. To maintain conceptual clarity and align with existing literature, we will refer to the structure as the mouse fusome throughout the manuscript, avoiding introduction of a new term.

      Reviewer #3 (Public review):

      We thank the reviewer for emphasizing the importance of our study and for providing constructive feedback that will help us clarify and strengthen our conclusions.

      (1) Line 86 - the heading for this section is "PGCs contain a Golgi-rich structure known as the EMA granule"

      We agree that the enrichment of Golgi within the EMA PGCs was not shown until the next section. We will revise this heading to:

      “PGCs contain an asymmetric EMA granule.” 

      (2) Line 105-106, how do we know if what's seen by EM corresponds to the EMA1 granule?

      We will clarify that this identification is based on co-localization with Golgi markers (GM130 and GS28) and response to Brefeldin A treatment, which will be included as supplementary data. These findings support that the mouse fusome is Golgi-derived and can therefore be visualized by EM. The Golgi regions in E13.5 cyst cells move close together and associate with ring canals as visualized by EM (Figure 1E), the same as the mouse fusomes identified by EMA.

      (3) Line 106-107-states "Visham co-stained with the Golgi protein Gm130 and the recycling endosomal protein Rab11a1". This is not convincing as there is only one example of each image, and both appear to be distorted.

      Space is at a premium in these figures, but we have ample data documenting this clear co-localization. We will replace the existing images with high-resolution, uncompressed versions for the final figures to clearly illustrate the co-staining patterns for GM130 and Rab11a1.

      (4) Line 132-133---while visham formation is disrupted when microtubules are disrupted, I am not convinced that visham moves on microtubules as stated in the heading of this section.

      We will include additional supplementary data showing small mouse fusome vesicles aligned along microtubules.

      (5) Line 156 - the heading for this section states that Visham associates with polarity and microtubule genes, including pard3, but only evidence for pard3 is presented.

      We agree and will revise the heading to: “Mouse fusome associates with the polarity protein Pard3.” We are adding data showing association of small fusome vesicles on microtubules.

      (6) Lines 196-210 - it's strange to say that UPR genes depend on DAZ, as they are upregulated in the mutants. I think there are important observations here, but it's unclear what is being concluded.

      UPR genes are not upregulated in Dazl mutants, in the sense that we have never documented them increasing. We show that UPR genes during this time behave like pluripotency genes and normally decline, but in Dazl mutants their decline is slowed. We will rephrase the paragraph to clarify that Dazl mutation partially decouples developmental processes that are normally linked, which alters UPR gene expression relative to cyst development.

      (7) Line 257-259-wave 1 and 2 follicles need to be explained in the introduction, and how these fits with the observations here clarified.

      Follicle waves are too small a focus of the current study to explain in the introduction, but we will refer readers to the relevant cited literature (Yin and Spradling, 2025) for further details.

      We sincerely thank all reviewers for their insightful and constructive feedback. We believe that the planned revisions—particularly the refined terminology, improved image quality, clarified statistics, and restructured abstract—will substantially strengthen the manuscript and enhance clarity for readers.

      Reviewer #1 (Recommendations for the authors):

      (1) Figure 1E: need to use some immuno-gold staining to identify the Visham. Just circling an area of cytoplasm that contains ER between germ cell pairs is not enough.

      We appreciate the reviewer’s insistence that the association between the mouse fusome and Golgi be clearly demonstrated. However, the EMA granule is a large structure discovered and defined by light microscopy, and presents no inherent challenge to documenting its Golgi association by immunofluorescence experiments, which we presented and have now further strengthened as described in the next paragraph. We believe that the suggested EM experiment would add little to the EM we already presented (Figure 1E, E'). Moreover, due to facility limitations, we are currently unable to perform immunogold staining.

      To strengthen previous immunolocalization experiments, we have now included additional immunostaining data showing the clear colocalization of the fusome region with the Golgi markers GM130 and GS28 (Figure S1H). We have also incorporated a new experiment using the Golgi-specific inhibitor Brefeldin A (BFA; see Figure S1I). Treatment of in vitro–cultured gonads with BFA disrupted EMA granule formation, demonstrating that EMA granules not only associate with Golgi but also require Golgi function to be maintained.

      Additionally, in Figure 2, we showed that the fusome overlaps with the peri-centriolar region, a characteristic locus for Golgi due to its movement on microtubules. We showed that the dynamic behavior of the fusome during the cell cycle parallels Golgi dispersal and reassembly. Together, these facts provide further strong support for the Golgi association of the EMA granule and fusome.

      (2) Figure 1F: is this image compressed?

      We have now substituted the image in Figure 1F with a better image and have avoided the compression of the image. 

      (3) In the figure legends, are the sample sizes individual animals or individual sections? Please ensure that all figure legends for each figure panel consistently contain the sample size.

      We have now included the number of measurements (N) in every figure legend. Each experiment was performed using samples from at least three different animals, and in most cases from more than three. This information has also been added to the Methods section under Statistics. In addition, N values are now consistently provided for each graph throughout the figures.

      (4) Figure 2b/c: it seems likely, based on the snapshots of different stages of cytokinesis, that the "newly formed" visham is accurate, but without live imaging, this claim of "newly formed" is putative/speculative. It is OK if it is labeled as "putative" in the figure panel.

      The behavior of the Drosophila fusome during mitosis was deduced without live imaging (deCuevas et al. 1998). We clarified that the conversion of a single mouse germ cell with one round fusome to an interconnected pair of cells with two round fusomes of greater total volume following mitosis is the basis for deducing that new fusome formation occurs each cell cycle. However, we agree with the reviewer that the phrase "newly formed" in the original label on Figure 2c suggested a specific mechanism of fusome increase that was not intended and this phrase has been removed entirely.  

      (5) Figure 2e/e is extremely difficult to follow. In order to improve the readability of these figure panels, can individual panels with a single stain be shown? The 'gap' between YFP+ sister cells is not immediately obvious in panel e or e" with the current layout. Since this is a key aspect of the author's claim about cleavage of the cyst, it would be best to make this claim more robust by showing more convincing images. In Figure 2E, the staining pattern of EMA needs to be clarified and described more fully in the text.

      We mapped discontinuities in the microtubule connections, not the fusome or YFP. YFP is the lineage marker indicating that the cells of a single cyst are being studied. Consequently, no gap in YFP cytoplasmic expression is expected, because only in the last example (Figure 2E'') has fragmentation already occurred (and here there is a YFP gap). The acetylated tubulin gap precedes fragmentation. The mitotic spindle remnants labeled by AcTub link the cells into two groups separated by a gap, which is clearly shown in the data images and in the third column, where only the relevant AcTub from the cyst itself is shown. In response to the reviewer's question about the fusome, which is not directly relevant to fragmentation, we have now provided images of the separate fusome channel and corresponding measurements for all three Figure 2E-E'' cysts in the supplementary Figure S4H. We have improved the text regarding this important figure to try to make it easier to follow, and also added a new example of a 10-cell cyst, also in S2H (lower panels). We also added movies allowing full 3D study of one of the 8-cell cysts and the new 10-cell cyst. We also suggest that the reviewer examine how the deduced mechanism of fragmentation explains previously published but not fully understood data on cyst fragmentation going back to 1998, as described in the expanded Discussion on this topic.

      (6) It would be best to support the proposed model in Figure 2G (4+4+4) with microscopy images of a 12-cell or 16-cell cyst? Would these 12-cell or 16-cell cysts be too large to technically recover in a section?

      Unfortunately, the reviewer's suggestion that 12- or 16-cell cysts are too large to recover and present convincingly is correct. Because our analysis depends on capturing lineage-labeled cysts specifically at telophase with acetylated-tubulin connections, the likelihood of obtaining the correct stage is very low. In addition, the dense packing of germ cells in the mouse gonad further limits our ability to fully reconstruct all the cells in large cysts, with difficulty increasing as cyst size grows.

      However, as noted, we added a well-resolved 10-cell cyst—the largest size we could confidently analyze—in a 3D video in Supplementary Figure S2H (lower panel), which shows a 6 + 4 breakage pattern.

      (7) We did not find a reference in the text for Figure 2G.

      We have now provided a reference to Figure 2G in the text and in the Discussion section.

      (8) Line 189: ERAD is used as an acronym, but is not defined until the discussion.

      We have now provided full form of acronym at its first usage in the text.

      (9) Fig 3i/i': the increase of UPR pathway components, increasing expression during zygotene, is interesting to note, but is not commented enough in the text of the paper.

      We have discussed this issue in the discussion section with specific reference to figure 3I. Please find the detailed discussion under the heading “Germ cell rejuvenation is highly active during cyst formation.”

      (10) Please quantify DNMT3A expression levels in WT control vs Dazl KO germ cells in Figure 4a.

      We have now quantified DNMT3A expression levels in WT control vs Dazl KO germ cells and have added the data in the Figure 4A.

      (11) Please introduce the rationale behind selecting DazL KO for studying cyst formation (text in line 197). This comes out of nowhere.

      True.  We significantly expanded our discussion of Dazl and citations of previous work, including evidence that it can affect cyst structures like ring canals, in the Introduction.  

      (12) It would be best to stain WT control vs DazL KO oogonia in Figure 4a with 5mC antibodies to support their claim that DNA methylation might be affected in the mutants.

      We respectfully disagree that this additional experiment is necessary within the scope of the current study. At the developmental stage examined (E12.5), germ cells in the Dazl mutant are clearly in an arrested and hypomethylated state, as supported by previous evidence (Haston et al. 2009). This initial experiment was designed to show that, in our hands, Dazl mutants show this known pluripotency delay. However, the effects of Dazl mutation on female germline cyst development as it relates to polarity or the fusome were not studied before, and that is what the paper addresses, building on previous work.

      Because our study does not focus on germ-cell epigenetic modifications but rather on the consequences of Dazl loss on germ cell cyst development, adding 5mC immunostaining would not substantially advance the main conclusions. The existing data and previous published work already provide sufficient background.

      (13) Figure 4c: a very interesting figure, it would be best to quantify developmental pseudotime (perhaps using monocle3 analysis) and compare more rigorously the developmental stage of WT control vs DazL KO.

      Developmental pseudotime, such as through Monocle3 analysis, might sometimes be valuable but involves assumptions that, when possible, are better addressed by direct experimental examination. Our conclusions regarding cyst developmental stage are supported by straightforward evidence to which computational trajectory inference would add little. Specifically, we have analyzed germ-cell methylation state, ring canal formation, pluripotency markers, UPR pathway activity (Xbp1 and proteomic assays), Golgi stress, and Pard3, which collectively document the developmental status of the WT and Dazl KO germ cells. These empirical data demonstrate the same developmental pattern reflected in Figure 4c, making the less reliable pseudotime-based computational method superfluous.

      (14) Figure 4d has two panels labeled as "d".

      We have now corrected the labelling of the figure

      (15) Color coding in 4d, d', d" is confusing; please harmonize some visual presentation here.

      We have now harmonized the visual representation of all the graphs in Figure 4.

      (16) Fig 4e' is labeled as DazL +/- but is this really a typo?

      Thank you for pointing it out. We have now corrected the typo

      (17) Figure F': typo labeled as E3.5, which is E13.5?

      Thank you for pointing it out. We have now corrected the typo

      (18) Figure F': was DazL KO mutant but no WT control.

      The WT control was not provided to avoid redundancy. Please refer to the earlier Figure 3A-B, Fig S3C and D, and videos S3A and S3b for the WT control at every stage.

      (19) Figure G: unusual choice in punctuation marks for cartoon schematic. No key to guide the reader for color-coded structures would be helpful to have something similar to 4h.

      We have now provided the key to guide the readers in the mentioned figure 4G.

      (20) The authors use WGA and EMA as interchangeable markers (Figure 5a) without fully explaining why they have switched markers.

      Because it is germ cell specific, we used EMA as a fusome marker during the time when it is found up through E13.5.  After that point we used WGA which is still usable, but also labels somatic cells.  This rationale is explicitly described at the end of the section “Fusome is highly enriched in Golgi and vesicles”, where we state:

      “EMA staining disappears from germ cells at E14.5 (Figure 1I). However, very similar (but non–germ-cell-specific) staining continued with wheat germ agglutinin (WGA) at later stages (Figure 1G, G’; Figure S1G).”

      To ensure this is fully clear to readers, we have now added an additional statement in the start of the text section discussing the figure 5:

      “For the reasons explained previously (see text for Figure 1G), WGA was used as a fusome marker beyond stage E14.5.”

      (21) Figure 5b' is compressed.

      We have now decompressed the image

      (22) Line 267, Balbiani body is misspelled.  

      We have now corrected the spelling.

      (23) The explanation of why the authors switch focus from DazL KO to DazL +/- is not adequately described. The authors should also explain the phenotype of the DazL +/- animals or reference a paper citing the hets are sterile or subfertile.

      We have now added the explanation of why Dazl KO is used in our introduction section where we have mentioned the phenotype of Dazl homozygous and heterozygous mouse.

      (24) Is Figure 5i actually DazL +/-? It is not labeled clearly in the text, the figure legend, or the figure panel. 

      We have now labelled the figure correctly in figure and in the legend.

      (25) The paper ends abruptly at line 275 with no context or summary.

      The manuscript does not end at line 275; the apparent interruption is due to a page break occurring immediately before the beginning of the Discussion section. We hope that continuation is fully visible in the reviewer 1 (your) version of the PDF.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 93: Fig. 1B: DDX4 marks germ cells; do all the red and yellow cells in the NE inset originate from the same PGC? There are only 2 cells marked in yellow among the group of red cells. Is it a z-projection issue? Or do they come from different PGCs?

      This experiment used vasa staining to identify all germ cells, which are produced by multiple PGCs. Green labeling is a lineage marker derived from a single PGC (due to the low frequency of tamoxifen-activated labeling). Consequently, the two yellow cells observed in the NE inset of Fig. 1B represent YFP-labeled germ cells (YFP + DDX4 double-positive) that have arisen from a single, lineage-traced PGC. This approach, introduced in 2013, is described in the Methods, and represents the field's single largest technical advance that has made it possible to analyze mouse germ cell development at single cell resolution.

      To ensure clarity, we have added a brief explanatory note to the figure legend indicating that yellow cells represent the lineage-traced progeny of a single PGC, while the red staining marks all germ cells.

      (2) Line 96: Figure 1C vs 1C'. The difference between female and male Visham is not obvious, although quantification shows a clear difference. How was the quantification made? Manual or automatic thresholding? Would it be possible to show only the Visham channel?

      We thank the reviewer for pointing out this problem. We have now more clearly described in the text that the female fusome increases in some cells with close attachments to other cells (future oocytes) and decreases in distant nurse cells. It branches due to rosette formation. In males, the fusome remains much like the initial EMA granules present in early germ cells, with only fine and difficult-to-see connections. The quantification shown in Figures 1C and 1C′ was performed manually, based on the presence of either (i) fused, branched EMA-positive fusome structures or (ii) dispersed, punctate EMA granules. This assessment was carried out across multiple E13.5 male and female gonad samples to ensure robustness. To facilitate independent evaluation, we have already provided supplementary videos S3B1 and S3B2, which display the EMA-stained E13.5 male and female gonads in three dimensions. These videos allow the structural differences to be examined more clearly than in static images.

      In response to the reviewer’s request, we now additionally include the single-channel fusome image in Supplementary Figure S1E′. This presentation highlights the fusome signal alone and further clarifies the morphological differences underlying the quantification.

      (3) L118: Figure 2A, third row = 2-cell cyst? Please specify PCNT in the legend.

      We appreciate the reviewer’s observation. In Figure 2A (third row), the cells were not specifically labeled as a 2-cell cyst; rather, the intention was to illustrate the presence of two distinct centrosomes positioned on a fused fusome structure, a configuration we frequently observe.

      We have now updated the figure legend to explicitly define PCNT.

      (4) L169: Missing reference to S3B and video S3B1?

      We have now included the reference to S3B1 and S3B2 in the text and in the legend

      (5) L170: Please describe the graph in the Figure 3D legend.

      We have now described the Graph in the legend

      (6) L171: Would it be possible to have a close-up showing both Pard3 and Visham in a ringlike pattern related to RACGAP (RC) staining? The images are too small.

      It is difficult to capture this relationship perfectly in a two-dimensional picture. The images represent the maximum close-up possible that still includes enough relevant area for the necessary conclusions. We have now provided three additional close-up images exclusively for the ring-canal and Pard3 association in the supplementary Figure S3C for further clarity. However, we also note that the quality of the image permits the reader of a PDF to zoom in and visualize the images in great detail.

      (7) L181: Wrong reference, should be 3 then 3I.

      Thank you for pointing it out, we have now corrected the reference.

      (8) L199: In Figure S4B, was DNMT3 staining quantified? Red intensity differs globally between images; use the somatic red level as a reference? Note: EMA seems higher in Dazl- vs. WT?

      We have now performed quantification of DNMT3 staining, which is presented in Figure 4A. While the red intensity (DNMT3 or EMA) can appear to differ between images, this variation can result from biological differences between tissues or minor technical variability despite using consistent microscope settings. To account for this, we normalized the staining intensity using the somatic cell signal as an internal reference, ensuring that the quantification reflects genuine differences between WT and Dazl-/- samples rather than global intensity variation.

      (9) L229: Should be "proteasome."

      We have now corrected the spelling error.

      (10) L233: Quantify fragmentation of Gs28? EMA doesn't seem affected. Could you quantify both Gs28 and EMA? Images are too small.

      We thank the reviewer for this suggestion. While the current images are small, they can be examined in detail using zoom to visualize the structures clearly. As noted, and we agree, EMA staining is not affected, as the cells are in an arrested state. This arrested state creates stress on the Golgi. The fragmentation of Gs28-labeled Golgi membranes is a classical indicator of Golgi stress, even though the fragmented membranes may remain functionally active. Our results show that Dazl deletion specifically affects the Golgi in germ cells, while the Golgi in neighboring somatic cells appears healthy. To quantify this effect, we have now included manual quantification of Golgi fragmentation in Figure 4F, assessing tissues for the presence of fragmented versus intact Golgi structures. This confirms that Golgi fragmentation is a germ cell–specific phenotype in Dazl– samples, while pre-formed EMA-positive fusomes remain unaffected but are probably in an arrested state.

      (11) L237: Figure 4F graph shows E3.5, not E13.5.

      We have now corrected the typo in the figure 

      (12) L257: Figure 5D: quantify as in 5A? overlap?

      Yes, it is an overlap, shown as two separate images with the ring canal for better clarity. We have now quantified the image and have produced a combined graph for fusome and Pard3 in the Figure 5A graph.

      (13) L261: Figure 5E-E': black arrowhead not mentioned in legend.

      We have now mentioned the black arrowhead in the legend

      (14) L262: Figure 5C: arrowhead not mentioned in legend. Figure 5F: oocyte appears separated from nurse cells compared to 5C.

      Yes, that may happen as cysts undergo fragmentation; what matters is all cells are lineage labelled and hence are members of a single cyst derived from one PGC.

      (15) L263: Figure 5G has no legend reference; nurse cells are not outlined as in 5C.

      We have now outlined the nurse cells and have added the reference to the graph in the legend.

      (16) L279: "The fusome and Visham and both..." should be replaced with "Both fusome and Visham...".

      We have now replaced the term Visham with fusome as suggested by reviewers and editor.  We updated the statement to correct the grammatical error.

      (17) L1127: Video S3B1: It is unclear what to focus on.

      We have now added a rectangle and an arrow to highlight what to focus on.

      (18) L1128: Video "S3B1" should be "S3B2."

      We have now corrected the legend

      (19) Finally: curiosity question: have the authors tried to use known markers of the Drosophila fusome in mice, such as Spectrin or other markers described in Lighthouse, Buszczak and Spradling, Dev Bio, 2008? And conversely, do EMA and WGA label the fusome in Drosophila?

      Yes, we and others used the most specific markers of the Drosophila fusome, such as alpha-spectrin, adducin-like Hts, tropomodulin, etc., to search for fusomes in vertebrate species. This was unsuccessful in clarifying the situation, because Hts and alpha-spectrin in Drosophila and other insects generate a protein skeleton that stabilizes the fusome and is easily stained, but this structure is simply not conserved in vertebrates. The polarity behavior of the fusome, its core developmental property, is conserved, however. The mammalian fusome still acquires and maintains cyst polarity, and goes even further and reflects both initial cyst formation and cyst cleavage, before marking oocyte vs nurse cell development in the smaller cysts. Expression of the inner microtubule-rich portion of the fusome, its Par proteins, and many ER-related and lysosomal fusome proteins is mostly conserved, but their ability to mark the fusome alone varies with time and context (only some of the examples are shown in Figure 3I'). Nearly all of the proteins identified in Lighthouse et al. 2008 are expressed. These proteins may be involved in rejuvenation as studied here. We modified the first section of the Discussion to explicitly compare mouse, Xenopus and Drosophila fusomes, which was not possible before this work.

      Reviewer #3 (Recommendations for the authors):

      The authors should either revise the conclusions or add additional evidence to support their claims. In addition, minor corrections are listed below.

      We have added additional evidence as noted in responses above, and revised some claims that were stated inaccurately.  In addition, we have attempted to clarify the evidence we do present, so that its full significance is more easily grasped by readers.    

      (1) Lines 20-21 are unclear - the cyst doesn't get sent into meiosis, each oocyte does.

      Research is showing that it's more complicated than that. All cyst cells enter "pre-meiotic S phase", and most cell cycles are conventionally considered to start after the previous M phase, i.e. in G1 or S, not in the next prophase, an ancient view limited just to meiosis. Absent this old tradition from meiosis cytology, pre-meiotic S would just be called meiotic S, as some workers on meiosis do. In addition, in different species, nurse cells diverge from meiosis on different schedules, including many much later in the meiotic cycle. Two cyst cells in Drosophila fully enter meiosis by all criteria, the oocyte and one nurse cell that only exits in late zygotene. In Xenopus and mouse, scRNAseq shows that many cyst cells enter meiosis up to leptotene and zygotene, including nurse cells that specifically downregulate meiotic genes during this time, possibly to assist their nurse cell functions, while others remain in meiosis even longer (Davidian and Spradling, 2025; Niu and Spradling, 2022). Eventually, only the oocytes within each fragmented mouse cyst complete meiosis.

      (2) Many places in the manuscript abbreviations are never defined or not defined the first time they are used (but the second or third time): Line 23-ER, Line 29-UPR, Line 33-PGC (not defined until line 45), Line 79-EGAD.

      We have defined full acronyms now upon their first occurrence.

      (3) Line 5 should be the pachytene substage of meiosis I.

      We have now updated the statement to “In pachytene stage of meiosis I…”

      (4) Line 59-61 - this statement needs a reference(s).

      These statements are a continuation from the references cited in the previous statements. However, for further clarity we have again cited the relevant reference here (Niu and Spradling, 2022).

      (5) Line 80 - should it be oocyte proteome quality control?

      We have now updated the statement to “Oocyte proteome quality control begins early”.

      (6) Line 87 - in this case, EMA does not stand for epithelial membrane antigen (AI will call it that, but it is not correct). I believe it originally was the abbrev for (Em)bryonic (a)ntigen, though some papers call it (e)mbryonic (m)ouse (a)ntigen. And the reference here is Hahnel and Eddy, 1986, but in the reference list is a different paper, 1987 (both refer to EMA-1).

      We have now updated the acronym EMA-1 in corrected form and have corrected the citation.

      (7) Line 176 - RNA seq.

      We have now updated the statement to “We performed single cell RNA sequencing (scRNA seq) of mouse gonad”.

      (8) Line 181 - Figure 4E and 4I should be 3E and 3I.

      We have now updated the figure reference in the text to correct one.

      (9) Line 183 - missing period.

      Added.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) The network they propose is extremely simple. This simplicity has pros and cons: on the one hand, it is nice to see the basic phenomenon exposed in the simplest possible setting. On the other hand, it would also be reassuring to check that the mechanism is robust when implemented in a more realistic setting, using, for instance, a network of spiking neurons similar to the one they used in the 2008 paper. The more noisy and heterogeneous the setting, the better.

      The choice of a minimal model to illustrate our hypothesis is deliberate. Our main goal was to suggest a physiologically-grounded mechanism to rapidly encode temporally-structured information (i.e., sequences of stimuli) in Working Memory, where none was available before. Indeed, as discussed in the manuscript, previous proposals were unsatisfactory in several respects. In view of our main goal, we believe that a spiking implementation is beyond the scope of the present work.

      We would like to note that the mechanism originally proposed in Mongillo et al. (2008) has been repeatedly implemented, by many different groups, in various spiking network models with different levels of biological realism (see, e.g., Lundqvist et al. (2016), for an especially ‘detailed’ implementation) and, in all cases, the relevant dynamics have been observed. We take this as an indication of ‘robustness’; the relevant network dynamics do not critically depend on many implementation details and, importantly, these dynamics are qualitatively captured by a simple rate model (see, e.g., Mi et al. (2017)).

      In the present work, we make a relatively ‘minor’ (from a dynamical point of view) extension of the original model, i.e., we just add augmentation. Accordingly, we are fairly confident that a set of parameters for the augmentation dynamics can be found such that the spiking network behaves, qualitatively, as the rate model. A meaningful study, in our opinion, then would require extensively testing the (large) parameters’ space (different models of augmentation?) to see how the network behavior compares with the relevant experimental observations (which ones? Behavioral? Physiological?). As said above, we believe that this is beyond the scope of the present work.

      This being said, we definitely agree with the reviewer that not presenting a spiking implementation is a limitation of the present work. We have clearly acknowledged this limitation here, by adding the following paragraph to the Discussion.

      “To illustrate our theory in a simple setting, we used a minimal model network that neglects many physiological details. This, however, constitutes a limitation of the present study. It would be reassuring to see that the mechanism we propose here is robust enough to reliably operate also in spiking networks, in the presence of heterogeneity in both single-cell and synaptic properties. While we are fairly confident that this is the case, a spiking implementation of our model is beyond the scope of the present study and will be addressed in the future. Also, because of the simplicity of the model network, a comparison between the model behavior and the electrophysiological observations cannot be completely direct. Nevertheless the model qualitatively accounts for a diverse set of experimental data”.

      (2) One major issue with the population spike scenario is that (to my knowledge) there is no evidence that these highly synchronized events occur in delay periods of working memory experiments. It seems that highly synchronized population spikes would imply (a) a strong regularity of spike trains of neurons, at odds with what is typically observed in vivo (b) high synchronization of neurons encoding for the same item (and also of different items in situations where multiple items have to be held in working memory), also at odds with in vivo recordings that typically indicate weak synchronization at best. It would be nice if the authors at least mention this issue, and speculate on what could possibly bridge the gap between their highly regular and synchronized network, and brain networks that seem to lie at the opposite extreme (highly irregular and weakly synchronized). Of course, if they can demonstrate using a spiking network simulation that they can bridge the gap, even better.

      Direct experimental evidence (in monkeys) in support of the existence of highly synchronized events -- to be identified with the ‘population spikes’ of our model -- during the delay period of a memory task is available in the literature, i.e., Panichello et al. (2024). We now provide a short discussion of the results of Panichello et al. (2024) and how these results directly relate to our model. We also provide a short discussion of the results of Liebe et al. (2025), which, again, are fully consistent with our model.

      We note that there is no fundamental contradiction between highly synchronized events in ‘small’ neural populations (e.g., a cell assembly) on one hand, and temporally irregular (i.e., Poisson-like) spiking at the single-neuron level and weakly synchronized activity at the network level, on the other hand. This was already illustrated in our original publication, i.e., Mongillo et al. (2008) (see, in particular, Fig. S2). We further note that the mechanism we propose to encode temporal order -- a temporal gradient in the synaptic efficacies brought about by synaptic augmentation -- would also work if the memory of the items is maintained by ‘tonic’ persistent activity (i.e., without highly synchronized events), provided this activity occurs at suitably low rates such as to prevent the saturation of the synaptic augmentation.

      We have added the following two paragraphs to the Discussion.

      “More direct support to this interpretation comes from recent electrophysiological studies [Panichello et al., 2024, Liebe et al., 2025]. By recording large neuronal populations (∼ 300) simultaneously in the prefrontal cortex of monkeys performing a WM task, [Panichello et al., 2024] found that, during the maintenance period, the decoding of the actively held item from neural activity was ‘intermittent’; that is, decoding was only possible during short epochs (∼ 100ms) interleaved with epochs (also ∼ 100ms) where decoding was at chance level. The inability to decode resulted from a loss of selectivity at the population level, with a return of the single-neuron firing rates to their spontaneous (pre-stimulus) activity levels. The transitions between these two activity states (decodable/not-decodable) were coordinated across large populations of neurons in PFC. By recording single-neuron activity in the medial temporal lobe of humans performing a sequential multi-item WM task, [Liebe et al., 2025] found that during maintenance, neurons coding for a given item tended to fire at a specific phase of the underlying theta rhythm, again suggesting that the corresponding neuronal populations reactivate briefly and sequentially. In summary, these experimental results suggest that active memory maintenance relies on brief reactivations of the neural representations of the items, which we identify with the population spikes in our model, and that these reactivations occur sequentially in time, as predicted by our theory”.

      “We note that the proposed mechanism would still work if the items were maintained by tonically-enhanced firing rates, instead of population spikes, provided that those firing rates were suitably low. However, obtaining low firing rates in model networks of persistent activity is quite difficult”.

      Reviewer #2 (Public review):

      The study relates to the well-known computational theory for working memory, which suggests short-term synaptic facilitation is required to maintain working memory, but doesn't rely on persistent spiking. This previous theory appears similar to the proposed theory, except for the change from facilitation to augmentation. A more detailed explanation of why the authors use augmentation instead of facilitation in this paper is warranted: is the facilitation too short to explain the whole process of WM? Can the theory with synaptic facilitation also explain the immediate storage of novel sequences in WM?

      In the model, synaptic dynamics display both short-term facilitation and augmentation (and short-term depression). Indeed, synaptic facilitation, alone, would be too short-lived to encode novel sequences. This is illustrated in Fig. 1B.

      We provide a discussion of this important point, by adding the following paragraph to the Results section.

      “If augmentation was the only form of synaptic plasticity present in the network, the encoding of an item in WM would require long presentation times, or alternatively high firing rates upon presentation, precisely because K_A is small. Instead, rapid encoding is made possible by the presence of the short-term facilitation, which builds up significantly faster than augmentation, as U >> K_A . For the same reason, however, the level of facilitation rapidly reaches the steady state; therefore, short-term facilitation alone is unable to encode temporal order (see Fig. 1B). Thus, our model requires the existence of transitory synaptic enhancement on at least two time scales, such that longer decays are accompanied by slower build-ups. Intriguingly, this pattern is experimentally observed [Fisher et al., 1997]”.
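      The two-time-scale argument in the quoted paragraph can be illustrated with a minimal rate-based sketch. The equation form, the function name `simulate`, and all parameter values below are assumptions chosen for illustration and are not taken from the manuscript: each variable is assumed to follow dx/dt = (baseline - x)/tau + gain * (1 - x) * rate, with a large gain and short decay for facilitation (U, TAU_F) and a small gain and long decay for augmentation (K_A, TAU_A).

```python
def simulate(rate_hz, duration_s, gain, tau_s, baseline=0.0, dt=1e-3):
    """Euler-integrate dx/dt = (baseline - x)/tau + gain * (1 - x) * rate.

    Assumed generic form for a use-dependent synaptic variable in [0, 1];
    it is NOT claimed to be the exact equation used by the authors.
    """
    x = baseline
    for _ in range(int(round(duration_s / dt))):
        x += dt * ((baseline - x) / tau_s + gain * (1.0 - x) * rate_hz)
    return x


# Hypothetical parameters: facilitation builds fast and decays fast,
# augmentation builds slowly and decays slowly (U >> K_A, TAU_F << TAU_A).
U, TAU_F = 0.3, 1.0
K_A, TAU_A = 0.005, 20.0
RATE_HZ = 50.0  # assumed firing rate during stimulus presentation

if __name__ == "__main__":
    for duration in (0.25, 0.5, 1.0):
        fac = simulate(RATE_HZ, duration, U, TAU_F)
        aug = simulate(RATE_HZ, duration, K_A, TAU_A)
        print(f"T={duration:.2f}s  facilitation={fac:.3f}  augmentation={aug:.3f}")
```

      Under these assumed values, facilitation is already close to its steady state after the shortest presentation, so it carries little information about how long ago or how long an item was active, whereas augmentation keeps growing with presentation time and can therefore form a graded trace.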

      In Figure 1, the authors mention that synaptic augmentation leads to an increased firing rate even after stimulus presentation. It would be good to determine, perhaps, what the lowest threshold is to see the encoding of a WM task, and whether that is biologically plausible.

      We believe that this comment is related to the above point. The reviewer is correct; augmentation alone would require fairly long stimulus presentations to encode an item in WM. ‘Fast’ encoding, indeed, is guaranteed by the presence of short-term facilitation. This important point is emphasized; see above.

      In the middle panel of Figure 4, after 15-16 sec, when the neuronal population prioritizes with the second retro-cue, although the second retro-cue item's synaptic spike dominates, why is the augmentation for the first retro-cue item higher than the second-cue augmentation until the 20 sec?

      This is because of the slow build-up and decay of the augmentation. When the second item is prioritized, and the corresponding neuronal population re-activates, its augmentation level starts to increase. At the same time, as the first item is now de-prioritized and the corresponding neuronal population is now silent, its augmentation level starts to decrease. Because of the ‘slowness’ of both processes (i.e., augmentation build-up and decay), it takes about 5 seconds for the augmentation level of the second item to overcome the augmentation level of the first item.
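      The delayed crossover described above can be sketched numerically. This is a minimal illustration under stated assumptions, not the authors' simulation: the augmentation equation is assumed to be dA/dt = (A_bar - A)/tau_A + K_A * (1 - A) * r, the prioritized population is given a constant mean rate, the de-prioritized population is silent, and the names `step` and `crossover_time` as well as every numerical value are hypothetical.

```python
def step(a, rate_hz, k_a, tau_a, a_bar, dt):
    """One Euler step of the assumed augmentation dynamics."""
    return a + dt * ((a_bar - a) / tau_a + k_a * (1.0 - a) * rate_hz)


def crossover_time(a_first, a_second, rate_hz=5.0, k_a=0.005,
                   tau_a=20.0, a_bar=0.0, dt=1e-3, t_max=60.0):
    """Time (s) until the newly prioritized item's augmentation exceeds
    that of the de-prioritized item; None if it never does within t_max.

    a_first:  augmentation of the de-prioritized (now silent) item.
    a_second: augmentation of the newly prioritized (active) item.
    All default values are illustrative assumptions.
    """
    t = 0.0
    while a_second <= a_first and t < t_max:
        a_first = step(a_first, 0.0, k_a, tau_a, a_bar, dt)       # decays
        a_second = step(a_second, rate_hz, k_a, tau_a, a_bar, dt)  # builds up
        t += dt
    return t if a_second > a_first else None


if __name__ == "__main__":
    print(crossover_time(a_first=0.5, a_second=0.3))
```

      With slow build-up and slow decay, the crossover takes on the order of seconds rather than happening at the moment of the retro-cue; a smaller initial gap between the two augmentation levels shortens it.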

      We note that the slow time scales of the augmentation dynamics, consistently with experimental observations, are necessary for our mechanism to work; see above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 46 identify -> identity.

      (2) Line 207 scale -> scales.

      Fixed. Thank you.

      (3) Lines 222-224 what about behavioral time-scale plasticity? This type of plasticity can apparently be induced very quickly.

      We have removed the corresponding paragraph.

      (4) Line 231 identification of `gamma bursts' with population spikes: These two phenomena seem to be very different - one can be weakly synchronized and can be consistent with highly irregular activity, while it is not clear whether the other can (see major issue 2). Also, it seems that population spikes occur at frequencies that are an order of magnitude lower than gamma.

      We have rewritten the corresponding paragraph and we rely now on more direct electrophysiological evidence (i.e., on the simultaneous recording of large neuronal populations) to identify putative population spikes; see above.

      Reviewer #2 (Recommendations for the authors):

      (1) On page 7, the behavioral study of Rose et al. (2016) is quite important for readers to understand the 'low-activity regime', and to fully appreciate Figure 4, it would be beneficial to explain that study in greater detail.

      We have added a panel to Fig. 4, and accompanying text in the caption, to better illustrate the main task events in the experiment of Rose et al. (2016).

      (2) Line 17: "wrong order", but wrong timing matters too

      Definitely, depending on the task. Specifically, in our example, timing is immaterial.

      (3) Line 33-34: "special training", what is considered special? One could argue that the number of trials needed to learn, depending on the TI timing, is special, depending on the task.

      We have removed the sentence as apparently it was confusing. We simply meant that ‘naive’ human subjects can perform the task (e.g., serial recall); that is, they didn’t undergo any kind of practice that can be construed as ‘training’.

      (4) Line 40-41: but timing is also part of working memory processing. Perhaps it can be merged with the next sentence.

      We have merged the two sentences.

      (5) Line 53: Is the implication here that what happens in the synapses is what drives WM, and not just that the neurons stay persistently on?

      Yes. The idea is that information can be maintained in the synaptic facilitation level, without enhanced spiking activity. Reading-out and refreshing the memory contents, however, requires neuronal activity. We explain this in some detail in the next paragraph (i.e., lines 60-65 in the revised submission).

      (6) Line 102: could a lack of excitatory activity be explained by inhibitory signaling? It appears the inhibitory component is quite understated here.

      Here we are just defining A-bar; according to Eq. (6), if r_a is 0 (i.e., no synaptic activity, for whatever reason), then A_a will converge to A-bar after a time much longer than \tau_A (i.e., a long period). We have rephrased the sentence to improve clarity.
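      The limiting behaviour described in this response can be illustrated with a minimal sketch. It assumes, purely for illustration, that in the absence of synaptic activity (r_a = 0) the augmentation relaxes exponentially toward A-bar with time constant \tau_A; the exact form of Eq. (6) is not reproduced in this response, and the parameter values below are hypothetical.

```python
import math


def augmentation_without_activity(a0, a_bar, tau_a, t):
    """Augmentation level at time t when r_a = 0.

    Assumes first-order relaxation toward a_bar with time constant tau_a.
    This is an illustrative assumption, not Eq. (6) of the paper verbatim.
    """
    return a_bar + (a0 - a_bar) * math.exp(-t / tau_a)


# Hypothetical values: start above baseline, tau_A = 10 s.
A0, A_BAR, TAU_A = 3.0, 1.0, 10.0
for t in (0.0, TAU_A, 10 * TAU_A):
    # After a time much longer than tau_A, the level is close to A_BAR.
    print(t, augmentation_without_activity(A0, A_BAR, TAU_A, t))
```

      Under this assumption the distance from A-bar shrinks by a factor of e every \tau_A, which is why convergence requires a period much longer than \tau_A.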

      (7) Line 158-172: please consider revising this paragraph for a more general audience.

      We have rewritten this paragraph to improve clarity. For the same purpose, we have also slightly modified Fig. 3.

      (8) Line 227: it would seem this is due to a singular inhibitory group making the model highly dependent on the excitatory groups.

      We are not sure that we understand this comment. Here, we are just saying that if the item-coding populations don’t reactivate during the maintenance period (i.e., activity-silent regime) then the augmentation gradient cannot build up. If, on the other hand, the item-coding populations are constantly active at high rates during the maintenance period (i.e., persistent-activity regime) then the augmentation levels will rapidly saturate and, again, there will be no augmentation gradient. This is independent of how ‘silence’ or ‘activity’ of the item-coding populations is determined by the interplay of excitation and inhibition.

      (9) Line 284: this would certainly be an interesting take, but it isn't clear that the model proved this type of decoupling of the temporal aspect of the recall.

      This is an ‘educated’ speculation, based on the model and on a specific interpretation of some experimental results, as discussed in the paper and, in particular, in the last paragraph of the Discussion. We believe that the phrasing of the paragraph makes clear that this is, indeed, a speculation.

    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This valuable study combines a computational language model, i.e., HM-LSTM, and temporal response function (TRF) modeling to quantify the neural encoding of hierarchical linguistic information in speech, and addresses how hearing impairment affects neural encoding of speech. The analysis has been significantly improved during the revision but remains somewhat incomplete: the TRF analysis should be more clearly described and controlled. The study is of potential interest to audiologists and researchers who are interested in the neural encoding of speech.

      We thank the editors for the updated assessment. In the revised manuscript, we have added a more detailed description of the TRF analysis on p. of the revised manuscript. We have also updated Figure 1 to better visualize the analysis pipeline. Additionally, we have included a supplementary video to illustrate the architecture of the HM-LSTM model, the ridge regression methods using the model-derived features, and mTRF analysis using the acoustic envelope and the binary rate models.

      Public Reviews:

      Reviewer #1 (Public review):

      About R squared in the plots:

      The authors have used a z-scored R squared in the main ridge regression plots. While this may be interpretable, it seems non-standard and overly complicated. The authors could use a simple Pearson r to be most direct and informative (and in line with similar work, including Goldstein et al. 2022 which they mentioned). This way the sign of the relationships is preserved.

      We did not use Pearson’s r as in Goldstein et al. (2022) because our analysis did not involve a train-test split, which was a key aspect of their approach. Specifically, Goldstein et al. (2022) divided their data into training and testing sets, trained a ridge regression model on the training set, and then used the trained model to predict neural responses on the test set. They calculated Pearson’s r to assess the correlation between the predicted and observed neural responses, making the correlation coefficient (r) their primary measure of model performance. In contrast, our analysis focused on computing the model fitting performance (R²) of the ridge regression model for each sensor and time point for each subject. At the group level, we conducted one-sample t-tests with spatiotemporal cluster-based correction on the R² values to identify sensors and time windows where R² values were significantly greater than baseline. We established the baseline by normalizing the R² values using Fisher z-transformation across sensors within each subject. We have added this explanation on p.13 of the revised manuscript.
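      The per-sensor, per-time-point model fit described above can be sketched as follows. This is a minimal illustration, assuming closed-form ridge regression and an in-sample R² (no train-test split); the function names, array shapes, and the regularization value `alpha` are hypothetical and are not taken from the manuscript.

```python
import numpy as np


def ridge_r2(X, y, alpha=1.0):
    """In-sample R^2 of a ridge fit of y on X (no train-test split)."""
    Xc = X - X.mean(axis=0)          # center predictors
    yc = y - y.mean()                # center response
    n_features = X.shape[1]
    # Closed-form ridge solution: (X'X + alpha * I)^-1 X'y
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    residual = yc - Xc @ beta
    return 1.0 - (residual @ residual) / (yc @ yc)


def r2_map(features, eeg, alpha=1.0):
    """R^2 for every sensor and time point of one subject.

    features: (n_sentences, n_features) model-derived features
    eeg:      (n_sentences, n_sensors, n_times) epoched EEG
    """
    _, n_sensors, n_times = eeg.shape
    out = np.empty((n_sensors, n_times))
    for s in range(n_sensors):
        for t in range(n_times):
            out[s, t] = ridge_r2(features, eeg[:, s, t], alpha)
    return out
```

      Because the fit is evaluated on the same data it was trained on, the resulting R² is a goodness-of-fit measure rather than a prediction accuracy, which is the distinction from the held-out Pearson’s r drawn above.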

      About the new TRF analysis:

      The new TRF analysis is a necessary addition and much appreciated. However, it is missing the results for the acoustic regressors, which should be there analogous to the HM-LSTM ridge analysis. The authors should also specify which software they have utilized to conduct the new TRF analysis. It also seems that the linguistic predictors/regressors have been newly constructed in a way more consistent with previous literature (instead of using the HM-LSTM features); these specifics should also be included in the manuscript (did it come from Montreal Forced Aligner, etc.?). Now that the original HM-LSTM can be compared to a more standard TRF analysis, it is apparent that the results are similar.

      We used the Python package Eelbrain (https://eelbrain.readthedocs.io/en/r0.39/auto_examples/temporal-response-functions/trf_intro.html) to conduct the multivariate temporal response function (mTRF) analyses. As we previously explained in our response to R3, we did not apply mTRF to the acoustic features due to the high dimensionality of the input. Specifically, our acoustic representation consists of a 130-dimensional vector sampled every 10 ms throughout the speech stimuli (comprising a 129-dimensional spectrogram and a 1-dimensional amplitude envelope). This made the 130-dimensional TRF estimates difficult to interpret. A similar constraint applied to the hidden-layer activations from our HM-LSTM model for the five linguistic features. After dimensionality reduction via PCA, each still resulted in 150-dimensional vectors. To address this, we instead used binary predictors marking the offset of each linguistic unit (phoneme, syllable, word, phrase, sentence). Since our speech stimuli were computer-synthesized, the phoneme and syllable boundaries were automatically generated. The word boundaries were manually annotated by a native Mandarin speaker, as in Li et al. (2022). The phrase boundaries were automatically annotated by the Stanford parser and manually checked by a native Mandarin speaker. These rate models are represented as five distinct binary time series, each aligned with the timing of the corresponding linguistic unit, making them well-suited for mTRF analysis. Although the TRF results from the 1-dimensional rate predictors and the ridge regression results from the high-dimensional HM-LSTM-derived features are similar, they encode different things: the rate regressors only encode the timing of linguistic unit boundaries, while the model-derived features encode the representational content of the linguistic input. Therefore, we do not consider the mTRF analyses to be analogous to the ridge regression analyses. Rather, these results complement each other, and both provide insight into the neural tracking of linguistic structures at different levels for the attended and unattended speech.
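      The binary rate predictors described above can be sketched as follows. This is a minimal illustration, assuming boundary times given in seconds and a 10 ms sampling step; the function name and the example boundary times are hypothetical.

```python
import numpy as np


def rate_predictor(boundary_times_s, duration_s, step_s=0.01):
    """Binary time series with 1 at each unit boundary and 0 elsewhere."""
    n_samples = int(round(duration_s / step_s))
    predictor = np.zeros(n_samples)
    # Convert boundary times to sample indices on the 10 ms grid.
    idx = np.round(np.asarray(boundary_times_s) / step_s).astype(int)
    idx = idx[(idx >= 0) & (idx < n_samples)]   # drop out-of-range boundaries
    predictor[idx] = 1.0
    return predictor


# Hypothetical boundaries (in seconds) for two of the five levels.
boundaries = {
    "syllable": [0.25, 0.50, 0.80, 1.10],
    "word": [0.50, 1.10],
}
predictors = {level: rate_predictor(times, duration_s=1.5)
              for level, times in boundaries.items()}
```

      Each predictor carries only timing information, which is why it cannot stand in for the content encoded in the HM-LSTM features.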

      Since the TRF result for the continuous acoustic features also concerns R2, we have added an mTRF analysis where we fitted the one-dimensional speech envelope to the EEG. We extracted the envelope at 10 ms intervals for both attended and unattended speech and computed mTRFs independently for each subject and sensor using a basis of 50 ms Hamming windows spanning –100 ms to 300 ms relative to envelope onset. The results showed that in hearing-impaired participants, attended speech elicited a significant cluster in the bilateral temporal regions from 270 to 300 ms post-onset (t = 2.40, p = 0.01, Cohen’s d = 0.63). Unattended speech elicited an early cluster in right temporal and occipital regions from –100 ms to –80 ms (t = 3.07, p = 0.001, d = 0.83). Normal-hearing participants showed significant envelope tracking in the left temporal region at 280–300 ms after envelope onset (t = 2.37, p = 0.037, d = 0.48), with no significant cluster for unattended speech. These results further suggest that hearing-impaired listeners may have difficulty suppressing unattended streams. We have added the new TRF results for envelope to Figure S3 and the “mTRF results for attended and unattended speech” on p.7 and the “mTRF analysis” in Material and Methods of the revised manuscript.
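      The idea of fitting a one-dimensional envelope to the EEG over lags from –100 ms to 300 ms can be sketched with a plain time-lagged ridge regression. Note that this is a stand-in technique chosen for illustration: it is not Eelbrain's estimator and does not use the Hamming-window basis mentioned above. The function names and the value of `alpha` are hypothetical.

```python
import numpy as np


def lagged_design(stimulus, lags):
    """Design matrix whose column j is the stimulus delayed by lags[j] samples."""
    n = len(stimulus)
    X = np.zeros((n, len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stimulus[:n - lag]      # response follows stimulus
        else:
            X[:n + lag, j] = stimulus[-lag:]     # response precedes stimulus
    return X


def fit_trf(stimulus, response, lags, alpha=1.0):
    """Ridge estimate of TRF weights, one per lag, for a single sensor."""
    X = lagged_design(stimulus, lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(len(lags)), X.T @ response)


step_ms = 10                                               # envelope sampling step
lags = np.arange(-100 // step_ms, 300 // step_ms + 1)      # -10 ... 30 samples
```

      In this sketch, a positive lag means the EEG follows the envelope, so weights at negative lags would correspond to the pre-onset window reported above.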

      The authors' wording about this suggests that these new regressors have a nonzero sample at each linguistic event's offset, not onset. This should also be clarified. As the authors know, the onset would be more standard, and using the offset has implications for understanding the timing of the TRFs, as a phoneme has a different duration than a word, which has a different duration from a sentence, etc.

      In our rate‐model mTRF analyses, we initially labelled linguistic boundaries as “offsets” because our ridge‐regression with HM-LSTM features was aligned to sentence offsets rather than onsets. However, since each offset coincides with the next unit’s onset—and our regressors simply mark these transition points as 1—the “offset” and “onset” models yield identical mTRFs. To avoid confusion, we have relabeled “offset” as “boundary” in Figure S2.

      As discussed in our prior responses, this design was based on the structure of our input to the HM-LSTM model, where each input consists of a pair of sentences encoded in phonemes, such as “t a_1 n əŋ_2 f ei_1 <sep> zh ə_4 sh iii_4 f ei_1 j ii_1” (“It can fly <sep> This is an airplane”). The two sentences are separated by a special <sep> token, and the model’s objective is to determine whether the second sentence follows the first, similar to a next-sentence prediction task. Since the model processes both sentences in full before making a prediction, the neural activations of interest should correspond to the point at which the entire sentence has been processed by humans. To enable a fair comparison between the model’s internal representations and brain responses, we aligned our neural analyses with the sentence offsets, capturing the time window after the sentence has been fully perceived by the participant. Thus, we extracted epochs from -100 to +300 ms relative to each sentence offset, consistent with our model-informed design.

      We understand that phonemes, syllables, words, phrases, and sentences differ in their durations. However, the five hidden activity vectors extracted from the model are designed to capture the representations of these five linguistic levels across the entire sentence. Specifically, for a sentence pair such as “It can fly <sep> This is an airplane,” the first 2048-dimensional vector represents all the phonemes in the two sentences (“t a_1 n əŋ_2 f ei_1 <sep> zh ə_4 sh iii_4 f ei_1 j ii_1”), the second vector captures all the syllables (“ta_1 nəŋ_2 fei_1 <sep> zhə_4 shiii_4 fei_1 jii_1”), the third vector represents all the words, the fourth vector captures the phrases, and the fifth vector represents the sentence-level meaning. In our dataset, input pairs consist of adjacent sentences from the stimuli (e.g., Sentence 1 and Sentence 2, Sentence 2 and Sentence 3, and so on), and for each pair, the model generates five 2048-dimensional vectors, each corresponding to a specific linguistic level. To identify the neural correlates of these model-derived features (each intended to represent the full linguistic level across a complete sentence), we focused on the EEG signal surrounding the completion of the second sentence rather than on incremental processing. Accordingly, we extracted epochs from -100 ms to +300 ms relative to the offset of the second sentence and performed ridge regression analyses using the five model features (reduced to 150 dimensions via PCA) at every 50 ms across the epoch. We have added this clarification on p.12 of the revised manuscript.
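      The epoching step described above can be sketched as follows. This is a minimal illustration, assuming continuous EEG stored as a sensors-by-samples array and sentence offsets given in seconds; the sampling rate in the usage lines is hypothetical and is not stated in this response.

```python
import numpy as np


def extract_epochs(eeg, offsets_s, sfreq, tmin=-0.1, tmax=0.3):
    """Cut windows from tmin to tmax (seconds) around each sentence offset.

    eeg: (n_sensors, n_samples) continuous recording
    Returns an array of shape (n_epochs, n_sensors, n_window).
    """
    start = int(round(tmin * sfreq))
    stop = int(round(tmax * sfreq))
    epochs = []
    for t in offsets_s:
        center = int(round(t * sfreq))
        lo, hi = center + start, center + stop + 1
        if lo >= 0 and hi <= eeg.shape[1]:       # skip windows that run off the data
            epochs.append(eeg[:, lo:hi])
    return np.stack(epochs)


# Regression time points: every 50 ms from -100 ms to +300 ms.
analysis_times_ms = np.arange(-100, 301, 50)

# Hypothetical usage: 2 sensors, 10 s of data at an assumed 100 Hz.
demo_eeg = np.arange(2 * 1000, dtype=float).reshape(2, 1000)
demo_epochs = extract_epochs(demo_eeg, offsets_s=[1.0, 5.0], sfreq=100)
```

      Stepping through the window in 50 ms increments gives nine regression time points per epoch.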

      About offsets:

      TRFs can still be interpretable using the offset timings though; however, the main original analysis seems to be utilizing the offset times in a different, more confusing way. The authors still seem to be saying that only the peri-offset time of the EEG was analyzed at all, meaning the vast majority of the EEG trial durations do not factor into the main HM-LSTM response results whatsoever. The way the authors describe this does not seem to be present in any other literature, including the papers that they cite. Therefore, much more clarification on this issue is needed. If the authors mean that the regressors are simply time-locked to the EEG by aligning their offsets (rather than their onsets, because they have varying onsets or some such experimental design complexity), then this would be fine. But it does not seem to be what the authors want to say. This may be a miscommunication about the methods, or the authors may have actually only analyzed a small portion of the data. Either way, this should be clarified to be able to be interpretable.

      We hope that our response in RE4, along with the supplementary video, has helped clarify this issue. We acknowledge that prior studies have not used EEG data surrounding sentence offsets to examine neural responses at the phoneme or syllable levels. However, this is largely due to a lack of models that represent all linguistic levels across an entire sentence. There is abundant work comparing model predictors with neural data time-locked to offsets because they mark the point at which participants have already processed the relevant information (Brennan, 2016; Brennan et al., 2016; Gwilliams et al., 2024, 2025). Similarly, in our model–brain alignment study, our goal is to identify neural correlates for each model-derived feature. If we correlated model activity with EEG data aligned to sentence onsets, we would be examining linguistic representations at all levels (from phoneme to sentence) of the whole sentence at a time when participants have not yet heard the sentence. Although this limits our analysis to a subset of the data (143 sentences × 400 ms windows × 4 conditions), it targets the exact moment when full-sentence representations emerge against background speech, allowing us to map each model-derived feature onto its neural signature. We have added this clarification on p.12 of the revised manuscript.

      Reviewer #2 (Public review):

      This study presents a valuable finding on the neural encoding of speech in listeners with normal hearing and hearing impairment, uncovering marked differences in how attention to different levels of speech information is allocated, especially when having to selectively attend to one speaker while ignoring an irrelevant speaker. The results overall support the claims of the authors, although a more explicit behavioural task to demonstrate successful attention allocation would have strengthened the study. Importantly, the use of more "temporally continuous" analysis frameworks could have provided a better methodology to assess the entire time course of neural activity during speech listening. Despite these limitations, this interesting work will be useful to the hearing impairment and speech processing research community. The study compares speech-in-quiet vs. multi-talker scenarios, allowing to assess within-participant the impact that the addition of a competing talker has on the neural tracking of speech. Moreover, the inclusion of a population with hearing loss is useful to disentangle the effects of attention orienting and hearing ability. The diagnosis of high-frequency hearing loss was done as part of the experimental procedure by professional audiologists, leading to a high control of the main contrast of interest for the experiment. Sample size was big, allowing to draw meaningful comparisons between the two populations.

      We thank you very much for your appreciation of our research, and we have now added a more detailed description of the mTRF analyses on p.13-14 of the revised manuscript.

      An HM-LSTM model was employed to jointly extract speech features spanning from the stimulus acoustics to word-level and phrase-level information, represented by embeddings extracted at successive layers of the model. The model was specifically expanded to include lower level acoustic and phonetic information, reaching a good representation of all intermediate levels of speech. Despite conveniently extracting all features jointly, the HM-LSTM model processes linguistic input sentence-by-sentence, and therefore only allows to assess the corresponding EEG data at sentence offset. If I understood correctly, while the sentence information extracted with the HM-LSTM reflects the entire sentence - in terms of its acoustic, phonetic and more abstract linguistic features - it only gives a condensed final representation of the sentence. As such, feature extraction with the HM-LSTM is not compatible with a continuous temporal mapping on the EEG signal, and this is the main reason behind the authors' decision to fit a regression at nine separate time points surrounding sentence offsets.

      Yes, you are correct. As explained in RE4, the model generates five hidden-layer activity vectors, each intended to represent all the phonemes, syllables, words, phrases within the entire sentence (“a condensed final representation”). This is the primary reason we extract EEG data surrounding the sentence offsets—this time point reflects when the full sentence has been processed by the human brain. We assume that even at this stage, residual neural responses corresponding to each linguistic level are still present and can be meaningfully analyzed.

      While valid and previously used in the literature, this methodology, in the particular context of this experiment, might be obscuring important attentional effects impacted by hearing-loss. By fitting a regression only around sentence-final speech representations, the method might be overlooking the more "online" speech processing dynamics, and only assessing the permanence of information at different speech levels at sentence offset. In other words, the acoustic attentional bias between Attended and Unattended speech might exist even in hearing-impaired participants but, due to a lower encoding or permanence of acoustic information in this population, it might only emerge when using methodologies with a higher temporal resolution, such as Temporal Response Functions (TRFs). If a univariate TRF fit simply on the continuous speech envelope did not show any attentional bias (different trial lengths should not be a problem for fitting TRFs), I would be entirely convinced of the result. For now, I am unsure on how to interpret this finding.

      We agree, and we added the mTRF results using the rate models for the 5 linguistic levels in the prior revision. The rate model aligns with the boundaries of each linguistic unit at each level. As explained in RE3, the rate regressors encode the timing of linguistic unit boundaries, while the model-derived features encode the representational content of the linguistic input. The mTRF results showed similar patterns to those observed using features from our HM-LSTM model with ridge regression (see Figure S2). These results complement each other, and both provide insight into the neural tracking of linguistic structures at different levels for the attended and unattended speech.

      We have also added TRF results fitting the envelope of attended and unattended speech, sampled every 10 ms, to the whole 10-minute EEG data. Our results showed that in hearing-impaired participants, attended speech elicited a significant cluster in the bilateral temporal regions from 270 to 300 ms post-onset (t = 2.40, p = 0.01, Cohen’s d = 0.63). Unattended speech elicited an early cluster in right temporal and occipital regions from –100 ms to –80 ms (t = 3.07, p = 0.001, d = 0.83). Normal-hearing participants showed significant envelope tracking in the left temporal region at 280–300 ms after envelope onset (t = 2.37, p = 0.037, d = 0.48), with no significant cluster for unattended speech. These results further suggest that hearing-impaired listeners may have difficulty suppressing unattended streams. We have added the new TRF results for envelope to Figure S3 and the “mTRF results for attended and unattended speech” on p.7 and the “mTRF analysis” in Material and Methods of the revised manuscript.

      Despite my doubts on the appropriateness of condensed speech representations and single-point regression for acoustic features in particular, the current methodology allows the authors to explore their research questions, and the results support their conclusions. This work presents an interesting finding on the limits of attentional bias in a cocktail-party scenario, suggesting that fundamentally different neural attentional filters are employed by listeners with high-frequency hearing loss, even in terms of the tracking of speech acoustics. Moreover, the rich dataset collected by the authors is a great contribution to open science and will offer opportunities for re-analysis.

      We sincerely thank you again for your encouraging comments regarding the impact of our study.

      Reviewer #3 (Public review):

      Summary:

      The authors aimed to investigate how the brain processes different linguistic units (from phonemes to sentences) in challenging listening conditions, such as multi-talker environments, and how this processing differs between individuals with normal hearing and those with hearing impairments. Using a hierarchical language model and EEG data, they sought to understand the neural underpinnings of speech comprehension at various temporal scales and identify specific challenges that hearing-impaired listeners face in noisy settings.

      Strengths:

      Overall, the combination of computational modeling, detailed EEG analysis, and comprehensive experimental design thoroughly investigates the neural mechanisms underlying speech comprehension in complex auditory environments. The use of a hierarchical language model (HM-LSTM) offers a data-driven approach to dissect and analyze linguistic information at multiple temporal scales (phoneme, syllable, word, phrase, and sentence). This model allows for a comprehensive neural encoding examination of how different levels of linguistic processing are represented in the brain. The study includes both single-talker and multi-talker conditions, as well as participants with normal hearing and those with hearing impairments. This design provides a robust framework for comparing neural processing across different listening scenarios and groups.

      Weaknesses:

      The analyses heavily rely on one specific computational model, which limits the robustness of the findings. The use of a single DNN-based hierarchical model to represent linguistic information, while innovative, may not capture the full range of neural coding present in different populations. A low-accuracy regression model-fit does not necessarily indicate the absence of neural coding for a specific type of information. The DNN model represents information in a manner constrained by its architecture and training objectives, which might fit one population better than another without proving the non-existence of such information in the other group. It is also not entirely clear if the DNN model used in this study effectively serves the authors' goal of capturing different linguistic information at various layers. More quantitative metrics on acoustic/linguistic-related downstream tasks, such as speaker identification and phoneme/syllable/word recognition based on these intermediate layers, can better characterize the capacity of the DNN model.

      We agree that, before aligning model representations with neural data, it is essential to confirm that the model encodes linguistic information at multiple hierarchical levels. This is the purpose of our validation analysis: we evaluated the model’s representations across five layers using a test set of 20 four-syllable sentences in which every syllable shares the same vowel, e.g., “mā ma mà mǎ” (mother scolds horse), “shū shu shǔ shù” (uncle counts numbers; see Table S1). We hypothesized that the activity in the phoneme and syllable layers would be more similar than in other layers for same-vowel sentences. The results confirmed our hypothesis: hidden-layer activity for same-vowel sentences exhibited much more similar distributions at the phoneme and syllable levels compared to those at the word, phrase and sentence levels. Figure 3C displays the scatter plot of the model activity at the five linguistic levels for each of the 20 four-syllable sentences, after dimensionality reduction using multidimensional scaling (MDS). We used color-coding to represent the activity of the five hidden layers after dimensionality reduction. Each dot on the plot corresponds to one test sentence. Only phonemes are labeled because each syllable in our test sentences contains the same vowels (see Table S1). The plot reveals that model representations at the phoneme and syllable levels are more dispersed for each sentence, while representations at the higher linguistic levels (word, phrase, and sentence) are more centralized. Additionally, similar phonemes tend to cluster together across the phoneme and syllable layers, indicating that the model captures a greater amount of information at these levels when the phonemes within the sentences are similar.
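      The dimensionality-reduction step used for the scatter plot can be sketched as follows. This response does not state which MDS variant or distance measure was used, so the sketch assumes classical (Torgerson) MDS on Euclidean distances; the function name and the array shapes are hypothetical.

```python
import numpy as np


def classical_mds(vectors, n_components=2):
    """Classical (Torgerson) MDS on Euclidean distances between row vectors."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)                 # squared pairwise distances
    n = d2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ d2 @ J                         # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale              # (n_items, n_components)


# Hypothetical usage: 20 test sentences, one 2048-dimensional vector each.
layer_activity = np.random.default_rng(0).standard_normal((20, 2048))
coords_2d = classical_mds(layer_activity)
```

      Running this once per hidden layer and plotting the resulting coordinates with one color per layer would give a plot of the kind described for Figure 3C.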

      Apart from the DNN model, we also included the rate models, which simply mark 1 at each unit boundary across the 5 levels. We performed mTRF analyses with these rate models and found similar patterns to our ridge‐regression results with the DNN (see Figure S2). This provides further evidence that the model reliably captures information across all five hierarchical levels.

      Since EEG measures underlying neural activity in near real-time, it is expected that lower-level acoustic information, which is relatively transient, such as phonemes and syllables, would be distributed throughout the time course of the entire sentence. It is not evident if this limited time window effectively captures the neural responses to the entire sentence, especially for lower-level linguistic features. A more comprehensive analysis covering the entire time course of the sentence, or at least a longer temporal window, would provide a clearer understanding of how different linguistic units are processed over time.

      We agree that lower-level linguistic features may be distributed throughout the whole sentence; however, using the entire sentence duration was not feasible, as the sentences in the stimuli vary in length, making statistical analysis challenging. Additionally, since the stimuli consist of continuous speech, extending the time window would risk including linguistic units from subsequent sentences. This would introduce ambiguity as to whether the EEG responses correspond to the current or the following sentence. Moreover, our model activity represents a “condensed final representation” at the five linguistic levels for the whole sentence, rather than incrementally during the sentence. We think the -100 to 300 ms time window relative to each sentence offset targets the exact moment when full-sentence representations are comprehended and a “condensed final representation” for the whole sentence across the five linguistic levels has been formed in the brain. We have added this clarification on p.13 of the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Here are some specifics and clarifications of my public review:

      Initially I was interpreting the R squared as a continuous measure of predicted EEG relative to actual EEG, based on an encoding model, but this does not appear to be correct. Thank you for pointing out that the y axis is z-scored R squared in your main ridge regression plots. However, I am not sure why/how you chose to represent this that way. It seems to me that a simple Pearson r would be most informative here (and in line with similar work, including Goldstein et al. 2022 that you mentioned). That way you preserve the sign of the relationships between the regressors and the EEG. With R squared, we have a different interpretation, which is maybe also ok, but I also don't see the point of z-scoring R squared. Another possibility is that when you say "z-transformed" you are referring to the Fisher transformation; is that the case? In the plots you say "normalized", so that sounds like a z-score, but this needs to be clarified; as I say, a simple Pearson r would probably be best.

      We did not use Pearson’s r, as in Goldstein et al. (2022), because our analysis did not involve a train-test split, which was central to their approach. In their study, the data were divided into training and testing sets, and a ridge regression model was trained on the training set. They then used the trained model to predict neural responses on the held-out test set, and calculated Pearson’s r to assess the correlation between the predicted and observed neural responses. As a result, their final metric of model performance was the correlation coefficient (r). In contrast, our analysis is more aligned with standard temporal response function (TRF) approaches. We did not perform a train-test split; instead, we computed the model fitting performance (R²) of the ridge regression model at each sensor and time point for each subject. At the group level, we conducted one-sample t-tests with spatiotemporal cluster-based correction on the R² values to determine which sensors and time windows showed significantly greater R² values than baseline. To establish a baseline, we z-scored the R² values across sensors and time points, effectively centering the distribution around zero. This normalization allowed us to interpret deviations from the mean R² as meaningful increases in model performance and provided a suitable baseline for the statistical tests. We have added this clarification on p.13 of the revised manuscript.
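      The normalization and group-level test described above can be sketched as follows. This is a minimal illustration, assuming each subject's R² map is z-scored across sensors and time points and then tested against zero across subjects; the spatiotemporal cluster-based correction is deliberately omitted, and the function names are hypothetical.

```python
import numpy as np


def zscore_r2(r2):
    """z-score one subject's R^2 map across sensors and time points."""
    return (r2 - r2.mean()) / r2.std()


def one_sample_t(values):
    """t statistic against zero across subjects (axis 0).

    values: (n_subjects, ...) array of normalized R^2 maps.
    Cluster-based correction is not implemented in this sketch.
    """
    n = values.shape[0]
    return values.mean(axis=0) / (values.std(axis=0, ddof=1) / np.sqrt(n))
```

      After z-scoring, each subject's map has zero mean, so a positive group-level t value at a sensor and time point indicates an R² that is consistently above that subject-wise average.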

      Thank you for doing the TRF analysis, but where are the acoustic TRFs, analogous to the acoustic results for your HM-LSTM ridge analyses? And what tools did you use to do the TRF analysis? If it is something like the mTRF MATLAB toolbox, then it is also using ridge regression, as you have already done in your original analysis, correct? If so, then it is pretty much the same as your original analysis, just with more dense timepoints, correct? This is what I meant by referring to TRFs originally, because what you have basically done originally was to make a 9-point TRF (and then the plots and analyses are contrasts of pairs of those), with lags between -100 and 300 ms relative to the temporal alignment between the regressors and the EEG, I think (more on this below).

      Also with the new TRF analysis, you say that the regressors/predictors had "a value of 1 at each unit boundary offset". So this means you re-made these predictors to be discrete as I and reviewer 3 were mentioning before (rather than using the HM-LSTM model layer(s)), and also, that you put each phoneme/word/etc. marker at its offset, rather than its onset? I'm also confused as to why you would do this rather than the onset, but I suppose it doesn't change the interpretation very much, just that the TRFs are slid over by a small amount.

      We used the Python package Eelbrain (https://eelbrain.readthedocs.io/en/r0.39/auto_examples/temporal-response-functions/trf_intro.html) to conduct the multivariate temporal response function (mTRF) analyses. As we previously explained in our response to Reviewer 3, we did not apply mTRF to the acoustic features due to the high dimensionality of the input. Specifically, our acoustic representation consists of a 130-dimensional vector sampled every 10 ms throughout the speech stimuli (comprising a 129-dimensional spectrogram and a 1-dimensional amplitude envelope). This would render the 130 TRF weights for the acoustic features uninterpretable. However, we have now added TRF results for the 1-dimensional envelope of the attended and unattended speech, sampled every 10 ms.

      A similar constraint applied to the hidden-layer activations from our HM-LSTM model for the five linguistic features. After dimensionality reduction via PCA, each still resulted in 150-dimensional vectors, further preventing their use in mTRF analyses. To address this, we instead used binary predictors marking the offset of each linguistic unit (phoneme, syllable, word, phrase, sentence). These rate models are represented as five distinct binary time series, each aligned with the timing of the corresponding linguistic unit, making them well-suited for mTRF analysis. It is important to note that these rate predictors differ from the HM-LSTM-derived features: They encode only the timing of linguistic unit boundaries, not the content or representational structure of the linguistic input. Therefore, we do not consider the mTRF analyses to be equivalent to the ridge regression analyses based on HM-LSTM features.

      For onset vs. offset, as explained in RE4, we labelled them “offsets” because our ridge regression with HM-LSTM features was aligned to sentence offsets rather than onsets (see RE4 and RE15 below for the rationale of using sentence offset). However, since each unit offset coincides with the next unit’s onset, and the rate model simply marks these transition points as 1, the “offset” and “onset” models yield identical mTRFs. To avoid confusion, we have relabeled “offset” as “boundary” in Figure S2.
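      The near-equivalence of the “offset” and “onset” rate predictors can be illustrated with a minimal sketch. It assumes contiguous units and the 10 ms sampling step mentioned above; the function name `boundary_predictor` and the unit durations are hypothetical and are not taken from the actual stimuli or analysis code.

```python
def boundary_predictor(times_ms, n_samples, step_ms=10):
    """Return a binary time series with 1 at each boundary time (in ms)."""
    series = [0] * n_samples
    for t in times_ms:
        idx = round(t / step_ms)
        if 0 <= idx < n_samples:
            series[idx] = 1
    return series


# Hypothetical contiguous units (e.g. words), given as durations in ms.
durations_ms = [200, 150, 300, 250]

onsets, offsets, t = [], [], 0
for d in durations_ms:
    onsets.append(t)   # a unit starts where the previous one ended
    t += d
    offsets.append(t)  # and ends where the next one starts

n_samples = t // 10 + 1
onset_series = boundary_predictor(onsets, n_samples)
offset_series = boundary_predictor(offsets, n_samples)

# Samples at which the two predictors disagree.
differing = [i for i in range(n_samples) if onset_series[i] != offset_series[i]]
```

      With contiguous units the two series disagree only at the first onset and the last offset of the stimulus, which is why relabeling them as “boundary” predictors is reasonable.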

      I'm still confused about offsets generally. Does this maybe mean that the EEG, and each predictor, are all aligned by aligning their endpoints, which are usually/always the ends of sentences? So e.g. all the phoneme activity in the phoneme regressor actually corresponds to those phonemes of the stimuli in the EEG time, but those regressors and EEG do not have a common starting time (one trial to the next maybe?), so they have to be aligned with their ends instead?

      We chose to use sentence offsets rather than onsets based on the structure of our input to the HM-LSTM model, where each input consists of a pair of sentences encoded in phonemes, such as “t a_1 n əŋ_2 f ei_1 <sep> zh ə_4 sh iii_4 f ei_1 j ii_1” (“It can fly <sep> This is an airplane”). The two sentences are separated by a special <sep> token, and the model’s objective is to determine whether the second sentence follows the first, similar to a next-sentence prediction task. Since the model processes both sentences in full before making a prediction, the neural activations of interest should correspond to the point at which the entire sentence has been processed. To enable a fair comparison between the model’s internal representations and brain responses, we aligned our neural analyses with the sentence offsets, capturing the time window after the sentence has been fully perceived by the participant. Thus, we extracted epochs from -100 to +300 ms relative to each sentence offset, consistent with our model-informed design. If we instead aligned model activity with EEG data time-locked to sentence onsets, we would be examining linguistic representations at all levels (from phoneme to sentence) of the whole sentence at a time when participants have not yet heard the sentence. By contrast, aligning to sentence offsets ensures that participants have constructed a full-sentence representation.

      We understand that it is a bit confusing why the regressor of each level is not aligned to their own offsets in the data. The hidden-layer activations of the HM-LSTM model corresponding to the five linguistic levels (phoneme, syllable, word, phrase, sentence) are consistently 150-dimensional vectors after PCA reduction. As a result, for each input sentence pair, the model produces five distinct hidden-layer activations, each capturing the representational content associated with one linguistic level for the whole sentence. We believe our -100 to 300 ms time window relative to sentence offset reflects a meaningful period during which the brain integrates and comprehends information across multiple linguistic levels.

      Being "time-locked to the offset of each sentence at nine latencies" is not something I can really find in any of the references that you mentioned, regarding the offset aspect of this method. Can you point me more specifically to what you are trying to reference with that, or further explain? You said that "predicting EEG signals around the offset of each sentence" is "a method commonly employed in the literature", but the example you gave of Goldstein 2022 is using onsets of words, which is indeed much more in line with what I would expect (not offsets of sentences).

      You are correct that Goldstein (2022) aligned model predictions to onsets rather than offsets; however, many studies in the literature also align model predictions with unit offsets, typically because they mark the point at which participants have already processed the relevant information (Brennan, 2016; Brennan et al., 2016; Gwilliams et al., 2024, 2025). Similarly, in our study, we aim to identify neural correlates for each model-derived feature. If we correlate model activity with EEG data aligned to sentence onsets, we would be examining linguistic representations at all levels (from phoneme to sentence) of the whole sentence at a time when participants have not yet heard the sentence. By contrast, aligning to sentence offsets ensures that participants have constructed a full-sentence representation. Although this limits our analysis to a subset of the data (143 sentences × 400 ms windows × 4 conditions), it targets the exact moment when full-sentence representations emerge against background speech, allowing us to map each model-derived feature onto its neural signature. We have added this clarification on p.12 of the revised manuscript.

      This new sentence does not make sense to me: "The regressors are aligned to sentence offsets because all our regressors are taken from the hidden layer of our HM-LSTM model, which generates vector representations corresponding to the five linguistic levels of the entire sentence".

      Thank you for the suggestion. We hope our responses in RE4, 15 and 16, along with our supplementary video, have now clarified the issue. We have deleted the sentence and provided a more detailed explanation on p.12 of the revised manuscript: The regressors are aligned to sentence offsets because our goal is to identify neural correlates for each model-derived feature of a whole sentence. If we align model activity with EEG data time-locked to sentence onsets, we would be finding neural responses to linguistic levels (from phoneme to sentence) of the whole sentence at a time when participants have not yet processed the sentence. By contrast, aligning to sentence offsets ensures that participants have constructed a full-sentence representation. Although this limits our analysis to a subset of the data (143 sentences × 2 sections × 400 ms windows), it targets the exact moment when full-sentence representations emerge against background speech, allowing us to map each model-derived feature onto its neural signature. We understand that phonemes, syllables, words, phrases, and sentences differ in their durations. However, the five hidden activity vectors extracted from the model are designed to capture the representations of these five linguistic levels across the entire sentence. Specifically, for a sentence pair such as “It can fly <sep> This is an airplane,” the first 2048-dimensional vector represents all the phonemes in the two sentences (“t a_1 n əŋ_2 f ei_1 <sep> zh ə_4 sh iii_4 f ei_1 j ii_1”), the second vector captures all the syllables (“ta_1 nəŋ_2 fei_1 <sep> zhə_4 shiii_4 fei_1 jii_1”), the third vector represents all the words, the fourth vector captures the phrases, and the fifth vector represents the sentence-level meaning.
In our dataset, input pairs consist of adjacent sentences from the stimuli (e.g., Sentence 1 and Sentence 2, Sentence 2 and Sentence 3, and so on), and for each pair, the model generates five 2048-dimensional vectors, each corresponding to a specific linguistic level. To identify the neural correlates of these model-derived features, each intended to represent the full linguistic level across a complete sentence, we focused on the EEG signal surrounding the completion of the second sentence rather than on incremental processing. Accordingly, we extracted epochs from -100 ms to +300 ms relative to the offset of the second sentence and performed ridge regression analyses using the five model features (reduced to 150 dimensions via PCA) at every 50 ms across the epoch.
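
      The epoching scheme just described, with one regression every 50 ms from -100 ms to +300 ms around each sentence offset, gives nine latencies per sentence. A minimal sketch of how those latencies map onto EEG sample indices is shown below; the sampling rate of 100 Hz, the sentence offset times, and the function names are assumptions chosen for illustration only.

```python
def analysis_lags(start_ms=-100, stop_ms=300, step_ms=50):
    """Latencies (ms) relative to sentence offset at which a regression is run."""
    return list(range(start_ms, stop_ms + 1, step_ms))


def lag_sample_indices(offset_ms, lags_ms, sfreq_hz):
    """EEG sample index for each latency around one sentence offset."""
    return [round((offset_ms + lag) * sfreq_hz / 1000) for lag in lags_ms]


lags = analysis_lags()

# Hypothetical sentence offset times (ms from recording start) and sampling rate.
sentence_offsets_ms = [2500, 5200]
sfreq_hz = 100

indices = {
    off: lag_sample_indices(off, lags, sfreq_hz) for off in sentence_offsets_ms
}
```

      Each sentence therefore contributes one EEG sample per latency, and the same five model feature vectors for that sentence are used as regressors at all nine latencies.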

      More on the issue of sentence offsets: In response to reviewer 3's question about -100 - 300 ms around sentence offset, you said "Using the entire sentence duration was not feasible, as the sentences in the stimuli vary in length, making statistical analysis challenging. Additionally, since the stimuli consist of continuous speech, extending the time window would risk including linguistic units from subsequent sentence." This does not make sense to me, so can you elaborate? It sounds like you are actually saying that you only analyzed 400 ms of each trial, but that cannot be what you mean.

      Yes, we analyzed only the 400 ms window surrounding each sentence offset. Although this represents just a subset of our data (143 sentences × 400 ms × 4 conditions), it precisely captures when full-sentence representations emerge against background speech. Because our model produces a single, condensed representation for each linguistic level over the entire sentence—rather than incrementally—we think it is more appropriate to align to the period surrounding sentence offsets. Additionally, extending the window (e.g. to 2 seconds) would risk overlapping adjacent sentences, since sentence lengths vary. Our focus is on the exact period when integrated, level-specific information for each sentence has formed in the brain, and our results already demonstrate different response patterns to different linguistic levels for the two listener groups within this interval. We have added this clarification on p.13 of the revised manuscript.

      In your mTRF analysis, you are now saying that the discrete predictors have "a value of 1" at each of the "boundary offsets", and those TRFs look very similar to your original plots. It sounds to me like you should not be referring to time zero in your original ridge analysis as "sentence offset". If what you mean is that sentence offset time is merely how you aligned the regressors and EEG in time, then your time zero still has a standard, typical TRF interpretation. It is just the point in time, or lag, at which the regressor(s) and EEG are aligned. So activity before zero is "predictive" and activity after zero is "reactive", to think of it crudely. So also in the text, when you say things like "50-150 ms after the sentence offsets", I think this is not really what you mean. I think you are referring to the lags of 50 - 150 ms, relative to the alignment of the regressor and the EEG.

      Thank you very much for the explanation. We agree that, in our ridge regression time course, pre-zero lags index “predictive” processing and post-zero lags index “reactive” processing. Unlike TRF analysis, we applied ridge regression to our high-dimensional model features at nine discrete lags around the sentence offset. At each lag, we tested whether the regression score exceeded a baseline defined as the mean regression score across all lags. For example, finding a significantly higher regression score between 50 and 150 ms suggests that our regressor reliably predicted EEG activity in that time window. So here time zero refers to the precise moment of the sentence offset, not the alignment of the regressor and the EEG.

      I look forward to discussing how much of my interpretation here makes sense or doesn't, both with the authors and reviewers.

      Thank you very much for this very constructive feedback. We hope that we have addressed all your questions.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The researchers conducted their study using advanced techniques. They found almost no difference in calcium binding between the two proteins and observed no impact on calcium signaling, specifically store-operated calcium entry (SOCE). The study also noted an increase in ER luminal calcium-binding chaperone proteins. Surprisingly, the authors selected flow cytometry as a technique for measurements of ER luminal calcium. Considering the limitations of this approach, it would be better to use alternative approaches.

      The flow cytometric assay shows good responsiveness to conditions expected to alter ER calcium levels (Figure 4C), is high throughput compared to microscopy, and allows for averaging of signals across a large number of cells. This was thus our original method of choice.

      This is particularly important as previous reports, using cells from MPN patients, indicate reduced ER luminal calcium and effects on SOCE (Blood, 2020). How do the authors explain the difference between their results and previous findings about lower ER luminal calcium and changed SOCE in MPN patient cells expressing CRTDel52?

      We thank the reviewer for asking for these clarifications. The referenced study (Di Buduo et al. Blood, 135(2):133-143, 2020) first showed that thrombopoietin induces spontaneous cytosolic calcium spikes in cultured megakaryocytes, which are dependent on store-operated calcium entry (SOCE). In parallel, STIM1-ORAI interactions were induced by thrombopoietin. On the other hand, the addition of thrombopoietin caused the dissociation of STIM1-calreticulin interactions, based on proximity ligation assays. The implication is that activation of the thrombopoietin receptor (TPOR/MPL) induces the dissociation of calreticulin-STIM1 complexes and the formation of STIM1-ORAI complexes, which contribute to the measured spontaneous cytosolic calcium spikes. Different MPN mutations induced spontaneous calcium spikes in a thrombopoietin-independent manner, including the JAK2V617F mutations and the CALR type I and type II mutations. The study found that the number of megakaryocytes exhibiting spontaneous calcium spikes was enhanced in the context of both type I and type II CALR mutations compared to the JAK2V617F mutant. Correspondingly, the calreticulin-STIM1 interactions/cell were more significantly reduced for type I and type II CALR mutations compared to the JAK2V617F mutant. It was suggested that defective interactions between mutant calreticulin, ERp57, and STIM1 activated SOCE and generated spontaneous cytosolic calcium spikes. However, based on the findings with thrombopoietin, the spontaneous calcium spikes could simply result from thrombopoietin-independent MPL activation by the mutant calreticulin and JAK2V617F and downstream signaling. Importantly, the referenced study did not directly measure ER luminal calcium. A number of undefined factors could account for the measured differences between the megakaryocytes from patients with calreticulin mutations vs. JAK2V617F. 
These include the relative mutant allele burdens, the extent of MPL activation, as well as genetic differences unrelated to calreticulin. Different from these experiments, through the use of purified proteins, our studies show that the Del52 mutant has calcium binding characteristics resembling that of the wild type protein. Additionally, through genetic manipulations in cell lines, our studies directly address the effects of calreticulin KO and its Del52 mutation upon ER luminal and cytosolic calcium levels, and cellular SOCE signals. We did not measure significant differences in any of these parameters between the KO cells and those reconstituted with wild type calreticulin or the Del52 mutant. As noted by the editors, these results show that Ca2+ binding by calreticulin and store-operated Ca2+ entry in a cell are not fundamentally impacted by the type I deletion mutation. On the other hand, in primary megakaryocytes, when co-expressed with MPL, the Del52 mutant, through its known ability to bind and activate TPOR/MPL, is expected to induce SOCE and calcium fluxes similar to those induced by thrombopoietin. These points will be clarified in the revised discussion.

      Other studies have found that unfolded protein responses are activated in MPN cells with CRTDel52 calreticulin (see Blood, 2021), and increased UPR could account for higher levels of some ER-resident calcium-binding proteins observed here.

      Multiple studies have suggested the induction of the unfolded protein response (UPR) in cells expressing MPN mutants of calreticulin. We do not know the specific signals that cause the upregulation of various calcium binding proteins in calreticulin-KO cells and cells expressing the Del52 mutant. Indeed, these could result from increased protein misfolding in cells with wild type calreticulin deficiency. Alternatively, the sensing of cellular calcium perturbations could induce their expression. Regardless of the precise mechanisms underlying the expression changes in calcium binding proteins, the upregulated factors are predicted to compensate for calreticulin deficiency and contribute to the maintenance of the overall cellular calcium homeostasis. These points will be clarified in the revised discussion.

      Overall, it remains unclear how this work improves our understanding of MPN or clarifies calreticulin's role in MPN pathophysiology.

      The points discussed above as well as their implications for the understanding of calreticulin’s role in MPN pathophysiology will be clarified in the revised manuscript.

      Reviewer #2 (Public review):

      Tagoe and colleagues present a thorough analysis of the calcium (Ca2+) binding capacity of calreticulin (CRT), an endoplasmic reticulum (ER) Ca2+-buffer protein, using a mutant version (CRT del52) found in myeloproliferative neoplasms (MPNs). The authors use purified human CRT protein variants, CRT-KO cell lines, and an MPN cell line to elucidate the differing Ca2+ dynamics, both on the level of the protein and on cell-wide Ca2+-governed processes. In sum, the authors provide new insights into CRT that can be applied to both normal and malignant cell biology.

      First, the authors purify CRT protein and perform isothermal titration calorimetry to quantify the Ca2+ binding capacity of CRT. They use full-length human CRT, CRT del52, and two truncations of CRT (1-339 and 1-351, the former of which should lead to the entire loss of low-affinity Ca2+ binding). While CRT del52 has previously been shown to lead to a decrease in Ca2+ binding affinity in other models, the ITC data show that this is retained in CRT del52.

      Next, the authors utilize a CRT-KO cell line with subsequent addition of CRT protein variants to validate these findings with flow cytometric analysis. Cells were transfected with a ratiometric ER Ca2+ probe, and fluorescence indicates that CRT del52 is unable to restore basal ER Ca2+ levels to the same extent as CRT wild-type. To translate these findings to MPNs, the authors perform CRT-KO in a megakaryocytic cell line, where reconstitution with either CRT variant did not cause a difference in cytosolic calcium levels. The authors further test store-operated calcium entry (SOCE), an important process for maintaining ER Ca2+ levels, in these cells, and find that CRT-KO cells have lower SOCE activity, and that this can be slightly recovered with CRT addition.

      Finally, the authors ask whether other effects of CRT-KO/reconstitution can affect the cellular Ca2+ signaling pathway and levels. RNASeq analysis revealed that CRT-KO leads to an increase in various chaperone protein expressions, and that reconstitution with CRT del52 is unable to reduce expression to the same extent as reconstitution with CRT wildtype.

      Strengths:

      The authors provide new insights into CRT that can be applied to both normal and malignant cell biology.

      We thank the reviewer for the recognition that this study is important for our understanding of both normal and malignant cell biology.

      Weaknesses:

      (1) The authors should consider discussing the high-affinity Ca2+ binding site more in the introduction. Can they show a proof-of-concept experiment that validates that incubation of recombinant CRT reduces the function of that high-affinity Ca2+ binding site?

      In a previous study (Wijeyesakere et al. 2011 J. Biol Chem, 286 8771-8785), we showed that at a starting calcium concentration of 0 mM and with 3.3 mM injections of CaCl<sub>2</sub>, the measured K<sub>D</sub> value was 16.6 mM for calcium binding to wild type murine calreticulin (which has ~95% sequence identity with human calreticulin), corresponding to the high affinity site. On the other hand, at a starting calcium concentration of 50 mM and with 33 mM CaCl<sub>2</sub> injections, the measured K<sub>D</sub> value for calcium binding to wild type murine calreticulin was 590 mM (corresponding to the low affinity sites). Thus, we did not measure the high affinity sites when the starting calcium concentration was 50 mM. This point will be clarified in the revised manuscript.

      (2) For Figure 2B, do you have an explanation for why the purified proteins run higher than predicted (48-52kDa) - are these proteins still tagged with pGB1?

      Yes, the purified proteins shown in Figure 2B retained a GB1 tag. This point will be clarified in the revised manuscript.

      (3) The MEG-01 cell line has the BCR:ABL1 translocation, while CRT mutations are strictly found in BCR:ABL1 negative MPNs. Could these experiments be repeated in these cells treated with imatinib to decrease these effects, or see if basal MEG-01 Ca2+ levels/activity are changed with or without imatinib?

      Thank you for the important point. We will assess cytosolic calcium levels in MEG-01 cells with or without imatinib.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use methylphenidate (MPH) administration after learning a Pavlovian to instrumental transfer (PIT) task to parse decision making from instrumental influences. While the main effects were null, individual differences in working memory ability moderated the tendency of MPH to boost cognitive control in order to override PIT-biased instrumental learning. Importantly, this working memory moderator had symmetrical effects in appetite and aversive conditions, and these patterns replicated within each valence condition across different values of gain/loss (Fig S1c), suggesting a reliable effect that is generalized across instances of Pavlovian influence.

      Strengths:

      The idea of using pharmacological challenge after learning but prior to transfer is a novel technique that highlights the influence of catecholamines on the expression of learning under Pavlovian bias, and importantly it dissociated this decision feature from the learning of stimulus-outcome or action-outcome pairings.

      We thank the reviewer for highlighting the timing of the pharmacological intervention as a strength for this study and for the suggested improvements for clarification.

      Weaknesses:

      While the report is largely straightforward and clearly written, some aspects may be edited to improve the clarity for other readers.

      (1) Theoretical clarity. The authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      Our findings call for a revision of theories on how catecholamines are involved in the instantiation of Pavlovian biases in decision making. The reviewer rightly notices that we offer three routes to modify current theory to be able to incorporate our findings. Briefly, these routes discuss catecholaminergic modulation of Pavlovian biases (i) through modulation of the putative striatal ‘origin’ of Pavlovian biases, (ii) through top-down control, primarily relying on prefrontal processes, and (iii) a combination of the two, where catecholamines regulate the balance between these striatal and frontal processes.

      Given the systemic nature of the pharmacological manipulation, we cannot dissociate between these three accounts. We believe that discussing these possible explanations enriches our Discussion and strengthens our recommendation in the ultimate paragraph to use pharmacological neuroimaging studies to arbitrate between these options. In the revision, we have made this line of reasoning more clear, in part by adding guiding titles to the Discussion section and adding a summary paragraph in the Discussion (Discussion, page 9-12).

      (2) Analytic clarity: what's c^2?

      C^2 appears to be a PDF conversion error: all chi-squares (Χ2) were converted to C2. This is now corrected in our revision.

      Reviewer #2 (Public review):

      Summary:

      In this study, Geurts et al. investigated the effects of the catecholamine reuptake inhibitor methylphenidate (MPH) on value-based decision making using a combination of aversive and appetitive Pavlovian to Instrumental Transfer (PIT) in a human cohort. Using an elegant behavioural design they showed valence- and action-specific effects of Pavlovian cues on instrumental responses. Initial analyses show no effect of MPH on these processes. However the authors performed a more in-depth analysis and demonstrated that MPH actually modulates PIT in an action-specific manner depending on individual working memory capacities. The authors interpret that as an effect on cognitive control of Pavlovian biasing of actions and decision making more than an invigoration of motivational biases.

      Strengths:

      A major strength of this study is its experimental design. The elegant combination of appetitive and aversive Pavlovian learning with approach/avoidance instrumental actions allows precise investigation of the different modulation of value-based decision making depending on the context and environmental stimuli. Importantly, MPH is only administered after Pavlovian and instrumental learning, restricting the effect to PIT performance only. Finally, the use of a placebo-controlled crossover design allows within-subject comparisons between the PIT effect under placebo and MPH and the investigation of the relationships between working memory abilities, PIT and MPH effects.

      We thank the reviewer for highlighting the experimental design as a strength for this study and the suggested improvements for clarification.

      Weaknesses:

      As authors stated in their discussion, this study is purely correlational and their conclusions could be strengthened by the addition of interesting (but time- and resource-consuming) neuroimaging work.

      We employ a pharmacological intervention within a randomized, placebo-controlled cross-over design, which allows for causal inferences with respect to the placebo-controlled intervention. Thus, the reported interactions of interest include correlations, but these are causally dependent on our intervention.

      Perhaps the reviewer refers to the implications of our findings for hypotheses regarding neural implementation of Pavlovian bias-generation. Indeed, based on our data we are not able to arbitrate between frontal and striatal accounts, due to the systemic nature of the pharmacological intervention. Thus, we agree with the reviewer that neuroimaging (in combination with for example brain stimulation) would be a valuable next step to identify the neural correlates to these pharmacological intervention effects, to dissociate between frontal and striatal basis of the effects. In the revision, as per our reply to reviewer 1, we have made this line of reasoning more clear, in part by adding guiding titles to the Discussion section and adding a summary paragraph in the Discussion (Discussion, page 9-12).

      The originality of this work compared to their previous published work using the same cohort could also be clarified at different stages of the article, as I initially wondered what was really novel. This point is much clearer in the discussion section.

      As recommended, we brought forward parts of the Discussion that clarify the originality of the current experiment to the introduction (page 4/5) and result section (page 8).

      A point which, in my opinion, really requires clarification is when the working memory performance presented in Figure 2B has been determined. Was it under placebo (as I would guess) or under MPH? If it is the former, it would be also interesting to look at how MPH modulates working memory based on initial abilities.

      We have now clarified that working memory span was assessed for all participants on Day 2 prior to the start of instrumental training (as illustrated in figure 1A). Importantly, this was done prior to ingestion of the drug or placebo (which subjects received after Pavlovian training, which followed the instrumental training). This design also precludes an assessment of the effects of MPH on working memory capacity.

      A final point is that it could be interesting to also discuss these results, not only regarding dopamine signalling, but also including potential effect of MPH on noradrenaline in frontal regions, considering the known role of this system in modulating behavioural flexibility.

      We indeed focus our Discussion more on dopamine than on noradrenaline. Our revision now also discusses noradrenaline in light of our frontal control hypothesis and the recommendation, in future studies, to use a multi-drug design, incorporating, for example, a session with the drug atomoxetine, which modulates cortical catecholamines, but not striatal dopamine (Discussion, page 12).

      Reviewer #3 (Public review):

      The manuscript by Geurts and colleagues studies the effects of methylphenidate on Pavlovian to instrumental transfer in humans and demonstrates that the effects of the drug depend on the baseline working memory capacity of the participants. The experiment used a well established cognitive task that allows to measure the effects of Pavlovian cues predicting monetary wins and losses on instrumental responding in two different contexts, namely approach and withdraw. By administering the drug after participants went through the instrumental and Pavlovian learning phases of the experiment, the authors limited the effects of the drug to the transfer phase in extinction. This allowed the authors to make inference about the invigorating effects of the cues independently from any learning bias. Moreover, the authors employed a within subject design to study the effect of the drug on 100 participants, which also allows to detect continuous between-subject relationships with covariates such as working memory capacity.

      The study replicates previous findings using this task, namely that appetitive cues promote active responding, and aversive cues promote passive responding in an approach instrumental context, whereas the effect of the cues reverses in a withdraw instrumental context. The results of the methylphenidate manipulation show that the drug decreases the effects of the Pavlovian cues on instrumental responding in participants with low working memory capacity but increases the Pavlovian effects in participants with high working memory capacity. Importantly, in the latter group, methylphenidate increases the invigorating effect of appetitive Pavlovian cues on active approach and aversive Pavlovian cues on active withdrawal as well as the inhibitory effects of aversive Pavlovian cues on active approach and appetitive Pavlovian cues on active withdrawal. These results cannot be explained if catecholamines are just involved in Pavlovian biases by modulating behavioral invigoration driven by the anticipation of reward and punishment in the striatum, as this account can't account for the reversal of the effects of a valence cue on vigor depending on the instrumental context.

      In general, I find the methods of this study very robust and the results very convincing and important. However, I have some concerns:

      We thank the Reviewer for highlighting the robustness of the methods and the importance of the results. We are glad to briefly address the concerns here and have incorporated these in our revision.

      I am not convinced that the inclusion of impulsivity scores in the logistic mixed model to analyze the effects of methylphenidate on PIT is warranted. The authors do not show whether inclusion of this covariate is justified in terms of BIC. Moreover, they include this covariate but do not report the effects. Finally, it is possible that impulsivity is correlated with working memory capacity. In that case, multicollinearity may impact the estimation of the coefficient estimates and may inflate the p-values for the correlated covariates. Are the reported results robust when this factor is not included?

      With regard to the inclusion of impulsivity, we would first like to mention that this inclusion in our analyses was planned a priori and therefore consistently implemented in the other reports resulting from the overarching study (Froböse et al., 2018; Cook et al., 2019; Rostami Kandroodi et al., 2021), especially the study with regard to which the current report is an eLife Research Advance (Swart et al., 2017). Moreover, we preregistered both working memory span and impulsivity as potential factors (under secondary measures) that could mediate the effects of catecholamines (see https://onderzoekmetmensen.nl/nl/trial/26989). The inclusion of working memory span was based on evidence from PET imaging studies demonstrating a link with dopamine synthesis capacity (Cools et al., 2008; Landau et al., 2009), whereas the inclusion of trait impulsivity was based on evidence from other PET imaging studies showing a link with dopamine (auto)receptor availability (Buckholtz et al., 2010; Kim et al., 2014; Lee et al., 2009; Reeves et al., 2012). Although there was no significant improvement for the model with impulsivity compared with the model without impulsivity, we feel that we should follow our a priori established analyses.

      We can confirm that impulsivity and working memory were not correlated in this sample (r98=-0.16, p=0.88), which rules out multicollinearity.
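
      As a rough illustration of why a correlation of this size is unlikely to cause multicollinearity, the sketch below computes the variance inflation factor for the simplified case of two predictors, where VIF = 1/(1 - r^2). The function name is hypothetical, and the full GLMM contains further terms and interactions that this two-covariate shortcut does not capture.

```python
def vif_two_predictors(r: float) -> float:
    """Variance inflation factor when a model has exactly two predictors.

    With two predictors, regressing one on the other gives R^2 = r^2,
    so VIF = 1 / (1 - r^2).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Correlation between impulsivity and working memory span quoted in the response.
r_reported = -0.16
print(f"VIF = {vif_two_predictors(r_reported):.3f}")
```

      A value close to 1 means the two covariates share little variance, so neither coefficient estimate is appreciably inflated by the other.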

      Most importantly, the results are robust to excluding impulsivity scores, as evidenced by a significant four-way interaction in the omnibus GLMM without impulsivity (Action Context x Valence x Drug x WM span: χ<sup>2</sup> = 9.5, p=0.002). We have now added these findings to the Supplemental Results: Control analyses, page 28.
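
      For readers who want to relate the test statistic to the p-value, the sketch below converts a chi-square value into an upper-tail probability. It assumes 1 degree of freedom, which the response does not state; for that case the tail probability has the closed form erfc(sqrt(x/2)). The function name is illustrative only.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    If Z is standard normal, Z^2 is chi-square with 1 df, so
    P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("chi-square statistics are non-negative")
    return math.erfc(math.sqrt(x / 2.0))


# Statistic quoted for the four-way interaction; df = 1 is an assumption.
p_value = chi2_sf_df1(9.5)
print(f"p = {p_value:.4f}")
```

      Under the 1-df assumption the result is close to the reported p-value; a different number of degrees of freedom would give a different value.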

      The authors state that working memory capacity is an established proxy for dopamine synthesis capacity and cite some studies supporting this view. However, the authors omit a recent reference by van den Bosch et al that provides evidence for the absence of links between striatal dopamine synthesis capacity and working memory capacity. The lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum strengthens the alternative explanations of the results suggested in the discussion.

      We agree with the Reviewer that the lack of a robust link between working memory capacity and dopamine synthesis capacity in the striatum, as measured with [<sup>18</sup>F]-FDOPA PET imaging, lends support to the proposed hypothesis incorporating a broader perspective on Pavlovian bias generation than the dopaminergic direct/indirect pathway account (although it is possible that the association will hold in a larger sample when synthesis capacity is measured with [<sup>18</sup>F]-FMT PET imaging, which is sensitive to a different component of the metabolic pathway). We will indeed incorporate in our planned revision the findings from our group reported in van den Bosch et al. (2022).

      See Supplemental methods 2: Working memory and impulsivity assessment, page 26.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Theoretical clarity. Some aspects of the paper are ideally clear: Figure 1 clearly explains the paradigm. The general take-home message is clearly described in the last line of the abstract, the last line of the introduction, the first line of the discussion, and throughout other places in the discussion. Yet the authors seem to hedge their bets when it comes to placing these findings within a broader theoretical framework.

      The discussion includes many possible theoretical interpretations of the findings, which is laudable, but many readers may get lost in this multitude (particularly anyone who isn't an RL/DA aficionado). The group's prior work (i.e. striatal hypothesis) is first described, followed by a rather complex breakdown of valence-action tendencies, then the seemingly preferred explanation for the current study (i.e. cognitive control hypothesis) is advanced as "an alternative account ...". This is followed by a third, more complex idea (i.e. cortico-striatal balance hypothesis), then the paper ends. A reader may be forgiven for skimming through this discussion and not having a clear idea of how to frame these effects. I think some subheaders would help, as well as clearer labeling of the theoretical interpretations in line with a more authoritative description of the authors' preferred interpretation of the empirical effects.

      Our findings call for a revision of theories on how catecholamines are involved in the instantiation of Pavlovian biases in decision making. The reviewer rightly notices that we offer three routes to modify current theory so that it can incorporate our findings. Briefly, these routes discuss catecholaminergic modulation of Pavlovian biases (i) through modulation of the putative striatal ‘origin’ of Pavlovian biases, (ii) through top-down control, primarily relying on prefrontal processes, and (iii) through a combination of the two, where catecholamines regulate the balance between these striatal and frontal processes.

      Given the systemic nature of the pharmacological manipulation, we cannot dissociate between these three accounts. We believe that discussing these possible explanations enriches our Discussion and strengthens our recommendation in the final paragraph to use pharmacological neuroimaging studies to arbitrate between these options. In the revision, we have made this line of reasoning clearer, in part by adding guiding titles to the Discussion section and adding a summary paragraph in the Discussion (Discussion, pages 9-12).

      (2) All statistical effects are presented as c^2 with no df. The methods only describe LMER and make no mention of what the c^2 measure represents.

      The c^2 appears to be a PDF conversion error: all chi-squares (χ<sup>2</sup>) were converted to C2. This is now corrected in our revision.

      Reviewer #2 (Recommendations for the authors):

      Few minor points:

      Figure 2A is not cited in the text I think

      Checked and changed.

      Figure 2C: "C" is not present in the figure. Also I could not see the data corresponding at MPH-Approach context in Neutral Pavlovian condition but I think it is probably masked by another curve.

      Checked and changed. Indeed, the one curve is masked by the other curve.

      As I stated in the public review, a clarification or more detailed analysis of working memory performance depending on if it was measured under MPH or placebo could be a plus.

      Changed this (see public review reply).

      I did not see any statement about the availability of data but I may have missed it.

      Yes, the statement can be found:

      Methods, page 13: Data and code for the study are freely available at https://data.ru.nl/collections/di/dccn/DSC_3017031.02_734.

      Reviewer #3 (Recommendations for the authors):

      The authors should check that inclusion of impulsivity in the logistic mixed model is justified and if it is justified make sure that multicollinearity is not problematic.

      See our answer to the public review, reiterated below for convenience:

      With regard to the inclusion of impulsivity, we would first like to mention that this inclusion in our analyses was planned a priori and therefore consistently implemented in the other reports resulting from the overarching study (Froböse et al., 2018; Cook et al., 2019; Rostami Kandroodi et al., 2021), especially the study with regard to which the current report is an eLife Research Advance (Swart et al., 2017). Moreover, we preregistered both working memory span and impulsivity as potential factors (under secondary measures) that could mediate the effects of catecholamines (see https://onderzoekmetmensen.nl/nl/trial/26989). The inclusion of working memory span was based on evidence from PET imaging studies demonstrating a link with dopamine synthesis capacity (Cools et al., 2008; Landau et al., 2009), whereas the inclusion of trait impulsivity was based on evidence from other PET imaging studies showing a link with dopamine (auto)receptor availability (Buckholtz et al., 2010; Kim et al., 2014; Lee et al., 2009; Reeves et al., 2012). Although there was no significant improvement for the model with impulsivity compared with the model without impulsivity, we feel that we should follow our a priori established analyses.

      We can confirm that impulsivity and working memory were not correlated in this sample (r98=-0.16, p=0.88), which rules out multicollinearity.

      Most importantly, the results are robust to excluding impulsivity scores, as evidenced by a significant four-way interaction in the omnibus GLMM without impulsivity (Action Context x Valence x Drug x WM span: χ<sup>2</sup> = 9.5, p=0.002). We have now added these findings to the Supplemental Results: Control analyses, page 28.

      I would recommend that the authors make clear that the effects of methylphenidate are dependent on working memory capacity in the first sentence of the penultimate paragraph of the introduction on page 4.

      Changed this accordingly, see Introduction, page 5.

      I would make sure that the text in the figures is readable without needing to enlarge the figures. I would also highlight the significant effects in the figures.

      We changed the font size accordingly and added significance statements to the caption, because depicting the significance of a four-way interaction including one continuous variable is not straightforward.

      The distributions of p(Go) by conditions such as in figure 1D or 2A are very intuitive. Figure 2B is very informative as it shows the continuous effects of working memory capacity on the PIT effect. I would add (in figure 2 or in the supplement) a plot of the p(Go) with a tertile split based on working memory. Considering that the correspondent analysis is being reported, having the plot would strengthen and simplify the understanding of the results.

      The continuous effects of working memory are based on WM values on the listening span ranging from 2.5-7, in steps of 0.5, resulting in 10 different values. A tertile split would result in binning these into two bins of three values, and one bin of four values. Given that all of the datapoints for this tertile split are already presented in the current figures, we strongly prefer not to include this additional figure.
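
      The binning described above can be sketched as follows. This is only an illustration of the counting argument: the helper name is hypothetical, the split is applied to the ten distinct span values rather than to participants, and which bin receives the extra value is an arbitrary choice here.

```python
def split_into_bins(values, n_bins):
    """Split an ordered list into n_bins contiguous bins of near-equal size.

    Any remainder is distributed one item at a time to the first bins.
    """
    base, extra = divmod(len(values), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        size = base + (1 if i < extra else 0)
        bins.append(values[start:start + size])
        start += size
    return bins


# Listening span scores from 2.5 to 7 in steps of 0.5.
span_values = [2.5 + 0.5 * i for i in range(10)]
tertiles = split_into_bins(span_values, 3)
for i, bin_values in enumerate(tertiles, start=1):
    print(f"tertile {i}: {bin_values}")
```

      A tertile split on participants rather than on distinct values would additionally depend on how many participants obtained each score.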

      I would add some sentences in the results section (and maybe in the discussion if needed) addressing the results that the effect of Valence by drug by WM span is only significant in the withdrawal context but not in the approach context.

      We have now added emphasis on the drug effects that are specifically significant in the withdrawal context in the Results section, page 8.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This is a valuable polymer model that provides insight into the origin of macromolecular mixed and demixed states within transcription clusters. The well-performed and clearly presented simulations will be of interest to those studying gene expression in the context of chromatin. While the study is generally solid, it could benefit from a more direct comparison with existing experimental data sets as well as further discussion of the limits of the underlying model assumptions.

      We thank the editors for their overall positive assessment. In response to the Referees’ comments, we have addressed all technical points, including a more detailed explanation of the methodology used to extract gene transcription from our simulations and its analogy with real gene transcription. Regarding the potential comparison with experimental data and our mixing–demixing transition, we have added new sections discussing the current state of the art in relevant experiments. We also clarify the present limitations that prevent direct comparisons, which we hope can be overcome with future experiments using the emerging techniques.

      Reviewer #1 (Public Review):

      This manuscript discusses from a theory point of view the mechanisms underlying the formation of specialized or mixed factories. To investigate this, a chromatin polymer model was developed to mimic the chromatin binding-unbinding dynamics of various complexes of transcription factors (TFs).

      The model revealed that both specialized (i.e., demixed) and mixed clusters can emerge spontaneously, with the type of cluster formed primarily determined by cluster size. Non-specific interactions between chromatin and proteins were identified as the main factor promoting mixing, with these interactions becoming increasingly significant as clusters grow larger.

      These findings, observed in both simple polymer models and more realistic representations of human chromosomes, reconcile previously conflicting experimental results. Additionally, the introduction of different types of TFs was shown to strongly influence the emergence of transcriptional networks, offering a framework to study transcriptional changes resulting from gene editing or naturally occurring mutations.

      Overall I think this is an interesting paper discussing a valuable model of how chromosome 3D organisation is linked to transcription. I would only advise the authors to polish and shorten their text to better highlight their key findings and make it more accessible to the reader.

      We thank the Referee for carefully reading our manuscript and recognizing its scientific value. As suggested, we tried to better highlight our key findings and make the text more accessible while addressing also the comments from the other Referees.

      Reviewer #2 (Public Review):

      Summary:

      With this report, I suggest what are in my opinion crucial additions to the otherwise very interesting and credible research manuscript ”Cluster size determines morphology of transcription factories in human cells”.

      Strengths:

      The manuscript in itself is technically sound, the chosen simulation methods are completely appropriate, the figures are well-prepared, and the text is mostly well-written save for a few typos. The conclusions are valid and would represent a valuable conceptual contribution to the field of clustering, 3D genome organization and gene regulation related to transcription factories, which continues to be an area of most active investigation.

      Weaknesses:

      However, I find that the connection to concrete biological data is weak. This holds especially given that the data that are needed to critically assess the applicability of the derived cross-over with factory size are, in fact, available for analysis, and the suggested experiments in the Discussion section have actually been done and their results can be exploited. In my judgement, unless these additional analyses are added to a level that crucial predictions on TF demixing and transcriptional bursting upon TU clustering can be tested, the paper is better suited to a theoretical biophysics venue than to a biology journal such as eLife.

      We thank the Reviewer for their positive assessment of the soundness of our work and its contribution to the field. We have added a paragraph to the Conclusions highlighting the current state of experimental techniques and outlining near-term experiments that could be extended to test our predictions. We also emphasise that our analysis builds on state-of-the-art polymer models of chromatin and on quantitative experimental datasets, which we used both to construct the model and to validate its outcomes (gene activity). We hope this strengthened link to experiment will catalyse further studies in the field.

      Major points:

      (1) My first point concerns terminology. The Merriam-Webster dictionary describes morphology as the study of structure and form. In my understanding, none of the analyses carried out in this study actually address the form or spatial structuring of transcription factories. I see no aspects of shape, only size. Unless the authors want to assess actual shapes of clusters, I would recommend to instead talk about only their size/extent. The title is, by the same argument, in my opinion misleading as to the content of this study.

      We agree with the Referee that the title could be misleading. In our study we characterized cluster size, which is a morphological descriptor, and cluster composition, which is not morphology per se but is described as such in the community in a broader sense. Nevertheless, to strengthen the message we have changed the title to: “Cluster size determines internal structure of transcription factories in human cells”.

      (2) Another major conceptual point is the choice of how a single TF:pol particle in the model relates to actual macromolecules that undergo clustering in the cell. What about the fact that even single TF factories still contain numerous canonical transcription factors, many of which are also known to undergo phase separation? Mediator, CDK9, Pol II just to name a few. This alone already represents phase separation under the involvement of different species, which must undergo mixing. This is conceptually blurred with the concept of gene-specific transcription factors that are recruited into clusters/condensates due to sequence-specific or chromatin-epigenetic-specific affinities. Also, the fact that even in a canonical gene with a ”small” transcription factory there are numerous clustering factors takes even the smallest factories into a regime of several tens of clustering macromolecules. It is unclear to me how this reality of clustering and factory formation in the biological cell relates to the cross-over that occurs at approximately n=10 particles in the simulations presented in this paper.

      This is a good point. However, in our case we can look at either clustering transcription factors or transcription units. In an experimental situation, transcription units could be “coloured”, or assigned different types, by looking at different cell types, so that they can be classified as housekeeping, or cell-type independent, or cell-type specific. This is similar to how DHS can be clustered. In this way the mixing or demixing state can be identified by looking at the type of transcription unit, removing any ambiguity due to the fact that the same protein may participate in different TF complexes.

      (3) The paper falls critically short in referencing and exploiting for analysis existing literature and published data both on 3D genome organization as well as the process of cluster formation in relation to genomic elements. In terms of relevant literature, most of the relevant body of work from the following areas has not been included:

      (i) mechanisms of how the clustering of Pol II, canonical TFs, and specific TFs is aided by sequence elements and specific chromatin states

      (ii) mechanisms of TF selectivity for specific condensates and target genomic elements

      (iii) most crucially, existing highly relevant datasets that connect 3D multi-point contacts with transcription factor identity and transcriptional activity, which would allow the authors to directly test their hypotheses by analysis of existing data

      Here, especially the data under point (iii) are essential. The SPRITE method (cited but not further exploited by the authors), even in its initial form of publication, would have offered a data set to critically test the mixing vs. demixing hypothesis put forward by the authors. Specifically, the SPRITE method offers ordered data on k-mers of associated genomic elements. These can be mapped against the main TFs that associate with these genomic elements, thereby giving an account of the mixed / demixed state of these k-mer associations. Even a simple analysis sorting these associations by the number of associated genomic elements might reveal a demixing transition with increasing association size k. However, a newer version of the SPRITE method already exists, which combines the k-mer association of genomic elements with the whole transcriptome assessment of RNAs associated with a particular DNA k-mer association. This can even directly test the hypotheses the authors put forward regarding cluster size, transcriptional activation, correlation between different transcription units’ activation etc.

      To continue, the Genome Architecture Mapping (GAM) method from Ana Pombo’s group has also yielded data sets that connect the long-range contacts between gene-regulatory elements to the TF motifs involved in these motifs, and even provides ready-made analyses that assess how mixed or demixed the TF composition at different interaction hubs is. I do not see why this work and data set is not even acknowledged? I also strongly suggest to analyze, or if they are already sufficiently analyzed, discuss these data in the light of 3D interaction hub size (number of interacting elements) and TF motif composition of the involved genomic elements.

      Further, a preprint from the Alistair Boettiger and Kevin Wang labs from May 2024 also provides direct, single-cell imaging data of all super-enhancers, combined with transcription detection, assessing even directly the role of number of super-enhancers in spatial proximity as a determinant of transcriptional state. This data set and findings should be discussed, not in vague terms but in detailed terms of what parts of the authors’ predictions match or do not match these data.

      For these data sets, an analysis in terms of the authors’ key predictions must be carried out (unless the underlying papers already provide such final analysis results). In answering this comment, what matters to me is not that the authors follow my suggestions to the letter. Rather, I would want to see that the wealth of available biological data and knowledge that connects to their predictions is used to their full potential in terms of rejecting, confirming, refining, or putting into real biological context the model predictions made in this study.

      References for point (iii):

      - RNA promotes the formation of spatial compartments in the nucleus https://www.cell.com/cell/fulltext/S0092-8674(21)01230-7?dgcid=raven_jbs_etoc_email

      - Complex multi-enhancer contacts captured by genome architecture mapping https://www.nature.com/articles/nature21411

      - Cell-type specialization is encoded by specific chromatin topologies https://www.nature.com/articles/s41586-021-04081-2

      - Super-enhancer interactomes from single cells link clustering and transcription https://www.biorxiv.org/content/10.1101/2024.05.08.593251v1.full

      For point (i) and point (ii), the authors should go through the relevant literature on Pol II and TF clustering, how this connects to genomic features that support the cluster formation, and also the recent literature on TF specificity. On the last point, TF specificity, especially the groups of Ben Sabari and Mustafa Mir have presented astonishing results that seem highly relevant to the Discussion of this manuscript.

      We appreciate the Reviewer’s insightful suggestion that a comparison between our simulation results and experimental data would strengthen the robustness of our model. In response, we have thoroughly reviewed the literature on multi-way chromatin contacts, with particular attention to SPRITE and GAM techniques. However, we found that the currently available experimental datasets lack sufficient statistical power to provide a definitive test of our simulation predictions, as detailed below.

      As noted by the Reviewer, SPRITE experiments offer valuable information on the composition of high-order chromatin clusters (k-mers) that involve multiple genomic loci. A closer examination of the SPRITE data (e.g., Supplementary Material from Ref. [1]) reveals that the majority of reported statistics correspond to 3-mers (three-way contacts), while data on larger clusters (e.g., 8-mers, 9-mers, or greater) are sparse. This limitation hinders our ability to test the demixing-mixing transition predicted in our simulations, which occurs for cluster sizes exceeding 10.

      Moreover, the composition of the k-mers identified by SPRITE predominantly involves genomic regions encoding functional RNAs—such as ITS1 and ITS2 (involved in rRNA synthesis) and U3 (encoding small nucleolar RNA)—which largely correspond to housekeeping genes. Conversely, there is little to no data available for protein-coding genes. This restricts direct comparison to our simulations, where the demixing-mixing transition depends critically on the interplay between housekeeping and tissue-specific genes.

      Similarly, while GAM experiments are capable of detecting multi-way chromatin contacts, the currently available datasets primarily report three-way interactions [2,3].

      In summary, due to the limited statistical data on higher-order chromatin clusters [4], a quantitative comparison between our simulation results and experimental observations is not currently feasible. Nevertheless, we have now briefly discussed the experimental techniques for detecting multi-way interactions in the revised manuscript to reflect the current state of the field, mentioning most of the references that the Reviewer suggested.
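
      Should richer multi-way contact data become available, the kind of analysis discussed above could be sketched as follows: group clusters by the number of loci they contain and ask how dominated each cluster is by a single type. The function names, the "majority fraction" score and the toy clusters are all illustrative assumptions, not part of any SPRITE or GAM pipeline and not real data.

```python
from collections import Counter, defaultdict


def majority_fraction(labels):
    """Fraction of loci in a cluster that carry its most common label.

    1.0 means fully demixed (one type only); lower values mean more mixing.
    """
    if not labels:
        raise ValueError("a cluster needs at least one locus")
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def mean_purity_by_size(clusters):
    """Average majority fraction of clusters, grouped by cluster size k."""
    grouped = defaultdict(list)
    for labels in clusters:
        grouped[len(labels)].append(majority_fraction(labels))
    return {k: sum(v) / len(v) for k, v in sorted(grouped.items())}


# Toy input: each cluster is a list of locus types (hypothetical labels).
toy_clusters = [
    ["housekeeping", "housekeeping", "housekeeping"],
    ["specific", "specific", "housekeeping"],
    ["housekeeping", "specific", "housekeeping", "specific"],
]
for k, purity in mean_purity_by_size(toy_clusters).items():
    print(f"k = {k}: mean majority fraction = {purity:.2f}")
```

      A drop in this score with increasing k would be the signature of a demixed-to-mixed transition, provided enough clusters are observed at each size.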

      (4) Another conceptual point that is a critical omission is the clarification that there are, in fact, known large vs. small transcription factories, or transcriptional clusters, which are specific to stem cells and ”stressed cells”. This distinction was initially established by Ibrahim Cisse’s lab (Science 2018) in mouse Embryonic Stem Cells, and also is seen in two other cases in differentiated cells in response to serum stimulus and in early embryonic development:

      - Mediator and RNA polymerase II clusters associate in transcription-dependent condensates https://www.science.org/doi/10.1126/science.aar4199

      - Nuclear actin regulates inducible transcription by enhancing RNA polymerase II clustering https://www.science.org/doi/10.1126/sciadv.aay6515

      - RNA polymerase II clusters form in line with surface condensation on regulatory chromatin https://www.embopress.org/doi/full/10.15252/msb.202110272

      - If ”morphology” should indeed be discussed, the last paper is a good starting point, especially in combination with this additional paper: Chromatin expansion microscopy reveals nanoscale organization of transcription and chromatin https://www.science.org/doi/10.1126/science.ade5308

      We thank the Reviewer for pointing out the discussion about small and large clusters observed in stressed cells. Our study aims to provide a broader mechanistic explanation of the formation of mixed and demixed TF clusters depending on their size. However, to avoid confusion between our terminology and the classification that is already used for transcription factories in stem and stressed cells, we have now added some comments and references in the revised text.

      (5) The statement scripts are available upon request is insufficient by current FAIR standards and seems to be non-compliant with eLife requirements. At a minimum, all, and I mean all, scripts that are needed to produce the simulation outcomes and figures in the paper, must be deposited as a publicly accessible Supplement with the article. Better would be if they would be structured and sufficiently documented and then deposited in external repositories that are appropriate for the sharing of such program code and models.

      We fully agree with the Reviewer. We have now included in the main text a link to an external repository containing all the codes required to reproduce and analyze the simulations.

      Recommendations for the authors:

      Minor and technical points

      (6) Red, green, and yellow (mix of green and red) is a particularly bad choice of color code, seeing that red-green blindness is the most common color blindness. I recommend to change the color code.

      We appreciate the Reviewer’s thoughtful comment regarding color accessibility. We fully agree that red–green combinations can pose challenges for color-blind readers. In our figures, however, we chose the red–green–yellow color scheme deliberately because it provides strong contrast and intuitive representation for different TF/TU types. To ensure accessibility, we optimized brightness and saturation within red-green schemes and we carefully verified that the chosen hues are distinguishable under the most common forms of color vision deficiency, i.e. trichromatic color blindness, using color-blindness simulation tools (e.g., Coblis).

      How is the dispersing effect of transcriptional activation and ongoing transcription accounted for or expected to affect the model outcome? This affects both transcriptional clusters (they tend to disintegrate upon transcriptional activation) as well as the large scale organization, where dispersal by transcription is also known.

      We thank the Reviewer for this very insightful question. The current versions of both our toy model and the more complex HiP-HoP model do not incorporate the effects of RNA Polymerase elongation. Our primary goal was to develop a minimalistic framework that focuses on investigating TF cluster formation and composition. Nevertheless, we find that this straightforward approach provides a good agreement between simulations and Hi-C and GRO-seq experiments, lending confidence to the reliability of our results concerning TF cluster composition.

      We fully agree, however, that the effects of transcription elongation are an interesting topic for further exploration. For example, modeling RNA Polymerases as active motors that continually drive the system out of equilibrium could influence the chromatin polymer conformation and the structure of TF clusters. Additionally, investigating how interactions between RNA molecules and nuclear proteins, such as SAF-A, might lead to significant changes in 3D chromatin organization and, consequently, transcription [5], is also an intriguing prospect. Although we do not believe that the main findings of our study, particularly regarding cluster composition and the mixed-demixed transition, would be impacted by transcription elongation effects, we recognize the importance of this aspect. As such, we have now included some comments in the Conclusions section of the revised manuscript.

      “and make the reasonable assumption that a TU bead is transcribed if it lies within 2.25 diameters (2.25σ) of a complex of the same colour; then, the transcriptional activity of each TU is given by the fraction of time that the TU and a TF:pol lie close together.” How is that justified? I do not see how this is reasonable or not, if you make that statement you must back it up.

      As pointed out by the Referee, we consider a TU to be active if at least one TF is within a distance of 2.25σ from that TU. This threshold is slightly larger than the TU-TF interaction cutoff distance, r<sub>c</sub> = 1.8σ. The rationale for this choice is to ensure that, in the presence of a TU cluster surrounded by TFs, TUs that are not directly in contact with a TF are still considered active. Nonetheless, we find that using slightly different thresholds, such as 1.8σ or 1.1σ, leads to comparable results, as shown in Fig. S11, demonstrating the robustness of our analysis.
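
      The activity criterion described above can be illustrated with a minimal sketch. It assumes positions are stored per frame in units of σ; the function name `tu_activity` and the toy trajectory are hypothetical and are not taken from the simulation code used in the study.

```python
import math


def tu_activity(tu_traj, tf_trajs, threshold=2.25):
    """Fraction of frames in which at least one TF lies within `threshold` of the TU.

    tu_traj  : list of (x, y, z) positions of one TU, one entry per frame (units of sigma)
    tf_trajs : list of trajectories, one per same-colour TF, each a list of (x, y, z)
    """
    n_frames = len(tu_traj)
    active_frames = 0
    for t in range(n_frames):
        # The TU counts as transcribed in this frame if any TF is close enough.
        if any(math.dist(tu_traj[t], tf[t]) <= threshold for tf in tf_trajs):
            active_frames += 1
    return active_frames / n_frames


# Hypothetical 4-frame trajectory: a single TF drifts away from a fixed TU.
tu = [(0.0, 0.0, 0.0)] * 4
tf = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (5.0, 0.0, 0.0)]

# Compare the three thresholds mentioned in the text.
for cutoff in (1.1, 1.8, 2.25):
    print(cutoff, tu_activity(tu, [tf], threshold=cutoff))
```

      Repeating the calculation with each cutoff is one simple way to check how sensitive the activity profile is to the choice of threshold.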

      Clearly, close proximity in 1D genomic space favours formation of similarly-coloured clusters. This is not surprising, it is what you built the model to do. Should not be presented as a new insight, but rather as a check that the model does what is expected.

      We believed that this sentence already conveyed that the formation of single-color clusters driven by 1D genomic proximity is not a surprising outcome. However, we have now slightly rephrased it to better emphasize that this is not a novel insight.

      That said, we would like to highlight that while 1D genomic proximity facilitates the formation of clusters of the same color, the unmixed-to-mixed transition in cluster composition is not easily predictable solely from the TU color pattern. Furthermore, in simulations of real chromosomes, where TU patterns are dictated by epigenetic marks, the complexity of these patterns makes it challenging—if not impossible—to predict cluster composition based solely on the input data of our model.

      “…how closely transcriptional activities of different TUs correlate…” Please briefly state over what variable the correlation is carried out, is it cross correlation of transcription activity time courses over time? Would be nice to state here directly in the main text to make it easier for the reader.

      We have now included a brief description in the revised manuscript explaining how the transcriptional correlations were evaluated and how the correlation matrix was constructed.

      “The second concerns how expression quantitative trait loci (eQTLs) work. Current models see them doing so post-transcriptionally in highly-convoluted ways [11, 55], but we have argued that any TU can act as an eQTL directly at the transcriptional level [11].” This text does not actually explain what eQTLs do. I think it should, in concise words.

      We agree with the Referee’s suggestion. We have revised the sentence accordingly and now provide a clear explanation of eQTLs upon their first mention. The revised paragraph now reads as follows:

      “The second concerns how expression quantitative trait loci (eQTLs), genomic regions that are statistically associated with variation in gene expression levels, function. While current models often attribute their effects to post-transcriptional regulation through complex mechanisms [6,7], we have previously argued that any transcriptional unit (TU) can act as an eQTL by directly influencing gene expression at the transcriptional level [7]. Here, we observe individual TUs up-regulating or down-regulating the activity of other TUs, hallmark behaviors of eQTLs that can give rise to genetic effects such as “transgressive segregation” [8]. This phenomenon refers to cases in which alleles exhibit significantly higher or lower expression of a target gene, and can be, for instance, caused by the creation of a non-parental allele with a specific combination of QTLs with opposing effects on the target gene.”

      “In the string with 4 mutations, a yellow cluster is never seen; instead, different red clusters appear and disappear (Fig. 2Eii)…” How should it be seen? You mutated away most of the yellow beads. I think the kymograph is more informative about the general model dynamics, not the effects of mutations. Might be more appropriate to place a kymograph in Figure 1.

      We agree with the Referee that the kymograph is the most appropriate graphical representation for capturing the effects of mutations. Panel 2E already refers to the standard case shown in Figure 1. We have now clarified this both in the caption and in the main text. In addition, we have rephrased the sentence—which was indeed misleading—as follows:

      “From the activity profiles in Fig. 2C, we can observe that as the number of mutations increases, the yellow cluster is replaced by a red cluster, with the remaining yellow TUs in the region being expelled (Fig. 2B(ii)). This behavior is reflected in the dynamics, as seen by comparing panels E(i) and E(ii): in the string with four mutations, transcription of the yellow TUs is inhibited in the affected region, while prominent red stripes—corresponding to active, transcribing clusters—emerge (Fig. 2E(ii)).” We hope that the comparison is now immediately clear to the reader.

      “…but this block fragments in the string with 4 mutations…” I don’t know or cannot see what is meant by ”fragmentation” in the correlation matrix.

      With the sentence “this block fragments in the string with 4 mutations” we mean that the majority of the solid red pixels within the black box become light-red or white once the mutations are applied. We have now added a clarification of this point in the revised manuscript.

      “Fig. 3D shows the difference in correlation between the case with reduced yellow TFs and the case displayed in Fig. 1E.” Can you just place two halves of the different matrices to be compared into the same panel? Similar to Fig. S5. Will be much easier to compare.

      We thank the Referee for this suggestion. We tried to implement this modification, and report the modified figure below (Author response image 1). As we can see, in the new figure it is difficult to spot the details we refer to in the main text, therefore we prefer to keep the original version of the figure.

      Author response image 1.

      Heatmap comparing activity correlations of TUs in the random string under normal conditions (top half) and with reduced yellow-TF concentration (bottom half).

      What is the omnigenic model? It is not introduced.

      We thank the Reviewer for highlighting this important point. The omnigenic model, first introduced by Boyle et al. in Ref. [6], was proposed to explain how complex traits, including disease risk, are influenced by a vast number of genes. According to this model, the genetic basis of a trait is not limited to a small set of core genes whose expression is directly related to the trait, but also includes peripheral genes. The latter, although not directly involved in controlling the trait, can influence the expression of core genes through gene regulatory networks, thereby contributing to the overall genetic influence on the trait. We have now added a few lines in the revised manuscript to explain this point.

      “Additionally, blue off-diagonal blocks indicate repeating negative correlations that reflect the period of the 6-pattern.” How does that look in a kymograph? Does this mean the 6 clusters of same color steal the TFs from the other clusters when they form?

      The intuition of the Referee is indeed correct. The finite number of TFs leads to competition among TUs of the same colour, resulting in anticorrelation: when a group of six nearby TUs of a given colour is active, other, more distant TUs of the same colour are not transcribing due to the lack of available TFs. As the Referee suggested, this phenomenon is visible in the kymograph showing TU activity. In Author response image 2, it can be observed that typically there is a single TU cluster for each of the three colours (yellow, green, and red). These clusters can be long-lived (e.g., the yellow cluster at the center of the kymograph) or may dissolve during the simulation (e.g., the red cluster at the top of the kymograph, which dissolves at t ∼ 600 × 10<sup>5</sup> τ<sub>B</sub>). In the latter case, TFs of the corresponding colour are released into the system and can bind at a different location, forming a new cluster (as seen with the red cluster forming at the bottom of the kymograph for t > 600 × 10<sup>5</sup> τ<sub>B</sub>). This point is further discussed at point 2.30 of this Reply, where additional graphical material is provided.

      Author response image 2.

      Kymograph showing the TU activity during a typical run in the 6-pattern case. Each row reports the transcriptional state of a TU during one simulation. Black pixels correspond to inactive TUs, red (yellow, green) pixels correspond to active red (yellow, green) TUs.

      “Conversely, negative correlations connect distant TUs, as found in the single-color model…” But at the most distal range, the negative correlation is lost again! Why leave this out? Your correlation curves show the same equilibration towards no correlation at very long ranges.

      As highlighted in Figure 5Ai, long-range negative correlations (grey segments) predominantly connect distant TUs of the same colour. This is quantified in Figure 5Bi: restricting to same-colour TUs shows that at large genomic separations the correlation is almost entirely negative, with small fluctuations at distances just below 3000 kbp where sampling is sparse; we therefore avoid further interpretation of this regime.

      “These results illustrate how the sequence of TUs on a string can strikingly affect formation of mixed clusters; they also provide an explanation of why activities of human TUs within genomic regions of hundreds of kbp are positively correlated [60].” This is a very nice insight.

      We thank the Reviewer for the very supportive comment.

      “To quantify the extent to which TFs of different colours share clusters, we introduce a demixing coefficient, θ<sub>dem</sub> (defined in Fig. 1).” This is not defined in Fig. 1 or anywhere else here in the main text.

      We thank the Referee for pointing this out. For a given cluster, the demixing coefficient is defined as

      θ<sub>dem</sub> = (n x<sub>i,max</sub> − 1) / (n − 1),

      where n is the number of colors, i indexes each color present in the model, and x<sub>i,max</sub> is the largest fraction of TFs of the same i-th color in a single TF cluster.

      The demixing coefficient is defined in the Methods section; therefore, we have replaced “defined in Fig. 1” with “see Methods for definition”.
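
      For readers who want to experiment with this quantity, the sketch below computes a demixing coefficient for a single cluster. It assumes the linear form θ<sub>dem</sub> = (n x<sub>max</sub> − 1)/(n − 1), which is the simplest expression in n and x<sub>i,max</sub> that reproduces the two limits stated in this Reply (θ<sub>dem</sub> = 1 for a single-colour cluster, θ<sub>dem</sub> = 0 for equal numbers of all colours). The function name and the example clusters are hypothetical.

```python
from collections import Counter


def demixing_coefficient(tf_colours, n_colours):
    """Demixing coefficient of one cluster under the assumed linear form
    theta_dem = (n * x_max - 1) / (n - 1).

    tf_colours : list with the colour label of every TF in the cluster
    n_colours  : total number of TF colours present in the model (n >= 2)
    """
    if n_colours < 2:
        raise ValueError("the coefficient is only meaningful for n_colours >= 2")
    if not tf_colours:
        raise ValueError("the cluster must contain at least one TF")
    counts = Counter(tf_colours)
    # x_max: largest fraction of TFs sharing the same colour in this cluster.
    x_max = max(counts.values()) / len(tf_colours)
    return (n_colours * x_max - 1) / (n_colours - 1)


# Hypothetical clusters in a three-colour model.
single_colour = ["red"] * 6
equal_mixture = ["red", "green", "yellow"] * 2
partial_mixture = ["red"] * 4 + ["green"] * 2

for cluster in (single_colour, equal_mixture, partial_mixture):
    print(demixing_coefficient(cluster, n_colours=3))
```

      Averaging this value over all clusters of a given size would give the kind of size-dependent curve discussed in the text.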

      “Mixing is facilitated by the presence of weakly-binding beads, as replacing them with non-interacting ones increases demixing and reduces long-range negative correlations (Figure S3). Therefore, the sequence of strong and weak binding sites along strings determines the degree of mixing, and the types of small-world network that emerge. If eQTLs also act transcriptionally in the way we suggest [11], we predict that down-regulating eQTLs will lie further away from their targets than up-regulating ones.” Going into these side topics and minor points here is super distracting and waters down the message. Maybe first deal with the main conclusions on mixed vs demixed clusters in dependence on the strong and specific binding site patterns, before dealing with other additional points like the role of weak binding sites.

      Thank you for the suggestion. We have now changed the paragraph to highlight the main results. The new paragraph reads as follows: “These results on activity correlation and TF cluster composition suggest that, if eQTLs act transcriptionally as expected [7], down-regulating eQTLs are likely to be located further from their target genes than up-regulating ones. In addition, it is important to note that mixing is promoted by the presence of weakly binding beads; replacing these with non-interacting ones leads to increased demixing and a reduction in long-range negative correlations (Figure S3). More generally, our findings indicate that the presence of multiple TF colors offers an effective mechanism to enrich and fine-tune transcriptional regulation.”

      “…provides a powerful pathway to enrich and modulate transcriptional regulation.” Before going into the possible meaning and implications of the results, please discuss the results themselves first.

      See previous point.

      Figure 5B. Does activation typically coincide with spatial compaction of the binding sites into a small space or within the confines of a condensate? My guess would be that colocalization of the other color in a small space is what leads to the mixing effect?

      As the Reviewer correctly noted, the activity of a given TU is indeed influenced by the presence of nearby TUs of the same color, since their proximity facilitates the recruitment of additional TFs and enhances the overall transcriptional activity. In this context, the mixing effect is certainly affected by the 1D arrangement of TUs along the chromatin fiber. As emphasized in the revised manuscript, when domains of same-color TUs are present (as in the 6-pattern string), the degree of demixing is greater compared to the case where TUs of different colors alternate and large domains are absent (as in the 1-pattern string). This difference in the demixing parameter as a function of the 1D TU arrangement is clearly visible in Fig. S2B.

      “…euchromatic regions blue, and heterochromatic ones grey.” Please also explain what these color monomers mean in terms of non specific interactions with the TFs.

      Generally, in our simulation approach we assume euchromatin regions to be more open and accessible to transcription factors, whereas heterochromatin corresponds to more compacted chromatin segments [9]. To reflect this, we introduce weak, non-specific interactions between euchromatin and TFs, while heterochromatin interacts with TFs only through steric effects. To clarify this point, we have now slightly revised the caption of Fig. 6.

      “More quantitatively, Spearman’s rank correlation coefficient is 3.66 × 10<sup>−1</sup>, which compares with 3.24 × 10<sup>−1</sup> obtained previously using a single-colour model [11].” This comparison does not tell me whether the improvement in model performance justifies an additional model component. There are other, likelihood based approaches to assess whether a model fits better in a relevant extent by adding a free model parameter. Can these be used for a more conclusive comparison? Besides, a correlation of 0.36 does not seem so good?

      We understand the Reviewer’s concern that the observed increase in the activity correlation may not appear to provide strong evidence for the improvement of the newly introduced model. However, within the context of polymer models developed to study realistic gene transcription and chromatin organization, this type of correlation analysis is a widely accepted approach for model validation. Experimental data commonly used for such validation include Hi-C maps, FISH experiments, and GRO-seq data [10,11]. The first two are typically employed to assess how accurately the model reproduces the 3D folding of chromatin; a comparison between experimental and simulated Hi-C maps is provided in the Supplementary Information (Fig. S5), showing a Pearson correlation of 0.7. GRO-seq or RNA-seq data, on the other hand, are used to evaluate the model’s ability to predict gene transcription levels. To date, the highest correlation for transcriptional activity data has been achieved by the HiP-HoP model at a resolution of 1 kbp [10], reporting a Spearman correlation of 0.6. Therefore, the correlation obtained with our 2-color model represents a good level of agreement when compared with the more complex HiP-HoP model. In this context, the observed increase in correlation—from 0.324 to 0.366—can be regarded as a modest yet meaningful improvement.
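
      As an illustration of the type of comparison discussed here, the sketch below computes Spearman's rank correlation between simulated activities and experimental signal using only the Python standard library. The data values are hypothetical and serve only to show the calculation; they are not taken from the simulations or from GRO-seq measurements.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-TU values: simulated activity and an experimental read-out.
simulated = [0.10, 0.40, 0.35, 0.80, 0.05]
measured = [12.0, 30.0, 45.0, 200.0, 3.0]
print(spearman(simulated, measured))
```

      Because only ranks enter the calculation, the result is insensitive to the very different scales of simulated activity and sequencing signal.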

      “…consequently, use of an additional color provides a statistically significant improvement (p-value < 10<sup>−6</sup>, 2-sided t-test).” I do not follow this argument. Given enough simulation repeats, any improvement, no matter how small, will lead to statistically significant improvements.

      We agree that this sentence could be misleading. We have now rephrased it more clearly, specifying that each of the two correlation values is statistically significant on its own; previously, we wrongly referred to the significance of the improvement.
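
      The Reviewer's point about sample size can be made concrete with a short sketch. It assumes the standard t-test for a single correlation coefficient, t = r √((n − 2)/(1 − r²)); the text specifies only a "2-sided t-test", so this choice of test and the sample sizes below are assumptions made for illustration.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing a correlation r against zero with n data points.

    Assumes the standard form t = r * sqrt((n - 2) / (1 - r**2)),
    which follows a t distribution with n - 2 degrees of freedom.
    """
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("requires n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# The two correlation values quoted in the text, at hypothetical sample sizes.
for n in (10, 100, 1000):
    t_single = correlation_t_statistic(0.324, n)
    t_double = correlation_t_statistic(0.366, n)
    print(n, round(t_single, 2), round(t_double, 2))
```

      For a fixed r, the statistic grows roughly as √n, so the significance of each correlation on its own says little about whether the difference between the two values is meaningful.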

      “Additionally, simulated contact maps show a fair agreement with Hi-C data (Figure S5), with a Pearson correlation r ∼ 0.7 (p-value < 10<sup>−6</sup>, 2-sided t-test).” Nice!

      We thank the Reviewer for the positive comment.

      “Because we do not include heterochromatin-binding proteins, we should not however expect a very accurate reproduction of Hi-C maps: we stress that here instead we are interested in active chromatin, transcription and structure only as far as it is linked to transcription.” Then why do you not limit your correlation assessment to only these regions to show that these are very well captured by your model?

      We thank the Reviewer for this insightful comment. Indeed, we could have restricted our investigation to active chromatin regions, as done in our previous works [11,12]. However, our intention in this section of the manuscript was to clarify that the current model is relatively simple and therefore not expected to achieve a very high level of agreement between experimental and simulated Hi-C maps. Another important limitation of the two-color model described in this section is the absence of active loop extrusion mediated by SMC proteins, which is known to play a central role in establishing TAD boundaries. Consequently, even if our analysis were limited to active chromatin regions, the agreement with experimental Hi-C maps would still remain lower than that obtained with more comprehensive models, such as HiP-HoP, which we use in the last section of the paper. We have now added a comment in the revised manuscript explicitly noting the lack of active loop extrusion in our 2-color model.

      “We also measure the average value of the demixing coefficient, θ<sub>dem</sub> (Materials and Methods). If θ<sub>dem</sub> = 1, this means that a cluster contains only TFs of one colour and so is fully demixed; if θ<sub>dem</sub> = 0, the cluster contains a mixture of TFs of all colors in equal number, and so is maximally mixed.” Repetitive.

      We have now rephrased the sentence in a more concise way.

      “…notably, this is similar to the average number of productively transcribing pols seen experimentally in a transcription factory [6].” That seems a bit fast and loose. The number of Polymerases can differ depending on state, type of factory, gene etc. and vary between anything from to a few hundreds of Polymerase complexes depending on definition of factory, and what is counted as active. Also, one would think that polymerases only make up a small part of the overall protein pool that constitutes a condensate, so it is unclear whether this is a pertinent estimate.

      Here we refer to the average size of what is normally referred to as a PolII factory, not a generic nuclear condensate. These are the clusters that arise in our simulations. These structures emerge through microphase separation and have been well characterised; see, for instance, [13] for a recent review. For these structures, while there is a distribution, the average is well defined and corresponds to a size of about 100 nm, which is very much in line with the size of the clusters we observe, both in terms of 3D diameter and number of participating proteins. Because of this size, the number of active complexes that can contribute cannot be significantly more than ∼ 10. These estimates are, we note, very much in line with super-resolution measurements of SAF-A clusters [14], which are associated with active transcription; hence it is reasonable to assume that they colocalise with RNA and polymerase clusters.

      “Conversely, activities of similar TUs lying far from each other on the genetic map are often weakly negatively correlated, as the formation of one cluster sequesters some TFs to reduce the number available to bind elsewhere.” This point is interesting, and I strongly suspect that this indeed happening. But I don’t think it was shown in the analysis of the simulation results in sufficient clarity. We need direct assessment of this sequestration, currently it’s only indirectly inferred.

      Indeed, this is the mechanism underlying the emergence of negative long-range correlations among TU activity values. As the Reviewer correctly pointed out, the competition for a finite number of TFs was only indirectly inferred in the original manuscript. To address this, we have now included a new figure explicitly illustrating this effect. In Fig. S12, we show the kymograph of active TUs (left panel), as in Fig. 2E(i) of the main text, alongside a new kymograph depicting the number of green TFs within a sphere of radius 10σ centered on each green TU (right panel). For simplicity, we focus here only on green TUs and TFs. It can be observed that, during the initial part of the simulation, green TFs are localized near genomic position ∼ 2000 (right panel), where green TUs are transcriptionally active (left panel). Toward the end of the simulation, TUs near genomic position ∼ 500 become active, coinciding with the relocation of TFs to this region and the depletion of the previous one.
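
      The quantity plotted in the new kymograph, the number of TFs within a sphere of radius 10σ around each TU, can be sketched as follows. The function name `tf_count_kymograph` and the toy trajectories are hypothetical and only illustrate the counting step; positions are assumed to be given in units of σ.

```python
import math


def tf_count_kymograph(tu_trajs, tf_trajs, radius=10.0):
    """Return counts[i][t]: number of TFs within `radius` of TU i at frame t.

    tu_trajs : list of TU trajectories, each a list of (x, y, z) per frame
    tf_trajs : list of TF trajectories with the same number of frames
    """
    n_frames = len(tu_trajs[0])
    return [
        [
            sum(math.dist(tu[t], tf[t]) <= radius for tf in tf_trajs)
            for t in range(n_frames)
        ]
        for tu in tu_trajs
    ]


# Hypothetical example: two fixed TUs and two TFs that relocate from the
# first TU to the second over three frames.
tu_a = [(0.0, 0.0, 0.0)] * 3
tu_b = [(100.0, 0.0, 0.0)] * 3
tf_1 = [(2.0, 0.0, 0.0), (50.0, 0.0, 0.0), (98.0, 0.0, 0.0)]
tf_2 = [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0), (95.0, 0.0, 0.0)]

counts = tf_count_kymograph([tu_a, tu_b], [tf_1, tf_2])
for row in counts:
    print(row)
```

      Each row of `counts` corresponds to one row of the kymograph, so depletion at one TU and enrichment at another appear as complementary trends across rows.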

      In the definition for the demixing coefficient (equation 1), what does the index i stand for?

      Here i is an index denoting each of the colors present in the model. We have now specified the meaning of i after Eq. 1.

      Reviewer 3 (Public Review):

      In this work, the authors present a chromatin polymer model with some specific pattern of transcription units (TUs) and diffusing TFs; they simulate the model and study TF clustering, mixing, gene expression activity, and their correlations. First, the authors designed a toy polymer with colored beads of a random type, placed periodically (every 30 beads, or 90kb). These colored beads are considered a transcription unit (TU). Same-colored TUs attract each other, mediated by similarly colored diffusing beads considered as TFs. This led to clustering (condensation of beads) and correlated (or anti-correlated) ”gene expression” patterns. Beyond the toy model, when the authors introduce TUs in a specific pattern, it leads to the emergence of specialized and mixed clusters of different TFs. Human chromatin models with realistic distribution of TUs also lead to the mixing of TFs when cluster size is large.

      Strengths.

      This is a valuable polymer model for chromatin with a specific pattern of TUs and diffusing TF-like beads. Simulation of the model tests many interesting ideas. The simulation study is convincing and the results provide solid evidence showing the emergence of mixed and demixed TF clusters within the assumptions of the model.

      Weaknesses.

      Weakness of the work: The model has many assumptions. Some of the assumptions are a bit too simplistic. Concerns about the work are detailed below:

      We thank the Referee for this overall positive evaluation.

      The authors assume that when the diffusing beads (TFs) are near a TU, the gene expression starts. However, mammalian gene expression requires activation by enhancer-promoter looping and other related events. It is not a simple diffusion-limited event. Since many of the conclusions are derived from expression activity, will the results be affected by the lack of looping details?

      We do not need to assume promoter-enhancer contact: it emerges naturally through bridging-induced phase separation, which is a key strength of our model. Even though looping is not imposed as a requirement for transcriptional initiation, in practice the vast majority of events in which a TF is near a TU are associated with the presence of a cluster where regulatory elements are looped. Transcription in our model is therefore associated with bridging-induced phase separation, and looping is not absent; rather, it is an emergent property of the model, not an assumption. Accordingly, both contact maps and transcriptional activity are well predicted by our model, both in the version described here and in the more sophisticated single-colour HiP-HoP model [10] (an important ingredient of which is the bridging-induced phase separation).

      Authors neglect protein-protein interactions. Without protein-protein interactions, condensate formation in natural systems is unlikely to happen.

      We thank the Reviewer for pointing out the absence of protein-protein interactions in our simulations. While we acknowledge this limitation, we would like to emphasize that experimental studies have not observed nuclear proteins forming condensates at physiological concentrations in the absence of DNA or chromatin. For example, studies such as Ryu et al. [15] and Shakya et al. [16] show that protein-protein interactions alone are insufficient to drive condensate formation in vivo. Instead, the presence of a substrate, such as DNA or chromatin, is essential to favor and stabilize the formation of protein clusters.

      In our simulations, we propose that protein liquid-liquid phase separation (LLPS) is driven by the presence of both strong and weak attractions between multivalent protein complexes and the chromatin filament. As stated in our manuscript, the mechanism leading to protein cluster formation is the bridging-induced attraction. This mechanism involves a positive feedback loop, where protein binding to chromatin induces a local increase in chromatin density, which then attracts more proteins, further promoting cluster formation.

      While we acknowledge that protein-protein interactions could be incorporated into our simulations, we believe such an interaction would need to be weak to remain consistent with experimental data. Additionally, incorporating such interactions would not alter the conclusions of our study.

      What is described in this paper is a generic phenomenon; many kinds of multivalent chromatin-binding proteins can form condensates/clusters as described here. For example, if we replace different color TUs with different histone modifications and different TFs with Hp1, PRC1/2, etc, the results would remain the same, wouldn’t they? What is specific about transcription factor or transcription here in this model? What is the logic of considering 3kb chromatin as having a size of 30 nm? See Kadam et al. (Nature Communications 2023). Also, DNA paint experimental measurement of 5kb chromatin is greater than 100 nm (see work by Boettiger et al.).

      We thank the Reviewer for this important observation, which we now address. To begin, we consider the toy model introduced in the first part of the manuscript, where TUs are randomly positioned rather than derived from epigenetic data. As the Reviewer points out, in this simplified context, our results reflect a generic phenomenon: the composition of clusters depends primarily on their size, independent of the specific types of proteins involved. However, the main goal of our work is to gain insights into apparently contradictory experimental findings, which show that some transcription factories consist of a single type of transcription factor, while others contain multiple types. This led us to focus on TF clusters and their role in transcriptional regulation and co-regulation of distant genes. Therefore, in the second part of the manuscript, we use DNase I hypersensitive site (DHS) data to position TUs based on predicted TF binding sites, providing a more biologically grounded framework. In both the toy model and the more realistic HiP-HoP model, we observe a size-dependent transition in cluster composition. However, we refrain from generalizing these results to clusters composed of other protein complexes, such as HP1 and PRC, as their binding is governed by distinct epigenetic marks (e.g. H3K9me3 and H3K27me3), which exhibit different genomic distributions compared to DHS marks.

      Finally, the mapping of 3 kbp to 30 nm is an estimate which does not significantly impact our conclusions. The relationship between genomic distance (in kbp) and spatial distance (in nm) is highly dependent on the degree of chromatin compaction, which can vary across cell types and genomic context. As such, providing an exact conversion is challenging [17]. For example, in a previous work based on the HiP-HoP model [12] we compared simulated and experimental FISH measurements and found that 1 kbp typically corresponds to 15–20 nm, implying that 3 kbp could span 45–60 nm. Nevertheless, we emphasize that varying this conversion factor does not affect the core results or conclusions of our study. We have now included a clarification in the revised SI to highlight this point.

      Recommendations for the authors:

      Other points.

      Figure 1(D) caption says 2.25σ = 1.6 nanometer. Is this a typo? Sigma is 30nm.

      Yes, it was. As 1σ ∼ 30 nm, we have 2.25σ = 2.25 · 30 nm = 67.5 nm = 6.75 × 10<sup>−8</sup> m. We have now corrected the caption.

      Page 6, column 2nd, 3rd para, it is written that θ<sub>dem</sub> (”defined in Fig.1”). There is no θ<sub>dem</sub> defined in Fig.1, is there? I can see it defined in Methods but not in Fig. 1.

      Correct, we replaced (defined in Fig.1) with (see Methods for definition).

      Page 6, column 2, 4th para: what does “correlations overlap and correlations diverge mean”?

      With reference to the plots in Fig. 5B, “correlations overlap” and “correlations diverge” simply refer to whether the same-colour (red curves) and different-colour (blue curves) correlation trends do or do not overlap with each other. We have now clarified this point.

      What is the precise definition of correlation in Fig 5B (Y-axis)?

      In Fig. 5B, correlation means Pearson correlation. We have now specified this point in the revised text and in the caption of Fig. 5.
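
      A curve of the kind shown in Fig. 5B can be sketched as the mean Pearson correlation between pairs of TUs, binned by genomic separation and split into same-colour and different-colour pairs. The sketch below assumes each TU has a list of activity samples (how these are sampled, e.g. over time or over independent runs, is not specified here); the function names, bin width, and data are hypothetical.

```python
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def correlation_vs_separation(activities, positions, colours, bin_width):
    """Mean pairwise Pearson correlation per separation bin.

    activities[i] : activity samples of TU i
    positions[i]  : genomic position of TU i (integer, e.g. in kbp)
    colours[i]    : colour label of TU i
    Returns {"same": {bin: mean_r}, "different": {bin: mean_r}}.
    """
    sums = {"same": defaultdict(float), "different": defaultdict(float)}
    counts = {"same": defaultdict(int), "different": defaultdict(int)}
    n = len(activities)
    for i in range(n):
        for j in range(i + 1, n):
            key = "same" if colours[i] == colours[j] else "different"
            sep_bin = abs(positions[i] - positions[j]) // bin_width
            sums[key][sep_bin] += pearson(activities[i], activities[j])
            counts[key][sep_bin] += 1
    return {
        key: {b: sums[key][b] / counts[key][b] for b in sums[key]}
        for key in sums
    }


# Hypothetical data: three TUs with four activity samples each.
activities = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
positions = [0, 90, 900]          # kbp
colours = ["red", "red", "green"]
curves = correlation_vs_separation(activities, positions, colours, bin_width=300)
print(curves)
```

      Plotting the "same" and "different" entries against the bin index would show directly where the two trends overlap and where they diverge.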

      References

      (1) S. A. Quinodoz, J. W. Jachowicz, P. Bhat, N. Ollikainen, A. K. Banerjee, I. N. Goronzy, M. R. Blanco, P. Chovanec, A. Chow, Y. Markaki et al., “RNA promotes the formation of spatial compartments in the nucleus,” Cell, vol. 184, no. 23, pp. 5775–5790, 2021.

      (2) R. A. Beagrie, A. Scialdone, M. Schueler, D. C. Kraemer, M. Chotalia, S. Q. Xie, M. Barbieri, I. de Santiago, L.-M. Lavitas, M. R. Branco et al., “Complex multi-enhancer contacts captured by genome architecture mapping,” Nature, vol. 543, no. 7646, pp. 519–524, 2017.

      (3) R. A. Beagrie, C. J. Thieme, C. Annunziatella, C. Baugher, Y. Zhang, M. Schueler, A. Kukalev, R. Kempfer, A. M. Chiariello, S. Bianco et al., “Multiplex-GAM: genome-wide identification of chromatin contacts yields insights overlooked by Hi-C,” Nature Methods, vol. 20, no. 7, pp. 1037–1047, 2023.

      (4) L. Liu, B. Zhang, and C. Hyeon, “Extracting multi-way chromatin contacts from Hi-C data,” PLOS Computational Biology, vol. 17, no. 12, p. e1009669, 2021.

      (5) R.-S. Nozawa, L. Boteva, D. C. Soares, C. Naughton, A. R. Dun, A. Buckle, B. Ramsahoye, P. C. Bruton, R. S. Saleeb, M. Arnedo et al., “SAF-A regulates interphase chromosome structure through oligomerization with chromatin-associated RNAs,” Cell, vol. 169, no. 7, pp. 1214–1227, 2017.

      (6) E. A. Boyle, Y. I. Li, and J. K. Pritchard, “An expanded view of complex traits: from polygenic to omnigenic,” Cell, vol. 169, no. 7, pp. 1177–1186, 2017.

      (7) C. Brackley, N. Gilbert, D. Michieletto, A. Papantonis, M. Pereira, P. Cook, and D. Marenduzzo, “Complex small-world regulatory networks emerge from the 3D organisation of the human genome,” Nat. Commun., vol. 12, no. 1, pp. 1–14, 2021.

      (8) R. B. Brem and L. Kruglyak, “The landscape of genetic complexity across 5,700 gene expression traits in yeast,” Proceedings of the National Academy of Sciences, vol. 102, no. 5, pp. 1572–1577, 2005.

      (9) M. Chiang, C. A. Brackley, D. Marenduzzo, and N. Gilbert, “Predicting genome organisation and function with mechanistic modelling,” Trends in Genetics, vol. 38, no. 4, pp. 364–378, 2022.

      (10) M. Chiang, C. A. Brackley, C. Naughton, R.-S. Nozawa, C. Battaglia, D. Marenduzzo, and N. Gilbert, “Genome-wide chromosome architecture prediction reveals biophysical principles underlying gene structure,” Cell Genomics, vol. 4, no. 12, 2024.

      (11) A. Buckle, C. A. Brackley, S. Boyle, D. Marenduzzo, and N. Gilbert, “Polymer simulations of heteromorphic chromatin predict the 3D folding of complex genomic loci,” Mol. Cell, vol. 72, no. 4, pp. 786–797, 2018.

      (12) G. Forte, A. Buckle, S. Boyle, D. Marenduzzo, N. Gilbert, and C. A. Brackley, “Transcription modulates chromatin dynamics and locus configuration sampling,” Nature Structural & Molecular Biology, vol. 30, no. 9, pp. 1275–1285, 2023.

      (13) P. R. Cook and D. Marenduzzo, “Transcription-driven genome organization: a model for chromosome structure and the regulation of gene expression tested through simulations,” Nucleic acids research, vol. 46, no. 19, pp. 9895–9906, 2018.

      (14) M. Marenda, D. Michieletto, R. Czapiewski, J. Stocks, S. M. Winterbourne, J. Miles, O. C. Flemming, E. Lazarova, M. Chiang, S. Aitken et al., “Nuclear RNA forms an interconnected network of transcription-dependent and tunable microgels,” bioRxiv, pp. 2024–06, 2024.

      (15) J.-K. Ryu, C. Bouchoux, H. W. Liu, E. Kim, M. Minamino, R. de Groot, A. J. Katan, A. Bonato, D. Marenduzzo, D. Michieletto et al., “Bridging-induced phase separation induced by cohesin SMC protein complexes,” Science Advances, vol. 7, no. 7, p. eabe5905, 2021.

      (16) A. Shakya, S. Park, N. Rana, and J. T. King, “Liquid-liquid phase separation of histone proteins in cells: role in chromatin organization,” Biophysical journal, vol. 118, no. 3, pp. 753–764, 2020.

      (17) A.-M. Florescu, P. Therizols, and A. Rosa, “Large scale chromosome folding is stable against local changes in chromatin structure,” PLoS computational biology, vol. 12, no. 6, p. e1004987, 2016.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In this manuscript, Aghabi et al. present a comprehensive characterization of ZFT, a metal transporter located at the plasma membrane of the eukaryotic parasite Toxoplasma gondii. The authors provide convincing evidence that ZFT plays a crucial role in parasite fitness, as demonstrated by the generation of a conditional knockdown mutant cell line, which exhibits a marked impact on mitochondrial respiration, a process dependent on several iron-containing proteins. Consistent with previous reports, the authors also show that disruption of mitochondrial metabolism leads to conversion into the persistent bradyzoite stage. The study then employed advanced techniques, such as inductively coupled plasma-mass spectrometry (ICP-MS) and X-ray fluorescence microscopy (XFM), to demonstrate that ZFT depletion results in reduced parasite-associated metals, particularly iron and zinc. Additionally, the authors show that ZFT expression is modulated by the availability of these metals, although defects in the transporter could not be compensated for by exogenous addition of iron or zinc. 

      While the manuscript does not directly investigate the transport function of ZFT through biochemical assays, the authors indirectly support the notion that ZFT can transport zinc by demonstrating its ability to compensate for a lack of zinc transport in a yeast heterologous system. Furthermore, phenotypic analyses suggest defects in iron availability, particularly with regard to Fe-S mitochondrial proteins and mitochondrial function. Overall, the manuscript provides a solid, well-rounded argument for ZFT's role in metal transport, using a combination of complementary approaches. Although direct biochemical evidence for the transporter's substrate specificity and transport activity is lacking, the converging evidence, including changes in metal concentrations upon ZFT depletion, yeast complementation data, and phenotypic changes linked to iron deficiency, presents a convincing case. Some aspects of the results may appear somewhat unbalanced, particularly since iron transport could not be confirmed through heterologous complementation, while zinc-related phenotypes in the parasites have not been thoroughly explored (which is challenging given the limited number of zinc-dependent proteins characterized in Toxoplasma). Nevertheless, given that metal acquisition remains largely uncharacterized in Toxoplasma, this manuscript provides an important first step in identifying a metal transporter in these parasites, and the data presented are generally convincing and insightful. 

      We thank the reviewer for their assessment and would like to highlight that we have now added direct biochemical characterisation in the new Figure 8, supporting our hypothesis and confirming iron transport by this protein.

      Reviewer #2 (Public review): 

      Summary: 

      The intracellular pathogen Toxoplasma gondii scavenges metal ions such as iron and zinc to support its replication; however, mechanistic studies of iron and zinc uptake are limited. This study investigates the function of a putative iron and zinc transporter, ZFT. In this paper, the authors provide evidence that ZFT mediates iron and zinc uptake by examining the regulation of ZFT expression by iron and zinc levels, the impact of altered ZFT expression on iron sensitivity, and the effects of ZFT depletion on intracellular iron and zinc levels in the parasite. The effects of ZFT depletion on parasite growth are also investigated, showing the importance of ZFT function for the parasite. 

      Strengths: 

      A key strength of the study is the use of multiple complementary approaches to demonstrate that ZFT is involved in iron and zinc uptake. Additionally, the authors build on their finding that loss of ZFT impairs parasite growth by showing that ZFT depletion induces stage conversion and leads to defects in both the apicoplast and mitochondrion. 

      Weaknesses: 

      (1) Excess zinc was shown not to alter ZFT expression, but a cation chelator (TPEN) did lead to decreased expression. While TPEN is often used to reduce zinc levels, does it have any effect on iron levels? Could the reduction in ZFT after TPEN treatment be due to a reduction in the level of iron or another cation?

      We thank the reviewer for this comment. We agree that TPEN is a fairly nonspecific cation chelator, so to determine whether its effects are due to removal of zinc or of other cations, we treated with TPEN together with either zinc or iron. Co-incubation of TPEN and zinc prevented ZFT depletion, while TPEN+FAC had no effect compared to TPEN alone (new Figure 6h and i), strongly suggesting that the effects on ZFT abundance are linked to zinc and not just iron.

      (2) ZFT expression was found to be dynamic depending on the size of the vacuole, based on mean fluorescence intensity measurements. Looking at protein levels by Western blot at different times during infection would strengthen this finding. 

      We show here that ZFT expression is highly dynamic, depending on both the iron status of the host cell and the number of parasites per vacuole. However, validating this finding by western blot would be complex due to the highly unsynchronised nature of parasite replication and the large number of parasites (5 × 10<sup>6</sup> to 1 × 10<sup>7</sup> cells) required to visualise ZFT. Further, we show that ZFT is apparently internalised prior to degradation. For these reasons, we have not attempted to validate this finding by western blotting at this time.

      (3) ZFT localization remained at the parasite periphery under low iron conditions. However, in the images shown in Figure S1c, larger vacuoles (containing 4-8 parasites) are shown for the untreated conditions, and single parasite-containing vacuoles are shown for the low iron condition. As ZFT localization is predominantly at the basal end of the parasite in larger PV and at the parasite periphery for smaller vacuoles, it would be better to compare vacuoles of similar size between the untreated and low-iron conditions.

      The reviewer brings up a good point. The concentration of iron chelator that we used here does not enable parasite replication, making an assessment of changes in localisation challenging. To address this, we have added new data using a much lower concentration of chelator (20 mM), which is still expected to impact the parasites (Hanna et al., 2025) but allows for replication. In this low-iron environment, ZFT localisation remained significantly more peripheral (Fig. S1d,e), supporting our hypothesis that ZFT localisation is iron dependent, independent of vacuolar stage.

      Reviewer #3 (Public review): 

      Summary:

      Aghabi et al set out to characterize a T. gondii transmembrane protein with a ZIP domain, termed ZFT. The authors investigate the consequences of ZFT downregulation and overexpression for parasite fitness. Downregulation of ZFT causes defects in the parasite's endosymbiotic organelles, the apicoplast and the mitochondrion. Specifically, lack of ZFT causes a decrease in mitochondrial respiration, consistent with its role as an iron transporter. This impact on the mitochondria appears to trigger partial differentiation to bradyzoites. The authors furthermore demonstrate that expression of TgZFT can rescue a yeast mutant lacking its zinc transporter and perform an array of direct metal ion measurements, including X-ray fluorescence microscopy and inductively coupled mass spectrometry (ICP-MS). These reveal reduced metal ions in parasites depleted in ZFT. Overall, the data by Aghabi et al. reveal that ZFT is a major metal ion transporter in T. gondii, importing iron and zinc for diverse essential processes. 

      Strengths:

      This study's strength lies in the thorough characterization of the transporter. The authors combine a number of techniques to measure the impact of ZFT depletion, ranging from the direct measurement of metal ions to determining the consequences for the parasite's metabolism (mitochondrial respiration), as well as performing a yeast mutant complementation. This work is very thorough and clearly presented, leaving little doubt about this protein's function. 

      Weaknesses:

      This study offers no major novel insights into the biology of T. gondii. The transporter was already annotated as a zinc transporter (ToxoDB), was deemed essential (PMID: 27594426), and localized to the plasma membrane (PMID: 33053376). This study mostly confirms and validates these previous datasets. The authors identify three other proteins with a ZIT domain. Particularly, the role of TGME49_225530 is intriguing, as it is likely fitness-conferring (score: -2.8, PMID: 27594426) and has no subcellular localization assigned. Characterizing this protein as well, revealing its localization, and identifying if and how these transporters coordinate metal ion transport would have been worthwhile. 

      We agree that the work presented here validates the previous datasets, and if that was all we had done, we agree that the biological insights would be limited. However, we have gone significantly beyond the predictions, demonstrating dynamic localisation changes, iron-mediated regulation, the lack of substrate-based complementation and validating transport activity of both zinc and iron. Although in silico predictions and screens can be informative, it remains important to validate biological functions experimentally. While we agree that characterisation of TGME49_225530 (as well as the other two annotated ZIP proteins) would be interesting, and will certainly form part of our future plans, it is significantly beyond the scope of the presented manuscript.

      Another weakness is the data related to the impact of ZFT downregulation on the apicoplast in Figure 4. The authors show that downregulation of ZFT causes an increase in elongated apicoplasts (Figure 4d). The subsequent panels seem to show that the parasites present a dramatic growth defect at that time point. This growth arrest can directly explain the elongated apicoplast, but does not allow any conclusion about an impact on the organelle. In any case, an assessment of 'delayed death' as presented in Figure 4c seems futile, since the many other processes affected by zinc and iron depletion likely cause a rapid death, masking any potential delayed death.

      We agree that, given the importance of iron and zinc to the parasite, we cannot distinguish parasite death due to apicoplast defects from death due to other causes, and we have modified the discussion to reflect this, as below.

      “However, given the delayed phenotype typically seen upon apicoplast disruption, we cannot determine if this is a direct effect of ZFT, or a downstream consequence of metal depletion”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific Comments: 

      (1) The background on the typical sequence features that would identify Toxoplasma ZIP homologues should be expanded and clarified. While these proteins are likely quite divergent and may lack many conserved features, the manuscript currently does not provide enough detail to assess how similar (or different) TgZIPs are from well-characterized family members. Additionally, the justification for focusing on TGGT1_261720 (ZFT) over TGGT1_225530, as stated in the first paragraph of the results section, seems unclear. There is no predictive data supporting a potential plasma membrane localization for TGGT1_225530 (yet this cannot be excluded), and TGGT1_225530 appears to have more canonical metal-binding motifs. I believe that the fact that only TGGT1_261720 is iron-regulated should be sufficient justification for its selection, and this point could be emphasized more clearly. Furthermore, the discussion mentions a leucine residue that may be associated with broad substrate specificity, but this is not addressed in the initial comparative sequence analysis. These residues and the HK motif are not actually addressed in the Gyimesi et al. reference currently mentioned; thus this could be clarified and updated with references (such as PMID: 31914589) that provide more recent insights into key residues involved in metal selectivity in ZIP transporters.

      We thank the reviewer for this comment. To address these points:

      We agree that the iron-mediated regulation is sufficient for our focus on ZFT and have clarified the text to reflect this, as described above.

      We have also updated the references as suggested; our apologies for this oversight.

      We have further expanded the discussion, especially with reference to our new results using heterologous expression in oocytes (please see above).

      (2) Figure 1D, Figure 2A, C, H, Figure 3D, Figure 6F, H, corresponding text and paragraph 2 of the Discussion: It seems that most of the "non-specific bands" annotated in Figure 1D, which are lower molecular weight products, are not present in the parental cell line, suggesting they may not be non-specific after all. These bands also vary depending on the cell line (e.g., promoter used, see Figures 2H and 3D) or experimental conditions (e.g., iron excess or depletion). Given the dynamic localization of ZFT during intracellular development, it may be worth exploring whether these lower molecular weight bands represent degraded forms of TgZFT, possibly corresponding to the basally-clustered signal observed by immunofluorescence, with only the full-length protein associating with the plasma membrane. This possibility should be investigated or at least discussed further.

      While the lower bands are not present in the parental line, we do see them in other HA-tagged lines, especially when the expression of the tagged protein is low, as shown below (Author response image 1). We don’t currently have an explanation for these, but we can confirm that they do not change in abundance in parallel with the full-length protein, supporting our hypothesis that these bands are an artefact of the anti-HA antibody in our system. Although ZFT is clearly degraded (e.g. Fig. 1g), we currently do not believe these bands are ZFT C-terminal degradation products.

      Author response image 1.

      Western blot of ZFT-3HA<sub>zft</sub> and another HA-tagged unrelated cytosolic protein, demonstrating that the lower bands are most likely nonspecific.

      (3) It is unfortunate that ZFT could not complement a yeast iron transporter mutant cell line, as this would have provided a strong argument for ZFT's role in iron transport. The manuscript does not provide much detail about the Δfet2/3 yeast mutant line. Fet3 is the ferroxidase subunit, while Ftr1 is the permease subunit of the high-affinity iron transport complex in yeast. Fet2, however, appears to be Saccharomyces cerevisiae's VPS41 homolog. Therefore, is Δfet2/3 the most appropriate mutant to use, or would another mutant line (e.g., ΔFtr1) be a better choice? Additionally, while Figure 7 suggests a decrease in metal uptake upon ZFT depletion, it would be useful to test whether overexpression of ZFT leads to enhanced metal incorporation, perhaps using a FerroOrange assay. 

      We thank the reviewer for their comments, which we have answered below:

      “Δfet2/3” was a typo and has been corrected; our apologies. We used the Δfet3/4 mutant line, based on previous successful experiments involving plant metal transporters (e.g., DiDonato et al., 2004).

      Unfortunately, we were unable to perform the FerroOrange assay in the overexpression line as this line is endogenously fluorescent in the same channel as FerroOrange.

      However, as detailed above, we have now added significant new data confirming our hypothesis that ZFT is an iron/zinc transporter through heterologous expression in Xenopus oocytes in the new Figure 8. This provides direct evidence of iron transport, and evidence that zinc can inhibit this transport, consistent with our hypothesis.

      (4) The annotation of the blot in Figure 2H suggests that overexpressed ZFT-TY can only be detected in the absence of heat denaturation. However, this is not addressed in the text. Does heat denaturation also affect the detection of ZFT-3HA or the lower molecular weight products? This should be clarified in the manuscript. 

      Interestingly, ZFT is detectable after boiling at 95 °C for 5 minutes when expressed at endogenous (or near-endogenous) levels in the ZFT-3HA<sub>sag1</sub> and ZFT-3HA<sub>zft</sub> tagged parasite lines. However, overexpression of ZFT leads to a loss of detection by western blot when boiled, although the protein is detectable without heat denaturation.

      A possible explanation is that overexpression may cause ZFT to misfold, making the protein more prone to aggregation following boiling and rendering it insoluble and unable to enter the gel. Moreover, heat-induced aggregation can sometimes mask the epitope tag required for antibody recognition, possibly explaining why ZFT is undetectable when overexpressed and exposed to boiling conditions, as has previously been observed for other transmembrane proteins (e.g., Tsuji, 2020).

      We have clarified this in the results section. Although we do not have a full explanation for this, we consider it important to share for others who may be studying the expression of these proteins.

      (5) Figure 3G: It might be helpful to include an uncropped gel profile to allow readers to visualize that the main product does indeed correspond to a potential dimeric form in the native PAGE. 

      This has now been added in Figure S3e; thank you for this suggestion.

      (6) The investigation of the impact of ZFT depletion on the apicoplast could be improved. The authors suggest that ZFT knockdown inhibits apicoplast replication based on a modest increase in elongated organelles, but the term "delayed death" is not appropriate in that case, as it is typically linked to a loss of the organelle. This is not observed here and is also illustrated by the unchanged CPN60 processing profile. So, clearly, there seems to be no strong morphological effect on the apicoplast early on after ZFT depletion. On the other hand, the authors dismiss any impact on TgPDH-E2 lipoylation (which is iron-dependent) based on the fact that the lipoylated form of the protein is still detected by Western blot. However, closer inspection of the blot in Figure 4B suggests that the intensity of the annotated TgPDH-E2 signal is reduced compared to the -ATc condition (although there might be differences in protein loading, as indicated by the control) or even with the mitochondrial 2-oxoglutarate dehydrogenase-E2, whose lipoylation is presumably iron-independent (see PMID: 16778769). This experiment should be repeated, and the results quantified properly in case something was missed, and the duration of depletion conditions perhaps extended further. Of note, it would also be worthwhile to revisit size estimations, as the displayed profiles seem inconsistent with the typical sizes of lipoylated proteins detected with the anti-lipoyl antibody (e.g., ~100 kDa for PDH-E2, ~60 kDa for branched-chain 2-oxo acid dehydrogenase, and ~40 kDa 2-oxoglutarate dehydrogenase).

      We thank the reviewer for this comment. We agree that there is no strong defect in the apicoplast in the first lytic cycle, and we have modified the language to remove reference to delayed death: given the magnitude of changes associated with loss of iron and zinc, we cannot be certain about the role of the apicoplast.

      Based on this suggestion, we have now quantified the levels of lipoylation of PDH-E2, BDCK-E2 and OGDH-E2 and now include this in Figure S4b, c, d. Supporting our other results, we do not see a significant change in PDH-E2 lipoylation upon ZFT knockdown. However, although OGDH-E2 lipoylation is unchanged (Figure S4c), interestingly we do see a significant increase in BDCK-E2 lipoylation (Figure S4d). This process is not expected to be directly iron-related, as mitochondrial lipoylation occurs through scavenging rather than synthesis; however, it speaks to the larger mitochondrial disruption that we see. We now consider this further in the discussion.

      Regarding the sizes, we thank the reviewer for bringing this up. Our apologies; this was due to an error in the annotation, which we have now corrected in the figure.

      (7) In the third paragraph of the discussion, the authors mention the inability to complement ZFT loss by adding exogenous metals. One argument is the potential lack of metal access to the parasitophorous vacuole (PV). Although largely unexplored, this point could be expanded further in the discussion, as the issue of metal transport to the parasite involves not only the parasite plasma membrane but also the PV membrane. Additionally, the authors mention the absence of functional redundancy in transporters, but it would be helpful to discuss potential stage-specific or differential expression of other ZIP candidates. Transcriptomic data available on Toxodb.org could provide useful insights into this, and experimental approaches, such as RT-PCR, could be used to assess the expression of these candidates in the absence of ZFT. 

      On the issue of metals crossing the PV membrane, we agree that we do not currently know the mechanisms of metal transport within the infected host cell. However, we do have experimental confirmation that the concentration and form of the metals that we are using can impact the parasites. We show that metal treatment inhibits parasite growth (e.g. Figure 3k-n, Figure 6a-d) and we can detect the increased metals through our experiments using FerroOrange and FluoZin (Figure 7a, c). In these experiments, parasites were treated intracellularly and so we can confirm that, regardless of the mechanism, iron and zinc can reach the parasite. While entry of metals across the PV is an intriguing question, it is beyond the scope of the present work, which focuses on the role of the selected transporter.

      We agree that a more detailed discussion of the other ZIP transporters is warranted. We have extended this section of the discussion although for now, we cannot determine the role of the other ZIP transporters in Toxoplasma.

      (8) In the discussion, the authors mention that « Inhibition of respiration has previously been linked to bradyzoite conversion ». To strengthen their point, the authors could mention that mitochondrial Fe-S mutants, as well as mutants affecting mitochondrial translation or the mitochondrial electron transport chain, also initiate bradyzoite conversion (PMID: 34793583). This would reinforce the connection between mitochondrial dysfunction and stage conversion. 

      This is an excellent point and we have added this to the discussion as follows:

      “Inhibition of mitochondrial Fe-S biogenesis or mitochondrial respiration have both previously been linked to bradyzoite conversion (Pamukcu et al., 2021; Tomavo and Boothroyd, 1995), however we do not yet know the signalling factors linking iron, zinc or mitochondrial function to bradyzoite differentiation”.

      (9) As a general comment on manuscript formatting, providing page and line numbers would significantly improve the manuscript's readability and allow reviewers to more easily reference specific sections. This would help address the minor issues of typos (e.g., multiple occurrences of "promotor"). I suggest a careful read-through to correct these issues. 

      We thank the reviewer for this comment and in the resubmitted version we have corrected these issues. 

      Reviewer #2 (Recommendations for the authors): 

      (1) In the alignment (Figure 1a), the BPZIP sequence is from which organism (genus, species)? It would be helpful to include this information in the figure legend.

      Apologies for this oversight; this figure and section have been reworked and the species name (Bordetella bronchiseptica) has been added.

      (2) In reference to Figure 1a, the authors state, "Interestingly, all parasite ZIP-domain proteins examined have a HK motif at the M2 metal binding". I was wondering if by "all" the authors mean Toxoplasma and Plasmodium falciparum (shown in Figure 1a) or did the authors also look at other apicomplexan parasites such as Cryptosporidium or Neospora? Is this a general feature of apicomplexan parasites? 

      We looked at this, and the HK motif in the M2 binding site is conserved in Neospora, Cryptosporidium, and even the digenic gregarine Porospora cf. gigantea. However, in the more distantly related Chromera we find an HH motif at the same position. This suggests that the HK motif is present in the Apicomplexa, but not conserved in the free-living Alveolata. Although we cannot currently speculate on the role of this motif, its role in metal import in Apicomplexa does deserve future scrutiny. To reflect this finding, we have modified Figure 1a and the text.

      (3) In Figure 1e, to better visualize the ZFT-3HA staining at the basal pole, it would be better to omit the DAPI staining from the merged image. It is difficult to see the ZFT staining in the image of the large vacuole.

      We have removed the DAPI from this image to improve clarity.

      (4) Based on the "delayed-death" phenotype of the apicoplast, it is not surprising that no defects were observed in CPN60 processing or protein lipoylation. Have the authors considered measuring these phenotypes after a further round of growth (as was done for visualizing apicoplast morphology)? 

      We agree that changes in apicoplast function are often only seen in the second round of replication. However, here we wanted to check whether ZFT depletion led to immediate changes in the function of the organelle, which was not the case. It is highly likely that after the second round we would see significant defects in apicoplast function; however, given the immediate importance of iron and zinc to many processes within the parasite, we believe that these experiments would be complicated to interpret.

      (5) Depleting ZFT led to a reduction in expression levels for the mitochondrial Fe-S protein SDHB but not for a cytosolic Fe-S protein. Is it expected that less intracellular iron (via depleted ZFT) would differentially affect mitochondrial versus cytosolic Fe-S proteins? 

      Previous studies (e.g., Maclean et al., 2024; Renaud et al., 2025) have shown that upon direct inhibition of the cytosolic Fe-S pathway, ABCE1 is fairly stable and levels can persist for 2-3 days post treatment. However, our recent work has shown that rapid and acute depletion of iron directly (through treatment with a chelator) can lead to ABCE1 levels decreasing within 24 h (Hanna et al., 2025). In the case of ZFT knockdown, due to the more gradual reduction in iron levels seen (e.g. Figure 7j), we believe the parasites are prioritising key Fe-S pathways (e.g. essential proteostasis through ABCE1), probably while remodelling metabolism (as seen in our Seahorse assays). However, many proteins are expected to be directly impacted by the iron and zinc restriction that these parasites experience, and different protein classes are expected to behave differently in these conditions.

      Reviewer #3 (Recommendations for the authors): 

      (1) Is the effect on the plaque size between T7S4-ZFT (-aTc) in regular and 'high iron' conditions significant? The authors show convincingly that the plaque size is smaller due to the swapped promoter and the resulting overexpression of ZFT. But is the effect aggravated in high iron? This would be expected if excess iron were the problem.

      The plaque sizes are significantly smaller in the T7S4-ZFT line under high iron compared to the untreated condition, and compared to the parental untreated line. However, if we normalise plaque size to untreated conditions for both lines, there is not a significant change in plaque size in high iron between the parental and T7S4-ZFT. This is possibly due to the concentration of iron used (200 mM), which may not be optimal to see this effect, or the time taken for plaque assays (6-7 days), which may allow the excess iron to be stored by the host cells, changing the effective concentration of parasite exposure.

      (2) I struggle to understand the intracellular growth assay in Figure 5b. Here, T7S4-ZFT parasites show 25 % of vacuoles with more than 8 parasites (labelled 8+). But such large vacuoles are not observed in the parental strain. It appears as if the inducible strain grows faster even though it was earlier shown to have a fitness defect (see Figure 3j). Can you please clarify?

      This is a result of the rapid growth of the parental line: some vacuoles in this line lysed and initiated a new round of replication at this time point, while we saw no evidence at any time point that ZFT-depleted parasites were able to lyse the host cell. However, the initial (24-48 h post ATc addition) replication rate of the ZFT KD remains similar to that of the parental line. In this panel, we wanted to emphasize that the major phenotype we see upon ZFT depletion is vacuole disorganisation, which we believe is linked to the start of differentiation into bradyzoites.

      (3) Did the authors perform an IFA in addition to the Western blot to localize the 2nd Ty-tagged ZFT copy? It seems important to validate that the protein correctly localizes to the plasma membrane. 

      We have done so and now include these data in Figure S2b. Overexpressed ZFT-Ty localises to internal structures (probably vesicles) with some signal at the periphery; however, this limited expression at the periphery is sufficient to mediate the phenotypes that we see.

      (4) First sentence of the abstract and introduction: The authors speak of metabolism and cellular respiration as though they are two different processes. Is respiration not part of metabolism? 

      This is an excellent point. We wanted to distinguish mitochondrial respiration from general cellular metabolism, but this was not clear. We have now changed this in the introduction to the below:

      “Iron, and other transition metals such as zinc, manganese and copper, are essential nutrients for almost all life, playing vital roles in biological processes such as DNA replication, translation, and metabolic processes including mitochondrial respiration (Teh et al., 2024)”

      (5) 2nd paragraph of the introduction: toxoplasmosis is written capitalized but should be lower case.

      This has been corrected.

      (6) Figure 4j legend: change 'shits parasites to a more quiescent stage' to 'shifts parasites'.

      This has been corrected, our apologies.

      (7) Please correct the following sentence: 'These data demonstrate ZFT depletion leads to the expression of the bradyzoite-specific markers BAG1 and DBL.' DBL is not expressed by the parasite. It is a lectin that binds to the sugars in the cyst wall.

      We have now modified this in the text. The sentence now reads: “These data show that ZFT depletion leads to the expression of the bradyzoite marker BAG1 and the production of the cyst wall, as detected by DBL”.

      (8) In the section on yeast complementation with TgZFT, the authors write: 'Based on this success, we also attempted to complement...'. Please consider changing 'Success' to something more neutral.

      We have modified the text to now read: “Based on these results, we also attempted to complement”…

      (9) In the discussion, the authors write: 'We see a delayed phenotype on the apicoplast, suggesting that metal import is also required in this organelle, although no apicoplast metal transporters have yet been identified.' Please consider the study Plasmodium falciparum ZIP1 Is a Zinc-Selective Transporter with Stage-Dependent Targeting to the Apicoplast and Plasma Membrane in Erythrocytic Parasites (PMID: 38163252).

      We thank the reviewer for the note and have modified the text to include this and the reference. Please see below:

      “Iron is known to be required in the apicoplast (Renaud et al., 2022), zinc also may be required, as the fitness-conferring Plasmodium zinc transporter ZIP1 is transiently localised to the apicoplast (Shrivastava et al., 2024), although the functional relevance of this localisation has not yet been established”.

      (10) The authors write: 'Iron is known to be required in the apicoplast (Renaud et al., 2022), although a potential role for zinc in this organelle has not yet been established.' The role for zinc in the apicoplast may not have been shown formally, but surely among its hundreds of proteins, and those involved in replication and transcription, there are some that depend on zinc...?

      Yes, we agree it would make sense; however, multiple searches using ToxoDB and the datasets from Chen et al (2025) were unable to find any apicoplast-localised proteins with zinc-binding domains. We cannot exclude that zinc is in the apicoplast, and the results from Plasmodium (Shrivastava et al., 2024) may suggest that it is; however, we currently do not have any evidence for its role within this organelle.

      References

      DiDonato, R.J., Roberts, L.A., Sanderson, T., Eisley, R.B., Walker, E.L., 2004. Arabidopsis Yellow Stripe-Like2 (YSL2): a metal-regulated gene encoding a plasma membrane transporter of nicotianamine-metal complexes. Plant J 39, 403–414. https://doi.org/10.1111/j.1365-313X.2004.02128.x

      Hanna, J.C., Shikha, S., Sloan, M.A., Harding, C.R., 2025. Global translational and metabolic remodelling during iron deprivation in Toxoplasma gondii. https://doi.org/10.1101/2025.08.11.669662

      Maclean, A.E., Sloan, M.A., Renaud, E.A., Argyle, B.E., Lewis, W.H., Ovciarikova, J., Demolombe, V., Waller, R.F., Besteiro, S., Sheiner, L., 2024. The Toxoplasma gondii mitochondrial transporter ABCB7L is essential for the biogenesis of cytosolic and nuclear iron-sulfur cluster proteins and cytosolic translation. mBio 15, e00872-24. https://doi.org/10.1128/mbio.00872-24

      Pamukcu, S., Cerutti, A., Bordat, Y., Hem, S., Rofidal, V., Besteiro, S., 2021. Differential contribution of two organelles of endosymbiotic origin to iron-sulfur cluster synthesis and overall fitness in Toxoplasma. PLoS Pathog 17, e1010096. https://doi.org/10.1371/journal.ppat.1010096

      Renaud, E.A., Maupin, A.J.M., Berry, L., Bals, J., Bordat, Y., Demolombe, V., Rofidal, V., Vignols, F., Besteiro, S., 2025. The HCF101 protein is an important component of the cytosolic iron–sulfur synthesis pathway in Toxoplasma gondii. PLoS Biol 23, e3003028. https://doi.org/10.1371/journal.pbio.3003028

      Shrivastava, D., Jha, A., Kabrambam, R., Vishwakarma, J., Mitra, K., Ramachandran, R., Habib, S., 2024. Plasmodium falciparum ZIP1 Is a Zinc-Selective Transporter with Stage-Dependent Targeting to the Apicoplast and Plasma Membrane in Erythrocytic Parasites. ACS Infect. Dis. 10, 155–169. https://doi.org/10.1021/acsinfecdis.3c00426

      Teh, M.R., Armitage, A.E., Drakesmith, H., 2024. Why cells need iron: a compendium of iron utilisation. Trends in Endocrinology & Metabolism 35, 1026–1049. https://doi.org/10.1016/j.tem.2024.04.015

      Tomavo, S., Boothroyd, J.C., 1995. Interconnection between organellar functions, development and drug resistance in the protozoan parasite, Toxoplasma gondii. International Journal for Parasitology 25, 1293–1299. https://doi.org/10.1016/0020-7519(95)00066-B

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      We thank Reviewer #1 for the careful reading of our manuscript and for the constructive comments. We have provided responses to each of the comments below.

      We greatly appreciate Reviewer #1’s accurate public review of our study on the kinesin motor using the DNA origami nanospring (NS). With respect to the strengths, we fully agree with Reviewer #1’s comments. Regarding the weakness, we would like to respond as follows.

      It is true that, unlike optical tweezers, our method does not provide real-time data display. Optical tweezers enable real-time observation and manipulation of kinesin molecules at arbitrary time points. Achieving real-time observation and manipulation is indeed an important challenge for the future development of the NS technique. On the other hand, Iwaki et al. (our co-corresponding author) have already investigated dynamic properties of motor proteins under load, such as the step size and force–velocity relationship of myosin VI, using the NS. We are now preparing high spatiotemporal resolution microscopy experiments on the KIF1A system to measure its step size and force–velocity relationship, which inherently require such resolution.

      Reviewer #2 Public Review

      We appreciate the constructive comments of Reviewer #2, which have strengthened both the presentation and interpretation of our results.

      We would like to thank Reviewer #2 for providing a highly accurate assessment of the strengths of our experiments. Regarding the weaknesses, we would like to respond as follows. First, Iwaki et al. (our co-corresponding author) have already succeeded in observing the stepping motion of myosin VI using the nanospring (NS) in their previous work. We are also currently preparing high spatiotemporal resolution microscopy experiments to observe the stepping motion of KIF1A in our system. Second, while it is true that the NS does not follow Hooke’s law, it is possible to design and construct NSs with an appropriate dynamic range by tuning the spring constant to match the forces exerted by protein molecules. Finally, we agree that our first observation of the stall plateau in KIF1A using the NS is a meaningful achievement. However, with respect to the suggestion that “increasing validity requires also studying kinesin-1,” we have a somewhat different perspective. The validity of the NS method has already been thoroughly examined in the previous work on myosin VI by Iwaki et al., where results were compared with those obtained using optical tweezers. Moreover, the focus of this manuscript is on KAND caused by KIF1A mutations. From this perspective, although we appreciate the suggestion, we consider it important to keep the present study focused on KIF1A and its implications for KAND.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The authors detect the attachments that occur during a processive run by KIF1A by monitoring the suppression of the angular fluctuations of the fluorescent signal and plot this, for example, in Figure 3a as the Length of the NS (which presumably is a readout of force) vs time. This interval includes the time when the KIF1A is actively moving along the MT and when it is stalled. It would be interesting to know the actual stall time of the motor in order to be able to calculate a detachment rate constant. For attachment periods such as the first example highlighted in pink in Figure 3a, the stall time is pretty much equal to the attachment time since the motor is moving so fast and the stall period is so long. However, for short attachment times such as the fifth pink interval shown in this same figure or the traces with the mutant KIF1As in Figure 4 this is not so. Can the authors institute a program to identify the periods where the motor has stretched the NS spring to the point where it stalls, and then calculate this time in order to do an exponential fit to the "dwell time distribution"?

      By introducing another criterion (see Methods, “Rate of relative increase in NS’s length”), the attachment duration was separated into the two time regions noted by the reviewer. After reanalyzing all the data, we evaluated only the stall duration this time. As a result, the estimated stall-force values became more reliable and accurate. The dwell time analysis of the stall durations was performed and is included in the supplementary material for WT KIF1A, for which sufficient data were available.
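
      The two steps discussed in this exchange (isolating the stall portion of an attachment, then fitting an exponential to the stall durations) can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the threshold value, and the synthetic trace are all hypothetical, and the onset rule used here (the first frame after which the relative rate of length increase stays below a threshold) is only one plausible reading of the criterion named in the Methods.

```python
import numpy as np

def stall_onset(length, dt, rel_rate_threshold):
    """Return the first frame index from which (dL/dt)/L stays below the
    threshold until the end of the attachment, or None if it never does.
    The threshold is a hypothetical tuning parameter."""
    length = np.asarray(length, dtype=float)
    rel_rate = np.gradient(length, dt) / length  # relative rate of increase
    below = rel_rate < rel_rate_threshold
    for i in range(len(below)):
        if below[i:].all():
            return i
    return None

def detachment_rate(stall_durations):
    """Maximum-likelihood rate for exponentially distributed dwell times:
    k = 1 / mean(duration), with standard error k / sqrt(n).
    Assumes no censoring and no minimum detectable duration."""
    d = np.asarray(stall_durations, dtype=float)
    k = 1.0 / d.mean()
    return k, k / np.sqrt(len(d))

# Synthetic attachment: the spring extends from 100 to 200 (arbitrary units)
# over 10 frames, then stays at 200 for 10 more frames.
dt = 0.1  # seconds per frame (assumed)
trace = np.concatenate((np.linspace(100.0, 200.0, 11), np.full(10, 200.0)))
onset = stall_onset(trace, dt, rel_rate_threshold=0.1)
stall_duration = (len(trace) - onset) * dt

# Illustrative stall durations in seconds, not measured values.
k, k_se = detachment_rate([1.0, 2.0, 3.0])
```

      With real data, short stalls that fall below the detection limit would bias this simple estimator, so a truncated-exponential fit may be more appropriate.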

      (2) The histogram of stall events in Figure 3b is quite broad. Please discuss.

      The newly added distributions from individual molecules (Fig. 3b) show that the variety in the stall force distribution is not due to multiple molecules, but is primarily an intrinsic property of single KIF1A molecules reflecting the complex kinetics of KIF1A under load, including occasional backward steps and reattachments. In addition, because the nanospring is a non-linear spring, a disadvantage is that even small fluctuations in extension can result in a substantial deviation in the measured stall force. These points have been added to the Discussion section.

      (3) Figure 3c, it is clear that for attachment times greater than 5s the attachment duration is independent of the Lstall, but this is not so clear for the short attachment durations. Some of this may relate to the fact that you're measuring attachment durations and not stall or dwell times as described in my first comment. Do you feel this is due to less precision in measuring the "attachment duration" during the short attachments, or just simply that more data is needed here? I assume that you do not want to imply that there is a load-dependence of the attachment durations here? Perhaps an expanded view of the data set from 0-10 seconds would clarify. 

      As described in our response to comment (1), the stall durations were separated from the attachment durations. This improved the measurement accuracy and revealed that the two quantities compared in Fig. 3c are uncorrelated. We appreciate this constructive comment.

      Reviewer #2 (Recommendations for the authors):

      (1) Off-axis forces are described as 'upward', 'perpendicular', and 'horizontal'. Consider referring to off-axis force, and if necessary, defining the direction of the force(s) relative to the axis of the immobilised MT. If necessary, a cartoon of XYZ axes might be added to F1c? 

      An XZ axis was added to the schematic in Fig. 1c.

      (2) If I understand correctly, stall forces are calculated by averaging the entire region in which the angular fluctuation is reduced below a threshold. In cases like the 3rd and 7th events on the trace in F1a, this will reduce the average. Perhaps consider separately averaging the later time points in each stall event? Perhaps also consider correlating the angular fluctuation signals and the spring length signal? Some fluctuations during stall plateaus might indicate slip back and re-engage events? 

      Instead of separately averaging the later time points in each stall event, we separated the stall duration from the overall attachment duration (Fig. 3). This allowed us to obtain more accurate stall force values. The relationship between the NS length and the angular fluctuation during KIF1A slip-back events differed among individual stall events, and no clear trend was observed. Two representative examples are shown in Author response image 1.

      Author response image 1.

      (3) Please describe all relevant methods fully instead of referencing previous work. For example, nanospring preparation refers readers to reference 21 (which in turn references an earlier paper).

      We revised the Methods section to include the procedures described in the previous reference, and we added the sequence information of the DNA origami to the supplementary information.

      (4) Were any experiments tried at reduced ATP concentration?

      (5) Were any data obtained from WT KIF5B? For kinesin-1, stall plateau forces of >7 pN are obtained.

      This study focused on comparing the stall forces of wild-type and KAND-related mutant KIF1A molecules under physiological ATP conditions, as our main goal was to characterize the disease-relevant phenotypes. Experiments at reduced ATP concentrations and with WT KIF5B are indeed important future directions but are beyond the scope of the present study. These follow-up experiments are currently in progress.

      (6) In Figure 1b, consider showing the attachment to the mutant KIF5B, and reversing the orientation so it corresponds to Figure 1c.

      KIF1A and KIF5B share the same binding method, so to indicate that the schematic in Fig. 1b represents both, we replaced ‘KIF1A’ with ‘Kinesin’.

      (7) In Figure 3d, add force axis. In general, please re-check all force axes. In Supplement S3, the stall plateau labels appear well above their corresponding axis ticks. In Figure 4, several mutants appear to be stalling at well over 5 pN, yet Table 1 gives a much lower value. Presumably, this reflects averaging effects?

      We added the force axis to Fig. 3d. Besides, we corrected Fig. S3 and Fig. 4 because there were errors in the conversion from length to force. As the reviewer pointed out, the apparent discrepancy between the force values in Fig. 4 and Table 1 arises mainly from averaging effects.

    1. Author response:

      General Statements

      In this manuscript we characterize an exquisitely reproducible model of iPSC differentiation into neuroepithelial cells, use it to mechanistically study cell shape changes and planar cell polarity signaling activation during this transition, then apply it to identify patient-specific cell deficiencies in both forward and reverse genetic screens as a powerful tool for patient stratification in personalized medicine. To our knowledge, we provide the first evidence of a human pathogenic mutation directly impairing apical constriction: an evolutionarily conserved behavior of epithelial cells which is the subject of intense research.

      We are very pleased with the balanced and rigorous reviews generated through Review Commons, which we have already used to improve our manuscript. Reviewer 1 highlights that our study “is significant not only for verifying the cell behaviors necessary for neural tube closure in a human iPSC model, but also for establishing a robust assay for the functional testing of NTD-associated sequence variants.” Reviewer 2 agrees that “results are solid and convincing, the data are quantitative, and the manuscript is well written”, and that our “derivation of patient lines from amniotic fluid and execution of experiments before birth is a remarkable demonstration that points toward precision-medicine applications, while motivating rescue strategies and additional clean genetic models.” Reviewer 3 is “enthusiastic about this work and believe it represents a significant step forward in the effort to establish precision medicine approaches for diagnoses of the patient-specific causative cellular defects underlying human neural tube closure defects.” 

      Below, we have replied to each of the reviewers’ comments.

      Description of the planned revisions

      R2.2. Lines 156-166. The authors claim that changes in gene expression precede morphological changes. I am not convinced this is supported by their data. Fig. 1g (epithelial thickness) and Fig. 1k (PAX6 expression) seem to have similar dynamics. The authors can perform a cross-correlation between the two plots to see which Δt gives maximum correlation. If Δt < 0, then it would suggest that gene expression precedes morphology, as they claim. Fig. 1j shows that NANOG drops before the morphological changes, but loss of NANOG is not specific to neural differentiation and therefore should not be related to the observed morphological changes.

      We are happy to do this analysis fully in revision. Our initial analysis performing cross-correlation between apical area and CDH2 protein in one line shows the highest cross-correlation at Δt = -1, suggesting neuroepithelial CDH2 increases before apical area decreases. In contrast, the same analysis comparing apical area versus PAX6 shows Δt = 0, suggesting concurrence. This analysis will be expanded to include the other markers we quantified and the manuscript text amended accordingly. We are keen to undertake additional experiments to test whether these cells swap their key cadherins (CDH1 and CDH2) before they begin to undergo morphological changes (see the response to Reviewer 3’s minor comment 1 immediately below).
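
      The lag analysis requested by the reviewer can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' pipeline: the function names and the synthetic daily time series are hypothetical. The sign convention is chosen so that a negative lag means the marker changes before the morphology, matching the reviewer's Δt < 0 criterion, and the strongest correlation is selected by absolute value because a rising marker and a shrinking apical area are anti-correlated.

```python
import numpy as np

def lagged_correlations(marker, morphology, max_lag):
    """Pearson correlation between marker[t + lag] and morphology[t]
    for each integer lag in [-max_lag, max_lag]."""
    marker = np.asarray(marker, dtype=float)
    morphology = np.asarray(morphology, dtype=float)
    n = len(marker)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:
            a, b = marker[: n + lag], morphology[-lag:]
        else:
            a, b = marker[lag:], morphology[: n - lag]
        result[lag] = np.corrcoef(a, b)[0, 1]
    return result

def best_lag(marker, morphology, max_lag):
    """Lag with the largest absolute correlation, plus its signed value."""
    corr = lagged_correlations(marker, morphology, max_lag)
    lag = max(corr, key=lambda k: abs(corr[k]))
    return lag, corr[lag]

# Synthetic example: the marker rises sigmoidally over 11 daily time points,
# and the morphological readout falls one day later.
marker = np.array([0, 0, 0.5, 2, 5, 8, 9.5, 10, 10, 10, 10])
morphology = 10 - np.concatenate(([marker[0]], marker[:-1]))
lag, r = best_lag(marker, morphology, max_lag=3)
```

      With only a handful of daily time points per line, neighbouring lags can give similar correlations, so a result such as Δt = -1 is best read as suggestive until it is replicated across lines.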

      R3.1(Minor) There seems to be a critical window at day 5 of the differentiation protocol, both in terms of cell morphology and the marker panel presented in Figure 1i. Do the authors have any data spanning the hours from day 5 to 6? If not, I don't think they need to generate any, but do I think this is a very interesting window worthy of further discussion for a couple of reasons. First, several studies of mouse neural tube closure have shown that various aspects of cell remodeling are temporally separable. For example, between Grego-Bessa et al 2016 and Brooks et al 2020 we can infer that apicobasal elongation rapidly increases starting at E8.5, whereas apical surface area reduction and constriction are apparent somewhat earlier at E8.0. I think it would be interesting to see if this separability is conserved in humans. Second, is there a sense of how the temporal correlation between the pluripotent and early neural fate marker data presented here corroborate or contradict the emerging set of temporally resolved RNA seq data sets of mouse development at equivalent early neural stages?

      Cell shape analysis between days 5 and 6 has now been added (see the response to point 2.1 below). As the reviewer predicted, this is a transition point when apical area begins to decrease and apicobasal elongation begins to increase.

      We also thank the reviewer for this prompt to more closely compare our data to the previous mouse publications, which we have added to the discussion. The Grego-Bessa 2016 paper appears to show an increase in thickness between E7.75 and E8.5, but these are not statistically compared. Previous studies showed rapid apicobasal elongation during the period of neural fold elevation, when neuroepithelial cells apically constrict. This has now been added to the discussion: 

      Discussion: “In mice, neuroepithelial apicobasal thickness is spatially-patterned, with shorter cells at the midline under the influence of SHH signalling[14,77,78]. Apicobasal thickness of the cranial neural folds increases from ~25 µm at E7.75 to ~50 µm at E8.5[79]: closely paralleling the elongation between days 2 and 8 of differentiation in our protocol. The rate of thickening is non-uniform, with the greatest increase occurring during elevation of the neural folds[80], paralleled in our model by the rapid increase in thickness between days 4-6 as apical areas decrease. Elevation requires neuroepithelial apical constriction and these cells’ apical area also decreases between E7.75 and E8.5 in mice[79], but we and others have recently shown that this reduction is both region and sex-specific[14,81]. Specifically, apical constriction occurs in the lateral (future dorsal) neuroepithelium: this corresponds with the identity of the cells generated by the dual SMAD inhibition model we use[56]. More recently, Brooks et al[82] showed that the rapid reduction in apical area from E8-E8.5 is associated with cadherin switching from CDH1 (E-cadherin) to CDH2 (N-cadherin). This is also directly paralleled in our human system, which shows low-level co-expression of CDH1 and CDH2 at day 4 of differentiation, immediately before apical area shrinks and apicobasal thickness increases.”

      Prompted by the in vivo data in Brooks et al (2025)[82], we are keen to further explore the timing of CDH1/CDH2 switching versus apical constriction with new experimental data in revisions.

      R3.2(Minor) 2) Can the authors elaborate a bit more on what is known regarding apicobasal thickening and pseudo-stratification and how their work fits into the current understanding in the discussion? This is a very interesting and less well studied mechanism critical to closure, which their model is well suited to directly address. I am thinking mainly of the Grego-Bessa et al., 2016 work on PTEN, though interestingly the work of Ohmura et al., 2012 on the NUAK kinases also shows reduced tissue thickening (and apical constriction) and I am sure I have missed others. Given that the authors identify MED24 as a likely candidate for the lack of apicobasal thickening in one of their patient derived lines, is there any evidence that it interacts with any of the known players?

      We have now added further discussion on the mechanisms by which the neuroepithelium undergoes apicobasal elongation. Nuclear compaction is likely to be necessary to allow pseudostratification and apicobasal elongation. The reviewer’s comment has led us to realise that diminished chromatin compaction is a potential outcome of MED24 down-regulation in our GOSB2 patient-derived line. Figure 4D suggests the nuclei of our MED24-deficient patient-derived line are less compacted than control equivalents, and we propose to quantify nuclear volume in more detail to explore this possibility.

      Additionally, we have already expanded our discussion as suggested by the reviewer:

      Discussion: “Mechanistic separability of apical constriction and apicobasal elongation is consistent with biomechanical modelling of Xenopus neural tube closure showing that both are independently required for tissue bending[61]. Nonetheless, neuroepithelial apical constriction and apicobasal elongation are co-regulated in mouse models: for example, deletion of Nuak1/2[83], Cfl1[84], and Pten[79] all produce shorter neuroepithelium with larger apical areas. Neuroepithelial cells of the GOSB2 line described here, which has partial loss of MED24, similarly produces a thinner neuroepithelium with larger apical areas. Although apical areas were not analysed in mouse models of Med24 deletion, these embryos also have shorter and non-pseudostratified neuroepithelium.

      Our GOSB2 line – which retains readily detectable MED24 protein – is clearly less severe than the mouse global knockout, and the clinical features of the patient from which this line was derived are milder than the phenotype of Med24 knockout embryos[68]. Mouse embryos lacking one of Med24’s interaction partners in the mediator complex, Med1, also have thinner neuroepithelium and diminished neuronal differentiation but successfully close their neural tube[85]. As general regulators of polymerase activity, MED proteins have the potential to alter the timing or level of expression of many other genes, including those already known to influence pseudostratification or apicobasal elongation. MED depletion also causes redistribution of cohesion complexes[86] which may impact chromatin compaction, reducing nuclear volume during differentiation.”

      R3.3(Minor) 3) Is there any indication that Vangl2 is weakly or locally planar polarized in this system? Figure 2F seems to suggest not, but Supplementary Figure 5 does show at least more supracellular cable like structures that may have some polarity. I ask because polarization seems to be one of the properties that differs along the anteroposterior axis of the neural plate, and I wonder if this offers some insight into the position along the axis that this system most closely models?

      VANGL2 does not appear to be planar polarised in this system. This is similar to the mouse spinal neuroepithelium, in which apical VANGL2 is homogenous but F-actin is planar polarised (Galea et al Disease Models and Mechanisms 2018). We do observe local supracellular cable-like enrichments of F-actin in the apical surface of iPSC-derived neuroepithelial cells:

      Author response image 1.

      Preliminary identification of apical supracellular cables suggestive of local polarity. Top: F-actin staining shown in inverted grey LUT highlighting enrichment along directionally-polarised cell borders (blue arrows). Bottom: Staining orientation (blue ~ X axis, red ~ Y axis) based on OrientationJ analysis illustrating localised organisation of F-actin enrichment.

      We propose to compare the length of F-actin cables and coherency of their orientation at the start and end of neuroepithelial differentiation, and in wild-type versus VANGL2-mutant epithelia.

      Description of the revisions that have already been incorporated in the transferred manuscript

      Reviewer #1:

      Major points

      (1) It is mentioned throughout the manuscript that 3 plates were evaluated per line. I believe these are independently differentiated plates. This detail is critical concerning rigor and reproducibility. This should be clearly stated in the Methods section and in the first description of the experimental system in the Results section for Figure 1.

      These experimental details have now been clarified. Unless otherwise stated, all findings were confirmed in three independently differentiated plates from the same line or at least one differentiation from each of three lines. 

      Methods: Unless otherwise stated, for each iPSC line three independently differentiated plates were generated and analysed, with each plate representing a separate differentiation experiment performed on different days.

      (2) For the patient-specific lines - how many lines were derived per patient?

      This has now been clarified in the methods. Microfluidic reprogramming of a small number of amniocytes produces one line per patient representing a pool of clones. Subcloning from individual cells would not be possible within the timeframe of a pregnancy. 

      Methods: For patient-specific iPSC lines, one independent iPSC line was obtained per patient following microfluidic mmRNA reprogramming.

      (3) Was the Vangl2 variant introduced by prime editing? Base editing? The details of the methods are sparse.

      We have now expanded these details:

      Methods: “VANGL2 knock-in lines were generated using CRISPR-Cas9 homology directed repair editing by Synthego (SO-9291367-1). The guide sequence was AUGAGCGAAGGGUGCGCAAG and the donor sequence was CAATGAGTACTACTATGAGGAGGCTGAGCATGAGCGAAGGGTGTGCAAGAGGAGGGCCAGGTGGGTCCCTGGGGGAGAAGAGGAGAG.

      Sequence modification was confirmed by Sanger sequencing before delivery of the modified clones, and Sanger sequencing was repeated after expansion of the lines (Supplementary Figure 5) as well as SNP arrays (Illumina iScan, not shown) confirming genomic stability.”

      Author response image 2.

      Snapshot of Illumina iScan SNP array showing absence of chromosomal duplications or deletions in the CRISPR-modified VANGL2-knockin lines or their congenic control.

      (4) Suggested text changes.

      Some additional suggestions for improvement.

      The abstract could be more clearly written to effectively convey the study's importance. Here are some suggestions

      Line 26: Insert "apicobasal" before "elongation" - the way it is written, I initially interpreted it as anterior-posterior elongation.

      Line 29: Please specify that the lines refer to 3 different established parent iPSC lines with distinct origins and established using different reprogramming methods, plus 2 control patient-derived lines. - The reproducibility of the cell behaviors is impressive, but this is not captured in the abstract.

      Line 32: add that this mutation was introduced by CRISPR-Cas9 base/prime editing.

      The last sentence of the abstract states that the study only links apical constriction to human NTDs, but also reveals that neural differentiation and apical-basal elongation were found. The introduction could also use some editing.

      Line 71: insert "that pulls actin filaments together" after "power strokes"

      Line 73: "apically localized," do you mean "mediolaterally" or "radially"?

      Line 75: Can you specify that PCP components promote "mediolaterally orientated" apical constriction

      Lines 127: Specify that NE functions include apical basal elongation and neurodifferentiation are disrupted in patient-derived models

      All have now been corrected.

      Reviewer #2:

      Major comments:

      (1) Figure 1. The authors use F-actin to segment cell areas. Perhaps this could be done more accurately with ZO-1, as F-actin cables can cross the surface of a single cell. In any case, the authors need to show a measure of segmentation precision: segmented image vs. raw image plus a nuclear marker (DAPI, H2B-GFP), so we can check that the number of segmented cells matches the number of nuclei.

      We used ZO-1 to quantify apical areas of the VANGL2-knockin lines in Figure 3. Segmentation of neuroepithelial apical areas based on F-actin staining is commonplace in the field (e.g. in the Brooks et al 2022 paper cited by another reviewer), and is generally robust because the cell junctions are much brighter than any apical fibres not associated with the apical cortex. However, we accept that at earlier stages of differentiation there may be more apical fibres when cells are cuboidal. We have therefore repeated our analysis of apical area using ZO-1 staining as suggested, analysing a more temporally-detailed time course in one iPSC line. This new analysis confirms our finding of lack of apical area change between days 2-4 of differentiation, then progressive reduction of apical area between days 4-8, further validating our system. Including nuclear images is not helpful because of the high nuclear index of pseudostratified epithelia (e.g. see Supplementary Figure 7) which means that nuclei overlap along the apicobasal axis. Individual nuclei cannot be related to their apical surface in projected images.

      (3) Figure 2d. The laser ablation experiment in the presence of ROCK inhibitor is clear, as I can easily see the cell outlines before and after the experiment. In the absence of ROCK inhibitor, the cell edges are blurry, and I am not convinced the outline that the authors drew is really the cell boundary. Perhaps the authors can try to ablate a larger cell patch so that the change in area is more defined.

      The outlines on these images are not intended to show cell boundaries, but rather link landmarks visible at both timepoints to calculate cluster (not cell) change in area. This is as previously shown in Galea et al Nat Commun 2021 and Butler et al J Cell Sci 2019. We have now amended the visualisation of retraction to make representation of differences between conditions more intuitive. 
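      The cluster-area measurement described above can be sketched as a polygon-area calculation over matched landmarks. This is only an illustration and not the authors' pipeline: the shoelace formula, the function names `polygon_area` and `percent_area_change`, and the coordinates are all assumptions introduced here.

```python
def polygon_area(points):
    """Area of a simple polygon from ordered (x, y) vertices (shoelace formula)."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def percent_area_change(before, after):
    """Percent change in cluster area between two sets of matched landmarks."""
    area_before = polygon_area(before)
    return 100.0 * (polygon_area(after) - area_before) / area_before


# Hypothetical landmark coordinates (arbitrary units), listed in the same order
# at both timepoints so that each vertex tracks the same landmark.
before = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
after = [(-1.0, -1.0), (11.0, -1.0), (11.0, 11.0), (-1.0, 11.0)]
print(f"cluster area change: {percent_area_change(before, after):.1f}%")
```

      Because the polygon is defined by landmarks rather than cell boundaries, the result describes the retraction of the whole cluster, which matches the distinction drawn in the response.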

      (4) Figure 2d. Do the cells become thicker after recoil?

      This is unlikely because the ablated surface remains in the focal plane. Unfortunately, we are unable to image perpendicularly to the direction of ablation to test whether their apical surface moves in Z even by a very small amount. This has now been clarified in the results:

      Results: “The ablated surface remained within the focal plane after ablation, indicating minimal movement along the apical-basal axis.”

      (6) Lines 403-415. The authors report poor neural induction and neuronal differentiation in GOSB2. As far as I understand, this phenotype does not represent the in vivo situation. Thus, it is not clear to what extent the in vitro 2D model describes the human patient.

      The GOSB2 iPSC line we describe does represent the in vivo situation in Med24 knockout mouse embryos, but is clearly less severe because we are still able to detect MED24 protein expressed in this line. We do not have detailed clinical data of the patient from which this line was obtained to determine whether their neurological development is normal. However, it is well established that some individuals who have spina bifida also have abnormalities in supratentorial brain development. It is therefore likely that abnormalities in neuron differentiation/maturation are concomitant with spina bifida. Our findings in the GOSB2 line complement earlier studies which also identified deficiencies in the ability of patient-derived lines to form neurons, but were unable to functionally assess the neuroepithelial cell behaviours we studied. This has now been clarified in the discussion:

      Discussion: “Neuroepithelial cells of the GOSB2 line described here, which has partial loss of MED24, similarly produce a thinner neuroepithelium with larger apical areas. Although apical areas were not analysed in mouse models of Med24 deletion, these embryos also have shorter and non-pseudostratified neuroepithelium.

      Our GOSB2 line – which retains readily detectable MED24 protein – is clearly less severe than the mouse global knockout, and the clinical features of the patient from which this line was derived are milder than the phenotype of Med24 knockout embryos[68].

      Mouse embryos lacking one of Med24’s interaction partners in the mediator complex, Med1, also have thinner neuroepithelium and diminished neuronal differentiation but successfully close their neural tube[85].”

      (7) The experimental feat to derive cell lines from amniotic fluid and to perform experiments before birth is, in my view, heroic. However, I do not feel I learned much from the in vitro assays. There are many genetic changes that may cause the in vivo phenotype in the patient. The authors focus on MED24, but there is not enough convincing evidence that this is the key gene. I would like to suggest overexpression of MED24 as a rescue experiment, but I am not sure this is a single-gene phenotype. In addition, the fact that one patient line does not differentiate properly leads me to think that the patient lines do not strengthen the manuscript, and that perhaps additional clean mutations might contribute more.

      We appreciate the reviewer’s praise of our personalised medicine approach and fully agree that neural tube defects are rarely monogenic. The patient lines we studied were not intended to provide mechanistic insight, but rather to demonstrate the future applicability of our approach to patient care. Our vision is that every patient referred for fetal surgery of spina bifida will have amniocytes (collected as part of routine cystocentesis required before surgery) reprogrammed and differentiated into neuroepithelial cells, then neural progenitors, to help stratify their postnatal care. One could also picture these cells becoming an autologous source for future cell-based therapies if they pass our reproducible analysis pipeline as functional quality control. This has now been clarified in the discussion:

      Discussion: “The multi-genic nature of neural tube defect susceptibility, compounded by uncontrolled environmental risk factors (including maternal age and parity[102]), means that patient-derived iPSC models are unlikely to provide mechanistic insight. They do provide personalised disease models which we anticipate will enable functional validation of genetic diagnoses for patients and their parents’ recurrence risk in future pregnancies, and may eventually stratify patients’ postnatal care. We also envision this model will enable quality control of patient-derived cells intended for future autologous cell replacement therapies, as is being developed in post-natal spinal cord injury[103]. Thus, the highly reproducible modelling platform we evaluate – which is robust to differences in iPSC reprogramming method, sex and ethnicity – represents a valuable tool for future mechanistic insights and personalised disease modelling applications.”

      Significance:

      In addition, the model was unsuccessful in one of the two patient-derived lines, which limits generalizability and weakens claims of patient-specific predictive value.

      We disagree with the reviewer that “the model was unsuccessful in one of the two patient-derived lines”. The GOSB1 line demonstrated deficiency of neuron differentiation independently of neuroepithelial biomechanical function, whereas the GOSB2 line showed earlier failure of neuroepithelial function. We also do not, at this stage, make patient-specific predictive claims: this will require longer-term matching of cell model findings with patient phenotypes over the next 5-10 years.

      Reviewer #3:

      Major comments

      (1) One of my few concerns with this work is that the relative constriction of the apical surface with respect to the basal surface is not directly quantified for any of the experiments. This worry is slightly compounded by the 3D reconstructions in Figure 1h, and the observation that overall cell volume is reduced and cell height increased simultaneously to area loss. Additionally, the net impact of apical constriction in tissues in vivo is to create local or global curvature change, but all the images in the paper suggest that the differentiated neural tissues are an uncurved monolayer even missing local buckles. I understand that these cells are grown on flat adherent surfaces limiting global curvature change, but is there evidence of localized buckling in the monolayer? While I believe, along with the authors, that their phenotypes are likely failures in apical constriction, I think they should work to strengthen this conclusion. I think the easiest way (and hopefully using data they already have) would be to directly compare apical area to basal area on a cell-wise basis for some number of cells. Given the heterogeneity of cells, perhaps 30-50 cells per condition/line/mutant would be good? I am open to other approaches; this just seems like it may not require additional experiments.

      As the reviewer observes, our cultures cannot bend because they are adhered on a rigid surface. The apical and basal lengths of the cultures will therefore necessarily be roughly equal in length. Some inwards bending of the epithelium is expected at the edges of the dish, but these cannot be imaged. The live imaging we show in Figure 2 illustrates that, just as happens in vivo, apical constriction is asynchronous. This means not all cells will have ‘bottle’ shapes in the same culture. We now illustrate the evolution of these shapes in more detail in Supplementary Figure 1.

      Additionally, the reviewer’s comment motivated us to investigate local buckles in the apical surface of our cultures when their apical surfaces are dilated by ROCK inhibition. We hypothesised that the very straight apical surface in normal cultures is achieved by a balance of apical cell size and tension with pressure differences at the cell-liquid interface. Consistent with our expectation, the apical surface of ROCK-inhibited cultures becomes wrinkled (Supplementary figure 4). The VANGL2-KI lines do not develop this tortuous apical surface (as shown in Figure 3), which is to be expected given their modification is present throughout differentiation unlike the acute dilation caused by ROCK inhibition.

      This new data complements our visualisation of apical constriction in live imaging, apical accumulation of phospho-myosin, and quantification of ROCK-dependent apical tension as independent lines of evidence that our cultures undergo apical constriction. 

      (2) Another slight experimental concern I have regards the difference in laser ablation experiments detailed in Figure 3h-i from those of Figure 2d-e. It seems like WT recoil values in 3h-i are more variable and of a lower average than the earlier experiments and given that it appears significance is reached mainly by impact of the lower values, can the authors explain if this variability is expected to be due to heterogeneity in the tissue, i.e. some areas have higher local tension? If so, would that correspond with more local apical constriction?

      There is no significant difference in recoil between the control lines in Figures 2 and 3, albeit the data in Figure 3 is more variable (necessitating more replicates: none were excluded). We also showed laser ablation recoil data in Supplementary Figure 10, in which we did identify a graphing error (now corrected, also no significant difference in recoil from the other control groups as shown in Author response image 3).

      Author response image 3.

      Recoil following laser ablation is not significantly different between different experiments. X axis labels indicate the figure panel each set of ablation data is shown in. Each point represents an independent differentiation dish.

      (4)(Minor) I think some of the commentary on the strengths and limitations of the model found in the Results section should be collated and moved to the discussion in a single paragraph. For example, this could also briefly touch on/compare to some of the other models utilizing hiPSCs (These are mentioned briefly in the intro, but this comparison could be elaborated on a bit after seeing all the great data in this work).

      These changes have now been made:

      Discussion: “Some of these limitations, potentially including inclusion of environmental risk factors, can be addressed by using alternative iPSC-derived models[93,94]. For example, if patients have suspected causative mutations in genes specific to the surface (non-neural) ectoderm, such as GRHL2/3, 3D models described by Karzbrun et al[49] or Huang et al[95] may be informative. Characterisation of surface ectoderm behaviours in those models is currently lacking. These models are particularly useful for high-throughput screens of induced mutations[95], but their reproducibility between cell lines, necessary to compare patient samples to non-congenic controls, remains to be validated. Spinal cell identities can be generated in human spinal cord organoids, although these have highly variable morphologies[96,97]. As such, each iPSC model presents limitations and opportunities, to which this study contributes a reductionist and highly reproducible system in which to quantitatively compare multiple neuroepithelial functions.”

      (5) While the authors are generally good about labeling figures by the day post smad inhibition, in some figures it is not clear either from the images or the legend text. I believe this includes supplemental figures 2,5,6,8, and 10 (apologies if I simply missed it in one or more of them)

      These have now been added.

      (6) The legend for Figure 2 refers to a panel that is not present and the remaining panel descriptions are off by a letter. I'm guessing this is a versioning error as the text itself seems largely correct, but it may be good to check for any other similar errors that snuck in

      This has now been corrected.

      (7) The cell outlines in Figure 3d are a bit hard to see both in print and on the screen, perhaps increase the displayed intensity?

      This has now been corrected.

      Description of analyses that authors prefer not to carry out

      R2.5. Figure 3. The authors mention their previous study in which they show that Vangl2 is not cell-autonomously required for neural closure. It will be interesting to study whether this also the case in the present human model by using mosaic cultures.

      The reviewer is correct that this is one of the exciting potential future applications of our model, which will first require us to generate stable fluorescently-tagged lines (to identify those cells which lack VANGL2). We will also need to extensively analyze controls to validate that mixing fluo-tagged and untagged lines does not alter the homogeneity of differentiation, or apical constriction, independently of VANGL2 deletion. As such, the reviewer is suggesting an altogether new project which carries considerable risk and will require us to secure dedicated funding to undertake.

      R3.8(Minor) The authors show a fascinating piece of data in Supplementary Figure 1, demonstrating that nuclear volume is halved by day 8. Do they have any indication if the DNA content remains constant (e.g., integrated DAPI density)? I suppose it must, and this is a minor point in the grand scheme, but this represents a significant nuclear remodeling and may impact the overall DNA accessibility.

      We agree with the reviewer that the reduction in nuclear volume is important data both because it informs understanding of the reduction in total cell volume, and because it suggests active chromatin compaction during differentiation. Unfortunately, the thicker epithelium and superimposition of nuclei in the differentiated condition means the laser light path is substantially different, making direct comparisons of intensity uninterpretable. Additionally, the apical-most nuclei will mostly be in G2/M phase due to interkinetic nuclear migration. As such, the comparison of DAPI integrated density between epithelial morphologies would not be informative (Author response image 4).

      Author response image 4.

      Lateral views of DAPI-stained nuclei on Days 2 and 8 of differentiation. Note the rapid loss of staining intensity below the apical pseudo-row of nuclei on Day 8. This intensity change is likely due to the apical nuclei being in G2/M phase and therefore having more DNA, and rapid loss of 405nm wavelength signal at depth.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work addresses a key question in cell signalling: how does the membrane composition affect the behaviour of a membrane signalling protein? Understanding this is important, not just to understand basic biological function but because membrane composition is highly altered in diseases such as cancer and neurodegenerative disease. Although parts of this question have been addressed on fragments of the target membrane protein, EGFR, used here, Srinivasan et al. harness a unique tool, membrane nanodisks, which allow them to probe full-length EGFR in vitro in great detail with cutting-edge fluorescent tools. They find interesting impacts on EGFR conformation in differently charged and fluid membranes, explaining previously identified signalling phenotypes.

      Strengths:

      The nanodisk system enables full-length EGFR to be studied in vitro and in a membrane with varying lipid and cholesterol concentrations. The authors combine this with single-molecule FRET utilising multiple pairs of fluorophores at different places on the protein to probe different conformational changes in response to EGF binding under different anionic lipid and cholesterol concentrations. They further support their findings using molecular dynamics simulations, which help uncover the full atomistic detail of the conformations they observe.

      Weaknesses:

      Much of the interpretation of the results comes down to a bimodal model of an 'open' and 'closed' state between the intracellular tail of the protein and the membrane. Some of the data looks like a bimodal model is appropriate, but its use is not sufficiently justified (statistically or otherwise) in this work in its current form. The experiments with varying cholesterol in particular appear to suggest an alternate model with longer fluorescent lifetimes. More justification of these interpretations of the central experiment of this work would strengthen the paper.

      We thank the reviewer for highlighting the strengths of the study, including the use of nanodiscs, single-molecule FRET, and MD simulations to probe full-length EGFR in controlled membrane environments.

      We agree that statistical justification is important for interpreting the distributions. To address this, we performed global fits of the data with both two- and three-Gaussian models and evaluated them using the Bayesian Information Criterion (BIC), which balances the model fit with a penalty for additional parameters. The three-Gaussian model gave a substantially lower BIC, indicating statistical preference for the more complex model. However, we also assessed the separability of the Gaussian components using Ashman’s D, which quantifies whether peaks are distinct. This analysis showed that two Gaussians (µ = 2.64 and 3.43 ns) are not separable, implying they represent one broad distribution rather than two states.

      Author response table 1.

      Both the two- and three-Gaussian models include a low-value component (µ = ~1.3 ns), but the apparent improvement of the three-Gaussian model arises only from splitting the central population into two overlapping Gaussians. Thus, while the BIC favors the three-Gaussian model statistically, Ashman’s D demonstrates that the central peak should not be interpreted as bimodal. Therefore, when all the distributions are fit globally, the data are best explained as two Gaussians, one centered at ~1.3 ns and the other at ~2.7 ns, with cholesterol-dependent shifts reflecting changes in the distribution of this population rather than the emergence of a separate state. Finally, we acknowledge that additional conformations may exist, but based on this analysis a bimodal model describes the populations captured in our data and so we limit ourselves to this simplest framework.
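      The two criteria discussed above can pull in different directions, which a short numerical sketch makes concrete. The formulas are the standard definitions of BIC and Ashman's D. The component widths, log-likelihoods, sample size and parameter counts below are placeholders, because the fitted values are not reported in this text; only the two means (2.64 and 3.43 ns) come from the response.

```python
import math


def bic(log_likelihood, n_params, n_points):
    """Bayesian Information Criterion; the lower value is preferred."""
    return n_params * math.log(n_points) - 2.0 * log_likelihood


def ashman_d(mu1, sigma1, mu2, sigma2):
    """Ashman's D for two Gaussian components; D > 2 is the usual
    threshold for calling the two peaks cleanly separated."""
    return math.sqrt(2.0) * abs(mu1 - mu2) / math.sqrt(sigma1 ** 2 + sigma2 ** 2)


# Means (ns) are quoted in the response; the widths are NOT reported in this
# text, so the sigma value below is a placeholder chosen only for illustration.
SIGMA_PLACEHOLDER = 0.6
d_central = ashman_d(2.64, SIGMA_PLACEHOLDER, 3.43, SIGMA_PLACEHOLDER)
print(f"Ashman's D, central components: {d_central:.2f}",
      "(separable)" if d_central > 2 else "(not separable)")

# BIC comparison with hypothetical log-likelihoods and sample size.
# An unconstrained k-Gaussian mixture has 3k - 1 free parameters
# (k means, k widths, k - 1 weights); a global fit may share some of them.
n_points = 1000
bic_two = bic(log_likelihood=-1500.0, n_params=5, n_points=n_points)
bic_three = bic(log_likelihood=-1470.0, n_params=8, n_points=n_points)
print("BIC prefers", "three" if bic_three < bic_two else "two", "Gaussians")
```

      With these placeholder inputs the extra component lowers the BIC while its two central peaks stay below the D = 2 threshold, which is the pattern the response describes.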

      We have clarified this in the revised manuscript by adding a section in the Methods (page 26) titled Model Selection and Statistical Analysis, which describes the results of the global two- versus three-Gaussian fits evaluated using BIC and Ashman’s D. Additional details of these analyses are also provided in response to Reviewer #1, Question 8 (Recommendations for the authors).

      Reviewer #2 (Public review):

      Summary:

      Nanodiscs and synthesized EGFR are co-assembled directly in cell-free reactions. Nanodiscs containing membranes with different lipid compositions are obtained by providing liposomes with corresponding lipid mixtures in the reaction. The authors focus on the effects of lipid charge and fluidity on EGFR activity.

      Strengths:

      The authors implement a variety of complementary techniques to analyze data and to verify results. They further provide a new pipeline to study lipid effects on membrane protein function.

      We thank the reviewer for noting the strengths of our approach, particularly the use of complementary techniques and the development of a new pipeline to study lipid effects on membrane protein function.

      Weaknesses:

      Due to the relative novelty of the approach, a number of concerns remain.

      (1) I am a little skeptical about the good correlation of the nanodisc compositions with the liposome compositions. I would rather have expected a kind of clustering of individual lipid types in the liposome membrane, in particular of cholesterol. This should then result in an uneven distribution upon nanodisc assembly, i.e., in a notable variation of lipid composition in the individual nanodiscs. Could this be ruled out by the implemented assays, or can just the overall lipid composition of the complete nanodisc fraction be analyzed?

      We monitored insertion of anionic lipids into nanodiscs by performing zeta potential measurements, which report on surface charge, and cholesterol insertion by Laurdan fluorescence, which reports on membrane order. Both assays provide information at the ensemble level, not single-nanodisc resolution. We clarified this in the Methods section (see below).

      Cholesterol clustering is well documented in ternary systems with saturated lipids and sphingolipids [Veatch, Biophys J., 2003; Risselada, PNAS, 2008]. However, in unsaturated POPC-cholesterol mixtures such as those used here, cholesterol primarily alters bilayer order and large-scale segregation is not typically observed.  The addition of POPS to the POPC-cholesterol mixture perturbs cholesterol-induced ordering, lowering the likelihood of cholesterol-rich domains [Kumar, J. Mol. Graphics Modell., 2021].

      Lipid heterogeneity between nanodiscs would be expected to give rise to heterogeneity in hydrodynamic properties, including potentially broadening the dynamic light scattering (DLS) distributions. However, the full width at half maximum (FWHM) values from the DLS measurements (see Author response table 2) do not indicate a broadening with cholesterol. Statistical testing (Mann-Whitney U test for non-normal data) showed no significant difference between samples with and without cholesterol (p = 0.486; n = 4 per group). While the sample size is small making firm conclusions challenging, these results suggest that large-scale heterogeneity is unlikely.
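      For groups as small as n = 4, a Mann-Whitney p-value can be computed exactly by enumerating every way of splitting the pooled values into two groups. The sketch below illustrates that calculation. The FWHM numbers in it are invented placeholders and not the values in the authors' table, and the source does not state whether an exact or an approximate variant of the test was used.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic for x: count of (xi, yj) pairs with xi > yj; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value from all relabellings of the pooled data."""
    pooled = list(x) + list(y)
    n, m = len(x), len(y)
    mean_u = n * m / 2.0
    observed = abs(mann_whitney_u(x, y) - mean_u)
    extreme = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        # Count relabellings at least as far from the null mean as observed.
        if abs(mann_whitney_u(group_x, group_y) - mean_u) >= observed - 1e-9:
            extreme += 1
    return extreme / total


# Placeholder FWHM values (nm) for illustration only.
fwhm_without_cholesterol = [4.1, 4.4, 5.2, 5.9]
fwhm_with_cholesterol = [4.6, 4.9, 5.5, 6.3]
p_value = exact_two_sided_p(fwhm_without_cholesterol, fwhm_with_cholesterol)
print(f"exact two-sided p = {p_value:.3f}")
```

      With four values per group there are only 70 possible splits, so the smallest attainable two-sided p-value is 2/70 (about 0.029). This is one reason the response treats the comparison as suggestive and not conclusive.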

      Author response table 2.

      In the case of POPS lipids, clustering of POPS in EGFR embedded nanodiscs is a recognized property of receptor-lipid interactions. Molecular dynamics simulations have shown that POPS, although constituting only 30% of the inner leaflet, accounts for ~50% of the lipids directly contacting EGFR [Arkhipov, Cell, 2013], underscoring that anionic lipids are preferentially recruited to the receptor’s immediate environment.

      For nanodiscs containing cholesterol and anionic lipids, our smFRET experiments were designed to isolate the effect of EGF binding. The nanodisc population is the same in the ± EGF conditions, as EGF was introduced just prior to performing smFRET experiments, and not during nanodisc assembly. Thus, for a given lipid composition, any observed differences between ligand-free and ligand-bound states reflect conformational changes of EGFR.

      Methods, page 23, “Zeta potential measurements to quantify surface charge of nanodiscs: Data were processed using the instrument’s Malvern DTS software to obtain the mean zeta-potential value. This ensemble measurement reports the average surface charge of the nanodisc population, verifying incorporation of anionic POPS lipids.”

      Methods, page 23, “Fluorescence measurements with Laurdan to confirm cholesterol insertion into nanodiscs: The excitation spectrum was recorded by collecting the emission at 440 nm and the emission spectrum was recorded by exciting the sample at 385 nm. Laurdan fluorescence provides an ensemble readout of membrane order and confirms cholesterol incorporation into the nanodisc population. While Laurdan does not resolve the composition of individual nanodiscs, prior work has shown that POPC–cholesterol mixtures are miscible without forming cholesterol-rich domains[91,92], thus the observed ordering changes likely reflect the intended input cholesterol content at the ensemble level.”

      (91) Veatch, S. L. & Keller, S. L. Separation of liquid phases in giant vesicles of ternary mixtures of phospholipids and cholesterol. Biophysical journal, 85(5), 3074-3083 (2003).

      (92) Risselada, H. J. & Marrink, S. J. The molecular face of lipid rafts in model membranes. Proceedings of the National Academy of Sciences 105(45), 17367–17372 (2008).

      (2) Both templates have been added simultaneously, with a 100-fold excess of the EGFR template. Was this the result of optimization? How is the kinetics of protein production? As EGFR is in far excess, a significant precipitation, at least in the early period of the reaction, due to limiting nanodiscs, should be expected. How is the oligomeric form of the inserted EGFR? Have multiple insertions into one nanodisc been observed?

      We thank the reviewer for these insightful questions. Yes, the EGFR:ApoA1∆49 template ratio of 100:1 was empirically determined through optimization experiments now shown in the revised Supplementary Fig. 3. Cell-free reactions were performed across a range of EGFR:ApoA1∆49 template ratios (1:2 to 1:200) and sampled at different time points (2-19 hours). As shown in the gels, EGFR expression increased with higher template ratios and longer reaction times up to ~9 hours, while ApoA1 expression became clearly detectable only after 6 hours. Based on these results, we selected an EGFR:ApoA1∆49 ratio of 100:1 and 8-hour reaction time as the optimal condition, which yielded sufficient full-length EGFR incorporated into nanodiscs for ensemble and single-molecule experiments.

      In cell-free systems, protein yield does not scale directly with DNA template concentration, as translation efficiency is limited by factors such as ribosome availability and co-translational membrane insertion [Hunt, Chem. Rev., 2024; Blackholly, Front. Mol. Biosci., 2022]. Consistent with this, we observed that ApoA1∆49 is produced at higher levels than EGFR despite the lower DNA input (Supplementary Fig. 2b). Providing an excess EGFR template prevents the reaction from becoming limited by scaffold availability and helps compensate for the fact that, as a large multi-domain receptor, EGFR expression can yield truncated as well as full-length products. This strategy ensures that sufficient full-length receptors are available for nanodisc incorporation. We will clarify this in the Methods section (see below).

      We observed little to no visible precipitation under the reported cell-free conditions, likely for the following reasons: (i) EGFR and ApoA1∆49 are co-expressed in the cell-free reaction, and ApoA1∆49 assembles into nanodiscs concurrently with receptor translation, providing an immediate membrane sink; and (ii) ApoA1∆49 is expressed at high levels, maintaining disc concentrations that keep the reaction in a soluble regime.

      The sample contains donor-labeled EGFR (snap surface 594) together with acceptor-labeled lipids (cy5-labeled PE doped in the nanodisc). We assess the oligomerization state of EGFR in nanodiscs using single-molecule photobleaching of the donor channel. Snap surface 594 is a benzyl guanine derivative of Atto 594 that reacts with the SNAP tag with near-stoichiometric efficiency [Sun, Chembiochem, 2011]. Most molecules (~75%) exhibited a single photobleaching step, consistent with incorporation of a single EGFR per nanodisc [Srinivasan, Nat. Commun., 2022]. A minority of traces (~15%) showed two photobleaching steps and ~10% of traces showed three or more photobleaching steps, consistent with occasional multiple insertions. For all smFRET analysis, we restricted the dataset to single-step photobleaching traces, ensuring measurements were performed on monomeric EGFR.
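      The selection rule described above (tally photobleaching steps per trace, then keep only single-step traces) can be sketched as follows. The trace records and the function names `step_fractions` and `single_step_only` are hypothetical, and the step counts are assumed to have been determined beforehand by whatever step-detection procedure the authors used.

```python
from collections import Counter

# Hypothetical records: (trace id, number of donor photobleaching steps).
traces = [("t1", 1), ("t2", 1), ("t3", 2), ("t4", 1), ("t5", 3), ("t6", 1)]


def step_fractions(records):
    """Fraction of traces showing 1, 2, or 3+ photobleaching steps."""
    counts = Counter("3+" if steps >= 3 else str(steps) for _, steps in records)
    total = len(records)
    return {label: counts[label] / total for label in ("1", "2", "3+")}


def single_step_only(records):
    """Keep only traces with exactly one step (taken as one receptor per disc)."""
    return [trace_id for trace_id, steps in records if steps == 1]


print(step_fractions(traces))
print(single_step_only(traces))
```

      Filtering on the donor step count is what ties each retained smFRET trace to a single labeled receptor, independent of how many empty nanodiscs are present in the sample.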

      Methods, page 20, “Production of labeled, full-length EGFR nanodiscs: Briefly, the E. coli slyD lysate, in vitro protein synthesis E. coli reaction buffer, amino acids (-Methionine), Methionine, T7 Enzyme, protease inhibitor cocktail (Thermo Fisher Scientific), RNase inhibitor (Roche) and DNA plasmids (20 µg of EGFR and 0.2 µg of ApoA1∆49) were mixed with different lipid mixtures. The DNA template ratio of EGFR:ApoA1∆49 = 100:1 was empirically chosen by testing different ratios on SDS-PAGE gels and selecting the condition that maximized full-length EGFR expression in DMPC lipids (Supplementary Fig. 3).”

      (3) The IMAC purification does not discriminate between EGFR-filled and empty nanodiscs. Does the TEM study give any information about the composition of the particles (empty, EGFR monomers, or EGFR oligomers)? Normalizing the measured fluorescence, i.e., the total amount of solubilized receptor, with the total protein concentration of the samples could give some data on the stoichiometry of EGFR and nanodiscs.

      Negative-stain TEM was performed to confirm nanodisc formation and morphology, but this method does not resolve whether a given disc contains EGFR. To directly assess receptor stoichiometry, we instead relied on single-molecule photobleaching of snap surface 594-labeled EGFR (see response to Point 2). These experiments showed that the majority of nanodiscs contain a single receptor, with a minority containing two receptors. For all smFRET analyses, we restricted data to single-step photobleaching traces, ensuring measurements were performed on monomeric EGFR.

      We did not normalize EGFR fluorescence to total protein concentration because the bulk protein fraction after IMAC purification includes both receptor-loaded and empty nanodiscs. The latter contribute to ApoA1∆49 mass but do not contain receptors and including them would underestimate receptor occupancy. Importantly, the presence of empty nanodiscs does not affect our measurements as photobleaching and single-molecule FRET analyses selectively report only on receptor-containing nanodiscs. This clarification has been added to the Methods.

      Methods, page 26, “Fluorescence Spectroscopy: Traces with a single photobleaching step for the donor and acceptor were considered for further analysis. Regions of constant intensity in the traces were identified by a change-point algorithm[95]. Donor traces were assigned as FRET levels until acceptor photobleaching. The presence of empty nanodiscs does not influence these measurements, as photobleaching and single-molecule FRET analyses selectively report on receptor-containing nanodiscs.”

      (4) The authors generally assume a 100% functional folding of EGFR in all analyzed environments. While this could be the case, with some other membrane proteins, it was shown that only a fraction of the nanodisc solubilized particles are in functional conformation. Furthermore, the percentage of solubilized and folded membrane protein may change with the membrane composition of the supplied nanodiscs, while non-charged lipids mostly gave rather poor sample quality. The authors normalize the ATP binding to the total amount of detectable EGFR, and variations are interpreted as suppression of activity. Would the presence of unfolded EGFR fractions in some samples with no access to ATP binding be an alternative interpretation?

      We agree that not all nanodisc-embedded EGFR molecules may be fully functional and that the fraction of folded protein could vary with lipid composition. In our ATP-binding assay, EGFR detection relies on the C-terminal SNAP-tag fused to an intrinsically disordered region. Successful labeling requires that this segment be translated, accessible, and folded sufficiently to accommodate the SNAP reaction, which imposes an additional requirement compared to the rigid, structured kinase domain where ATP binds. Misfolded or truncated EGFR molecules would therefore likely fail to label at the C-terminus. These factors strongly imply that our assay predominantly reports on receptor molecules that are intact and well folded.

      Additionally, our molecular dynamics simulations at 0% and 30% POPS support the experimental ATP-binding measurements (Fig. 2c, d). This consistency between the experimental and simulated evidence, including at 0% POPS where reduced receptor folding might be expected, suggests that the observed lipid-dependent changes are more likely due to modulation of the functional receptor than to receptor misfolding. We have clarified these points by adding the following text:

      Results, page 7, “Role of anionic lipids in EGFR kinase activity: In the presence of EGF, increasing the anionic lipid content decreased the number of contacts from 71.8 ± 1.8 to 67.8 ± 2.4, indicating increased accessibility, again in line with the experimental findings. Because detection of EGFR relies on labeling at the C-terminus and ATP binding requires an intact kinase domain, the ATP-binding assay is for receptors that are properly folded and competent for nucleotide binding. The consistency between experimental results and MD simulations suggests that the observed lipid-dependent changes are more likely due to modulation of functional EGFR than to artifacts from misfolding.”

      Reviewer #1 (Recommendations for the authors):

      The experimental program presented here is excellent, and the results are highly interesting. My enthusiasm is dampened by the presentation in places which is confusing, especially Figure 3, which contains so many of the results. I also have some reservations about the bimodal interpretation of the lifetime data in Figure 3.

      We thank the reviewer for their positive assessment of our experimental approach and results. In the revised version, we have improved figure organization and readability by adding explicit labels for lipid composition and EGF presence/absence in all lifetime distributions, moving key supplementary tables into the main text, and reorganizing the supplementary figures as Extended Data Figures following eLife’s format. Figures and tables now appear in the order in which they are referenced in the text to further improve readability.

      Regarding the bimodal interpretation of the lifetime distribution, we have performed global fits of the data with both two- and three-Gaussian models and evaluated them using the Bayesian Information Criterion (BIC) and Ashman’s D analysis, which supported the bimodal interpretation. Details of this analysis are provided in our response to comment (8) below and included in the manuscript.

      Specific comments below:

      (1) Abstract -"Identifying and investigating this contribution have been challenging owing to the complex composition of the plasma membrane" should be "has".

      We have corrected this error in the revised manuscript.

      (2) Results - p4 - some explanation of what POPC/POPS are would be helpful.

      We have added the text below discussing POPC and POPS.

      Results, page 4, “POPC is a zwitterionic phospholipid forming neutral membranes, whereas POPS carries a net negative charge and provides anionic character to the bilayer[56]. Both PC and PS lipids are common constituents of mammalian plasma membranes, with PC enriched in the outer leaflet and PS in the inner leaflet[22].”

      (22) Lorent, J. H., Levental, K. R., Ganesan, L., Rivera-Longsworth, G., Sezgin, E., Doktorova, M., Lyman, E. & Levental, I. Plasma membranes are asymmetric in lipid unsaturation, packing and protein shape. Nature Chemical Biology 16, 644–652 (2020).

      (56) Her, C., Filoti, D. I., McLean, M. A., Sligar, S. G., Ross, J. A., Steele, H. & Laue, T. M. The charge properties of phospholipid nanodiscs. Biophysical journal 111(5), 989–998 (2016).

      (3) Figure 2b - it would be easier to compare if these were plotted on top of each other. Are we at saturating ATP binding concentration or below it? Also, please put a key to say purple - absent and orange +EGF on the figure. I am also confused as to why, with no EGF, ATP binding is high with 0% POPS, but low when EGF is present, but that then reverses with physiological lipid content.

      While we agree that a direct comparison would be easier, the ATP-binding experiments for the ± EGF conditions were actually performed independently on separate SDS-PAGE gels, which unfortunately precludes such a comparison. We have added a color key to clarify the -EGF and +EGF datasets.

      The experiments were carried out at 1 µM of the fluorescently labeled ATP analogue (atto647N-γ ATP). Reported kinetic measurements for the isolated EGFR kinase domain indicate a Km of 5.2 µM, suggesting that our experimental concentration is below, but close to, the saturating range. This ensures sensitivity to changes in accessibility of the binding site rather than saturating all available receptors.
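      As a rough illustration of where 1 µM sits relative to the reported Km, the sketch below computes the expected fractional saturation under a simple hyperbolic (Michaelis-Menten-like) binding model. This model is an assumption made only for illustration: it treats Km as an approximate half-saturation concentration and ignores the irreversible binding of the analogue, so the number is a guide rather than a measured occupancy. The function and variable names are hypothetical.

```python
def fractional_saturation(conc_um: float, km_um: float) -> float:
    """Fraction of sites occupied for simple hyperbolic binding: S / (Km + S).

    Assumption: Km is used as an approximate half-saturation concentration.
    """
    if conc_um < 0 or km_um <= 0:
        raise ValueError("concentration must be >= 0 and Km must be > 0")
    return conc_um / (km_um + conc_um)


ATP_ANALOGUE_UM = 1.0  # concentration of the ATP analogue used in the assay
KM_UM = 5.2            # reported Km for the isolated EGFR kinase domain

occupancy = fractional_saturation(ATP_ANALOGUE_UM, KM_UM)
print(f"Expected fractional saturation: {occupancy:.3f}")
```

      Under these assumptions the assay operates in the sub-saturating part of the binding curve, where the signal remains sensitive to changes in binding-site accessibility.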

      We have revised the manuscript to clarify these details by including the following text:

      Results, page 6, “To investigate how the membrane composition impacts accessibility, we measured ATP binding levels for EGFR in membranes with different anionic lipid content. 1 µM of fluorescently-labeled ATP analogue, atto647N-γ ATP, which binds irreversibly to the active site, was added to samples of EGFR nanodiscs with 0%, 15%, 30% or 60% anionic lipid content in the absence or presence of EGF.”

      Methods, page 24, “ATP binding experiments: Full-length EGFR in different lipid environments was prepared using cell-free expression as described above. 1 µM of snap surface 488 (New England Biolabs) and atto647N labeled gamma ATP (Jena Bioscience) was added after cell-free expression and incubated at 30 °C, 300 rpm for 60 minutes. 1 µM of atto647N-γ ATP was used, corresponding to a concentration near the reported Km of 5.2 µM for ATP binding to the isolated EGFR kinase domain[93], ensuring sensitivity to lipid-dependent changes in ATP accessibility.”

      (ii) Nucleotide binding is suppressed under basal conditions, likely to ensure that the catalytic activity is promoted only upon EGF stimulation.

      The molecular dynamics simulations at 0% and 30% POPS further support this interpretation, showing that anionic lipids modulate the accessibility of the ATP-binding site in a manner consistent with experimental trends (Fig. 2c and 2d).

      We have clarified these points in the main text with the following additions:

      Results, page 6, “In the presence of EGF, ATP binding overall increased with anionic lipid content with the highest levels observed in 60% POPS bilayers. In the neutral bilayer, ligand seemed to suppress ATP binding, indicating anionic lipids are required for the regulated activation of EGFR.”

      Results, page 7, “In the absence of EGF, increasing the anionic lipid content from 0% POPS to 30% POPS increased the number of ATP-lipid contacts from 58.6 ± 0.7 to 74.4 ± 1.2, indicating reduced accessibility, consistent with the experimental results and suggesting anionic lipids are required for ligand-induced EGFR activity.”

      (93) Yun, C. H., Mengwasser, K. E., Toms, A. V., Woo, M. S., Greulich, H., Wong, K. K., Meyerson, M. & Eck, M. J. The T790M mutation in EGFR kinase causes drug resistance by increasing the affinity for ATP. PNAS 105(6), 2070–2075 (2008).

      (4) Figure 2d - how was the 16A distance arrived at?

      We thank the reviewer for pointing this out. The 16 Å cutoff was chosen based on the physical dimensions of the ATP analogue used in the experiments. Specifically, the largest radius of the atto647N-γ ATP molecule is ~16.9 Å, which defines the maximum distance at which lipid atoms could sterically obstruct access of ATP to the binding pocket. Accordingly, in the simulations, contacts were defined as pairs of coarse-grained atoms between lipid molecules and the residues forming the ATP-binding site (residues 694-703, 719, 766-769, 772-773, 817, 820, and 831) separated by less than 16 Å.

      We have rewritten the rationale for selecting the 16 Å cutoff in the Methods section to improve clarity.

      Methods, page 28, “Coarse-grained, Explicit-solvent Simulations with the MARTINI Force Field: We analyzed our simulations using WHAM[108,109] to reweight the umbrella biases and compute the average values of various metrics introduced in this manuscript. Specifically, we calculated the distance between Residue 721 and Residue 1186 (EGFR C-terminus) of the protein. To quantify the accessibility of the ATP-binding site, we calculated the number of contacts between lipid molecules and the residues forming the ATP-binding pocket (residues 694-703, 719, 766-769, 772-773, 817, 820, and 831)[110]. Close contact between the bilayer and these residues would sterically hinder ATP binding; thus, the contact number serves as a proxy for ATP-site accessibility. The cutoff distance for defining a contact was set to 16 Å, corresponding to the largest molecular radius of the fluorescent ATP analogue (atto647N-γ ATP, 16.96 Å[111]). Accordingly, we defined a contact as a pair of coarse-grained atoms, one from the lipid membrane and one from the ATP binding site, within a mutual distance of less than 16 Å.”
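      The contact definition above can be sketched for a single simulation frame as follows. This is a minimal illustration, not the analysis code used for the manuscript: it counts lipid/binding-site bead pairs closer than the 16 Å cutoff and omits the WHAM reweighting over umbrella windows. The function name and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # bead coordinates in angstroms


def count_contacts(
    lipid_beads: Sequence[Point],
    site_beads: Sequence[Point],
    cutoff: float = 16.0,
) -> int:
    """Count (lipid bead, binding-site bead) pairs separated by less than cutoff."""
    return sum(
        1
        for lipid in lipid_beads
        for site in site_beads
        if math.dist(lipid, site) < cutoff
    )


# Toy coordinates (hypothetical), chosen only to exercise the cutoff.
lipids = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
site = [(10.0, 0.0, 0.0), (0.0, 15.9, 0.0)]

n_contacts = count_contacts(lipids, site)
print(n_contacts)
```

      In this toy frame only the first lipid bead lies within 16 Å of the two site beads, so two pairs are counted. A larger contact number corresponds to a more occluded, less accessible ATP-binding site.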

      (5) Figure 2e-h - I think a bar chart/violin plot/jitter plot would make it easier to compare the peak values. The statistics in the table should just be quoted in the text as value +/- error from the 95% confidence interval. The way it is written currently is confusing, as it implies that there is no conformational change with the addition of EGF in neutral lipids, but there is ~0.4nm one from the table. I don't understand what you mean by "The larger conformational response of these important domains suggests that the intracellular conformation may play a role in downstream signaling steps, such as binding of adaptor proteins"?

      We thank the reviewer for these suggestions. For the smFRET lifetime distributions (Figure 2j, k; previously Figure 2e, f), we have now included jitter plots of the donor lifetimes in Supplementary Figure 11 to facilitate direct visual comparison of the medians and distribution widths for each lipid composition and ±EGF condition. The distance distributions for the ATP-binding site to C-terminus in Figure 2e, f (previously Figure 2g, h) were obtained from umbrella-sampling simulations that calculate free-energy profiles rather than raw, unbiased distance values. Because the sampling is guided by biasing potentials, individual distance values cannot be used to construct violin or jitter plots. We therefore present the simulation data only as probability density distributions, which best reflect the equilibrium distributions derived from them.

      We have also revised the text to report the median ± 95% confidence interval, improving clarity and consistency with the statistical table.

      Results, page 9: “In the neutral bilayer (0% POPS), the distribution peaks at 8.1 nm (95% CI: 8.0–8.2 nm) in the absence of EGF and at 8.6 nm (95% CI: 8.5–8.7 nm) in the presence of EGF (Table 1, Supplementary Table 1). In the physiological regime of 30% POPS nanodiscs, the peak of the donor lifetime distribution shifts from 9.1 nm (95% CI: 8.9–9.2 nm) in the absence of EGF to 11.6 nm (95% CI: 11.1–12.6 nm) in the presence of EGF (Table 1, Supplementary Table 1), which is a larger EGF-induced conformational response than in neutral lipids.”

      Finally, we have rephrased the sentence in question for clarity. The revised text now reads:

      Results, page 9: “The larger conformational response observed in the presence of anionic lipids suggests that these lipids enhance the responsiveness of the intracellular domains to EGF, potentially ensuring interactions between C-terminal sites and adaptor proteins during downstream signaling.”

      (6) "r, highlighting that the charged lipids can enhance the conformational response even for protein regions far away from the plasma membrane" - is it not that the neutral membrane is just very weird and not physiological that EGFR and other proteins don't function properly?

      We agree with the reviewer that completely neutral (0% POPS) membranes are not physiological and likely do not support the native organization or activity of EGFR. We have revised the text to clarify that the 30% POPS condition represents a more native-like lipid environment that restores or stabilizes the expected conformational response, rather than "enhancing" it. The revised sentence now reads:

      Results, page 10: “Both experimental and computational results show a larger EGF-induced conformational change in the partially anionic bilayer, consistent with the notion that a partially anionic lipid bilayer provides a more native environment that supports proper receptor activation, compared to the non-physiological neutral membrane.”

      (7) "snap surface 594 on the C-terminal tail as the donor and the fluorescently-labeled lipid (Cy5) as the acceptor (Supplementary Fig. 2, 11)." Why not refer to Figure 3a here to make it easier to read?

      We have added the reference to Figure 3a, and we thank the Reviewer for the suggestion.

      (8) Figure 3 - the bimodality in many of these plots is dubious. It's very clear in some, i.e. 0% POPS +EGF, but not others. Can anything be done to justify bimodality better?

      We agree that statistical justification is important for interpreting lifetime distributions. To address this, we performed global fits of the data with both two- and three-Gaussian models and evaluated them using the Bayesian Information Criterion (BIC), which balances the model fit with a penalty for additional parameters. The three-Gaussian model gave a substantially lower BIC, indicating statistical preference for the more complex model. However, we also assessed the separability of the Gaussian components using Ashman’s D, which quantifies whether peaks are distinct. This analysis showed that two of the Gaussians are not separable, implying they represent one broad distribution rather than two discrete states. Therefore, when all the distributions are fit globally, the data are best described as two Gaussians, one centered at ~1.3 ns and the other at ~2.7 ns, with cholesterol-dependent shifts reflecting changes in the distribution of this population rather than the emergence of a separate state. We have better justified our choice of model by incorporating the results of the global two- vs three-Gaussian fits with BIC and Ashman’s D analysis in the revised manuscript.

      Methods, page 27: “Model Selection and Statistical Analysis

      Global fitting of lifetime distributions was performed across all experimental conditions using maximum likelihood estimation. Both two-Gaussian and three-Gaussian distribution models were evaluated as described previously[62]. Model performance was compared using the Bayesian Information Criterion (BIC),[101] which balances model likelihood and complexity according to

      BIC = -2 ln L + k ln n

      where L is the likelihood, k is the number of free parameters, and n is the number of single-molecule photon bunches across all experimental conditions. A lower BIC value indicates a statistically better model[101]. The separation between Gaussian components was subsequently assessed using Ashman’s D, where a score above 2 indicates good separation[102]. For two Gaussian components with means µ1, µ2 and standard deviations σ1, σ2,

      D12 = √2 |µ1 - µ2| / √(σ1² + σ2²)

      where Dij represents the distance metric between Gaussian components i and j. All fitted parameters, likelihood values, BIC scores, and Ashman’s D values are summarized in Supplementary Table 5.”

      (101) Schwarz, G. Estimating the dimension of a model. The Annals of Statistics, 461–464 (1978).

      (102) Ashman, K. M., Bird, C. M. & Zepf, S. E. Detecting bimodality in astronomical datasets. The Astronomical Journal 108(6), 2348–2361 (1994).
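      The two model-selection criteria described above can be sketched as follows. This is a minimal illustration rather than the fitting code used for the manuscript: the component means are the values quoted in this response (1.3, 2.6 and 3.4 ns), whereas the standard deviations, log-likelihood and sample size are hypothetical placeholders chosen only to show how the criteria behave.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def ashman_d(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """Ashman's D for two Gaussians; D > 2 indicates well-separated components."""
    return math.sqrt(2.0) * abs(mu1 - mu2) / math.sqrt(sigma1**2 + sigma2**2)


# Means (ns) are taken from the response; sigmas (ns) are hypothetical.
d_short_vs_mid = ashman_d(1.3, 0.3, 2.6, 0.5)
d_mid_vs_long = ashman_d(2.6, 0.5, 3.4, 0.6)

print(f"D(1.3 ns, 2.6 ns) = {d_short_vs_mid:.2f}, separable: {d_short_vs_mid > 2}")
print(f"D(2.6 ns, 3.4 ns) = {d_mid_vs_long:.2f}, separable: {d_mid_vs_long > 2}")

# With equal likelihood, extra parameters are penalized (hypothetical numbers).
print(bic(-100.0, 5, 1000) < bic(-100.0, 8, 1000))
```

      With these placeholder widths, the short-lifetime component is well separated from the others, while the two longer-lifetime components fall below the D = 2 threshold. That is the pattern that would lead to merging them into a single broad population even when BIC favors the three-Gaussian fit.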

      (9) Figure 3c - can you better label the POPS/POPC on here?

      We thank the reviewer for this suggestion. In the revised manuscript, Figure 3b (previously Figure 3c) has been updated to label the lipid composition corresponding to each smFRET distribution to make the comparison across conditions easier to follow.

      (10) Figure 3g - it looks like cholesterol causes a shift in both the peaks, such that the previous open and closed states are not the same, but that there are 2 new states. This is key as the authors state: "Remarkably, high anionic lipids and cholesterol content produce the same EGFR conformations but with opposite effects on signaling-suppression or enhancement." But this is only true if there really are the same conformational states for all lipid/cholesterol conditions. Again, the bimodal models used for all conditions need to be justified.

      We appreciate the reviewer’s insightful comment. We agree that the interpretation of the lifetime distributions depends on whether cholesterol and anionic lipids modulate existing conformational states or create new ones. To test this, we performed global fits of all distributions using the two- and three-Gaussian models and compared them using the Bayesian Information Criterion (BIC) and Ashman’s D, the results of which are described in detail in response to (8) above.

      Both fitting models, two- and three-Gaussian, identified the same short lifetime component (µ = 1.3 ns), suggesting this reflects a well-separated conformation. While the three-Gaussian model gave a lower BIC, Ashman’s D analysis indicated that two of the three components (µ = 2.6 ns and 3.4 ns) are not statistically separable, suggesting they represent a single broad conformational population rather than distinct states. If instead these two components reflected distinct states present under different conditions, Ashman’s D analysis would have found the opposite result. This supports our interpretation that high cholesterol and high anionic lipid content produce similar conformational ensembles with opposite effects on signaling output.

      Finally, we acknowledge that additional conformations may exist, but based on this analysis a bimodal model describes the populations captured in our data and so we limit ourselves to this simplest framework. We have clarified this rationale in the revised manuscript and added the results of the BIC and Ashman’s D analysis to support this interpretation.

      (11) Why are we jumping about between figures in the text? Figure 1d is mentioned after Figure 2. Also, DMPC is shown in the figures way before it is described in the text. It is very confusing. Figure 3 is so compact. I think it should be spread out and only shown in the order presented in the text. Different parts of the figure are referred to seemingly at random in the text. Why is DMPC first in the figure, when it is referred to last in the text?

      Following the Reviewer’s comment, we have revised the figure order and layout to improve readability and ensure consistency with the text. The previous Figures 1d-f, which introduce the single-molecule fluorescence setup, are now Figure 2g-i, positioned immediately before the first single-molecule FRET experiments (Fig. 2j, k). The DMPC distribution in Figure 3 has been moved to the Supplementary Information (Supplementary Fig. 17), where it is shown alongside POPC, as these datasets are compared in the section “Mechanism of cholesterol inhibition of EGFR transmembrane conformational response”. The smFRET distributions in Figure 3 are now presented in the same sequence as they are discussed in the text, and the figure has been spread out for better clarity.

      (12) Throughout, I find the presentation of numerical results, their associated error, and whether they are statistically significantly different from each other confusing. A lot of this is in supplementary tables, but I think these need to go in the main text.

      To improve clarity and ensure that key quantitative results are easily accessible, we have moved the relevant supplementary tables to the main text. Specifically, the following tables have been incorporated into the main manuscript:

      (i) Median distance between the ATP binding site and the EGFR C-terminus, or between the membrane and the EGFR C-terminus, from smFRET measurements (previously Supplementary Table 1, now main Table 1)

      (ii) Median distance between the membrane and the EGFR C-terminus in different anionic lipid environments (previously Supplementary Table 4, now main Table 2)

      (iii) Median distance between the membrane and the EGFR C-terminus in different cholesterol environments (previously Supplementary Tables 8 and 12, now combined as main Table 3)

      (13) Supplementary figures - in general, there is a need to consider how to combine or simplify these for eLife, as they will have to become extended data figures.

      We thank the reviewer for this helpful suggestion. In the revised manuscript, we have reorganized the supplementary figures into extended data figures in accordance with eLife’s format. Specifically:

      - Supplementary Figs. 1–7 are now grouped as Extended Data Figures for Figure 1 in the main text. They are now Figure 1 - figure supplements 1–7.

      - Supplementary Figs. 8–11 are now grouped as Extended Data Figures for Figure 2. They are now Figure 2 - figure supplements 1–4.

      - Supplementary Figs. 12–17 are now grouped as Extended Data Figures for Figure 3. They are now Figure 3 - figure supplements 1–6.

      (14) Supplementary Figure 2 - label what the two bands are in the EGFR and pEGFR sets at the bottom of panel c.

      We thank the reviewer for this comment. The two bands shown in the EGFR and pEGFR blots in Supplementary Fig. 2d (previously Supplementary Fig. 2c) correspond to replicate samples under identical conditions. We have now labeled the lanes as “Rep 1” and “Rep 2” in the revised figure and clarified this in the figure legend.

      Supplementary Figure 2, page 31: “(d) Western blots were performed on labelled EGFR in nanodiscs. Anti-EGFR Western blots (left) and anti-phosphotyrosine Western blots (right) tested the presence of EGFR and its ability to undergo tyrosine phosphorylation, respectively, consistent with previous experiments on similar preparations[18, 54, 55]. The two lanes in each blot correspond to replicate samples under identical conditions.”

      (15) Supplementary Figures 3+4 - a bar chart/boxplot or similar would be easier for comparison here.

      In the revised version, we have replaced the histograms with jitter plots showing the nanodisc size distributions for each condition in supplementary figures 4 and 5 (previously supplementary figures 3 and 4). The plots display individual measurements with a horizontal line indicating the mean size (mean ± standard deviation values provided in the caption).

      (16) Supplementary Figures 10, 12, 13, 15, 16 - I would jitter these.

      We have incorporated jitter plots for the relevant datasets in Supplementary Figures 11, 13, 15, 16 and 17 (previously supplementary figures 10, 12, 13, 15 and 16) to provide a clearer visualization of the data distributions and median values.

      Reviewer #2 (Recommendations for the authors):

      (1) Reactions were performed in 250 µL volumes. What is the average yield of solubilized EGFR in those reactions? Are there differences in the EGFR solubilization with the various lipid mixtures?

      The amount of solubilized EGFR produced in each 250 µL cell-free reaction was below the reliable detection limit for quantitative absorbance assays. At these protein levels, little to no EGFR precipitation was observed for all lipid compositions. Although exact yields could not be determined, fluorescence-based detection confirmed the presence of functional, nanodisc-incorporated EGFR suitable for smFRET and ensemble fluorescence experiments. We observed variability in total yield between independent reactions within the same lipid composition, which is common for cell-free systems, but no consistent trend attributable to lipid composition.

      (2) Figure S2: It would be better to have a larger overview of the particles on a grid to get a better impression of sample homogeneity.

      TEM images showing a larger field of view have been added for each lipid composition in Supplementary Figures 4 and 5.

      (3) Figure 2b: It appears that there is some variation in the stoichiometry of ApoA1 and EGFR within the samples. Have equal amounts of each sample been analyzed? Are there, in addition, some precipitates of EGFR? It would further be good to have a negative control without expression to get more information about the additional bands in Figure S2b. As they do not appear in the fluorescent gel, it is unlikely that they represent premature terminations of EGFR.

      The fluorescence intensity from the bound ATP analogue (Atto 647N-ATP) and from the snap surface 488 label, which binds stoichiometrically to the SNAP tag at the EGFR C-terminus, was measured for each sample. The relative amount of ATP binding was quantified for each sample by normalizing to the EGFR content (Figure 2b). This normalization accounts for the different amounts of EGFR produced in each condition.
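      The normalization described above amounts to a per-sample ratio of the two fluorescence signals. The sketch below is illustrative only: the function name and the intensity values are hypothetical, and steps such as background subtraction or band integration are not shown.

```python
def normalized_atp_binding(atp_signal: float, egfr_signal: float) -> float:
    """ATP-analogue fluorescence divided by SNAP-label (EGFR) fluorescence."""
    if egfr_signal <= 0:
        raise ValueError("EGFR signal must be positive to normalize")
    return atp_signal / egfr_signal


# Hypothetical band intensities (arbitrary units) for two samples that
# differ two-fold in EGFR yield but bind ATP equally per receptor.
low_yield = normalized_atp_binding(atp_signal=200.0, egfr_signal=100.0)
high_yield = normalized_atp_binding(atp_signal=400.0, egfr_signal=200.0)

print(low_yield, high_yield)
```

      Both hypothetical samples give the same normalized value, showing how the ratio removes differences in expression level between conditions.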

      We did not observe any visible precipitation under the reported cell-free conditions, likely due to the following reasons:

      (i) EGFR and ApoA1 are co-expressed in the cell-free reaction, and ApoA1 assembles into nanodiscs concurrently with receptor translation, providing an immediate membrane sink

      (ii) ApoA1 is expressed at high levels, maintaining disc concentrations that keep the reaction in a soluble regime.

      A control cell-free reaction containing only ApoA1∆49 (1 µg) and no EGFR template, analyzed after affinity purification, showed a single prominent band at ~25 kDa (gel image below), corresponding to ApoA1, along with faint background bands typical of Ni-NTA purification from cell lysates. These weak, non-specific bands likely arise from co-purification of endogenous E. coli proteins.

      The ApoA1∆49-only control gel has now been included as part of the supplementary figure 2.

      (4) Figure S2c: It would be better to show the whole lanes to document the specificity of the antibodies. Anti-Phosphor antibodies are frequently of poor selectivity. In that case, a negative control with corresponding tyrosine mutations would be helpful.

      We have updated Figure S2d (previously Figure S2c) to include the full gel lanes to better illustrate the specificity of both the total EGFR and phospho-EGFR (Y1068) antibodies. The results show a single clear band at the expected molecular weight for EGFR, confirming antibody specificity.

      (5) The Results section already contains quite some discussion. I would thus recommend combining both sections.

      We thank the reviewer for the suggestion. We have now created a results and discussion section to better reflect the content of these paragraphs, with the previous discussion section now a subsection focused on implications of these results.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study identifies a mechanism responsible for the accumulation of the MET receptor in invadopodia, following stimulation of Triple-negative breast cancer (TNBC) cells with HGF. HGF-driven accumulation and activation of MET in invadopodia causes the degradation of the extracellular matrix, promoting cancer cell invasion, a process here investigated using gelatin-degradation and spheroid invasion assays.

      Mechanistically, HGF stimulates the recycling of MET from RAB14-positive endosomes to invadopodia, increasing their formation. At invadopodia, MET induces matrix degradation via direct binding with the metalloprotease MT1-MMP. The delivery of MET from the recycling compartment to invadopodia is mediated by RCP, which facilitates the colocalization of MET to RAB14 endosomes. In this compartment, HGF induces the recruitment of the motor protein KIF16B, promoting the tubulation of the RAB14-MET recycling endosomes to the cell surface. This pathway is critical for the HGF-driven invasive properties of TNBC cells, as it is impaired upon silencing of RAB14.

      Strengths:

      The study is well-organized and executed using state-of-the-art technology. The effects of MET recycling in the formation of functional invadopodia are carefully studied, taking advantage of mutant forms of the receptor that are degradation-resistant or endocytosis-defective.

      Data analyses are rigorous, and appropriate controls are used in most of the assays to assess the specificity of the scored effects. Overall, the quality of the research is high.

      The conclusions are well-supported by the results, and the data and methodology are of interest for a wide audience of cell biologists.

      We sincerely thank the reviewer for his/her positive feedback and for considering our study to be well executed and rigorous. The valuable suggestions and comments will certainly improve the understanding of the role of the RAB14-RCP-KIF16B axis in MET trafficking and breast cancer invasion. Below, we address point by point each of the concerns and suggestions raised by the reviewer.

      Weakness:

      The role of the MET receptor in invadopodia formation and cancer cell dissemination has been intensively studied in many settings, including triple-negative breast cancer cells. The novelty of the present study mostly consists of the detailed molecular description of the underlying mechanism based on HGF-driven MET recycling. The question of whether the identified pathway is specific for TNBC cells or represents a general mechanism of HGF-mediated invasion detectable in other cancer cells is not addressed or at least discussed.

      We thank the reviewer for raising this point. We want to clarify that in TNBCs, the lack of the hormone receptors (progesterone receptor and estrogen receptor) and HER2 makes the overexpression of EGFR and MET crucial in terms of prognosis and treatment (PMID: 27655711, 25368674). Hence, the study of MET signalling and trafficking is more relevant for TNBCs than for other cancer cells. We will add an explanation to the discussion section of the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Khamari and colleagues investigate how HGF-MET signaling and the intracellular trafficking of the MET receptor tyrosine kinase influence invadopodia formation and invasion in triple-negative breast cancer (TNBC) cells. They show that HGF stimulation enhances both the number of invadopodia and their proteolytic activity. Mechanistically, the authors demonstrate that HGF-induced, RAB4- and RCP-RAB14-KIF16B-dependent recycling routes deliver MET to the cell surface specifically at sites where invadopodia form. Moreover, they report that MET physically interacts with MT1-MMP, a key transmembrane metalloproteinase required for invadopodia function, and that these two proteins co-traffic to invadopodia upon HGF stimulation.

      Although the HGF-MET axis has previously been implicated in invadopodia regulation (e.g., by Rajadurai et al., Journal of Cell Science 2012), studies directly linking ligand-induced MET trafficking with the spatial regulation of MT1-MMP localization and activity have been lacking.

      Overall, the manuscript addresses a relevant and timely topic and provides several novel insights. However, some sections require clearer and more concise writing (details below). In addition, the quality, reliability, and robustness of several data sets need to be improved.

      Strengths:

      A key strength of the study is the novel demonstration that HGF-mediated, RAB4- and RAB14-dependent recycling of MET delivers this receptor, together with MT1-MMP, to invadopodia, highlighting a previously unrecognized mechanism regulating the formation and proteolytic function of these invasive structures. Another strong point is the breadth of experimental approaches used and the substantial amount of supporting data. The authors also include an appropriate number of biological replicates and analyze a sufficiently large number of cells in their imaging experiments, as clearly described in the figure legends.

      We greatly appreciate the positive assessment we have from the reviewer, who also acknowledged the novelty and relevance of our study. Below, we have carefully addressed the comments/concerns raised regarding this study and will strengthen the reliability and robustness by revisiting the data, providing additional analyses where required, and clarifying methodological details.

      Weakness:

      (1) Inappropriate stimulation times for endocytosis and recycling assays. The experiments examining MET endocytosis and recycling following HGF stimulation appear to use inappropriate incubation times. After ligand binding, RTKs typically undergo endocytosis within minutes and reach maximal endosomal accumulation within 5-15 minutes. Although continuous stimulation allows repeated rounds of internalization, the temporal dynamics of MET trafficking should be examined across shorter time points, ideally up to 1 hour (e.g., 15, 30, and 60 minutes). The authors used 2-, 3-, or 6-hour HGF stimulation, which, in my opinion, is far too long to study ligand-induced RTK trafficking.

      We understand the reviewer’s concern regarding the HGF stimulation time points for endocytosis and recycling. We want to highlight that, to study the recycling/surface delivery of MET in response to HGF, we performed TIRF microscopy-based imaging in which images were taken within 1 h of HGF addition (Fig. 2I). Additionally, we will incorporate surface biotinylation to show the recycling of MET, as suggested in comment 7. Moreover, we observed the effect of HGF on gelatin degradation and invadopodia formation after 3 h of HGF stimulation, and we were curious to know where MET resides under prolonged ligand stimulation. Hence, to study the localization of MET to invadopodia or to endocytic markers, the cells were stimulated with HGF for 2-3 hours.

      (2) Low efficiency of MET silencing in Figure S1I. The very low MET knockdown efficiency shown in Figure S1I raises concerns. Given the potential off-target effects of a single shRNA and the insufficient silencing level, it is difficult to conclude whether the reduction in invadopodia number in Figure 1F is genuinely MET-dependent. The authors later used siRNA-mediated silencing (Figure S5C), which was more effective. Why was this siRNA not used to generate the data in Figure 1F? Why did the authors rely on the inefficient shRNA C#3?

      We understand the concern raised by the reviewer. We want to emphasize that we have employed three different approaches to investigate the effect of MET silencing/inhibition on invadopodia formation. (i) A MET kinase inhibitor, PHA665752, which reduces invadopodia formation (Fig. 1D, E). (ii) Silencing with shRNA: since the level of MET silencing with the shRNA was not sufficient, cells were stained for MET as a readout of silencing, images of cells with depleted MET expression were captured, and invadopodia numbers were quantified in those cells (Fig. 1F). (iii) Using the SMARTpool siRNA against MET, we have shown the MT1-MMP-containing invadopodia in Fig. S5E, which provides further evidence for the role of MET in invadopodia activity. An additional graph showing invadopodia formation upon siRNA-mediated MET silencing will be added to the revised figure.

      (3) Missing information on incubation times and inconsistencies in MET protein levels. The figure legends do not indicate how long the cells were incubated with HGF or the MET inhibitor PHA665752 before immunoblotting. This information is crucial, particularly because both HGF and PHA665752 cause a substantial decrease in the total MET protein level. Notably, such a decrease is absent in MDA-MB-231 cells treated with HGF in the presence of cycloheximide (Figure S2F). The authors should comment on these inconsistencies. Additionally, the MET bands in Figure S1J appear different from those in Figure S1C, and MET phosphorylation seems already high under basal conditions, with no further increase upon stimulation (Figure S1J). The authors should address these issues. 

      We apologise for the unintentional omission of experimental details about the HGF and drug incubation times, which will be added to the figure legends. The blot will be replaced with a more appropriate representative image.

      Regarding the decreased MET level in the drug-treated condition: the literature suggests that the MET inhibitor PHA665752 also promotes MET degradation, corroborating our result shown in Fig. S1J (PMID: 15788682, 18327775). Further, in Fig. S1J, MET phosphorylation relative to the total MET level is higher in the HGF-treated condition. We will add the quantification in the revised manuscript for clarity.
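
      The point about relative phosphorylation can be illustrated with a minimal sketch. It assumes that the planned quantification is a simple ratio of p-MET to total MET band intensity per lane, which the response does not spell out. The densitometry values below are invented for illustration and are not data from the study.

```python
def relative_phosphorylation(p_met, total_met):
    """Return the p-MET / total MET intensity ratio for one lane."""
    if total_met <= 0:
        raise ValueError("total MET intensity must be positive")
    return p_met / total_met


# Hypothetical densitometry values (arbitrary units), for illustration only.
lanes = {
    "untreated": {"p_met": 800.0, "total_met": 1000.0},
    "HGF": {"p_met": 600.0, "total_met": 400.0},
}

# Ratio per lane, then fold change of the HGF lane over the untreated lane.
ratios = {name: relative_phosphorylation(**values) for name, values in lanes.items()}
fold_change = ratios["HGF"] / ratios["untreated"]
```

      In this made-up example the absolute p-MET signal is lower after HGF, yet the ratio to total MET is higher, which is the situation the response describes when total MET falls during stimulation.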

      Next, in Fig. S1C, the rabbit anti-MET (CST, D1C2 XP) antibody was used, which binds to a C-terminal motif of MET and detects both the 170 kDa and the 140 kDa proteins, representing the uncleaved and cleaved forms of MET. In Fig. S1J, the mouse anti-MET (CST, L6E7) antibody was used, which binds to an N-terminal motif of MET and recognizes only the 140 kDa protein.

      (4) Insufficient representation and randomization of microscopic data. For microscopy, only single representative cells are shown, rather than full fields containing multiple cells. This is particularly problematic for invadopodia analysis, as only a subset of cells forms these structures. The authors should explain how they ensured that image acquisition and quantification were randomized and unbiased. The graphs should also include the percentage of cells forming invadopodia, a standard metric in the field. Furthermore, some images include altered cells - for example, multinucleated cells - which do not accurately represent the general cell population.

      We thank the reviewer for raising this point. The single-cell images are shown for clarity and to visualize the subcellular features; however, the conclusions are based on the quantitative analysis of multiple cells collected from multiple frames (at least 30 frames per condition). Here, we would like to highlight that image acquisition was done over random fields on a coverslip. In the graphs shown in Fig. 1B, 1C, 4F, S1F, S1H, S5J’ it can be seen that there are frames with no degradation or invadopodia, and these have also been taken into account. To better represent the population of cells forming invadopodia, a graph showing the percentage of cells forming invadopodia will be added to the figure.
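
      The requested metric can be sketched as follows. This is a minimal illustration that assumes a cell counts as invadopodia-positive when it has at least one invadopodium; that threshold, the function name, and the per-field counts are assumptions, not the authors' scoring criteria or data.

```python
def percent_cells_with_invadopodia(counts_per_cell, min_invadopodia=1):
    """Percentage of cells with at least `min_invadopodia` invadopodia."""
    if not counts_per_cell:
        raise ValueError("no cells were scored")
    positive = sum(1 for n in counts_per_cell if n >= min_invadopodia)
    return 100.0 * positive / len(counts_per_cell)


# Hypothetical invadopodia counts per cell, grouped by imaging field.
# Fields without any invadopodia are kept so they count in the denominator.
fields = [[0, 3, 5], [0, 0], [2, 7, 0, 1]]
all_cells = [n for field in fields for n in field]
percent = percent_cells_with_invadopodia(all_cells)
```

      Pooling all scored cells before computing the percentage keeps empty fields in the denominator, in line with the statement that frames without invadopodia are taken into account.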

      (5) Use of a single siRNA/shRNA per target. As noted earlier, using only one siRNA or shRNA carries the risk of off-target effects. For every experiment involving gene silencing (MET, RAB4, RAB14, RCP, MT1-MMP), at least two independent siRNAs/shRNAs should be used to validate the phenotype.

      We would like to clarify that we are using SMARTpool siRNA, which contains 4 individual siRNAs for the target gene. The literature suggests that using a pool of siRNAs reduces off-target effects compared to using single oligos for gene silencing (PMID: 14681580, 33584737, 24875475).

      While SMARTpool siRNA minimizes the off-target effect, it does not eliminate the possibility of it. To confirm that the observed phenotypes are specifically attributable to the genes investigated in this study, we will perform additional experiments using two independent siRNAs targeting RCP and RAB14. RAB4 is known to be associated with MET trafficking (PMID: 21664574, 30537020), and we have taken RAB4 as a positive control. Hence, we feel the suggested experiment is not required to support the conclusion made regarding RAB4.

      For MET, we have used shRNA and an inhibitor to show the effect of MET inhibition/perturbation in the invadopodia-associated activity, which validates the observations of siRNA-mediated gene silencing.

      We have shown the effect of MT1-MMP depletion on invadopodia formation using a CRISPR-based gene knock-out study, and another study from our group has shown the effect using siRNA (PMID: 31820782), which supports our MT1-MMP KO cell observation.    

      (6) Insufficient controls for antibody specificity. The specificity of MET, p-MET, and MT1-MMP staining should be demonstrated in cells with effective gene silencing. This is an essential control for immunofluorescence assays.

      MET immunofluorescence staining in the MET-depleted condition has been provided in Fig. 1F, and an immunoblot for the siRNA-mediated gene silencing has been provided in Fig. S5C. We will add the entire field of view to show the MET silencing in Fig. 1F.

      Inhibition of MET kinase activity using PHA665752 abolished MET phosphorylation, as shown in Fig. S1J. In line with Joffre et al. (PMID: 21642981), Fig. 3C and S2I show increased Tyr 1234/1235 phosphorylation of the M1250T MET mutant. Further, studies have shown the specificity of the antibody by immunoblotting and immunofluorescence using MET inhibitors (PMID: 21973114, 41009793).

      For MT1-MMP, an immunoblot showing significant depletion of the MT1-MMP protein level by the SMARTpool siRNA has been provided in Fig. S5L. Further, MT1-MMP silencing has been validated by immunofluorescence in the following studies (PMID: 22291036, 21571860, 20505159).

      (7) Inadequate demonstration of MET recycling. MET recycling should be directly demonstrated using the same approaches applied to study MT1-MMP recycling. The current analysis - based solely on vesicles near the plasma membrane - is insufficient to conclude that MET is recycled back to the cell surface.

      We appreciate the reviewer’s suggestion for an alternative approach to show MET trafficking. We aim to demonstrate MET trafficking using a biochemical approach, which will be included in the revised version. 

      (8) Insufficient evidence for MET-MT1-MMP interaction. The interaction between MET and MT1-MMP should be validated by immunoprecipitation of endogenous proteins, particularly since both are endogenously expressed in the studied cell lines.

      We thank the reviewer for pointing out the lack of MET-MT1-MMP interaction at the endogenous level. We have carried out the immunoprecipitation of endogenous MET to validate the interaction with MT1-MMP. However, we could not capture the interaction of these proteins at endogenous levels. We hypothesize that the interaction between MT1-MMP and MET may be weak in nature, with a high K<sub>d</sub> value, and accordingly, it was difficult to precipitate the endogenous MT1-MMP by MET. The immunoblot will be added to the revised manuscript and discussed.

      (9) Inconsistent use of cell lines and lack of justification. The authors use two TNBC cell lines: MDA-MB-231 and BT-549, without providing a rationale for this choice. Some assays are performed in MDA-MB-231 and shown in the main figures, whereas others use BT-549, creating unnecessary inconsistency. A clearer, more coherent strategy is needed (e.g., present all main findings in MDA-MB-231 and confirm key results in BT-549 in supplementary figures).

      MDA-MB-231 and BT-549 are two well-characterized TNBC cell lines, which are being used extensively to study breast cancer cell invasion. These two cell lines also show overexpression of MET, making them suitable model cell lines for our study. 

      MDA-MB-231 has a lower transfection efficiency than BT-549. Additionally, MET is a difficult gene to transfect, making it hard to perform MET overexpression experiments in MDA-MB-231. Though most of the experiments have been performed in both cell lines, a few have been performed only in BT-549 cells. Further, in the main figures we have focused on displaying the different approaches taken to validate an observation, which led to data being shown in distinct cell lines.

      Also, showing observations in different cell lines is a practice that has been followed by multiple authors in the past (PMID: 39751400, 41079612, 25049275, 22366451).

      (10) Inconsistency in invadopodia numbers under identical conditions. The number of invadopodia formed in Figure 1E is markedly lower than in Figure 1C, despite identical conditions. The authors should explain this discrepancy.

      We sincerely thank the reviewer for pointing out the inconsistency in invadopodia numbers across the two experiments. Fig. 1C has two conditions: untreated (UT) and HGF-treated. The untreated condition has serum-free media without any stimulation, whereas in Fig. 1D, E we added vehicle (DMSO), since the drug is resuspended in DMSO. This difference in treatment is likely to be responsible for the decreased numbers of invadopodia in Fig. 1E.

      (11) Questionable colocalization in some images. In some figures - for example, Figure 2G - the dots indicated by arrows do not convincingly show colocalization. The authors should clarify or reanalyze these data.

      We thank the reviewer for the valuable comment. The apparent lack of convincing colocalization is likely due to the relatively lower fluorescence intensity of MET at these structures. We will add the line intensity plots for the indicated puncta to show the intensity of both channels in the figure.

      To quantify the colocalization of two channels, we have used the automated image analysis software motiontracking (motiontracking.mpi-cbg.de), which has been detailed in the method section. Motiontracking considers objects to be colocalized only if there is an overlapping area of more than 35% between the two channels. Lastly, the apparent colocalization is corrected for random colocalization, which is estimated by random permutation of the objects. This makes object-based colocalization more reliable than intensity-based colocalization.
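
      The object-based logic described above can be sketched in a few lines. This is not the motiontracking implementation, whose internals are not given here. It assumes objects are sets of pixel coordinates, that overlap is measured relative to the area of the first-channel object, that random colocalization is estimated by randomly repositioning second-channel objects on a wrap-around field, and that the correction is a simple subtraction. All names and test objects are hypothetical.

```python
import random


def square(x0, y0, size):
    """Hypothetical test object: a filled square of pixels."""
    return frozenset((x, y) for x in range(x0, x0 + size) for y in range(y0, y0 + size))


def overlap_fraction(obj_a, obj_b):
    """Fraction of obj_a's area that is covered by obj_b."""
    return len(obj_a & obj_b) / len(obj_a)


def colocalized_fraction(objs_a, objs_b, threshold=0.35):
    """Fraction of channel-A objects overlapping some channel-B object by more than threshold."""
    hits = sum(
        1 for a in objs_a
        if any(overlap_fraction(a, b) > threshold for b in objs_b)
    )
    return hits / len(objs_a)


def random_colocalization(objs_a, objs_b, width, height,
                          n_perm=200, seed=0, threshold=0.35):
    """Mean colocalized fraction after randomly repositioning channel-B objects."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_perm):
        moved = []
        for b in objs_b:
            dx, dy = rng.randrange(width), rng.randrange(height)
            # Shift the object, wrapping around the field edges.
            moved.append(frozenset(((x + dx) % width, (y + dy) % height) for x, y in b))
        total += colocalized_fraction(objs_a, moved, threshold)
    return total / n_perm


channel_a = [square(0, 0, 4), square(10, 10, 4)]
channel_b = [square(2, 0, 4), square(13, 10, 4)]  # 50% and 25% overlap with channel A
apparent = colocalized_fraction(channel_a, channel_b)
chance = random_colocalization(channel_a, channel_b, width=32, height=32)
corrected = apparent - chance
```

      With these test objects only the first pair exceeds the 35% threshold, so the apparent colocalization is 0.5 before the chance estimate is subtracted.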

      (12) Abstract, Introduction, and Discussion require substantial rewriting. a) The abstract should be accessible to a broader audience and should avoid using abbreviations and protein names without context. b) The introduction should better describe the cellular processes and proteins investigated in this study. c) The discussion currently reads more like an extended summary of results. It lacks deeper interpretation, comparison with existing literature, and consideration of the broader implications of the findings.

      We thank the reviewer for this suggestion. We will modify the abstract, introduction, and discussion as per the suggestion.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors report the structure of the human CTF18-RFC complex bound to PCNA. Similar structures (and more) have been reported by the O'Donnell and Li labs. This study should add to our understanding of CTF18-RFC in DNA replication and clamp loaders in general. However, there are numerous major issues that I recommend the authors fix. 

      Strengths: 

      The structures reported are strong and useful for comparison with other clamp loader structures that have been reported lately. 

      Comments on revisions: 

      The revised manuscript is greatly improved. The comparison with hRFC and the addition of direct PCNA loading data from the Hedglin group are particular highlights. I think this is a strong addition to the literature.

      We thank the reviewer for their positive comments.  

      I only have minor comments on the revised manuscript. 

      (1) The clamp loading kinetic data in Figure 6 would be more easily interpreted if the three graphs all had the same x axes, and if addition of RFC was t=0 rather than t=60 sec.

      We now analyze and plot EFRET as a function of time after complex addition, effectively setting the loader addition to t = 0 for each trace (Figure 6 and Figs S10-14 in the new manuscript). Baseline (Ymin) and plateau (Ymax) EFRET values were obtained by averaging the stable signal regions immediately before and after clamp-loader addition, respectively. Traces are normalized to their own dynamic range before fitting.
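
      The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the function name, the window boundaries, and the example trace are assumptions made for the sketch.

```python
def normalize_trace(times, efret, t_add, baseline_window, plateau_window):
    """Shift time so loader addition is t = 0 and scale EFRET to its own dynamic range.

    baseline_window and plateau_window are (start, end) tuples in the original
    time base; Ymin and Ymax are the mean EFRET values inside those windows.
    """
    def mean_in(window):
        lo, hi = window
        values = [y for t, y in zip(times, efret) if lo <= t <= hi]
        if not values:
            raise ValueError("window contains no data points")
        return sum(values) / len(values)

    y_min = mean_in(baseline_window)
    y_max = mean_in(plateau_window)
    if y_max == y_min:
        raise ValueError("trace has no dynamic range")
    shifted = [t - t_add for t in times]
    normalized = [(y - y_min) / (y_max - y_min) for y in efret]
    return shifted, normalized


# Hypothetical trace: loader added at t = 60 s, signal rising from 0.10 to 0.50.
times = [0, 30, 60, 70, 90, 150, 180]
efret = [0.10, 0.10, 0.10, 0.30, 0.40, 0.50, 0.50]
shifted, normalized = normalize_trace(
    times, efret, t_add=60, baseline_window=(0, 60), plateau_window=(150, 180)
)
```

      After this step every trace runs from about 0 to about 1 on a common time axis, so traces from different complexes can be overlaid directly.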

      (2) The authors' statement that "CTF18-RFC displayed a slightly faster rate than RFC" seems to me a bit misleading, even though this is technically correct. The two loaders have indistinguishable rate constants for the fast phase, and RFC is a bit slower than CTF18-RFC in the slow phase. However, the data also show that RFC is overall more efficient than CTF18-RFC at loading PCNA because much more flux goes through the fast phase (rel amplitudes 0.73 vs 0.36). Because the slow phase represents such a reduced fraction of loading events, the slight reduction in rate constant for the slow phase doesn't impact RFC's overall loading. And because the majority of loading events are in the fast phase, RFC has a faster halftime than CTF18-RFC. (Is it known what the different phases correspond to? If it is known, it might be interesting to discuss.)

      We removed the quoted statement. We avoid comparing amplitude partitions (A₁/A_T) for CTF18-RFC because (i) a substantial fraction of the reaction occurs within the <7 s dead time, and (ii) single- vs double-exponential identifiability differs across complexes. Instead, we report model-minimal progress times: RFC t<sub>0.5</sub> ≤ 7 s (faster onset), CTF18-RFC ~ 8 s, CTF18<sup>Δ165–194</sup>-RFC ~ 12 s; completion (t<sub>0.95</sub>): RFC ≈ 77 s, CTF18-RFC ≈ 77 s, mutant ≈ 145 s. This shows RFC has the steeper onset, while CTF18-RFC catches up in completion, and the mutant is slower overall. We briefly note that RFC’s phases have been assigned in prior stopped-flow work and are consistent with a rapid entry step and a slower repositioning/complex release phase; we do not assign phases for CTF18-RFC here and instead rely on model-minimal timing comparisons to avoid over-interpretation. 

      (3) AAA+ is an acronym for "ATPases Associated with diverse cellular Activities" rather than "Adenosine Triphosphatase Associated". 

      Corrected to ATPases Associated with diverse cellular Activities (AAA+).

      Reviewer #2 (Public review): 

      Summary 

      Briola and co-authors have performed a structural analysis of the human CTF18 clamp loader bound to PCNA. The authors purified the complexes and formed a complex in solution. They used cryo-EM to determine the structure to high resolution. The complex assumed an auto-inhibited conformation, where DNA binding is blocked, which is of regulatory importance and suggests that additional factors could be required to support PCNA loading on DNA. The authors carefully analysed the structure and compared it to RFC and related structures. 

      Strength & Weakness 

      Their overall analysis is of high quality, and they identified, among other things, a human-specific beta-hairpin in Ctf18 that flexibly tethers Ctf18 to Rfc2-5. Indeed, deletion of the beta-hairpin resulted in reduced complex stability and a reduction in a primer extension assay with Pol ε. Moreover, the authors identify that the Ctf18 ATP-binding domain assumes a more flexible organisation. 

      The data are discussed accurately and relevantly, which provides an important framework for rationalising the results. 

      All in all, this is a high-quality manuscript that identifies a key intermediate in CTF18-dependent clamp loading. 

      Comments on revisions: 

      The authors have done a nice job with the revision. 

      We thank the reviewer for their very positive comments.

      Reviewer #3 (Public review): 

      Summary: 

      CTF18-RFC is an alternative eukaryotic PCNA sliding clamp loader which is thought to specialize in loading PCNA on the leading strand. Eukaryotic clamp loaders (RFC complexes) have an interchangeable large subunit which is responsible for their specialized functions. The authors show that the CTF18 large subunit has several features responsible for its weaker PCNA loading activity, and that the resulting weakened stability of the complex is compensated by a novel beta hairpin backside hook. The authors show this hook is required for the optimal stability and activity of the complex. 

      Relevance: 

      The structural findings are important for understanding RFC enzymology and novel ways that the widespread class of AAA ATPases can be adapted to specialized functions. A better understanding of CTF18-RFC function will also provide clarity into aspects of DNA replication, cohesion establishment and the DNA damage response. 

      Strengths: 

      The cryo-EM structures are of high quality enabling accurate modelling of the complex and providing a strong basis for analyzing differences and similarities with other RFC complexes. 

      Weaknesses: 

      The manuscript would have benefited from a more detailed biochemical analysis using mutagenesis and assays to tease apart the differences with the canonical RFC complex. Analysis of the FRET assay could be improved. 

      Overall appraisal: 

      Overall, the work presented here is solid and important. The data is mostly sufficient to support the stated conclusions.

      We thank the reviewer for their mainly positive assessment. Following the reviewer's suggestion, we have re-analysed the FRET assay data and amended the manuscript accordingly.

      Comments on revisions: 

      While the authors addressed my previous specific concerns, they have now added a new experiment which raises new concerns. 

      The FRET clamp loading experiments (Fig. 6) appear to be overfitted so that the fitted values are unlikely to be robust and it is difficult to know what they mean, and this is not explained in this manuscript. Specifically, the contribution of two exponentials is floated in each experiment. By eye, CTF18-RFC looks much slower than RFC1-RFC (as also shown previously in the literature) but the kinetic constants and text suggest it is faster. This is because the contribution of the fast exponential is substantially decreased, and the rate constants then compensate for this. There is a similar change in contribution of the slow and fast rates between WT CTF18 and the variant (where the data curves look the same) and this has been balanced out by a change in the rate constants, which is then interpreted as a defect. I doubt the data are strong enough to confidently fit all these co-dependent parameters, especially for CTF18, where a fast initial phase is not visible. I would recommend either removing this figure or doing a more careful and thorough analysis. 

      We appreciate the reviewer’s concern regarding potential overfitting of the kinetic data in Figure 6. To address this, we performed a model-minimal re-analysis designed specifically to avoid parameter covariance and over-interpretation (Figure 6 and Figs S11-14 in the new manuscript). Only data recorded after the instrument’s <7 s dead time were included in the fits, thereby excluding the partially obscured early region of the reaction. For each clamp loader complex, we selected the minimal kinetic model that produced residuals randomly distributed about zero. This approach yielded a single-exponential fit for CTF18-RFC, whereas RFC and CTF18<sup>Δ165–194</sup>-RFC required double-exponential fits; single-exponential models for the latter two complexes left structured residuals, clearly indicating the presence of an additional kinetic phase.
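
      For reference, the two candidate models named above can be written out explicitly. The forms below are the standard single- and double-exponential progress curves for a normalized signal rising from 0 to 1. The symbols (k, k_1, k_2, A_1, r_i) are notation assumed for this sketch, and the exact parameterization used in the manuscript may differ.

```latex
\begin{align}
  % Single-exponential progress with rate constant k
  F_1(t) &= 1 - e^{-k t} \\
  % Double-exponential progress; A_1 is the fast-phase fraction, 0 \le A_1 \le 1
  F_2(t) &= 1 - A_1 e^{-k_1 t} - (1 - A_1)\, e^{-k_2 t}, \qquad k_1 > k_2 \\
  % Residual at each recorded time point after the dead time
  r_i &= y_i - F(t_i), \qquad t_i > t_{\mathrm{dead}}
\end{align}
```

      Under this notation, choosing the minimal model means accepting F_1 when its residuals r_i scatter randomly about zero, and moving to F_2 only when F_1 leaves structured residuals.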

      Rather than relying on co-dependent amplitude and rate parameters, we quantified the reactions by reporting progress times (t<sub>0.5</sub>, t<sub>0.90</sub>, t<sub>0.95</sub>), which provide a model-independent measure of reaction speed. This directly addresses the reviewer’s concern and allows a fair comparison of the relative kinetics among the complexes.
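
      One way to read progress times off a normalized trace without fitting a model is sketched below. It assumes the trace has already been scaled to run from 0 to 1 and uses linear interpolation between recorded points; the function name and the example traces are hypothetical. When the first recorded point is already past the requested fraction, as can happen with an instrument dead time, the function returns that first time flagged as an upper bound.

```python
def progress_time(times, progress, fraction):
    """Return (time, is_upper_bound) at which progress first reaches `fraction`.

    Returns (None, False) if the trace never reaches the fraction.
    """
    if not times or len(times) != len(progress):
        raise ValueError("times and progress must be non-empty and equal in length")
    if progress[0] >= fraction:
        # The crossing happened before the first recorded point.
        return times[0], True
    for i in range(1, len(times)):
        if progress[i] >= fraction:
            t0, t1 = times[i - 1], times[i]
            p0, p1 = progress[i - 1], progress[i]
            # Linear interpolation between the two bracketing points.
            return t0 + (fraction - p0) * (t1 - t0) / (p1 - p0), False
    return None, False


# Hypothetical normalized traces; recording starts at 7 s.
times = [7, 10, 20, 40, 80]
slow_onset = [0.4, 0.5, 0.7, 0.9, 1.0]
fast_onset = [0.6, 0.7, 0.8, 0.9, 1.0]

t50_slow, bound_slow = progress_time(times, slow_onset, 0.5)
t95_slow, _ = progress_time(times, slow_onset, 0.95)
t50_fast, bound_fast = progress_time(times, fast_onset, 0.5)
```

      In the second trace half of the signal change has already occurred at the first recorded point, so only a bound on the half-time (at most 7 s) can be reported.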

      From this analysis, RFC exhibited the fastest onset (t<sub>0.5</sub> ≤ 7 s; lower bound), while CTF18-RFC and CTF18<sup>Δ165–194</sup>-RFC showed progressively slower half-times of approximately 8 s and 12 s, respectively. Completion times further emphasized these differences: both RFC and CTF18-RFC reached 95 % completion at ~77 s, whereas the mutant required ~145 s. Despite these kinetic distinctions, CTF18-RFC and its β-hairpin deletion mutant achieved similar EFRET plateaus, indicating that the mutation slows reaction progression but does not reduce the overall extent of PCNA loading.

      Finally, we emphasize that our interpretation is deliberately conservative. We do not assign distinct kinetic phases to CTF18-RFC, as their molecular basis remains unresolved. RFC’s phases have been characterized in prior stopped-flow studies, but CTF18-RFC likely follows a distinct or simplified pathway. Our conclusions are thus limited to what the data unambiguously support: deletion of the Ctf18 β-hairpin decreases the rate—but not the extent—of PCNA loading, consistent with the reduced stimulation of Pol ε primer extension observed under single-turnover conditions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors explore a novel concept: GPCR-mediated regulation of miRNA release via extracellular vesicles (EVs). They perform an EV miRNA cargo profiling approach to investigate how specific GPCR activations influence the selective secretion of particular miRNAs. Given that GPCRs are highly diverse and orchestrate multiple cellular pathways - either independently or collectively - to regulate gene expression and cellular functions under various conditions, it is logical to expect alterations in gene and miRNA expression within target cells.

      Strengths:

      The novel idea of GPCRs-mediated control of EV loading of miRNAs.

      Weaknesses:

      Incomplete findings failed to connect and show evidence of any physiological parameters that are directly related to the observed changes. The mechanical detail is lacking.

      We appreciate the reviewer's acknowledgment of the novelty of this study. We agree with the reviewer that further mechanistic insights would strengthen the manuscript. The mechanisms by which miRNA is sorted into EVs remain poorly understood. Various factors, including RNA-binding proteins, sequence motifs, and cellular location, can influence this sorting process (Garcia-Martin et al., 2022; Liu & Halushka, 2025; Villarroya-Beltri et al., 2013; Yoon et al., 2015). Ago2, a key component of the RNA-induced silencing complexes, binds to miRNA and facilitates miRNA sorting. Ago2 has been found in EVs and can be regulated by cellular signaling pathways. For instance, McKenzie et al. demonstrated that KRAS-dependent activation of MEK-ERK can phosphorylate Ago2 protein, thereby regulating the sorting of specific miRNAs into EVs (McKenzie et al., 2016). In differentiated PC12 cells, Gαq activation leads to the formation of Ago2-associated granules, which selectively sequester unique transcripts (Jackson et al., 2022). Investigating the effects of GPCRs, G proteins, and GPCR signaling on Ago2 expression, localization, and phosphorylation state could provide valuable insights into how GPCRs regulate specific miRNAs within EVs. We have expanded on these potential mechanisms and on future research in the discussion section (pages 16-17).

      The manuscript falls short of providing a comprehensive understanding. Identifying changes in cellular and EV-associated miRNAs without elucidating their physiological significance or underlying regulatory mechanisms limits the study's impact. Without demonstrating whether these miRNA alterations have functional consequences, the findings alone are insufficient. The findings may be suitable for more specialized journals.

      Thank you for the feedback. We acknowledge that validating the target genes of the top candidate miRNAs is an important next step. In response to the reviewer's concerns, we have expanded the discussion of future research in the manuscript (page 19-20). Although this initial study is primarily descriptive, it establishes a novel conceptual link between GPCR signaling and EV-mediated communication.

      Furthermore, a critical analysis of the relationship between cellular miRNA levels and EV miRNA cargo is essential. Specifically, comparing the intracellular and EV-associated miRNA pools could reveal whether specific miRNAs are preferentially exported, a behavior that should be inversely related to their cellular abundance if export serves a beneficial function by reducing intracellular levels. This comparison is vital to strengthen the biological relevance of the findings and support the proposed regulatory mechanisms by GPCRs.

      We appreciate the valuable suggestions from the reviewer. EV miRNAs and cellular miRNAs may exhibit distinct profiles, as miRNAs can be selectively sorted into or excluded from EVs (Pultar et al., 2024; Teng et al., 2017; Zubkova et al., 2021). Investigating the difference between cellular miRNA levels and EV miRNA cargo would provide insight into the mechanism of miRNA sorting and the functions of miRNAs in the recipient cells. The expression of cellular miRNAs is a highly dynamic process. To accurately compare miRNA expression levels, profiling of EV miRNA and cellular miRNA should be conducted simultaneously. However, because this was an exploratory study, we were unable to measure the cellular miRNAs without repeating the entire experiment.

      Reviewer #2 (Public review):

      Summary:

      This study examines how activating specific G protein-coupled receptors (GPCRs) affects the microRNA (miRNA) profiles within extracellular vesicles (EVs). The authors seek to identify whether different GPCRs produce unique EV miRNA signatures and what these signatures could indicate about downstream cellular processes and pathological processes.

      Methods:

      (1) Used U2OS human osteosarcoma cells, which naturally express multiple GPCR types.

      (2) Stimulated four distinct GPCRs (ADORA1, HRH1, FZD4, ACKR3) using selective agonists.

      (3) Isolated EVs from culture media and characterized them via size exclusion chromatography, immunoblotting, and microscopy.

      (4) Employed qPCR-based miRNA profiling and bioinformatics analyses (e.g., KEGG, PPI networks) to interpret expression changes.

      Key Findings:

      (1) No significant change in EV quantity or size following GPCR activation.

      (2) Each GPCR triggered a distinct EV miRNA expression profile.

      (3) miRNAs differentially expressed post-stimulation were linked to pathways involved in cancer, insulin resistance, neurodegenerative diseases, and other physiological/pathological processes.

      (4) miRNAs such as miR-550a-5p, miR-502-3p, miR-137, and miR-422a emerged as major regulators following specific receptor activation.

      Conclusions:

      The study offers evidence that GPCR activation can regulate intercellular communication through miRNAs encapsulated within extracellular vesicles (EVs). This finding paves the way for innovative drug-targeting strategies and enhances understanding of drug side effects that are mediated via GPCR-related EV signaling.

      Strengths:

      (1) Innovative concept: The idea of linking GPCR signaling to EV miRNA content is novel and mechanistically important.

      (2) Robust methodology: The use of multiple validation methods (biochemical, biophysical, and statistical) lends credibility to the findings.

      (3) Relevance: GPCRs are major drug targets, and understanding off-target or systemic effects via EVs is highly valuable for pharmacology and medicine.

      Weaknesses:

      (1) Sample Size & Scope: The analysis included only four GPCRs. Expanding to more receptor types or additional cell lines would enhance the study's applicability.

      We are encouraged that the reviewer recognized the novelty, methodological rigor, and significance of our work. We recognize the limitations of our current model system and emphasize the need to test additional GPCR families and cell lines in the future studies, as detailed in the discussion section (Page 19, second paragraph).

      (2) Exploratory Nature: This study is primarily descriptive and computational. It lacks functional validation, such as assessing phenotypic effects in recipient cells, which is acknowledged as a future step.

      We appreciate the feedback. We recognize the importance of validating the function of the top candidate miRNAs in the recipient cells, and this will be included in future studies (pages 19-20).

      (3) EV heterogeneity: The authors recognize that they did not distinguish EV subpopulations, potentially confounding the origin and function of miRNAs.

      Thank you for the comment. EV isolation and purification are major challenges in EV research. Current isolation techniques are often ineffective at separating vesicles produced by different biogenetic pathways. Furthermore, the lack of specific markers to differentiate EV subtypes adds to this complexity. We recognize that the presence of various subpopulations can complicate the interpretation of EV cargos. In our study, we used a combined approach of ultrafiltration followed by size-exclusion chromatography to achieve a balance between EV purity and yield. We adhere to the MISEV2023 (Minimal Information for Studies of Extracellular Vesicles 2023) guidelines by reporting detailed isolation methods, assessing both positive and negative protein markers, and characterizing EVs by electron microscopy to confirm vesicle structure, as well as by nanoparticle tracking analysis to verify particle size distribution (Welsh et al., 2024). Following these guidelines helps ensure the quality of our study and makes our findings easier to compare with those of other studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Suggestions for Future Research:

      (1) Functionally validate top candidate miRNAs in recipient cells.

      We acknowledge that validating the target genes of the top candidate miRNAs is a crucial next step. In response to the reviewer's concerns, we have included this in the discussion as future research in the manuscript (pages 19-20).

      (2) Investigate other GPCR families and repeat in primary or disease-relevant cell lines.

      The inclusion of different GPCRs and cell lines is suggested as an area for further investigation in the discussion. (Page 19).

      (3) Apply similar approaches in in vivo models or patient samples to assess clinical relevance.

      In response to the reviewer's concerns, we have included this in the discussion as future research in the manuscript (pages 19-20).

      References

      Garcia-Martin, R., Wang, G., Brandão, B. B., Zanotto, T. M., Shah, S., Kumar Patel, S., Schilling, B., & Kahn, C. R. (2022). MicroRNA sequence codes for small extracellular vesicle release and cellular retention. Nature, 601(7893), 446-451. https://doi.org/10.1038/s41586-021-04234-3

      Jackson, L., Rennie, M., Poussaint, A., & Scarlata, S. (2022). Activation of Gαq sequesters specific transcripts into Ago2 particles. Sci Rep, 12(1), 8758. https://doi.org/10.1038/s41598-022-12737-w

      Liu, X.-M., & Halushka, M. K. (2025). Beyond the Bubble: A Debate on microRNA Sorting Into Extracellular Vesicles. Laboratory Investigation, 105(2), 102206. https://doi.org/10.1016/j.labinv.2024.102206  

      McKenzie, A. J., Hoshino, D., Hong, N. H., Cha, D. J., Franklin, J. L., Coffey, R. J., Patton, J. G., & Weaver, A. M. (2016). KRAS-MEK Signaling Controls Ago2 Sorting into Exosomes. Cell Rep, 15(5), 978-987. https://doi.org/10.1016/j.celrep.2016.03.085

      Pultar, M., Oesterreicher, J., Hartmann, J., Weigl, M., Diendorfer, A., Schimek, K., Schädl, B., Heuser, T., Brandstetter, M., Grillari, J., Sykacek, P., Hackl, M., & Holnthoner, W. (2024). Analysis of extracellular vesicle microRNA profiles reveals distinct blood and lymphatic endothelial cell origins. J Extracell Biol, 3(1), e134. https://doi.org/10.1002/jex2.134

      Teng, Y., Ren, Y., Hu, X., Mu, J., Samykutty, A., Zhuang, X., Deng, Z., Kumar, A., Zhang, L., Merchant, M. L., Yan, J., Miller, D. M., & Zhang, H.-G. (2017). MVP-mediated exosomal sorting of miR-193a promotes colon cancer progression. Nature Communications, 8(1), 14448. https://doi.org/10.1038/ncomms14448  

      Villarroya-Beltri, C., Gutiérrez-Vázquez, C., Sánchez-Cabo, F., Pérez-Hernández, D., Vázquez, J., Martin-Cofreces, N., Martinez-Herrera, D. J., Pascual-Montano, A., Mittelbrunn, M., & Sánchez-Madrid, F. (2013). Sumoylated hnRNPA2B1 controls the sorting of miRNAs into exosomes through binding to specific motifs. Nat Commun, 4, 2980. https://doi.org/10.1038/ncomms3980

      Welsh, J. A., Goberdhan, D. C. I., O'Driscoll, L., Buzas, E. I., Blenkiron, C., Bussolati, B., Cai, H., Di Vizio, D., Driedonks, T. A. P., Erdbrügger, U., Falcon-Perez, J. M., Fu, Q. L., Hill, A. F., Lenassi, M., Lim, S. K., Mahoney, M. G., Mohanty, S., Möller, A., Nieuwland, R., ... Witwer, K. W. (2024). Minimal information for studies of extracellular vesicles (MISEV2023): From basic to advanced approaches. J Extracell Vesicles, 13(2), e12404. https://doi.org/10.1002/jev2.12404

      Yoon, J. H., Jo, M. H., White, E. J., De, S., Hafner, M., Zucconi, B. E., Abdelmohsen, K., Martindale, J. L., Yang, X., Wood, W. H., 3rd, Shin, Y. M., Song, J. J., Tuschl, T., Becker, K. G., Wilson, G. M., Hohng, S., & Gorospe, M. (2015). AUF1 promotes let-7b loading on Argonaute 2. Genes Dev, 29(15), 1599-1604. https://doi.org/10.1101/gad.263749.115  

      Zubkova, E., Evtushenko, E., Beloglazova, I., Osmak, G., Koshkin, P., Moschenko, A., Menshikov, M., & Parfyonova, Y. (2021). Analysis of MicroRNA Profile Alterations in Extracellular Vesicles From Mesenchymal Stromal Cells Overexpressing Stem Cell Factor. Front Cell Dev Biol, 9, 754025. https://doi.org/10.3389/fcell.2021.754025

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Significance:

      While most MAVEs measure overall function (which is a complex integration of biochemical properties, including stability), VAMP-seq-type measurements more strongly isolate stability effects in a cellular context. This work seeks to create a simple model for predicting the response for a mutation on the "abundance" measurement of VAMP-seq.

      We thank the reviewer for their evaluation of our work and for their comments and feedback below.

      Of course, there is always another layer of the onion, VAMP-seq measures contributions from isolated thermodynamic stability, stability conferred by binding partners (small molecule and protein), synthesis/degradation balance (especially important in "degron" motifs), etc. Here the authors' goal is to create simple models that can act as a baseline for two main reasons:

      (1) how to tell when adding more information would be helpful for a global model;

      (2) how to detect when a residue/mutation has an unusual profile indicative of an unbalanced contribution from one of the factors listed above.

      As such, the authors state that this manuscript is not intended to be a state-of-the-art method in variant effect prediction, but rather a direction towards considering static structural information for the VAMP-seq effects. At its core, the method is a fairly traditional asymmetric substitution matrix (I was surprised not to see a comparison to BLOSUM in the manuscript) - and shows that a subdivision by burial makes the model much more predictive. Despite only having 6 datasets, they show predictive power even when the matrices are based on a smaller number. Another success is rationalizing the VAMP-seq results on relevant oligomeric states.

      We thank the reviewer for their summary of the main points of our work. Based on the suggestion by the reviewer, we have added a comparison to predictions with BLOSUM62 to our revised manuscript, noting that we have previously compared the BLOSUM62 matrix to a broader and more heterogeneous set of scores generated by MAVEs (Høie et al, 2022).

      Specific Feedback:

      Major points:

      The authors spend a good amount of space discussing how the six datasets have different distributions in abundance scores. After the development of their model is there more to say about why? Is there something that can be leveraged here to design maximally informative experiments?

      We believe that these effects arise from a combination of intrinsic differences between the systems and assay-specific effects. For example, biophysical differences between the systems, such as differences in absolute folding stabilities or melting temperatures, will play a role, as will the fact that some proteins contain multiple domains.

      Also, the sequencing-based score for an individual variant in a sort-seq experiment (such as VAMP-seq) depends both on the properties of that variant and on the composition of the entire FACS-sorted cell library. This is because cells are sorted into bins depending on the composition of the entire library, which means that library-to-library composition differences can contribute to the differences between VAMP-seq score distributions. 
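
      The library-composition effect described above can be illustrated with a toy simulation. This is a minimal sketch and not the authors' pipeline: it assumes that the four sorting gates sit at the quartiles of the whole library's signal and that a variant's score is the mean bin index of its cells rescaled to 0..1. All function names and numbers are hypothetical.

```python
import random
import statistics

def quartile_gates(library_signal):
    """Three cut points that split the whole library into four equal bins."""
    return statistics.quantiles(library_signal, n=4)

def bin_index(value, gates):
    """Bin 0 (lowest signal) to bin 3 (highest signal)."""
    return sum(value > g for g in gates)

def variant_score(cells, gates):
    """Mean bin index of a variant's cells, rescaled to the range 0..1."""
    return statistics.mean(bin_index(c, gates) for c in cells) / 3

rng = random.Random(0)
# The same variant (hypothetical log-signal around 2.0) placed in two libraries.
variant_cells = [rng.gauss(2.0, 0.3) for _ in range(500)]
# Library A is dominated by low-signal variants, library B by high-signal ones.
library_a = [rng.gauss(1.0, 0.5) for _ in range(5000)] + variant_cells
library_b = [rng.gauss(3.0, 0.5) for _ in range(5000)] + variant_cells

score_a = variant_score(variant_cells, quartile_gates(library_a))
score_b = variant_score(variant_cells, quartile_gates(library_b))
print(f"score in library A: {score_a:.2f}, score in library B: {score_b:.2f}")
```

      In this toy setting the identical variant scores high in one library and low in the other, purely because the gates move with the rest of the library.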

      From our developed models and outliers in predictions from these, it is difficult to tell which of the several possible underlying reasons cause the differences. We have briefly expanded the discussion of these points in the manuscript, and we have moreover elaborated on this in subsequent work (Schulze et al., 2025).

      They compare to one more "sophisticated model" - Rosetta ddG - which should be more correlated with thermodynamic stability than other factors measured by VAMP-seq. However, the direct head-to-head comparison between their matrices and ddG is underdeveloped. How can this be used to dissect cases where thermodynamics are not contributing to specific substitution patterns OR in specific residues/regions that are predicted by one method better than the other? This would naturally dovetail into whether there is orthogonal information between these two that could be leveraged to create better predictions.

      We thank the reviewer for this suggestion and indeed had spent substantial effort trying to gain additional biological insights from variants for which MAVE scores or MAVE predictions do not match predicted ∆∆G values. One major caveat in this analysis is that the experimental MAVE scores, MAVE predictions and the predicted ∆∆G values are rather noisy, making it difficult to draw conclusions based on individual variants or even small subsets of variants.

      In our revised manuscript, we have added an analysis to discover residue substitution profiles that are predicted most accurately either by a ∆∆G model or by our substitution matrix model, thereby avoiding analysis of individual variant effect scores. 

      We find that many substitution profiles are predicted equally well by the two model types, but also that there are residues for which one method predicts substitution effects better than the other method. We have added an analysis of the characteristics of the residues and variants for which either the ∆∆G model or the substitution matrix model is most useful to rank variants. Since we only find relatively few residues for which this is the case, we do not expect a model that leverages predicted scores from both methods to perform better than ThermoMPNN across variants. 

      Perhaps beyond the scope of this baseline method, there is also ThermoMPNN and the work from Gabe Rocklin to consider as other approaches that should be more correlated only with thermodynamics.

      We acknowledge that there are other approaches to predict ∆∆G beyond Rosetta including for example ThermoMPNN and our own method called RaSP (Blaabjerg et al, eLife, 2023), and we have added comparisons to ThermoMPNN and RaSP in the revised manuscript. We are unsure how one would use the data from Rocklin and colleagues directly, but we note that e.g. RaSP has been benchmarked on this data and other methods have been trained on this data. We originally used Rosetta since the Rosetta model is known to be relatively robust and because it has never seen large databases during training (though we do not think that training of ThermoMPNN and RaSP would be biased towards the VAMP-seq data). We note also that we have previously compared both Rosetta calculations and RaSP with VAMP-seq data for TPMT, PTEN and NUDT15 (Blaabjerg et al, eLife, 2023).

      I find myself drawn to the hints of a larger idea that outliers to this model can be helpful in identifying specific aspects of proteostasis. The discussion of S109 is great in this respect, but I can't help but feel there is more to be mined from Figure S9 or other analyses of outlier higher than predicted abundance along linear or tertiary motifs.

      We agree with these points and have previously spent substantial time trying to make sense of outliers in Figure S9 and Figure S18 (Figure S8 and Figure S18 of revised manuscript). The outlier analysis was challenging, in part due to the relatively high noise levels in both experimental data and predictions, and we did not find any clear signals. Some outliers in e.g. Figure S9 are very likely the result of dataset-specific abundance score distributions, which further complicates the outlier analysis. We now note this in the revised paper and hope others will use the data to gain additional insights on proteostasis-specific effects.  

      Reviewer #2 (Public review):

      Summary:

      This study analyzes protein abundance data from six VAMP-seq experiments, comprising over 31,000 single amino acid substitutions, to understand how different amino acids contribute to maintaining cellular protein levels. The authors develop substitution matrices that capture the average effect of amino acid changes on protein abundance in different structural contexts (buried vs. exposed residues). Their key finding is that these simple structure-based matrices can predict mutational effects on abundance with accuracy comparable to more complex physics-based stability calculations (ΔΔG).

      Major strengths:

      (1) The analysis focuses on a single molecular phenotype (abundance) measured using the same experimental approach (VAMP-seq), avoiding confounding factors present when combining data from different phenotypes (e.g., mixing stability, activity, and fitness data) or different experimental methods.

      (2) The demonstration that simple structural features (particularly solvent accessibility) can capture a significant portion of mutational effects on abundance.

      (3) The practical utility of the matrices for analyzing protein interfaces and identifying functionally important surface residues.

      We thank the reviewer for the comments above and the detailed assessment of our work.

      Major weaknesses:

      (1) The statistical rigor of the analysis could be improved. For example, when comparing exposed vs. buried classification of interface residues, or when assessing whether differences between prediction methods are significant.

      We agree with the reviewer that it is useful to determine if interface residues (or any of the residues in the six proteins) can confidently be classified as buried- or exposed-like in terms of their substitution profiles. Thus, we have expanded our approach to compare individual substitution profiles to the average profiles of buried and exposed residues to now account for the noise in the VAMP-seq data. In our updated approach, we resample the abundance score substitution profile for every residue several thousand times based on the experimental VAMP-seq scores and score standard deviations, and we then compare every resampled profile to the average profiles for buried and exposed residues, thereby obtaining residue-specific distributions of RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> values. These RMSD distributions are typically narrow, since many variants in several datasets have small standard deviations. In the revised manuscript, we report a residue to have e.g. a buried-like substitution profile if RMSD<sub>buried</sub> < RMSD<sub>exposed</sub> for at least 95% of the resampled profiles. We do not recalculate average scores in substitution matrices for this analysis.
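
      The resampling procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: noise is drawn from a Gaussian with the reported standard deviation, the average buried and exposed profiles are passed in as fixed lists, and the function names, default values and toy numbers are hypothetical.

```python
import math
import random

def rmsd(profile, reference):
    """Root-mean-square deviation between two equally long score profiles."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(profile, reference)) / len(profile))

def classify_residue(scores, sds, avg_buried, avg_exposed,
                     n_resamples=5000, confidence=0.95, seed=0):
    """Label one residue as 'buried-like', 'exposed-like' or 'ambiguous'.

    scores, sds: measured abundance scores and their standard deviations for
    the substitutions at this residue. avg_buried, avg_exposed: average
    profiles for the same substitutions (kept fixed, not recalculated).
    """
    rng = random.Random(seed)
    buried_wins = 0
    for _ in range(n_resamples):
        # One noisy copy of the profile; Gaussian noise is an assumption here.
        sample = [rng.gauss(s, sd) for s, sd in zip(scores, sds)]
        if rmsd(sample, avg_buried) < rmsd(sample, avg_exposed):
            buried_wins += 1
    frac_buried = buried_wins / n_resamples
    if frac_buried >= confidence:
        return "buried-like", frac_buried
    if 1 - frac_buried >= confidence:
        return "exposed-like", frac_buried
    return "ambiguous", frac_buried

# Toy profile with four substitutions and small measurement noise.
label, fraction = classify_residue(
    scores=[0.2, 0.1, 0.3, 0.9], sds=[0.05, 0.05, 0.05, 0.05],
    avg_buried=[0.25, 0.2, 0.3, 0.8], avg_exposed=[0.9, 0.85, 0.9, 1.0])
print(label, fraction)
```

      Requiring agreement in at least 95% of resampled profiles leaves noisy residues unclassified instead of forcing them into one of the two groups.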

      Moreover, to illustrate potential overlap in predictive performance between prediction methods more clearly than in our preprint, we have added confidence intervals in Fig. 2 and Fig. 3 of the revised manuscript. We note that the analysis in Fig. 2 is performed using a leave-one-protein-out approach, which we believe provides the cleanest assessment of how well the different models perform.
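
      The response does not state how the confidence intervals were computed. One common option is a percentile bootstrap over variants, sketched below for a Spearman rank correlation between measured and predicted scores of a held-out protein. The choice of bootstrap, the function names and the toy data are all assumptions made for illustration.

```python
import random

def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # degenerate resample with no variation
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))

def bootstrap_ci(measured, predicted, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the Spearman correlation."""
    rng = random.Random(seed)
    n = len(measured)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample variants with replacement
        stats.append(spearman([measured[i] for i in idx], [predicted[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Toy held-out protein: predictions are the measured scores plus noise.
rng = random.Random(1)
measured = [rng.gauss(0.0, 1.0) for _ in range(200)]
predicted = [m + rng.gauss(0.0, 0.7) for m in measured]
print(spearman(measured, predicted), bootstrap_ci(measured, predicted, n_boot=500))
```

      In a leave-one-protein-out setting, the predictions passed in would come from matrices built without the held-out protein, so that the interval reflects performance on unseen data.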

      (2) The mechanistic connection between stability and abundance is assumed rather than explained or investigated. For instance, destabilizing mutations might decrease abundance through protein quality control, but other mechanisms like degron exposure could also be at play.

      We agree that we have not provided much description of the relation between stability and abundance in our original preprint. In the revised manuscript, we provide some more detail as well as references to previous literature explaining the ways in which destabilising mutations can cause degradation. We have moreover performed and added additional analyses of the relationship between thermodynamic stability and abundance through comparisons of stability predictions and predictions performed with our substitution matrix models.

      (3) The similar performance of simple matrix-based and complex physics-based predictions calls for deeper analysis. A systematic comparison of where these approaches agree or differ could illuminate the relationship between stability and abundance. For instance, buried sites showing exposed-like behavior might indicate regions of structural plasticity, while the link between destabilization and degradation might involve partial unfolding exposing typically buried residues. The authors have all the necessary data for such analysis but don't fully exploit this opportunity.

      This is similar to a point made by reviewer 1, and our answer is similar. We were indeed hoping that our analyses would have revealed clearer differences between effects on thermodynamic protein stability and cellular abundance and have tried to find clear signals. One major caveat in performing the suggested analysis is that the experimental MAVE scores, the ∆∆G predictions and our simple matrix-based predictions are all rather noisy, making it difficult to draw conclusions based on individual variants or even small subsets of variants.

      To address this point, we have added an analysis to discover residue substitution profiles that are predicted most accurately either by a ∆∆G model or by our substitution matrix model, thereby avoiding analysis of individual variant effect scores. We find that many substitution profiles are predicted equally well by the two model types, but we also, in particular, find solvent-exposed residues for which the substitution matrix model is the better predictor. These residues are often aspartate, glutamate and proline, suggesting that surface-level substitutions of these amino acid types can often have effects that are not captured well by a thermodynamic model, either because this model does not describe thermodynamic effects perfectly, or because in-cell effects must be accounted for to provide an accurate description.

      (4) The pooling of data across proteins to construct the matrices needs better justification, given the observed differences in score distributions between proteins (for example, PTEN's distribution is shifted towards high abundance scores while ASPA and PRKN show more binary distributions).

      We agree with the reviewer that the differences between the score distributions are important to investigate further and keep in mind when analysing e.g. prediction outliers. However, our results show that the pooling of VAMP-seq scores across proteins does result in substitution matrices that make sense biochemically and can identify outlier residues with proteostatic functions. As we also respond to a related point by reviewer 1, the differences in score distributions likely have complex origins. In that sense, we also hope that our results can inspire experimentalists to design methods to generate data that are more comparable across proteins.

      For example, biophysical differences between the systems, such as differences in absolute folding stabilities or melting temperatures, will play a role, as will the fact that some proteins contain multiple domains. Also, the sequencing-based score for an individual variant in a sort-seq experiment (such as VAMP-seq) depends both on the properties of that variant and on the composition of the entire FACS-sorted cell library. This is because cells are sorted into bins depending on the composition of the entire library, which means that library-to-library composition differences can contribute to the differences between VAMP-seq score distributions. From our developed models and outliers in predictions from these, it is difficult to tell which of the several possible underlying reasons cause the differences.

      Thus, even when experiments on different proteins are performed using the same technique (VAMP-seq), quantifying the same phenomenon (cellular abundance) and done in similar ways (saturation mutagenesis, sort-seq using four FACS bins), there can still be substantial differences in the results across different systems. An interesting side result of our work is to highlight this, including how such variation makes it difficult to learn across experiments. We now elaborate on these points in the revised manuscript.

      (5) Some key methodological choices require better justification. For example, combining "to" and "from" mutation profiles for PCA despite their different behaviors, or using arbitrary thresholds (like 0.05) for residue classification.

      We hope we have explained our methodological choices more clearly in the revised paper.

      We removed the dependency on the threshold of 0.05 used for residue classification in Fig. S19 of the original manuscript; in the revised manuscript we only report a residue to have e.g. a buried-like substitution profile if RMSD<sub>buried</sub> < RMSD<sub>exposed</sub> for at least 95% of the abundance score profiles that we resampled according to VAMP-seq score noise levels, as explained above.

      With respect to combining “to” and “from” mutational profiles for PCA, we could have also chosen to analyse these two sets of profiles separately to take potentially different behaviours along the two mutational axes into account. We do not think that there should be anything wrong with concatenating the two sets of profiles in a single analysis, since the analysis on the concatenated profiles simply expresses amino acid similarities and differences in a more general manner.

      The authors largely achieve their primary aim of showing that simple structural features can predict abundance changes. However, their secondary goal of using the matrices to identify functionally important residues would benefit from more rigorous statistical validation. While the matrices provide a useful baseline for abundance prediction, the paper could offer deeper biological insights by investigating cases where simple structure-based predictions differ from physics-based stability calculations.

      This work provides a valuable resource for the protein science community in the form of easily applicable substitution matrices. The finding that such simple features can match more complex calculations is significant for the field. However, the work's impact would be enhanced by a deeper investigation of the mechanistic implications of the observed patterns, particularly in cases where abundance changes appear decoupled from stability effects.

      We agree that disentangling stability and other effects on cellular abundance is one of the goals of this work. As discussed above, it has been difficult to find clear cases where amino acid substitutions affect abundance without stability beyond for example the (rare) effects of creating surface exposed degrons. Our new analysis, in which we compare substitution matrix-based predictions to stability predictions, does offer deeper insight into the relationship between the two predictor types and hence possibly between folding stability and abundance. 

      Reviewer #3 (Public review): 

      "Effects of residue substitutions on the cellular abundance of proteins" by Schulze and Lindorff-Larsen revisits the classical concept of structure-aware protein substitution matrices through the scope of modern protein structure modelling approaches and comprehensive phenotypic readouts from multiplex assays of variant effects (MAVEs). The authors explore 6 unique protein MAVE datasets based on protein abundance (and thus stability) by utilizing structural information, specifically residue solvent accessibility and secondary structure type, to derive combinations of context-specific substitution matrices predicting variant abundance. They are clear to outline that the aim of the study is not to produce a new best abundance predictor but to showcase the degree of prediction afforded simply by utilizing information on residue accessibility. The performance of their matrices is robustly evaluated using a leave-one-out approach, where the abundance effects for a single protein are predicted using the remaining datasets. Using a simple classification of buried and solvent-exposed residues, and substitution matrices derived respectively for each residue group, the authors convincingly demonstrate that taking structural solvent accessibility contexts into account leads to more accurate performance than either a structureunaware matrix, secondary structure-based matrix, or matrices combining both solvent accessibility or secondary structure. Interestingly, it is shown that the performance of the simple buried and exposed residue substitution matrices for predicting protein abundance is on par with Rosetta, an established and specialized protein variant stability predictor. 
More importantly, the authors finish off the paper by demonstrating the utility of the two matrices to identify surface residues that have buried-like substitution profiles, that are shown to correspond to protein interface residues, posttranslational modification sites, functional residues, or putative degrons.

      Strengths:

      The paper makes a strong and well-supported main point, demonstrating the utility of the authors' approach through performance comparisons with alternative substitution matrices and specialized methods alike. The matrices are rigorously evaluated without introducing bias, exploring various combinations of protein datasets. Supplemental analyses are extremely comprehensive and detailed. The applicability of the substitution matrices is explored beyond abundance prediction and could have important implications in the future for identifying functionally relevant sites.

      We thank the reviewer for the supportive comments on our work. 

      Comments:

      (1) A wider discussion of the possible reasons why matrices for certain proteins seem to correlate better than others would be extremely interesting, touching upon possible points like differences or similarities in local environments, degradation pathways, posttranslational modifications, and regulation. While the initial data structure differences provide a possible explanation, Figure S17A, B correlations show a more complicated picture.

      We agree with the reviewer that biochemical and biophysical differences between the proteins might contribute to the fact that some matrices correlate better than others. We also agree that it would be very interesting to understand these differences better. While it might be possible to examine some of the suggested causes of the differences, like differences or similarities in local environments, we have generally found that noise and differences in score distributions make such analyses difficult (see also responses to reviewers 1 and 2). For now, we will defer additional analyses to future work.

      (2) The performance analysis in Figure 2D seems to show that for particular proteins "less is more" when it comes to which datasets are best to derive the matrix from (CYP2C9, ASPA, PRKN). Are there any features (direct or proxy), that would allow to group proteins to maximize accuracy? Do the authors think on top of the buried vs exposed paradigm, another grouping dimension at the protein/domain level could improve performance?

      We don’t currently know if any protein- or domain-level features could be used to further split residues into useful categories for constructing new substitution matrices, but it is an interesting suggestion. We note that every substitution matrix consists of 380 averages, and creating too many residue groupings will cause some matrix entries to be averaged over very few abundance scores, at least with the current number of scores in the pooled VAMP-seq dataset. For example, while previous work has shown different mutational effects e.g. in helices and sheets (as one would expect), we find that a model with six matrices ({buried, exposed} × {helix, sheet, other}) does not lead to improved predictions (Fig. 2C), presumably because of an unfavourable balance between parameters and data.
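
      The trade-off between the number of matrices and the data behind each entry can be made concrete with a short sketch. It builds context-specific matrices as plain averages and estimates how many scores would support each of the 380 entries if the roughly 31,000 scores mentioned in the reviews were spread evenly, which real data are not. The tuple layout and function names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Each matrix has 20 * 19 = 380 possible wild-type-to-mutant entries.
ALL_PAIRS = list(permutations(AMINO_ACIDS, 2))

def build_matrices(variants):
    """variants: iterable of (wild_type, mutant, context, score) tuples,
    where context is a label such as 'buried' or 'exposed'.
    Returns {context: {(wild_type, mutant): (mean_score, n_scores)}}."""
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for wt, mut, context, score in variants:
        cell = sums[context][(wt, mut)]
        cell[0] += score
        cell[1] += 1
    return {ctx: {pair: (total / n, n) for pair, (total, n) in cells.items()}
            for ctx, cells in sums.items()}

def scores_per_entry(n_scores, n_contexts):
    """Mean number of scores per matrix entry under an even spread."""
    return n_scores / (len(ALL_PAIRS) * n_contexts)

toy = [("A", "V", "buried", 0.5), ("A", "V", "buried", 0.7), ("A", "V", "exposed", 1.0)]
print(build_matrices(toy))
for n_contexts in (1, 2, 6):
    print(n_contexts, "matrices:", round(scores_per_entry(31000, n_contexts), 1), "scores per entry")
```

      Going from two to six matrices cuts the average support per entry by a factor of three, and rare substitutions in small contexts fare worse than this average suggests.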

      (3) While the matrices and Rosetta seem to show similar degrees of correlation, do the methods both fail and succeed on the same variants? Or do they show a degree of orthogonality and could potentially be synergistic?

      These are good questions and are related to similar questions from reviewers 1 and 2. In the revised manuscript, we have added additional analyses of differences between predictions from our substitution matrix model and a stability model, and we indeed find that the two methods show a degree of orthogonality. However, since we identify only relatively few residues for which one method performs better than the other, we don’t expect a synergistic model to outperform the stability predictor across all variants in any of the six proteins.  

      Overall, this work presents a valuable contribution by creatively utilizing a simple concept through cutting-edge datasets, which could be useful in various.

      Reviewing Editor:

      As discussed in more detail below, to strengthen the assessment, the authors are encouraged to:

      (1) Include more thorough statistical analyses, such as confidence intervals or standard errors, to better validate key claims (e.g., RMSD comparisons).

      (2) Perform a deeper comparison between substitution response matrices and ΔΔG-based predictions to uncover areas of agreement or orthogonality.

      (3) Clarify the relationship between structural features, stability, and abundance to provide more mechanistic insights.

      As discussed above and below, we have added new analyses and clarifications to the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      Minor points:

      Why is a continuous version of the contact number used here, instead of a discrete count of neighbouring residues? WCN values of the residues in the core domain can be affected by residues far away (small contribution but not strictly zero; if there are many of them, it adds up).

      We have previously found WCN, which quantifies residue contact numbers in a continuous manner, to be a useful input feature for a classifier that determines whether individual residues are important for maintaining protein abundance or function (Cagiada et al, 2023). We have also found WCN and the cellular abundance of single substitution variants to correlate well in individual analyses of different proteins (Grønbæk-Thygesen et al., 2024; Gersing et al., 2024; Clausen et al., 2024).

      We have calculated the WCN as well as a contact number based on discrete counts of neighbouring residues for the six proteins in our dataset. When distances between residues are evaluated in the same way (i.e. using the shortest distance between any pair of heavy atoms in the side chains), and when the cutoff value used for the discrete count is equal to the r<sub>0</sub> of the WCN function, the continuous and discrete evaluations of residue contact numbers are highly and linearly correlated, and their rank correlations with the VAMP-seq data are very similar. We observe only minor contributions to the WCN from residues far away in the structure.
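
The comparison described above can be illustrated with a minimal sketch that computes a discrete contact count and a continuous, weighted count from the same list of distances. The switching function used here (a rational form with exponents 6 and 12), the value r0 = 7.0 Å and the example distances are assumptions made for illustration only; the WCN definition actually used is the one given in the manuscript.

```python
import math

def discrete_contact_number(distances, cutoff):
    """Count neighbours whose distance is strictly below the cutoff."""
    return sum(1 for r in distances if r < cutoff)

def switching(r, r0, n=6, m=12):
    """Smooth switching function (assumed form, for illustration).

    Close to 1 for r << r0, equal to n/m at r = r0, and decaying
    towards (but never reaching) 0 for r >> r0.
    """
    x = r / r0
    if math.isclose(x, 1.0):
        return n / m  # limit of (1 - x**n) / (1 - x**m) as x -> 1
    return (1.0 - x**n) / (1.0 - x**m)

def weighted_contact_number(distances, r0):
    """Sum the switching function over all neighbour distances."""
    return sum(switching(r, r0) for r in distances)

# Hypothetical shortest side-chain heavy-atom distances (in Å) from one
# residue to its neighbours; not taken from any real structure.
distances = [3.5, 4.2, 6.8, 7.0, 9.5, 15.0, 30.0]
r0 = 7.0

print("discrete:", discrete_contact_number(distances, r0))
print("weighted:", round(weighted_contact_number(distances, r0), 3))
print("contribution of the 30 Å neighbour:", switching(30.0, r0))
```

With this assumed functional form, a single distant residue contributes a small but non-zero amount, which is the effect the reviewer asks about.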

      Typos in SI figure captions e.g. Figure S8-11 "All predictions were performed using using...."

      Thank you for pointing this out. We have corrected the typos in Figure S8-11 (Figure S7-S10 in the revised manuscript).

      Personally, I'd appreciate a definition of these new substitution matrices under the constraints of rASA/WCN values. It was unclear to me until I read the code but we think that the definition is averaging the substitution matrix based on the clusters they are assigned to. If so, this could be straightforwardly defined in the method section with a heaviside step function.

      We have added a definition of the “buried” and “exposed” substitution matrices as a function of rASA in the methods section (“Definitions of buried and exposed residues” and “Definition of substitution matrices”) of the manuscript, as well as a definition of how we classified residues as either buried or exposed using both rASA and WCN as input. Our final substitution matrices, as shown in e.g. Fig. 2, do not depend on the WCN; only the substitution matrix results in Figure S6 (Figure S20 in the revised manuscript) depend on both WCN and rASA.
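
As a rough sketch of the kind of definition discussed here, the snippet below averages abundance scores per (wild-type, variant) amino acid pair, separately for buried and exposed positions. The comparison `rasa <= rasa_threshold` plays the role of the step function the reviewer mentions. The threshold value of 0.2, the input layout and the function name are placeholders chosen for illustration, not the values or code used in the manuscript.

```python
from collections import defaultdict

def substitution_matrices(variants, rasa_threshold=0.2):
    """Average abundance scores per substitution type and environment.

    variants: iterable of (wt_aa, mut_aa, rasa, abundance_score).
    rasa_threshold: placeholder cutoff separating buried from exposed.
    Returns {"buried": {(wt, mut): mean}, "exposed": {(wt, mut): mean}}.
    """
    sums = {"buried": defaultdict(float), "exposed": defaultdict(float)}
    counts = {"buried": defaultdict(int), "exposed": defaultdict(int)}
    for wt, mut, rasa, score in variants:
        env = "buried" if rasa <= rasa_threshold else "exposed"
        sums[env][(wt, mut)] += score
        counts[env][(wt, mut)] += 1
    return {
        env: {pair: sums[env][pair] / counts[env][pair] for pair in sums[env]}
        for env in sums
    }

# Hypothetical variants pooled across proteins.
variants = [
    ("L", "A", 0.05, 0.2),
    ("L", "A", 0.10, 0.4),
    ("L", "A", 0.60, 0.9),
    ("K", "E", 0.70, 1.0),
]
matrices = substitution_matrices(variants)
print(matrices["buried"])
print(matrices["exposed"])
```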

      Reviewer #2 (Recommendations for the authors):

      The following suggestions aim to strengthen the analysis and clarify the presentation of your findings:

      (1) Specific analyses to consider:

      (1.1) Analyze buried positions where the exposed matrix performs better. Understanding these cases might reveal properties of protein core regions that show unexpected mutational tolerance.

      We agree with the reviewer that a more detailed analysis of buried residues with exposed-like substitution profiles would be very interesting.

      We note that for proteins where the VAMP-seq score distribution is shifted towards high values (as is the case for PTEN, TPMT and CYP2C9), our identification of such residues may be a result of the score distribution differences between the six datasets. To confidently identify mutationally tolerant core regions, it would be best to (a) correct for the distribution differences prior to the analysis or (b) focus the analysis on residues that fall far below the diagonal in Figure S18.

      In additional data (which can be found at https://github.com/KULL-Centre/_2024_Schulze_abundance-analysis), we provide, for each of the proteins, a list of buried residues for which RMSD<sub>exposed</sub> < RMSD<sub>buried</sub> (for more than 95% of resampled substitution profiles, as described under 1.6). We have not analysed these residues further.

      (1.2) A systematic comparison of matrix-based vs. ΔΔG-based predictions could help understand both exposed sites that behave as buried (as analyzed in the paper) and buried sites that behave as exposed (1.1), potentially revealing mechanisms underlying abundance changes.

      In our revised manuscript, we have added additional analyses to compare matrix-based and ΔΔG-based predictions, focusing on exposed sites for which one prediction method captures variant effects on abundance considerably better than the other prediction method. We have not investigated buried sites with exposed-like behaviour any further in this work.

      (1.3) Explore different normalization approaches when pooling data across proteins. In particular, consider using log(abundance score): if the experimental error in abundance measurements is multiplicative (which can be checked from the reported standard errors), then log transformation would convert this into a constant additive error, making the analysis more statistically sound.

      As we answer below to point 2.2, the abundance scores are, within each dataset, min-max normalised to nonsense and synonymous variant scores, and the score scale is in this way consistent across the six datasets. We have explained above and in the revised manuscript that abundance score distribution differences across datasets are likely partially a result of the FACS binning of assay-specific variant libraries. Using only the VAMP-seq scores (that is, without further information about the individual experiments), we cannot correct for the influence of the sorting strategy on the reported scores. A score normalisation across datasets that places all data points on a single scale would require inter-dataset reference variant scores, which we do not have. We note that in a subsequent manuscript (Schulze et al, bioRxiv, 2025) we have attempted to take system- and experiment-specific score distributions into account. We now refer to this work in the revised manuscript.

      (1.4) Consider using correlation coefficients between predicted and observed abundance profiles as an alternative to RMSD, which is sensitive to the absolute values of the scores.

      We agree with the reviewer that using correlation coefficients to compare substitution profiles might also be useful, in particular for datasets with relatively unique VAMP-seq score distributions, such as the ASPA dataset. To explore this idea, we have repeated the analysis presented in Fig. S18 using the Pearson correlation coefficient r rather than the RMSD.

      As in Fig. S18, we derive r<sub>buried</sub> and r<sub>exposed</sub> for every residue in the six proteins, specifically by calculating r between the abundance score substitution profile of every individual residue and the average abundance score substitution profiles of buried and exposed residues. VAMP-seq data for the protein for which r<sub>buried</sub> and r<sub>exposed</sub> are evaluated is omitted from the calculation of average abundance score substitution profiles, and we use only monomer structures to determine whether residues are buried or exposed. 
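
A minimal sketch of the per-residue comparison described above: the Pearson correlation coefficient is computed between one residue's substitution profile and each of the two average profiles, using only the substitutions present in all three. All profile values below are invented for illustration, and the helper names are hypothetical. The leave-one-protein-out construction of the average profiles is assumed to have happened upstream and is not shown.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def profile_correlations(profile, avg_buried, avg_exposed):
    """Return (r_buried, r_exposed) for one residue's substitution profile.

    Each argument maps a variant amino acid to an abundance score; only
    amino acids present in all three profiles are compared.
    """
    shared = sorted(aa for aa in profile if aa in avg_buried and aa in avg_exposed)
    observed = [profile[aa] for aa in shared]
    r_buried = pearson_r(observed, [avg_buried[aa] for aa in shared])
    r_exposed = pearson_r(observed, [avg_exposed[aa] for aa in shared])
    return r_buried, r_exposed

# Hypothetical profiles: the residue's scores are exactly half of the
# buried average, so the shapes match although the absolute values differ.
profile = {"A": 0.20, "D": 0.10, "K": 0.15, "W": 0.05}
avg_buried = {"A": 0.40, "D": 0.20, "K": 0.30, "W": 0.10}
avg_exposed = {"A": 0.90, "D": 1.00, "K": 0.95, "W": 0.80}

r_buried, r_exposed = profile_correlations(profile, avg_buried, avg_exposed)
print(round(r_buried, 3), round(r_exposed, 3))
```

The example is constructed so that the residue profile and the buried average differ in scale but not in shape, which shows the insensitivity of r to absolute score values that motivates the reviewer's suggestion.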

      We show the results of this analysis in Author response image 1 below. In each panel of the figure, r<sub>buried</sub> and r<sub>exposed</sub> are shown for individual residues of a single protein. Blue datapoints indicate residues that are solvent-exposed in the wild-type protein structures, and yellow datapoints indicate residues that are buried in the wild-type structures. Residues for which neither r<sub>buried</sub> < r<sub>exposed</sub> nor r<sub>exposed</sub> < r<sub>buried</sub> holds in more than 95% of 1000 resampled residue substitution profiles (see explanation of resampling method above) are coloured grey. “Acc.” is the balanced classification accuracy, calculated using all non-grey datapoints, indicating how many buried residues have buried-like substitution profiles (r<sub>exposed</sub> < r<sub>buried</sub>) and how many solvent-exposed residues have exposed-like substitution profiles (r<sub>buried</sub> < r<sub>exposed</sub>). The classification accuracy per protein in this figure cannot be compared to the classification accuracy of the same protein in Fig. S18, since the number of datapoints used in the accuracy calculation differs between the r- and RMSD-based analyses.

      Author response image 1.

      Comparing the r-based approach to the RMSD-based approach (Fig. S18), it is clear that the r-based method is less robust than the RMSD-based method for noisy and incomplete datasets. For the noisiest and most mutationally incomplete VAMP-seq datasets (i.e., PTEN, TPMT and CYP2C9) (Fig. 1), there are relatively few residues for which we can determine with high confidence whether the substitution profile is more buried- or more exposed-like. When the VAMP-seq data is less noisy and has high mutational completeness, the r-based method becomes more robust and may thus be relevant in potential future work on new VAMP-seq data with small error bars.

      In conclusion, we find that the RMSD-based approach to comparing substitution profiles is more robust than an r-based approach for several of the VAMP-seq datasets that are included in our analysis. We do believe that an approach based on the correlation coefficient, or potentially several metrics, could be relevant to use, since abundance score distributions from VAMP-seq datasets can differ significantly across datasets. So as not to increase the length of the main text of our manuscript, we have not added this analysis to the revised manuscript.

      (1.5) Consider treating missing abundance scores as zero values, as they might indicate variants with very low abundance, rather than omitting them from the analysis.

      This suggestion would be most relevant for the PTEN, TPMT and CYP2C9 datasets, which all have a relatively small average mutational depth and completeness, as shown in Fig. 1B and 1C. To assess if setting missing abundance scores as zero values would be reasonable, we have compared the distributions of predicted ΔΔG values (from RaSP and ThermoMPNN) and of predicted abundance scores (from our exposure-based substitution matrices) for variants with reported and missing VAMP-seq data. We show the result in Author response image 2, with data aggregated across the six protein systems:

      Author response image 2.

      We find that variants with and without VAMP-seq data have similar ΔΔG score distributions and similar predicted abundance score distributions, and there is thus no clear enrichment of predicted loss of abundance for variants with missing VAMP-seq scores. This suggests that missing abundance scores do not necessarily indicate very low abundance. One cause of missing data might instead be problems with library generation (Matreyek et al, 2018, 2021).

      We show in Fig. S9 (Fig. S8 of the revised manuscript) that predicted scores for variants with experimental abundance scores of 0 are often overestimated for NUDT15, ASPA and PRKN, but this is less of a problem for PTEN, TPMT and CYP2C9, the datasets with most missing scores. The lack of an enrichment of low abundance variants from the various predictors would thus still support that missing scores do not necessarily indicate low abundance.

      (1.6) Develop a proper statistical framework for comparing buried vs exposed predictions (whether using RMSD or correlations), including confidence intervals, rather than using arbitrary thresholds.

      As explained above and in the methods section of our revised manuscript, we have expanded our approach to compare the substitution profile of a residue to the average profiles of buried and exposed residues, and our method now accounts for the noise in the VAMP-seq data, making the analysis more statistically rigorous. In our expanded approach, we compare the substitution profiles of individual residues to the average profiles for buried and exposed residues 10,000 times per residue to get a residue-specific distribution of RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> values. Individual RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> values are calculated by resampling abundance scores from a Gaussian distribution defined by the experimentally reported abundance score and abundance score standard deviation per variant. We now only report a residue to have e.g. a buried-like substitution profile if RMSD<sub>buried</sub> < RMSD<sub>exposed</sub> in at least 95% of our samples. We do not recalculate average scores in substitution matrices for this analysis. We have updated the plots in our manuscript, e.g. in Fig. S18 and S19 of the revised version, to indicate which residues are confidently classified as buried- or exposed-like.
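
The resampling scheme described in this response can be sketched as follows. The Gaussian resampling, the 10,000 samples per residue and the 95% criterion are taken from the text above. The function names, the restriction of the RMSD to substitutions present in both profiles, the fixed random seed and all numerical values in the example are assumptions for illustration.

```python
import math
import random

def rmsd(profile, reference):
    """Root-mean-square deviation over substitutions present in both profiles."""
    shared = [aa for aa in profile if aa in reference]
    squared = sum((profile[aa] - reference[aa]) ** 2 for aa in shared)
    return math.sqrt(squared / len(shared))

def classify_residue(scores, sds, avg_buried, avg_exposed,
                     n_samples=10_000, confidence=0.95, seed=0):
    """Label a residue 'buried-like', 'exposed-like' or 'uncertain'.

    scores / sds: reported abundance score and standard deviation per
    variant amino acid. The average profiles are held fixed, i.e. they
    are not recalculated for each resampled profile.
    """
    rng = random.Random(seed)
    buried_wins = 0
    exposed_wins = 0
    for _ in range(n_samples):
        # Draw one noisy substitution profile for this residue.
        sample = {aa: rng.gauss(scores[aa], sds[aa]) for aa in scores}
        rmsd_buried = rmsd(sample, avg_buried)
        rmsd_exposed = rmsd(sample, avg_exposed)
        if rmsd_buried < rmsd_exposed:
            buried_wins += 1
        elif rmsd_exposed < rmsd_buried:
            exposed_wins += 1
    if buried_wins / n_samples >= confidence:
        return "buried-like"
    if exposed_wins / n_samples >= confidence:
        return "exposed-like"
    return "uncertain"

# Hypothetical residue with low, precisely measured abundance scores.
scores = {"A": 0.20, "D": 0.10, "K": 0.15}
sds = {"A": 0.05, "D": 0.05, "K": 0.05}
avg_buried = {"A": 0.30, "D": 0.10, "K": 0.10}
avg_exposed = {"A": 0.90, "D": 0.85, "K": 0.95}

print(classify_residue(scores, sds, avg_buried, avg_exposed))
```

A residue whose scores lie between the two average profiles, or whose standard deviations are large, would fall into the "uncertain" category under this criterion.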

      (2) Presentation improvements:

      (2.1) In Figure 4, consider removing the average abundance scores, which are not directly related to the RMSD comparison being shown.

      We have decided to keep the average abundance scores in Fig. 4 (now Fig. 5), as we find the average abundance scores useful for guiding interpretation of the RMSD values. For example, an unusually small average abundance score with a relatively small standard deviation may explain a case where RMSD<sub>buried</sub> and RMSD<sub>exposed</sub> are both large. This is for example the case for residue G185 in ASPA. 

      In our preprint, the error bars on the average abundance scores in Fig. 4 (now Fig. 5) indicated the standard deviation across the abundance scores that were used to calculate the average per position. We have removed these error bars in the revised manuscript, as we realised that these were not necessarily helpful to the reader.

      (2.2) I am assuming that abundance scores are defined as the ratio abundance_variant/abundance_wt throughout the analysis, but I don't think this has been explicitly defined. If this is correct, please state it explicitly. In such case, log(abundance_score) would have a simple interpretation as the difference in abundance between variant and wild-type.

      Abundance scores are defined throughout the manuscript as sequence-based scores that have been min-max normalised to the abundance of nonsense and synonymous variants, i.e. abundance_score = (abundance_variant – abundance_nonsense)/(abundance_wt – abundance_nonsense). We have described the normalisation of scores to wild-type and nonsense variant abundance in lines 164-166 of the original manuscript. We have now added additional information about the normalisation scheme in the methods section. We note that we did not ourselves apply this normalisation to the data; the scores were reported in this manner in the original publications that reported the VAMP-seq experiments for the six proteins.
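
For concreteness, the min-max normalisation described above can be written as a small function. The function and argument names are hypothetical, and the raw values in the example are invented. As noted above, the published VAMP-seq scores were already normalised in this way, so this sketch only illustrates the arithmetic.

```python
def normalise_abundance(raw_variant, raw_nonsense, raw_wt):
    """Min-max normalise a raw abundance value.

    Nonsense-like abundance maps to 0 and wild-type-like (synonymous)
    abundance maps to 1.
    """
    span = raw_wt - raw_nonsense
    if span == 0:
        raise ValueError("wild-type and nonsense reference values are identical")
    return (raw_variant - raw_nonsense) / span

# Hypothetical raw values on an arbitrary assay scale.
raw_nonsense, raw_wt = 0.2, 0.8
for raw in (0.2, 0.5, 0.8):
    print(raw, "->", round(normalise_abundance(raw, raw_nonsense, raw_wt), 3))
```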

      (2.3) Consider renaming "rASA" to the more commonly used "RSA" for relative solvent accessibility.

      We have decided to keep using “rASA” throughout the manuscript.

      (2.4) The weighted contact number function used differs from the established WCN measure (Σ1/rij²) introduced by Lin et al. (2008, Proteins). This should be acknowledged and the choice of alternative weighting scheme justified.

      As we have also responded to the first minor point of reviewer 1, we have previously found WCN, as it is defined in our manuscript, to be a useful input feature for a classifier that determines whether individual residues are important for maintaining protein abundance or function (Cagiada et al, 2023). We have also previously found this type of WCN to correlate well with variant abundance of individual proteins, as measured with VAMP-seq or protein fragment complementation assays (Grønbæk-Thygesen et al., 2024; Clausen et al., 2024; Gersing et al., 2024). We acknowledge that residue contact numbers or weighted contact numbers could also be expressed in other ways and that alternative contact number definitions would likely also produce values that correlate well with VAMP-seq data. Since the WCN, as defined in our manuscript, already correlates relatively well with abundance scores, we have not explored whether alternative definitions produce better correlations.  

      (2.5) Replace the phrase "in the above" with specific references to sections or simply "above" where appropriate. Also, consider replacing many instances of "moreover" with simpler alternatives such as "also" or "in addition" to improve readability.

      We have changed several sentences according to this suggestion and hope that we have improved the readability of our manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) It should be explicitly confirmed earlier that complex structures are used for NUDT15 and ASPA when assessing rASA/WCN. Additionally, it would be interesting to see the effect that deriving the matrices using NUDT15 and ASPA monomers would have.

      We have commented on the use of NUDT15 and ASPA homodimer structures earlier in the revised manuscript (specifically, in the subsection “Abundance scores correlate with the degree of residue solvent-exposure”).

      When residues are classified using monomer rather than dimer structures of NUDT15 and ASPA, there is a small effect on the resulting “buried” and “exposed” substitution matrices. Entries in this set of substitution matrices calculated using either monomer or dimer structures typically differ by less than 0.05, and only a single entry differs by more than 0.1. As expected, the “exposed” matrix tends to contain slightly larger numbers when derived from dimer structures than when derived from monomer structures, meaning that when the interface residues are included in the exposed residue category, the average abundance scores of the “exposed” matrix are lowered. For buried residues, the picture is more mixed, although the overall tendency is that the interface residues make the “buried” matrix contain smaller average abundance scores for dimer compared to monomer structures. These results generally support the use of dimer structures for the residue classification.

      We here show the differences between the substitution matrices calculated with dimer or monomer structures of NUDT15 and ASPA and using data for all six proteins in our combined VAMP-seq dataset (average_abundance_score_difference = average_abundance_score_dimers – average_abundance_score_monomers):

      Author response image 3.

      We have not explored these alternative matrices further.

      (2) While the supplemental analyses are rigorous, the abundance of various metrics being presented can be confusing, especially when they seem to differ in their result. For instance, the discussion of Figure S17 (paragraph starting 428) contains mentions of mean differences but then switches to correlations, while both are presented for all panels. The claim "The datasets thus mainly differ due to differences in substitution effects in buried environments." is well supported by the observed mean differences, but for Pearson's correlations the average panel A, B values of buried 0.421 vs exposed 0.427 are hardly different. Which of the metrics is more meaningful, and are both needed?

      We agree with the reviewer that the claim that “The datasets thus mainly differ due to differences in substitution effects in buried environments” is not well-supported by the r between the substitution matrices, and we have removed this claim from the text.

      Since some datasets share VAMP-seq score distribution features, while others do not, the absolute difference between scores or matrices may be relevant to check for some dataset pairs, while the r may be more relevant to check for other dataset pairs. Hence, we have included both metrics in Fig S17 (Fig S11 in the revised manuscript).

      (3) Lines 337-340 - does not feel like S7 is the topic, perhaps the authors meant Figure 2A, B? In general, the supplemental figure references are out of order and panel combinations are sometimes confusing.

      We have corrected the figure references and rearranged the supplemental figures so that they now occur in the correct order. We have looked through the panel combinations with clarity in mind, and hope that the current set of main and supplementary figures balances overview and detail.

      (4) Line 363 "are also are also".

      We have corrected this typo.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The study analyzes the gastric fluid DNA content identified as a potential biomarker for human gastric cancer. However, the study lacks overall logicality, and several key issues require improvement and clarification. In the opinion of this reviewer, some major revisions are needed:

      (1) This manuscript lacks a comparison of gastric cancer patients' stages with PN and N+PD patients, especially T0-T2 patients.

      We are grateful for this astute remark. A comparison of gfDNA concentration among the diagnostic groups indicates a trend of increasing values as the diagnosis progresses toward malignancy. The observed values for the diagnostic groups are as follows:

      Author response table 1.

      The chart below presents the statistical analyses of the same diagnostic/tumor-stage groups (One-Way ANOVA followed by Tukey’s multiple comparison tests). It shows that gastric fluid gfDNA concentrations gradually increase with malignant progression. We observed that the initial tumor stages (T0 to T2) exhibit intermediate gfDNA levels, which are significantly lower than in advanced disease (p = 0.0036), but not statistically different from non-neoplastic disease (p = 0.74).

      Author response image 1.

      (2) The comparison between gastric cancer stages seems only to reveal the difference between T3 patients and early-stage gastric cancer patients, which raises doubts about the authenticity of the previous differences between gastric cancer patients and normal patients, whether it is only due to the higher number of T3 patients.

      We appreciate the attention to detail regarding the numbers analyzed in the manuscript. Importantly, the results are meaningful because the number of subjects in each group is comparable (T0-T2, N = 65; T3, N = 91; T4, N = 63). The mean gastric fluid gfDNA values (ng/µL) increase with disease stage (T0-T2: 15.12; T3-T4: 30.75), and both are higher than the mean gfDNA values observed in non-neoplastic disease (10.81 ng/µL for N+PD and 10.10 ng/µL for PN). These subject numbers in each diagnostic group accurately reflect real-world data from a tertiary cancer center.

      (3) The prognosis evaluation is too simplistic, only considering staging factors, without taking into account other factors such as tumor pathology and the time from onset to tumor detection.

      Histopathological analyses were performed throughout the study not only for the initial diagnosis of tissue biopsies, but also for the classification of Lauren’s subtypes, tumor staging, and the assessment of the presence and extent of immune cell infiltrates. Regarding the time of disease onset, this variable is inherently unknown, by definition, at the time of a diagnostic EGD. While the prognosis definition is indeed straightforward, we believe that a simple, cost-effective, and practical approach is advantageous for patients across diverse clinical settings and is more likely to be effectively integrated into routine EGD practice.

      (4) The comparison between gfDNA and conventional pathological examination methods should be mentioned, reflecting advantages such as accuracy and patient comfort.

      We wish to reinforce that EGD, along with conventional histopathology, remains the gold standard for gastric cancer evaluation. EGD under sedation is routinely performed for diagnosis, and the collection of gastric fluids for gfDNA evaluation does not affect patient comfort. Thus, while gfDNA analysis was evidently not intended as a diagnostic EGD and biopsy replacement, it may provide added prognostic value to this exam.

      (5) There are many questions in the figures and tables. Please match the Title, Figure legends, Footnote, Alphabetic order, etc.

      We are grateful for these comments and apologize for the clerical oversight. All figures, tables, titles and figure legends have now been double-checked.

      (6) The overall logicality of the manuscript is not rigorous enough, with few discussion factors, and cannot represent the conclusions drawn.

      We assume that the unusual wording remark regarding “overall logicality” pertains to the rationale and/or reasoning of this investigational study. Our working hypothesis was that during neoplastic disease progression, tumor cells continuously proliferate and, depending on various factors, attract immune cell infiltrates. Consequently, both tumor cells and immune cells (as well as tumor-derived DNA) are released into the fluids surrounding the tumor at its various locations, including blood, urine, saliva, gastric fluids, and others. Thus, increases in DNA levels within some of these fluids have been documented and are clinically meaningful. The concurrent observation of elevated gastric fluid gfDNA levels and immune cell infiltration supports the hypothesis that increased gfDNA—which may originate not only from tumor cells but also from immune cells—could be associated with better prognosis, as suggested by this study of a large real-world patient cohort.

      In summary, we thank Reviewer #1 for his time and effort in a constructive critique of our work.

      Reviewer #2 (Public review):

      Summary:

      The authors investigated whether the total DNA concentration in gastric fluid (gfDNA), collected via routine esophagogastroduodenoscopy (EGD), could serve as a diagnostic and prognostic biomarker for gastric cancer. In a large patient cohort (initial n=1,056; analyzed n=941), they found that gfDNA levels were significantly higher in gastric cancer patients compared to non-cancer, gastritis, and precancerous lesion groups. Unexpectedly, higher gfDNA concentrations were also significantly associated with better survival prognosis and positively correlated with immune cell infiltration. The authors proposed that gfDNA may reflect both tumor burden and immune activity, potentially serving as a cost-effective and convenient liquid biopsy tool to assist in gastric cancer diagnosis, staging, and follow-up.

      Strengths:

      This study is supported by a robust sample size (n=941) with clear patient classification, enabling reliable statistical analysis. It employs a simple, low-threshold method for measuring total gfDNA, making it suitable for large-scale clinical use. Clinical confounders, including age, sex, BMI, gastric fluid pH, and PPI use, were systematically controlled. The findings demonstrate both diagnostic and prognostic value of gfDNA, as its concentration can help distinguish gastric cancer patients and correlates with tumor progression and survival. Additionally, preliminary mechanistic data reveal a significant association between elevated gfDNA levels and increased immune cell infiltration in tumors (p=0.001).

      Reviewer #2 has conceptually grasped the overall rationale of the study quite well, and we are grateful for their assessment and comprehensive summary of our findings.

      Weaknesses:

      (1) The study has several notable weaknesses. The association between high gfDNA levels and better survival contradicts conventional expectations and raises concerns about the biological interpretation of the findings.

      We agree that this would be the case if the gfDNA was derived solely from tumor cells. However, the findings presented here suggest that a fraction of this DNA would indeed be derived from infiltrating immune cells. The precise determination of the origin of this increased gfDNA remains to be achieved in future follow-up studies, which are planned to apply DNA- and RNA-sequencing methodologies and deconvolution analyses.

      (2) The diagnostic performance of gfDNA alone was only moderate, and the study did not explore potential improvements through combination with established biomarkers. Methodological limitations include a lack of control for pre-analytical variables, the absence of longitudinal data, and imbalanced group sizes, which may affect the robustness and generalizability of the results.

      Reviewer #2 is correct that this investigational study was not designed to assess the diagnostic potential of gfDNA. Instead, its primary contribution is to provide useful prognostic information. In this regard, we have not yet explored combining gfDNA with other clinically well-established diagnostic biomarkers. We do acknowledge this current limitation as a logical follow-up that must be investigated in the near future.

      Moreover, we collected a substantial number of pre-analytical variables within the limitations of a study involving over 1,000 subjects. Longitudinal samples and data were not analyzed here, as our aim was to evaluate prognostic value at diagnosis. Although the groups are imbalanced, this accurately reflects the real-world population of a large endoscopy center within a dedicated cancer facility. Subjects were invited to participate and enter the study before sedation for the diagnostic EGD procedure; thus, samples were collected prospectively from all consenting individuals.

      Finally, to maintain a large, unbiased cohort, we did not attempt to balance the groups, allowing analysis of samples and data from all patients with compatible diagnoses (please see Results: Patient groups and diagnoses).

      (3) Additionally, key methodological details were insufficiently reported, and the ROC analysis lacked comprehensive performance metrics, limiting the study's clinical applicability.

      We are grateful for this useful suggestion. In the current version, each ROC curve (Supplementary Figures 1A and 1B) now includes the top 10 gfDNA thresholds, along with their corresponding sensitivity and specificity values (please see Suppl. Table 1). The thresholds are ordered from best to worst based on the classic Youden’s J statistic, as follows:

      Youden Index = specificity + sensitivity – 1 [Youden WJ. Index for rating diagnostic tests. Cancer 3:32-35, 1950. PMID: 15405679]. We have made an effort to provide all the key methodological details requested, but we would be glad to add further information upon specific request.
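
The ranking of thresholds by the Youden index, as defined above, can be sketched in a few lines. The concentrations and labels below are invented for illustration and are not study data. The convention that a sample is called positive when its value is at or above the threshold is also an assumption.

```python
def rank_thresholds(values, labels, top=10):
    """Rank candidate thresholds by Youden's J = sensitivity + specificity - 1.

    values: measured concentrations (e.g. gfDNA in ng/µL).
    labels: 1 for cases, 0 for controls.
    A sample is called positive when value >= threshold (assumed convention).
    Returns up to `top` tuples (threshold, sensitivity, specificity, J),
    ordered from best to worst.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    results = []
    for threshold in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if y == 1 and v >= threshold)
        true_neg = sum(1 for v, y in zip(values, labels) if y == 0 and v < threshold)
        sensitivity = true_pos / n_pos
        specificity = true_neg / n_neg
        results.append((threshold, sensitivity, specificity,
                        sensitivity + specificity - 1))
    results.sort(key=lambda row: row[3], reverse=True)
    return results[:top]

# Hypothetical concentrations and case/control labels.
values = [5, 8, 10, 12, 20, 25, 30, 40]
labels = [0, 0, 0, 1, 1, 0, 1, 1]

for threshold, sens, spec, j in rank_thresholds(values, labels, top=3):
    print(threshold, round(sens, 2), round(spec, 2), round(j, 2))
```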

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an excellent study by a superb investigator who discovered and is championing the field of migrasomes. This study contains a hidden "gem" - the induction of migrasomes by hypotonicity and how that happens. In summary, an outstanding fundamental phenomenon (migrasomes) en route to becoming translationally highly significant.

      Strengths:

      Innovative approach at several levels. Migrasomes - discovered by Dr Yu's group - are an outstanding biological phenomenon of fundamental interest and now of potentially practical value.

      Weaknesses:

      I feel that the overemphasis on practical aspects (vaccine), however important, eclipses some of the fundamental aspects that may be just as important and actually more interesting. If this can be expanded, the study would be outstanding.

      We sincerely thank the reviewer for the encouraging and insightful comments. We fully agree that the fundamental aspects of migrasome biology are of great importance and deserve deeper exploration.

      In line with the reviewer’s suggestion, we have expanded our discussion on the basic biology of engineered migrasomes (eMigs). A recent study by the Okochi group at the Tokyo Institute of Technology demonstrated that hypoosmotic stress induces the formation of migrasome-like vesicles, involving cytoplasmic influx and requiring cholesterol for their formation (DOI: 10.1002/1873-3468.14816, February 2024). Building on this, our study provides a detailed characterization of hypoosmotic stress-induced eMig formation, and further compares the biophysical properties of natural migrasomes and eMigs. Notably, the inherent stability of eMigs makes them particularly promising as a vaccine platform.

      Finally, we would like to note that our laboratory continues to investigate multiple aspects of migrasome biology. In collaboration with our colleagues, we recently completed a study elucidating the mechanical forces involved in migrasome formation (DOI: 10.1016/j.bpj.2024.12.029), which further complements the findings presented here.

      Reviewer #2 (Public review):

      Summary:

      The authors' report describes a novel vaccine platform derived from a newly discovered organelle called a migrasome. First, the authors address a technical hurdle in using migrasomes as a vaccine platform. Natural migrasome formation occurs at low levels and is labor intensive; however, by understanding the molecular underpinning of migrasome formation, the authors have designed a method to make engineered migrasomes from cultured cells at higher yields utilizing a robust process. These engineered migrasomes behave like natural migrasomes. Next, the authors immunized mice with migrasomes that either expressed a model peptide or the SARS-CoV-2 spike protein. Antibodies against the spike protein were raised that could be boosted by a 2nd vaccination and these antibodies were functional as assessed by an in vitro pseudoviral assay. This new vaccine platform has the potential to overcome obstacles such as cold chain issues for vaccines like messenger RNA that require very stringent storage conditions.

      Strengths:

      The authors present very robust studies detailing the biology behind migrasome formation and this fundamental understanding was used to form engineered migrasomes, which makes it possible to utilize migrasomes as a vaccine platform. The characterization of engineered migrasomes is thorough and establishes comparability with naturally occurring migrasomes. The biophysical characterization of the migrasomes is well done including thermal stability and characterization of the particle size (important characterizations for a good vaccine).

      Weaknesses:

      With a new vaccine platform technology, it would be nice to compare them head-to-head against a proven technology. The authors would improve the manuscript if they made some comparisons to other vaccine platforms such as a SARS-CoV-2 mRNA vaccine or even an adjuvanted recombinant spike protein. This would demonstrate a migrasome-based vaccine could elicit responses comparable to a proven vaccine technology.

      We thank the reviewer for the thoughtful evaluation and constructive suggestions, which have helped us strengthen the manuscript. 

      Comparison with proven vaccine technologies:

      In response to the reviewer’s comment, we now include a direct comparison of the antibody responses elicited by eMig-Spike and a conventional recombinant S1 protein vaccine formulated with Alum. As shown in the revised manuscript (Author response image 1), the levels of S1-specific IgG induced by the eMig-based platform were comparable to those induced by the S1+Alum formulation. This comparison supports the potential of eMigs as a competitive alternative to established vaccine platforms. 

      Author response image 1.

      eMigrasome-based vaccination showed similar efficacy compared with adjuvanted recombinant spike protein. The amount of S1-specific IgG in mouse serum was quantified by ELISA on day 14 after immunization. Mice were either intraperitoneally (i.p.) immunized with recombinant Alum/S1 or intravenously (i.v.) immunized with eM-NC, eM-S or recombinant S1. The administered doses were 20 µg/mouse for eMigrasomes, 10 µg/mouse (i.v.) or 50 µg/mouse (i.p.) for recombinant S1 and 50 µl/mouse for Aluminium adjuvant.

      Assessment of antigen integrity on migrasomes:

      To address the reviewer’s suggestion regarding antigen integrity, we performed immunoblotting using antibodies against both S1 and mCherry. Two distinct bands were observed: one at the expected molecular weight of the S-mCherry fusion protein, and a higher molecular weight band that may represent oligomerized or higher-order forms of the Spike protein (Figure 5b in the revised manuscript).

      Furthermore, we performed confocal microscopy using a monoclonal antibody against Spike (anti-S). Co-localization analysis revealed strong overlap between the mCherry fluorescence and anti-Spike staining, confirming the proper presentation and surface localization of intact S-mCherry fusion protein on eMigs (Figure 5c in the revised manuscript). These results confirm the structural integrity and antigenic fidelity of the Spike protein expressed on eMigs.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      I feel that the overemphasis on practical aspects (vaccine), however important, eclipses some of the fundamental aspects that may be just as important and actually more interesting. If this can be expanded, the study would be outstanding.

      I know that the reviewers always ask for more, and this is not the case here. Can the abstract and title be changed to emphasize the science behind migrasome formation, and possibly add a few more fundamental aspects on how hypotonic shock induces migrasomes?

      Alternatively, if the authors desire to maintain the emphasis on vaccines, can immunological mechanisms be somewhat expanded in order to - at least to some extent - explain why migrasomes are a better vaccine vehicle?

      One way or another, this reviewer is highly supportive of this study and it is really up to the authors and the editor to decide whether my comments are of use or not.

      My recommendation is to go ahead with publishing after some adjustments as per above.

      We’d like to thank the reviewer for the suggestion. We have changed the title of the manuscript and modified the abstract, emphasizing the fundamental science behind the development of eMigrasome. To gain some immunological information on eMig-elicited antibody responses, we characterized the type of IgG induced by eM-OVA in mice, and compared it to that induced by Alum/OVA. The IgG response to Alum/OVA was dominated by IgG1. Quite differently, eM-OVA induced an even distribution of IgG subtypes, including IgG1, IgG2b, IgG2c, and IgG3 (Figure 4i in the revised manuscript). The ratio between IgG1 and IgG2a/c indicates a Th1 or Th2 type humoral immune response. Thus, eM-OVA immunization induces a balance of Th1/Th2 immune responses.

      Reviewer #2 (Recommendations For The Authors):

      The study is a very nice exploration of a new vaccine platform. This reviewer believes that a more head-to-head comparison to the current SARS-CoV-2 vaccine platform would improve the manuscript. This comparison is done with OVA antigen, but this model antigen is not as exciting as a functional head-to-head with a SARS-CoV-2 vaccine.

      I think that two other discussion points should be included in the manuscript. First, was the host-cell protein evaluated? If not, I would include that point on how issues of host cell contamination of the migrasome could play a role in the responses and safety of a vaccine. Second, I would discuss antigen incorporation and localization into the platform. For example, the full-length spike being expressed has a native signal peptide and transmembrane domain. The authors point out that a transmembrane domain can be added to display an antigen that does not have one natively expressed, however, without a signal peptide this would not be secreted and localized properly. I would suggest adding a discussion of how a non-native signal peptide would be necessary in addition to a transmembrane domain.

      We thank the reviewer for these thoughtful suggestions and fully agree that the points raised are important for the translational development of eMig-based vaccines.

      (1) Host cell proteins and potential immunogenicity:

      We appreciate the reviewer’s suggestion to consider host cell protein contamination. Considering potential clinical application of eMigrasomes in the future, we will use human cells with low immunogenicity such as HEK-293 or embryonic stem cells (ESCs) to generate eMigrasomes. Also, we will follow a QC procedure that meets the standards of validated EV-based vaccination techniques.

      (2) Antigen incorporation and localization—signal peptide and transmembrane domain:

      We also agree with the reviewer’s point that proper surface display of antigens on eMigs requires both a transmembrane domain and a signal peptide for correct trafficking and membrane anchoring. For instance, in the case of full-length Spike protein, the native signal peptide and transmembrane domain ensure proper localization to the plasma membrane and subsequent incorporation into eMigs. In the case of OVA, a secretory protein that contains a native signal peptide yet lacks a transmembrane domain, an engineered transmembrane domain is required. For antigens that do not naturally contain these features, both a non-native signal peptide and an artificial transmembrane domain are necessary. We have clarified this point in the revised discussion and explicitly noted the requirement for a signal peptide when engineering antigens for surface display on migrasomes.

    1. Author response:

      The following is the authors’ response to the original reviews

      We again thank the reviewers for their comments and recommendations. In response to the reviewers’ suggestions, we have performed several additional experiments, added additional discussion, and updated our conclusions to reflect the additional work. Specifically, we have performed additional analyses in female WT and Marco-deficient animals, demonstrating that the Marco-associated phenotypes observed in male mice (reduced adrenal weight, increased lung Ace mRNA and protein expression, unchanged expression of adrenal corticosteroid biosynthetic enzymes) are not present in female mice. We also report new data on the physiological consequences of increased aldosterone levels observed in male mice, namely plasma sodium and potassium titres, and blood pressure alterations in WT vs Marco-deficient male mice. In an attempt to address the reviewers’ comments relating to our proposed mechanism on the regulation of lung Ace expression, we additionally performed a co-culture experiment using an alveolar macrophage cell line and an endothelial cell line. In light of the additional evidence presented herein, we have updated our conclusions from this study and changed the title of our work to acknowledge that the mechanism underlying the reported phenotype remains incompletely understood. Specific responses to reviewers can be seen below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The investigators sought to determine whether Marco regulates the levels of aldosterone by limiting uptake of its parent molecule cholesterol in the adrenal gland. Instead, they identify an unexpected role for Marco on alveolar macrophages in lowering the levels of angiotensin-converting enzyme in the lung. This suggests an unexpected role of alveolar macrophages and lung ACE in the production of aldosterone.

      Strengths:

      The investigators suggest an unexpected role for ACE in the lung in the regulation of systemic aldosterone levels.

      The investigators suggest important sex-related differences in the regulation of aldosterone by alveolar macrophages and ACE in the lung.

      Studies to exclude a role for Marco in the adrenal gland are strong, suggesting an extra-adrenal source for the excess aldosterone observed in male Marco knockout mice.

      Weaknesses:

      While the investigators have identified important sex differences in the regulation of extrapulmonary ACE in the regulation of aldosterone levels, the mechanisms underlying these differences are not explored.

      The physiologic impact of the increased aldosterone levels observed in Marco -/- male mice on blood pressure or response to injury is not clear.

      The intracellular signaling mechanism linking lung macrophage levels with the expression of ACE in the lung is not supported by direct evidence.

      Reviewer #2 (Public Review):

      Summary:

      Tissue-resident macrophages are more and more thought to exert key homeostatic functions and contribute to physiological responses. In the report of O'Brien and Colleagues, the idea that the macrophage-expressed scavenger receptor MARCO could regulate adrenal corticosteroid output at steady-state was explored. The authors found that male MARCO-deficient mice exhibited higher plasma aldosterone levels and higher lung ACE expression as compared to wild-type mice, while the availability of cholesterol and the machinery required to produce aldosterone in the adrenal gland were not affected by MARCO deficiency. The authors take these data to conclude that MARCO in alveolar macrophages can negatively regulate ACE expression and aldosterone production at steady-state and that MARCO-deficient mice suffer from secondary hyperaldosteronism.

      Strengths:

      If properly demonstrated and validated, the fact that tissue-resident macrophages can exert physiological functions and influence endocrine systems would be highly significant and could be amenable to novel therapies.

      Weaknesses:

      The data provided by the authors currently do not support the major claim of the authors that alveolar macrophages, via MARCO, are involved in the regulation of a hormonal output in vivo at steady-state. At this point, there are two interesting but descriptive observations in male, but not female, MARCO-deficient animals, and overall, the study lacks key controls and validation experiments, as detailed below.

      Major weaknesses:

      (1) According to the reviewer's own experience, the comparison between C57BL/6J wild-type mice and knock-out mice for which precise information about the genetic background and the history of breedings and crossings is lacking, can lead to misinterpretations of the results obtained. Hence, MARCO-deficient mice should be compared with true littermate controls.

      (2) The use of mice globally deficient for MARCO combined with the fact that alveolar macrophages produce high levels of MARCO is not sufficient to prove that the phenotype observed is linked to alveolar macrophage-expressed MARCO (see below for suggestions of experiments).

      (3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. In addition, co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Corticosterone levels in male Marco -/- mice are not significantly different, but there is (by eye) substantially more variability in the knockout compared to the wild type. A power analysis should be performed to determine the number of mice needed to detect a similar % difference in corticosterone to the difference observed in aldosterone between male Marco knockout and wild-type mice. If necessary the experiments should be repeated with an adequately powered cohort.

      Using a power calculator (www.gigacalculator.com) it was determined that our sample size of 13 was one less than sufficient to detect a similar % difference in corticosterone as was detected in aldosterone. We regret that we are unable to perform additional measurements as the reviewer suggested in the available timeframe.
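
      The kind of calculation behind such a power analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the calculator referred to above: it uses the standard normal approximation for a two-sided comparison of two group means, and the function name `n_per_group` and the example effect sizes (Cohen's d) are hypothetical and not taken from the study.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate animals needed per group for a two-sided, two-sample
    comparison of means, using the normal approximation:

        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2

    effect_size is Cohen's d (difference in means divided by the pooled SD).
    The exact t-test based answer is usually one or two animals larger.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = std_normal.inv_cdf(power)          # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes, for illustration only.
    for d in (0.5, 1.0, 1.5):
        print(f"d = {d}: about {n_per_group(d)} animals per group")
```

      Because the required group size scales with 1/d², greater variability in one genotype (a smaller standardized effect) raises the number of animals needed quickly, which is the concern raised about the corticosterone measurements.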

      (2) All of the data throughout the MS (particularly data in the lung) should be presented in male and female mice. For example, the induction of ACE in the lungs of Marco-/- female mice should be absent. Similar concerns relate to the dexamethasone suppression studies. Also would be useful if the single cell data could be examined by sex--should be possible even post hoc using Xist etc.

      Given the limitations outlined in our previous response to reviewers it was not possible to repeat every experiment from the original manuscript. We were able to measure the expression of lung Ace mRNA, ACE protein, adrenal weights, adrenal expression of steroid biosynthetic enzymes, presence of myeloid cells, and levels of serum electrolytes in female animals. These are presented in figures 1G, 3B, 4A, 4E, 4F, 4I, and 4J. We have elected to not present single cell seq data according to sex as it did not indicate substantial differences between males and females in Marco or Ace expression and so does not substantively change our approach.

      (3) IF is notoriously unreliable in the lung, which has high levels of autofluorescence. This is the only method used to show ACE levels are increased in the absence of Marco. Orthogonal methods (e.g. immunoblots of flow-sorted cells, or ideally CITE-seq that includes both male and female mice) should be used.

      We used negative controls to guide our settings during acquisition of immunofluorescent images. Additionally, we also used qPCR to show an increase in Ace mRNA expression in the lung in addition to the protein level. This data was presented in the original manuscript and is further bolstered by our additional presentation of expression data for Ace mRNA and protein in female animals in this revised manuscript.

      (4) Given the central importance of ACE staining to the conclusions, validation of the antibody should be included in the supplement.

      We don’t have ACE-deficient mice so cannot do KO validation of the antibody. We did perform secondary stain controls which confirmed the signal observed is primary antibody-derived. Moreover, we specifically chose an anti-ACE antibody (Invitrogen catalogue # MA5-32741) that has undergone advanced verification with the manufacturer. We additionally tested the antibody in the brain and liver and observed no significant levels of staining.

      Author response image 1.

      (5) The link between alveolar macrophage Marco and ACE is poorly explored.

      We carried out a co-culture experiment with alveolar macrophages and endothelial cells and measured ACE/Ace expression as a consequence. This is presented in figure 5D and the discussion.

      (6) Mechanisms explaining the substantial sex difference in the primary outcome are not explored.

      This is outside the scope of this project, though we would consider exploring such experiments in future studies.

      (7) Are there physiologic consequences either in homeostasis or under stress to the increased aldosterone (or lung ACE levels) observed in Marco-/- male mice?

      We measured blood electrolytes and blood pressure in Marco-deficient and Marco-sufficient mice. The results from these experiments are presented in figures 4G-4M.

      Reviewer #2 (Recommendations For The Authors):

      Below is a suggestion of important control or validation experiments to be performed in order to support the authors' claims.

      (1) It is imperative to validate that the phenotype observed in MARCO-deficient mice is indeed caused by the deficiency in MARCO. To this end, littermate mice issued from the crossing between heterozygous MARCO +/- mice should be compared to each other. C57BL/6J mice can first be crossed with MARCO-deficient mice in F0, and F1 heterozygous MARCO +/- mice should be crossed together to produce F2 MARCO +/+, MARCO +/- and MARCO -/- littermate mice that can be used for experiments.

      We thank the reviewer for their comments. We recognise the concern of the reviewer but due to limited experimenter availability we are unable to undertake such a breeding programme to address this particular concern.

      (2) The use of mice in which AM, but not other cells, lack MARCO expression would demonstrate that the effect is indeed linked to AM. To this end, AM-deficient Csf2rb-deficient mice could be adoptively transferred with MARCO-deficient AM. In addition, the phenotype of MARCO-deficient mice should be restored by the adoptive transfer of wild-type, MARCO-expressing AM. Alternatively, bone marrow chimeras in which only the hematopoietic compartment is deficient in MARCO would be another option, albeit less specific for AM.

      We recognise the concern of the reviewer. We carried out a co-culture experiment with alveolar macrophages and endothelial cells and measured ACE/Ace expression as a consequence. This is presented in figure 5D and the implications explored in the discussion.

      (3) If the hypothesis of the authors is correct, then additional read-outs could be performed to reinforce their claims: levels of Angiotensin I would be lower in MARCO-deficient mice, levels of Angiotensin II would be higher in MARCO-deficient mice, arterial blood pressure would be higher in MARCO-deficient mice, natremia would be higher in MARCO-deficient mice, while kaliemia would be lower in MARCO-deficient mice. Similar read-outs could also be performed in the models proposed in point 2).

      We measured blood electrolytes and blood pressure in Marco-deficient and Marco-sufficient mice. The results from these experiments are presented in figures 4G-4M.

      (4) Co-culture experiments between MARCO-sufficient or deficient alveolar macrophages and lung endothelial cells, combined with the assessment of ACE expression, would allow the authors to evaluate whether the AM-expressed MARCO can directly regulate ACE expression.

      To address this concern we carried out a co-culture experiment as described above.

    1. Author response:

      General Statements

      We are delighted that all reviewers found our manuscript to be a technical advance by providing a much sought after method to arrest budding yeast cells in metaphase of mitosis or both meiotic metaphases. The reviewers also valued our use of this system to make new discoveries in two areas. First, we provided evidence that the spindle checkpoint is intrinsically weaker in meiosis I and showed that this is due to PP1 phosphatase. Second, we determined how the composition and phosphorylation of the kinetochore changes during meiosis, providing key insights into kinetochore function and providing a rich dataset for future studies.

      The reviewers also made some extremely helpful suggestions to improve our manuscript, which we will now implement:

      (1) Improvements to the discussion throughout the manuscript. The reviewers recommended that we focus our discussion on the novel findings of the manuscript and drew out some key points of interest that deserve more attention. We fully agree with this and we will address this in a revised version.

      (2) We will add a new supplemental figure to help interpret the mass spectrometry data, to address Reviewer #3, point 4.

      (3) We are currently performing an additional control experiment to address the minor point 1 from reviewer #3. Our experiment to confirm that SynSAC relies on endogenous checkpoint proteins was missing the cell cycle profile of cells where SynSAC was not induced for comparison. We will add this control to our full revision.

      (4) In our full revision we will also include representative images of spindle morphology as requested by Reviewer #1, point 2.

      Description of the planned revisions

      Reviewer #1 (Evidence, reproducibility and clarity):

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII, and is chemically inducible because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve a MI or MII delay. The ABA promotes dimerizing fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system but more work is needed to validate these results, particularly in normal cells.

      Overall the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division. Overall, I have only a few minor suggestions.

      We appreciate the reviewers’ support of our study.

      (1) In wild-type - Pds1 levels are high during M1 and A1, but low in MII. Can the authors comment on this? In line 217, what is meant by "slightly attenuated"? Can the authors comment on how anaphase occurs in presence of high Pds1? There is even a low but significant level in MII.

      The higher levels of Pds1 in meiosis I compared to meiosis II have been observed previously using immunofluorescence and live imaging[1–3]. Although the reasons are not completely clear, we speculate that there is insufficient time between the two divisions to re-accumulate Pds1 prior to separase re-activation.

      We agree “slightly attenuated” was confusing and we have re-worded this sentence to read “Addition of ABA at the time of prophase release resulted in Pds1 (securin) stabilisation throughout the time course, consistent with delays in both metaphase I and II”.

      We do not believe that either anaphase I or II occur in the presence of high Pds1. Western blotting represents the amount of Pds1 in the population of cells at a given time point. The time between meiosis I and II is very short even when treated with ABA. For example, in Figure 2B, spindle morphology counts show that the anaphase I peak is around 40% at its maximum (105 min) and around 40% of cells are in either metaphase I or metaphase II, and will be Pds1 positive. In contrast, due to the better efficiency of meiosis II, anaphase II hardly occurs at all in these conditions, since anaphase II spindles (and the second nuclear division) are observed at very low frequency (maximum 10%) from 165 minutes onwards. Instead, metaphase II spindles partially or fully break down, without undergoing anaphase extension. Taking Pds1 levels from the western blot and the spindle data together leads to the conclusion that at the end of the time-course, these cells are biochemically in metaphase II, but unable to maintain a robust spindle. Spindle collapse is also observed in other situations where meiotic exit fails, and potentially reflects an uncoupling of the cell cycle from the programme governing gamete differentiation[3–5]. We will explain this point in a revised version while referring to representative images that provide evidence for this, as also requested by the reviewer below.

      (2) The figures with data characterizing the system are mostly graphs showing time course of MI and MII. There is no cytology, which is a little surprising since the stage is determined by spindle morphology. It would help to see sample sizes (ie. In the Figure legends) and also representative images. It would also be nice to see images comparing the same stage in the SynSAC cells versus normal cells. Are there any differences in the morphology of the spindles or chromosomes when in the SynSAC system?

      This is an excellent suggestion and will also help clarify the point above. We will provide images of cells at the different stages. For each timepoint, 100 cells were scored. We have already included this information in the figure legends.

      (3) A possible criticism of this system could be that the SAC signal promoting arrest is not coming from the kinetochore. Are there any possible consequences of this? In vertebrate cells, the RZZ complex streams off the kinetochore. Yeast don't have RZZ but this is an example of something that is SAC dependent and happens at the kinetochore. Can the authors discuss possible limitations such as this? Does the inhibition of the APC affect the native kinetochores? This could be good or bad. A bad possibility is that the cell is behaving as if it is in MII, but the kinetochores have made their microtubule attachments and behave as if in anaphase.

      In our view, the fact that SynSAC does not come from kinetochores is a major advantage as this allows the study of the kinetochore in an unperturbed state. It is also important to note that the canonical checkpoint components are all still present in the SynSAC strains, and perturbations in kinetochore-microtubule interactions would be expected to mount a kinetochore-driven checkpoint response as normal. Indeed, it would be interesting in future work to understand how disrupting kinetochore-microtubule attachments alters kinetochore composition (presumably checkpoint proteins will be recruited) and phosphorylation but this is beyond the scope of this work. In terms of the state at which we are arresting cells – this is a true metaphase because cohesion has not been lost but kinetochore-microtubule attachments have been established. This is evident from the enrichment of microtubule regulators but not checkpoint proteins in the kinetochore purifications from metaphase I and II. While this state is expected to occur only transiently in yeast, since the establishment of proper kinetochore-microtubule attachments triggers anaphase onset, the ability to capture this properly bioriented state will be extremely informative for future studies. We appreciate the reviewers’ insight in highlighting these interesting discussion points which we will include in a revised version.

      Reviewer #1 (Significance):

      These authors have developed a method to induce MI or MII arrest. While this was previously possible in MI, the advantage of the method presented here is that it works for MII, and is chemically inducible because it is based on a system that is sensitive to the addition of ABA. Depending on when the ABA is added, they achieve a MI or MII delay. The ABA promotes dimerizing fragments of Mps1 and Spc105 that can't bind their chromosomal sites. The evidence that the MI arrest is weaker than the MII arrest is convincing and consistent with published data indicating that the SAC in MI is less robust than in MII or mitosis. The authors use this system to find evidence that the weak MI arrest is associated with PP1 binding to Spc105. This is a nice use of the system.

      The remainder of the paper uses the SynSAC system to isolate populations enriched for MI or MII stages and conduct proteomics. This shows a powerful use of the system but more work is needed to validate these results, particularly in normal cells.

      Overall the most significant aspect of this paper is the technical achievement, which is validated by the other experiments. They have developed a system and generated some proteomics data that may be useful to others when analyzing kinetochore composition at each division.

      We appreciate the reviewer’s enthusiasm for our work.

      Reviewer #2 (Evidence, reproducibility and clarity):

      The manuscript submitted by Koch et al. describes a novel approach to collect budding yeast cells in metaphase I or metaphase II by synthetically activating the spindle checkpoint (SAC). The arrest is transient and reversible. This synchronization strategy will be extremely useful for studying meiosis I and meiosis II, and comparing the two divisions. The authors characterized this so-named SynSAC approach and could confirm previous observations that the SAC arrest is less efficient in meiosis I than in meiosis II. They found that downregulation of the SAC response through PP1 phosphatase is stronger in meiosis I than in meiosis II. The authors then went on to purify kinetochore-associated proteins from metaphase I and II extracts for proteome and phosphoproteome analysis. Their data will be of significant interest to the cell cycle community (they also compared their datasets to kinetochores purified from cells arrested in prophase I and, with SynSAC, in mitosis).

      I have only a couple of minor comments:

      (1) I would add the Suppl Figure 1A to main Figure 1A. What is really exciting here is the arrest in metaphase II, so I don't understand why the authors characterize metaphase I in the main figure, but not metaphase II. But this is only a suggestion.

      This is a good suggestion; we will do this in our full revision.

      (2) Line 197, the authors state: “...SynSAC induced a more pronounced delay in metaphase II than in metaphase I”. However, in lines 229 and 240 the authors talk about a "longer delay in metaphase I compared to metaphase II"... this seems to be a mix-up.

      Thank you for pointing this out, this is indeed a typo and we have corrected it.

      (3) The authors describe striking differences for both protein abundance and phosphorylation for key kinetochore associated proteins. I found one very interesting protein that seems to be very abundant and phosphorylated in metaphase I but not metaphase II, namely Sgo1. Do the authors think that Sgo1 is not required in metaphase II anymore? (Top hit in suppl Fig 8D).

      This is indeed an interesting observation, which we plan to investigate as part of another study in the future. Indeed, data from mouse oocytes indicate that shugoshin-dependent cohesin protection is already absent in meiosis II [6], though whether this is also true in yeast is not known. Furthermore, this does not rule out other functions of Sgo1 in meiosis II (for example, promoting biorientation). We will include this point in the discussion.

      Reviewer #2 (Significance):

      The technique described here will be of great interest to the cell cycle community. Furthermore, the authors provide data sets on purified kinetochores of different meiotic stages and compare them to mitosis. This paper will thus be highly cited, for the technique, and also for the application of the technique.

      Reviewer #3 (Evidence, reproducibility and clarity):

      In their manuscript, Koch et al. describe a novel strategy to synchronize cells of the budding yeast Saccharomyces cerevisiae in metaphase I and metaphase II, thereby facilitating comparative analyses between these meiotic stages. This approach, termed SynSAC, adapts a method previously developed in fission yeast and human cells that enables the ectopic induction of a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC components upon addition of the plant hormone abscisic acid (ABA). This is a valuable tool, which has the advantage that induces SAC-dependent inhibition of the anaphase promoting complex without perturbing kinetochores. Furthermore, since the same strategy and yeast strain can be also used to induce a metaphase arrest during mitosis, the methodology developed by Koch et al. enables comparative analyses between mitotic and meiotic cell divisions. To validate their strategy, the authors purified kinetochores from meiotic metaphase I and metaphase II, as well as from mitotic metaphase, and compared their protein composition and phosphorylation profiles. The results are presented clearly and in an organized manner.

      We are grateful to the reviewer for their support.

      Despite the relevance of both the methodology and the comparative analyses, several main issues should be addressed:

      (1) In contrast to the strong metaphase arrest induced by ABA addition in mitosis (Supp. Fig. 2), the SynSAC strategy only promotes a delay in metaphase I and metaphase II as cells progress through meiosis. This delay extends the duration of both meiotic stages, but does not markedly increase the percentage of metaphase I or II cells in the population at a given timepoint of the meiotic time course (Fig. 1C). Therefore, although SynSAC broadens the time window for sample collection, it does not substantially improve differential analyses between stages compared with a standard NDT80 prophase block synchronization experiment. Could a higher ABA concentration or repeated hormone addition improve the tightness of the meiotic metaphase arrest?

      For many purposes, the enrichment and extended time for sample collection are sufficient, as we demonstrate here. However, as pointed out by the reviewer below, the system can be improved by use of the 4A-RASA mutations to provide a stronger arrest (see our response below). We did not experiment with higher ABA concentrations or repeated addition since the very robust arrest achieved with the 4A-RASA mutant made this unnecessary.

      (2) Unlike the standard SynSAC strategy, introducing mutations that prevent PP1 binding to the SynSAC construct considerably extended the duration of the meiotic metaphase arrests. In particular, mutating PP1 binding sites in both the RVxF (RASA) and the SILK (4A) motifs of the Spc105(1-455)-PYL construct caused a strong metaphase I arrest that persisted until the end of the meiotic time course (Fig. 3A). This stronger and more prolonged 4A-RASA SynSAC arrest would directly address the issue raised above. It is unclear why the authors did not emphasize more this improved system. Indeed, the 4A-RASA SynSAC approach could be presented as the optimal strategy to induce a conditional metaphase arrest in budding yeast meiosis, since it not only adapts but also improves the original methods designed for fission yeast and human cells. Along the same lines, it is surprising that the authors did not exploit the stronger arrest achieved with the 4A-RASA mutant to compare kinetochore composition at meiotic metaphase I and II.

      We agree that the 4A-RASA mutant is the best tool to use for the arrest and going forward this will be our approach. We collected the proteomics data and the data on the SynSAC mutant variants concurrently, so we did not know about the improved arrest at the time the proteomics experiment was done. Because very good arrest was already achieved with the unmutated SynSAC construct, we could not justify repeating the proteomics experiment which is a large amount of work using significant resources. However, we will highlight the potential of the 4A-RASA mutant more prominently in our full revision.

      (3) The results shown in Supp. Fig. 4C are intriguing and merit further discussion. Mitotic growth in ABA suggests that the RASA mutation silences the SynSAC effect, yet this was not observed for the 4A or the double 4A-RASA mutants. Notably, in contrast to mitosis, the SynSAC 4A-RASA mutation leads to a more pronounced metaphase I meiotic delay (Fig. 3A). It is also noteworthy that the RVAF mutation partially restores mitotic growth in ABA. This observation supports, as previously demonstrated in human cells, that Aurora B-mediated phosphorylation of S77 within the RVSF motif is important to prevent PP1 binding to Spc105 in budding yeast as well.

      We agree these are intriguing findings that highlight key differences in the wiring of the spindle checkpoint in meiosis and mitosis, as well as potential for future studies; however, we can currently only speculate as to the underlying cause. The effect of the RASA mutation in mitosis is unexpected and unexplained. However, the fact that the 4A-RASA mutation causes a stronger delay in meiosis I compared to mitosis can be explained by a greater prominence of PP1 phosphatase in meiosis. Indeed, our data (Figure 4A) show that the PP1 phosphatase Glc7 and its regulatory subunit Fin1 are highly enriched on kinetochores at all meiotic stages compared to mitosis.

      We agree that the improved growth of the RVAF mutant is intriguing and points to a role of Aurora B-mediated phosphorylation, though previous work has not supported such a role [7].

      We will include a discussion of these important points in a revised version.

      (4) To demonstrate the applicability of the SynSAC approach, the authors immunoprecipitated the kinetochore protein Dsn1 from cells arrested at different meiotic or mitotic stages, and compared kinetochore composition using data independent acquisition (DIA) mass spectrometry. Quantification and comparative analyses of total and kinetochore protein levels were conducted in parallel for cells expressing either FLAG-tagged or untagged Dsn1 (Supp. Fig. 7A-B). To better detect potential changes, protein abundances were next scaled to Dsn1 levels in each sample (Supp. Fig. 7C-D). However, it is not clear why the authors did not normalize protein abundance in the immunoprecipitations from tagged samples at each stage to the corresponding untagged control, instead of performing a separate analysis. This would be particularly relevant given the high sensitivity of DIA mass spectrometry, which enabled quantification of thousands of proteins. Furthermore, the authors compared protein abundances in tagged-samples from mitotic metaphase and meiotic prophase, metaphase I and metaphase II (Supp. Fig. 7E-F). If protein amounts in each case were not normalized to the untagged controls, as inferred from the text (lines 333 to 338), the observed differences could simply reflect global changes in protein expression at different stages rather than specific differences in protein association to kinetochores.

      While we agree with the reviewer that at first glance, normalising to no tag appears to be the most appropriate normalisation, in practice there is very low background signal in the no tag sample which means that any random fluctuations have a big impact on the final fold change used for normalisation. This approach therefore introduces artefacts into the data rather than improving normalisation.

      To provide reassurance that our kinetochore immunoprecipitations are specific, and that the background (no tag) signal is indeed very low, we will provide a new supplemental figure showing volcano plots that compare kinetochore purifications at each stage with their corresponding no tag control.

      It is also important to note that our experiment looks at relative changes of the same protein over time, which we expect to be relatively small in the whole cell lysate. We previously documented proteins that change in abundance in whole cell lysates throughout meiosis[8]. In this study, we found that relatively few proteins significantly change in abundance.

      Our aim in the current study was to understand how the relative composition of the kinetochore changes and, for this, we believe that a direct comparison to Dsn1, a central kinetochore protein which we immunoprecipitated, is the most appropriate normalisation.
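
      The normalisation argument above can be illustrated with a minimal numerical sketch. All intensities below are hypothetical values chosen for illustration, not data from the study; the point is only that dividing by a near-zero, fluctuating no tag background inflates the spread of fold changes, whereas dividing by an abundant bait signal (standing in for Dsn1) does not.

```python
def fold_change(signal: float, reference: float) -> float:
    """Ratio of a protein's signal to the chosen normalisation reference."""
    return signal / reference


def spread(values: list[float]) -> float:
    """Largest value divided by smallest value: 1.0 means no variation."""
    return max(values) / min(values)


# Hypothetical intensities in arbitrary units (assumptions, not measured data).
tagged_signal = 1000.0                       # a kinetochore protein in the tagged IP
no_tag_background = [0.5, 1.0, 2.0]          # near-zero background, small absolute noise
bait_signal = [4800.0, 5000.0, 5200.0]       # abundant bait, comparable relative noise source

fc_vs_no_tag = [fold_change(tagged_signal, b) for b in no_tag_background]
fc_vs_bait = [fold_change(tagged_signal, d) for d in bait_signal]

print("spread when normalising to no tag:", spread(fc_vs_no_tag))
print("spread when normalising to bait:  ", round(spread(fc_vs_bait), 3))
```

      In this toy case a fluctuation of about one intensity unit in the background changes the no tag fold change several-fold, while a fluctuation of hundreds of units in the bait barely moves the bait-normalised value.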

      (5) Despite the large amount of potentially valuable data generated, the manuscript focuses mainly on results that reinforce previously established observations (e.g., premature SAC silencing in meiosis I by PP1, changes in kinetochore composition, etc.). The discussion would benefit from a deeper analysis of novel findings that underscore the broader significance of this study.

      We strongly agree with this point and we will re-frame the discussion to focus on the novel findings, as also raised by the other reviewers.

      Finally, minor concerns are:

      (1) Meiotic progression in SynSAC strains lacking Mad1, Mad2 or Mad3 is severely affected (Fig. 1D and Supp. Fig. 1), making it difficult to assess whether, as the authors state, the metaphase delays depend on the canonical SAC cascade. In addition, as a general note, graphs displaying meiotic time courses could be improved for clarity (e.g., thinner data lines, addition of axis gridlines and external tick marks, etc.).

      We will generate the data to include a checkpoint mutant +/- ABA for direct comparison. We will take steps to improve the clarity of presentation of the meiotic timecourse graphs, though our experience is that uncluttered graphs make it easier to compare trends.

      (2) Spore viability following SynSAC induction in meiosis was used as an indicator that this experimental approach does not disrupt kinetochore function and chromosome segregation. However, this is an indirect measure. Direct monitoring of genome distribution using GFP-tagged chromosomes would have provided more robust evidence. Notably, the SynSAC mad3Δ mutant shows a slight viability defect, which might reflect chromosome segregation defects that are more pronounced in the absence of a functional SAC.

      Spore viability is a much more sensitive way of analysing segregation defects than GFP-labelled chromosomes. This is because GFP labelling allows only a single chromosome to be followed. On the other hand, if any of the 16 chromosomes mis-segregate in a given meiosis this would result in one or more aneuploid spores in the tetrad, which are typically inviable. The fact that spore viability is not significantly different from wild type in this analysis indicates that there are no major chromosome segregation defects in these strains, and we therefore do not plan to do this experiment.
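
      The sensitivity argument can be made concrete with a short worked calculation. This is a sketch under simplifying assumptions that are not stated in the response: each of the 16 chromosomes mis-segregates independently with the same hypothetical probability per meiosis, and every mis-segregation is detected. Under those assumptions, an assay that reports on all 16 chromosomes detects an event with probability 1 - (1 - p)^16, compared with p for a single labelled chromosome.

```python
def p_any_missegregation(p_per_chromosome: float, n_chromosomes: int = 16) -> float:
    """Probability that at least one of n chromosomes mis-segregates in a meiosis,
    assuming independent and equal per-chromosome error rates (an assumption)."""
    return 1.0 - (1.0 - p_per_chromosome) ** n_chromosomes


# Hypothetical per-chromosome error rate, for illustration only.
p = 0.01

single_labelled = p_any_missegregation(p, n_chromosomes=1)   # one GFP-labelled chromosome
all_chromosomes = p_any_missegregation(p, n_chromosomes=16)  # any chromosome, as read out by spore viability

print(f"single labelled chromosome: {single_labelled:.3f}")
print(f"any of 16 chromosomes:      {all_chromosomes:.3f}")
```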

      (3) It is surprising that, although SAC activity is proposed to be weaker in metaphase I, the levels of CPC/SAC proteins seem to be higher at this stage of meiosis than in metaphase II or mitotic metaphase (Fig. 4A-B).

      We agree that this is surprising, and we will point this out in the revised discussion. We speculate that the challenge in biorienting homologs, which are held together by chiasmata rather than back-to-back kinetochores, results in a greater requirement for error correction in meiosis I. Interestingly, the data with the RASA mutant also point to increased PP1 activity in meiosis I, and we additionally observed increased levels of PP1 (Glc7 and Fin1) on meiotic kinetochores, consistent with the idea that cycles of error correction and silencing are elevated in meiosis I.

      (4) Although a more detailed exploration of kinetochore composition or phosphorylation changes is beyond the scope of the manuscript, some key observations could have been validated experimentally (e.g., enrichment of proteins at kinetochores, phosphorylation events that were identified as specific or enriched at a certain meiotic stage, etc.).

      We agree that this is beyond the scope of the current study but will form the start of future projects from our group, and hopefully others.

      (5) Several typographical errors should be corrected (e.g., "Knetochores" in Fig. 4 legend, "250uM ABA" in Supp. Fig. 1 legend, etc.)

      Thank you for pointing these out, they have been corrected.

      Reviewer #3 (Significance):

      Koch et al. describe a novel methodology, SynSAC, to synchronize budding yeast cells in metaphase I or metaphase II during meiosis, as well as in mitotic metaphase, thereby enabling differential analyses among these cell division stages. Their approach builds on prior strategies originally developed in fission yeast and human cell models to induce a synthetic spindle assembly checkpoint (SAC) arrest by conditionally forcing the heterodimerization of two SAC proteins upon addition of abscisic acid (ABA). The results from this manuscript are of special relevance for researchers studying meiosis and using Saccharomyces cerevisiae as a model. Moreover, the differential analysis of the composition and phosphorylation of kinetochores from meiotic metaphase I and metaphase II adds interest for the broader meiosis research community. Finally, regarding my expertise, I am a researcher specialized in the regulation of cell division.

      Description of the revisions that have already been incorporated in the transferred manuscript

      We have only corrected minor typos as detailed above.

      Description of analyses that authors prefer not to carry out

      The revisions we plan are detailed above. There are just two revisions we believe are either unnecessary or beyond the scope, both minor concerns of Reviewer #3. For clarity we have reproduced them, along with our justification below. In the latter case, the reviewer also acknowledged that further work in this direction is beyond the scope of the current study.

      (2) Spore viability following SynSAC induction in meiosis was used as an indicator that this experimental approach does not disrupt kinetochore function and chromosome segregation. However, this is an indirect measure. Direct monitoring of genome distribution using GFP-tagged chromosomes would have provided more robust evidence. Notably, the SynSAC mad3Δ mutant shows a slight viability defect, which might reflect chromosome segregation defects that are more pronounced in the absence of a functional SAC.

      Spore viability is a much more sensitive way of analysing segregation defects than GFP-labelled chromosomes. This is because GFP labelling allows only a single chromosome to be followed. On the other hand, if any of the 16 chromosomes mis-segregate in a given meiosis this would result in one or more aneuploid spores in the tetrad, which are typically inviable. The fact that spore viability is not significantly different from wild type in this analysis indicates that there are no major chromosome segregation defects in these strains, and we therefore do not plan to do this experiment.

      (4) Although a more detailed exploration of kinetochore composition or phosphorylation changes is beyond the scope of the manuscript, some key observations could have been validated experimentally (e.g., enrichment of proteins at kinetochores, phosphorylation events that were identified as specific or enriched at a certain meiotic stage, etc.).

      We agree that this is beyond the scope of the current study but will form the start of future projects from our group, and hopefully others.

      (1) Salah, S.M., and Nasmyth, K. (2000). Destruction of the securin Pds1p occurs at the onset of anaphase during both meiotic divisions in yeast. Chromosoma 109, 27–34.

      (2) Matos, J., Lipp, J.J., Bogdanova, A., Guillot, S., Okaz, E., Junqueira, M., Shevchenko, A., and Zachariae, W. (2008). Dbf4-dependent CDC7 kinase links DNA replication to the segregation of homologous chromosomes in meiosis I. Cell 135, 662–678.

      (3) Marston, A.L., Lee, B.H., and Amon, A. (2003). The Cdc14 phosphatase and the FEAR network control meiotic spindle disassembly and chromosome segregation. Developmental Cell 4, 711–726. https://doi.org/10.1016/S1534-5807(03)00130-8.

      (4) Attner, M.A., and Amon, A. (2012). Control of the mitotic exit network during meiosis. Molecular Biology of the Cell 23, 3122–3132. https://doi.org/10.1091/mbc.E12-03-0235.

      (5) Pablo-Hernando, M.E., Arnaiz-Pita, Y., Nakanishi, H., Dawson, D., del Rey, F., Neiman, A.M., and de Aldana, C.R.V. (2007). Cdc15 Is Required for Spore Morphogenesis Independently of Cdc14 in Saccharomyces cerevisiae. Genetics 177, 281–293. https://doi.org/10.1534/genetics.107.076133.

      (6) El Jailani, S., Cladière, D., Nikalayevich, E., Touati, S.A., Chesnokova, V., Melmed, S., Buffin, E., and Wassmann, K. (2025). Eliminating separase inhibition reveals absence of robust cohesin protection in oocyte metaphase II. EMBO J 44, 5187–5214. https://doi.org/10.1038/s44318-025-00522-0.

      (7) Rosenberg, J.S., Cross, F.R., and Funabiki, H. (2011). KNL1/Spc105 Recruits PP1 to Silence the Spindle Assembly Checkpoint. Current Biology 21, 942–947. https://doi.org/10.1016/j.cub.2011.04.011.

      (8) Koch, L.B., Spanos, C., Kelly, V., Ly, T., and Marston, A.L. (2024). Rewiring of the phosphoproteome executes two meiotic divisions in budding yeast. EMBO J 43, 1351–1383. https://doi.org/10.1038/s44318-024-00059-8.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      (1) It might be good to further discuss potential molecular mechanisms for increasing the TF off rate (what happens at the mechanistic level). 

      This is now expanded in the Discussion.

      (2) To improve readability, it would be good to make consistent font sizes on all figures to make sure that the smallest font sizes are readable. 

      We have normalised figure text as much as is feasible.

      (3) upDARs and downDARs - these abbreviations are defined in the figure legend but not in the main text. 

      We have removed references to these terms from the text and included a definition in the figure legend. 

      (4) Figure 3B - the on-figure legend is a bit unclear; the text legend does not mention the meaning of "DEG". 

      We have removed this panel as it was confusing and did not demonstrate any robust conclusion. 

      (5) The values of apparent dissociation rates shown in Figure 5 are a bit different from values previously reported in literature (e.g., see Okamoto et al., 2023, PMC10505915). Perhaps the authors could comment on this. Also, it would be helpful to add the actual equation that was used for the curve fitting to determine these values to the Methods section. 

      We have included an explanation of the curve fitting equation in the Methods as suggested.

      The apparent dissociation rate observed is a sum of multiple rates of decay: the true dissociation rate (k<sub>off</sub>), signal loss caused by photobleaching (k<sub>pb</sub>), and signal loss caused by defocusing/tracking error (k<sub>tl</sub>).

      k<sub>off</sub><sup>app</sup> = k<sub>off</sub> + k<sub>pb</sub> + k<sub>tl</sub>

      We are making conclusions about relative changes in k<sub>off</sub><sup>app</sup> upon CHD4 depletion, not about the absolute magnitude of true k<sub>off</sub> or TF residence times. Our conclusions extend to true k<sub>off</sub> on the assumption that k<sub>pb</sub> and k<sub>tl</sub> are equal across all samples imaged due to identical experimental conditions and analysis. k<sub>pb</sub> and k<sub>tl</sub> vary hugely across experimental set-ups, especially with different laser powers, so other k<sub>off</sub> or k<sub>off</sub><sup>app</sup> values reported in the literature would be expected to differ from ours. Time-lapse experiments or independent determination of k<sub>pb</sub> (and k<sub>tl</sub>) would be required to make any statements about absolute values of k<sub>off</sub>.
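
      A minimal numerical sketch of this reasoning follows. The rate constants are hypothetical values picked for illustration (they are not the fitted values from the study). The sketch shows that, when the photobleaching and tracking-loss terms are the same in both conditions, the difference in apparent rate between conditions equals the difference in true off rate, while the absolute apparent rates depend on the imaging set-up.

```python
def apparent_off_rate(k_off: float, k_pb: float, k_tl: float) -> float:
    """Apparent dissociation rate as the sum of true dissociation,
    photobleaching and tracking-loss rates (all in s^-1)."""
    return k_off + k_pb + k_tl


# Hypothetical true off rates (s^-1) for the two conditions; assumptions only.
K_OFF_CONTROL = 0.20
K_OFF_DEPLETED = 0.35

# Two hypothetical imaging set-ups with different photobleaching/tracking losses.
setups = {"setup_A": (0.30, 0.10), "setup_B": (0.05, 0.02)}

differences = {}
for name, (k_pb, k_tl) in setups.items():
    control = apparent_off_rate(K_OFF_CONTROL, k_pb, k_tl)
    depleted = apparent_off_rate(K_OFF_DEPLETED, k_pb, k_tl)
    differences[name] = depleted - control
    print(f"{name}: control={control:.2f}, depleted={depleted:.2f}, "
          f"difference={differences[name]:.2f}")
```

      The absolute apparent rates differ between the two hypothetical set-ups, which is why values from different laboratories are not directly comparable, but the between-condition difference is the same in both.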

      (6) Regarding the discussion about the functionality of low-affinity sites/low accessibility regions, the authors may wish to mention the recent debates on this (https://www.nature.com/articles/s41586-025-08916-0; https://www.biorxiv.org/content/10.1101/2025.10.12.681120v1). 

      We have now included a discussion of this point and referenced both papers.

      (7) It may be worth expanding figure legends a bit, because the definitions of some of the terms mentioned on the figures are not very easy to find in the text. 

      We have endeavoured to define all relevant terms in the figure legends. 

      Reviewer #2 (Public review): 

      (1) Figure 2 shows heat maps of RNA-seq results following a time course of CHD4 depletion (0, 1, 2 hours...). Usually, the red/blue colour scale is used to visualise differential expression (fold-difference). Here, genes are coloured in red or blue even at the 0-hour time point. This confused me initially until I discovered that instead of fold-difference, a z-score is plotted. I do not quite understand what it means when a gene that is coloured blue at the 0-hour time point changes to red at a later time point. Does this always represent an upregulation? I think this figure requires a better explanation. 

      The heatmap displays z-scores, meaning expression for each gene has been centred and scaled across the entire time course. As a result, time zero is not a true baseline; it simply shows whether the gene’s expression at that moment is above or below its own mean. A transition from blue to red therefore indicates that the gene increases relative to its overall average, which typically corresponds to upregulation, but it doesn’t directly represent fold-change from the 0-hour time point. We have now included a brief explanation of this in the figure legend to make this point clear.  
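
      The per-gene scaling described here can be sketched in a few lines. The expression values and time points are hypothetical, and the use of the population standard deviation is an assumption (plotting tools differ in whether they use the population or sample form); the sketch only shows why a gene can be "blue" at 0 hours without any reference to a baseline.

```python
from statistics import mean, pstdev


def zscore_row(values: list[float]) -> list[float]:
    """Centre and scale one gene's expression across the whole time course."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene that never changes has no meaningful z-score; report zeros.
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


# Hypothetical expression of one gene at 0, 1, 2 and 4 hours (arbitrary units).
expression = [10.0, 12.0, 20.0, 30.0]
z = zscore_row(expression)

for hours, value in zip([0, 1, 2, 4], z):
    colour = "red" if value > 0 else "blue"
    print(f"{hours} h: z = {value:+.2f} ({colour})")
```

      The 0-hour value is negative simply because it lies below the gene's own mean across the time course, so the colour at 0 hours carries no information about a fold change.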

      (2) Figure 5D: NANOG, SOX2 binding at the KLF4 locus. The authors state that the enhancers 68, 57, and 55 show a gain in NANOG and SOX2 enrichment "from 30 minutes of CHD4 depletion". This is not obvious to me from looking at the figure. I can see an increase in signal from "WT" (I am assuming this corresponds to the 0 hours time point) to "30m", but then the signals seem to go down again towards the 4h time point. Can this be quantified? Can the authors discuss why TF binding seems to increase only temporarily (if this is the case)? 

      We have edited the text to reflect more accurately what is shown in the screenshot. We have also replaced “WT” with “0” as this more accurately reflects the status of these cells. 

      (3) There is no real discussion of HOW CHD4/NuRD counteracts TF binding (i.e. by what molecular mechanism). I understand that the data does not really inform us on this. Still, I believe it would be worthwhile for the authors to discuss some ideas, e.g., local nucleosome sliding vs. a direct (ATP-dependent?) action on the TF itself. 

      We now include more speculation on this point in the Discussion.

      Reviewer #3 (Public review): 

      The main weakness can be summarised as relating to the fact that authors interpret all rapid changes following CHD4 degradation as being a direct effect of the loss of CHD4 activity. The possibility that rapid indirect effects arise does not appear to have been given sufficient consideration. This is especially pertinent where effects are reported at sites where CHD4 occupancy is initially low. 

      We acknowledge that we cannot definitively say any effect is a direct consequence of CHD4 depletion and have mitigated statements in the Results and Discussion. 

      Reviewing Editor Comments: 

      I am pleased to say all three experts had very complementary and complimentary comments on your paper - congratulations. Reviewer 3 does suggest toning down a few interpretations, which I suggest would help focus the manuscript on its greater strengths. I encourage a quick revision to this point, which will not go back to reviewers, before you request a version of record. I would also like to take this opportunity to thank all three reviewers for excellent feedback on this paper. 

      As advised we have mitigated the points raised by the reviewers. 

      Reviewer #2 (Recommendations for the authors): 

      p9, top: The sentence starting with "Genes increasing in expression after four hours...." is very difficult to understand and should be rephrased or broken up. 

      We agree. This has been completely re-written. 

      Reviewer #3 (Recommendations for the authors): 

      Sites of increased chromatin accessibility emerge more slowly than sites of lost chromatin accessibility. Figure 1D shows a small increase in accessibility at 30 min, but a more noticeable decrease at 30 min. The sites of increased accessibility also have lower absolute accessibility than observed at locations where accessibility is lost. This raises the possibility that the sites of increased accessibility represent rapid but indirect changes occurring following loss of CHD4. Consistent with this, enrichment for CHD4 and MBD3 by CUT&Tag is far higher at sites of decreased accessibility. The low level of CHD4 occupancy observed at sites where accessibility increases may not be relevant to the reason these sites are affected. Such small enrichments can be observed when aligning to other genomic features. The authors interpret their findings as indicating that low occupancy of CHD4 exerts a long-lasting repressive effect at these locations. This is one possible explanation; however, an alternative is that these effects are indirect, perhaps driven by the very large increase in TF binding that is observed following CHD4 degradation and which appears to occur at many locations regardless of whether CHD4 is present. 

      The reviewer is right to point out that we don’t know what is direct and what is indirect. All we know is that changes happen very rapidly upon CHD4 depletion. The changes in standard ATAC-seq signal appear greater at the sites showing decreased accessibility than those increasing, however the starting points are very different: a small increase from very low accessibility will likely be a higher fold change than a more visible decrease from very high accessibility (Fig. 1D). In contrast, Figure 6 shows a more visible increase in Tn5 integrations at sites increasing in accessibility at 30 minutes than the change in sites decreasing in accessibility at 30 minutes. We therefore disagree that the sites increasing in accessibility are more likely to be indirect targets. In further support of this, there is a rapid increase in MNase resistance at these sites upon MBD3 reintroduction (Fig. 6I), possibly indicating a direct impact of NuRD on these sites. 
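
      The point about different starting levels can be illustrated with a small worked example. The signal values below are hypothetical and chosen only to show the arithmetic: a small absolute gain from a very low starting level gives a larger fold change than a visually larger loss from a very high starting level.

```python
import math


def log2_fold_change(before: float, after: float) -> float:
    """log2 of the ratio of signal after versus before treatment."""
    return math.log2(after / before)


# Hypothetical normalised accessibility signal (before, after); assumptions only.
low_start_site = (2.0, 8.0)        # gains accessibility from a very low level
high_start_site = (200.0, 100.0)   # loses accessibility from a very high level

for label, (before, after) in [("gained site", low_start_site),
                               ("lost site", high_start_site)]:
    print(f"{label}: absolute change = {after - before:+.0f}, "
          f"log2 fold change = {log2_fold_change(before, after):+.1f}")
```

      Here the lost site changes by 100 units and the gained site by only 6, yet the gained site has the larger fold change in magnitude, so the choice between absolute signal and fold change affects which class of sites appears more strongly affected.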

      Substantial changes in Nanog and SOX2 binding are observed across the time course. These changes are very large, with 43k or 78k additional sites detected. How is this possible? Does the amount of these TF's present in cells change? The argument that transient occupancy of CHD4 acts to prevent TF's binding to what is likely to be many 100's of thousands of sites (if the data for Nanog and SOX2 are representative of other transcription factors such as KLF4) seems unlikely. 

      The large number of different sites identified as gaining TF binding is likely to be a reflection of the number of cells being analysed: within the 10<sup>5</sup>-10<sup>6</sup> cells used for a Cut&Run experiment we detect many sites gaining TF binding. In individual cells we agree it would be unlikely for that many sites to become bound at the same time. We detect no changes in the amounts of Nanog or Sox2 in our cells across the 4-hour CHD4 depletion time course. However, we maintain that low frequency interactions of CHD4 with a site can counteract low frequency TF binding and prevent it from stimulating opening of a cryptic enhancer. 

      While increased TF binding is observed at sites of gained accessibility, the changes in TF occupancy at the lost sites do not progress continuously across the time course. In addition, the changes in occupancy are small in comparison to those observed at the gained sites. The text comments on an increase in SOX2 and Nanog occupancy at 30 min, but there is either no change or a loss by 4 hours. It's difficult to know what to conclude from this. 

      At sites losing accessibility the enrichment of both Nanog and Sox2 increases at 30 minutes. We suspect this is due to the loss of CHD4’s TF-removal activity. Thereafter the two TFs show different trends: Nanog enrichment then decreases again, probably due to the decrease in accessibility at these sites. Sox2, by contrast, does not change very much, possibly due to its higher pioneering ability. It is true that the amounts of change are very small here; however, Cut&Run was performed in triplicate and the summary graphs are plotted with standard error of the mean (which is often too small to see), demonstrating that the detected changes are highly significant. (We neglected to refer to the SEM in our figure legends: this has now been corrected.) At sites where CHD4 maintains chromatin compaction, the amount of transcription factor binding goes from zero or nearly zero to some finite number, hence the fold change is very large. In contrast, the changes at sites losing accessibility start from high enrichment, so fold changes are much smaller. 

      Changes in the diffusive motion of tagged TF's are measured. The data is presented as an average of measurements of individual TF's. What might be anticipated is that subpopulations of TF's would exhibit distinct behaviours. At many locations, occupancy of these TF's are presumably unchanged. At 1 hour, many new sites are occupied, and this would represent a subpopulation with high residence. A small population of TF's would be subject to distinct effects at the sites where accessibility reduces at the one-hour time point. The analysis presented fails to distinguish populations of TF's exhibiting altered mobility consistent with the proportion of the TF's showing altered binding. 

      We agree that there are likely subpopulations of TFs exhibiting distinct binding behaviours, and our modality of imaging captures this, but to distinguish subpopulations within this would require a lot more data.

      However, there is no reason to believe that TF binding at the new sites occupied at 1 hr would differ in residence time from binding at sites already stably bound by TFs in the wild type, i.e. that TFs at the new sites would face a different limit on their residence time once bound. We do capture more stably bound trajectories per cell, but that is not what we are reporting on: we report the dissociation rate of TFs that have already bound stably at sites where TF occupancy is also detected by ChIP.

      The analysis of transcription shown in Figure 2 indicates that high-quality data has been obtained, showing progressive changes to transcription. The linkage of the differentially expressed genes to chromatin changes shown in Figure 3 is difficult to interpret. The curves showing the distance distribution for increased or decreased DARs are quite similar for up- and down-regulated genes. The frequency density for gained sites is slightly higher, but not as much higher as would be expected, given these sites are c. 6-fold more abundant than the sites with lost accessibility. The data presented do not provide a compelling link between the CHD4-induced chromatin changes and changes to transcription; the authors should consider revising to accommodate this. It is possible that much of the transcriptional response even at early time points is indirect. This is not unprecedented. For example, degradation of SOX2, a transcriptional activator, results in both repression and activation of similar numbers of genes https://pmc.ncbi.nlm.nih.gov/articles/PMC10577566/ 

      We agree that these figures do not provide a compelling link between the observed chromatin changes and gene expression changes. That 50K increased sites are, on average, located farther away from misregulated genes than are the 8K decreasing sites highlights that this is rarely going to be a case of direct derepression of a silenced gene, but rather distal sites could act as enhancers to spuriously activate transcription. This would certainly be a rare event, but could explain the low-level transcriptional noise seen in NuRD mutants. We have edited the wording to make this clearer.

      The model presented in Figure 7 includes distinct roles at sites that become more or less accessible following inactivation of CHD4. This is perplexing as it implies that the same enzymes perform opposing functions at some of the different sites where they are bound. 

      Our point is that it does the same thing at both kinds of sites, but the nature of the sites means that the consequences of CHD4 activity will be different. We have tried to make this clear in the text. 

      At active sites, it is clear that CHD4 is bound prior to activation of the degron and that chromatin accessibility is reduced following depletion. Changes in TF occupancy are complex, perhaps reflecting slow diffusion from less accessible chromatin and a global increase in the abundance of some pluripotency transcription factors such as SOX2 and Nanog that are competent for DNA binding. The link between sites of reduced accessibility and transcription is less clear. 

      At the inactive sites, the increase in accessibility could be driven by transcription factor binding. There is very little CHD4 present at these sites prior to activation of the degron, and TF binding may induce chromatin opening, which could be considered a rapid but indirect effect of the CHD4 degron. The link to transcription is not clear from the data presented, but it would be anticipated that in some cases it would drive activation. 

      We acknowledge these points and have indicated this possibility in the Results and the Discussion.

      No analysis is performed to identify binding sequences enriched at the locations of decreased accessibility. This could potentially define transcription factors involved in CHD4 recruitment or that cause CHD4 to function differently in different contexts. 

      HOMER analyses failed to provide any unique insights. The sites going down are highly accessible in ES cells: they have TF binding sites that one would expect in ES cells. The increasing sites show an enrichment for G-rich sequences, which reflects the binding preference of CHD4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This useful study presents Altair-LSFM, a solid and well-documented implementation of a light-sheet fluorescence microscope (LSFM) designed for accessibility and cost reduction. While the approach offers strengths such as the use of custom-machined baseplates and detailed assembly instructions, its overall impact is limited by the lack of live-cell imaging capabilities and the absence of a clear, quantitative comparison to existing LSFM platforms. As such, although technically competent, the broader utility and uptake of this system by the community may be limited.

      We thank the editors and reviewers for their thoughtful evaluation of our work and for recognizing the technical strengths of the Altair-LSFM platform, including the custom-machined baseplates and detailed documentation provided to promote accessibility and reproducibility. Below, we provide point-by-point responses to each referee comment. In the process, we have significantly revised the manuscript to include live-cell imaging data and a quantitative evaluation of imaging speed. We now more explicitly describe the different variants of lattice light-sheet microscopy, highlighting differences in their illumination flexibility and image acquisition modes, and clarify how Altair-LSFM compares to each. We further discuss challenges associated with the 5 mm coverslip and propose practical strategies to overcome them. Additionally, we outline cost-reduction opportunities, explain the rationale behind key equipment selections, and provide guidance for implementing environmental control. Altogether, we believe these additions have strengthened the manuscript and clarified both the capabilities and limitations of Altair-LSFM.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary: 

      The article presents the details of the high-resolution light-sheet microscopy system developed by the group. In addition to presenting the technical details of the system, its resolution has been characterized and its functionality demonstrated by visualizing subcellular structures in a biological sample.

      Strengths: 

      (1) The article includes extensive supplementary material that complements the information in the main article.

      (2) However, in some sections, the information provided is somewhat superficial.

      We thank the reviewer for their thoughtful assessment and for recognizing the strengths of our manuscript, including the extensive supplementary material. Our goal was to make the supplemental content as comprehensive and useful as possible. In addition to the materials provided with the manuscript, our intention is for the online documentation (available at thedeanlab.github.io/altair) to serve as a living resource that evolves in response to user feedback. We would therefore greatly appreciate the reviewer’s guidance on which sections were perceived as superficial so that we can expand them to better support readers and builders of the system.

      Weaknesses:

      (1) Although a comparison is made with other light-sheet microscopy systems, the presented system does not represent a significant advance over existing systems. It uses high numerical aperture objectives and Gaussian beams, achieving resolution close to theoretical after deconvolution. The main advantage of the presented system is its ease of construction, thanks to the design of a perforated base plate.

      We appreciate the reviewer’s assessment and the opportunity to clarify our intent. Our primary goal was not to introduce new optical functionality beyond that of existing high-performance light-sheet systems, but rather to substantially reduce the barrier to entry for non-specialist laboratories. Many open-source implementations, such as OpenSPIM, OpenSPIN, and Benchtop mesoSPIM, similarly focused on accessibility and reproducibility rather than introducing new optical modalities, yet have had a measurable impact on the field by enabling broader community participation. Altair-LSFM follows this tradition, providing sub-cellular resolution performance comparable to advanced systems like LLSM, while emphasizing reproducibility, ease of construction through a precision-machined baseplate, and comprehensive documentation to facilitate dissemination and adoption.

      (2) Using similar objectives (Nikon 25x and Thorlabs 20x), the results obtained are similar to those of the LLSM system (using a Gaussian beam without laser modulation). However, the article does not mention the difficulties of mounting the sample in the implemented configuration.

      We appreciate the reviewer’s comment and agree that there are practical challenges associated with handling 5 mm diameter coverslips in this configuration. In the revised manuscript, we now explicitly describe these challenges and provide practical solutions. Specifically, we highlight the use of a custom-machined coverslip holder designed to simplify mounting and handling, and we direct readers to an alternative configuration using the Zeiss W Plan-Apochromat 20×/1.0 objective, which eliminates the need for small coverslips altogether.

      (3) The authors present a low-cost, open-source system. Although they provide open source code for the software (navigate), the use of proprietary electronics (ASI, NI, etc.) makes the system relatively expensive. Its low cost is not justified.

      We appreciate the reviewer’s perspective and understand the concern regarding the use of proprietary control hardware such as the ASI Tiger Controller and NI data acquisition cards. Our decision to use these components was intentional: relying on a unified, professionally supported and maintained platform minimizes complexity associated with sourcing, configuring, and integrating hardware from multiple vendors, thereby reducing non-financial barriers to entry for non-specialist users.

      Importantly, these components are not the primary cost driver of Altair-LSFM (they represent roughly 18% of the total system cost). Nonetheless, for those for whom the price is prohibitive, we also outline several viable cost-reduction options in the revised manuscript (e.g., substituting manual stages, omitting the filter wheel, or using industrial CMOS cameras), while discussing the trade-offs these substitutions introduce in performance and usability. These considerations are now summarized in Supplementary Note 1, which provides a transparent rationale for our design and cost decisions.

      Finally, we note that even with these professional-grade components, Altair-LSFM remains substantially less expensive than commercial systems offering comparable optical performance, such as LLSM implementations from Zeiss or 3i.

      (4) The fibroblast images provided are of exceptional quality. However, these are fixed samples. The system lacks the necessary elements for monitoring cells in vivo, such as temperature or pH control.

      We thank the reviewer for their positive comment regarding the quality of our data. As noted, the current manuscript focuses on validating the optical performance and resolution of the system using fixed specimens to ensure reproducibility and stability.

      We fully agree on the importance of environmental control for live-cell imaging. In the revised manuscript, we now describe in detail how temperature regulation can be achieved using a custom-designed heated sample chamber, accompanied by detailed assembly instructions on our GitHub repository and summarized in Supplementary Note 2. For pH stabilization in systems lacking a 5% CO₂ atmosphere, we recommend supplementing the imaging medium with 10–25 mM HEPES buffer. Additionally, we include new live-cell imaging data demonstrating that Altair-LSFM supports in vitro time-lapse imaging of dynamic cellular processes under controlled temperature conditions.

      Reviewer #2 (Public review): 

      Summary: 

      The authors present Altair-LSFM (Light Sheet Fluorescence Microscope), a high-resolution, open-source microscope, that is relatively easy to align and construct and achieves sub-cellular resolution. The authors developed this microscope to fill a perceived need that current open-source systems are primarily designed for large specimens and lack sub-cellular resolution or are difficult to construct and align, and are not stable. While commercial alternatives exist that offer sub-cellular resolution, they are expensive. The authors' manuscript centers around comparisons to the highly successful lattice light-sheet microscope, including the choice of detection and excitation objectives. The authors thus claim that there remains a critical need for high-resolution, economical, and easy-to-implement LSFM systems. 

      We thank the reviewer for their thoughtful summary. We agree that existing open-source systems primarily emphasize imaging of large specimens, whereas commercial systems that achieve sub-cellular resolution remain costly and complex. Our aim with Altair-LSFM was to bridge this gap—providing LLSM-level performance in a substantially more accessible and reproducible format. By combining high-NA optics with a precision-machined baseplate and open-source documentation, Altair offers a practical, high-resolution solution that can be readily adopted by non-specialist laboratories.

      Strengths: 

      The authors succeed in their goals of implementing a relatively low-cost (~ USD 150K) open-source microscope that is easy to align. The ease of alignment rests on using custom-designed baseplates with dowel pins for precise positioning of optics based on computer analysis of opto-mechanical tolerances, as well as the optical path design. They simplify the excitation optics over Lattice light-sheet microscopes by using a Gaussian beam for illumination while maintaining lateral and axial resolutions of 235 and 350 nm across a 260-µm field of view after deconvolution. In doing so they rest on foundational principles of optical microscopy: what matters for lateral resolution is the numerical aperture of the detection objective and proper sampling of the image field onto the detector, and the axial resolution depends on the thickness of the light-sheet when it is thinner than the depth of field of the detection objective. This concept has unfortunately not been completely clear to users of high-resolution light-sheet microscopes and is thus a valuable demonstration. The microscope is controlled by an open-source software, Navigate, developed by the authors, and it is thus foreseeable that different versions of this system could be implemented depending on experimental needs while maintaining easy alignment and low cost. They demonstrate system performance successfully by characterizing their sheet, point-spread function, and visualization of sub-cellular structures in mammalian cells, including microtubules, actin filaments, nuclei, and the Golgi apparatus.

      We thank the reviewer for their thoughtful and generous assessment of our work. We are pleased that the manuscript’s emphasis on fundamental optical principles, design rationale, and practical implementation was clearly conveyed. We agree that Altair’s modular and accessible architecture provides a strong foundation for future variants tailored to specific experimental needs. To facilitate this, we have made all Zemax simulations, CAD files, and build documentation openly available on our GitHub repository, enabling users to adapt and extend the system for diverse imaging applications.

      Weaknesses:

      There is a fixation on comparison to the first-generation lattice light-sheet microscope, which has evolved significantly since then:

      (1) The authors claim that commercial lattice light-sheet microscopes (LLSM) are "complex, expensive, and alignment intensive". I believe this sentence applies to the open-source version of LLSM, which was made available for wide dissemination. Since then, a commercial solution has been provided by 3i, which is now being used in multiple cores and labs but does require routine alignments. However, Zeiss has also released a commercial turn-key system, which, while expensive, is stable, and the complexity does not interfere with the experience of the user. In general, though, statements on ease of use and stability that are unreferenced or unsupported by data might be considered anecdotal and may not belong in a scientific article.

      We thank the reviewer for this thoughtful and constructive comment. We have revised the manuscript to more clearly distinguish between the original open-source implementation of LLSM and subsequent commercial versions by 3i and ZEISS. The revised Introduction and Discussion now explicitly note that while open-source and early implementations of LLSM can require expert alignment and maintenance, commercial systems—particularly the ZEISS Lattice Lightsheet 7—are designed for automated operation and stable, turn-key use, albeit at higher cost and with limited modifiability. We have also moderated earlier language regarding usability and stability to avoid anecdotal phrasing.

      We also now provide a more objective proxy for system complexity: the number of optical elements that require precise alignment during assembly and maintenance thereafter. The original open-source LLSM setup includes approximately 29 optical components that must each be carefully positioned laterally, angularly, and coaxially along the optical path. In contrast, the first-generation Altair-LSFM system contains only nine such elements. By this metric, Altair-LSFM is considerably simpler to assemble and align, supporting our overarching goal of making high-resolution light-sheet imaging more accessible to non-specialist laboratories.

      (2) One of the major limitations of the first generation LLSM was the use of a 5 mm coverslip, which was a hindrance for many users. However, the Zeiss system elegantly solves this problem, and so does Oblique Plane Microscopy (OPM), while the Altair-LSFM retains this feature, which may dissuade widespread adoption. This limitation and how it may be overcome in future iterations is not discussed.

      We thank the reviewer for this helpful comment. We agree that the use of 5 mm diameter coverslips, while enabling high-NA imaging in the current Altair-LSFM configuration, may pose a practical limitation for some users. We now discuss this more explicitly in the revised manuscript. Specifically, we note that replacing the detection objective provides a straightforward solution to this constraint. For example, as demonstrated by Moore et al. (Lab Chip, 2021), pairing the Zeiss W Plan-Apochromat 20×/1.0 detection objective with the Thorlabs TL20X-MPL illumination objective allows imaging beyond the physical surfaces of both objectives, eliminating the need for small-format coverslips. In the revised text, we propose this modification as an accessible path toward greater compatibility with conventional sample mounting formats. We also note in the Discussion that Oblique Plane Microscopy (OPM) inherently avoids such nonstandard mounting requirements and, owing to its single-objective architecture, is fully compatible with standard environmental chambers.

      (3) Further, on the point of sample flexibility, all generations of the LLSM, and by the nature of its design, the OPM, can accommodate live-cell imaging with temperature, gas, and humidity control. It is unclear how this would be implemented with the current sample chamber. This limitation would severely limit use cases for cell biologists, for which this microscope is designed. There is no discussion on this limitation or how it may be overcome in future iterations.

      We thank the reviewer for this important observation and agree that environmental control is critical for live-cell imaging applications. It is worth noting that the original open-source LLSM design, as well as the commercial version developed by 3i, provided temperature regulation but did not include integrated control of CO₂ or humidity. Despite this limitation, these systems have been widely adopted and have generated significant biological insights. We also acknowledge that both OPM and the ZEISS implementation of LLSM offer clear advantages in this respect, providing compatibility with standard commercial environmental chambers that support full regulation of temperature, CO₂, and humidity.

      In the revised manuscript, we expand our discussion of environmental control in Supplementary Note 2, where we describe the Altair-LSFM chamber design in more detail and discuss its current implementation of temperature regulation and HEPES-based pH stabilization. Additionally, the Discussion now explicitly notes that OPM avoids the challenges associated with non-standard sample mounting and is inherently compatible with conventional environmental enclosures.

      (4) The authors' comparison to LLSM is constrained to the "square" lattice, which, as they point out, is the most used optical lattice (though this also might be considered anecdotal). The LLSM original design, however, goes far beyond the square lattice, including hexagonal lattices, the ability to do structured illumination, and greater flexibility in general in terms of light-sheet tuning for different experimental needs, as well as not being limited to just sample scanning. Thus, the Altair-LSFM cannot compare to the original LLSM in terms of versatility, even if comparisons to the resolution provided by the square lattice are fair.

      We agree that the original LLSM design offers substantially greater flexibility than what is reflected in our initial comparison, including the ability to generate multiple lattice geometries (e.g., square and hexagonal), operate in structured illumination mode, and acquire volumes using both sample- and light-sheet-scanning strategies. To address this, we now include Supplementary Note 3 that provides a detailed overview of the illumination modes and imaging flexibility afforded by the original LLSM implementation, and how these capabilities compare to both the commercial ZEISS Lattice Lightsheet 7 and our Altair-LSFM system. In addition, we have revised the discussion to explicitly acknowledge that the original LLSM could operate in alternative scan strategies beyond sample scanning, providing greater context for readers and ensuring a more balanced comparison.

      (5) There is no demonstration of the system's live-imaging capabilities or temporal resolution, which is the main advantage of existing light-sheet systems.

      In the revised manuscript, we now include a demonstration of live-cell imaging to directly validate Altair-LSFM’s suitability for dynamic biological applications. We also explicitly discuss the temporal resolution of the system in the main text (see Optoelectronic Design of Altair-LSFM), where we detail both software- and hardware-related limitations. Specifically, we evaluate the maximum imaging speed achievable with Altair-LSFM in conjunction with our open-source control software, navigate.

      For simplicity and reduced optoelectronic complexity, the current implementation powers the piezo through the ASI Tiger Controller, which modestly reduces its bandwidth. Nonetheless, for a 100 µm stroke typical of light-sheet imaging, we achieved sufficient performance to support volumetric imaging at most biologically relevant timescales. These results, along with additional discussion of the design trade-offs and performance considerations, are now included in the revised manuscript and expanded upon in the supplementary material.

      While the microscope is well designed and completely open source, it will require experience with optics, electronics, and microscopy to implement and align properly. Experience with custom machining or soliciting a machine shop is also necessary. Thus, in my opinion, it is unlikely to be implemented by a lab that has zero prior experience with custom optics and cannot hire someone who does. Altair-LSFM may not be as easily adaptable or implementable as the authors describe or perceive in any lab that is interested, even if they can afford it. The authors indicate they will offer "workshops," but this does not necessarily remove the barrier to entry or lower it, perhaps as significantly as the authors describe.

      We appreciate the reviewer’s perspective and agree that building any high-performance custom microscope—Altair-LSFM included—requires a basic understanding of (or willingness to learn) optics, electronics, and instrumentation. Such a barrier exists for all open-source microscopes, and our goal is not to eliminate this requirement entirely but to substantially reduce the technical and logistical challenges that typically accompany the construction of custom light-sheet systems.

      Importantly, no machining experience or in-house fabrication capabilities are required. Users can simply submit the provided CAD design files and specifications directly to commercial vendors for fabrication. We have made this process as straightforward as possible by supplying detailed build instructions, recommended materials, and vendor-ready files through our GitHub repository. Our dissemination strategy draws inspiration from other successful open-source projects such as mesoSPIM, which has seen widespread adoption—over 30 implementations worldwide—through a similar model of exhaustive documentation, open-source software, and community support via user meetings and workshops.

      We also recognize that documentation alone cannot fully replace hands-on experience. To further lower barriers to adoption, we are actively working with commercial vendors to streamline procurement and assembly, and Altair-LSFM is supported by a Biomedical Technology Development and Dissemination (BTDD) grant that provides resources for hosting workshops, offering real-time community support, and developing supplementary training materials.

      In the revised manuscript, we now expand the Discussion to explicitly acknowledge these implementation considerations and to outline our ongoing efforts to support a broad and diverse user base, ensuring that laboratories with varying levels of technical expertise can successfully adopt and maintain the Altair-LSFM platform.

      There is a claim that this design is easily adaptable. However, the requirement of custom-machined baseplates and in silico optimization of the optical path basically means that each new instrument is a new design, even if the Navigate software can be used. It is unclear how Altair-LSFM demonstrates a modular design that reduces times from conception to optimization compared to previous implementations.

      We thank the reviewer for this insightful comment and agree that our original language regarding adaptability may have overstated the degree to which Altair-LSFM can be modified without prior experience. It was not our intention to imply that the system can be easily redesigned by users with limited technical background. Meaningful adaptations of the optical or mechanical design do require expertise in optical layout, optomechanical design, and alignment.

      That said, for laboratories with such expertise, we aim to facilitate modifications by providing comprehensive resources—including detailed Zemax simulations, complete CAD models, and alignment documentation. These materials are intended to reduce the development burden for expert users seeking to tailor the system to specific experimental requirements, without necessitating a complete re-optimization of the optical path from first principles.

      In the revised manuscript, we clarify this point and temper our language regarding adaptability to better reflect the realistic scope of customization. Specifically, we now state in the Discussion: “For expert users who wish to tailor the instrument, we also provide all Zemax illumination-path simulations and CAD files, along with step-by-step optimization protocols, enabling modification and re-optimization of the optical system as needed.” This revision ensures that readers clearly understand that Altair-LSFM is designed for reproducibility and straightforward assembly in its default configuration, while still offering the flexibility for modification by experienced users.

      Reviewer #3 (Public review):

      Summary: 

      This manuscript introduces a high-resolution, open-source light-sheet fluorescence microscope optimized for sub-cellular imaging. The system is designed for ease of assembly and use, incorporating a custom-machined baseplate and in silico optimized optical paths to ensure robust alignment and performance. The authors demonstrate lateral and axial resolutions of ~235 nm and ~350 nm after deconvolution, enabling imaging of sub-diffraction structures in mammalian cells. The important feature of the microscope is the clever and elegant adaptation of simple Gaussian beams, smart beam shaping, galvo pivoting and high NA objectives to ensure a uniform thin light-sheet of around 400 nm in thickness, over a 266 micron wide field of view, pushing the axial resolution of the system beyond the regular diffraction limited-based tradeoffs of light-sheet fluorescence microscopy. Compelling validation using fluorescent beads and multicolor cellular imaging highlights the system's performance and accessibility. Moreover, a very extensive and comprehensive manual of operation is provided in the form of supplementary materials. This provides a DIY blueprint for researchers who want to implement such a system.

      We thank the reviewer for their thoughtful and positive assessment of our work. We appreciate their recognition of Altair-LSFM’s design and performance, including its ability to achieve high-resolution imaging throughout a 266-micron field of view. While Altair-LSFM approaches the practical limits of diffraction-limited performance, it does not exceed the fundamental diffraction limit; rather, it achieves near-theoretical resolution through careful optical optimization, beam shaping, and alignment. We are grateful for the reviewer’s acknowledgment of the accessibility and comprehensive documentation that make this system broadly implementable.

      Strengths:

      (1) Strong and accessible technical innovation: With an elegant combination of beam shaping and optical modelling, the authors provide a high-resolution light-sheet system that overcomes the classical light-sheet tradeoff limit of a thin light-sheet and a small field of view. In addition, the integration of in silico modelling with a custom-machined baseplate is very practical and allows for ease of alignment procedures. Combining these features with the solid and super-extensive guide provided in the supplementary information, this provides a protocol for replicating the microscope in any other lab.

      (2) Impeccable optical performance and ease of mounting of samples: The system takes advantage of the same sample-holding method seen already in other implementations, but reduces the optical complexity. At the same time, the authors claim to achieve similar lateral and axial resolution to Lattice-light-sheet microscopy (although without a direct comparison; see below in the "weaknesses" section). The optical characterization of the system is comprehensive and well-detailed. Additionally, the authors validate the system imaging sub-cellular structures in mammalian cells.

      (3) Transparency and comprehensiveness of documentation and resources: A very detailed protocol provides detailed documentation about the setup, the optical modeling, and the total cost.

      We thank the reviewer for their thoughtful and encouraging comments. We are pleased that the technical innovation, optical performance, and accessibility of Altair-LSFM were recognized. Our goal from the outset was to develop a diffraction-limited, high-resolution light-sheet system that balances optical performance with reproducibility and ease of implementation. We are also pleased that the use of precision-machined baseplates was recognized as a practical and effective strategy for achieving performance while maintaining ease of assembly.

      Weaknesses: 

      (1) Limited quantitative comparisons: Although some qualitative comparison with previously published systems (diSPIM, lattice light-sheet) is provided throughout the manuscript, some side-by-side comparison would be of great benefit for the manuscript, even in the form of a theoretical simulation. While having a direct imaging comparison would be ideal, it's understandable that this goes beyond the interest of the paper; however, a table referencing image quality parameters (taken from the literature), such as signal-to-noise ratio, light-sheet thickness, and resolutions, would really enhance the features of the setup presented. Moreover, based also on the necessity for optical simplification, an additional comment on the importance/difference of dual objective/single objective light-sheet systems could really benefit the discussion.

      In the revised manuscript, we have significantly expanded our discussion of different light-sheet systems to provide clearer quantitative and conceptual context for Altair-LSFM. These comparisons are based on values reported in the literature, as we do not have access to many of these instruments (e.g., DaXi, diSPIM, or commercial and open-source variants of LLSM), and a direct experimental comparison is beyond the scope of this work.

      We note that while quantitative parameters such as signal-to-noise ratio are important, they are highly sample-dependent and strongly influenced by imaging conditions, including fluorophore brightness, camera characteristics, and filter bandpass selection. For this reason, we limited our comparison to more general image-quality metrics—such as light-sheet thickness, resolution, and field of view—that can be reliably compared across systems.

      Finally, per the reviewer’s recommendation, we have added additional discussion clarifying the differences between dual-objective and single-objective light-sheet architectures, outlining their respective strengths, limitations, and suitability for different experimental contexts.

      (2) Limitation to a fixed sample: In the manuscript, there is no mention of incubation temperature, CO₂ regulation, humidity control, or possible integration of commercial environmental control systems. This is a major limitation for an imaging technique that owes its popularity to fast, volumetric, live-cell imaging of biological samples.

      We fully agree that environmental control is critical for live-cell imaging applications. In the revised manuscript, we now describe the design and implementation of a temperature-regulated sample chamber in Supplementary Note 2, which maintains stable imaging conditions through the use of integrated heating elements and thermocouples. This approach enables precise temperature control while minimizing thermal gradients and optical drift. For pH stabilization, we recommend the use of 10–25 mM HEPES in place of CO₂ regulation, consistent with established practice for most light-sheet systems, including the initial variant of LLSM. Although full humidity and CO₂ control are not readily implemented in dual-objective configurations, we note that single-objective designs such as OPM are inherently compatible with commercial environmental chambers and avoid these constraints. Together, these additions clarify how environmental control can be achieved within Altair-LSFM and situate its capabilities within the broader LSFM design space.

      (3) System cost and data storage cost: While the system presented has the advantage of being open-source, it remains relatively expensive (considering the 150k without laser source and optical table, for example). The manuscript could benefit from a more direct comparison of the performance/cost ratio of existing systems, considering academic settings with budgets that most of the time would not allow for expensive architectures. Moreover, it would also be beneficial to discuss the adaptability of the system, in case a 30k objective could not be feasible. Will this system work with different optics (with the obvious limitations coming with the lower NA objective)? This could be an interesting point of discussion. Adaptability of the system in case of lower budgets or more cost-effective choices, depending on the needs.

      We agree that cost considerations are critical for adoption in academic environments. We would also like to clarify that the quoted $150k includes the optical table and laser source. In the revised manuscript, Supplementary Note 1 now includes an expanded discussion of cost–performance trade-offs and potential paths for cost reduction.

      Last, not much is said about the need for data storage. Light-sheet microscopy's bottleneck is the creation of increasingly large datasets, and it could be beneficial to discuss more about the storage needs and the quantity of data generated.

      In the revised manuscript, we now include Supplementary Note 4, which provides a high-level discussion of data storage needs, approximate costs, and practical strategies for managing large datasets generated by light-sheet microscopy. This section offers general guidance, including file-format recommendations and cost considerations, but we note that actual costs will vary by institution and contractual agreements.
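
      A minimal sketch of how the raw data volume of a light-sheet acquisition can be estimated before planning storage. The function name and all acquisition parameters below are hypothetical placeholders, not values from the manuscript or its supplementary notes; the estimate covers uncompressed pixel data only and ignores metadata and file-format overhead.

```python
def stack_size_gib(nx, ny, nz, bytes_per_pixel=2, channels=1, timepoints=1):
    """Return the uncompressed size of an acquisition in GiB (2**30 bytes)."""
    total_bytes = nx * ny * nz * bytes_per_pixel * channels * timepoints
    return total_bytes / 1024**3


# Hypothetical acquisition: 2048 x 2048 pixel frames, 200 planes per volume,
# 16-bit pixels, 2 channels, 100 time points.
example_gib = stack_size_gib(2048, 2048, 200, bytes_per_pixel=2,
                             channels=2, timepoints=100)
print(f"Estimated raw data volume: {example_gib:.1f} GiB")
```

      Because the estimate scales linearly with every argument, doubling the number of time points or channels doubles the storage requirement; chunked, compressed formats can reduce the on-disk footprint, by an amount that depends on the data.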

      Conclusion:

      Altair-LSFM represents a well-engineered and accessible light-sheet system that addresses a longstanding need for high-resolution, reproducible, and affordable sub-cellular light-sheet imaging. While some aspects (comparative benchmarking and validation, and the limitation to fixed samples) would benefit from further development, the manuscript makes a compelling case for Altair-LSFM as a valuable contribution to the open microscopy scientific community.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) A picture, or full CAD design of the complete instrument, should be included as a main figure.

      A complete CAD rendering of the microscope is now provided in Supplementary Figure 4.

      (2) There is no quantitative comparison of the effects of the tilting resonant galvo; only a cartoon, a figure should be included.

      The cartoon was intended purely as an educational illustration to conceptually explain the role of the tilting resonant galvo in shaping and homogenizing the light sheet. To clarify this intent, we have revised both the figure legend and corresponding text in the main manuscript. For readers seeking quantitative comparisons, we now reference the original study that provides a detailed analysis of this optical approach, as well as a review on the subject.

      (3) Description of L4 is missing in the Figure 1 caption.

      Thank you for catching this omission. We have corrected it.

      (4) The beam profiles in Figures 1c and 3a, please crop and make the image bigger so the profile can be appreciated. The PSFs in Figure 3c-e should similarly be enlarged and presented using a dynamic range/LUT such that any aberrations can be appreciated.

      In Figure 1c, our goal was to qualitatively illustrate the uniformity of the light-sheet across the full field of view, while Figure 1d provided the corresponding quantitative cross-section. To improve clarity, we have added an additional figure panel offering a higher-magnification, localized view of the light-sheet profile. For Figure 3c–e, we have enlarged the PSF images and adjusted the display range to better convey the underlying signal and allow subtle aberrations to be appreciated.

      (5) It is unclear why LLSM is being used as the gold standard, since in its current commercial form, available from Zeiss, it is a turn-key system designed for core facilities. The original LLSM is also a versatile instrument that provides much more than the square lattice for illumination, including structured illumination, hexagonal lattices, live-cell imaging, wide-field illumination, different scan modes, etc. These additional features are not even mentioned when compared to the Altair-LSFM. If a comparison is to be provided, it should be fair and balanced. Furthermore, as outlined in the public review, anecdotal statements on "most used", "difficult to align", or "unstable" should not be provided without data.

      In the revised manuscript, we have carefully removed anecdotal statements and, where appropriate, replaced them with quantitative or verifiable information. For instance, we now explicitly report that the square lattice was used in 16 of the 20 figure subpanels in the original LLSM publication, and we include a proxy for optical complexity based on the number of optical elements requiring alignment in each system.

      We also now clearly distinguish between the original LLSM design—which supports multiple illumination and scanning modes—and its subsequent commercial variants, including the ZEISS Lattice Lightsheet 7, which prioritizes stability and ease of use over configurational flexibility (see Supplementary Note 3).

      (6) The authors should recognize that implementing custom optics, no matter how well designed, is a big barrier to cross for most cell biology labs.

      We fully understand and now acknowledge in the main text that implementing custom optics can present a significant barrier, particularly for laboratories without prior experience in optical system assembly. However, similar challenges were encountered during the adoption of other open-source microscopy platforms, such as mesoSPIM and OpenSPIM, both of which have nonetheless achieved widespread implementation. Their success has largely been driven by exhaustive documentation, strong community support, and standardized design principles—approaches we have also prioritized in Altair-LSFM. We have therefore made all CAD files, alignment guides, and detailed build documentation publicly available and continue to develop instructional materials and community resources to further reduce the barrier to adoption.

      (7) Statements on "hands on workshops", though laudable, may not be appropriate to include in a scientific publication without some documentation on the influence they have had on implementing the microscope.

      We understand the concern. Our intention in mentioning hands-on workshops was to convey that the dissemination effort is supported by an NIH Biomedical Technology Development and Dissemination grant, which includes dedicated channels for outreach and community engagement. Nonetheless, we agree that such statements are not appropriate without formal documentation of their impact, and we have therefore removed this text from the revised manuscript.

      (8) It is claimed that the microscope is "reliable" in the discussion, but with no proof, long-term stability should be assessed and included.

      Although our experience with Altair-LSFM has been that it remains well aligned over time, especially in comparison to other light-sheet systems we have worked on over the last 11 years, we acknowledge that this assessment is anecdotal. As such, we have omitted this claim from the revised manuscript.

      (9) Due to the reliance on anecdotal statements and comparisons without proof to other systems, this paper at times reads like a brochure rather than a scientific publication. The authors should consider editing their manuscript accordingly to focus on the technical and quantifiable aspects of their work.

      We agree with the reviewer’s assessment and have revised the manuscript to remove anecdotal comparisons and subjective language. Where possible, we now provide quantitative metrics or verifiable data to support our statements.

      Reviewer #3 (Recommendations for the authors):

      Other minor points that could improve the manuscript (although some of these points are explained in the huge supplementary manual): 

      (1) The authors explain thoroughly their design, and they chose a sample-scanning method. I think that a brief discussion of the advantages and disadvantages of such a method over, for example, a laser-scanning system (with fixed sample) in the main text will be highly beneficial for the users.

      In the revised manuscript, we now include a brief discussion in the main text outlining the advantages and limitations of a sample-scanning approach relative to a light-sheet–scanning system. Specifically, we note that for thin, adherent specimens, sample scanning minimizes the optical path length through the sample, allowing the use of more tightly focused illumination beams that improve axial resolution. We also include a new supplementary figure illustrating how this configuration reduces the propagation length of the illumination light sheet, thereby enhancing axial resolution.

      (2) The authors justify selecting a 0.6 NA illumination objective over alternatives (e.g., Special Optics), but the manuscript would benefit from a more quantitative trade-off analysis (beam waist, working distance, sample compatibility) with other possibilities. Within the objective context, a comparison of the performances of this system with the new and upcoming single-objective light-sheet methods (and the ones based also on optical refocusing, e.g., DAXI) would be very interesting for the goodness of the manuscript.

      In the revised manuscript, we now provide a quantitative trade-off analysis of the illumination objectives in Supplementary Note 1, including comparisons of beam waist, working distance, and sample compatibility. This section also presents calculated point spread functions for both the 0.6 NA and 0.67 NA objectives, outlining the performance trade-offs that informed our design choice. In addition, Supplementary Note 3 now includes a broader comparison of Altair-LSFM with other light-sheet modalities, including diSPIM, ASLM, and OPM, to further contextualize the system’s capabilities within the evolving light-sheet microscopy landscape.

      (3) The modularity of the system is implied in the context of the manuscript, but not fully explained. The authors should specify more clearly, for example, if cameras could be easily changed, objectives could be easily swapped, light-sheet thickness could be tuned by changing the cylindrical lens, how users might adapt the system for different samples (e.g., embryos, cleared tissue, live imaging), etc., and discuss potential constraints or compatibility issues for these implementations.

      Altair-LSFM was explicitly designed and optimized for imaging live adherent cells, where sample scanning and short light-sheet propagation lengths provide optimal axial resolution (Supplementary Note 3). While the same platform could be used for superficial imaging in embryos, systems implementing multiview illumination and detection schemes are better suited for such specimens. Similarly, cleared tissue imaging typically requires specialized solvent-compatible objectives and approaches such as ASLM that maximize the field of view. We have now added text to the Design Principles section that explicitly states this.

      Altair-LSFM offers varying levels of modularity depending on the user’s level of expertise. For entry-level users, the illumination numerical aperture—and therefore the light-sheet thickness and propagation length—can be readily adjusted by tuning the rectangular aperture conjugate to the back pupil of the illumination objective, as described in the Design Principles section. For mid-level users, alternative configurations of Altair-LSFM, including different detection objectives, stages, filter wheels, or cameras, can be readily implemented (Supplementary Note 1). Importantly, navigate natively supports a broad range of hardware devices, and new components can be easily integrated through its modular interface. For expert users, all Zemax simulations, CAD models, and step-by-step optimization protocols are openly provided, enabling complete re-optimization of the optical design to meet specific experimental requirements.

      (4) Resolution measurements before and after deconvolution are central to the performance claim, but the deconvolution method (PetaKit5D) is only briefly mentioned in the main text, it's not referenced, and has to be clarified in more detail, coherently with the precision of the supplementary information. More specifically, PetaKit5D should be referenced in the main text, the details of the deconvolution parameters discussed in the Methods section, and the computational requirements should also be mentioned. 

      In the revised manuscript, we now provide a dedicated description of the deconvolution process in the Methods section, including the specific parameters and algorithms used. We have also explicitly referenced PetaKit5D in the main text to ensure proper attribution and clarity. Additionally, we note the computational requirements associated with this analysis in the same section for completeness.

      (5)  Image post-processing is not fully explained in the main text. Since the system is sample-scanning based, no word in the main text is spent on deskewing, which is an integral part of the post-processing to obtain a "straight" 3D stack. Since other systems implement such a post-processing algorithm (for example, single-objective architectures), it would be beneficial to have some discussion about this, and also a brief comparison to other systems in the main text in the methods section. 

      In the revised manuscript, we now explicitly describe both deskewing (shearing) and deconvolution procedures in the Alignment and Characterization section of the main text and direct readers to the Methods section. We also briefly explain why the data must be sheared to correct for the angled sample-scanning geometry for LLSM and Altair-LSFM, as well as for both sample-scanning and laser-scanning variants of OPM.
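
      A minimal sketch of the shearing step, assuming a (z, y, x) image stack in which each successive plane is displaced along x by a fixed amount. The function names, the integer-pixel shift, and the convention that the scan angle is measured between the scan direction and the detection focal plane are all assumptions made for illustration; this is not the PetaKit5D implementation, and production pipelines typically interpolate sub-pixel shifts rather than rounding.

```python
import numpy as np


def shift_pixels(step_um, angle_deg, pixel_um):
    """Lateral shift between successive planes, rounded to whole pixels.

    Assumes the in-plane component of each scan step is step * cos(angle).
    """
    return int(round(step_um * np.cos(np.deg2rad(angle_deg)) / pixel_um))


def deskew_integer(stack, shift_per_plane):
    """Shear a (z, y, x) stack along x by a whole number of pixels per plane.

    Plane k is placed k * shift_per_plane pixels further along x. The output
    is zero-padded along x so that no data are discarded.
    """
    if shift_per_plane < 0:
        raise ValueError("shift_per_plane must be non-negative")
    nz, ny, nx = stack.shape
    out = np.zeros((nz, ny, nx + shift_per_plane * (nz - 1)), dtype=stack.dtype)
    for k in range(nz):
        offset = k * shift_per_plane
        out[k, :, offset:offset + nx] = stack[k]
    return out


# Hypothetical example: 3 planes of 2 x 4 pixels, shifted 1 pixel per plane.
raw = np.arange(3 * 2 * 4).reshape(3, 2, 4)
deskewed = deskew_integer(raw, shift_per_plane=1)
print(deskewed.shape)
```

      The zero-padded output grows along x with the number of planes, which is one reason deskewed volumes occupy more storage than the raw skewed data.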

      (6) A brief discussion on comparative costs with other systems (LLSM, diSPIM, etc.) could be helpful for non-imaging expert researchers who could try to implement such an optical architecture in their lab.

      Unfortunately, the exact costs of commercial systems such as LLSM or diSPIM are typically not publicly available, as they depend on institutional agreements and vendor-specific quotations. Nonetheless, we now provide approximate cost estimates in Supplementary Note 1 to help readers and prospective users gauge the expected scale of investment relative to other advanced light-sheet microscopy systems.

      (7) The "navigate" control software is provided, but a brief discussion on its advantages compared to an already open-access system, such as Micro-Manager, could be useful for the users.

      In the revised manuscript, we now include Supplementary Note 5 that discusses the advantages and disadvantages of different open-source microscope control platforms, including navigate and Micro-Manager. In brief, navigate was designed to provide turnkey support for multiple light-sheet architectures, with pre-configured acquisition routines optimized for Altair-LSFM, integrated data management with support for multiple file formats (TIFF, HDF5, N5, and Zarr), and full interoperability with OME-compliant workflows. By contrast, while Micro-Manager offers a broader library of hardware drivers, it typically requires manual configuration and custom scripting for advanced light-sheet imaging workflows.

      (8) The cost and parts are well documented, but the time and expertise required are not crystal clear. Adding a simple time estimate (perhaps in the Supplement Section) of assembly/alignment/installation/validation and first imaging will be very beneficial for users. Also, what level of expertise is assumed (prior optics experience, for example) to be needed to install a system like this? This can help non-optics-expert users to better understand what kind of adventure they are putting themselves through.

      We thank the reviewer for this helpful suggestion. To address this, we have added Supplementary Table S5, which provides approximate time estimates for assembly, alignment, validation, and first imaging based on the user’s prior experience with optical systems. The table distinguishes between novice (no prior experience), moderate (some experience using but not assembling optical systems), and expert (experienced in building and aligning optical systems) users. This addition is intended to give prospective builders a realistic sense of the time commitment and level of expertise required to assemble and validate Altair-LSFM.

      Minor things in the main text:

      (1) Line 109: The cost is considered "excluding the laser source". But then in the table of costs, you mention L4cc as a "multicolor laser source", for 25 K. Can you explain this better? Are the costs correct with or without the laser source? 

      We acknowledge that the statement in line 109 was incorrect—the quoted ~$150k system cost does include the laser source (L4cc, listed at $25k in the cost table). We have corrected this in the revised manuscript.

      (2) Line 113: You say "lateral resolution", but then you state a 3D resolution (230 nm x 230 nm x 370 nm). This needs to be fixed.

      Thank you, we have corrected this.

      (3) Line 138: Is the light-sheet uniformity proven also with a fluorescent dye? This could be beneficial for the main text, showing the performance of the instrument in a fluorescent environment.

      The light-sheet profiles shown in the manuscript were acquired using fluorescein to visualize the beam. We have revised the main text and figure legends to clearly state this.

      (4) Line 149: This is one of the most important features of the system, defying the usual tradeoff between light-sheet thickness and field of view, with a regular Gaussian beam. I would clarify more specifically how you achieve this because this really is the most powerful takeaway of the paper.

      We thank the reviewer for this key observation. The ability of Altair-LSFM to maintain a thin light sheet across a large field of view arises from diffraction effects inherent to high NA illumination. Specifically, diffraction elongates the PSF along the beam’s propagation direction, effectively extending the region over which the light sheet remains sufficiently thin for high-resolution imaging. This phenomenon, which has been the subject of active discussion within the light-sheet microscopy community, allows Altair-LSFM to partially overcome the conventional trade-off between light-sheet thickness and propagation length. We now clarify this point in the main text and provide a more detailed discussion in Supplementary Note 3, which is explicitly referenced in the discussion of the revised manuscript.
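
      For reference, the conventional trade-off mentioned above can be sketched with textbook paraxial Gaussian-beam formulas, in which the beam waist scales as 1/NA and the Rayleigh range as 1/NA². The function names, the refractive index, and the example values are hypothetical, and this approximation by construction does not capture the high-NA diffraction effects described in the response.

```python
import math


def gaussian_waist_um(wavelength_um, na):
    """Paraxial 1/e^2 beam-waist radius: w0 = lambda0 / (pi * NA)."""
    return wavelength_um / (math.pi * na)


def rayleigh_range_um(wavelength_um, na, n=1.33):
    """Paraxial Rayleigh range in a medium of index n: z_R = pi * w0^2 * n / lambda0."""
    w0 = gaussian_waist_um(wavelength_um, na)
    return math.pi * w0**2 * n / wavelength_um


# Hypothetical comparison at a 0.488 um vacuum wavelength in water (n = 1.33).
for na in (0.1, 0.2):
    w0 = gaussian_waist_um(0.488, na)
    z_r = rayleigh_range_um(0.488, na)
    print(f"NA {na}: waist {w0:.2f} um, usable length ~{2 * z_r:.1f} um")
```

      In this approximation, doubling the illumination NA halves the waist but shortens the usable propagation length fourfold, which is the trade-off that the paragraph above says is only partially applicable at high NA.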

      (5) Line 171: You talk about repeatable assembly...have you tried many different baseplates? Otherwise, this is a complicated statement, since this is a proof-of-concept paper. 

      We thank the reviewer for this comment. We have not yet validated the design across multiple independently assembled baseplates and therefore agree that our previous statement regarding repeatable assembly was premature. To avoid overstating the current level of validation, we have removed this statement from the revised manuscript.

      (6) Line 187: same as above. You mention "long-term stability". For how long did you try this? This should be specified in numbers (days, weeks, months, years?) Otherwise, it is a complicated statement to make, since this is a proof-of-concept paper.

      We also agree that referencing long-term stability without quantitative backing is inappropriate, and have removed this statement from the revised manuscript.

      (7) Line 198: "rapid z-stack acquisition". How rapid? Also, what is the limitation of the galvo-scanning in terms of the imaging speed of the system? This should be noted in the methods section.

      In the revised manuscript, we now clarify these points in the Optoelectronic Design section. Specifically, we explicitly note that the resonant galvo used for shadow reduction operates at 4 kHz, ensuring that it is not rate-limiting for any imaging mode. In the same section, we also evaluate the maximum acquisition speeds achievable using navigate and report the theoretical bandwidth of the sample-scanning piezo, which together define the practical limits of volumetric acquisition speed for Altair-LSFM.

      (8) Line 234: PetaKit5D is discussed in the additional documentation, but should be referenced here, as well.

      We now reference and cite PetaKit5D.

      (9) Line 256: "values are on par with LLSM", but no values are provided. Some details should also be provided in the main text.

      In the revised manuscript, we now provide the lateral and axial resolution values originally reported for LLSM in the main text to facilitate direct comparison with Altair-LSFM. Additionally, Supplementary Note 3 now includes an expanded discussion on the nuances of resolution measurement and reporting in light-sheet microscopy.

      Figures:

      (1) Figure 1 could be implemented with Figure 3. They're both discussing the validation of the system (theoretically and with simulations), and they could be together in different panels of the same figure. The experimental light-sheet seems to be shown in a transmission mode. Showing a pattern in a fluorescent dye could also be beneficial for the paper.

      In Figure 1, our goal was to guide readers through the design process—illustrating how the detection objective’s NA sets the system’s resolution, which defines the required pixel size for Nyquist sampling and, in turn, the field of view. We then use Figure 1b–c to show how the illumination beam was designed and simulated to achieve that field of view. In contrast, Figure 3 presents the experimental validation of the illumination system. To avoid confusion, we now clarify in the text that the light sheet shown in Figure 3 was visualized in a fluorescein solution and imaged in transmission mode. While we agree that Figures 1 and 3 both serve to validate the system, we prefer to keep them as separate figures to maintain focus within each panel. We believe this organization better supports the narrative structure and allows readers to digest the theoretical and experimental validations independently.
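
      The design chain described above (detection NA sets the resolution, the resolution sets the Nyquist pixel size, and the pixel size sets the field of view) can be sketched as follows. This is a minimal illustration assuming the Abbe limit λ/(2·NA) and two pixels per resolution element; the function names and the wavelength, NA, and sensor width are hypothetical placeholders rather than the manuscript's values.

```python
def lateral_resolution_um(wavelength_um, na):
    """Abbe lateral resolution limit, lambda / (2 * NA)."""
    return wavelength_um / (2 * na)


def nyquist_pixel_um(resolution_um):
    """Largest sample-space pixel size that still samples the resolution twice."""
    return resolution_um / 2


def field_of_view_um(pixel_um, n_pixels):
    """Field of view along one sensor axis for a given sample-space pixel size."""
    return pixel_um * n_pixels


# Hypothetical inputs: 0.52 um emission, NA 1.1 detection, 2048-pixel-wide sensor.
resolution = lateral_resolution_um(0.52, 1.1)
pixel = nyquist_pixel_um(resolution)
fov = field_of_view_um(pixel, 2048)
print(f"resolution {resolution * 1000:.0f} nm, pixel {pixel * 1000:.0f} nm, FOV {fov:.0f} um")
```

      The required magnification then follows from the ratio of the physical camera pixel size to the sample-space pixel size computed here.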

      (2) Figure 3: Panels d and e show the same thing. Why would you expect that xz and yz profiles should be different? Is this due to the orientation of the objectives towards the sample?

      In Figure 3, we present the PSF from all three orthogonal views, as this provides the most transparent assessment of PSF quality—certain aberration modes can be obscured when only select perspectives are shown. In principle, the XZ and YZ projections should be equivalent in a well-aligned system. However, as seen in the XZ projection, a small degree of coma is present that is not evident in the YZ view. We now explicitly note this observation in the revised figure caption to clarify the difference between these panels.

      (3) Figure 4's single boxes lack a scale bar, and some of the Supplementary Figures (e.g. Figure 5) lack detailed axis labels or scale bars. Also, in the detailed documentation, some figures are referred to as Figure 5. Figure 7 or, for example, figure 6. Figure 8, and this makes the cross-references very complicated to follow

      In the revised manuscript, we have corrected these issues. All figures and supplementary figures now include appropriate scale bars, axis labels, and consistent formatting. We have also carefully reviewed and standardized all cross-references throughout the main text and supplementary documentation to ensure that figure numbering is accurate and easy to follow.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:  

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript, Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells and shows markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter, and altered mitochondrial respiration.

      Strengths:

      The authors use multiple orthogonal approaches to test the majority of their findings.  The authors offer a potentially new activity of ZMAT3 in tumor suppression by p53: the control of mitochondrial respiration.  

      Weaknesses:

      Some indication as to whether other c-JUN target genes are also regulated by ZMAT3 would improve the broad relevance of the authors' findings.  

      We thank the reviewer for the kind words and the thoughtful suggestion. As recommended, to identify additional c-JUN targets potentially regulated by ZMAT3, we intersected the genes upregulated upon ZMAT3 knockout (from our RNA-seq data) with the ChIP-Atlas dataset for human c-JUN and cross-referenced these with c-JUN peaks from three ENCODE cell lines. From this analysis, we selected the top 4 candidate genes for further analysis: LAMA2, VSNL1, SAMD3, and IL6R (Figure 5-figure supplement 2A-D). Like HKDC1, these genes were upregulated in ZMAT3-KO cells, and this upregulation was abolished upon siRNA-mediated JUN knockdown in ZMAT3-KO cells (Figure 5-figure supplement 2E). Moreover, by ChIP-qPCR we observed increased JUN binding to the JUN peak for these genes in ZMAT3-KO cells as compared to ZMAT3-WT cells (Figure 5-figure supplement 2F). As described on page 11 of the revised manuscript, these results suggest that the ZMAT3/JUN axis negatively regulates the expression of HKDC1 and additional c-JUN target genes.

      Reviewer #2 (Public review):

      Summary:

      The study elucidates the role of the recently discovered mediator of p53 tumor suppressive activity, ZMAT3. Specifically, the authors find that ZMAT3 negatively regulates HKDC1, a gene involved in the control of mitochondrial respiration and cell proliferation.  

      Strengths:

      Mechanistically, ZMAT3 suppresses HKDC1 transcription by sequestering JUN and preventing its binding to the HKDC1 promoter, resulting in reduced HKDC1 expression. Conversely, p53 mutation leads to ZMAT3 downregulation and HKDC1 overexpression, thereby promoting increased mitochondrial respiration and proliferation. This mechanism is novel; however, the authors should address several points.  

      Weaknesses:

      The authors conduct mechanistic experiments (e.g., transcript and protein quantification, luciferase assays) to demonstrate regulatory interactions between p53, ZMAT3, JUN, and HKDC1. These findings should be supported with functional assays, such as proliferation, apoptosis, or mitochondrial respiration analyses.  

      We thank the reviewer for appreciating our work and for this valuable suggestion. The reviewer rightly pointed out that supporting the regulatory interactions between p53, ZMAT3, JUN and HKDC1 with functional assays such as proliferation, apoptosis and mitochondrial respiration analyses would strengthen our mechanistic data. During the revision of our manuscript, we attempted to address this point by performing simultaneous knockdown of these proteins; however, we observed substantial toxicity under these conditions, making the functional assays technically unfeasible. This outcome was not unexpected, as knockdown of JUN or HKDC1 individually results in growth defects. We therefore focused our efforts on addressing the recommendations for the authors.

      Reviewer #3 (Public review):

      Summary:  

      In their manuscript, Kumar et al. investigate the mechanisms underlying the tumor suppressive function of the RNA binding protein ZMAT3, a previously described tumor suppressor in the p53 pathway. To this end, they use RNA-sequencing and proteomics to characterize changes in ZMAT3-deficient cells, leading them to identify the hexokinase HKDC1 as upregulated with ZMAT3 deficiency first in colorectal cancer cells, then in other cell types of both mouse and human origin. This increase in HKDC1 is associated with increased mitochondrial respiration. As ZMAT3 has been reported as an RNA-binding and DNA-binding protein, the authors investigated this via PAR-CLIP and ChIP-seq but did not observe ZMAT3 binding to HKDC1 pre-mRNA or DNA. Thus, to better understand how ZMAT3 regulates HKDC1, the authors used quantitative proteomics to identify ZMAT3-interacting proteins. They identified the transcription factor JUN as a ZMAT3-interacting protein and showed that JUN promotes the increased HKDC1 RNA expression seen with ZMAT3 inactivation. They propose that ZMAT3 inhibits JUN-mediated transcriptional induction of HKDC1 as a mechanism of tumor suppression. This work uncovers novel aspects of the p53 tumor suppressor pathway.

      Strengths:

      This novel work sheds light on one of the most well-established yet understudied p53 target genes, ZMAT3, and how it contributes to p53's tumor suppressive functions. Overall, this story establishes a p53-ZMAT3-HKDC1 tumor suppressive axis, which has been strongly substantiated using a variety of orthogonal approaches, in different cell lines and with different data sets.  

      Weaknesses:

      While the role of p53 and ZMAT3 in repressing HKDC1 is well substantiated, there is a gap in understanding how ZMAT3 acts to repress JUN-driven activation of the HKDC1 locus. How does ZMAT3 inhibit JUN binding to HKDC1? Can targeted ChIP experiments or RIP experiments be used to make a more definitive model? Can ZMAT3 mutants help to understand the mechanisms? Future work can further establish the mechanisms underlying how ZMAT3 represses JUN activity.  

      We thank the reviewer for the kind words and the invaluable suggestion. The reviewer raises an excellent point regarding how ZMAT3 inhibits JUN binding to the HKDC1 locus. Our new data included in the revised manuscript show that the ZMAT3-JUN interaction is lost in the presence of DNase or RNase, indicating that the interaction requires both DNA and RNA. This result suggests that ZMAT3 and JUN form an RNA-dependent, chromatin-associated complex. Although not directly investigated in our study, this finding is consistent with emerging evidence that RBPs can function as chromatin-associated cofactors in transcription. For example, functional interplay between the transcription factor YY1 and the RNA binding protein RBM25 co-regulates a broad set of genes, where RBM25 appears to engage promoters first and then recruit YY1, with RNA proposed to guide target recognition. We have discussed this possibility in the Discussion section of the revised manuscript (page 13). We agree that future work using ZMAT3 mutants and targeted ChIP or RIP assays will be valuable to delineate the precise mechanism by which ZMAT3 inhibits JUN binding to its target genes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      ZMAT3 is a p53 target gene that the Lal group and others have shown is important for p53-mediated tumor suppression, and which plays a role in the control of RNA splicing. In this manuscript, Lal and colleagues perform quantitative proteomics of cells with ZMAT3 knockout and show that the enzyme hexokinase HKDC1 is the most upregulated protein. HKDC1 is emerging as an important player in human cancer. Importantly, the authors show both acute (gene silencing) and chronic (CRISPR KO) approaches to silence ZMAT3, and they do this in several cell lines. Notably, they show that ZMAT3 silencing leads to impaired mitochondrial respiration, in a manner that is rescued by silencing of HKDC1. Mechanistically, the authors show that ZMAT3 does not appear to directly regulate the expression of HKDC1; rather, they show that the transcription factor c-JUN was strongly enriched in ZMAT3 pull-downs in IP-mass spec experiments, and they perform IP-western to demonstrate an interaction between c-JUN and ZMAT3. Importantly, the authors demonstrate, using ChIP-qPCR, that JUN is present at the HKDC1 gene (intron 1) in ZMAT3 WT cells, and shows markedly enhanced binding in ZMAT3 KO cells. The data best fit a model whereby p53 transactivates ZMAT3, leading to decreased JUN binding to the HKDC1 promoter (intron 1), and altered mitochondrial respiration. The findings are compelling, and the authors use multiple orthogonal approaches to test most findings. And the authors offer a potentially new activity of ZMAT3 in tumor suppression by p53: the control of mitochondrial respiration. As such, enthusiasm is high for this manuscript.

      Addressing the following question would improve the manuscript. 

      It is not clear how many (other) c-JUN target genes might be impacted by ZMAT3; other important c-JUN targets in cancer include GLS1, WEE1, SREBP1, GLUT1, and CD36, so there could be a global impact on metabolism in ZMAT3 KO cells. Can the authors perform qPCR on these targets in ZMAT3 WT and KO cells and see if these target genes are differentially expressed? 

      We thank the reviewer for this thoughtful suggestion. As recommended, we examined the expression of key c-JUN target genes GLS1 (also known as GLS), WEE1, SREBP1, GLUT1, and CD36 in ZMAT3-WT and ZMAT3-KO cells. We first analyzed publicly available JUN ChIP-Seq data from three ENCODE cell lines, which revealed JUN binding peaks near or upstream of exon 1 for GLS1/GLS, SREBP1, and SLC2A1/GLUT1, but not for WEE1 or CD36 (Appendix 1, panels A-E). Based on these results, we performed RT-qPCR for GLS1/GLS, SREBP1 and SLC2A1 in ZMAT3-WT and ZMAT3-KO cells, with or without JUN knockdown. GLS mRNA was significantly reduced upon JUN knockdown in both ZMAT3-WT cells and ZMAT3-KO cells, but it was not upregulated upon loss of ZMAT3, indicating that GLS is a JUN target gene, but it is not regulated by ZMAT3. In contrast, SREBF1 or SLC2A1 expression remained unchanged upon ZMAT3 loss or JUN knockdown (Appendix 1 panels F-H). These data suggest that the ZMAT3/JUN axis does not regulate the expression of these genes.

      To identify additional c-JUN targets potentially regulated by ZMAT3, we intersected the genes upregulated upon ZMAT3 knockout (from our RNA-seq data) with the ChIP-Atlas dataset for human c-JUN and cross-referenced these with c-JUN peaks from three ENCODE cell lines. From this analysis, we selected the top 4 candidate genes for further analysis: LAMA2, VSNL1, SAMD3, and IL6R (Figure 5-figure supplement 2A-D). Like HKDC1, these genes were upregulated in ZMAT3-KO cells, and this upregulation was abolished upon siRNA-mediated JUN knockdown in ZMAT3-KO cells (Figure 5-figure supplement 2E). Moreover, by ChIP-qPCR we observed increased JUN binding to the JUN peak for these genes in ZMAT3-KO cells as compared to ZMAT3-WT cells (Figure 5-figure supplement 2F). As described on page 11 of the revised manuscript, these results suggest that the ZMAT3/JUN axis negatively regulates HKDC1 expression and additional c-JUN target genes.

      Minor concerns: 

      (1) Line 150: observed a modest. 

      (2) Line 159: Figure 2G appears to be inaccurately cited. 

      (3) Line 191: assays to measure. 

      We thank the reviewer for pointing these out. These minor concerns have been addressed in the text.  

      Reviewer #2 (Recommendations for the authors): 

      (1) Figure 1E: Can the authors clarify what the numbers on the left side of the chart represent? Do they refer to the scale?

      The numbers on the Y-axis represent the -log10(p-value), where higher values correspond to more significant changes. For visualization purposes, the significant changes are shown in red.
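      As a brief illustration of the Y-axis transform described above, the following sketch converts p-values to -log10 units. The function name `neg_log10` and the example p-values are assumptions made for illustration; they are not taken from the manuscript.

```python
import math


def neg_log10(p_value: float) -> float:
    """Return -log10(p), the quantity typically plotted on a volcano plot's Y-axis."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must lie in the interval (0, 1]")
    return -math.log10(p_value)


# Illustrative p-values only; smaller p-values map to larger Y-axis values.
for p in (0.05, 0.01, 0.001):
    print(f"p = {p:<6} -> -log10(p) = {neg_log10(p):.2f}")
```

      On this scale, a p-value of 0.05 sits at roughly 1.3, so points higher on the axis are more significant.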

      (2) Page 5, line 123: The sentence "As expected, ZMAT3 mRNA levels were decreased in the ZMAT3-KO cells" is redundant, as this information was already mentioned on page 4, line 103.  

      We thank the reviewer for noticing this redundancy. The repeated sentence has been removed in the revised manuscript.  

      (3) Page 5: The authors state: "Transcriptome-wide, upon loss of ZMAT3, 606 genes were significantly up-regulated (adj. p < 0.05 and 1.5-fold change) and 552 were down-regulated, with a median fold change of 1.76 and 0.55 for the up- and down-regulated genes, respectively." Later, on page 6, they write: "Comparison of the RNA-seq data from ZMAT3-WT vs. ZMAT3-KO and CTRL siRNA vs. ZMAT3 siRNA-transfected HCT116 cells indicated that 1023 genes were commonly up-regulated, and 1042 were commonly down-regulated upon ZMAT3 loss (Figure S2C and D)." Why is the number of deregulated transcripts higher in the ZMAT3-WT vs. ZMAT3-KO comparison than in the CTRL siRNA vs. ZMAT3 siRNA comparison? Are the authors using less stringent criteria in the second analysis? This point should be clarified.

      We thank the reviewer for highlighting this point. The reviewer is correct that less stringent criteria were used in the second analysis. On page 5, we applied stringent thresholds (adjusted p-value < 0.05 and 1.5-fold change) to identify high-confidence transcriptome-wide changes upon ZMAT3 loss. In contrast, for the comparison of both RNA-seq datasets (ZMAT3-WT vs. KO and siCTRL vs. siZMAT3), we included genes that were consistently up- or downregulated, without applying a fold change threshold, focusing instead on significantly altered genes (adjusted p < 0.05) in both datasets. This allowed us to capture broader and more reproducible transcriptomic changes that occur upon ZMAT3 depletion, including modest but significant changes upon transient ZMAT3 knockdown with siRNAs. We have now clarified this distinction on page 6 of the revised manuscript.
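      The difference between the two selection schemes described above can be sketched as follows. This is a minimal illustration under stated assumptions: the gene names, fold changes, and adjusted p-values are invented, and the helper `significant_genes` is hypothetical rather than the authors' actual pipeline. It shows why dropping the fold-change threshold while requiring agreement between two datasets can still yield a larger gene list than a single stringent filter.

```python
import math
from typing import Dict, NamedTuple, Optional, Set, Tuple


class DEResult(NamedTuple):
    log2_fold_change: float  # treatment vs. control
    adj_p: float             # multiple-testing-adjusted p-value


def significant_genes(
    results: Dict[str, DEResult],
    alpha: float = 0.05,
    min_fold_change: Optional[float] = None,
) -> Tuple[Set[str], Set[str]]:
    """Return (up, down) gene sets passing adj_p < alpha and an optional fold-change cutoff."""
    cutoff = math.log2(min_fold_change) if min_fold_change else 0.0
    up, down = set(), set()
    for gene, res in results.items():
        if res.adj_p >= alpha:
            continue
        if res.log2_fold_change > 0 and res.log2_fold_change >= cutoff:
            up.add(gene)
        elif res.log2_fold_change < 0 and res.log2_fold_change <= -cutoff:
            down.add(gene)
    return up, down


# Invented toy data standing in for the KO and siRNA comparisons.
ko = {
    "GENE_A": DEResult(1.2, 0.001),
    "GENE_B": DEResult(0.3, 0.01),
    "GENE_C": DEResult(-1.0, 0.02),
    "GENE_D": DEResult(0.9, 0.2),
}
sirna = {
    "GENE_A": DEResult(0.8, 0.004),
    "GENE_B": DEResult(0.25, 0.03),
    "GENE_C": DEResult(-0.2, 0.04),
    "GENE_D": DEResult(1.1, 0.01),
}

# Stringent filter on one dataset: adj. p < 0.05 and at least 1.5-fold change.
strict_up, strict_down = significant_genes(ko, min_fold_change=1.5)

# Relaxed filter: adj. p < 0.05 in both datasets, same direction, no fold-change cutoff.
ko_up, ko_down = significant_genes(ko)
si_up, si_down = significant_genes(sirna)
common_up, common_down = ko_up & si_up, ko_down & si_down

print("strict up:", sorted(strict_up), "| common up:", sorted(common_up))
print("strict down:", sorted(strict_down), "| common down:", sorted(common_down))
```

      In this toy case GENE_B changes modestly but significantly in both datasets, so it appears only in the relaxed, intersected list.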

      (4) Figures 2B and 2E: The authors should provide quantification of HKDC1 protein levels normalized to a loading control. In addition, they should assess HKDC1 protein abundance upon ZMAT3 interference in SW1222 and HCEC-1CT cells, not just in HepG2 and HCT116 cells.

      We thank the reviewer for this suggestion. We have now quantified all immunoblots presented throughout the manuscript, including those shown in Figures 2B and 2E, and all other figures containing protein analyses. Band intensities were quantified using ImageJ densitometry and normalized to GAPDH as the loading control. In addition, as suggested, we examined HKDC1 protein levels following ZMAT3 knockdown in two additional cell lines, SW1222 and HCEC-1CT. Consistent with our observations in HepG2 and HCT116 cells, ZMAT3 depletion led to increased HKDC1 protein levels in both SW1222 and HCEC-1CT cells. These new data are now included in Figure 2-figure supplement 1F and G. We have updated the Results section, figure legends, and figures to reflect these additions.
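      The normalization step described above can be sketched as a simple ratio calculation. The band intensities below are invented placeholders, the function `normalize_bands` is hypothetical, and expressing the result relative to the control lane is an assumption about presentation rather than a detail confirmed by the response.

```python
from typing import Dict


def normalize_bands(
    target: Dict[str, float],
    loading: Dict[str, float],
    reference_lane: str,
) -> Dict[str, float]:
    """Divide each target band by its loading-control band, then scale to the reference lane."""
    ratios = {lane: target[lane] / loading[lane] for lane in target}
    reference = ratios[reference_lane]
    return {lane: ratio / reference for lane, ratio in ratios.items()}


# Invented densitometry values in arbitrary units, not data from the manuscript.
hkdc1 = {"siCTRL": 1200.0, "siZMAT3": 3000.0}
gapdh = {"siCTRL": 8000.0, "siZMAT3": 7500.0}

relative = normalize_bands(hkdc1, gapdh, reference_lane="siCTRL")
for lane, value in relative.items():
    print(f"{lane}: {value:.2f}-fold relative to siCTRL")
```

      Dividing by the loading control first corrects for unequal loading between lanes before the fold change is computed.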

      (5) Figure 3A: It is unclear which gene was knocked out in the "KO cells." The authors should clearly specify this.

      We thank the reviewer for pointing this out. We have now updated Figure 3A.

      (6) Figure 3D: The result appears counterintuitive in comparison to Figure 3E. Why does HKDC1 knockdown reduce cell confluency more in ZMAT3 KO cells than in control (ZMAT3 wild-type) cells? The authors should explain this discrepancy more clearly.

      We thank the reviewer for this insightful comment. As shown in Figure 3D and 3E, knockdown of HKDC1 resulted in a greater decrease in proliferation in ZMAT3-KO cells than in ZMAT3-WT cells. This observation was indeed unexpected, given that HKDC1 acts downstream of ZMAT3. One possible explanation is that elevated HKDC1 expression in ZMAT3-KO cells increases their reliance on HKDC1 for sustaining proliferation, and that HKDC1 may also participate in additional pathways in ZMAT3-KO cells. Consequently, transient knockdown of HKDC1 in ZMAT3-KO cells would have a more pronounced effect on proliferation due to their increased dependency on HKDC1 activity. In contrast, ZMAT3-WT cells, which express lower levels of HKDC1, are less dependent on its function and therefore less sensitive to its depletion. We have now clarified this point on page 8 of the revised manuscript.

      Reviewer #3 (Recommendations for the authors):  

      (1) Why do the authors start their analysis by knocking out the p53 response element in Zmat3? That should be clarified. In addition, since clones were picked after CRISPR KO of Zmat3, were experiments done to confirm that p53 signaling was not disrupted?

      We thank the reviewer for this thoughtful question. We began our study by targeting the p53 response element (p53RE) in the ZMAT3 locus because the basal expression of ZMAT3 is regulated by p53 (Muys, Bruna R. et al., Genes & Development, 2021). Deleting the p53RE therefore allowed us to markedly reduce ZMAT3 expression without disrupting the entire ZMAT3 locus. We have clarified this rationale on page 4 of the revised manuscript. To ensure that p53 signaling was not affected by this modification, we verified that canonical p53 targets such as p21 were equivalently induced in both ZMAT3-WT and ZMAT3-KO cells following Nutlin treatment and that p53 induction was unchanged (Figure 4F and Figure 1-figure supplement 1A).

      (2) Throughout the text, many immunoblots are used to validate the knockouts and knockdowns used, but some clarification is needed. In Figure S1A, the Zmat3-WT sample seems to have significantly more p53 than the Zmat3 KO sample. Does Zmat3 KO compromise p53 levels in other experiments? It would be good to understand if Zmat3 affects p53 function by affecting its levels. Also, the p21 blot is overloaded.

      We thank the reviewer for this helpful observation. To determine whether ZMAT3 knockout affects p53 function by affecting its levels, we repeated the experiment three independent times. Western blots from these biological replicates, together with protein quantification, are now included in Appendix 2 and Figure 1-figure supplement 1A. These data show no significant differences in p53 or p21 induction between ZMAT3-WT and ZMAT3-KO cells following Nutlin treatment. In the revised manuscript, we have replaced the blot in Figure 1-figure supplement 1A with a more representative image from one of these replicate experiments.

      In Figure 2E, HKDC1 protein levels are not shown for the SW1222 and HCEC-1CT cell lines, 

      We thank the reviewer for this suggestion. HKDC1 protein levels in SW1222 and HCEC-1CT cells following ZMAT3 knockdown are now included as Figure 2-figure supplement 1F and 1G, together with the corresponding quantification.

      and Zmat3 does not appear as its characteristic two bands on the blot. What does this signify?

      We thank the reviewer for this observation. Endogenous ZMAT3 typically appears as two closely migrating bands on immunoblots. As shown in Figure 4D and Appendix 2A and 2B, these two bands are observed at the expected molecular weight following Nutlin treatment and are specific to ZMAT3, as they are markedly reduced in ZMAT3-KO cells. In contrast, only a single ZMAT3 band is visible in Figure 2E. This likely reflects limited resolution of the two bands in some blots rather than a biological difference.   

      (3) Why does HKDC1 knockdown only have an effect on metabolic phenotypes when ZMAT3 is gone? In Figure 3A, there does not seem to be a decrease in hexokinase activity in the siCTRL + siHKDC1 condition compared to siCTRL alone. Also, in Figure 3A, does phosphorylation activity of HKDC1 necessarily reflect glucose uptake, as stated? Additionally, in Figure 3C, there is no effect on mitochondrial respiration with siHKDC1, even though recent studies have shown a significant effect of HKDC1 on this.

      We thank the reviewer for raising these important questions. As noted, HKDC1 knockdown alone in wild-type cells (siCTRL + siHKDC1) does not significantly reduce hexokinase activity (Figure 3A). This likely reflects the low basal expression of HKDC1 in these cells. Thus, the metabolic phenotype may only become apparent when HKDC1 expression exceeds a functional threshold, as observed in ZMAT3-KO cells where HKDC1 is upregulated.

      Regarding the glucose uptake assay, HKDC1 itself is not phosphorylated; rather, it phosphorylates a non-catabolizable glucose analog, 2-deoxyglucose (2-DG) upon cellular uptake. According to the manufacturer’s protocol, intracellular 2-DG is phosphorylated by hexokinases to 2-deoxyglucose-6-phosphate (2-DG6P), which cannot be further metabolized and therefore accumulates. The accumulated 2-DG6P is quantified using a luminescence-based readout. This assay is widely used as a surrogate for glucose uptake because it reflects both glucose import and phosphorylation — the first step of glycolytic flux. As for the lack of change in mitochondrial respiration (Figure 3C), we acknowledge that some studies have reported mitochondrial roles for HKDC1 under basal conditions; however, such effects may be cell type-specific.

      (4) The emphasis on glycolysis signatures is confusing, as in the end, glycolysis does not seem to be affected by ZMAT3 status, but mitochondrial respiration is affected. Can the text be clarified to address this? It is also difficult to understand the role of oxygen consumption rate (OCR) in ZMAT3 phenotypes, as it does not fully track with proliferation. For example, ZMAT3 KD has the highest OCR, and the other conditions have similar OCRs but different proliferative rates in Figure 3D. Also, the colors used in Figure 3 to denote different genotypes change between B/C and D, which is confusing.

      We thank the reviewer for pointing out the inconsistency in the colors of the graph in Figure 2, which we have now corrected. Our data indicate that ZMAT3 regulates mitochondrial respiration without significantly affecting glycolysis. It is possible that mitochondria in ZMAT3-KO cells are oxidizing more substrates that are not produced by glycolysis. Additional work will be required to fully determine these mechanisms. We have clarified this on page 8 of the revised manuscript.

      (5) The lack of ZMAT3 binding to RNAs in PAR-CLIP is not proof that it does not do so. A more targeted approach should be used, using individual RIP assays. The authors should also analyze the splicing of HKDC1, which could be affected by ZMAT3.

      As suggested, we performed ZMAT3 RNA immunoprecipitation (RIP) experiments using doxycycline-inducible HCT116-ZMAT3-FLAG cells. However, we did not observe significant enrichment of HKDC1 mRNA in the ZMAT3 IPs (Figure 5-figure supplement 1A), consistent with previously published ZMAT3 RIP-seq data (Bersani et al, Oncotarget, 2016). These findings further support the notion that ZMAT3 does not directly bind to HKDC1 mRNA in these cells. Accordingly, we have modified the text on page 10 of the revised manuscript.

      In addition, as suggested by the reviewer, we used rMATS to analyze changes in splicing of HKDC1 pre-mRNA by comparing our previously published RNA-seq data from siCTRL- and siZMAT3-transfected HCT116 cells (Muys et al, Genes Dev, 2021). We focused on splicing events with an FDR < 0.05 and |ΔPSI| > 0.1 (representing at least a 10% change in splicing). The splicing analysis (data not shown) did not reveal any significant alterations in HKDC1 pre-mRNA splicing upon ZMAT3 knockdown. The corresponding text has been updated on page 10 of the revised manuscript.
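      The splicing-event filter described above (FDR below 0.05 and an absolute PSI change above 0.1) can be sketched as follows. This is an illustration under assumptions, not the authors' script: the rows are invented, and the column names `geneSymbol`, `FDR`, and `IncLevelDifference` are assumed to follow the layout of rMATS event tables and should be checked against the actual output files.

```python
import csv
import io
from typing import List, TextIO, Tuple

# Invented tab-separated rows mimicking a small subset of an rMATS-style event table.
TOY_TABLE = (
    "geneSymbol\tFDR\tIncLevelDifference\n"
    "GENE_A\t0.01\t0.25\n"
    "GENE_B\t0.20\t0.40\n"
    "GENE_C\t0.03\t-0.05\n"
    "GENE_D\t0.001\t-0.18\n"
)


def significant_events(
    handle: TextIO,
    max_fdr: float = 0.05,
    min_abs_dpsi: float = 0.1,
) -> List[Tuple[str, float, float]]:
    """Keep events with FDR < max_fdr and |delta PSI| > min_abs_dpsi."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        fdr = float(row["FDR"])
        dpsi = float(row["IncLevelDifference"])
        if fdr < max_fdr and abs(dpsi) > min_abs_dpsi:
            hits.append((row["geneSymbol"], fdr, dpsi))
    return hits


hits = significant_events(io.StringIO(TOY_TABLE))
for gene, fdr, dpsi in hits:
    print(f"{gene}: FDR = {fdr}, delta PSI = {dpsi:+.2f}")
```

      Taking the absolute value of the PSI difference keeps both increased and decreased inclusion events, which matches the two-sided threshold stated in the response.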

      (6) The authors say that they examine JUN binding at the HKDC1 promoter several times, but they focus on intron 1 in Figure 5. They should revise the text accordingly, and they should also show JUN ChIP data traces for the whole HKDC1 locus in Figure 5C.

      We thank the reviewer for this helpful suggestion. As recommended, we have revised the text throughout the manuscript and replaced HKDC1 promoter with HKDC1 intron 1 DNA to accurately reflect our analysis, and Figure 5 now shows the JUN ChIP-seq signal across the entire HKDC1 locus.

      (7) In the ZMAT3 and JUN interaction assays, were these tested in the presence of DNAse or RNAse to determine if nucleic acids mediate the interaction?

      We thank the reviewer for this valuable suggestion. To test whether nucleic acids mediate the ZMAT3-JUN interaction, we performed ZMAT3 immunoprecipitations (IPs) from doxycycline-inducible ZMAT3-FLAG-expressing HCT116 cells in the presence or absence of DNase or RNase. The ZMAT3-JUN interaction was lost upon treatment with either DNase or RNase, indicating that the interaction is mediated by nucleic acids. These data have been added to the revised manuscript (Figure 5-figure supplement 1D and page 11).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review)

      Weaknesses:

      (1) The activity of the dominant negatives lacks appropriate controls. This is crucial given that mouse mutants for PG5, PG6, PG7, and three of the four PG4 genes show no major effects on limb induction or growth. Understanding these discrepancies is essential.

      We thank the reviewer for emphasizing the importance of appropriate controls for the dominant-negative experiments. Dominant-negative Hox constructs have been successfully and widely used in previous studies, supporting the reliability of this approach. In our experiments, electroporation of the dominant-negative constructs into the limb field produced clear and reproducible effects when compared with both unoperated embryos and embryos electroporated with a GFP control construct. The GFP construct serves as an appropriate control, as it accounts for any effects of electroporation or exogenous protein expression without altering Hox gene function. We therefore conclude that the observed phenotypes specifically reflect dominant-negative Hox activity rather than procedural artifacts.

      The absence of overt limb phenotypes in PG4–PG7 mouse mutants likely reflects both functional redundancy among Hox paralogs and the difficulty of detecting subtle limb-specific effects in bilateral, systemically affected embryos. In contrast, the chick embryo system allows unilateral gene manipulation, providing an internal control and greater sensitivity for detecting weak or localized effects that may be masked in whole-animal mouse mutants.

      (2) The authors mention redundancies in Hox activity, consistent with numerous previous reports. However, they only use single dominant-negative versions of each Hox paralog gene individually. If Hox4 and Hox5 functions are redundant, experiments should include simultaneous dominant negatives for both groups.

      We thank the reviewer for this thoughtful suggestion. We fully agree that functional redundancy among Hox paralogs is an important consideration. However, Hox gene interactions are highly context-dependent and not strictly additive. Simultaneous interference with multiple Hox groups often leads to complex or compensatory effects that are difficult to interpret mechanistically, particularly when using dominant-negative constructs that may affect overlapping transcriptional networks.

      Our current experimental design, which targets individual paralog groups, allows us to attribute observed phenotypes to specific Hox activities and to interpret the results more precisely. Moreover, as shown in previous studies, simultaneous knockdown of multiple Hox genes does not necessarily produce stronger phenotypes. For these reasons, we believe that the present single–dominant-negative experiments are the most informative and sufficient for addressing the specific questions in this study.

      (3) The main conclusion that Hox4 and Hox5 provide permissive cues on which Hox6/7 induce the forelimb is not sufficiently supported by the data. An experiment expressing simultaneous dnHox4/5 and Hox6/7 is needed. If the hypothesis is correct, this should block Hox6/7's capacity to expand the limb bud or generate an extra bulge.

      We thank the reviewer for this insightful suggestion. However, because of the extensive functional redundancy and regulatory interdependence within the Hox network, simultaneous inhibition of Hox4 and Hox5 is unlikely to produce a simple or interpretable outcome. Previous studies have shown that combinatorial Hox manipulations can trigger compensatory changes in other Hox genes, often obscuring rather than clarifying specific relationships.

      In our study, the proposed permissive role of Hox4/5 is supported by the spatial and temporal patterns of Hox expression and by the phenotypic effects observed upon individual dominant-negative perturbations. These data together suggest that Hox4/5 establish a forelimb-competent domain, on which Hox6/7 subsequently act to promote limb outgrowth. We therefore believe that the current evidence sufficiently supports this model without necessitating the additional combined experiment, which may not provide clear mechanistic insight due to redundancy effects.

      (4) The identity of the extra bulge or extended limb bud is unclear. The only marker supporting its identity as a forelimb is Tbx5, while other typical limb development markers are absent. Tbx5 is also expressed in other regions besides the forelimb, and its presence does not guarantee forelimb identity. For instance, snakes express Tbx5 in the lateral mesoderm along much of their body axis.

      We thank the reviewer for this important comment. We agree that Tbx5 expression alone may not be sufficient to define forelimb identity. However, in our experiments, the induced bulge displays several additional characteristics consistent with early limb identity (at the pre-AER stage). First, the Tbx5 expression we observe corresponds to the stage when the limb field is already specified, not the earlier broad mesodermal phase described in other systems. Second, the induced domain also expresses Lmx1, a marker of dorsal limb mesenchyme, further supporting its limb-specific nature. Third, our RNA sequencing analysis reveals upregulation of multiple genes associated with early limb development pathways, providing molecular evidence for limb-type identity rather than non-specific mesodermal expansion. Taken together, these results strongly indicate that the induced bulge represents a forelimb-like structure rather than a generic mesodermal thickening.

      (5) It is important to analyze the skeletons of all embryos to assess the effect of reduced limb buds upon dnHox expression and determine whether extra skeletal elements develop from the extended bud or ectopic bulge.

      We thank the reviewer for this helpful suggestion. We have analyzed the cartilage structures of the operated embryos. No skeletal elements were detected within the ectopic wing bud in the neck region. Furthermore, we did not observe any significant structural changes in the wing skeleton following loss-of-function (dnHox) experiments. These observations indicate that the ectopic bulges do not progress to form skeletal elements, consistent with their identity as early limb-like outgrowths rather than fully developed limbs.

      Reviewer #2 (Public review):

      Weaknesses

      (1) By contrast to the GOF experiments that induce ectopic limb budding, the LOF experiments, which use dominant negative forms of Hoxa4, Hoxa5, Hoxa6, and Hoxa7, are more challenging to interpret due to the absence of data on the specificity of the dominant negative constructs. Absent such controls, one cannot be certain that effects on limb development are due to disruption of the specific Hox proteins that are being targeted.

      We thank the reviewer for raising this important point regarding the specificity of the dominant-negative constructs. The dnHox constructs used in this study were generated by truncating the C-terminal region of each Hox protein, a strategy that removes the homeodomain and has been demonstrated to act as a specific dominant-negative by interfering with the corresponding Hox function without broadly affecting unrelated Hox genes. This approach has been successfully validated and used in previous work (Moreau et al., Curr. Biol. 2019), where similar constructs effectively and specifically inhibited Hox activity in the chick embryo.

      (2) A test of their central hypothesis regarding the necessity and sufficiency of the Hox genes under investigation would be to co-transfect the neck with full-length Hoxa6/a7 AND the dnHoxA4/a5. If their hypothesis is correct, then the dn constructs should block the limb-inducing ability of Hoxa6/a7 overexpression (again, validation of specificity of the DN constructs is important here)

      We thank the reviewer for this insightful suggestion. We agree that, in principle, co-electroporation of dnHox4/5 with Hox6/7 could test the hierarchical relationship between these genes. However, due to the extensive redundancy and regulatory interdependence among Hox genes, simultaneous manipulation of multiple genes often leads to compensatory effects or complex outcomes that are difficult to interpret mechanistically. As discussed in our response to Point 3 of Reviewer 1, inhibition of only one or two Hox4/5 paralogs is unlikely to completely abolish the permissive function of this group.

      Our current data — showing that Hox6/7 gain-of-function can induce ectopic limb-like outgrowths, while dnHox4/5 and dnHox6/7 lead to reduced limb formation — already provide strong evidence for both the necessity and sufficiency of these Hox activities in forelimb positioning. We therefore believe that the existing experiments adequately support our proposed model without the need for additional combinatorial manipulations.

      (3) The paper could be strengthened by providing some additional data, which should already exist in their RNA-Seq dataset, such as supplementary material that shows the actual gene expression data that are represented in the Venn diagram, heatmap, and GO analysis in Figure 3.

      We thank the reviewer for this constructive suggestion. In response, we have added a table (Table 3) listing the genes expressed in both the native limb/wing bud and the Hoxa6-induced wing bud, as identified from our RNA-Seq dataset. This table provides the underlying data for the Venn diagram, heatmap, and GO analysis presented in Figure 3. We agree that including these data improves transparency and helps readers better appreciate the molecular similarity between the induced and native limb buds.

      (4) The results of these experiments in chick embryos are rather unexpected based on previous knockout experiments in mice, and this needs to be discussed.

      We thank the reviewer for this important point. We have addressed this issue in our response to Reviewer 1, Point 1, and have expanded the relevant discussion in the manuscript. Briefly, we believe that the apparent discrepancy between chick and mouse results arises from both the high degree of functional redundancy among Hox paralogs and the limitations of detecting subtle limb-specific effects in systemic mouse mutants, where both sides of the embryo are equally affected. In contrast, the chick system allows unilateral gene manipulation, providing an internal control and greatly enhancing sensitivity to detect weak or localized effects. Thus, the chick embryo model can reveal subtle Hox-dependent limb-induction activities that are masked in conventional mouse knockout approaches.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In the manuscript submission by Zhao et al. entitled, "Cardiac neurons expressing a glucagon-like receptor mediate cardiac arrhythmia induced by high-fat diet in Drosophila" the authors assert that cardiac arrhythmias in Drosophila on a high fat diet are due in part to adipokinetic hormone (Akh) signaling activation. High fat diet induces Akh secretion from activated endocrine neurons, which activate AkhR in posterior cardiac neurons. Silencing or deletion of Akh or AkhR blocks arrhythmia in Drosophila on high fat diet. Elimination of one of two AkhR expressing cardiac neurons results in arrhythmia similar to high fat diet.

      Strengths:

      The authors propose a novel mechanism for high fat diet induced arrhythmia utilizing the Akh signaling pathway that signals to cardiac neurons.

      Comments on revisions:

      The authors have addressed my other concerns. The only outstanding issue is in regard to the following comment:

      The authors state that "HFD led to increased heartbeat and an irregular rhythm." In representative examples shown, HFD resulted in pauses, slower heart rate, and increased irregularity in rhythm but not consistently increased heart rate (Figures 1B, 3A, and 4C). Based on the cited work by Ocorr et al (https://doi.org/10.1073/pnas.0609278104), Drosophila heart rate is highly variable with periods of fast and slow rates, which the authors attributed to neuronal and hormonal inputs. Ocorr et al then describe the use of "semi-intact" flies to remove autonomic input to normalize heart rate. Were semi-intact flies used? If not, how was heart rate variability controlled? And how was heart rate "increase" quantified in high fat diet compared to normal fat diet? Lastly, how does one measure "arrhythmia" when there is so much heart rate variability in normal intact flies?

      The authors state that 8 sec time windows were selected at the discretion of the imager for analysis. I don't know how to avoid bias unless the person acquiring the imaging is blinded to the condition and the analysis is also done blind. Can you comment on whether data acquisition and analysis were done in a blinded fashion? If not, this should be stated as a limitation of the study.

      Drosophila heart rate is highly variable. During the recording, we were biased toward choosing a time window in which the heartbeat was fairly stable. This is a limitation of the study, which we have noted in the revised version. We chose to use intact rather than “semi-intact” flies with the intention of avoiding damage to the cardiac neurons.
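
      The window-selection concern raised above can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function names, the seeded random choice of the 8 s window, and the variability index (standard deviation of beat-to-beat periods divided by their median) are assumptions chosen for illustration only.

```python
import random
import statistics


def pick_window(recording_s, window_s=8.0, seed=None):
    """Pick an analysis window at random instead of by eye (hypothetical helper)."""
    if recording_s < window_s:
        raise ValueError("recording is shorter than the analysis window")
    rng = random.Random(seed)  # seeded so the choice is reproducible
    start = rng.uniform(0.0, recording_s - window_s)
    return start, start + window_s


def arrhythmia_index(beat_times):
    """Variability of heart periods: population std dev divided by the median period.

    This definition is an assumption for illustration, not taken from the manuscript.
    """
    if len(beat_times) < 2:
        raise ValueError("need at least two beats")
    periods = [b - a for a, b in zip(beat_times, beat_times[1:])]
    return statistics.pstdev(periods) / statistics.median(periods)


# Hypothetical beat timestamps in seconds
regular = [0.0, 0.5, 1.0, 1.5]
irregular = [0.0, 0.5, 1.5, 1.75]
print(arrhythmia_index(regular), arrhythmia_index(irregular))
```

      Selecting the window from a seeded random draw, with the analyst blinded to genotype and diet, would remove the discretion the reviewer is concerned about; whether such a window is representative of a highly variable recording remains a separate question.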

      Reviewer #3 (Public review):

      Zhao et al. provide new insights into the mechanism by which a high-fat diet (HFD) induces cardiac arrhythmia employing Drosophila as a model. HFD induces cardiac arrhythmia in both mammals and Drosophila. Both glucagon and its functional equivalent in Drosophila Akh are known to induce arrhythmia. The study demonstrates that Akh mRNA levels are increased by HFD and both Akh and its receptor are necessary for high-fat diet-induced cardiac arrhythmia, elucidating a novel link. Notably, Zhao et al. identify a pair of AKH receptor-expressing neurons located at the posterior of the heart tube. Interestingly, these neurons innervate the heart muscle and form synaptic connections, implying their roles in controlling the heart muscle. The study presented by Zhao et al. is intriguing, and the rigorous characterization of the AKH receptor-expressing neurons would significantly enhance our understanding of the molecular mechanism underlying HFD-induced cardiac arrhythmia.

      Many experiments presented in the manuscript are appropriate for supporting the conclusions while additional controls and precise quantifications should help strengthen the authors' arguments. The key results obtained by loss of Akh (or AkhR) and genetic elimination of the identified AkhR-expressing cardiac neurons do not reconcile, complicating the overall interpretation.

      We thank the reviewer for the positive comments. We believe that additional signaling pathways are active in the AkhR neurons and regulate rhythmic heartbeat. We are currently searching for the molecules and pathways that act on the AkhR cardiac neurons to regulate the heartbeat. Thus, AkhR neuron ablation should have a more profound effect: loss of AkhR is not equivalent to AkhR neuron ablation.

      The most exciting result is the identification of AkhR-expressing neurons located at the posterior part of the heart tube (ACNs). The authors attempted to determine the function of ACNs by expressing rpr with AkhR-GAL4, which would induce cell death in all AkhR-expressing cells, including ACNs. The experiments presented in Figure 6 are not straightforward to interpret. Moreover, the conclusion contradicts the main hypothesis that elevated Akh is the basis of HFD-induced arrhythmia. The results suggest the importance of AkhR-expressing cells for normal heartbeat. However, elimination of Akh or AkhR restores normal rhythm in HFD-fed animals, suggesting that Akh and AkhR are not important for maintaining normal rhythms. If Akh signaling in ACNs is key for HFD-induced arrhythmia, genetic elimination of ACNs should leave normal rhythm unaltered and rescue the HFD-induced arrhythmia. An important caveat is that the experiments do not test the specific role of ACNs. ACNs should be just a small part of the cells expressing AkhR. Specific manipulation of ACNs will significantly improve the study. Moreover, the main hypothesis suggests that HFD may alter the activity of ACNs in a manner dependent on Akh and AkhR. Testing how HFD changes calcium, possibly by CaLexA (Figure 2) and/or GCaMP, in wild-type and AkhR mutant could be a way to connect ACNs to HFD-induced arrhythmia. Moreover, optogenetic manipulation of ACNs may allow for specific manipulation of ACNs.

      We thank the reviewer for suggesting these detailed experiments, and we believe that addressing these points would consolidate the results. As AkhR-Gal4 is also expressed in the fat body, we set out to build a more specific driver. We planned to use the split-Gal4 system (Luan et al. 2006. PMID: 17088209). The combination of pan-neuronal Elav-Gal4.DBD and AkhR-p65.AD should yield an AkhR neuron-specific driver. We selected 2580 bp of AkhR upstream DNA and cloned it into the pBPp65ADZpUw plasmid (Addgene plasmid: #26234). After two rounds of injection, however, we were not able to recover a transgenic line.

      We used GCaMP to record the calcium signal in the AkhR neurons. AkhR-Gal4>GCaMP shows extremely high levels of fluorescence in the cardiac neurons under normal conditions.

      We are screening Gal4 drivers, trying to find one line that is specific to the cardiac neurons and has a lower level of driver activity.   

      Interestingly, expressing rpr with AkhR-GAL4 was insufficient to eliminate both ACNs. It is not clear why it didn't eliminate both ACNs. Given the incomplete penetrance, appropriate quantifications should be helpful. Additionally, the impact on other AkhR-expressing cells should be assessed. Adding more copies of UAS-rpr, AkhR-GAL4, or both may eliminate all ACNs and other AkhR-expressing cells. The authors could also try UAS-hid instead of UAS-rpr.

      We quantified the AkhR neuron ablation and found that about 69% (n=28) of AkhR-Gal4>rpr flies showed a single ACN. It is more challenging to quantify other AkhR-expressing cells, as they are widely distributed. We tried adding more copies of UAS-rpr or AkhR-Gal4, which caused developmental defects (pupal lethality). Thus, as mentioned above, we are trying to find a more specific driver for targeting the cardiac neurons.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      The authors refer to the 'crop' as the functional equivalent of the human stomach. Considering the difference in their primary functions, this cannot be justified.

      In Drosophila, the crop functions analogously to the stomach in vertebrates. It is a foregut storage and preliminary processing organ that regulates food passage into the midgut. It is more than a simple reservoir: the crop engages in enzymatic mixing, neural control, and active motility.

      Line 163 and 166, APCs are not neurons.

      Akh-producing cells (APCs) in Drosophila are neuroendocrine cells, residing in the corpora cardiaca (CC). While they produce and secrete the hormone AKH (akin to glucagon), they are not brain interneurons per se. APCs share many neuronal features (vesicular release, axon-like projections) and receive neural inputs, effectively functioning as a peripheral endocrine center.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fungal survival and pathogenicity rely on the ability to undergo reversible morphological transitions, which are often linked to nutrient availability. In this study, the authors uncover a conserved connection between glycolytic activity and sulfur amino acid biosynthesis that drives morphogenesis in two fungal model systems. By disentangling this process from canonical cAMP signaling, the authors identify a new metabolic axis that integrates central carbon metabolism with developmental plasticity and virulence.

      Strengths:

      The study integrates different experimental approaches, including genetic, biochemical, transcriptomic, and morphological analyses, and convincingly demonstrates that perturbations in glycolysis alter sulfur metabolic pathways and thus impact pseudohyphal and hyphal differentiation. Overall, this work offers new and important insights into how metabolic fluxes are intertwined with fungal developmental programs and therefore opens new perspectives to investigate morphological transitioning in fungi.

      We thank the reviewer for finding this study to be of importance and for appreciating our multipronged approach to substantiate our finding that perturbations in glycolysis alter sulfur metabolism and thus impact pseudohyphal and hyphal differentiation in fungi.

      Weaknesses:

      A few aspects could be improved to strengthen the conclusions. Firstly, the striking transcriptomic changes observed upon 2DG treatment should be analyzed in S. cerevisiae adh1 and pfk1 deletion strains, for instance, through qPCR or western blot analyses of sulfur metabolism genes, to confirm that observed changes in 2DG conditions mirror those seen in genetic mutants. Secondly, differences between methionine and cysteine in their ability to rescue the mutant phenotype in both species are not mentioned, nor discussed in more detail. This is especially important as there seem to be differences between S. cerevisiae and C. albicans, which might point to subtle but specific metabolic adaptations.

      The authors are also encouraged to refine several figure elements for clarity and comparability (e.g., harmonized axes in bar plots), condense the discussion to emphasize the conceptual advances over a summary of the results, and shorten figure legends.

      We are grateful for this valuable and constructive feedback, and we agree with the reviewer on the necessity of performing RT-qPCR analysis of sulfur metabolism genes in ∆∆pfk1 and ∆∆adh1 strains of S. cerevisiae to validate our RNA-Seq results using 2DG. We have performed this experiment, and our results show that several genes involved in the de novo biosynthesis of sulfur-containing amino acids are downregulated in both the ∆∆pfk1 and ∆∆adh1 strains, corroborating the downregulation of sulfur metabolism genes in the 2DG treated samples. This new data is now included in the revised manuscript as Supplementary Figure 2C. 

      Furthermore, we acknowledge the reviewer’s point regarding the significance of comparing the differences in the ability of methionine and cysteine to rescue filamentation defects exhibited by the mutants, between S. cerevisiae and C. albicans. The observed differences between S. cerevisiae and C. albicans likely highlight species-specific metabolic adaptations within the sulfur assimilation pathway. While both yeasts employ the transsulfuration pathway to interconvert these sulfur-containing amino acids, the precise regulatory points, including the specific enzymes, their compartmentalization, and transcriptional control, are not identical. For instance, differences in the feedback inhibition mechanisms or the expression levels of key transsulfuration enzymes between S. cerevisiae and C. albicans could explain the variations in the phenotypic rescue experiments (Chebaro et al., 2017; Lombardi et al., 2024; Rouillon et al., 2000; Shrivastava et al., 2021; Thomas and Surdin-Kerjan, 1997). Furthermore, the species-specific differences in amino acid transport systems (permeases) add another layer of complexity. S. cerevisiae primarily uses multiple, low-affinity permeases for cysteine transport (Gap1, Bap2, Bap3, Tat1, Tat2, Agp1, Gnp1, Yct1), while relying on a limited set of high-affinity transporters (like Mup1) for methionine transport, with the added complexity that its methionine transporters can also transport cysteine (Düring-Olsen et al., 1999; Huang et al., 2017; Kosugi et al., 2001; Menant et al., 2006). In contrast, C. albicans utilizes high-affinity transporters for the uptake of both amino acids, employing Cyn1 specifically for cysteine and Mup1 for methionine, indicating a greater reliance on dedicated transport mechanisms for these sulfur-containing molecules in the pathogenic yeast (Schrevens et al., 2018; Yadav and Bachhawat, 2011). A combination of these factors could be the reason for the differences in the ability of cysteine and methionine to rescue filamentation in S. cerevisiae and C. albicans.

      Finally, we have enhanced the quantitative rigor and clarity of the data presentation in the revised manuscript by implementing Y-axis uniformity across all relevant bar graphs to facilitate a more robust and direct comparative analysis. We have also condensed the discussion to emphasize the conceptual advances and have shortened the figure legends as per the reviewer's suggestions.

      Reviewer #2 (Public review):

      Summary:

      This manuscript investigates the interplay between glycolysis and sulfur metabolism in regulating fungal morphogenesis and virulence. Using both Saccharomyces cerevisiae and Candida albicans, the authors demonstrate that glycolytic flux is essential for morphogenesis under nitrogen-limiting conditions, acting independently of the established cAMP-PKA pathway. Transcriptomic and genetic analyses reveal that glycolysis influences the de novo biosynthesis of sulfur-containing amino acids, specifically cysteine and methionine. Notably, supplementation with sulfur sources restores morphogenetic and virulence defects in glycolysis-deficient mutants, thereby linking core carbon metabolism with sulfur assimilation and fungal pathogenicity.

      Strengths:

      The work identifies a previously uncharacterized link between glycolysis and sulfur metabolism in fungi, bridging metabolic and morphogenetic regulation, which is an important conceptual advance for understanding fungal pathogenicity. Demonstrating that cysteine supplementation rescues virulence defects in animal models connects basic metabolism to infection outcomes, which adds to the biomedical importance.

      We would like to thank the reviewer for the positive comments on our work. We are pleased that they recognize the novel metabolic link between glycolysis and sulfur metabolism as a key conceptual advance in fungal morphogenesis. 

      Weaknesses:

      The proposed model that glycolytic flux modulates Met30 activity post-translationally remains speculative. While data support Met4 stabilization in met30 deletion strains, the mechanism of Met30 modulation by glycolysis is not demonstrated.

      We thank the reviewer for this valuable feedback. The activity of the SCF<sup>Met30</sup> E3 ubiquitin ligase, mediated by the F-box protein Met30, is dynamically regulated through both proteolytic degradation and its dissociation from the SCF complex, to coordinate sulfur metabolism and cell cycle progression (Smothers et al., 2000; Yen et al., 2005). Our transcriptomic (RNA-seq) and protein expression (Fig. 3J) analyses confirm that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism. This observation is consistent with the established paradigm wherein stress signals, such as cadmium (Cd<sup>2+</sup>) exposure, rapidly inactivate the SCF<sup>Met30</sup> E3 ubiquitin ligase via the dissociation of Met30 from the Skp1 subunit of the SCF complex (Lauinger et al., 2024; Yen et al., 2005). We therefore propose that active glycolytic flux modulates SCF<sup>Met30</sup> activity post-translationally, specifically by triggering Met30 detachment from the SCF complex. This mechanism would stabilize the primary substrate, the transcription factor Met4, thus promoting the biosynthesis of sulfur-containing amino acids. Mechanistic validation of this hypothesis, particularly the assessment of Met30 dissociation from the SCF<sup>Met30</sup> complex via immunoprecipitation (IP), is technically challenging. These experiments would require isolating cells from colonies undergoing pseudohyphal differentiation on solid media, because pseudohyphal differentiation does not occur in nitrogen-limiting liquid media (Gancedo, 2001; Gimeno et al., 1992). Current cell yields (OD<sub>600</sub>≈1 from ≈80-100 colonies) are far below what is needed to obtain sufficient total protein for standard pull-down assays (OD<sub>600</sub>≈600-800 is required to achieve 1-2 mg/ml of total protein, which is the standard requirement for pull-down protocols in S. cerevisiae (Lauinger et al., 2024)).
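
      To make the scale of this shortfall concrete, the following minimal sketch extrapolates the colony numbers quoted above. It assumes that cell yield scales linearly with the number of colonies harvested, which is a simplification; the function name is hypothetical.

```python
def colonies_needed(target_od, od_per_batch, colonies_per_batch):
    """Colonies required to reach target_od, assuming yield scales linearly."""
    if od_per_batch <= 0:
        raise ValueError("od_per_batch must be positive")
    return target_od / od_per_batch * colonies_per_batch


# Figures quoted in the response: OD600 of about 1 from about 80-100 colonies,
# versus about 600-800 needed for a standard pull-down.
low = colonies_needed(600, 1.0, 80)
high = colonies_needed(800, 1.0, 100)
print(f"roughly {low:.0f} to {high:.0f} colonies")
```

      Under this linear assumption, a single pull-down would require on the order of tens of thousands of colonies.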

      Given that the primary objective of our study is to establish the novel regulatory link between glycolysis and sulfur metabolism in the context of fungal morphogenesis, we would like to explore these crucial mechanistic details, in depth, in a subsequent study.

      Reviewer #3 (Public review):

      This study investigates the connection between glycolysis and the biosynthesis of sulfur-containing amino acids in controlling fungal morphogenesis, using Saccharomyces cerevisiae and C. albicans as model organisms. The authors identify a conserved metabolic axis that integrates glycolysis with cysteine/methionine biosynthetic pathways to influence morphological transitions. This work broadens the current understanding of fungal morphogenesis, which has largely focused on gene regulatory networks and cAMP-dependent signaling pathways, by emphasizing the contribution of metabolic control mechanisms. However, despite the novel conceptual framework, the study provides limited mechanistic characterization of how the sulfur metabolism and glycolysis blockade directly drive morphological outcomes. In particular, the rationale for selecting specific gene deletions, such as Met32 (and not Met4), or the Met30 deletion used to probe this pathway, is not clearly explained, making it difficult to assess whether these targets comprehensively represent the metabolic nodes proposed to be critical. Further supportive data and experimental validation would strengthen the claims on connections between glycolysis, sulfur amino acid metabolism, and virulence.

      Strengths:

      (1) The delineation of how glycolytic flux regulates fungal morphogenesis through a cAMP-independent mechanism is a significant advancement. The coupling of glycolysis with the de novo biosynthesis of sulfur-containing amino acids, a requirement for morphogenesis, introduces a novel and unexpected layer of regulation.

      (2) Demonstrating this mechanism in both S. cerevisiae and C. albicans strengthens the argument for its evolutionary conservation and biological importance.

      (3) The ability to rescue the morphogenesis defect through exogenous supplementation of sulfur-containing amino acids provides functional validation.

      (4) The findings from the murine Pfk1-deficient model underscore the clinical significance of metabolic pathways in fungal infections.

      We are grateful for this comprehensive and insightful summary of our work. We deeply appreciate the reviewer's recognition of the key conceptual breakthroughs regarding the metabolic regulation of fungal morphogenesis and the clinical relevance of our findings.

      Weaknesses:

      (1) While the link between glycolysis and sulfur amino acid biosynthesis is established via transcriptomic and proteomic analysis, the specific regulation connecting these pathways via Met30 remains to be elucidated. For example, what are the expression and protein levels of Met30 in the initial analysis from Figure 2? How specific is this effect on Met30 in anaerobic versus aerobic glycolysis, especially when the pentose phosphate pathway is involved in the growth of the cells when glycolysis is perturbed?

      We are grateful for the insightful feedback provided by the reviewer. S. cerevisiae is a Crabtree-positive organism that primarily uses anaerobic glycolysis to metabolize glucose under glucose-replete conditions (Barford and Hall, 1979; De Deken, 1966), and our pseudohyphal differentiation assays are performed in glucose-rich conditions (Gimeno et al., 1992). Furthermore, perturbation of glycolysis is known to induce compensatory upregulation of the Pentose Phosphate Pathway (PPP) (Ralser et al., 2007), and we have also observed upregulation of the gene that encodes transketolase-1 (Tkl1), a key enzyme in the PPP, in our RNA-seq data. Importantly, our transcriptomic (RNA-seq) and protein expression (Fig. 3J) analyses confirm that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism. This aligns with the established paradigm wherein stress signals, such as cadmium (Cd<sup>2+</sup>) exposure, rapidly inactivate the SCF<sup>Met30</sup> E3 ubiquitin ligase via Met30 dissociation from the Skp1 subunit of the complex (Lauinger et al., 2024; Yen et al., 2005). We therefore propose that active glycolytic flux modulates SCF<sup>Met30</sup> activity post-translationally, specifically by triggering Met30 detachment from the SCF complex. This mechanism would stabilize the primary substrate, the transcription factor Met4, thus promoting the biosynthesis of sulfur-containing amino acids. Further experiments are required to delineate the specific role of the pentose phosphate pathway in the proposed regulation of Met30 activity under glycolysis perturbation, and this will be explored in our subsequent study.

      (2) Including detailed metabolite profiling could have strengthened the metabolic connection and provided additional insights into intermediate flux changes, i.e., measuring levels of metabolites to check if cysteine or methionine levels are influenced intracellularly. Also, it is expected to see how Met30 deletion could affect cell growth. Data on Met30 deletion and its effect on growth are not included, especially given that a viable heterozygous Met30 strain has been established. Measuring the cysteine or methionine levels using metabolomic analysis would further strengthen the claims in every section.

      We are grateful to the reviewer for this constructive feedback. To address the potential impact of met30 deletion on cell growth, we have included new data (Suppl. Fig. 4A) demonstrating that the deletion of a single copy of met30 in diploid S. cerevisiae does not compromise overall cell growth under nitrogen-limiting conditions, as the ∆met30 strain grows similarly to the wild-type strain.

      Our pseudohyphal/hyphal differentiation assays show that the defects induced by glycolytic perturbation are fully rescued by the exogenous supplementation of sulfur-containing amino acids, cysteine or methionine. Since these data conclusively demonstrate that the primary metabolic limitation caused by the perturbation of glycolysis, which leads to filamentation defects, is sulfur metabolism, we posit that performing comprehensive metabolic profiling would primarily reconfirm these results. We believe that our in vitro and in vivo sulfur add-back experiments sufficiently substantiate the novel regulatory metabolic link between glycolysis and sulfur metabolism.

      (3) In comparison with the previous bioRxiv (doi: https://doi.org/10.1101/2025.05.14.654021) of this article in May 2025 to the recent bioRxiv of this article (doi: https://doi.org/10.1101/2025.05.14.654021), there have been some changes, and Met30 deletion has been recently included, and the chemical perturbation of glycolysis has been added as new data. Although the changes incorporated in the recent version of the article improved the illustration of the hypothesis in Figure 6, which connects glycolysis to Sulfur metabolism, the gene expression and protein levels of all genes involved in the illustrated hypothesis are not consistently shown. For example, in some cases, the Met4 expression is not shown (Figure 4), and the Met30 expression is not shown during profiling (gene expression or protein levels) throughout the manuscript. Lack of consistency in profiling the same set of key genes makes understanding more complicated.

      We thank the reviewer for this feedback, which helps us to clarify the scope of our transcriptomic analysis. Our decision to focus our RT-qPCR experiments on downstream targets, while excluding met4 and met30 from the RT-qPCR analysis, is based on their known regulatory mechanisms. Met4 activity is predominantly regulated by post-translational ubiquitination by the SCF<sup>Met30</sup> complex followed by its degradation (Rouillon et al., 2000; Shrivastava et al., 2021; Smothers et al., 2000), while Met30 activity is primarily regulated by its auto-degradation or its dissociation from the SCF<sup>Met30</sup> complex (Lauinger et al., 2024; Smothers et al., 2000; Yen et al., 2005). Consistent with this, our RNA-Seq results indicate that neither met4 nor met30 transcripts are differentially expressed in response to 2DG addition. For all our RT-qPCR analyses in S. cerevisiae and C. albicans, we have consistently used the same set of sulfur metabolism genes: met32, met3, met5, met10 and met17. Our protein expression analysis of Met30 in S. cerevisiae (Fig. 3J) confirms that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism.

      (4) The demonstrated link between glycolysis and sulfur amino acid biosynthesis, along with its implications for virulence in C. albicans, is important for understanding fungal adaptation, as mentioned in the article; however, the Met4 activation was not fully characterized, nor were the data presented when virulence was assessed in Figure 4. Why is Met4 not included in Figure 4D and I? Especially, according to Figure 6, Met4 activation is crucial and guides the differences between glycolysis-active and inactive conditions.

      We thank the reviewer for their input. As the Met4 transcription factor in C. albicans is primarily regulated post-translationally through its degradation and inactivation by the SCF<sup>Met30</sup> E3 ubiquitin ligase complex (Shrivastava et al., 2021), we opted to monitor the transcriptional status of downstream targets of Met4 (i.e., genes directly regulated by Met4), as these are the genes that exhibit the most direct and functionally relevant transcriptional changes in response to the altered Met4 levels.

      (5) Similarly, the rationale behind selecting Met32 for characterizing sulfur metabolism is unclear. Deletion of Met32 resulted in a significant reduction in pseudohyphal differentiation; why is this attributed only to Met32? What happens if Met4 is deleted? It is not justified why Met32, rather than Met4, was chosen. Figure 6 clearly hypothesizes that Met4 activation is the key to the mechanism.

      We sincerely thank the reviewer for this insightful query regarding our selection of met32 for our gene deletion experiments. The choice of the ∆∆met32 strain was motivated by its unique phenotypic properties within the pathway for de novo biosynthesis of sulfur-containing amino acids. While deletions of most of the genes that encode proteins involved in the de novo biosynthesis of sulfur-containing amino acids result in auxotrophy for methionine or cysteine, the ∆∆met32 strain does not exhibit this phenotype (Blaiseau et al., 1997). This key distinction is attributed to the functional redundancy provided by the paralogous gene, met31 (Blaiseau et al., 1997). Crucially, given that the deletion of the central transcriptional regulator, met4, results in cysteine/methionine auxotrophy, the use of the ∆∆met32 strain provides an essential, viable experimental model for investigating the role of sulfur metabolism during pseudohyphal differentiation in S. cerevisiae.

      (6) The comparative RT-qPCR in Figure 5 did not account for sulfur metabolism genes, whereas it was focused only on virulence and hyphal differentiation. Is there data to support the levels of sulfur metabolism genes?

      We thank the reviewer for this feedback. We wish to respectfully clarify that the data pertaining to expression of sulfur metabolism genes in the presence of 2DG or in the ∆∆pfk1 strain in C. albicans are already included and discussed within the manuscript. These results can be found in Figure 4, panels D and I, respectively.

      (7) To validate the proposed interlink between sulfur metabolism and virulence, it is recommended that the gene sets (illustrated in Figure 6) be consistently included across all comparative data throughout the manuscript. Excluding sulfur metabolism genes in Figure 5 prevents the experiment from demonstrating the coordinated role of glycolysis perturbation → sulfur metabolism → virulence. The same is true for other comparisons, where the lack of data on Met30, Met4, etc., makes it hard to connect the hypothesis. It is also recommended to check the gene expression of other genes related to the cAMP pathway and report them to confirm the cAMP-independent mechanism. For example, gpa2 deletion was used to confirm the effects of cAMP supplementation, but the expression of this gene was not assessed in the RNA-seq analysis in Figure 2. It would be beneficial to show the expression of cAMP-related genes to completely confirm that they do not play a role in the claims in Figure 2.

      We thank the reviewer for this valuable feedback. The transcriptional analysis of the sulfur metabolism genes in the presence of 2DG and the ∆∆pfk1 strain is shown in Figures 4D and 4I.

      Our RNA-seq analysis (Author response image 1) confirms that there is no significant transcriptional change in the expression of cAMP-PKA pathway associated genes (Log2 fold change ≥ 1 for upregulated genes and Log2 fold change ≤ -1 for downregulated genes) in 2DG treated cells compared to the untreated control cells, reinforcing our conclusion that the glycolytic regulation of fungal morphogenesis is mediated through a cAMP-PKA pathway independent mechanism.

      Author response image 1.
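
      The fold-change cutoff stated above can be written out as a short sketch. This is an illustration of the threshold only, not the authors' analysis code; the function name and the example values are hypothetical and are not taken from the RNA-seq dataset.

```python
def classify(log2_fc, threshold=1.0):
    """Label a gene by its log2 fold change using the cutoff quoted in the response."""
    if log2_fc >= threshold:
        return "upregulated"
    if log2_fc <= -threshold:
        return "downregulated"
    return "unchanged"


# Hypothetical log2 fold changes (2DG-treated versus untreated), for illustration only
example = {"gene_a": 0.2, "gene_b": -0.4, "gene_c": 1.3}
for gene, value in example.items():
    print(gene, classify(value))
```

      A cutoff on fold change alone says nothing about statistical significance, so in practice it is usually paired with an adjusted p-value filter.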

      (8) Although the NAC supplementation study is included in the new version of the article compared to the previous version on bioRxiv (May 2025), the link to sulfur metabolism is not well characterized in Figure 5 and its related datasets. The main focus of the manuscript is to delineate the role of sulfur metabolism; hence, it is anticipated that Figure 5 will include sulfur-related metabolic genes and their links to pfk1 deletion, using RT-PCR measurements as shown for the virulence genes.

      We thank the reviewer for this question. The relevant data are indeed present within the current submission. We respectfully direct the reviewer's attention to Figure 4, panels D and I, where the data pertaining to expression of sulfur metabolism genes in the presence of 2DG or in the ∆∆pfk1 strain in C. albicans can be found.

      (9) The manuscript would benefit from more information added to the introduction section and literature supports for some of the findings reported earlier, including the role of (i) cAMP-PKA and MAPK pathways, (ii) what is known in the literature that reports about the treatment with 2DG (role of Snf1, HXT1, and HXT3), as well as how gpa2 is involved. Some sentences in the manuscripts are repetitive; it would be beneficial to add more relevant sections to the introduction and discussion to clarify the rationale for gene choices.

      We thank the reviewer for this valuable feedback. We have incorporated these changes in our revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Line 107: As morphological transitions are indeed a conserved phenomenon across fungal species, hosts & environmental niches, the authors could refer to a few more here (infection structures like appressoria; fruiting bodies, etc.).

      We thank the reviewer for this valuable feedback. We have incorporated these changes in our revised manuscript.

      Line 119/120: That's a bit misleading in my opinion. Gpr1 acts as a key sensor of external carbon, while Ras proteins control the cAMP pathway as intracellular sensory proteins. That should be stated more clearly. cAMP is the output and not the sensor.

      We appreciate the reviewer's detailed attention to this signaling network. We have revised the manuscript to precisely reflect this established signaling hierarchy for maximum clarity.

      (2) Line 180: ..differentiation

      We thank the reviewer for this valuable feedback. We have incorporated this change in our revised manuscript.

      (3) Figure 1 panels C & F. The authors should provide the same scale for all experiments. Otherwise, the interpretation can be difficult. The same applies to the different bar plots in Figure 4. Have the authors quantified pseudohyphal differentiation in the cAMP add-back assays? I agree that the chosen images look convincing, but they don't reflect quantitative analyses.

      We thank the reviewer for detailed and constructive feedback. We have changed the Y-axis and made it more uniform to improve the clarity of our data presentation in the revised manuscript.

      We have also incorporated the quantitative analysis of the cAMP add-back assays in S. cerevisiae, in Figure 2 Panel L.

      (4) Line 367/68: "cysteine or methionine was able to completely rescue". Here, the authors should phrase their wording more carefully. Figure 3C shows the complete rescue of the phenotype qualitatively, but Figure 3D clearly shows that there are differences between the supplementation of cysteine and methionine, with the latter not fully restoring the phenotype.

      We sincerely appreciate the reviewer's meticulous attention to the data interpretation. We fully agree that the initial phrasing in lines 367/368 requires adjustment, as Figure 3D establishes a quantitative difference in the efficiency of phenotypic rescue between cysteine and methionine supplementation. We have revised the text to articulate this difference.

      (5) Line 568: Here, apparently, the ability to rescue the differentiation phenotype is reversed compared to the experiment with S. cerevisiae. Cysteine only results in ~20% hyphal cells, while methionine restores to wild-type-like hyphal formation. Can the authors comment on where these differences might originate from? Is there a difference in the uptake of cysteine vs. methionine in the two species or consumption rates?

      We thank the reviewer for their detailed and constructive feedback. We believe this phenotypic difference may be due to the distinct metabolic prioritization of sulfur amino acids in C. albicans. Methionine is a known trigger for hyphal differentiation in C. albicans and serves as the immediate precursor for the universal methyl donor, S-adenosylmethionine (SAM) (Kraidlova et al., 2016; Schrevens et al., 2018). The morphological transition to hyphae involves a complex regulatory cascade which requires high rates of methylation, and this requires a rapid and direct conversion of methionine into SAM (Kraidlova et al., 2016; Schrevens et al., 2018). Cysteine, however, must first be converted into methionine via the transsulfuration pathway to produce SAM, making it metabolically less efficient for these processes.

      Reviewer #2 (Recommendations for the authors):

      The study's comprehensive experimental approach, which integrates pharmacological inhibition, genetic manipulation, transcriptomics, and an animal infection model, provides strong evidence for a conserved mechanism, though some aspects need further clarification.

      Major Comments:

      (1) While the data suggest that glycolysis affects Met30 activity post-translationally, the underlying mechanism remains speculative. The authors should perform co-immunoprecipitation or ubiquitination assays to confirm whether glycolytic perturbation alters Met30-SCF complex interactions or Met4 ubiquitination levels.

      We thank the reviewer for this valuable feedback. The activity of the SCF<sup>Met30</sup> E3 ubiquitin ligase, mediated by the F box protein Met30, is dynamically regulated through both proteolytic degradation and its dissociation from the SCF complex, to coordinate sulfur metabolism and cell cycle progression (Smothers et al., 2000; Yen et al., 2005). Our transcriptomic (RNA-seq) and protein expression (Fig. 3J) analyses confirm that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCF<sup>Met30</sup> proteasomal degradation as the dominant regulatory mechanism. This observation is consistent with the established paradigm wherein stress signals, such as cadmium (Cd<sup>2+</sup>) exposure, rapidly inactivate the SCF<sup>Met30</sup> E3 ubiquitin ligase via the dissociation of Met30 from the Skp1 subunit of the SCF complex (Lauinger et al., 2024; Yen et al., 2005). We therefore propose that active glycolytic flux modulates SCF<sup>Met30</sup> activity post-translationally, specifically by triggering Met30 detachment from the SCF complex. This mechanism would stabilize the primary substrate, the transcription factor Met4, thus promoting the biosynthesis of sulfur-containing amino acids. Mechanistic validation of this hypothesis, particularly the assessment of Met30 dissociation from the SCF<sup>Met30</sup> complex via immunoprecipitation (IP), is technically challenging. Since these experiments would involve isolating cells from colonies undergoing pseudohyphal differentiation on solid media (given that pseudohyphal differentiation does not occur in nitrogen-limiting liquid media (Gancedo, 2001; Gimeno et al., 1992)), current cell yields (OD<sub>600</sub>≈1 from ≈80-100 colonies) are significantly below the amount of cells needed to obtain the total protein required for standard pull-down assays (OD<sub>600</sub>≈600-800 is required to achieve 1-2 mg/ml of total protein, which is the standard requirement for pull-down protocols in S. cerevisiae (Lauinger et al., 2024)).
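
      The scale of the shortfall described above can be made concrete with a rough back-of-the-envelope sketch. It assumes, purely for illustration, that cell yield scales linearly with the number of colonies harvested; that assumption is not stated in the response.

```python
def colonies_needed(required_od_units, colonies_per_od_unit):
    """Estimate colonies required, assuming yield scales linearly with colony number
    (an illustrative assumption, not a claim made in the response)."""
    return required_od_units * colonies_per_od_unit


# Figures quoted in the response: about 80-100 colonies give OD600 of about 1,
# while a standard pull-down needs OD600 of about 600-800.
lower_bound = colonies_needed(600, 80)    # most favourable combination
upper_bound = colonies_needed(800, 100)   # least favourable combination

print(lower_bound, upper_bound)  # 48000 80000
```

      On this estimate, a single pull-down would need material from tens of thousands of colonies, which is consistent with the authors' description of the experiment as technically challenging.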

      Given that the primary objective of our study is to establish the novel regulatory link between glycolysis and sulfur metabolism in the context of fungal morphogenesis, we would like to explore these crucial mechanistic details, in depth, in a subsequent study.

      (2) 2DG can exert pleiotropic effects unrelated to glycolytic inhibition (e.g., ER stress, autophagy induction). The authors are encouraged to perform complementary metabolic flux analyses, such as quantification of glycolytic intermediates or ATP levels, to confirm specific glycolytic inhibition.

      We appreciate the reviewer's concern regarding the potential pleiotropic effects of 2DG. While we acknowledge that 2DG may induce secondary cellular stress, we are confident that the observed phenotypes are robustly attributed to glycolytic inhibition based on our complementary genetic evidence. Specifically, the deletion strains ∆∆pfk1 and ∆∆adh1, which genetically perturb distinct steps in glycolysis, recapitulate the phenotypic results observed with 2DG treatment. Given this strong congruence between chemical inhibition and specific genetic deletions of key glycolytic enzymes, we are confident that our observed phenotypes are predominantly driven by the perturbation of the glycolytic pathway by 2DG.

      (3) The differential rescue effects (cysteine-only in inhibitor assays vs. both cysteine and methionine in genetic mutants) require further explanation. The authors should discuss potential differences in metabolic interconversion or amino acid transport that may account for this observation.

      We thank the reviewer for their valuable feedback. One explanation for the observed differential rescue effects of cysteine and methionine may be the distinct amino acid transport systems used by S. cerevisiae to transport these amino acids. S. cerevisiae primarily uses multiple, low-affinity permeases (Gap1, Bap2, Bap3, Tat1, Tat2, Agp1, Gnp1, Yct1) for cysteine transport, while relying on a limited set of high-affinity transporters (like Mup1) for methionine transport, with the added complexity that its methionine transporters can also transport cysteine (Düring-Olsen et al., 1999; Huang et al., 2017; Kosugi et al., 2001; Menant et al., 2006). Hence, it is likely that cysteine uptake occurs at a higher efficiency in S. cerevisiae than methionine uptake. Therefore, to achieve a comparable functional rescue by exogenous supplementation of methionine, it is necessary to use a higher concentration of methionine. When we performed our rescue experiments using higher concentrations of methionine, we did not see any rescue of pseudohyphal differentiation in the presence of 2DG and, in fact, we noticed that at higher concentrations of methionine the wild-type strain failed to undergo pseudohyphal differentiation even in the absence of 2DG. This is likely because increasing the methionine concentration raises the overall nitrogen content of the medium, thereby making the medium less nitrogen-starved. This presents a major experimental constraint, as pseudohyphal differentiation is strictly dependent on nitrogen limitation, and the elevated nitrogen resulting from the higher methionine concentration can inhibit pseudohyphal differentiation.

      (4) NAC may influence host redox balance or immune responses. The discussion should consider whether the observed virulence rescue could partly result from host-directed effects.

      We thank the reviewer for this valuable feedback. We acknowledge the role of NAC in host-directed immune responses. It is important to note that, in the context of certain bacterial pathogens, NAC has been reported to augment cellular respiration, subsequently increasing Reactive Oxygen Species (ROS) generation, which contributes to pathogen clearance (Shee et al., 2022). Interestingly, in our study, NAC supplementation to the mice was given prior to the infection and maintained continuously throughout the duration of the experiment. This continuous supply of NAC likely contributes to the rescue of virulence defects exhibited by the ∆∆pfk1 strain (Fig. 5I and J). Essentially, NAC likely allows the mutant to fully activate its essential virulence strategies (including morphological switching), to cause a successful infection in the host. As per the reviewer's suggestion, this has been included in the discussion section of the manuscript.

      Reviewer #3 (Recommendations for the authors):

      Most of the comments related to improving the manuscript have been provided in the public review. Here are some specifics for the authors to consider:

      (1) It is important to clarify the rationale for choosing specific gene deletions over other key genes (e.g., Met32 and Met30) and explain why Met4 was not included, given its proposed central role in Figure 6.

      We sincerely thank the reviewer for this insightful query regarding our selection of met32 for our gene deletion experiments. The choice of the ∆∆met32 strain was strategically motivated by its unique phenotypic properties within the pathway for de novo biosynthesis of sulfur-containing amino acids. While deletions of most of the genes that encode proteins involved in the de novo biosynthesis of sulfur-containing amino acids result in auxotrophy for methionine or cysteine, the ∆∆met32 strain does not exhibit this phenotype (Blaiseau et al., 1997). This key distinction is attributed to the functional redundancy provided by the paralogous gene, met31 (Blaiseau et al., 1997). Crucially, given that the deletion of the central transcriptional regulator, met4, results in cysteine/methionine auxotrophy, the use of the ∆∆met32 strain provides an essential, viable experimental model for investigating the role of sulfur metabolism during pseudohyphal differentiation in S. cerevisiae.

      (2) Comparison of consistent gene and protein expression data (Met30, Met4, Met32) across all relevant figures and analyses would strengthen the mechanistic connection in a better way. Some data that might help connect the sections is not included; please see the public review for more details.

      We thank the reviewer for this valuable input, which helps us to clarify the scope of our transcriptomic analysis. Our decision to focus our RT-qPCR experiments on downstream targets, while excluding Met4 and Met30 from the RT-qPCR analysis, is based on their known regulatory mechanisms. Met4 activity is predominantly regulated by post-translational ubiquitination by the SCFMet30 complex followed by its degradation (Rouillon et al., 2000; Shrivastava et al., 2021; Smothers et al., 2000), while Met30 activity is primarily regulated by its auto-degradation or its dissociation from the SCFMet30 complex (Lauinger et al., 2024; Smothers et al., 2000; Yen et al., 2005). Consistent with this, our RNA-Seq results indicate that neither met4 nor met30 transcripts are differentially expressed in response to 2DG addition. For all our RT-qPCR analyses in S. cerevisiae and C. albicans, we have consistently used the same set of sulfur metabolism genes: met32, met3, met5, met10 and met17. Our protein expression analysis of Met30 in S. cerevisiae (Fig. 3J) confirms that Met30 expression is not differentially regulated in the presence of 2DG, effectively eliminating changes in synthesis or SCFMet30 proteasomal degradation as the dominant regulatory mechanism.

      (3) Suggested to include metabolomic profiling (cysteine, methionine, and intermediate metabolites) to substantiate the proposed metabolic flux between glycolysis and sulfur metabolism.

      We thank the reviewer for this valuable input. Our pseudohyphal/hyphal differentiation assays show that the defects induced by glycolytic perturbation are fully rescued by the exogenous supplementation of sulfur-containing amino acids, cysteine or methionine. Since these data conclusively demonstrate that the primary metabolic limitation caused by the perturbation of glycolysis, which leads to filamentation defects, is sulfur metabolism, we posit that performing comprehensive metabolic profiling would primarily reconfirm these results. We believe that our in vitro and in vivo sulfur add-back experiments sufficiently substantiate the novel regulatory metabolic link between glycolysis and sulfur metabolism.

      (4) Data on the effects of Met30 deletion on cell growth are currently not included, and relevant controls should be included to ensure observed phenotypes are not due to general growth defects.

      We are grateful to the reviewer for this constructive feedback. To address the potential impact of met30 deletion on cell growth, we have included new data (Suppl. Fig. 4A) demonstrating that the deletion of a single copy of met30 in diploid S. cerevisiae does not compromise overall growth under nitrogen-limiting conditions as the ∆met30 strain grows similar to the wild-type strain.

      (5) Expanding RT-qPCR and data from transcriptomic analyses to include sulfur metabolism genes and key cAMP pathway genes to confirm the proposed cAMP-independent mechanism during virulence characterization is necessary.

      We thank the reviewer for this valuable feedback. The transcriptional analysis of the sulfur metabolism genes in the presence of 2DG and the ∆∆pfk1 strain is shown in Figures 4D and 4I. 

      To confirm that glycolysis is critical for fungal morphogenesis in a cAMP-PKA pathway independent manner under nitrogen-limiting conditions in C. albicans, we performed cAMP add-back assays. Interestingly, corroborating our S. cerevisiae data, the exogenous addition of cAMP failed to rescue the hyphal differentiation defect caused by the perturbation of glycolysis through 2DG addition or by the deletion of the pfk1 gene, under nitrogen-limiting conditions in C. albicans. These data are now included in Suppl. Fig. 5B.

      (6) Enhancing the introduction and discussion by providing a clearer rationale for gene selection and more detailed references to established pathways (cAMP-PKA, MAPK, Snf1/HXT regulation, gpa2 involvement) is needed to reinstate the hypothesis.

      We thank the reviewer for this valuable feedback. We have incorporated these changes in our revised manuscript.

      (7) Reducing redundancy in the text and improving figure consistency, particularly by ensuring that the gene sets depicted in Figure 6 are represented across all datasets, would strengthen the interconnections among sections.

      We thank the reviewer for this valuable feedback. We have incorporated these changes in our revised manuscript.

      References

      Barford JP, Hall RJ. 1979. An examination of the crabtree effect in Saccharomyces cerevisiae: The role of respiratory adaptation. J Gen Microbiol. https://doi.org/10.1099/00221287-114-2-267

      Blaiseau, P. L., & Thomas, D. (1998). Multiple transcriptional activation complexes tether the yeast activator Met4 to DNA. The EMBO journal, 17(21), 6327–6336. https://doi.org/10.1093/emboj/17.21.6327

      Chebaro, Y., Lorenz, M., Fa, A., Zheng, R., & Gustin, M. (2017). Adaptation of Candida albicans to Reactive Sulfur Species. Genetics, 206(1), 151–162. https://doi.org/10.1534/genetics.116.199679

      De Deken R. H. (1966). The Crabtree effect: a regulatory system in yeast. Journal of general microbiology, 44(2), 149–156. https://doi.org/10.1099/00221287-44-2-149

      Düring-Olsen, L., Regenberg, B., Gjermansen, C., Kielland-Brandt, M. C., & Hansen, J. (1999). Cysteine uptake by Saccharomyces cerevisiae is accomplished by multiple permeases. Current genetics, 35(6), 609–617. https://doi.org/10.1007/s002940050459

      Gancedo J. M. (2001). Control of pseudohyphae formation in Saccharomyces cerevisiae. FEMS microbiology reviews, 25(1), 107–123. https://doi.org/10.1111/j.1574-6976.2001.tb00573.x

      Gimeno, C. J., Ljungdahl, P. O., Styles, C. A., & Fink, G. R. (1992). Unipolar cell divisions in the yeast S. cerevisiae lead to filamentous growth: regulation by starvation and RAS. Cell, 68(6), 1077–1090. https://doi.org/10.1016/0092-8674(92)90079-r

      Huang, C. W., Walker, M. E., Fedrizzi, B., Gardner, R. C., & Jiranek, V. (2017). Yeast genes involved in regulating cysteine uptake affect production of hydrogen sulfide from cysteine during fermentation. FEMS yeast research, 17(5), fox046. https://doi.org/10.1093/femsyr/fox046

      Kosugi, A., Koizumi, Y., Yanagida, F., & Udaka, S. (2001). MUP1, high affinity methionine permease, is involved in cysteine uptake by Saccharomyces cerevisiae. Bioscience, biotechnology, and biochemistry, 65(3), 728–731. https://doi.org/10.1271/bbb.65.728

      Kraidlova, L., Schrevens, S., Tournu, H., Van Zeebroeck, G., Sychrova, H., & Van Dijck, P. (2016). Characterization of the Candida albicans Amino Acid Permease Family: Gap2 Is the Only General Amino Acid Permease and Gap4 Is an S-Adenosylmethionine (SAM) Transporter Required for SAM-Induced Morphogenesis. mSphere, 1(6), e00284-16. https://doi.org/10.1128/mSphere.00284-16

      Lauinger, L., Andronicos, A., Flick, K., Yu, C., Durairaj, G., Huang, L., & Kaiser, P. (2024). Cadmium binding by the F-box domain induces p97-mediated SCF complex disassembly to activate stress response programs. Nature communications, 15(1), 3894. https://doi.org/10.1038/s41467-024-48184-6

      Lombardi, L., Salzberg, L. I., Cinnéide, E. Ó., O'Brien, C., Morio, F., Turner, S. A., Byrne, K. P., & Butler, G. (2024). Alternative sulphur metabolism in the fungal pathogen Candida parapsilosis. Nature communications, 15(1), 9190. https://doi.org/10.1038/s41467-024-53442-8

      Menant, A., Barbey, R., & Thomas, D. (2006). Substrate-mediated remodeling of methionine transport by multiple ubiquitin-dependent mechanisms in yeast cells. The EMBO journal, 25(19), 4436–4447. https://doi.org/10.1038/sj.emboj.7601330

      Ralser, M., Wamelink, M. M., Kowald, A., Gerisch, B., Heeren, G., Struys, E. A., Klipp, E., Jakobs, C., Breitenbach, M., Lehrach, H., & Krobitsch, S. (2007). Dynamic rerouting of the carbohydrate flux is key to counteracting oxidative stress. Journal of biology, 6(4), 10. https://doi.org/10.1186/jbiol61

      Rouillon, A., Barbey, R., Patton, E. E., Tyers, M., & Thomas, D. (2000). Feedback-regulated degradation of the transcriptional activator Met4 is triggered by the SCF(Met30) complex. The EMBO journal, 19(2), 282–294. https://doi.org/10.1093/emboj/19.2.282

      Schrevens, S., Van Zeebroeck, G., Riedelberger, M., Tournu, H., Kuchler, K., & Van Dijck, P. (2018). Methionine is required for cAMP-PKA-mediated morphogenesis and virulence of Candida albicans. Molecular microbiology, 108(3), 258–275. https://doi.org/10.1111/mmi.13933

      Shee, S., Singh, S., Tripathi, A., Thakur, C., Kumar T, A., Das, M., Yadav, V., Kohli, S., Rajmani, R. S., Chandra, N., Chakrapani, H., Drlica, K., & Singh, A. (2022). Moxifloxacin-Mediated Killing of Mycobacterium tuberculosis Involves Respiratory Downshift, Reductive Stress, and Accumulation of Reactive Oxygen Species. Antimicrobial agents and chemotherapy, 66(9), e0059222. https://doi.org/10.1128/aac.00592-22

      Shrivastava, M., Feng, J., Coles, M., Clark, B., Islam, A., Dumeaux, V., & Whiteway, M. (2021). Modulation of the complex regulatory network for methionine biosynthesis in fungi. Genetics, 217(2), iyaa049. https://doi.org/10.1093/genetics/iyaa049

      Smothers, D. B., Kozubowski, L., Dixon, C., Goebl, M. G., & Mathias, N. (2000). The abundance of Met30p limits SCF(Met30p) complex activity and is regulated by methionine availability. Molecular and cellular biology, 20(21), 7845–7852. https://doi.org/10.1128/MCB.20.21.7845-7852.2000

      Thomas, D., & Surdin-Kerjan, Y. (1997). Metabolism of sulfur amino acids in Saccharomyces cerevisiae. Microbiology and molecular biology reviews : MMBR, 61(4), 503–532. https://doi.org/10.1128/mmbr.61.4.503-532.1997

      Yadav, A. K., & Bachhawat, A. K. (2011). CgCYN1, a plasma membrane cystine-specific transporter of Candida glabrata with orthologues prevalent among pathogenic yeast and fungi. The Journal of biological chemistry, 286(22), 19714–19723. https://doi.org/10.1074/jbc.M111.240648

      Yen, J. L., Su, N. Y., & Kaiser, P. (2005). The yeast ubiquitin ligase SCFMet30 regulates heavy metal response. Molecular biology of the cell, 16(4), 1872–1882. https://doi.org/10.1091/mbc.e04-12-1130

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Joint Public Review:

      In this work, the authors present DeepTX, a computational tool for studying transcriptional bursting using single-cell RNA sequencing (scRNA-seq) data and deep learning. The method aims to infer transcriptional burst dynamics, including key model parameters and the associated steady-state distributions, directly from noisy single-cell data. The authors apply DeepTX to datasets from DNA damage experiments, revealing distinct regulatory patterns: IdU treatment in mouse stem cells increases burst size, promoting differentiation, while 5FU alters burst frequency in human cancer cells, driving apoptosis or survival depending on dose. These findings underscore the role of burst regulation in mediating cell fate responses to DNA damage.

      The main strength of this study lies in its methodological contribution. DeepTX integrates a non-Markovian mechanistic model with deep learning to approximate steady-state mRNA distributions as mixtures of negative binomial distributions, enabling genome-scale parameter inference with reduced computational cost. The authors provide a clear discussion of the framework's assumptions, including reliance on steady-state data and the inherent unidentifiability of parameter sets, and they outline how the model could be extended to other regulatory processes.
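
      To make the distributional form mentioned above concrete, the following is a minimal sketch of a negative binomial mixture probability mass function. It is not DeepTX code; the function names, the (shape, success-probability) parameterization, and the example component values are assumptions chosen for illustration.

```python
import math


def nb_pmf(k, shape, prob):
    """Negative binomial pmf: Gamma(k + r) / (k! * Gamma(r)) * p**r * (1 - p)**k,
    evaluated in log space for numerical stability (r = shape, p = prob)."""
    log_pmf = (
        math.lgamma(k + shape) - math.lgamma(k + 1) - math.lgamma(shape)
        + shape * math.log(prob) + k * math.log1p(-prob)
    )
    return math.exp(log_pmf)


def nb_mixture_pmf(k, weights, shapes, probs):
    """Weighted sum of negative binomial components; weights are assumed to sum to 1."""
    return sum(w * nb_pmf(k, r, p) for w, r, p in zip(weights, shapes, probs))


# Hypothetical two-component mixture (illustrative values only).
weights, shapes, probs = [0.6, 0.4], [0.8, 5.0], [0.5, 0.2]
distribution = [nb_mixture_pmf(k, weights, shapes, probs) for k in range(400)]
print(round(sum(distribution), 6))
```

      A mixture of this kind can represent both unimodal and bimodal count distributions with a small number of parameters, which is one reason such families are convenient targets for a neural-network approximation.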

      The revised manuscript addresses many of the original concerns, particularly regarding sample size requirements, distributional assumptions, and the biological interpretation of inferred parameters. However, the framework remains limited by the constraints of snapshot data and cannot yet resolve dynamic heterogeneity or causality. The manuscript would also benefit from a broader contextualisation of DeepTX within the landscape of existing tools linking mechanistic modelling and single-cell transcriptomics. Finally, the interpretation of pathway enrichment analyses still warrants clarification.

      Overall, this work represents a valuable contribution to the integration of mechanistic models with high-dimensional single-cell data. It will be of interest to researchers in systems biology, bioinformatics, and computational modelling.

      Recommendations for the authors:

      We thank the authors for their thorough revision and for addressing many of the points raised during the initial review. The revised manuscript presents an improved and clearer account of the methodology and its implications. However, several aspects would benefit from further clarification and refinement to strengthen the presentation and avoid overstatement.

      (1) Contextualization within the existing literature

      The manuscript would benefit from placing DeepTX more clearly in the context of other computational tools developed to connect mechanistic modelling and single-cell RNA sequencing data. This is an active area of research with notable recent contributions, including Sukys and Grima (bioRxiv, 2024), Garrido-Rodriguez et al. (PLOS Comp Biol, 2021), and Maizels (2024). Positioning DeepTX in relation to these and other relevant efforts would help readers appreciate its specific advances and contributions.

      We sincerely thank you for this valuable suggestion. We agree that situating DeepTX within the broader landscape of computational approaches linking mechanistic modeling and single-cell RNA sequencing data will clarify its contributions and advances. In this revised version, we have explicitly discussed the comparison and relation of DeepTX in the context of this active area using an individual paragraph in the Discussion section.

      Specifically, we mention that the DeepTX research paradigm contributes to a growing line of research aiming to link mechanistic models of gene regulation with scRNA-seq data. Maizels provided a comprehensive review of computational strategies for incorporating dynamic mechanisms into single-cell transcriptomics (Maizels RJ, 2024). In this context, RNA velocity is one of the most important examples, as it infers short-term transcriptional trends based on splicing kinetics and deterministic ODE models. However, such approaches are limited by their deterministic assumptions and cannot fully capture the stochastic nature of gene regulation. DeepTX can be viewed as an extension of this framework to stochastic modelling, explicitly addressing transcriptional bursting kinetics under DNA damage. Similarly, DeepCycle, developed by Sukys and Grima (Sukys A & Grima R, 2025), investigates transcriptional burst kinetics during the cell cycle, employing a stochastic age-dependent model and a neural network to infer burst parameters while correcting for measurement noise. By contrast, MIGNON integrates genomic variation data and static transcriptomic measurements into a mechanistic pathway model (HiPathia) to infer pathway-level activity changes, rather than gene-level stochastic transcriptional dynamics (Garrido-Rodriguez M et al., 2021). In this sense, DeepTX and MIGNON are complementary, with DeepTX resolving burst kinetics at the single-gene level and MIGNON emphasizing pathway responses to genomic perturbations, which could inspire future extensions of DeepTX that incorporate sequence-level information.

      (2) Interpretation of GO analysis

      The interpretation of the GO enrichment results in Figure 4D should be revised. While the text currently associates the enriched terms with signal transduction and cell cycle G2/M phase transition, the most significant terms relate to mitotic cell cycle checkpoint signaling. This distinction should be made clear in the main text, and the conclusions drawn from the GO analysis should be aligned more closely with the statistical results.

      We sincerely appreciate the insightful comment. We have carefully re-examined the GO enrichment results shown in Figure 4D and agree that the most significantly enriched terms correspond to mitotic cell cycle checkpoint signaling and signal transduction in response to DNA damage, rather than general G2/M phase transition processes. Accordingly, we have revised the main text to highlight the biological significance of mitotic cell cycle checkpoint signaling.

      Specifically, we now emphasize that DNA damage and mitotic checkpoint activation are closely interconnected, and we make two key points. (1) The mitotic checkpoint serves as a crucial safeguard to ensure accurate chromosome segregation and maintain genomic stability under DNA damage conditions. Activation of the mitotic checkpoint can influence cell fate decisions and differentiation potential (Kim EM & Burke DJ, 2008; Lawrence KS et al., 2015). (2) Sustained activation of the spindle assembly checkpoint (SAC) has been reported to induce mitotic slippage and polyploidization, which in turn may enhance the differentiation potential of embryonic stem cells (Mantel C et al., 2007). These revisions ensure that our interpretation is consistent with the statistical enrichment results and better reflects the underlying biological processes implicated by the data.
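
      For readers less familiar with how GO term significance is typically ranked, the sketch below computes a one-sided hypergeometric over-representation p-value, a test commonly used for this purpose. Neither the review nor the response states which test or tool was used for Figure 4D, so this is an assumption, and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(background, annotated, selected, overlap):
    """One-sided hypergeometric tail P(X >= overlap): the chance of seeing at least
    `overlap` annotated genes when `selected` genes are drawn without replacement
    from `background` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, selected)
    favourable = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background, selected)


# Hypothetical counts: 10,000 background genes, 120 annotated to a term,
# 300 genes in the selected set, 12 of which carry the annotation.
p_value = enrichment_p_value(10_000, 120, 300, 12)
print(p_value)
```

      A small p-value of this kind only indicates that a term is over-represented in the gene set; it is a correlative statistic and does not by itself establish a mechanism.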

      (3) Justification for training on simulated data

      The decision to train the model on simulated data should be clearly justified. While the advantage of having access to ground-truth parameters is understood, the manuscript would benefit from a discussion of the limitations of this approach, particularly in terms of generalizability to real datasets. Moreover, it is worth noting that many annotated scRNA-seq datasets are publicly available and could, in principle, be used to complement the training strategy.

      We thank you for this insightful comment. We chose to train DeepTXsolver on simulated data because no experimental dataset currently provides genome-wide transcriptional burst kinetics with known ground truth, which is essential for supervised learning. Simulation enables us to (i) generate large, fully annotated datasets spanning the biologically relevant parameter space, (ii) expose the solver to diverse bursting regimes (e.g., low/high burst frequency, small/large burst size, unimodal/bimodal distributions), and (iii) quantitatively benchmark model accuracy, parameter identifiability, and robustness prior to deployment on real scRNA-seq data.

      We acknowledge, however, that simulation-based training has inherent limitations in terms of generalizability. Real biological systems may deviate from the idealized bursting model, exhibit more complex noise structures, or display parameter distributions that differ from those in simulations. Moreover, the lack of ground-truth parameters in experimental scRNA-seq datasets prevents an absolute evaluation of inference accuracy. In future work, publicly available annotated scRNA-seq datasets could be used to complement this simulation-based training strategy and enhance generalizability. We have revised the manuscript to explicitly discuss both the rationale for using simulated data and the potential limitations of this approach.
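
      The simulation-based training strategy described above can be illustrated with a minimal sketch. It assumes the classical two-state telegraph model rather than the multi-step model used in DeepTX, and the function names, parameter ranges, and the choice of a Gillespie simulation are illustrative assumptions, not the authors' implementation. Each simulated gene yields a labelled pair (known kinetic parameters, mRNA counts across cells) of the kind needed for supervised learning.

```python
import random


def simulate_telegraph_cell(k_on, k_off, k_syn, k_deg, t_end, rng):
    """Gillespie simulation of one cell; returns the mRNA count at time t_end.

    Hypothetical helper: the gene switches OFF->ON at rate k_on and ON->OFF at
    rate k_off, mRNA is made at rate k_syn while ON, and each mRNA decays at
    rate k_deg. All rates must be positive.
    """
    t, gene_on, mrna = 0.0, False, 0
    while True:
        # Propensities of the three possible reactions in the current state.
        switch = k_off if gene_on else k_on
        synth = k_syn if gene_on else 0.0
        decay = k_deg * mrna
        total = switch + synth + decay
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > t_end:
            return mrna
        u = rng.random() * total  # choose which reaction fires
        if u < switch:
            gene_on = not gene_on
        elif u < switch + synth:
            mrna += 1
        else:
            mrna -= 1


def make_training_set(n_genes, n_cells, seed=0):
    """Return a list of ((k_on, k_off, k_syn), counts) pairs with known labels."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_genes):
        # Log-uniform draws; these ranges are illustrative assumptions.
        k_on = 10 ** rng.uniform(-1, 1)
        k_off = 10 ** rng.uniform(-1, 1)
        k_syn = 10 ** rng.uniform(0, 1.5)
        counts = [
            simulate_telegraph_cell(k_on, k_off, k_syn, 1.0, 10.0, rng)
            for _ in range(n_cells)
        ]
        dataset.append(((k_on, k_off, k_syn), counts))
    return dataset
```

      Because the labels are drawn before the counts are simulated, inference error can be measured directly on such data, which is not possible for experimental scRNA-seq counts.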

      (4) Benchmarking against external methods

      The performance of DeepTX is primarily compared to a prior method from the same group. To strengthen the methodological claims, it would be preferable to include benchmarking against additional established tools from the broader literature. This would offer a more objective evaluation of the performance gains attributed to DeepTX.

      We thank you for this constructive suggestion. We fully agree that benchmarking DeepTX against additional established tools from the broader literature would provide a more comprehensive and objective evaluation of its performance. In the revised manuscript, we have included comparative analyses with other widely used methods, including nnRNA (from the Shahrezaei group (Tang W et al., 2023)), txABC (from our group (Luo S et al., 2023)), txBurst (from the Sandberg group (Larsson AJM et al., 2019)), and txInfer (from the Junhao group (Gu J et al., 2025)) (Supplementary Figure S4). The comparative results indicate that our method demonstrates superior performance in both efficiency and accuracy.

      (5) Interpretation of Figures 4-6

      The revised figures are clear and informative; however, the associated interpretations in the main text remain too strong relative to the type of analysis performed. For instance, in Figure 4, it is suggested that changes in burst size are linked to DNA damage-induced signalling cascades that affect cell cycle progression and fate decisions. While this is a plausible hypothesis, GO and GSEA analyses are correlative by nature and not sufficient to support such a mechanistic claim on their own. These analyses should be presented as exploratory, and the strength of the conclusions drawn should be tempered accordingly. Similar caution should be applied to the interpretations of Figures 5 and 6.

      We thank you for this important comment. In the revised manuscript, we have carefully moderated the interpretation of the GO and GSEA results in Figures 4, 5, and 6. Specifically, we now present these analyses as exploratory and emphasize their correlative nature, avoiding causal claims that go beyond the scope of the data. The text has been rephrased to highlight the observed associations rather than implying direct causal relationships.

      For Figure 4, we emphasize that while it is tempting to hypothesize that enhanced burst size may contribute to DNA damage-related checkpoint activation and thereby influence cell cycle progression and differentiation, our current results only indicate an association between burst size enhancement and pathways involved in DNA damage response and checkpoint signaling.

      For Figure 5, we emphasize that although our GO analysis cannot establish causality, the results are consistent with an association between 5-FU-induced changes in burst kinetics and pathways related to oxidative stress and apoptosis. Based on this, we propose a model outlining a potential process through which DNA damage may ultimately lead to cellular apoptosis.

      For Figure 6, we emphasize that these enrichment results suggest that high-dose 5-FU treatment may be associated with processes such as telomerase activation and mitochondrial function maintenance, both of which have been implicated in cell survival and apoptosis evasion in previous experimental studies. For example, prior work indicates that hTERT translocation can activate telomerase pathways to support telomere maintenance and reduce oxidative stress, which is thought to contribute to apoptosis resistance. While our enrichment analysis cannot establish causality, the observed transcriptional bursting changes are consistent with these reported survival-associated mechanisms.
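
      To make concrete why enrichment results of this kind are correlative, the following sketch computes a hypergeometric over-representation p-value, a test commonly used for GO term enrichment. The gene counts are made up, the function name is hypothetical, and this is not necessarily the test or software used in the manuscript.

```python
from math import comb


def enrichment_p_value(n_background, n_in_term, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    X is the number of genes annotated to a term among n_selected genes drawn
    without replacement from n_background genes, n_in_term of which carry the
    annotation.
    """
    total = comb(n_background, n_selected)
    upper = min(n_in_term, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 12 of 500 selected genes fall in a 150-gene term.
p = enrichment_p_value(n_background=20000, n_in_term=150, n_selected=500, n_overlap=12)
print(f"p = {p:.3g}")
```

      A small p-value here only says that the overlap is larger than expected by chance; it carries no information about the direction or mechanism of any effect.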

      (6) Discussion section framing

      The initial paragraphs of the discussion section make broad biological claims about the role of transcriptional bursting in cellular decision-making. While transcriptional bursting is undoubtedly relevant, the manuscript would benefit from a more cautious framing. It would be more appropriate to foreground the methodological contributions of DeepTX, and to present the biological insights as hypotheses or observations that may guide future experimental investigation, rather than as established conclusions.

      We thank you for this insightful comment. We have revised the discussion to clarify and appropriately temper our claims regarding transcriptional bursting. First, we now explicitly recognize that transcriptional bursting is one of multiple contributors to cellular variability, rather than the sole or dominant factor driving cellular decision-making. Second, we have restructured the opening of the discussion to prioritize the methodological contributions of DeepTX, highlighting its strength as a framework for inferring genome-wide burst kinetics from scRNA-seq data. Finally, the biological insights derived from our analysis are now presented as correlative observations and potential hypotheses, which may inform and guide future experimental investigations, rather than as definitive mechanistic conclusions.

      Small Comments

      (1) Presentation of discrete distributions: In several figures (e.g., Figure 2B and Supplementary Figures S4, S6, and S8), the comparisons between empirical mRNA distributions and DeepTX-inferred distributions are visually represented using connecting lines, which may give the impression that continuous distributions are being compared to discrete ones. Given the focus on transcriptional bursting, a process inherently tied to discrete stochastic events, this representation could be misleading. The figure captions and visual style should be revised to clarify that all distributions are discrete and to avoid potential confusion. In general, it is recommended to avoid connecting points in discrete distributions with lines, as this can suggest interpolation or comparison with continuous distributions. This applies to Figures 2A and 2B in particular.

      We thank you for this valuable suggestion. To prevent any potential misinterpretation of discrete distributions as continuous ones, we have revised the visual representation of the empirical and DeepTX-inferred mRNA distributions in Figure 2B and Supplementary Figures S4, S6, and S8. Specifically, we have replaced the line plots with step plots, which more accurately capture the discrete nature of transcriptional bursting. Additionally, we have updated the figure captions to clearly state that all distributions are discrete.
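
      The change from line plots to step plots can be sketched as follows. This is an illustrative example, not the authors' plotting code: the function names are hypothetical, and the plotting helper assumes matplotlib is installed.

```python
from collections import Counter


def empirical_pmf(counts):
    """Return probabilities p[0..max(counts)] for non-negative integer counts."""
    tally = Counter(counts)
    n = len(counts)
    return [tally.get(k, 0) / n for k in range(max(counts) + 1)]


def plot_discrete_pmfs(observed, inferred, path="pmf.png"):
    """Draw two PMFs as steps centred on each integer count (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # where="mid" keeps each step flat around its integer, so no interpolation
    # between neighbouring counts is implied.
    ax.step(range(len(observed)), observed, where="mid", label="observed")
    ax.step(range(len(inferred)), inferred, where="mid", label="inferred")
    ax.set_xlabel("mRNA count")
    ax.set_ylabel("probability")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


pmf = empirical_pmf([0, 0, 1, 3])
```

      Bars or stem markers would serve the same purpose; the point is only that probability mass is shown at integer counts rather than along a connecting line.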

      (2) Transcription is always a multi-step process. While the manuscript aims to model additional complexity introduced by DNA damage, the current phrasing (e.g., on page 5) could be read as implying that transcription becomes multi-step only under damage conditions. This should be clarified.

      We thank you for this helpful observation. We agree that transcription is inherently a multi-step process under all conditions. To avoid any possible misunderstanding, we have revised the text to clarify this point.

      Specifically, we now explain that many previous studies have employed simplified two-state models to approximate transcriptional dynamics. However, gene expression is inherently a multi-step process, and this cannot be neglected under conditions of DNA damage. DNA damage can slow or even stop RNA pol II movement and cause many macromolecules to be recruited for damage repair. This affects the spatially localized behavior of the promoter, so that the dwell times of promoter inactivation and activation can no longer be approximated by a simple two-state model. Our work adopts a multi-step model because it is more appropriate for capturing the additional complexity introduced by DNA damage.
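
      One simple way to see why a multi-step promoter cannot be reduced to a two-state description is to compare dwell-time variability. The sketch below assumes that a promoter state is exited only after a fixed number of sequential exponential steps of equal rate; this is an illustrative assumption and not the specific model in the manuscript. A single step gives an exponential dwell time with squared coefficient of variation (CV²) of 1, whereas n equal steps give an Erlang distribution with CV² = 1/n.

```python
import random
from statistics import mean, pvariance


def dwell_times(n_steps, mean_dwell, n_samples, rng):
    """Sample dwell times of a state left only after n_steps sequential steps."""
    # Scaling the per-step rate keeps the mean dwell time fixed across n_steps.
    step_rate = n_steps / mean_dwell
    return [
        sum(rng.expovariate(step_rate) for _ in range(n_steps))
        for _ in range(n_samples)
    ]


def squared_cv(samples):
    """Variance divided by squared mean; equals 1 for an exponential distribution."""
    m = mean(samples)
    return pvariance(samples, m) / (m * m)


rng = random.Random(0)
for n_steps in (1, 2, 5):
    cv2 = squared_cv(dwell_times(n_steps, 1.0, 20000, rng))
    print(n_steps, round(cv2, 2))
```

      With the mean held fixed, adding steps narrows the dwell-time distribution, which is a feature a two-state model with exponential switching cannot reproduce.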

      (3) The first sentence of the discussion section overstates the importance of transcriptional bursting. While it is a key source of variability, it is not the only nor always the dominant one. Furthermore, its role in DNA damage response remains an emerging hypothesis rather than a general principle. The claims in this section should be moderated accordingly.

      We thank you for this valuable feedback. In the revised discussion, we have moderated the statements in the opening paragraph to better reflect the current understanding. Specifically, we now acknowledge that transcriptional bursting represents one of multiple sources of variability and is not always the dominant contributor. In addition, we have reframed the role of transcriptional bursting in DNA damage response as an emerging hypothesis, rather than a general principle. To further address this concern, we replaced conclusion-like statements with more cautious, hypothesis-oriented phrasing, presenting our observations as potential directions for future experimental validation.

      References

      Maizels, R.J. 2024. A dynamical perspective: moving towards mechanism in single-cell transcriptomics. Philos Trans R Soc Lond B Biol Sci 379: 20230049. DOI: https://dx.doi.org/10.1098/rstb.2023.0049, PMID: 38432314

      Sukys, A., Grima, R. 2025. Cell-cycle dependence of bursty gene expression: insights from fitting mechanistic models to single-cell RNA-seq data. Nucleic Acids Research 53. DOI: https://dx.doi.org/10.1093/nar/gkaf295, PMID: 40240003

      Garrido-Rodriguez, M., Lopez-Lopez, D., Ortuno, F.M., Peña-Chilet, M., Muñoz, E., Calzado, M.A., Dopazo, J. 2021. A versatile workflow to integrate RNA-seq genomic and transcriptomic data into mechanistic models of signaling pathways. PLoS Computational Biology 17: e1008748. DOI: https://dx.doi.org/10.1371/journal.pcbi.1008748, PMID: 33571195

      Kim, E.M., Burke, D.J. 2008. DNA damage activates the SAC in an ATM/ATR-dependent manner, independently of the kinetochore. PLoS Genet 4: e1000015. DOI: https://dx.doi.org/10.1371/journal.pgen.1000015, PMID: 18454191

      Lawrence, K.S., Chau, T., Engebrecht, J. 2015. DNA damage response and spindle assembly checkpoint function throughout the cell cycle to ensure genomic integrity. PLoS Genet 11: e1005150. DOI: https://dx.doi.org/10.1371/journal.pgen.1005150, PMID: 25898113

      Mantel, C., Guo, Y., Lee, M.R., Kim, M.K., Han, M.K., Shibayama, H., Fukuda, S., Yoder, M.C., Pelus, L.M., Kim, K.S., Broxmeyer, H.E. 2007. Checkpoint-apoptosis uncoupling in human and mouse embryonic stem cells: a source of karyotpic instability. Blood 109: 4518-4527. DOI: https://dx.doi.org/10.1182/blood-2006-10-054247, PMID: 17289813

      Tang, W., Jørgensen, A.C.S., Marguerat, S., Thomas, P., Shahrezaei, V. 2023. Modelling capture efficiency of single-cell RNA-sequencing data improves inference of transcriptome-wide burst kinetics. Bioinformatics 39. DOI: https://dx.doi.org/10.1093/bioinformatics/btad395, PMID: 37354494

      Luo, S., Zhang, Z., Wang, Z., Yang, X., Chen, X., Zhou, T., Zhang, J. 2023. Inferring transcriptional bursting kinetics from single-cell snapshot data using a generalized telegraph model. Royal Society Open Science 10: 221057. DOI: https://dx.doi.org/10.1098/rsos.221057, PMID: 37035293

      Larsson, A.J.M., Johnsson, P., Hagemann-Jensen, M., Hartmanis, L., Faridani, O.R., Reinius, B., Segerstolpe, A., Rivera, C.M., Ren, B., Sandberg, R. 2019. Genomic encoding of transcriptional burst kinetics. Nature 565: 251-254. DOI: https://dx.doi.org/10.1038/s41586-018-0836-1, PMID: 30602787

      Gu, J., Laszik, N., Miles, C.E., Allard, J., Downing, T.L., Read, E.L. 2025. Scalable inference and identifiability of kinetic parameters for transcriptional bursting from single cell data. Bioinformatics. DOI: https://dx.doi.org/10.1093/bioinformatics/btaf581, PMID: 41131798

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study presents valuable findings that advance our understanding of mural cell dynamics and vascular pathology in a zebrafish model of cerebral small vessel disease. The authors provide compelling evidence that partial loss of foxf2 function leads to progressive, cell-intrinsic defects in pericytes and associated endothelial abnormalities across the lifespan, leveraging powerful in vivo imaging and genetic tools. The strength of evidence could be further improved by additional mechanistic insight and quantitative or lineage-tracing analyses to clarify how pericyte number and identity are affected in the mutant model.

      Thank you to the reviewers for insightful comments and for the time spent reviewing the manuscript. We have strengthened the data through responding to the comments.

      Public Reviews:

      Reviewer #1 (Public review):

      The paper by Graff et al. investigates the function of foxf2 in zebrafish to understand the progression of cerebral small vessel disease. The authors use a partial loss of foxf2 (zebrafish possess two foxf2 genes, foxf2a and foxf2b, and the authors mainly analyze homozygous mutants in foxf2a) to investigate the role of foxf2 signaling in regulating pericyte biology. They find that the number of pericytes is reduced in foxf2a mutants and that the remaining pericytes display alterations in their morphologies. The authors further find that mutant animals can develop to adulthood, but that in adult animals, both endothelial and pericyte morphologies are affected. They also show that mutant pericytes can partially repopulate the brain after genetic ablation.

      (1) Weaknesses: The results are mainly descriptive, and it is not clear how they will advance the field at their current state, given that a publication on mice has already examined the loss of foxf2 phenotype on pericyte biology (Reyahi, 2015, Dev. Cell).

      The Reyahi paper was the earliest report of foxf2 mutant brain pericytes and remains illuminating. The work was technically very well executed. Our manuscript expands on and, at times, contradicts their findings. We realized that we did not fully address this in our discussion, which has now been updated. The biggest difference between the two studies is in the direction of change in pericytes after foxf2 knockout, a major finding in both papers. This is where it is important to understand the differences in methods. Reyahi et al. used a conditional knockout under Wnt1:Cre, which will ablate pericytes derived from neural crest, but not those derived from mesoderm, nor will it affect foxf2 expression in endothelial cells. Our model is a full constitutive knockout of the gene in all brain pericytes and endothelial cells. For GOF, Reyahi used a transgenic model with a human FOXF2 BAC integrated into the mouse germline.

      Both studies are important. We do not know enough about human phenotypes in patients with stroke-associated human FOXF2 SNVs to know the direction of change in pericyte numbers. We showed that the SNVs reduce FOXF2 gene expression in vitro (Ryu, 2022). Here we demonstrate dosage sensitivity in fish (showing phenotypes when 1 of 4 foxf2a + foxf2b alleles is lost, Figure 1F), supporting the idea that slight reductions of FOXF2 in humans could lead to severe brain vessel phenotypes. For this reason, our work is complementary to the previously published work and suggests that future studies should focus on understanding the role of dosage, cell autonomy, and human pericyte phenotypes with respect to FOXF2. While some experiments are parallel in mouse and fish, we go further to look at cell death and regeneration, and to understand the consequences for the whole brain vasculature.

      (2) Reyahi et al. showed that loss of foxf2 in mice leads to a marked downregulation of pdgfrb expression in perivascular cells. In contrast to expectation, perivascular cell numbers were higher in mutant animals, but these cells did not differentiate properly. The authors use a transgenic driver line expressing gal4 under the control of the pdgfrb promoter and observe a reduction in pericyte (pdgfrb-expressing) cells in foxf2a mutants. In light of the mouse data, this result might be due to a similar downregulation of pdgfrb expression in fish, which would lead to a downregulation of gal4 expression and hence reduced labelling of pericytes. The authors show a reduction of pdgfrb expression also in zebrafish in foxf2b mutants (Chauhan et al., The Lancet Neurology 2016).

      Reyahi detected more pericytes in the Wnt1:Cre mouse, while we detected fewer in the foxf2a (and foxf2a;foxf2b) mutants. This may be because of different methods. For instance, because the mouse knockout is not a constitutive Foxf2 knockout, the observed increase in pericytes may arise because mesoderm-derived pericytes proliferate more when the neural crest-derived pericytes are absent. Alternatively, endothelial foxf2 may activate pericyte proliferation when foxf2 is lost in some pericytes. It is also possible that mouse foxf2 has a different role from its fish ortholog. Despite these differences, there are common conclusions from both models. For instance, both mouse and fish show that foxf2 controls capillary pericyte numbers, albeit in different directions. Both show hemorrhage and loss of vascular stability as a result. Both papers identify the developmental window as critical for setting up the correct numbers of pericytes.

      As the reviewer suggested, it was important to test whether pdgfrb is downregulated in fish as it is in mice. To do this, we measured expression of pdgfrb in foxf2 mutants using hybridization chain reaction (HCR). The results show no change in pdgfrb mRNA in foxf2a mutants in two independent experiments (Fig S3). Independently, we integrated pdgfrb transgene intensity (using a single allele of the transgene so there are no dose effects) in foxf2a mutants vs. wildtype. We found no difference (Fig S3), suggesting that pdgfrb is a reliable reporter for counting pericytes in the foxf2a knockout. The reviewer is correct that we previously showed downregulation of pdgfrb in foxf2b mutants at 4 dpf using colorimetric ISH. foxf2a and foxf2b are unlinked, independent genes (~400 million years apart in evolution) and may have different regulation.

      (3) It would be important to clarify whether, also in zebrafish, foxf2a/foxf2b mutants have reduced or augmented numbers of perivascular cells and how this compares to the data in the mouse.  

      We discuss methodological differences between Reyahi and our work in point (1) above. The reduction in pericytes in foxf2a;foxf2b mutants has been previously published (Ryu, 2022, Supplemental Figure 1) and is shown again here in Supplemental Figure 2. Numbers are reduced in double mutants up to 10 dpf, suggesting no recovery. Further, in response to reviewer comments, we have quantified pericytes in the whole fish brain (Figure 3E-G) and show reduced pericytes in the adult, reduced vessel network length, and, importantly, reduced pericyte density. In aggregate, our data show pericyte reduction at 5 developmental stages from embryo through adult. The reason for different results from the mouse is unknown and may reflect a technical difference (constitutive vs Wnt1:Cre) or a species difference.

      (4) The authors should perform additional characterization of perivascular cells using marker gene expression (for a list of markers, see e.g., Shih et al. Development 2021) and/or genetic lineage tracing.

      This is a good point. We have added HCR analysis of additional markers. Results show co-expression of foxf2a, foxf2b, nduf4la2 and pdgfrb in brain pericytes (Fig 2, Fig S3).

      (5) The authors motivate using foxf2a mutants as a model of reduced foxf2 dosage, "similar to human heterozygous loss of FOXF2". However, it is not clear how the different foxf2 genes in zebrafish interact with each other transcriptionally. Is there upregulation of foxf2b in foxf2a mutants and vice versa? This is important to consider, as Reyahi et al. showed that foxf2 gene dosage in mice appears to be important, with an increase in foxf2 gene dosage (through transgene expression) leading to a reduction in perivascular cell numbers.

      We agree that dosage is a very important concept and show phenotypes in foxf2a heterozygotes (Fig 1F). To test the potential compensation from foxf2b, we have added qPCR for foxf2b in foxf2a mutants as well as HCR of foxf2b in foxf2a mutants (Fig S3C,D). There is no change in foxf2b expression in foxf2a mutants. We discuss dosage in our discussion.

      (6) Figures 3 and 4 lack data quantification. The authors describe the existence of vascular defects in adult fish, but no quantifiable parameters or quantifications are provided. This needs to be added.

      This query was technically challenging to address, but very worthwhile. We have not seen published methods for quantifying brain pericytes along with the vascular network (certainly not in zebrafish adults), so we developed new methods of analyzing whole brain vascular parameters of cleared adult brains (Figure S6) using a combination of segmentation methods for pericytes, endothelium and smooth muscle. We have added another author (David Elliott), as he was instrumental in designing the methods. We find a significant decrease in vessel network length in foxf2a mutants at 3 months and 6 months (Figures 3F and 4G). Similarly, we show a lower number of brain pericytes in foxf2a mutants (Figure 3E). Finally, we added whole brain analysis of smooth muscle coverage (Figure 4) and show no change in vSMC number or coverage of vessels at 5 and 10 dpf or adult, respectively, pointing to pericytes being the cells most affected. Thank you, this query pushed us in a very productive direction. These methods will be extremely useful in the future!

      (7) The analysis of pericyte phenotypes and morphologies is not clear. On page 6, the authors state: "In the wildtype brain, adult pericytes have a clear oblong cell body with long, slender primary processes that extend from the cytoplasm with secondary processes that wrap around the circumference of the blood vessel." Further down on the same page, the authors note: "In wildtype adult brains, we identified three subtypes of pericytes, ensheathing, mesh and thin-strand, previously characterized in murine models." In conclusion, not all pericytes have long, slender primary processes, but there are at least three different sub-types? Did the authors analyze how they might be distributed along different branch orders of the vasculature, as they are in the mouse?

      We have reworded the text on page 5/6 to be clearer that embryonic pericytes are thin-strand only. Additional pericyte subtypes develop later and are seen in the mature vasculature of the adult. We could not find a way to accurately analyze pericyte subtypes in the adult brain. The imaging analysis used to count pericytes relied on the soma, because machine learning algorithms have been developed to count nuclei but not to analyze processes.

      (8) Which type of pericyte is affected in foxf2a mutant animals? Can the authors identify the branch order of the vasculature for both wildtype and mutant animals and compare which subtype of pericyte might be most affected? Are all subtypes of pericytes similarly affected in mutant animals? There also seems to be a reduction in smooth muscle cell coverage.

      Please see the response to (7) about pericyte subtypes. In response to the reviewer’s query, we have now analyzed vSMCs in the embryonic and adult brain. In the embryonic brain we see no statistical differences in vSMC number at 5 and 10 dpf (Figure 4). In the adult, vSMC length (total length of vSMCs in a brain) and vSMC coverage (proportion of brain vessels with vSMCs) are not significantly different. These data are important because they suggest that foxf2a has a more important role in pericytes than in vSMCs.

      (9) Regarding pericyte regeneration data (Figure 7): Are the values in Figure 7D not significantly different from each other (no significance given)?

      Comparisons shown without significance bars were not significant; the bars were left off for clarity. We have stated this in the statistical methods.

      (10) In the discussion, the authors state that "pericyte processes have not been studied in zebrafish".

      Ando et al. (Development 2016) studied pericyte processes in early zebrafish embryos, and Leonard et al. (Development 2022) studied zebrafish pericytes and their processes in the developing fin. We apologize; this was not meant to say that pericyte processes had not been studied before, and we have reworded the sentence to make its intent clear. We were trying to emphasize that we are the first to quantify processes at different stages, especially in foxf2 mutants. Processes change morphology over development, especially after 5 dpf, something that our data capture. Our images are of stages that have not been previously characterized. We added a reference to Mae et al., who found similar process length changes in a mouse knockout of a different gene, and to Leonard, who previously showed overlap of processes in a different context in fish.

      Reviewer #2 (Public review):

      Summary:

      This study investigates the developmental and lifelong consequences of reduced foxf2 dosage in zebrafish, a gene associated with human stroke risk and cerebral small vessel disease (CSVD). The authors show that a ~50% reduction in foxf2 function through homozygous loss of foxf2a leads to a significant decrease in brain pericyte number, along with striking abnormalities in pericyte morphology, including enlarged soma and extended processes, during larval stages. These defects are not corrected over time but instead persist and worsen with age, ultimately affecting the surrounding endothelium. The study also makes an important contribution by characterizing pericyte behavior in wild-type zebrafish using a clever pericyte-specific Brainbow approach, revealing novel interactions such as pericyte process overlap not previously reported in mammals.

      Strengths:

      This work provides mechanistic insight into how subtle, developmental changes in mural cell biology and coverage of the vasculature can drive long-term vascular pathology. The authors make strong use of zebrafish imaging tools, including longitudinal analysis in transgenic lines to follow pericyte number and morphology over larval development, and then applied tissue clearing and whole brain imaging at 3 and 11 months to further dissect the longitudinal effects of foxf2a loss. The ability to track individual pericytes in vivo reveals cell-intrinsic defects and process degeneration with high spatiotemporal resolution. Their use of a pericyte-specific Zebrabow line also allows, for the first time, detailed visualization of pericyte-pericyte interactions in the developing brain, highlighting structural features and behaviors that challenge existing models based on mouse studies. Together, these findings make the zebrafish a valuable model for studying the cellular dynamics of CSVD.

      Weaknesses:

      (11) While the findings are compelling, several aspects could be strengthened. First, quantifying pericyte coverage across distinct brain regions (forebrain, midbrain, hindbrain) would clarify whether foxf2a loss differentially impacts specific pericyte lineages, given known regional differences in developmental origin, with forebrain pericytes being neural crest-derived and hindbrain pericytes being mesoderm-derived.

      In recently published work from our lab, we showed that both neural crest and mesodermal cells contribute to pericytes in both the midbrain and hindbrain, and we could not confirm earlier work suggesting more rigid compartmental origins (Ahuja, 2024). In that paper we noted that lineage experiments are often limited by small sample sizes (n), which is why this may not have been discovered before. This makes us skeptical that counting different regions will allow us to interpret data about neural crest and mesoderm. Further, Ahuja 2024 shows that pericyte intermediate progenitors from both mesoderm and neural crest are indistinguishable at 30 hpf by single cell sequencing and have converged on a common phenotype.

      (12) Second, measuring foxf2b expression in foxf2a mutants would better support the interpretation that total FOXF2 dosage is reduced in a graded fashion in heterozygote and homozygote foxf2a mutants.

      We have done both qPCR for foxf2b in foxf2a mutants and HCR (quantitative ISH). This is now reported in Fig S3. 

      (13) Finally, quantifying vascular density in adult mutants would help determine whether observed endothelial changes are a downstream consequence of prolonged pericyte loss. Correlating these vascular changes with local pericyte depletion would also help clarify causality.

      We have added these data to Figures 3 and 4. Please also see response (6).

      Reviewer #3 (Public review):

      Summary:

      The goal of the work by Graff et al. is to model CSVD in the zebrafish using foxf2a mutants. The mutants show loss of cerebral pericyte coverage that persists through adulthood, but it seems foxf2a does not regulate the regenerative capacity of these cells. The findings are interesting and build on previous work from the group. Limitations of the work include little mechanistic insight into how foxf2a alters pericyte recruitment/differentiation/survival/proliferation in this context, and the overlap of these studies with previous work in foxf2a/b double mutants. However, the data analysis is clean and compelling, and the findings will contribute to the field.

      (14) Please make Figures 5C and 5E red-green colorblind friendly.

      Thank you. We have changed the colors to light blue and yellow to be colorblind friendly.

      Reviewer #3 (Recommendations for the authors):

      (15) I'm not sure this reviewer totally agrees with the assessment that foxf2a loss of function, while foxf2b remains normal, is the same as FOXF2 heterozygous loss of function in humans. The discussion of the gene dosage needs to be better framed, and the authors should carry out qPCR to show that foxf2b levels are not altered in the foxf2a mutant background.

      We have added data on foxf2b expression in foxf2a mutants to Fig S3. We have updated the results.

      (16) Figure 4/SF7- is the aneurysm phenotype derived from the ECs or pericytes? Cell-type-specific rescues would be interesting to determine if phenotypes are rescued, especially the developmental phenotypes (it is appreciated that carrying out rescue experiments until adulthood is complex). When is the earliest time point that aneurysm-like structures are seen?

      This is a fascinating question, especially as we show that endothelial cells (vessel network length) are affected in the adult mutants. The foxf2a mutants that we work with here are constitutive knockouts. While a strategy to rescue foxf2a in specific lineages is being developed in the laboratory, this will require a multi-generation breeding effort to get drivers, transgenes and mutants on the same background, and these fish are not currently available. Thank you for this comment; it is something we want to follow up on.

      (17) Figure 5 - This is very nice analysis.

      Thank you! We think it is informative too.

      (18) Figure 6 - needs to contain control images

      We have added wildtype images to figure 6A.

      (19) Figure 7- vessel images should be shown to demonstrate the specificity of NTR treatment to the pericytes.

      We have added the vessel images to Figure 7. We apologize for the omission.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      One possible remaining conceptual concern that might require future work is determining whether STN primarily mediates higher-level cognitive avoidance or if its activation primarily modulates motor tone.

      Our results using viral and electrolytic lesions (Fig. 11) and optogenetic inhibition of STN neurons (Fig. 10) show that signaled active avoidance is virtually abolished, and this effect is reproduced when we selectively inhibit STN fibers in the midbrain (Fig. 12). Inhibition of STN projections in either the substantia nigra pars reticulata (SNr) or the midbrain reticular tegmentum (mRt) eliminates cued avoidance responses while leaving escape responses intact. Importantly, mice continue to escape during US presentation after lesions or during photoinhibition, demonstrating that basic motor capabilities and the ability to generate rapid defensive actions are preserved.

      These findings argue against the idea that STN’s role in avoidance reflects a nonspecific suppression or facilitation of motor tone, even if the STN also contributes to general movement control. Instead, they show that STN output is required for generating “cognitively” guided cued actions that depend on interpreting sensory information and applying learned contingencies to decide when to act. Thus, while STN activity can modulate movement parameters, the loss-of-function results point to a more selective role in supporting cued, goal-directed avoidance behavior rather than a general adjustment of motor tone.

      Reviewer #2 (Public review):

      All previous weaknesses have been addressed. The authors should explain how inhibition of the STN impairing active avoidance is consistent with the STN encoding cautious action. If 'caution' is related to avoid latency, why does STN lesion or inhibition increase avoid latency, and therefore increase caution? Wouldn't the opposite be more consistent with the statement that the STN 'encodes cautious action'?

      The reviewer’s interpretation treats any increase in avoidance latency as evidence of “more caution,” but this holds only when animals are performing the avoidance behavior normally. In our intact animals, avoidance rates remain high across AA1 → AA2 → AA3, and the active avoidance trials (CS1) used to measure latency are identical across tasks (e.g., in AA2 the only change is that intertrial crossings are punished). Under these conditions, changes in latency genuinely reflect adjustments in caution, because the behavior itself is intact, actions remain tightly coupled to the cue, and the trials are identical.

      This logic does not apply when STN function is disrupted. STN inhibition or lesions reduce avoidance to near chance levels; the few crossings that do occur are poorly aligned to the CS and many likely reflect random movement rather than a cued avoidance response. Once performance collapses, latency can no longer be assumed to reflect the same cognitive process. Thus, interpreting longer latencies during STN inactivation as “more caution” would be erroneous, and we never make that claim.

      A simple analogy may help clarify this distinction. Consider a pedestrian deciding when to cross the street after a green light. If the road is deserted (like AA1), the person may step off the curb quickly. If the road is busy with many cars that could cause harm (like AA2), they may wait longer to ensure that all cars have stopped. This extra hesitation reflects caution, not an inability to cross. However, if the pedestrian is impaired (e.g., cannot clearly see the light, struggles to coordinate movements, or cannot reliably make decisions), a delayed crossing would not indicate greater caution—it would reflect a breakdown in the ability to perform the behavior itself. The same principle applies to our data: we interpret latency as “caution” only when animals are performing the active avoidance behavior normally, success rates remain high, and the trial rules are identical. Under STN inhibition or lesion, when active avoidance collapses, the latency of the few crossings that still occur can no longer be interpreted as reflecting caution. We have added these points to the Discussion.

      Reviewer #3 (Public review):

      Original Weaknesses:

      I found the experimental design and presentation convoluted and some of the results over-interpreted.

      We appreciate the reviewer’s comment, but the concern as stated is too general for us to address in a concrete way. The revised manuscript has been substantially reorganized, with simplified terminology, streamlined figures, and removal of an entire set of experiments to avoid over-interpretation. We are confident that the experimental design and results are now presented clearly and without extrapolation beyond the data. If there are specific points the reviewer finds convoluted or over-interpreted, we would be happy to address them directly.

      As presented, I don't understand this idea that delayed movement is necessarily indicative of cautious movements. Is the distribution of responses multi-modal in a way that might support this idea; or do the authors simply take a normal distribution and assert that the slower responses represent 'caution'? Even if responses are multi-modal and clearly distinguished by 'type', why should readers think that delayed responses imply cautious responding instead of, say, habituation or sensitization to cue/shock; variability in attention, motivation, or stress; or merely uncertainty, which seems plausible given what I understand of the task design, where the same mice are repeatedly tested in changing conditions. This relates to a major claim (i.e., in the title).

      We appreciate the reviewer’s question and address each component directly.

      (1) What we mean by “caution” and how it is operationalized

      In our study, caution is defined operationally as a systematic increase in avoidance latency when the behavioral demand becomes higher, while the trial structure and required response remain unchanged. Specifically, CS1 trials are identical in AA1, AA2, and AA3. Thus, when mice take longer to initiate the same action under more demanding contexts, the added time reflects additional evaluation before acting, consistent with long-established interpretations of latency shifts in cognitive psychology (see papers by Donders, Sternberg, Posner) and interpretations of deliberation time in the speed-accuracy tradeoff literature.

      (2) Why this interpretation does not rely on multi-modal response distributions

      We do not claim that “cautious” responses form a separate mode in the latency distribution. The distributions are unimodal, and caution is inferred from condition-dependent shifts in these distributions across identical trials, not from the existence of multiple peaks (see Zhou et al, 2022). Latency shifts across conditions with identical trial structure are widely used as behavioral indices of deliberation or caution.

      (3) Why alternative explanations (habituation/sensitization, motivation, attention, stress, uncertainty) do not account for these latency changes

      Importantly, nothing changes in CS1 trials between AA1 and AA2 with respect to the cue, shock, or required response. Therefore:

      - Habituation/sensitization to the cue or shock cannot explain the latency shift (the stimuli and trial type are unchanged). We have previously examined cue-evoked orienting responses and their habituation in detail (Zhou et al., 2023), and those measurements are dissociable from the latency effects described here.

      - Motivation or attention are unlikely to change selectively for identical CS1 trials when the task manipulation only adds a contingency to intertrial crossings.

      - Uncertainty also does not increase for CS1 trials; they remain fully predictable and unchanged between conditions.

      - Stress is too broad a construct to be meaningful unless clearly operationalized; moreover, any stress differences that arise from task structure would covary with caution rather than replace the interpretation.

      (4) Clarifying “types” of responses

      The reviewer’s question about “response types” appears to conflate behavioral latencies with the neuronal response “types” defined in the manuscript. The term “type” in this paper refers to neuronal activation derived from movement-based clustering, not to distinct behavioral categories of avoidance, which we term modes.

      In sum, we interpret increased CS1 latency as “caution” only when performance remains intact and trial structure is identical between conditions; under those criteria, latency reliably reflects additional cognitive evaluation before acting, rather than nonspecific changes in sensory processing, motivation, etc.

      Related to the last, I'm struggling to understand the rationale for dividing cells into 'types' based on their physiological responses in some experiments.

      There is longstanding precedent in systems neuroscience for classifying neurons by their physiological response patterns, because neurons that respond similarly often play similar functional roles. For example, place cells, grid cells, and direction cells in vivo, and regular-spiking, burst-firing, and tonic-firing neurons in vitro, are all defined by characteristic activity patterns in response to stimuli rather than by anatomy or genetics alone. In the same spirit, our classifications simply reflect clusters of neurons that exhibit similar ΔF/F dynamics around behaviorally relevant events, such as movement sensitivity or avoidance modes. This is a standard analytic approach used in many studies. Thus, our rationale is not arbitrary: the “classes” and “types” arise from data-driven clustering of physiological responses, consistent with widespread practice, and they help reveal functional distinctions within the STN that would otherwise remain obscured.

      In several figures the number of subjects used was not described. This is necessary. Also necessary is some assessment of the variability across subjects.

      All the results described include the number of animals. To eliminate uncertainty, we now also include this information in figure legends.

      The only measure of error shown in many figures relates trial-to-trial or event variability, which is minimal because in many cases it appears that hundreds of trials may have been averaged per animal, but this doesn't provide a strong view of biological variability (i.e., are results consistent across animals?).

      The concern appears to stem from a misunderstanding of what the mixed-effects models quantify. Although the figure panels often show session-averaged traces for clarity, all statistical inferences in the paper are made at the level of animals, not trials. Mixed-effects modeling is explicitly designed for hierarchical datasets such as ours, where many trials are nested within sessions, which are themselves nested within animals.

      In our models, animal is the clustering (random) factor, and sessions are nested within animals, so variability across animals is directly estimated and used to compute the population-level effects. This approach is not only appropriate but is the most stringent and widely recommended method for analyzing behavioral and neural data with repeated measures. In other words, the significance tests and confidence intervals already fully incorporate biological variability across animals.

      Thus, although hundreds of trials per animal may be illustrated for visualization, the inferences reflect between-animal consistency, not within-animal trial repetition. The fact that the mixed-effects results are robust across animals supports the biological reliability of the findings.
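
      The difference between trial-level and animal-level variability can be illustrated with a minimal sketch. This is not the mixed-effects model used in the paper; it is a simpler two-stage summary (one mean per animal, then the SEM across animals) applied to made-up numbers, and every name in it (`trials_by_animal`, `sem`, the means and noise levels) is a hypothetical chosen for illustration. It shows only why pooling trials as if they were independent understates between-animal variability.

```python
import math
import random
import statistics


def sem(values):
    """Standard error of the mean of a list of numbers."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical data (not from the paper): 5 animals, 200 trials each.
# Each animal has its own true mean (between-animal variability),
# and each trial adds trial-to-trial noise around that mean.
rng = random.Random(0)
animal_means = [0.7, 0.9, 1.0, 1.1, 1.3]
trials_by_animal = [
    [rng.gauss(mu, 0.5) for _ in range(200)] for mu in animal_means
]

# Naive approach: pool all trials and treat them as independent samples.
pooled = [x for trials in trials_by_animal for x in trials]
pooled_sem = sem(pooled)

# Two-stage approach: one mean per animal, then SEM across animals (n = 5).
per_animal = [statistics.mean(trials) for trials in trials_by_animal]
animal_sem = sem(per_animal)

print(f"SEM pooling trials (n={len(pooled)}): {pooled_sem:.3f}")
print(f"SEM across animals (n={len(per_animal)}): {animal_sem:.3f}")
```

      In this toy case the pooled SEM is much smaller than the across-animal SEM, which is the discrepancy the reviewer is worried about. A mixed-effects model with animal as a random factor addresses the same issue while still using all trials.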

      It is not clear if or how spread of expression outside of target STN was evaluated, and if or how or how many mice were excluded due to spread or fiber placements. Inadequate histological validation is presented and neighboring regions that would be difficult to completely avoid, such as paraSTN may be contributing to some of the effects.

      The STN is a compact structure with clear anatomical boundaries, and our injections were rigorously validated to ensure targeting specificity. As detailed in the Methods, every mouse underwent histological verification, and injections were quantified using the Brain Atlas Analyzer app (available on OriginLab), which we developed to align serial sections to the Allen Brain Atlas. This approach provides precise, slice-by-slice confirmation of viral spread. We have performed thousands of AAV injections and probe implants in our lab, incorporating over the years highly reliable stereotaxic procedures with multiple depth and angle checks and tools. For this study specifically, fewer than 10% of mice were excluded due to off-target expression or fiber/lesion placement. None of the included cases showed spread into adjacent structures.

      Regarding paraSTN: anatomically, paraSTN is a very small extension contiguous with STN. Our study did not attempt to dissociate subregions within STN, and the viral expression patterns we report fall within the accepted boundaries of STN. Importantly, none of our photometry probes or miniscope lenses sampled paraSTN, so contributions from that region are extremely unlikely to account for any of our neural activity results.

      Finally, our paper employs five independent loss-of-function approaches—optogenetic inhibition of STN neurons, selective inhibition of STN projections to the midbrain (in two sites: SNr and mRt), and STN lesions (electrolytic and viral). All methods converge on the same conclusion, providing strong evidence that the effects we report arise from manipulation of STN itself rather than from neighboring regions.

      Raw example traces are not provided.

      We do not think raw traces are useful here. All figures contain average traces to reflect the average activity of the estimated populations, which are already clustered per classes and types.

      The timeline of the spontaneous movement and avoidance sessions were not clear, nor the number of events or sessions per animal and how this was set. It is not clear if there was pre-training or habituation, if many or variable sessions were combined per animal, or what the time gaps between sessions was, or if or how any of these parameters might influence interpretation of the results.

      As noted, we have enhanced the description of the sessions, including the number of animals and sessions, which are daily and always equal per animal in each group of experiments. The sessions are part of the random effects in the model. In addition, we now include schematics to facilitate understanding of the procedures.

      Comments on revised version:

      The authors removed the optogenetic stimulation experiments, but then also added a lot of new analyses. Overall the scope of their conclusions is essentially unchanged. Part of the eLife model is to leave it to the authors' discretion how they choose to present their work. But my overall view of it is unchanged. There are elements that I found clear, well executed, and compelling. But there are other elements that I found difficult to understand and where I could not follow or concur with their conclusions.

      We respectfully disagree with the assertion that the scope of our conclusions remains unchanged. The revised manuscript differs in several fundamental ways:

      (1) Removal of all optogenetic excitation experiments

      These experiments were a substantial portion of the original manuscript, and their removal eliminated an entire set of claims regarding the causal control of cautious responding by STN excitation. The revised manuscript no longer makes these claims.

      (2) Addition of analyses that directly address the reviewers’ central concerns

      The new analyses using mixed-effects modeling, window-specific covariates, and movement/baseline controls were added precisely because reviewers requested clearer dissociation of sensory, motor, and task-related contributions. These additions changed not only the presentation but the interpretation of the neural signals. We now conclude that STN encodes movement, caution, and aversive signals in separable ways, not that it exclusively or causally regulates caution.

      (3) Clear narrowing of conclusions

      Our current conclusions are more circumscribed and data-driven than in the original submission. For example, we removed all claims that STN activation “controls caution,” relying instead on loss-of-function data showing that STN is necessary for performing cued avoidance—not for generating cautious latency shifts. This is a substantial conceptual refinement resulting directly from the review process.

      (4) Reorganization to improve clarity

      Nearly every section has been restructured, including terminology (mode/type/class), figure organization, and explanations of behavioral windows. These revisions were implemented to ensure that readers can follow the logic of the analyses.

      We appreciate the reviewer’s recognition that several elements were clear and compelling. For the remaining points they found difficult to understand, we have addressed each one in detail in the response and revised the manuscript accordingly. If there are still aspects that remain unclear, we would welcome explicit identification of those points so that we can clarify them further.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      (1) Show individual data points on bar plots

      - partially addressed. Individual data points are still not shown.

      Wherever feasible, we display individual data points (e.g., Figures 1 and 2) to convey variability directly. However, in cases where figures depict hundreds of paired (repeated-measures) data points, showing all points without connecting them would not be appropriate, while linking them would make the figures visually cluttered and uninterpretable. All plots and traces include measures of variability (SEM), and the raw data will be shared on Dryad. When error bars are not visible, they are smaller than the trace thickness or bar line; for example, in Figure 5B, the black circles and orange triangles include error bars, but they are smaller than the symbol size.

      Also, to minimize visual clutter, only a subset of relevant comparisons is highlighted with asterisks, whereas all relevant statistical results, comparisons, and mouse/session numbers are fully reported in the Results section, with statistical analyses accounting for the clustering of data within subjects and sessions.

      (2) The active avoidance experiments are confusing when they are introduced in the results section. More explanation of what paradigms were used and what each CS means at the time these are introduced would add clarity. For example AA1, AA2 etc are explained only with references to other papers, but a brief description of each protocol and a schematic figure would really help.

      - partially addressed. A schematic figure showing the timeline would still be helpful.

      As suggested, we have added an additional panel to Fig. 5A with a schematic describing the AA1-3 tasks. In addition, the avoidance protocols are described briefly but clearly in the Results section (second paragraph of “STN neurons activate during goal-directed avoidance contingencies”) and in greater detail in the Methods section. As stated, these tasks were conducted sequentially, and mice underwent the same number of sessions per procedure, which are indicated. All relevant procedural information has been included in these sections. Mice underwent daily sessions and learnt these tasks within 1-2 sessions, progressing sequentially across tasks with an equal number of sessions per task (7 per task), and the resulting data were combined and clustered by mouse/session in the statistical models.

      (3) How do the Class 1, 2, 3 avoids relate to Class 1 , 2, 3 neural types established in Figure 3? It seems like they are not related, and if that is the case they should be named something different from each other to avoid confusion.

      -not sufficiently addressed. The new naming system of neural 'classes' and 'types' helps with understanding that these are completely different ways of separating subpopulations within the STN. However, it is still unclear why the authors re-type the neurons based on their relation to avoids, when they classify the neurons based on their relationship to speed earlier. And it is unclear whether these neural classes and neural types have anything to do with each other. Are the neural Types related to the neural classes in any way? and what is the overlap between neural types vs classes? Which separation method is more useful for functionally defining STN populations?

      The remaining confusion stems from treating several independent analyses as if they were different versions of the same classification. In reality, each analysis asks a distinct question, and the resulting groupings are not expected to overlap or correspond. We clarify this explicitly below.

      - Movement onset neuron classes (Class A, B, C; Fig. 3):

      These classes categorize neurons based on how their ΔF/F changes around spontaneous movement onset. This analysis identifies which neurons encode the initiation and direction of movement. For instance, Class B neurons (15.9%) were inhibited as movement slowed before onset but did not show sharp activation at onset, whereas Class C neurons (27.6%) displayed a pronounced activation time-locked to movement initiation. Directional analyses revealed that Class C neurons discharged strongly during contraversive turns, while Class B neurons showed a weaker ipsiversive bias. Because neurons were defined per session and many of these recordings did not include avoidance-task sessions, these movement-onset classes were not used in the avoidance analyses.

      - Movement-sensitivity neuron classes (Class 1, 2, 3, 4; Fig. 7):

      These classes categorize neurons based on the cross-correlation between ΔF/F and head speed, capturing how each neuron’s activity scales with movement features across the entire recording session. This analysis identifies neurons that are strongly speed-modulated, weakly speed-modulated, or largely insensitive to movement. These movement-sensitivity classes were then carried forward into the avoidance analyses to ask how neurons with different kinematic relationships participate during task performance; for example, whether neurons that are insensitive to movement nonetheless show strong activation during avoidance actions.

      - Avoidance modes (Mode 1, 2, 3; Fig. 8)

      Here we classify actions, not neurons. K-means clustering is applied to the movement-speed time series during CS1 active avoidance trials only, which allows us to identify distinct action modes or variants, such as fast-onset versus delayed avoidance responses. This action-based classification ensures that we compare neural activity across identical movements, eliminating a major confound in studies that do not explicitly separate action variants. First, we examine how population activity differs across these avoidance modes, reflecting neural encoding of the distinct actions themselves. Second, within each mode, we then classify neurons into “types,” which simply describe how different neurons activate during that specific avoidance action (as noted next).

      - Neuron activation types within each mode (Type a, b, c; Fig. 9)

      This analysis extends the mode-based approach by classifying neuronal activation patterns only within each specific avoidance mode. For each mode, we apply k-means clustering to the ΔF/F time series to identify three activation types, e.g., neurons showing little or no response, neurons showing moderate activation, and neurons showing strong or sharply timed activation. Because all trials within a mode have identical movement profiles, these activation types capture the variability of neural responses to the same avoidance behavior. Importantly, these activation “types” (a, b, c) are not global neuron categories. They do not correspond to, nor are they intended to map onto, the movement-based neuron classes defined earlier. Instead, they describe how neurons differ in their activation during a particular behavioral mode, that is, within a specific set of behaviorally matched trials. Because modes are defined at the trial level, the neurons contributing to each mode can differ: some neurons have trials belonging to one mode, others to two or all three. Thus, Type a/b/c groupings are not fixed properties of neurons. To prevent confusion, we refer to them explicitly as neuronal activation types, emphasizing that they characterize mode-specific response patterns rather than global cell identities.

      In conclusion, the categorizations serve entirely different analytical purposes and should not be interpreted as competing classifications. The mode-specific “types” do not reclassify or replace the movement-sensitivity classes; they capture how neurons differ within a single, well-defined avoidance action, while the movement classes reflect how neurons relate to movements in general. Each classification relates to a different set of questions, and overlap between them is not expected.
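
      The trial-level clustering that defines avoidance modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the speed traces are synthetic, the k-means routine is a plain Lloyd's algorithm written for this example with hand-picked initial centroids, and names such as `kmeans`, `bump`, and `speed_traces` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length traces."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_trace(traces):
    """Sample-by-sample average of a list of equal-length traces."""
    return [sum(col) / len(traces) for col in zip(*traces)]


def kmeans(traces, init_centroids, n_iter=20):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [list(c) for c in init_centroids]
    labels = [0] * len(traces)
    for _ in range(n_iter):
        # Assignment step: each trial goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(t, centroids[k]))
            for t in traces
        ]
        # Update step: each centroid becomes the mean of its member trials.
        for k in range(len(centroids)):
            members = [t for t, lab in zip(traces, labels) if lab == k]
            if members:
                centroids[k] = mean_trace(members)
    return labels, centroids


def bump(peak, scale, n=10):
    """Synthetic triangular speed bump centred on sample index `peak`."""
    return [scale * max(0.0, 1.0 - abs(i - peak) / 3.0) for i in range(n)]


# Hypothetical trials: three fast-onset and three delayed avoidance movements.
fast = [bump(2, s) for s in (0.9, 1.0, 1.1)]
delayed = [bump(6, s) for s in (0.9, 1.0, 1.1)]
speed_traces = fast + delayed

labels, centroids = kmeans(
    speed_traces, init_centroids=[speed_traces[0], speed_traces[-1]]
)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

      The labels are assigned per trial, not per neuron, which is why a given neuron can contribute trials to one, two, or all modes. The same routine could in principle be run a second time on ΔF/F traces restricted to the trials of one mode to obtain mode-specific activation groupings.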

      To make this as clear as possible we added the following paragraph to the Results:  

      “To avoid confusion between analyses, it is important to note that the movement-sensitivity classes defined here (Class 1–4; Fig. 7) are conceptually distinct from both the movement-onset classes (Class A–C; Fig. 3) and the neuronal activation “types” introduced later in the avoidance-mode analysis. The Class 1–4 grouping reflects how neurons relate to movement across the entire session, based on their cross-correlation with speed. The onset classes A–C capture neural activity specifically around spontaneous movement initiation during general exploration. In contrast, the later activation “types” are derived within each avoidance mode and describe how neurons differ in their activation patterns during identical CS1 avoidance responses. These classifications answer different questions about STN function and are not intended to correspond to one another.”

      (4) Similarly having 3 different cell types (a,b,c) in the active avoidance seems unrelated to the original classification of cell types (1,2,3), and these are different for each class of avoid. This is very confusing and it is unclear how any of these types relate to each other. Presumably the same mouse has all three classes of avoids, so there are recordings from each cell during each type of avoid. So the authors could compare one cell during each avoid and determine whether it relates to movement or sound or something else. It is interesting that types a,b,c have the exact same proportions in each class of avoid, and really makes it important to investigate if these are the exact same cells or not. Also, these mice could be recorded during open field so the original neural classification (class 1, 2,3) could be applied to these same cells and then the authors can see whether each cell type defined in the open field has different response to the different avoid types. As it stands, the paper simply finds that during movement and during avoidance behaviors different cells in the STN do different things.

      - Similarly, the authors somewhat addressed the neural types issue, but figure 9 still has 9 different neural types and it is unclear whether the same cells that are type 'a' in mode 1 avoids are also type 'a' in mode 2 avoids, or do some switch to type b? Is there consistency between cell types across avoid modes? The authors show that type 'c' neurons are differentially elevated in mode 3 vs 2, but also describe neurons as type '2c' and statistically compare them to type '1c' neurons. Are these the same neurons? or are type 2c neurons different cells vs type 1c neurons? This is still unclear and requires clarification to be interpretable.

      We believe the remaining confusion arises from treating the different classification schemes as if they were alternative labels applied to the same neurons, when in fact they serve entirely separate analytical purposes and may not include the same neurons (see previous point). Because these classifications answer different questions, they are not expected to overlap, nor is overlap required for the interpretations we draw. It is therefore not appropriate to compare a neuron’s “type” in one avoidance mode to its movement class, or to ask whether types a/b/c across different modes are “the same cells,” since modes are defined by trial-level movement clustering rather than by neuron identity. Importantly, Types a/b/c are not intended as a new global classification of neurons; they simply summarize the variability of neuronal responses within each behaviorally matched mode. We agree that future studies could expand our findings, but that is beyond the already wide scope of the present paper. Our current analyses demonstrate a key conceptual point: when movement is held constant (via modes), STN neurons still show heterogeneous, outcome- and caution-related patterns, indicating encoding that cannot be reduced to movement alone.

      Relatedly, was the association with speed used to define each neural "class" done in the active avoidance context or in a separate (e.g. open field) experiment? This is not clear in the text.

      The cross-correlation classes were derived from the entire recording session, which included open-field and avoidance-task recordings. The tasks include long intertrial periods with spontaneous movements. We found no difference in classes when we included only a portion of the session, such as the open field, or when we excluded the avoidance interval where actions occur.
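
      A session-wide cross-correlation between a neuron's ΔF/F and head speed can be sketched as below. This is a hedged illustration only: the signals are synthetic, the function names (`zscore`, `cross_correlation`) are hypothetical, and the criteria the authors used to turn correlation values into Class 1-4 labels are not given here, so the sketch stops at computing the correlation at each lag and locating its peak.

```python
import math


def zscore(x):
    """Subtract the mean and divide by the (population) standard deviation."""
    m = sum(x) / len(x)
    sd = math.sqrt(sum((v - m) ** 2 for v in x) / len(x))
    return [(v - m) / sd for v in x]


def cross_correlation(dff, speed, max_lag):
    """Return {lag: r}; a positive lag means dff follows speed by `lag` samples.

    Uses the biased estimator (sum over overlapping samples divided by the
    full length n), so every value lies between -1 and 1.
    """
    a, b = zscore(dff), zscore(speed)
    n = len(a)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        total = sum(a[i + lag] * b[i] for i in range(n) if 0 <= i + lag < n)
        out[lag] = total / n
    return out


# Synthetic session: two movement bouts in the speed trace.
speed = [0, 0, 0, 1, 3, 1, 0, 0, 0, 0, 2, 4, 2, 0, 0, 0, 0, 0, 0, 0]
# Synthetic neuron whose activity is the speed trace delayed by 2 samples.
dff = [0, 0] + speed[:-2]

cc = cross_correlation(dff, speed, max_lag=5)
best_lag = max(cc, key=cc.get)
print(best_lag)  # prints 2
```

      Restricting `speed` and `dff` to a sub-interval of the session (for example, only the open-field portion) and recomputing would be one way to run the kind of robustness check described above.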

      Finally, in figure 7, why is there a separate avoid trace for each neural class? With the GRIN lens, the authors are presumably getting a sample of all cell types during each avoid, so why do the avoids differ depending on the cell type recorded?

      The entire STN population is not recorded within a single session; each session contributes only a subset of neurons to the dataset. Consequently, each neural class is composed of neurons drawn from partially non-overlapping sets of sessions, each with its own movement traces. For this reason, we plot avoidance traces separately for each neural class to maintain strict within-session correspondence between neural activity and the behavior collected in the same sessions. This prevents mixing behavioral data across sessions that did not contribute neurons to that class and ensures that all neural-behavioral comparisons remain appropriately matched. We note that averaging movement across classes, as is often done, would obscure these distinctions and would not preserve the necessary correspondence between neural activity and behavior. We have clarified this rationale in the Results section of the revised manuscript.

      (5) The use of the same colors to mean two different things in figure 9 is confusing. AA1 vs AA2 shouldn't be the same colors as light-naïve vs light signaling CS.

      -addressed, but the authors still sometimes use the same colors to mean different things in adjacent figures (e.g. the red, blue, black colors in figure 1 and figure 2 mean totally different things) and use different colors within the same figure to represent the same thing (Figure 9AB vs Figure 9CD). This is suboptimal.

      Following the reviewer’s suggestion, in Figure 2, we changed the colors, so readers do not assume they are related to Fig. 1.

      In Figure 9, we changed the colors in C,D to match the colors in A,B.

      (6) The exact timeline of the optogenetics experiments should be presented as a schematic for understandability. It is not clear which conditions each mouse experienced in which order. This is critical to the interpretation of figure 9 and the reduction of passive avoids during STN stimulation. Did these mice have the CS1+STN stimulation pairing or the STN+US pairing prior to this experiment? If they did, the stimulation of the STN could be strongly associated with either punishment or with the CS1 that predicts punishment. If that is the case, stimulating the STN during CS2 could be like presenting CS1+CS2 at the same time and could be confusing. The authors should make it clear whether the mice were naïve during this passive avoid experiment or whether they had experienced STN stimulation paired with anything prior to this experiment.

      -addressed

      (7) Similarly, the duration of the STN stimulation should be made clear on the plots that show behavior over time (e.g. Figure 9E).

      -addressed

      (8) There is just so much data and so many conditions for each experiment here. The paper is dense and difficult to read. It would really benefit readability if the authors put only the key experiments and key figure panels in the main text and moved much of the repetitive figure panels to supplemental figures. The addition of schematic drawings for behavioral experiment timing and for the different AA1, AA2, AA3 conditions would also really improve clarity.

      -partially addressed. The paper is still dense and difficult to read. No experimental schematics were added.

      As suggested, we have now added the schematic to Fig. 5A.

      New Comments:

      (9) Description of the animals used and institutional approval are missing from the methods.

      The information on animal strains and institutional approval is already included in the manuscript. The first paragraph of the Methods section states:

      “… All procedures were reviewed and approved by the institutional animal care and use committee and conducted in adult (>8 weeks) male and female mice. …”

      Additionally, the next subsection, “Strains and Adeno-Associated Viruses (AAVs),” fully specifies all mouse lines used. We therefore believe that the required descriptions of animals and institutional approval are already present and meet standard reporting.

    1. Author response:

      We thank the reviewers for their constructive and helpful feedback on our manuscript. We are delighted that they found the study to be "comprehensive and convincing" and a "tour de force" in its combination of electrophysiological recordings with large-scale digital twin screening. We appreciate that the reviewers highlighted the strengths of our multi-species approach and the "cross-species and cross-area consistency" of the results, noting that the work showcases how in silico experiments can generate concrete, experimentally validatable hypotheses.

      The reviewers also raised several important points that we plan to address in the final version of the manuscript to improve clarity and interpretation. These center on:

      Model performance in V4: Reviewer #1 raised questions regarding the comparative drop in model performance in V4 and the implications for the validity of the results (including the use of "high confidence" neurons and a request for clarification on the number of animals in the V4 dataset).

      Species differences: Both reviewers noted the value of the macaque-mouse comparison but requested a more explicit delineation of the differences between these species given their distinct ethological niches.

      The nature of inhibitory dimensions: The reviewers asked for further details on how to identify these inhibitory dimensions and the specific relationship between excitation and inhibition. We believe unraveling these mechanisms represents an exciting direction for future work, and we will explicitly mention this in the Discussion section of the final manuscript, alongside a clearer contextualization with prior literature.

      Technical clarifications: Reviewer #2 requested clarifications on specific technical details, such as the skewness thresholds used for sparsity analysis.

      In the final version of the manuscript, we will address these points by adding necessary clarifications to the text—including confirming the animal cohort details—explicitly contrasting the mouse and macaque data to highlight coding differences, and expanding our discussion. We will also ensure all technical inquiries, such as those regarding skewness and reference citations, are fully resolved.

      We believe addressing these points will significantly strengthen the manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Wang, Zhou et al. investigated coordination between the prefrontal cortex (PFC) and the hippocampus (Hp), during reward delivery, by analyzing beta oscillations. Beta oscillations are associated with various cognitive functions, but their role in coordinating brain networks during learning is still not thoroughly understood. The authors focused on the changes in power, peak frequencies, and coherence of beta oscillations in two regions when rats learn a spatial task over days. Inconsistent with the authors' hypothesis, beta oscillations in those two regions during reward delivery were not coupled in spectral or temporal aspects. They were, however, able to show reverse changes in beta oscillations in PFC and Hp as the animal's performance got better. The authors were also able to show a small subset of cell populations in PFC that are modulated by both beta oscillations in PFC and sharp wave ripples in Hp. A similarly modulated cell population was not observed in Hp. These results are valuable in pointing out distinct periods during a spatial task when two regions modulate their activity independently from each other.

      The authors included a detailed analysis of the data to support their conclusions. However, some clarifications would help their presentation, as well as help readers to have a clear understanding.

      (1) The crucial time point of the analysis is the goal entry. However, it needs a better explanation in the methods or in figures of what a goal entry in their behavioral task means.

      We appreciate Reviewer 1 pointing out this shortcoming and will clarify the description in the revised manuscript. Each goal is located at the end of the arm, and is equipped with a reward delivery unit. The unit has an infrared sensor. The rat breaks the infrared beam when it enters the goal.

      (2) Regarding Figure 2, the authors have mentioned in the methods that PFC tetrodes have targeted both hemispheres. It might be trivial, but a supplementary graph or a paragraph about differences or similarities between contralateral and ipsilateral tetrodes to Hp might help readers.

      We will provide the requested analysis in the full revision. We saw both hemispheres had similar properties.

      (3) The authors have looked at changes in burst properties over days of training. For the coincidence of beta bursts between PFC and Hp, is there a change in the coincidence of bursts depending on the day or performance of the animal?

      We will provide the requested analysis in the full revision.

      (4) Regarding the changes in performance through days as well as the variance of the beta burst frequency (Figures 3C and 4C): was there a change in the number of beta bursts as animals learned the task, which might affect the variance indirectly?

      We can control for differences in the number of bursts in each category (day or performance quintile) by resampling the data to match the burst count between categories.
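      The count-matching control described above could be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the dictionary layout (category label mapped to a list of burst frequencies in Hz), and the choice of population variance as the summary statistic are all assumptions made for the example.

```python
import random
from statistics import pvariance


def match_burst_counts(bursts_by_category, seed=0):
    """Subsample every category down to the smallest category's burst count.

    bursts_by_category: hypothetical dict mapping a category label
    (e.g. a day or performance quintile) to a list of burst frequencies.
    """
    rng = random.Random(seed)
    n_min = min(len(v) for v in bursts_by_category.values())
    # Sampling without replacement keeps only observed bursts.
    return {k: rng.sample(list(v), n_min) for k, v in bursts_by_category.items()}


def resampled_variance(bursts_by_category, n_iter=1000, seed=0):
    """Average the variance of each category over repeated count-matched draws."""
    rng = random.Random(seed)
    n_min = min(len(v) for v in bursts_by_category.values())
    result = {}
    for label, values in bursts_by_category.items():
        values = list(values)
        draws = [pvariance(rng.sample(values, n_min)) for _ in range(n_iter)]
        result[label] = sum(draws) / n_iter
    return result
```

      Repeating the subsampling many times, rather than drawing once, reduces the dependence of the comparison on any single random subset.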

      (5) In the behavioral task, within a session, animals needed to alternate between two wells, but the central arm (1) was in the same location. Did the authors alternate the location of well number 1 between days to different arms? It is possible that having well number 1 in the same location through days might have an effect on beta bursts, as they would get more rewards in well number 1?

      The central arm remained the same across days since we needed the animals to learn the alternation task. In our experience, the animal needs a few days to learn the alternation rule when we switch the central arm location. For this experiment, we were interested in the initial learning process, so we kept the central arm constant. Switching the central arm location is a great suggestion for a follow-up experiment in which we can examine the effects that a change in reward contingency has on beta bursts.

      (6) The animals did not increase their performance in the F maze as much as they increased it in the Y maze. It would be more helpful to see a comparison between mazes in Figure 5 in terms of beta burst timing. It seems that unrewarded trials have earlier beta bursts in the Y maze compared to the F maze. Also, is there a difference in beta burst frequencies between rewarded and unrewarded trials?

      We will add this analysis in the revised manuscript.

      (7) For individual cell analysis, the authors recorded from Hp and the behavioral task involved spatial learning. It would be helpful to readers if the authors described the place field properties of the cells they recorded from. It is known that reward cells firing near reward locations are more likely to participate in a sharp wave ripple. Factoring the place field properties of the cells into the analysis might give a clearer picture of the lack of modulation of Hp cells by beta and sharp wave ripples.

      This is a great suggestion, and we will address this in the full revision.

      Reviewer #2 (Public review):

      We thank Reviewer 2 for their helpful comments and will address these in full in the revision. These are great suggestions to provide greater detail on the spectral and behavioral data at the goal.

      (1) When presenting the power spectra for the representative example (Figure 1), it would be appropriate to display a broader frequency band, including delta, theta, and gamma (up to ~100 Hz), rather than only the beta band.

      We will show more examples of power spectra with a wider frequency range. We did examine the wider spectra and noticed power in the beta frequency band was more prominent than others.

      What was the rat's locomotor state (e.g., running speed) after entering the reward location, during which the LFPs were recorded?

      We will add the time aligned speed profile to the spectra and raw data examples. Because goal entry is defined as the time the animals break the infrared beam at the goal (response to Reviewer 1), the rat would have come to a stop.

      If the rats stopped at the goal but still consumed the reward (i.e., exhibited very low running speed), theta rhythms might still occasionally occur, and sharp-wave ripples (SWRs) could be observed during rest.

      We typically find low theta power in the hippocampus after the animal reaches the goal location and as it consumes reward. Reviewer 2 is correct about occasional theta power at the goal. We have observed this but mostly before the animal leaves the goal location. We did find SWRs during goal periods. One example is shown in Fig. 7A.

      Do beta bursts also occur during navigation prior to goal entry?

      We did not find consistent beta bursts in PFC or CA1 on approach to goal entry. We can provide the analyses in our full revision. In our initial exploratory analysis, we found beta bursts were most prominent after goal entry, which led us to focus on post-goal entry beta for this manuscript. However, beta oscillations in the hippocampus during locomotion or exploration have been reported (Ahmed & Mehta, 2012; Berke et al., 2008; França et al., 2014; França et al., 2021; Iwasaki et al., 2021; Lansink et al., 2016; Rangel et al., 2015).

      It would be beneficial to display these rhythmic activities continuously across both the navigation and goal entry phases. Additionally, given that the hippocampal theta rhythm is typically around 7-8 Hz, while a peak at approximately 15-16 Hz is visible in the power spectra in Figure 1C, the authors should clarify whether the 22 Hz beta activity represents a genuine oscillation rather than a harmonic of the theta rhythm.

      To ensure we fully address this concern, we can provide further spectral analysis in our revised manuscript to show that theta power in CA1 is reduced after goal entry. We were initially concerned that the 22 Hz power in CA1 may be a harmonic rather than a standalone oscillation band. If it were a harmonic of theta, we should expect to find coincident theta at the time of bursts in the beta frequency. In Fig. 1B and Fig. 2A, we show examples of the raw LFP traces from CA1. Here, the detected bursts are not accompanied by visible theta frequency activity. In PFC, we do not always see persistent theta frequency oscillations as in CA1. We found that PFC beta bursts were frequent and visually identifiable when examining the LFP, and we provided examples of the PFC LFP (Fig. 1B, Fig. 1-1, and Fig. 2A). In these cases, we see clear beta frequency oscillations lasting several cycles, and these are not accompanied by any theta frequency oscillations in the LFP trace.

      (2) The authors claim that beta activity is independent between CA1 and PFC, based on the low coherence between these regions. However, it is challenging to discern beta-specific coherence in CA1; instead, coherence appears elevated across a broader frequency band (Figure 2 and Figure 2-1D). An alternative explanation could be that the uncoupled beta between CA1 and PFC results from low local beta coherence within CA1 itself.

      This is a legitimate concern, and we used three methods to characterize coherence and coordination between the two regions. First, we calculated coherence for tetrode pairs for times when the animal was at goals (Fig. 2B), which provides a general estimate of coherence across frequencies but lacks any temporal resolution. Second, we calculated burst-aligned coherence (Fig. 2-1), which provides temporal resolution relative to the burst, but the multi-taper method is constrained by the time-frequency resolution trade-off. Third, we quantified the timing between the burst peaks (Fig. 2D), which describes timing differences, although the bursts may not be symmetric around their peaks. Thus, each method has its own caveats, but we drew our conclusion from the combination of results from these three analyses, which pointed to similar conclusions.
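      The third approach, timing between burst peaks, could be illustrated roughly as below. This is a sketch under stated assumptions rather than the analysis behind Fig. 2D: the function name, the nearest-peak pairing rule, and the `max_lag` cutoff are hypothetical, and burst peak times are assumed to be given in seconds.

```python
from bisect import bisect_left


def nearest_peak_lags(pfc_peaks, ca1_peaks, max_lag=1.0):
    """Lag (CA1 minus PFC, in seconds) from each PFC burst peak to the nearest CA1 peak.

    Pairs whose absolute lag exceeds max_lag (an assumed cutoff) are dropped.
    """
    ca1 = sorted(ca1_peaks)
    lags = []
    for t in pfc_peaks:
        j = bisect_left(ca1, t)
        # The nearest CA1 peak is either just before or just after t.
        candidates = ca1[max(j - 1, 0): j + 1]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda c: abs(c - t))
        lag = nearest - t
        if abs(lag) <= max_lag:
            lags.append(lag)
    return lags
```

      A lag distribution concentrated near zero would suggest coincident bursts, whereas a broad or flat distribution would be more consistent with independent bursts in the two regions.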

      Reviewer 2 is correct in pointing out the uniformly high coherence within CA1 across the frequency range we examined. When we inspected the raw LFP across multiple tetrodes in CA1, the traces were similar to each other (Fig. 2A). This likely reflects the uniformity of the LFP across recording sites in CA1, consistent with the coherence values we saw across the frequency range (Fig. 2B). Coherence between tetrode pairs within CA1 was statistically higher across the range than coherence between tetrode pairs in PFC (Fig. 2B and C); thus, our results are unlikely to be explained by low beta coherence within CA1 itself. The burst-aligned coherence computed with a multi-taper method also supports this: coherence values within CA1 at the time of CA1 bursts are ~0.8-0.9.

      (3) In Figure 2-1E-F, visual inspection of the box plots reveals minimal differences between PFC-Ind and PFC-Coin/CA1-Coin conditions, despite reported statistical significance. It may be necessary to verify whether the significance arises from a large sample size.

      We will include the sample sizes for each of the boxplots; these should be the same as for the power comparison in Fig. 2-1 A-C. The LFPs within a one-second window centered on the bursts are usually very similar, and the multi-taper method will return high coherence values. The p-values from statistical comparisons between the boxes are corrected using the Benjamini-Hochberg method.
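      For readers unfamiliar with the Benjamini-Hochberg procedure mentioned above, a minimal sketch of the standard step-up adjustment is given here. It is a generic illustration in plain Python, not the code used for the manuscript, and the function name is chosen for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      A comparison is then called significant when its adjusted p-value falls below the chosen false discovery rate, for example 0.05.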

      (4) In Figure 3 and Figure 4, although differences in power and frequency appear to change significantly across days, these changes are not easily discernible by visual inspection. It is worth considering whether these variations are related to increased task familiarity over days, potentially accompanied by higher running speeds.

      We agree with Reviewer 2 that familiarity increases across days, and the animal is likely running faster. The analysis for Fig. 3 and 4 includes only data from periods when the animal was at the goal and was not moving. We used linear mixed effects models to quantify the relationship between power, frequency and day or behavioral quintile.

      (5) The stronger spiking modulation by local beta oscillations shown in Figure 6 could also be interpreted in the context of uncoupled beta between CA1 and PFC. In this analysis, only spikes occurring during beta bursts should be included, rather than all spikes within a trial. The authors should verify the dataset used and consider including a representative example illustrating beta modulation of single-unit spiking.

      We agree with Reviewer 2 that the stronger modulation by local beta is another piece of evidence indicating uncoupled beta between the two regions. We appreciate this suggestion and will add examples illustrating beta modulation for single units. We want to clarify that the spikes were only from periods when the animal was at the goal location on each trial and do not include the running period between goals.

      (6) As observed in Figure 7D, CA1 beta bursts continue to occur even after 2.5 seconds following goal entry, when SWRs begin to emerge. Do these oscillations alternate over time, or do they coexist with some form of cross-frequency coupling?

      This is a very interesting and helpful suggestion. Although we found SWRs generally appear later than beta bursts, it is possible the two are related on a finer timescale pointing to coordination. Our cross-correlation analysis between PFC and CA1 beta bursts only showed the relationship on the timescale of seconds. We will show a higher time-resolution version of this analysis in the revision.

      Reviewer #3 (Public review):

      Summary:

      This paper explored the role of beta rhythms in the context of spatial learning and mPFC-hippocampal dynamics. The authors characterized mPFC and hippocampal beta oscillations, examining how their coordination and their spectral profiles related to learning and prefrontal neuronal firing. Rats performed two tasks, a Y-maze and an F-maze, with the F-maze task being more cognitively demanding. Across learning, prefrontal beta oscillation power increased while beta frequency decreased. In contrast, hippocampal beta power and beta frequency decreased. This was particularly the case for the well-performed and well-learned Y-maze paradigm. The authors identified the timing of beta oscillations, revealing an interesting shift in beta burst timing relative to reward entry as learning progressed. They also discovered an interesting population of prefrontal neurons that were tuned to both prefrontal beta and hippocampal sharp-wave ripple events, revealing a spectrum of SWR-excited and SWR-inhibited neurons that were differentially phase locked to prefrontal beta rhythms.

      In sum, the authors set out to examine how beta rhythms and their coordination were related to learning and goal occupancy. The authors identified a set of learning and goal-related correlates at the level of LFP and spike-LFP interactions, but did not report on spike-behavioral correlates.

      Strengths:

      Pairing dual recordings of medial prefrontal cortex (mPFC) and CA1 with learning of spatial memory tasks is a strength of this paper. The authors also discovered an interesting population of prefrontal neurons modulated by both beta and CA1 sharp-wave ripple (SWR) events, showing a relationship between SWR-excited and SWR-inhibited neurons and beta oscillation phase.

      Weaknesses:

      Moreover, there is little detail provided about sample sizes and how data sampling is being performed (e.g., rats, sessions, or trials), raising generalizability concerns.

      We appreciate Reviewer 3’s thoughtful suggestions for making our claims convincing. We will include information about sample sizes and address each detailed recommendation in the revised manuscript.

      The authors report on a task where rats were performing sub-optimally (F-maze), weakening claims.

      Our experiment was designed to allow us to examine, within the same animal, a well-performed task (Y) and a less well-performed task (F). This contrast allows us to determine differences in neural correlates. We can further dissect the relevant differences to take advantage of this experimental design.

      Likewise, it is questionable as to whether mPFC and hippocampus are dually required to perform a no-delay Y-maze task at day 5, where rats are performing near 100%.

      We agree with Reviewer 3 that the mPFC and hippocampus may not be required when the animal reaches stable performance on day 5 (Deceuninck & Kloosterman, 2024). The data we collected spans the full range of early learning (day 1) to proficiency (day 5). We wanted to understand the dynamics of beta across these learning stages.

      Recent studies suggest mPFC and hippocampus are likely to be needed, in some capacity, for learning continuous spatial alternation tasks on a range of maze geometries. Lesions, inactivation or waking activity perturbation of hippocampus or hippocampus and mPFC on the W maze alternation task slowed learning (Jadhav et al., 2012; Kim & Frank, 2009; Maharjan et al., 2018). More recently, optogenetic silencing of mPFC after sharp wave ripples on the Y maze alternation affected performance when the center arm was switched (den Bakker et al., 2023). The Y and F mazes in our study both share the continuous alternation rule, where the animal needed to avoid visiting a previously visited location on the outbound choice relative to the center, and always return to the center location.

      Further, the performance characteristics on the outbound and inbound components of our Y task are similar to those of the W task. We have analyzed the “inbound” and “outbound” performance of the animals on the Y maze alternation task: the “inbound”, or reference location, component is learned quickly, whereas the “outbound”, or alternation, component is learned slowly. We can add this analysis to the revised manuscript.

      There would be little reason to suspect strong oscillatory coupling when task performance is poor and/or independent of mPFC-HPC communication (Jones and Wilson, 2005) potentially weakening conclusions about independent beta rhythms.

      Although many studies have examined the oscillatory coupling properties at the theta frequency between mPFC-HPC (Hyman et al., 2005; Jones & Wilson, 2005; Siapas et al., 2005), our understanding of beta frequency coordination between the two regions is less established, especially at goal locations. Beta frequency coordination at goal locations may or may not follow similar properties to theta frequency coupling. In this manuscript we are reporting the properties of goal-location beta frequency activity in mPFC-HPC networks. We are not aware of prior work describing these properties at this stage of a spatial navigation task, especially their coordination in time.

      References

      Ahmed, O. J., & Mehta, M. R. (2012). Running speed alters the frequency of hippocampal gamma oscillations. J Neurosci, 32(21), 7373-7383. https://doi.org/10.1523/JNEUROSCI.5110-11.2012

      Berke, J. D., Hetrick, V., Breck, J., & Greene, R. W. (2008). Transient 23-30 Hz oscillations in mouse hippocampus during exploration of novel environments. Hippocampus, 18(5), 519-529. https://doi.org/10.1002/hipo.20435

      Deceuninck, L., & Kloosterman, F. (2024). Disruption of awake sharp-wave ripples does not affect memorization of locations in repeated-acquisition spatial memory tasks. Elife, 13. https://doi.org/10.7554/eLife.84004

      den Bakker, H., Van Dijck, M., Sun, J. J., & Kloosterman, F. (2023). Sharp-wave-ripple-associated activity in the medial prefrontal cortex supports spatial rule switching. Cell Rep, 42(8), 112959. https://doi.org/10.1016/j.celrep.2023.112959

      França, A. S., do Nascimento, G. C., Lopes-dos-Santos, V., Muratori, L., Ribeiro, S., Lobão-Soares, B., & Tort, A. B. (2014). Beta2 oscillations (23-30 Hz) in the mouse hippocampus during novel object recognition. Eur J Neurosci, 40(11), 3693-3703. https://doi.org/10.1111/ejn.12739

      França, A. S. C., Borgesius, N. Z., Souza, B. C., & Cohen, M. X. (2021). Beta2 Oscillations in Hippocampal-Cortical Circuits During Novelty Detection. Front Syst Neurosci, 15, 617388. https://doi.org/10.3389/fnsys.2021.617388

      Hyman, J. M., Zilli, E. A., Paley, A. M., & Hasselmo, M. E. (2005). Medial prefrontal cortex cells show dynamic modulation with the hippocampal theta rhythm dependent on behavior. Hippocampus, 15(6), 739-749. https://doi.org/10.1002/hipo.20106

      Iwasaki, S., Sasaki, T., & Ikegaya, Y. (2021). Hippocampal beta oscillations predict mouse object-location associative memory performance. Hippocampus, 31(5), 503-511. https://doi.org/10.1002/hipo.23311

      Jadhav, S. P., Kemere, C., German, P. W., & Frank, L. M. (2012). Awake hippocampal sharp-wave ripples support spatial memory. Science (New York, N.Y.), 336(6087), 1454-1458. https://doi.org/10.1126/science.1217230

      Jones, M. W., & Wilson, M. A. (2005). Theta Rhythms Coordinate Hippocampal–Prefrontal Interactions in a Spatial Memory Task. PLoS Biology, 3(12). https://doi.org/10.1371/journal.pbio.0030402

      Kim, S. M., & Frank, L. M. (2009). Hippocampal Lesions Impair Rapid Learning of a Continuous Spatial Alternation Task. PLoS ONE, 4(5). https://doi.org/10.1371/journal.pone.0005494

      Lansink, C. S., Meijer, G. T., Lankelma, J. V., Vinck, M. A., Jackson, J. C., & Pennartz, C. M. (2016). Reward Expectancy Strengthens CA1 Theta and Beta Band Synchronization and Hippocampal-Ventral Striatal Coupling. J Neurosci, 36(41), 10598-10610. https://doi.org/10.1523/JNEUROSCI.0682-16.2016

      Maharjan, D. M., Dai, Y. Y., Glantz, E. H., & Jadhav, S. P. (2018). Disruption of dorsal hippocampal - prefrontal interactions using chemogenetic inactivation impairs spatial learning. Neurobiol Learn Mem, 155, 351-360. https://doi.org/10.1016/j.nlm.2018.08.023

      Rangel, L. M., Chiba, A. A., & Quinn, L. K. (2015). Theta and beta oscillatory dynamics in the dentate gyrus reveal a shift in network processing state during cue encounters. Front Syst Neurosci, 9, 96. https://doi.org/10.3389/fnsys.2015.00096

      Siapas, A. G., Lubenov, E. V., & Wilson, M. A. (2005). Prefrontal Phase Locking to Hippocampal Theta Oscillations. Neuron, 46(1), 141-151. https://doi.org/10.1016/j.neuron.2005.02.028

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors set out to understand how animals respond to visible light in an animal without eyes. To do so, they used the C. elegans model, which lacks eyes, but nonetheless exhibits robust responses to visible light at several wavelengths. Here, the authors report a promoter that is activated by visible light and independent of known pathways of light responses.

      Strengths:

      The authors convincingly demonstrate that visible light activates the expression of the cyp-14A5 promoter-driven gene expression in a variety of contexts and report the finding that this pathway is activated via the ZIP-2 transcriptionally regulated signaling pathway.

      Weaknesses:

      Because the ZIP-2 pathway has been reported to be activated predominantly by changes in the bacterial food source of C. elegans -- or exposure of animals to pathogens -- it remains unclear if visible light activates a pathway in C. elegans (animals) or if visible light potentially is sensed by the bacteria on the plate, which also lack eyes. Specifically, it is possible that the plates are seeded with excess E. coli, that E. coli is altered by light in some way, and in this context, alters its behavior in such a way that activates a known bacterially responsive pathway in the animals. This weakness would not affect the ability to use this novel discovery as a tool, which would still be useful to the field, but it does leave some questions about the applicability to the original question of how animals sense light in the absence of eyes.

      Thank you for the insightful questions and suggestions. We have now performed the key experiment requested. New data (Fig. S1I) show that light induction of cyp-14A5p::GFP requires live bacteria that maintain a non-starved physiological state. Neither plates without food nor plates with heat-killed OP50 support robust induction. We now include this result in the paper, along with a revised discussion of the bacteria-modulated mechanism, but note that this bacterial requirement does not alter the central conclusions of the study. Rather, it reveals an intriguing mechanistic layer, namely, that bacterial metabolic activity likely influences the animal’s sensitivity to environmental light. We are pursuing this host–microbe interaction in a separate study. In the present work, we focus on the intrinsic regulation and functional significance of cyp-14A5 under standard laboratory conditions with live OP50. Accordingly, we have revised the Results and Discussion to reflect the appropriate scope.

      Reviewer #2 (Public review):

      Summary:

      Ji, Ma, and colleagues report the discovery of a mechanism in C. elegans that mediates transcriptional responses to low-intensity light stimuli. They find that light-induced transcription requires a pair of bZIP transcription factors and induces expression of a cytochrome P450 effector. This unexpected light-sensing mechanism is required for physiologically relevant gene expression that controls behavioral plasticity. The authors further show that this mechanism can be co-opted to create light-inducible transgenes.

      Strengths:

      The authors rigorously demonstrate that ambient light stimuli regulate gene expression via a mechanism that requires the bZIP factors ZIP-2 and CEBP-2. Transcriptional responses to light stimuli are measured using transgenes and using measurements of endogenous transcripts. The study shows proper genetic controls for these effects. The study shows that this light-response does not require known photoreceptors, is tuned to specific wavelengths, and is highly unlikely to be an artifact of temperature-sensing. The study further shows that the function of ZIP-2 and CEBP-2 in light-sensing can be distinguished from their previously reported role in mediating transcriptional responses to pathogenic bacteria. The study includes experiments that demonstrate that regulatory motifs from a known light-response gene can be used to confer light-regulated gene expression, demonstrating sufficiency and suggesting an application of these discoveries in engineering inducible transgenes. Finally, the study shows that ambient light and the transcription factors that transduce it into gene expression changes are required to stabilize a learned olfactory behavior, suggesting a physiological function for this mechanism.

      Weaknesses:

      The study implies but does not show that the effects of ambient light on stabilizing a learned olfactory behavior are through the described pathway. To show this clearly, the authors should determine whether ambient light has any effect on mutants lacking CYP-14A5, ZIP-2, or CEBP-2. Other minor edits to the text and figures are suggested.

      We appreciate the reviewer’s comment. Our study indeed implies that ambient light stabilizes learned olfactory behavior through effects on the described pathway. Importantly, the existing data already address this point. Mutants lacking CYP-14A5, ZIP-2, or CEBP-2 display impaired olfactory memory even when exposed to ambient light, indicating that these genes are required for the behavioral effect of light. Consistent with this, ambient light robustly induces cyp-14A5p::GFP in wild-type animals but fails to do so in zip-2 and cebp-2 mutants, demonstrating that light-dependent transcriptional activation is blocked upstream in these pathway mutants. Together, these results support the conclusion that ambient light acts through the ZIP-2 → CEBP-2 → CYP-14A5 pathway to stabilize memory. Minor textual and figure revisions have been made where helpful to clarify this point.

      Reviewer #3 (Public review):

      Ji et al. report a novel and interesting light-induced transcriptional response pathway in the eyeless roundworm Caenorhabditis elegans that involves a cytochrome P450 family protein (CYP-14A5) and functions independently from previously established photosensory mechanisms. Although the exact mechanisms underlying photoactivation of this pathway remain unclear, light-dependent induction of CYP-14A5 requires bZIP transcription factors ZIP-2 and CEBP-2 that have been previously implicated in worm responses to pathogens. The authors then suggest that light-induced CYP-14A5 activity in the C. elegans hypoderm can unexpectedly and cell-non-autonomously contribute to retention of an olfactory memory. Finally, the authors demonstrate the potential for this pathway to enable robust light-induced control of gene expression and behavior, albeit with some restrictions. Overall, the evidence supporting the claims of the authors is convincing, and the authors' work suggests numerous interesting lines of future inquiry.

      (1) The authors determine that light, but not several other stressors tested (temperature, hypoxia, and food deprivation), can induce transcription of cyp-14A5. The authors use these experiments to suggest the potential specificity of the induction of CYP-14A5 by light. Given the established relationship between light and oxidative stress and the authors' later identification of ZIP-2, testing the effect of an oxidative stressor or pathogen exposure on transcription of cyp-14A5 would further strengthen the validity of this statement and potentially shed some insight into the underlying mechanisms.

      We appreciate the reviewer’s thoughtful suggestion. We would like to clarify that the “specificity” we refer to is the strong and preferential induction of cyp-14A5 by light among pathogen or detoxification-related genes, rather than an assertion that cyp-14A5 is exclusively light-responsive. This does not preclude the possibility that cyp-14A5 can also be activated under other conditions. Indeed, prior work from the Troemel laboratory has identified cyp-14A5 as one of many pathogen-inducible genes, consistent with its role in stress physiology. Our data show that classical pathogen-responsive genes (e.g., irg-1) are not induced by light, whereas cyp-14A5 is strongly induced, highlighting the selective engagement of this cytochrome P450 by light under the conditions tested. We have revised the text to clarify this point.

      (2) The authors suggest that short-wavelength light more robustly increases transcription of cyp-14A5 compared to equally intense longer wavelengths (Figure 2F and 2G). Here, however, the authors report intensities in lux of wavelengths tested. Measurements of and reporting the specific spectra of the incident lights and their corresponding irradiances (ideally, in some form of mW/mm2 - see Ward et al., 2008, Edwards et al., 2008, Bhatla and Horvitz, 2015, De Magalhaes Filho et al., 2018, Ghosh et al., 2021, among others, for examples) is critical for appropriate comparisons across wavelengths and facilitates cross-checking with previous studies of C. elegans light responses. On a related and more minor note, the authors place an ultraviolet shield in front of a visible light LED to test potential effects of ultraviolet light on transcription of cyp-14A5. A measurement of the spectrum of the visible light LED would help confirm if such an experiment was required. Regardless, the principal conclusions the authors made from these experiments will likely remain unchanged.

      Thank you. We have revised the text to clarify this point. “Using controlled light versus dark conditions, we confirmed the finding from an integrated cyp-14A5p::GFP reporter and observed its robust widespread GFP expression in many tissues induced by moderate-intensity (500-3000 Lux, 16-48 hr duration) LED light exposure (Fig. 1A). The photometric Lux range is approximately 0.1–0.60 mW/cm<sup>2</sup> in radiometric (total radiant power) metric given the spectrum of the LED light source.”
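The photometric-to-radiometric conversion quoted above can be sketched as follows. This is a minimal illustration, not the authors' calculation: the luminous efficacy of radiation (`efficacy_lm_per_w`) depends on the LED spectrum, and the default of 500 lm/W is an assumption back-calculated from the quoted range (500-3000 lux corresponding to roughly 0.1-0.6 mW/cm<sup>2</sup>), not a value stated in the response.

```python
def lux_to_mw_per_cm2(lux, efficacy_lm_per_w=500.0):
    """Convert illuminance (lux = lm/m^2) to irradiance in mW/cm^2.

    efficacy_lm_per_w is the luminous efficacy of radiation of the source.
    The default is an assumed value chosen to reproduce the quoted range;
    the true value must come from the measured LED spectrum.
    """
    watts_per_m2 = lux / efficacy_lm_per_w  # lm/m^2 divided by lm/W gives W/m^2
    return watts_per_m2 * 0.1               # 1 W/m^2 equals 0.1 mW/cm^2


for lux in (500, 3000):
    print(lux, "lux ->", round(lux_to_mw_per_cm2(lux), 3), "mW/cm^2")
```

Because the efficacy differs strongly between narrow-band sources (for example blue versus green LEDs), equal lux values do not imply equal irradiance across wavelengths, which is the reviewer's reason for requesting radiometric units.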

      (3) The authors report an interesting observation that animals exposed to ambient light (~600 lux) exhibit significantly increased memory retention compared to those maintained in darkness (Figure 4). Furthermore, light deprivation within the first 2-4 hours after learning appears to eliminate the effect of light on memory retention. These processes depend on CYP-14A5, loss of which can be rescued by re-expression of cyp-14A5 in mutant animals using a hypoderm-specific- and non-light-inducible- promoter. Taken together, the authors argue convincingly that hypodermal expression of cyp-14A5 can contribute to the retention of the olfactory memory. More broadly, these experiments suggest that cell-non-autonomous signaling can enhance retention of olfactory memory. How retention of the olfactory memory is enhanced by light generally remains unclear. In addition, the authors' experiments in Figure 1B demonstrate - at least by use of the transcriptional reporter - that light-dependent induction of cyp-14A5 transcription at 500 - 1000 lux is minimal and especially so at short duration exposures. Additional experiments, including verification of light-dependent changes in CYP-14A5 levels in the olfactory memory behavioral setup, would help further interpret these otherwise interesting results.

      We thank the reviewer for these thoughtful comments. We agree that understanding how light enhances memory retention at a mechanistic level is an important direction for future work. Regarding the light intensities used in Figure 1B, we would like to clarify that 500–1000 lux does produce a measurable and statistically significant induction of cyp-14A5p::GFP, although the magnitude is lower than that observed at higher intensities. We interpret this modest induction as physiologically relevant: intermediate light levels appear sufficient to engage the CYP-14A5–dependent program required for memory stabilization, whereas stronger light intensities are detrimental to learning and reduce behavioral performance. Thus, the behavioral paradigm uses a light regime that activates the pathway without introducing stress-associated confounders.

      (4) The experiments in Figure 4 nicely validate the usage of the cyp-14A5 promoter as a potential tool for light-dependent induction of gene expression. Despite the limitations of this tool, including those presented by the authors, it could prove useful for the community.

      Thank you and we agree. In addition, we have included in the revised manuscript the single-copy integration strains based on UAS-GAL4 that produced similar results as transgenic strains and will be even more flexible and useful for the community.

      Recommendations for the authors:

      Reviewing Editor Comments:

      While appreciating the quality and presentation of this important study, we had two major concerns that the authors need to address.

      (1) Bacteria-versus-worm origin:

      To rule out a bacterially derived stimulus, we suggest testing whether cyp-14A5p::GFP is inducible without bacteria (or killed bacteria). Checking whether the canonical immune reporters irg-5p::GFP and gst-4p::GFP are also light-inducible will further clarify this point.

      We have now performed the key experiment requested by the reviewers. Interesting new data (Fig. S1I) show that light induction of cyp-14A5p::GFP requires live bacteria that maintain a non-starved physiological state. Neither plates without food nor plates with heat-killed OP50 support robust induction. Importantly, this requirement does not alter any of the central conclusions of the study. Rather, it reveals an intriguing mechanistic layer, namely, that bacterial metabolic activity influences the animal’s sensitivity to environmental light. We are pursuing this host–microbe interaction in a separate study. In the present work, we focus on the regulation and functional significance of cyp-14A5 under standard laboratory conditions with live OP50.

      We included the data (Fig. 2D) to show that the canonical immune reporter irg-1p::GFP is not induced by the light condition that robustly induced cyp-14A5p::GFP, and gst-4p::GFP is only very mildly induced (Fig. S1J).

      (2) Pathway-behaviour link:

      The behavioural relevance of the newly described pathway is intriguing, but it needs direct support. Ideally, this would require comparing memory in WT, zip-2-/-, cebp-2-/-, and cyp-14A5-/- under both dark and light conditions. But at the very least, it would require testing if constitutive CYP-14A5 rescue in the dark bypasses the requirement of light.

      We respectfully submit that additional experiments are not required to support the behavioral conclusions. Our model posits that cyp-14A5 is required but not sufficient for memory stabilization, one component within a broader set of light-induced genes. Thus, constitutive hypodermal expression of cyp-14A5 would not be expected to bypass the requirement for ambient light. The existing data are fully consistent with this framework and conclusions of the paper.

      Reviewer #1 (Recommendations for the authors):

      Overall, I think this paper is interesting to the field of C. elegans researchers at a minimum, as a light-inducible gene expression system might have a variety of uses throughout the diverse research paradigms that use this model system. With that said, I have a couple of suggestions that I think would substantially impact the ability to interpret these findings, which might be useful for broader implications of the study.

      (1) Most importantly, the supplemental table of RNA-seq data should likely be updated and discussed further beyond the cyp-14A5 findings. First, the authors report 7,902 genes are differentially expressed in response to light and then break these into upregulated and downregulated genes. But there are only 1,785 upregulated genes and 3,632 downregulated genes. This adds up to 5,417 genes, but doesn't match the 7,902 genes reported to change, and I could not find in the text if some other filters were applied that might explain this not adding up.

      Thank you for this helpful comment. We agree that the exact numbers depend on statistical thresholds and are therefore somewhat arbitrary. To avoid implying unwarranted precision, we have revised the text to state that “thousands of genes are differentially regulated by light.”

      (2) Among the upregulated genes in response to light are irg-5, irg-4, irg-6, irg-8, and gst-4. Indeed, all of these well-studied genes (or most) show even more induction by light than cyp-14A5. It is my opinion that this result needs further criticism as there are existing GFP reporters for gst-4 and irg-5 that are similarly well studied to irg-1, which is in the paper (and is not upregulated). In my opinion, the authors should test if they see activation of the irg-4 and gst-4 GFP reporters by light as well. This would not only validate their RNA-seq but might provide more important evidence for the field, as these other reporters are not considered light-inducible previously. If they are, several major studies might be impacted by this.

      Thank you for the comments. We have irg-1p::GFP and gst-4p::GFP in the lab but did not find other reporters for the genes mentioned from CGC. Neither of the two reporters showed light induction (Figs. 2D and S1J) as strongly as cyp-14A5p::GFP. It is possible that irg-1 and gst-4 RNA levels are up-regulated but not reflected in our transgenic reporters that used their promoters to drive GFP expression. Stronger light induction of cyp-14A5p::GFP is unlikely to be caused by the multi-copy nature of the transgene, since newly generated single-copy integration strains based on the UAS-GAL4 system produced similarly robust results for light induction (Fig. S1I and see Method).

      (3) Along the same lines, if at least 4 (and likely more) well characterized immune response genes are activated by light and these genes are known to mostly respond to differences in C. elegans bacterial food source/diet, then it stands to reason that maybe in this experimental context the light is not acting on "animals" at all, but rather triggering changes in E. coli (i.e. changing E. coli metabolism or pathogenicity like properties). If true, then perhaps the light affects bacteria in such a way that it activates a previously known bacterial pathogen response mechanism. This should be easy to test by seeing if this reporter is still activated by light in the presence of diverse bacterial diets, which are available from the CGC (CeMBio collection, for example). This is likely very important to the conclusions of the manuscript as it relates to animals sensing light, but might not be as important to the use of this system as a tool.

      Thank you for the insightful questions and suggestions. Interesting new data (Fig. S1I) show that light induction of cyp-14A5p::GFP requires live bacteria that maintain a non-starved physiological state. Neither plates without food nor plates with heat-killed OP50 support robust induction. Importantly, this requirement does not alter any of the central conclusions of the study. Rather, it reveals an intriguing mechanistic layer, namely, that bacterial metabolic activity influences the animal’s sensitivity to environmental light. We are pursuing this host–microbe interaction in a separate study. In the present work, we focus on the regulation and functional significance of cyp-14A5 under standard laboratory conditions with live OP50. We have revised the Results and Discussion to reflect the appropriate scope of our study and implications of the new findings.

      (4) Lastly, it seems unlikely that nearly half the C. elegans genome is transcriptionally regulated by light (or nearly half of the detected genes in the RNA-seq results). It seems likely that this list of 7,902 genes contains false positives. I would suggest upping some sort of filter, like moving to padj < 0.01 instead of 0.05, or adding a 4-fold change filter (2-fold and 0.01 still results in near 5000+ genes changing, which might explain the difference in up and down genes just being due to different padj filters). Along these lines, it is worth noting that the padj is generated using DESeq2 it appears and one of the first assumptions of DESeq2 is that the median expressed genes do not change, and there is a normalization. However, if MOST genes do change in expression, then one of the fundamental assumptions of DESeq2 is not valid, and thus would mean it might not be an appropriate analysis tool - perhaps there is some other normalization that could be done before running DESeq2 due to some other noise present in the RNA-seq runs?

      Thank you for this helpful comment. We agree that the exact numbers depend on statistical thresholds and are therefore somewhat arbitrary. To avoid implying unwarranted precision, we have revised the text to state that “thousands of genes are differentially regulated by light.”
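The threshold dependence raised in comment (4) can be illustrated with a small sketch. Everything here is hypothetical: the gene names and values are invented, and `count_de` is an illustrative helper, not part of DESeq2 or of the authors' pipeline. It only shows how up- and down-regulated counts shrink as the adjusted p-value and fold-change filters are tightened.

```python
import math

# Hypothetical DESeq2-style rows: (gene, log2 fold change, adjusted p-value).
# All values are invented for illustration.
rows = [
    ("geneA", 3.1, 1e-8),
    ("geneB", 1.2, 0.003),
    ("geneC", -2.4, 0.0004),
    ("geneD", 0.4, 0.04),
    ("geneE", -1.1, 0.02),
    ("geneF", 2.2, 0.2),
]


def count_de(rows, padj_max, min_fold):
    """Return (n_up, n_down) passing padj < padj_max and |fold change| >= min_fold."""
    min_lfc = math.log2(min_fold)
    up = sum(1 for _, lfc, padj in rows if padj < padj_max and lfc >= min_lfc)
    down = sum(1 for _, lfc, padj in rows if padj < padj_max and lfc <= -min_lfc)
    return up, down


for padj_max, min_fold in [(0.05, 1), (0.05, 2), (0.01, 2), (0.01, 4)]:
    print(padj_max, min_fold, count_de(rows, padj_max, min_fold))
```

Reporting the up and down counts together with the exact filter that produced them avoids the kind of mismatch noted in comment (1), where the subtotals and the overall total appear to come from different thresholds.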

      (5) Minor point - I would delete the reference to ER in line 92. While most CYPs do localize to the ER, the images shown are not clearly ER and probably do not have enough resolution to make claims about subcellular localization. To me, it would be easier to just delete this claim as it is not required for the main claims of the manuscript.

      Reference deleted.

      Reviewer #2 (Recommendations for the authors):

      I have one request for clarification that likely requires additional data. Figure 3 shows that ambient light stabilizes learned changes to chemotaxis and further shows that CYP-14A5 has a similar function. The implication is that light promotes CYP-14A5 expression, which somehow promotes memory consolidation. The authors should test whether memory consolidation in cyp-14A5, zip-2, or cebp-2 mutants is no longer affected by ambient light.

      It is also possible to test whether forced expression of CYP-14A5 can bypass the effect of 'no light' conditions on memory consolidation.

      Thank you for the comments. We respectfully submit that additional experiments are not required to support the behavioral conclusions. Our model posits that cyp-14A5 is required but not sufficient for memory stabilization, one component within a broader set of light-induced genes. Thus, constitutive hypodermal expression of cyp-14A5 would not be expected to bypass the requirement for ambient light. The existing data are fully consistent with this framework and conclusions of the paper.

      I have several minor suggestions relating to the text and figures.

      (1) In the introduction, the authors assert that little is known about non-visual light sensing and then list many examples of molecular mechanisms of non-visual light-sensing. They should emphasize that non-visual light sensing is important and accomplished by diverse molecular mechanisms.

      Agree and revised accordingly.

      (2) Check spacing between gene names (line 109).

      Corrected.

      (3) There should be a new paragraph break when the uORF experiments are described (line 146).

      Corrected.

      (4) 'Phenoptosis' is an esoteric word. Please define it (line 206).

      Corrected.

      (5) 'p' in the transgene name cyp-14A5p::nlp-22 is in italics, unlike the rest of the manuscript.

      Corrected.

      (6) 'Acknowledgment' should be 'Acknowledgments' (line 384).

      Corrected.

      (7) The color map in panel 1B should have units.

      The color map is in arbitrary units (now added), to highlight relative rather than absolute differences.

      (8) In panel 1E, it is confusing to have 'DARK' denoted by reddish bars and 'LIGHT' denoted by bluish bars. Perhaps 'DARK' is black/dark grey and 'LIGHT' is white?

      Corrected.

      (9) In panel 1D, it takes a minute to find the purple diamond. Please mark up the volcano plot to make it easier.

      Corrected.

      Reviewer #3 (Recommendations for the authors):

      The authors generally present convincing experiments detailing interesting results in a well-written manuscript.

      One quick note: the same Bhatla and Horvitz (2015) papers appear to be cited twice [line 52].

      Corrected.

    1. Author response:

      The following is the authors’ response to the latest reviews:

      "One remaining question is the interpretation of matching variants with very low stable posterior probabilities (~0), which the authors have analyzed in detail but without fully conclusive findings. I agree with the authors that this event is relatively rare and the current sample size is limited but this might be something to keep in mind for future studies."

      Fine-mapping stability on matching variants with very low stable posterior probability

      We thank Reviewer 2 for encouraging us to think more about how low stable posterior probability matching variants can be interpreted. We describe a few plausible interpretations, even though – as Reviewer 2 and we have both acknowledged – our present experiments do not point to a clear and conclusive account.

      One explanation is that the locus captured by the variant might not be well-resolved, in the sense that many correlated variants exist around the locus. Thus, the variant itself is unlikely causal, but the set of variants in high LD with it may contain the true causal variant, or it's possible that the causal variant itself was not sequenced but lies in that locus. A comparison of LD patterns across ancestries at the locus would be helpful here.

      Another explanation rests on the following observation. For a variant to be matching between top and stable PICS and to also have very small stable PP, it has to have the largest PP after residualization on the ALL slice but also have positive PP with gene expression on many other slices. In other words, failing to control for potential confounders shrinks the PP. If one assumes that the matching variant is truly causal, then our observation points to an example of negative confounding (aka suppressor effect). This can occur when the confounders (PCs) are correlated with allele dosage at the causal variant in a different direction than their correlation with gene expression, so that the crude association between unresidualized gene expression and causal variant allele dosage is biased toward 0.
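The negative-confounding (suppressor) scenario described above can be sketched with a toy simulation. This is an illustration under strong simplifying assumptions, not the authors' analysis: a single continuous confounder stands in for the PCs, allele dosage is treated as continuous, all effect sizes are invented, and regression slopes are used in place of PICS posterior probabilities.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v, c):
    """Remove the linear effect of confounder c from v."""
    b = slope(c, v)
    mv = sum(v) / len(v)
    mc = sum(c) / len(c)
    return [vi - mv - b * (ci - mc) for vi, ci in zip(v, c)]


rng = random.Random(0)
n = 20000
# Confounder (stand-in for an ancestry PC).
c = [rng.gauss(0, 1) for _ in range(n)]
# Dosage is positively correlated with the confounder.
g = [ci + rng.gauss(0, 1) for ci in c]
# True causal effect of dosage is +0.5; the confounder pushes expression down.
y = [0.5 * gi - 1.0 * ci + rng.gauss(0, 1) for gi, ci in zip(g, c)]

crude = slope(g, y)                                       # ignores the confounder
adjusted = slope(residualize(g, c), residualize(y, c))    # after residualization
print("crude slope:", round(crude, 3))
print("adjusted slope:", round(adjusted, 3))
```

With these invented parameters the expected crude slope is 0.5 - 1.0 * cov(c, g) / var(g) = 0.5 - 0.5 = 0, while the adjusted slope recovers the simulated effect of 0.5, which is the pattern of a crude association biased toward zero.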

      Although our present study does not allow us to systematically confirm either interpretation – since we found that matching variants were depleted in causal variants in our simulations, violating the second argument, but we also found functional enrichment in analyses of GEUVADIS data though only 17 matching variants with low stable PP were reported – we believe a larger-scale study using larger cohort sizes (at least 1000 individuals per ancestry) and many more simulations (to increase yield of such cases) would be insightful.

      ———

      The following is the authors’ response to the original reviews:

      Reviewer #1:

      Major comments:

      (1) It would be interesting to see how much fine-mapping stability can improve the fine-mapping results in cross-population. One can simulate data using true genotype data and quantify the amount the fine-mapping methods improve utilizing the stability idea.

      We agree, and have performed simulation studies where we assume that causal variants are shared across populations. Specifically, by mirroring the simulation approach described in Wang et al. (2020), we generated 2,400 synthetic gene expression phenotypes across 22 autosomes, using GEUVADIS gene expression metadata (i.e., gene transcription start site) to ensure largely cis expression phenotypes were simulated. We additionally generated 1,440 synthetic gene expression phenotypes that incorporate environmental heterogeneity, to motivate our pursuit of fine-mapping stability in the first place (see Response to Reviewer 2, Comment 6). These are described in Results section “Simulation study”:

      We evaluated the performance of the PICS algorithm, specifically comparing the approach incorporating stability guidance against the residualization approach that is more commonly used — similar to our application to the real GEUVADIS data. We additionally investigated two ways of “combining” the residualization and stability guidance approaches: (1) running stability-guided PICS on residualized phenotypes; (2) prioritizing matching variants returned by both approaches. See Response to Reviewer 2, Comment 5.

      (2) I would be very interested to see how other fine-mapping methods (FINEMAP, SuSiE, and CAVIAR) perform via the stability idea.

      Thank you for this valuable comment. We ran SuSiE on the same set of simulated datasets. Specifically, we ran a version that uses residualized phenotypes (supposedly removing the effects of population structure), and also a version that incorporates stability. The second version is similar to how we incorporate stability in PICS. We investigated the performance of Stable SuSiE in a similar manner to our investigation of PICS. First we compared the performance relative to SuSiE that was run on residualized phenotypes. Motivated by our finding in PICS that prioritizing matching variants improves causal variant recovery, we did the same analysis for SuSiE. This analysis is described in Results section “Stability guidance improves causal variant recovery in SuSiE.”

      We reported overall matching frequencies and causal variant recovery rates of top and stable variants for SuSiE in Figures 2C&D.

      Frequencies with which Stable and Top SuSiE variants match, stratified by the simulation parameters, are summarized in Supplementary File 2C (reproduced for convenience in Response to Reviewer 2, Comment 3). Causal variant recovery rates split by the number of causal variants simulated, and stratified by both signal-to-noise ratio and the number of credible sets included, are reported in Figure 2—figure supplements 16-18. We reproduce Figure 2—figure supplement 18 (three causal variants scenario) below for convenience. Analogous recovery rates for matching versus non-matching top or stable variants are reported in Figure 2—figure supplements 19, 21 and 23.

      (3) I am a little bit concerned about the PICS's assumption about one causal variant. The authors mentioned this assumption as one of their method limitations. However, given the utility of existing fine-mapping methods (FINEMAP and SuSiE), it is worth exploring this domain.

      Thank you for raising this fair concern. We explored this domain, by considering simulations that include two and three causal variants (see Response to Reviewer 2, Comment 3). We looked at how well PICS recovers causal variants, and found that each potential set largely does not contain more than one causal variant (Figure 2—figure supplements 20 and 22). This can be explained by the fact that PICS potential sets are constructed from variants with a minimum linkage disequilibrium to a focal variant. On the other hand, in SuSiE, we observed multiple causal variants appearing in lower credible sets when applying stability guidance (Figure 2—figure supplements 21 and 23). A more extensive study involving more fine-mapping methods and metrics specific to violation of the one causal variant assumption could be pursued in future work.

      Reviewer #2:

      Aw et al. present a new stability-guided fine-mapping method by extending the previously proposed PICS method. They applied their stability-based method to fine-map cis-eQTLs in the GEUVADIS dataset and compared it against what they call residualization-based method. They evaluated the performance of the proposed method using publicly available functional annotations and claimed the variants identified by their proposed stability-based method are more enriched for these functional annotations.

      While the reviewer acknowledges the contribution of the present work, there are a couple of major concerns as described below.

      Major:

      (1) It is critical to evaluate the proposed method in simulation settings, where we know which variants are truly causal. While I acknowledge their empirical approach using the functional annotations, a more unbiased, comprehensive evaluation in simulations would be necessary to assess its performance against the existing methods.

      Thank you for this point. We agree. We have performed a simulation study where we assume that causal variants are shared across populations (see response to Reviewer 1, Comment 1). Specifically, by mirroring the simulation approach described in Wang et al. (2020), we generated 2,400 synthetic gene expression phenotypes across 22 autosomes, using GEUVADIS gene expression metadata (i.e., gene transcription start site) to ensure cis expression phenotypes were simulated.

      (2) Also, simulations would be required to assess how the method is sensitive to different parameters, e.g., LD threshold, resampling number, or number of potential sets.

      Thank you for raising this point. The underlying PICS algorithm was not proposed by us, so we followed the default parameters set (LD threshold, r<sup>2</sup> = 0.5; see Taylor et al., 2021 Bioinformatics) to focus on how stability considerations will impact the existing fine-mapping algorithm. We attempted to derive the asymptotic joint distribution of the p-values, but it was too difficult. Hence, we used 500 permutations because such a large number would allow large-sample asymptotics to kick in. However, following your critical suggestion we varied the number of potential sets in our analyses of simulated data. We briefly mention this in the Results.

      “In the Supplement, we also describe findings from investigations into the impact of including more potential sets on matching frequency and causal variant recovery…”

      A detailed write-up is provided in Supplementary File 1 Section S2 (p.2):

      “The number of credible or potential sets is a parameter in many fine-mapping algorithms. Focusing on stability-guided approaches, we consider how including more potential sets for stable fine-mapping algorithms affects both causal variant recovery and matching frequency in simulations…

      Causal variant recovery. We investigate both Stable PICS and Stable SuSiE. Focusing first on simulations with one causal variant, we observe a modest gain in causal variant recovery for both Stable PICS and Stable SuSiE, most noticeably when the number of sets was increased from 1 to 2 under the lowest signal-to-noise ratio setting…”

      We observed that increasing the number of potential sets helps with recovering causal variants for Stable PICS (Figure 2—figure supplements 13-15). This observation also accounts for the comparable power that Stable PICS has with SuSiE in simulations with low signal-to-noise ratio (SNR), when we increase the number of credible sets or potential sets (Figure 2—figure supplements 10-12).

      (3) Given the previous studies have identified multiple putative causal variants in both GWAS and eQTL, I think it's better to model multiple causal variants in any modern fine-mapping methods. At least, a simulation to assess its impact would be appreciated.

      We agree. In our simulations we considered up to three causal variants in cis, and evaluated how well the top three Potential Sets recovered all causal variants (Figure 2—figure supplements 13-15; Figure 2—figure supplement 15). We also reported the frequency of variant matches between Top and Stable PICS stratified by the number of causal variants simulated in Supplementary File 2B and 2C. Note Supplementary File 2C is for results from SuSiE fine-mapping; see Response to Reviewer 1, Comment 2.

      Supplementary File 2B. Frequencies with which Stable and Top PICS have matching variants for the same potential set. For each SNR/ “No. Causal Variants” scenario, the number of matching variants is reported in parentheses.

      Supplementary File 2C. Frequencies with which Stable and Top SuSiE have matching variants for the same credible set. For each SNR/ “No. Causal Variants” scenario, the number of matching variants is reported in parentheses.

      (4) Relatedly, I wonder what fraction of non-matching variants are due to the lack of multiple causal variant modeling.

      PICS handles multiple causal variants by including more potential sets to return, owing to the important caveat that causal variants in high LD cannot be statistically distinguished. For example, if one believes there are three causal variants that are not too tightly linked, one could make PICS return three potential sets rather than just one. To answer the question using our simulation study, we subsetted our results to just scenarios where the top and stable variants do not match. This mimics the exact scenario of having modeled multiple causal variants but still not yielding matching variants, so we can investigate whether these non-matching variants are in fact enriched in the true causal variants.

      Because we expect causal variants to appear in some potential set, we specifically considered whether these non-matching causal variants might match along different potential sets across the different methods. In other words, we compared the stable variant with the top variant from another potential set for the other approach (e.g., Stable PICS Potential Set 1 variant vs Top PICS Potential Set 2 variant). First, we computed the frequency with which such pairs of variants match. A high frequency would demonstrate that, even if the corresponding potential sets do not have a variant match, there could still be a match between non-corresponding potential sets across the two approaches, which shows that multiple causal variant modeling boosts identification of matching variants between both approaches — regardless of whether the matching variant is in fact causal.

      Low frequencies were observed. For example, when restricting to simulations where Top and Stable PICS Potential Set 1 variants did not match, about 2-3% of variants matched between the Potential Set 1 variant in Stable PICS and Potential Sets 2 and 3 variants in Top PICS; or between the Potential Set 1 variant in Top PICS and Potential Sets 2 and 3 variants in Stable PICS (Supplementary File 2D). When looking at non-matching Potential Set 2 or Potential Set 3 variants, we do see an increase in matching frequencies (between 10-20%) between Potential Set 2 variants and other potential set variants between the different approaches. However, these percentages are still small compared to the matching frequencies we observed between corresponding potential sets (e.g., for simulations with one causal variant this was 70-90% between Top and Stable PICS Potential Set 1, and for simulations with two and three causal variants this was 55-78% and 57-79% respectively).

      We next checked whether these “off-diagonal” matching variants corresponded to the true causal variants simulated. Here we find that the causal variant recovery rate is mostly less than the corresponding rate for diagonally matching variants, which together with the low matching frequency suggests that the enrichment of causal variants of “off-diagonal” matching variants is much weaker than in the diagonally matching approach. In other words, the fraction of non-matching (causal) variants due to the lack of multiple causal variant modeling is low.
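      The diagonal and "off-diagonal" matching frequencies described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name, the input layout (one list of variant IDs per simulation, one entry per potential set) and the toy data are all assumptions made for the example.

```python
from itertools import product


def match_frequencies(stable_sets, top_sets):
    """Return {(i, j): frequency with which Stable set i equals Top set j}.

    Hypothetical layout: stable_sets[k][i] is the variant that Stable PICS
    reports for potential set i in simulation k (same for top_sets).
    Off-diagonal pairs (i != j) are computed only over simulations where
    the corresponding sets (i vs i) do NOT match, as in the text.
    """
    n_sets = len(stable_sets[0])
    freqs = {}
    for i, j in product(range(n_sets), repeat=2):
        pairs = list(zip(stable_sets, top_sets))
        if i != j:
            # Keep only simulations without a diagonal match for set i
            pairs = [(s, t) for s, t in pairs if s[i] != t[i]]
        hits = sum(s[i] == t[j] for s, t in pairs)
        freqs[(i, j)] = hits / len(pairs) if pairs else float("nan")
    return freqs


# Toy data: 4 simulations, 3 potential sets each (variant IDs are made up)
stable = [["rs1", "rs2", "rs3"], ["rs4", "rs5", "rs6"],
          ["rs7", "rs8", "rs9"], ["rs10", "rs11", "rs12"]]
top = [["rs1", "rs2", "rs3"], ["rs5", "rs4", "rs6"],
       ["rs7", "rs8", "rs9"], ["rsX", "rs11", "rs10"]]
freqs = match_frequencies(stable, top)
```

      Swapping the two arguments gives the other direction (Top set i against Stable set j). Checking whether matched variants are causal would additionally require the list of simulated causal variants, which is omitted here.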

      We discuss these findings in Supplementary File 1 Section S2 (bottom of p.2).

      (5) I wonder if you can combine the stability-based and the residualization-based approach, i.e., using the residualized phenotypes for the stability-based approach. Would that further improve the accuracy or not?

      This is a good idea, thank you for suggesting it. We pursued this combined approach on simulated gene expression phenotypes, but did not observe significant gains in causal variant recovery (Figure 2B; Figure 2—figure supplements 2, 13 and 15). We reported this in Results, “Searching for matching variants between Top PICS and Stable PICS improves causal variant Recovery.”

      “We thus explore ways to combine the residualization and stability-driven approaches, by considering (i) combining them into a single fine-mapping algorithm (we call the resulting procedure Combined PICS); and (ii) prioritizing matching variants between the two algorithms. Comparing the performance of Combined PICS against both Top and Stable PICS, however, we find no significant difference in its ability to recover causal variants (Figure 2B)...”

      However, we also confirmed in our simulations that prioritizing matching variants between the two approaches led to gains in causal variant recovery (Figure 2D; Figure 2—figure supplements 4, 19, 20 and 22). We reported this in Results, “Searching for matching variants between Top PICS and Stable PICS improves causal variant Recovery.”

      “On the other hand, matching variants between Top and Stable PICS are significantly more likely to be causal. Across all simulations, a matching variant in Potential Set 1 is 2.5X as likely to be causal than either a non-matching top or stable variant (Figure 2D) — a result that was qualitatively consistent even when we stratified simulations by SNR and number of causal variants simulated (Figure 2—figure supplements 19, 20 and 22)...”

      This finding is consistent with our analysis of real GEUVADIS gene expression data, where we reported larger functional significance of matching variants relative to non-matching variants returned by either Top or Stable PICS.

      (6) The authors state that confounding in cohorts with diverse ancestries poses potential difficulties in identifying the correct causal variants. However, I don't see that they directly address whether the stability approach is mitigating this. It is hard to say whether the stability approach is helping beyond what simpler post-hoc QC (e.g., thresholding) can do.

      Thank you for raising this fair point. Here is a model we have in mind. Gene expression phenotypes (Y) can be explained by both genotypic effects (G, as in genotypic allelic dosage) and the environment (E): Y = G + E. However, both G and E depend on ancestry (A), so that Y = G|A+E|A. Suppose that the causal variants are shared across ancestries, so that (G|A=a)=G for all ancestries a. Suppose however that environments are heterogeneous by ancestry: (E|A=a) = e(a) for some function e that depends non-trivially on a. This would violate the exchangeability of exogenous E in the full sample, but by performing fine-mapping on each ancestry stratum, the exchangeability of exogenous E is preserved. This provides theoretical justification for the stability approach.
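      The intuition behind this model can be illustrated with a small simulation. The sketch below is not taken from the study: the two ancestry labels, the allele frequencies (0.8 and 0.2), the mean shift of 2σ and the use of a non-causal variant (so that Y = E) are all illustrative assumptions. It shows that when E depends on ancestry, a variant whose frequency also differs by ancestry appears associated with Y in the pooled sample, while the association vanishes within each ancestry stratum.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_per_stratum=2000, sigma=1.0, seed=0):
    """Simulate a NON-causal variant and Y = E, with E shifted in one stratum."""
    rng = random.Random(seed)
    # Hypothetical strata: (allele frequency, mean of E)
    strata = {"a1": (0.8, 2 * sigma), "a2": (0.2, 0.0)}
    data = {}
    for ancestry, (freq, e_mean) in strata.items():
        # Allelic dosage in {0, 1, 2}: two independent allele draws
        dosage = [(rng.random() < freq) + (rng.random() < freq)
                  for _ in range(n_per_stratum)]
        y = [rng.gauss(e_mean, sigma) for _ in range(n_per_stratum)]
        data[ancestry] = (dosage, y)
    return data


data = simulate()
pooled_g = [g for dosage, _ in data.values() for g in dosage]
pooled_y = [y for _, ys in data.values() for y in ys]
pooled_r = pearson(pooled_g, pooled_y)  # inflated by ancestry confounding
within_r = {a: pearson(g, y) for a, (g, y) in data.items()}  # close to zero
```

      Under these assumptions the pooled correlation is driven entirely by between-stratum differences, which is the situation that stratifying by ancestry is meant to avoid.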

      We next turned to simulations, where we investigated 1,440 simulated gene expression phenotypes capturing various ways in which ancestry induces heterogeneity in the exogenous E variable (simulation details in Lines 576-610 of Materials and Methods). We ran Stable PICS, as well as a version of PICS that did not residualize phenotypes or apply the stability principle. We observed that (i) causal variant recovery performance was not significantly different between the two approaches (Figure 2—figure supplements 24-32); but (ii) disagreement between the approaches can be considerable, especially when the signal-to-noise ratio is low (Supplementary File 2A). For example, in a set of simulations with three causal variants, with SNR = 0.11 and E made heterogeneous by ancestry by drawing E from N(2σ,σ<sup>2</sup>) for GBR individuals only (the rest are drawn from N(0,σ<sup>2</sup>)), there was disagreement between Potential Set 1 and 2 variants in 25% of simulations, though recovery rates were similar (probability of recovering at least one causal variant: 75% for Plain PICS and 80% for Stable PICS). These points suggest that confounding in cohorts can reduce power in methods not adjusting or accounting for ancestral heterogeneity, but can be remedied by approaches that do so. We report this analysis in Results, “Simulations justify exploration of stability guidance.”

      In the current version of our work, we have evaluated, using both simulations and empirical evidence, different ways to combine approaches to boost causal variant recovery. Our simulation study shows that prioritizing matching variants across multiple methods improves causal variant recovery. On GEUVADIS data, where we might not know which variants are causal, we already demonstrated that matching variants are enriched for functional annotations. Therefore, our analyses justify that the adverse consequence of confounding on reducing fine-mapping accuracy can be mitigated by prioritizing matching variants between algorithms including those that account for stability.

      (7) For non-matching variants, I wonder what the difference of posterior probabilities is between the stable and top variants in each method. If the difference is small, maybe it is due to noise rather than signal.

      We have reported differences in posterior probabilities returned by Stable and Top PICS for GEUVADIS data; see Figure 3—figure supplement 1. For completeness, we compute the differences in posterior probabilities and summarize these differences both as histograms and as numerical summary statistics.

      Potential Set 1

      - Number of non-matching variants = 9,921

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 1.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 1.

      Potential Set 2

      - Number of non-matching variants = 14,454

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 2.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 2.

      Potential Set 3

      - Number of non-matching variants = 16,814

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 3.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 3.
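      For readers who wish to reproduce this kind of summary on their own fine-mapping output, a minimal sketch follows. The function name, the choice of summary statistics and the toy posterior probabilities are assumptions for illustration; they are not the values behind the tables and images above.

```python
import statistics


def summarize_differences(stable_pp, top_pp, n_bins=10):
    """Summarize (stable posterior - top posterior) for non-matching variants.

    Returns a dict of summary statistics and a list of histogram counts over
    [-1, 1], the possible range of a difference between two probabilities.
    """
    diffs = [s - t for s, t in zip(stable_pp, top_pp)]
    q1, median, q3 = statistics.quantiles(diffs, n=4)
    summary = {
        "min": min(diffs), "q1": q1, "median": median,
        "mean": statistics.fmean(diffs), "q3": q3, "max": max(diffs),
    }
    width = 2 / n_bins
    counts = [0] * n_bins
    for d in diffs:
        # Clamp so that a difference of exactly 1 falls in the last bin
        idx = min(int((d + 1) / width), n_bins - 1)
        counts[idx] += 1
    return summary, counts


# Toy posterior probabilities for four non-matching variant pairs
stable_pp = [0.2, 0.5, 0.9, 0.3]
top_pp = [0.6, 0.5, 0.1, 0.6]
summary, counts = summarize_differences(stable_pp, top_pp, n_bins=4)
```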

      We also compared the difference in posterior probabilities between non-matching variants returned by Stable PICS and Top PICS for our 2,400 simulated gene expression phenotypes. Focusing on just Potential Set 1 variants, we find two equally likely scenarios, as demonstrated by two distinct clusters of points in a “posterior probability-posterior probability” plot. The first is, as pointed out, a small difference in posterior probability (points lying close to y=x). The second, however, reveals stable variants with very small posterior probability (of order 4 x 10<sup>–5</sup> to 0.05) but with a non-matching top variant taking on posterior probability well distributed along [0,1]. Moving down to Potential Sets 2 and 3, the distribution of pairs of posterior probabilities appears less clustered, indicating less tendency for posterior probability differences to be small (Figure 2—figure supplement 8).

      Here are the histograms and numerical summary statistics.

      Potential Set 1

      - Number of non-matching variants = 663 (out of 2,400)

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 4.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 4.

      Potential Set 2

      - Number of non-matching variants = 1,429 (out of 2,400)

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 5.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 5.

      Potential Set 3

      - Number of non-matching variants = 1,810 (out of 2,400)

      - Table of Summary Statistics of (Stable Posterior Probability – Top Posterior Probability)

      Author response table 6.

      - Histogram of (Stable Posterior Probability – Top Posterior Probability)

      Author response image 6.

      (8) It's a bit surprising that you observed matching variants with (stable) posterior probability ~ 0 (SFig. 1). What are the interpretations for these variants? Do you observe functional enrichment even for low posterior probability matching variants?

      Thank you for this question. We have performed a thorough analysis of matching variants with very low stable posterior probability, which we define as having a posterior probability < 0.01 (Supplementary File 1 Section S11). Here, we briefly summarize the analysis and key findings.

      Analysis

      First, such variants occur very rarely: only 8 across all three potential sets in simulations, and 17 across all three potential sets for GEUVADIS (the latter variants are listed in Supplementary File 2E). We begin interpreting these variants by looking at allele frequency heterogeneity by ancestry, support size (defined as the number of variants with positive posterior probability in the ALL slice*), and the number of slices including the stable variant (i.e., the stable variant reported positive posterior probability for the slice).

      *Note that the stable variant posterior probability need not be at least 1/(Support Size). This is because the algorithm may have picked a SNP that has a lower posterior probability in the ALL slice (i.e., not the top variant) but happens to appear in the largest number of other slices (i.e., a stable variant).
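      The note above can be made concrete with a toy example. The sketch below is only a simplified reading of the selection rule described in this response, not the actual Stable PICS implementation: the function name, the dictionary layout, and the tie-breaking by ALL-slice posterior probability are assumptions.

```python
def pick_top_and_stable(slice_posteriors):
    """Return (top variant, stable variant, support size).

    slice_posteriors maps a slice name to {variant: posterior probability};
    the "ALL" slice is the full sample. Assumed rule: the stable variant is
    the ALL-slice variant with positive posterior probability in the largest
    number of slices, with ties broken by ALL-slice posterior probability.
    """
    all_pp = slice_posteriors["ALL"]
    support = [v for v, p in all_pp.items() if p > 0]

    def n_slices(variant):
        return sum(pp.get(variant, 0) > 0 for pp in slice_posteriors.values())

    top = max(support, key=lambda v: all_pp[v])
    stable = max(support, key=lambda v: (n_slices(v), all_pp[v]))
    return top, stable, len(support)


# Toy posteriors (made-up variant IDs): rs2 is weak in ALL but present everywhere
slices = {
    "ALL": {"rs1": 0.7, "rs2": 0.1, "rs3": 0.1, "rs4": 0.1},
    "slice_1": {"rs2": 1.0},
    "slice_2": {"rs2": 0.5, "rs3": 0.5},
}
top, stable, support_size = pick_top_and_stable(slices)
```

      In this toy case the stable variant has ALL-slice posterior probability 0.1, which is below 1/(Support Size) = 0.25, while the top variant is a different SNP.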

      For variants arising from simulations, because we know the true causal variants, we check if these variants are causal. For GEUVADIS fine-mapped variants, we rely on functional annotations to compare their relative enrichment against other matching variants that did not have very low stable posterior probability.

      Findings

      While we caution against generalizing from observations reported here, which are based on very small sample sizes, we noticed the following. In simulations, matching variants with very low stable posterior probability are largely depleted in causal variants, although factors such as the number of slices including the stable variant may still be useful. In GEUVADIS, however, these variants can still be functionally enriched. We reported three examples in Supplementary File 1 Section S11 (pp. 8-9 of Supplement), where the variants were enriched in either VEP or biologically interpretable functional annotations, and were also reported in earlier studies. We partially reproduce our report below for convenience.

      “However, we occasionally found variants that stand out for having large functional annotation scores. We list one below for each potential set.

      - Potential Set 1 reported the variant rs12224894 from fine-mapping ENSG00000255284.1 (accession code AP006621.3) in Chromosome 11. This variant stood out for lying in the promoter flanking region of multiple cell types and being relatively enriched for GC content with a 75bp flanking region. This variant has been reported as a cis eQTL for AP006632 (using whole blood gene expression, rather than lymphoblastoid cell line gene expression in this study) in a clinical trial study of patients with systemic lupus erythematosus (Davenport et al., 2018). Its nearest gene is GATD1, a ubiquitously expressed gene that codes for a protein and is predicted to regulate enzymatic and catabolic activity. This variant appeared in all 6 slices, with a moderate support size of 23.

      - Potential Set 2 reported the variant rs9912201 from fine-mapping ENSG00000108592.9 (mapped to FTSJ3) in Chromosome 17. Its FIRE score is 0.976, which is close to the maximum FIRE score reported across all Potential Set 2 matching variants. This variant has been reported as a SNP in high LD to a GWAS hit SNP rs7223966 in a pan-cancer study (Gong et al., 2018). This variant appeared in all 6 slices, with a moderate support size of 32.

      - Potential Set 3 reported the variant rs625750 from fine-mapping ENSG00000254614.1 (mapped to CAPN1-AS1, an RNA gene) in Chromosome 11. Its FIRE score is 0.971 and its B statistic is 0.405 (region under selection), which lie at the extreme quantiles of the distributions of these scores for Potential Set 3 matching variants with stable posterior probability at least 0.01. Its associated mutation has been predicted to affect transcription factor binding, as computed using several position weight matrices (Kheradpour and Kellis, 2014). This variant appeared in just 3 slices, possibly owing to the considerable allele frequency difference between ancestries (maximum AF difference = 0.22). However, it has a small support size of 4 and a moderately high Top PICS posterior probability of 0.64.

      To summarize, our analysis of GEUVADIS fine-mapped variants demonstrates that matching variants with very low stable posterior probability could still be functionally important, even for lower potential sets, conditional on supportive scores in interpretable features such as the number of slices containing the stable variant and the posterior probability support size…”

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This manuscript presents useful insights into the molecular basis underlying the positive cooperativity between the co-transported substrates (galactoside sugar and sodium ion) in the melibiose transporter MelB. Building on years of previous studies, this work improves on the resolution of previously published structures and reports the presence of a water molecule in the sugar binding site that would appear to be key for its recognition, introduces further structures bound to different substrates, and utilizes HDX-MS to further understand the positive cooperativity between sugar and the co-transported sodium cation. Although the experimental work is solid, the presentation of the data lacks clarity, and in particular, the HDX-MS data interpretation requires further explanation in both methodology and discussion, as well as a clearer description of the new insight that is obtained in relation to previous studies. The work will be of interest to biologists and biochemists working on cation-coupled symporters, which mediate the transport of a wide range of solutes across cell membranes.

      We express our gratitude to the associate editor, review editor, and reviewers for their favorable evaluation of this manuscript, as well as their constructive comments and encouragement. Their feedback has been integrated to fortify the evidence, refine the data analysis, and elevate the presentation of the results, thereby enhancing the overall quality and clarity of the manuscript.

      A brief summary of the modifications in this revision:

      (a) We performed four new experiments: 1) intact cell [<sup>3</sup>H]raffinose transport assay; 2) intact cell p-nitrophenol detection to demonstrate α-NPG transport; 3) ITC binding assay for the D59C mutant; and 4) molecular dynamics to simulate the water-1 in sugar-binding site and the dynamics of side chains in the Na<sup>+</sup>- and melibiose-binding pockets. All data consistently support the conclusion drawn in this article.

      (b) We have added a new figure to show the apo state dynamics (the new Fig. 5a,b) and annotated the amino acid residue positions and marked positions in sugar- or Na<sup>+</sup>-binding pockets.

      (c) As suggested by reviewer-3, we have moved the individual mapping of ligand effects on HDX data to the main figure, combined with the residual plots, and marked the amino-acid residue positions.

      (d) We have added more deuterium uptake plots to cover all residues in the sugar- or Na<sup>+</sup>-binding pockets in the current figure 7 (previously figure 6).

      (e) We have added a new figure 8 showing the positions at the well-studied cytoplasmic gating salt-bridge network and other loops likely important for conformational changes, along with a membrane topology marked with the HDX data. We have added a new figure 9 from MD simulations.

      Reviewer #1:

      While the structure of the melibiose permease in both outward and inward-facing forms has been solved previously, there remain unanswered questions regarding its mechanism. Hariharan et al set out to address this with further crystallographic studies complemented with ITC and hydrogen-deuterium exchange (HDX) mass spectrometry.

      (1) They first report 4 different crystal structures of galactose derivatives to explore molecular recognition, showing that the galactose moiety itself is the main source of specificity. Interestingly, they observe a water-mediated hydrogen bonding interaction with the protein and suggest that this water molecule may be important in binding.

      We thank you for understanding what we've presented in this manuscript.

      (2) The results from the crystallography appear sensible, though the resolution of the data is low, with only the structure with NPG better than 3Å. However, it is a bit difficult to understand what novel information is being brought out here and what is known about the ligands. For instance, are these molecules transported by the protein or do they just bind? They measure the affinity by ITC, but draw very few conclusions about how the affinity correlates with the binding modes. Can the protein transport the trisaccharide raffinose?

      The four structures with bound sugars of different sizes were used to identify the binding motif on both the primary substrate (sugar) and the transporter (MelB<sub>St</sub>). Although the resolutions of the structures complexed with melibiose, raffinose, or α-MG are relatively low, the size and shape of the densities in each structure are consistent with the corresponding sugar molecules, providing valuable data for confirming the pose of the bound sugar proposed previously. In this revision, we further refined the α-NPG-bound structure to 2.60 Å. The water-1 identified in this study further confirms the orientation of C4-OH. Notably, this transporter does not recognize or transport glucosides in which the orientation of the C4-OH at the glucopyranosyl ring is opposite. To verify the water in the sugar-binding site, we initiated a new collaborative study using MD simulations. Results showed that Wat-1 exhibited nearly full occupancy when melibiose was present, regardless of whether Na<sup>+</sup> was bound at the cation-binding site.

      As detailed in the Summary, we added two additional sets of transport assays and confirmed that raffinose and α-NPG are transportable substrates of MelB<sub>St</sub>. For α-NPG transport, we measured the end products of the process—enzyme hydrolysis and membrane diffusion of p-nitrophenol released from intracellular α-NPG.

      As a bonus, based on the WT-like downhill α-NPG transport activity by the D59C uniporter mutant that failed in active transport against a sugar concentration gradient, we further emphasized that the sugar translocation pathway is isolated from the cation-binding site. The new data strongly support the allosteric effects of cation binding on sugar-binding affinity. Thank you for this helpful suggestion.

      A meaningful analysis of ITC data depends heavily on the quality of the data. My laboratory has extensive experience with ITC and has gained rich, insightful mechanistic knowledge of MelB<sub>St</sub>. Unfortunately, because of the low affinity for raffinose and α-MG, no further information can be convincingly obtained. Therefore, we did not dissect the enthalpic and entropic contributions but focused on the Kd value and binding stoichiometry.

      (3) The HDX also appears to be well done; however, in the manuscript as written, it is difficult to understand how this relates to the overall mechanism of the protein and the conformational changes that the protein undergoes.

      We are sorry for not presenting our data clearly in the initial submission. In this revised manuscript, we have made numerous improvements, as described in the Summary. These enhancements in the HDX data analysis provided new mechanistic insights into the allosteric effects, leading us to conclude that protein dynamics and conformational transitions are coupled with sugar-binding affinity. Na<sup>+</sup> binding restricts protein conformational flexibility, thereby increasing sugar-binding affinity. The HDX study revealed that the major dynamic region includes a sugar-binding residue, Arg149, which also plays a gating role. Structurally, this dual-function residue undergoes significant displacement during the sugar-affinity-coupled conformational transition, thereby coupling the sugar binding and structural dynamics.

      Reviewer #2:

      This manuscript from Hariharan, Shi, Viner, and Guan presents x-ray crystallographic structures of membrane protein MelB and HDX-MS analysis of ligand-induced dynamics. This work improves on the resolution of previously published structures, introduces further sugar-bound structures, and utilises HDX to explore in further depth the previously observed positive cooperativity with the cotransported cation Na<sup>+</sup>. The work presented here builds on years of previous study and adds substantial new details into how Na<sup>+</sup> binding facilitates melibiose binding and deepens the fundamental understanding of the molecular basis underlying the symport mechanism of cation-coupled transporters. However, the presentation of the data lacks clarity, and in particular, the HDX-MS data interpretation requires further explanation in both methodology and discussion.

      We appreciate this reviewer's time in reading our previous articles related to this manuscript.

      Comments on Crystallography and biochemical work:

      (1) It is not clear what Figure 2 is comparing. The text suggests this figure is a comparison of the lower resolution structure to the structure presented in this work; however, the figure legend does not mention which is which, and both images include a modelled water molecule that was not assigned due to poor resolution previously, as stated by the authors, in the previously generated structure. This figure should be more clearly explained.

      This figure is a stereo view of a density map created in cross-eye style. In this revision, we changed this figure to Fig. 3 and showed only the density for sugar and water-1. 

      (2) It is slightly unclear what the ITC measurements add to this current manuscript. The authors comment that raffinose exhibiting poor binding affinity despite having more sugar units is surprising, but it is not surprising to me. No additional interactions can be mapped to these units on their structure, and while it fits into the substrate binding cavity, the extra bulk of additional sugar units is likely to reduce affinity. In fact, from their listed ITC measurements, this appears to be the trend. Additionally, the D59C mutant utilised here in structural determination is deficient in sodium/cation binding. The reported allostery of sodium-sugar binding will likely influence the sugar binding motif as represented by these structures. This is clearly represented by the authors' own ITC work. The ITC included in this work was carried out on the WT protein in the presence of Na<sup>+</sup>. The authors could benefit from clarifying how this work fits with the structural work or carrying out ITC with the D59C mutant, or additionally, in the absence of sodium.

      We thank this reviewer for the helpful suggestions. We have performed the suggested ITC measurements with the D59C mutant. The purpose of the ITC experiments was to demonstrate that MelB<sub>St</sub> can bind raffinose and α-MG to support the crystal structures.

      Comments on HDX-MS work:

      While the use of HDX-MS to deepen the understanding of ligand allostery is an elegant use of the technique, this reviewer advises the authors to refer to the Masson et al. (2019) recommendations for the HDX-MS article (https://doi.org/10.1038/s41592-019-0459-y) on how to best present this data. For example:

      All authors value this reviewer's comments and suggestions, which have been included in this revision.

      (1) The Methodology includes a lipid removal step. Based on other included methods, I assumed that the HDX-MS was being carried out in detergent-solubilised protein samples. I therefore do not see the need for a lipid removal step that is usually included for bilayer reconstituted samples. I note that this methodology is the same as previously used for MelB. It should be clarified why this step was included, if it was in fact used, aka, further details on the sample preparation should be included.

      Yes, a lipid/detergent removal step was included in this study and previous ones, and this information was clearly described in the Methods.

      (2) A summary of HDX conditions and results should be given as recommended, including the mean peptide length and average redundancy per state alongside other included information such as reaction temperature, sequence coverage, etc., as prepared for previous publications from the authors, i.e., Hariharan et al., 2024.

      We have updated Table S2 and addressed the reviewer’s request for details of the HDX experiments.

      (3) Uptake plots per peptide for the HDX-MS data should be included as supporting information outside of the few examples given in Figure 6.

      We have prepared and presented deuterium uptake time-course plots for any peptides with ΔD > threshold in Fig. S5a-c.

      (4) A reference should be given to the hybrid significance testing method utilised. Additionally, as stated by Hageman and Weis (2019) (doi:10.1021/acs.analchem.9b01325), the use of P < 0.05 greatly increases the likelihood of false positive ΔD identifications. While the authors include multiple levels of significance, what they refer to as high and lower significant results, this reviewer understands that working with dynamic transporters can lead to increased data variation; a statement of why certain statistical criteria were chosen should be included, and possibly accompanied by volcano plots. The legend of Figure 6 should include what P value is meant by * and ** rather than statistically significant and highly statistically significant.

      We appreciate this comment and have cited the suggested article on the hybrid significance method. We fully acknowledge that using a cutoff of P < 0.05 can increase the likelihood of false-positive identifications. By applying multiple levels of statistical testing, we determined that P < 0.05 is an appropriate threshold for this study. The threshold values were presented in the residual plots and explained in the text. For the previous Fig. 6 (renamed Fig. S4b in the current version), we have reported the P value. *, < 0.05; **, < 0.01. (The text for 0.01 was not visible in the previous version. Sorry for the confusion.)
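      To make the decision rule explicit, the sketch below shows one way a hybrid criterion of this kind can be applied: a peptide is flagged only if its |ΔD| exceeds a global threshold and its P value falls below the cutoff, with * and ** assigned at P < 0.05 and P < 0.01. This is an illustration under stated assumptions, not the analysis pipeline used in the study: the function name, the peptide labels, the threshold of 0.3 and the P values are all hypothetical, and the P values are taken as precomputed inputs.

```python
def classify_peptides(peptides, delta_d_threshold, alpha=0.05, alpha_high=0.01):
    """Label each peptide as 'ns', '*' or '**'.

    peptides: iterable of (name, delta_d, p_value), where delta_d is the
    difference in deuterium uptake between two states and p_value comes
    from a separate statistical test (not computed here).
    """
    labels = {}
    for name, delta_d, p_value in peptides:
        if abs(delta_d) <= delta_d_threshold or p_value >= alpha:
            labels[name] = "ns"   # fails the threshold or the P value cutoff
        elif p_value < alpha_high:
            labels[name] = "**"   # P < 0.01
        else:
            labels[name] = "*"    # 0.01 <= P < 0.05
    return labels


# Hypothetical peptides: (label, delta D, P value)
example = [
    ("peptide_A", -0.80, 0.004),  # large change, P < 0.01
    ("peptide_B", -0.45, 0.030),  # above threshold, P < 0.05
    ("peptide_C", -0.10, 0.001),  # small P but below the delta-D threshold
    ("peptide_D", 0.60, 0.200),   # large change but not significant
]
labels = classify_peptides(example, delta_d_threshold=0.3)
```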

      (5) Line 316 states that a significant difference is seen in dynamics; how is significance measured here? There is no S.D. given in Table S4. Can the authors further comment on the potential involvement in solvent accessibility and buried helices that might influence the overall dynamics outside of their role in sugar vs sodium binding? An expected low rate of exchange suggests that dynamics are likely influenced by solvent accessibility or peptide hydrophobicity. The increased dynamics at peptides covering the Na binding site on overall more dynamic helices suggests that there is no difference between the dynamics of each site.

      The current Table S3 (combined from previous Tables S3 and S4 as suggested) was prepared to provide an overall view of the dynamic regions with SD values provided. For other questions, if we understand correctly, this reviewer asked us to comment on the effects of solvent accessibility or hydrophobic regions on the overall dynamics outside the binding residues of the peptides that cover them. Since HDX rates are influenced by two linked factors: solvent accessibility and hydrogen-bonding interactions that reflect structural dynamics, poor solvent accessibility in buried regions should result in low deuterium uptakes. The peptides in our dataset that include the Na<sup>+</sup>-binding site showed lower HDX, likely due to limited solvent accessibility and lower structural stability. It is unclear what this reviewer meant by "increased dynamics at peptides covering the Na binding site on overall more dynamic helices." We did not observe increased dynamics in peptides covering the Na<sup>+</sup>-binding site; instead, all Na<sup>+</sup>-binding residues and nearby sugar-binding residues have lower degrees of deuteriation.

      (6) Previously stated HDX-MS results of MelB (Hariharan et al., 2024) state that the transmembrane helices are less dynamic than polypeptide termini and loops with similar distributions across all transmembrane bundles. The previous data was obtained in the presence of sodium. Does this remove the difference in dynamics in the sugar-binding helices and the cation-binding helices? Including this comparison would support the statement that the sodium-bound MelB is more stable than the Apo state, along with the lack of deprotection observed in the differential analysis.

      Thanks for this suggestion. The previous datasets were collected in the presence of Na<sup>+</sup>. In the current study, we also have two Na<sup>+</sup>-containing datasets. Both showed similar results: the multiple overlapping peptides covering the sugar-binding residues on helices I and V have higher HDX rates than those peptides covering the Na<sup>+</sup>-binding residues, even when Na<sup>+</sup> was present.

      (7) Have the authors considered carrying out an HDX-MS comparison between the WT and the D59C mutant? This may provide some further information on the WT structure (particularly a comparison with sugar-bound). This could be tied into a nice discussion of their structural data.

      Thank you for this suggestion. Comparing HDX-MS between the WT and the D59C mutant is certainly interesting, especially with the increasing amount of structural, biochemical, and biophysical data now available for this mutant. However, due to limited resources, we might consider it later.

      (8) Have the authors considered utilising Li<sup>+</sup> to infer how cation selectivity impacts the allostery? Do they expect similar stabilisation of a higher-affinity sugar binding state with all cations?

      We have shown that Li<sup>+</sup> also works positively with melibiose. Li<sup>+</sup> binds to MelB<sub>St</sub> with a higher affinity than Na<sup>+</sup> and modifies MelB<sub>St</sub> differently. It is important to study this thoroughly and separately. To answer the second question, H<sup>+</sup> is a weak coupling cation with little effect on melibiose binding. Since its pKa is around 6.5, only a small population of MelB<sub>St</sub> is protonated at pH 7.5. The order of sugar-binding cooperativity is highest with Na<sup>+</sup>, then Li<sup>+</sup>, and finally H<sup>+</sup>.
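      As a rough illustration of why only a small population is protonated, the protonated fraction can be estimated from the Henderson-Hasselbalch relation. This is a minimal sketch that assumes a single titratable site and uses the approximate pKa (6.5) and the pH (7.5) quoted above; the function name is hypothetical.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a single titratable site that carries a proton.

    From Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# One pH unit above the pKa leaves 1/11 of the sites protonated.
print(f"{fraction_protonated(7.5, 6.5):.3f}")  # prints 0.091
```

      Under these assumptions, roughly 9% of the transporter population would be protonated at pH 7.5.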

      (9) MD of MelB suggests all transmembrane helices are reorientated during substrate translocation, yet substrate and cotransporter ligand binding only significantly impacts a small number of helices. Can the authors comment on the ensemble of states expected from each HDX experiment? The data presented here instead shows overall stabilisation of the transporter. This data can be compared to that of HDX on MFS sugar cation symporter XylE, where substrate binding induces a transition to the OF state. There is no discussion of how this HDX data compares to previous MFS sugar transporter HDX. The manuscript could benefit from this comparison rather than a comparison to LacY. It is unlikely that there are universal mechanisms that can be inferred even from these model proteins. Highlighting differences between these transport systems provides broader insights into this protein class. Doi: 10.1021/jacs.2c06148 and 10.1038/s41467-018-06704-1.

      The sugar translocation free-energy landscape simulations showed that both helix bundles move relative to the membrane plane. This analysis aimed to clarify a hypothesis in the field—that the MFS transporter can use an asymmetric mode to perform the conformational transition between inward- and outward-facing states. In the case of MelB<sub>St</sub>, we clearly demonstrated that both domains move and each helix bundle moves as a unit. So only a small number of helices and loops showed labeling changes. Thanks for the suggestion about comparing with XylE. We have included that in the discussion.

      (10) Additionally, the recent publication of SMFS data (by the authors: doi:10.1016/j.str.2022.11.011) states the following: "In the presence of either melibiose or a coupling Na<sup>+</sup>-cation, however, MelB increasingly populates the mechanically less stable state which shows a destabilized middle-loop C3." And "In the presence of both substrate and co-substrate, this mechanically less stable state of MelB is predominant.". It would benefit the authors to comment on these data in contrast to the HDX obtained here. Additionally, is the C3 loop covered, and does it show the destabilization suggested by these studies? HDX can provide a plethora of results that are missing from the current analysis on ligand allostery. The authors instead chose to reference CD and thermal denaturation methods as comparisons.

      We thank the reviewer for reading the single-molecule force spectroscopy (SMFS) study on MelB<sub>St</sub>. The C3 loop mentioned in that SMFS article is partially covered in the Mel vs. apo and Mel plus Na<sup>+</sup> vs. apo datasets, and there is more coverage in the Na<sup>+</sup> vs. apo dataset. In either condition, no deprotection was detected. The labeling time points might not be long enough to detect it.

      Reviewer #3:

      Summary:

      The melibiose permease from Salmonella enterica serovar Typhimurium (MelB<sub>St</sub>) is a member of the Major Facilitator Superfamily (MFS). It catalyzes the symport of a galactopyranoside with Na<sup>+</sup>, H<sup>+</sup>, or Li<sup>+</sup>, and serves as a prototype model system for investigating cation-coupled transport mechanisms. In cation-coupled symporters, a coupling cation typically moves down its electrochemical gradient to drive the uphill transport of a primary substrate; however, the precise role and molecular contribution of the cation in substrate binding and translocation remain unclear. In a prior study, the authors showed that the binding affinity for melibiose is increased in the presence of Na<sup>+</sup> by about 8-fold, but the molecular basis for the cooperative mechanism remains unclear. The objective of this study was to better understand the allosteric coupling between the Na<sup>+</sup> and melibiose binding sites. To verify the sugar-recognition specific determinants, the authors solved the outward-facing crystal structures of a uniport mutant D59C with four sugar ligands containing different numbers of monosaccharide units (α-NPG, melibiose, raffinose, or α-MG). The structure with α-NPG bound has improved resolution (2.7 Å) compared to a previously published structure and to those with other sugars. These structures show that the specificity is clearly directed toward the galactosyl moiety. However, the increased affinity for α-NPG involves its hydrophobic phenyl group, positioned at 4 Å-distance from the phenyl group of Tyr26, which forms a strong stacking interaction. Moreover, a water molecule bound to OH-4 in the structure with α-NPG was proposed to contribute to the sugar recognition and appears on the pathway between the two specificity-determining pockets. 
      Next, the authors used hydrogen-to-deuterium exchange coupled to mass spectrometry (HDX-MS) to analyze the changes in structural dynamics of the transporter induced by melibiose, Na<sup>+</sup>, or both. The data support the conclusion that the binding of the coupling cation at a remote location stabilizes the sugar-binding residues to switch to a higher-affinity state. Therefore, the coupling cation in this symporter was proposed to be an allosteric activator.

      Strengths:

      (1) The manuscript is generally well written.

      (2) This study builds on the authors' accumulated knowledge of the melibiose permease and integrates structural and HDX-MS analyses to better understand the communication between the sodium ion and sugar binding sites. A high sequence coverage was obtained for the HDX-MS data (86-87%), which is high for a membrane protein.

      We thank the reviewer for the positive comments.

      Weaknesses:

      (1) I am not sure that the resolution of the structure (2.7 Å) is sufficiently high to unambiguously establish the presence of a water molecule bound to OH-4 of the α-NPG sugar. In Figure 2, the density for water 1 is not obvious to me, although it is indeed plausible that water mediates the interaction between OH4/OH6 and the residues Q372 and T373.

      A water molecule can be modeled at a resolution ranging from 2.4 to 3.2 Å, and the quality of the model depends on the map quality and water location. In this revision, we refined the resolution to 2.6 Å using the same dataset and also performed all-atom MD simulations. All results support the occupancy of water-1 in the sugar-bound MelB<sub>St</sub>.

      (2) Site-directed mutagenesis could help strengthen the conclusions of the authors. Would the mutation(s) of Q372 and/or T373 support the water hypothesis by decreasing the affinity for sugars? Mutations of Thr121, Arg 295, combined with functional and/or HDX-MS analyses, may also help support some of the claims of the authors regarding the allosteric communication between the two substrate-binding sites.

      We thank the reviewer for the thoughtful suggestions. MelB<sub>St</sub> has been subjected to Cys-scanning mutagenesis (https://doi.org/10.1016/j.jbc.2021.101090). Placing a Cys residue at Gln372 significantly decreased the initial transport rate, accumulation, and melibiose fermentation, with minimal effect on protein expression, as shown in Figure 2 of that JBC article, which could support its role in the binding pocket. The T373C mutant retained most of the WT's activities. Our previous studies showed that Thr121 is only responsible for Na<sup>+</sup> binding in MelB<sub>St</sub>, and mutations decreased protein stability; HDX now reveals that this is a rigid position. Additionally, our previous studies indicated that Arg295 is another conformationally important residue. In this version, we have added more HDX analysis to explore the relationship between the two substrate-binding sites and conformational dynamics, especially focusing on the gating salt-bridge network including Arg295, which has provided meaningful new insights.

      (3) The main conclusion of the authors is that the binding of the coupling cation stabilizes those dynamic sidechains in the sugar-binding pocket, leading to a high-affinity state. This is visible when comparing panels c and a from Figure S5. However, there is both increased protection (blue, near the sugar) and decreased protection in other areas (red). The latter was less commented, could the increased flexibility in these red regions facilitate the transition between inward- and outward-facing conformations? The HDX changes induced by the different ligands were compared to the apo form (see Figure S5). It might be worth it for data presentation to also analyze the deuterium uptake difference by comparing the conditions sodium ion+melibiose vs melibiose alone. It would make the effect of Na<sup>+</sup> on the structural dynamics of the melibiose-bound transporter more visible. Similarly, the deuterium uptake difference between sodium ion+melibiose vs sodium ion alone could be analyzed too, in order to plot the effect of melibiose on the Na<sup>+</sup>-bound transporter.

      Thanks for this important question. We have added more discussion of the deprotected data and prepared a new Fig. 8b to highlight the melibiose-binding-induced flexibility in several loops, especially the gating area on both sides of the membrane. We also proposed that these changes might facilitate the formation of the transition-competent state. The overall effects induced by substrate binding are relatively small, and the datasets for apo and Na were collected separately, so comparing melibiose&Na<sup>+</sup> versus Na<sup>+</sup> might not be as precise. In fact, the Na<sup>+</sup> effects on the sugar-binding site can be clearly seen in the deuterium uptake plots shown in Figures 7-8, by comparing the first and last panels.

      (4) For non-specialists, it would be beneficial to better introduce and explain the choice of using D59C for the structural analyses.

      Asp59 is the only site that responds to the binding of all coupling cations: Na<sup>+</sup>, Li<sup>+</sup>, or H<sup>+</sup>. Notably, the thermostable D59C mutant selectively abolishes all cation binding and the associated cotransport activities, but it maintains intact sugar binding and undergoes conformational transitions like the WT, as demonstrated by electroneutral transport reactions, including the α-NPG transport shown in this article and the melibiose exchange and fermentation shown previously. Therefore, the structural data derived from this mutant are significant and offer important mechanistic insights into sugar transport, which supports the conclusion that Na<sup>+</sup> functions as an allosteric activator.

      (5) In Figure 5a, deuterium changes are plotted as a function of peptide ID number. It is hardly informative without making it clearer which regions it corresponds to. Only one peptide is indicated (213-226). I would recommend indicating more of them in areas where deuterium changes are substantial.

      We appreciate this comment and have modified the plots by marking the residue positions and labeling several peptides with significant HDX changes in Fig. 5b. We also provided a deuteriation map based on peptide coverage (Fig. 5a).

      (6) From prior work of the authors, melibiose binding also substantially increases the affinity of the sodium ion. Can the authors interpret this observation based on the HDX data?

      This is an intriguing mechanistic question. In this HDX study, we found that the cation-binding pocket and nearby sugar-binding residues are conformationally rigid, while some sugar-binding residues farther from the cation-binding pocket are flexible. We concluded that conformational dynamics regulate sugar-binding affinity, but the increase in Na<sup>+</sup>-binding affinity caused by melibiose is not related to protein dynamics. Our previous interpretation based on structural data remains our preferred explanation: the bound melibiose physically prevents the release of Na<sup>+</sup> or Li<sup>+</sup> from the cation-binding pocket. We also proposed a mechanism of intracellular Na<sup>+</sup> release in the 2024 JBC paper (https://doi.org/10.1016/j.jbc.2024.107427): after sugar release, the rotamer change of Asp55 helps Na<sup>+</sup> exit the cation pocket into the empty sugar pocket, and the negative membrane potential inside the cell further facilitates movement from MelB<sub>St</sub> to the cytosol.

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) It would help the reader if the previous work were introduced more clearly, and if the results of the experiments reported in this manuscript were put into the context of the previous work. Lines 283-296 discuss observations that are similar to previous reported structures as well as novel interpretations. It would help the reader to be clearer about what the new observations are.

      Thank you for the important comment. We have revised the text accordingly by adding the related citations and the phrase “as shown previously” wherever we state our previous observations.

      (2) The affinity by ITC is measured for various ligands, but very few conclusions are drawn about how the affinity correlates with the binding modes. Are the other ligands that are investigated in this study transported by the protein, or do they just bind? Can the protein transport the trisaccharide raffinose? The authors comment that raffinose exhibiting poor binding affinity despite having more sugar units is surprising, but this is not surprising to me. No additional interactions can be mapped to these units on their structure, and while it fits into the substrate binding cavity, the extra bulk of additional sugar units is likely to reduce affinity. In fact, from their listed ITC measurements, this appears to be the trend.

      Additionally, the D59C mutant utilized here in structural determination is deficient in sodium/cation binding. The reported allostery of sodium-sugar binding will likely influence the sugar binding motif as represented by these structures. This is clearly represented by the authors' own ITC work. The ITC included in this work was carried out on the WT protein in the presence of Na<sup>+</sup>. The authors could benefit from clarifying how this work fits with the structural work or carrying out ITC with the D59C mutant, or additionally, in the absence of sodium. For non-specialists, please better introduce and explain the choice of using D59C for the structural analyses.

      Thank you for the meaningful comments. We have comprehensively addressed all the concerns and suggestions as listed in the summary of this revision. Notably, the D59C mutant does not catalyze any electrogenic melibiose transport that involves cation transduction, but it does catalyze downhill translocation of galactosides, as shown by the downhill α-NPG transport assay in Fig. 1a. The intact downhill transport observed with the D59C mutant further supports the allosteric coupling between the cation- and sugar-binding sites.

      The binding isotherms and poor affinities from the ITC measurements do not support further analysis of the binding mode: none showed a sigmoidal curve, so the enthalpy change cannot be accurately determined. Nevertheless, we thank the editor for this comment.

      (3) It is not clear what Figure 2 is comparing. The text suggests this figure is a comparison of the lower resolution structure to the structure presented in this work; however, the figure legend does not mention which is which, and both images include a modelled water molecule that was not assigned due to poor resolution previously, as stated by the authors, in the previously generated structure. This figure should be more clearly explained.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #1.

      (4) I am not sure that the resolution of the structure (2.7 Å) is sufficiently high to unambiguously establish the presence of a water molecule bound to OH-4 of the α-NPG sugar. In Figure 2, the density for water 1 is not obvious to me, although it is indeed plausible that water mediates the interaction between OH4/OH6 and the residues Q372 and T373. Please change line 278 to state "this OH-4 water molecule is likely part of sugar binding".

      We have addressed these concerns in the response to the Public Reviews at reviewer-3 #1.

      (5) Line 290-296: The Thr121 is not represented in any figures, while the Lys377 is. Their relative positioning between sugar water and sodium is not made clear by any figure.

      Thanks for this comment. This information is now clearly presented in Figs. 7-8. Lys377 is closer to the cation site and relatively far from the sugar-binding site.

      (6) Methodology includes a lipid removal step. Based on other included methods, I assumed that the HDX-MS was being carried out in detergent-solubilized protein samples. I therefore do not see the need for a lipid removal step that is usually included for bilayer reconstituted samples. I note that this methodology is the same as previously used for MelB. It should be clarified why this step was included, if it was in fact used, aka, further details on the sample preparation should be included.

      (7) A summary of HDX conditions and results should be given as recommended, including the mean peptide length and average redundancy per state alongside other included information such as reaction temperature, sequence coverage, etc., as prepared for previous publications from the authors, i.e., Hariharan et al., 2024.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #4.

      (8) Uptake plots per peptide for the HDX-MS data should be included as supporting information outside of the few examples given in Figure 6.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #4.

      (9) A reference should be given to the hybrid significance testing method utilised. Additionally, as stated by Hageman and Weis (2019) (doi:10.1021/acs.analchem.9b01325), the use of P < 0.05 greatly increases the likelihood of false positive ΔD identifications. While the authors include multiple levels of significance, what they refer to as high and lower significant results, and this reviewer understands that working with dynamic transporters can lead to increased data variation, a statement of why certain statistical criteria were chosen should be included, and possibly accompanied by volcano plots. The legend of Figure 6 should include what P value is meant by * and ** rather than statistically significant and highly statistically significant.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #4.

      (10) The table (S3) and figure (S4) showing uncovered residues is an unclear interpretation of the data; this would be better given as a peptide sequence coverage heat map. This would also be more informative for the redundancy in covered regions, too. In this way, S3 and S4 can be combined.

      We have addressed these concerns in the response to the Public Reviews at reviewer-2 #4.

      (11) Residual plots in Figure 5 could be improved by a topological map to indicate how peptide number resembles the protein amino acid sequence.

      Thank you for the request. Because Figure 6 is already large, we have added a transmembrane topology plot colored by the HDX results as Fig. 8c.

      (12) The presentation of data in S5 could be clarified. Does the number of results given in the brackets indicate overlapping peptides? What are the lengths of each of these peptides? Classical HDX data presentation utilizes blue for protection and red for deprotection. The use of yellow ribbons to show protection in non-sugar binding residues takes some interpretation and could be clarified by also depicting in a different blue. I also don't see the need to include ribbon and cartoon representation when also using colors to depict protection and deprotection. The authors should change or clarify this choice.

      We have moved this figure into the current Fig. 6b as suggested by Reviewer-3. To address your questions about the figure legend: the numbers shown in brackets do indicate overlapping peptides. Regarding peptide lengths, the sequence of each peptide is shown in Figures 7-8 and is also included in Supplemental Figure S5. Regarding the use of color, both blue and green were used to distinguish peptides protecting the substrate-binding site from other regions. The ribbon and cartoon representations are provided for clarity, as the cartoon style hides many helices.

      (13) In Table S5, the difference between valid points and protection is unclear. And what is indicated by numbers in brackets or slashes? Additionally, it should be highlighted again here that single-residue information is inferred from peptide-level data. By value, are the authors referring to peptide-level differential data?

      Please review our responses in the Public Reviews at reviewer-2 #5.

      (14) Line 316 states a significant difference is seen in dynamics, how is significance measured here? There is no S.D. given in Table S4. Can the authors further comment on the potential involvement of solvent accessibility and buried helices that might influence the overall dynamics outside of their role in sugar vs sodium binding? An expected low rate of exchange suggests that dynamics are likely influenced by solvent accessibility or peptide hydrophobicity. The increased dynamics at peptides covering the Na binding site on overall more dynamic helices suggests that there isn't a difference between the dynamics of each site.

      Please review our responses in the Public Reviews at reviewer-2 #5.

      (15) Previously stated HDX-MS results of MelB (Hariharan et al., 2024) state that the transmembrane helices are less dynamic than polypeptide termini and loops with similar distributions across all transmembrane bundles. The previous data was obtained in the presence of sodium. Does this remove the difference in dynamics in the sugar-binding helices and the cation-binding helices? Including this comparison would support the statement that the sodium-bound MelB is more stable than the Apo state, along with the lack of deprotection observed in the differential analysis.

      Please review our responses in the Public Reviews.

      (16) MD of MelB suggests all transmembrane helices are reorientated during substrate translocation, yet substrate and cotransporter ligand binding only significantly impacts a small number of helices. Can the authors comment on the ensemble of states expected from each HDX experiment? The data presented here instead shows overall stabilisation of the transporter. This data can be compared to that of HDX on MFS sugar cation symporter XylE, where substrate binding induces a transition to the OF state. There is no discussion of how this HDX data compares to previous MFS sugar transporter HDX. The manuscript could benefit from this comparison rather than a comparison to LacY. It is unlikely that there are universal mechanisms that can be inferred even from these model proteins. Highlighting differences instead between these transport systems provides broader insights into this protein class. Doi: 10.1021/jacs.2c06148 and 10.1038/s41467-018-06704-1.

      Please review our responses in the Public Reviews.

      (17) Additionally, the recent publication of SMFS data (by the authors: doi:10.1016/j.str.2022.11.011) states the following: "In the presence of either melibiose or a coupling Na<sup>+</sup>-cation, however, MelB increasingly populates the mechanically less stable state which shows a destabilized middle-loop C3." And "In the presence of both substrate and co-substrate this mechanically less stable state of MelB is predominant.". It would benefit the authors to comment on these data in contrast to the HDX obtained here. Additionally, is the C3 loop covered, and does it show the destabilization suggested by these studies? HDX can provide a plethora of results that are missing from the current analysis on ligand allostery. The authors instead chose to reference CD and thermal denaturation methods as comparisons.

      Please review our responses in the Public Reviews.

      (18) The main conclusion of the authors is that the binding of the coupling cation stabilizes those dynamic sidechains in the sugar-binding pocket, leading to a high-affinity state. This is visible when comparing panels c and a from Figure S5. However, there is both increased protection (blue, near the sugar) and decreased protection in other areas (red). The latter was less commented, could the increased flexibility in these red regions facilitate the transition between inward- and outward-facing conformations? The HDX changes induced by the different ligands were compared to the apo form (see Figure S5). It might be worth it for data presentation to also analyze the deuterium uptake difference by comparing the conditions sodium ion+melibiose vs melibiose alone. It would make the effect of Na<sup>+</sup> on the structural dynamics of the melibiose-bound transporter more visible. Similarly, the deuterium uptake difference between sodium ion+melibiose vs sodium ion alone could be analyzed too, in order to plot the effect of melibiose on the Na<sup>+</sup>-bound transporter.

      Please review our responses in the Public Reviews.

      (19) In Figure 5a, deuterium changes are plotted as a function of peptide ID number. It is hardly informative without making it clearer which regions it corresponds to. Only one peptide is indicated (213-226); I would recommend indicating more of them, in areas where deuterium changes are substantial.

      Please review our responses in the Public Reviews.

      (20) Figure 6, please indicate in the legend what the black and blue lines are (I assume black is for the apo?)

      We are sorry that we did not make this clear. Yes, black was used for the apo state and blue was used for all bound states.

      (21) From prior work of the authors, melibiose binding also substantially increases the affinity of the sodium ion. Can the authors interpret this observation based on the HDX data?

      Please review our responses in the Public Reviews.

      Addressing the following three points would strengthen the manuscript, but also involve a significant amount of additional experimental work. If the authors decide not to carry out the experiments described below, they can still improve the assessment by focusing on points (1-21) described above.

      (22) Have the authors considered carrying out an HDX-MS comparison between the WT and the D59C mutant? This may provide some further information on the WT structure (particularly a comparison with sugar-bound). This could be tied into a nice discussion of their structural data.

      Please review our responses in the Public Reviews.

      (23) Have the authors considered utilising Li<sup>+</sup> to infer how cation selectivity impacts the allostery? Do they expect similar stabilisation of a higher-affinity sugar binding state with all cations?

      Please review our responses in the Public Reviews.

      (24) Site-directed mutagenesis could help strengthen the conclusions. Would the mutation(s) of Q372 and/or T373 support the water hypothesis by decreasing the affinity for sugars? Mutations of Thr 121 and Arg 295, combined with functional and/or HDX-MS analyses, may also help support some of the authors' claims regarding allosteric communication between the two substrate-binding sites.

      Please review our responses in the Public Reviews.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewing Editor Comment:

      The reviewers felt that the study could be improved by (1) better integrating the results with the existing literature in the field

      (1) In the Introduction and Results section of the manuscript, we had made every attempt to cite the relevant literature. (Reviewer 1 stated that “The literature is appropriately cited”). We agree with the Reviewing Editor that rather than simply cite the relevant literature, we could have done a better job of integrating our findings with what has been previously discovered by others. We have attempted to do this in the revised manuscript. Also, we have included many additional citations in the Introduction and in the first section of the Results where work by others has provided a framework for interpreting our single-cell studies.

      and (2) manipulating Trib expression and analyzing the expression of 1-2 HIX genes.

      (2) We are grateful for this suggestion. As suggested by the Reviewing Editor we have attempted to increase and decrease trbl expression and assess the effect on expression of two genes, Swim and CG15784.

      We increased trbl levels in the wing pouch using rn-Gal4, tub-Gal80<sup>ts</sup> and UAS-trbl. By transferring larvae from 18 °C to 31 °C for 24 h, we were able to induce trbl expression in the wing pouch. When these larvae were irradiated at 4000 rad, we found reduced levels of apoptosis in the wing pouch of discs that overexpressed trbl (Figure 7-figure supplement 1). This indicated that upregulation of trbl is radioprotective. Consistent with our findings, others have previously shown that upregulation of trbl and stalling in the G2 phase of the cell cycle protects cells from JNK-induced apoptosis (Cosolo et al., 2019, PMID:30735120) and that downregulating string, a factor that promotes G2/M progression, protects cells from apoptosis induced by X-ray radiation (Ruiz-Losada et al., 2021, PMID:34824391).

      As suggested by the Reviewing Editor, we also examined the effect of trbl overexpression on the induction of two “highly induced by X-ray irradiation (HIX)” genes, Swim and CG15784. Increasing trbl expression had no effect on the induction of Swim and caused only a modest decrease in the induction of CG15784 (Figure 7-figure supplement 2). Thus, increasing trbl expression is, in itself, insufficient to promote HIX gene expression, indicating that other factors are necessary for HIX gene induction.

      We also attempted to reduce trbl expression, using three different RNAi lines. While some of these lines have been used previously by others to reduce trbl expression under unirradiated conditions (Cosolo et al., 2019, PMID:30735120), we nevertheless wanted to check if they reduced trbl induction following irradiation. For each of the three lines, we observed no obvious reduction in trbl RNA following irradiation when visualized using HCR (Author response image 1). Thus, any effects on gene expression that we observe could not be attributed to a decrease in trbl expression. We have therefore included the images showing a lack of knockdown in this Response to Reviews document but not included these experiments in the revised manuscript.

      Author response image 1.

      RNA in situ hybridizations using the hybridization chain reaction performed using probes to trbl. In A-F, the RNAi is expressed using nubbin-Gal4. In G-I the RNAi is expressed using rn-Gal4, tub-Gal80<sup>ts</sup>. white-RNAi was used as a control (A, B, G, H). Three different RNAi lines directed against trbl were tested: Vienna lines VDRC 106774 (C, D) and VDRC 22113 (E, F), and Bloomington line BL42523. In no case was a reduction in trbl RNA upregulation in the wing pouch following 4000 rad observed, except for one disc (n = 6) of VDRC 106774 crossed to nubbin-Gal4.

      Reviewer #1 (Public review):

      Summary:

      The authors analyze transcription in single cells before and after 4000 rads of ionizing radiation. They use Seurat v5 for their analyses, which allows them to show that most of the genes cluster along the proximal-distal axis. Due to the high heterogeneity in the transcripts, they use the Herfindahl-Hirschman index (HHI) from Economics, which measures market concentration. Using the HHI, they find that genes involved in several processes (like cell death, response to ROS, DNA damage response (DDR)) are relatively similar across clusters. However, ligands activating the JAK/STAT, Pvr, and JNK pathways and transcription factors Ets21C and dysf are upregulated regionally. The JAK/STAT ligands Upd1,2,3 require p53 for their upregulation after irradiation, but the normal expression of Upd1 in unirradiated discs is p53-independent. This analysis also identified a cluster of cells that expressed tribbles, encoding a factor that downregulates mitosis-promoting String and Twine, that appears to be G2/M arrested and expressed numerous genes involved in apoptosis, DDR, the aforementioned ligands, and TFs. As such, the tribbles-high cluster contains much of the heterogeneity.
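
      For readers unfamiliar with the index, the following is a minimal sketch of the HHI in Python. It assumes the index is computed as the sum of squared shares of a gene's expression across clusters; the exact quantity to which the authors applied the HHI is not specified in this response, and the function name and the per-cluster values are hypothetical.

```python
def hhi(values):
    """Herfindahl-Hirschman index: sum of squared shares of non-negative values."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    return sum((v / total) ** 2 for v in values)


# Hypothetical expression of one gene in four clusters (illustrative numbers only).
uniform = [5.0, 5.0, 5.0, 5.0]       # evenly expressed: HHI equals 1 / number of clusters
regional = [19.0, 0.5, 0.25, 0.25]   # concentrated in one cluster: HHI approaches 1

print(hhi(uniform), hhi(regional))
```

      With n clusters the index ranges from 1/n (perfectly even expression) to 1 (expression confined to a single cluster), which is what makes it usable as a measure of regional concentration.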

      Strengths:

      (1) The authors have used robust methods for rearing Drosophila larvae, irradiating wing discs, and analyzing the data with Seurat v5 and HHI.

      (2) These data will be informative for the field.

      (3) Most of the data is well-presented.

      (4) The literature is appropriately cited.

      We thank the reviewer for these comments.

      Weaknesses:

      (1) The data in Figure 1 are single-image representations. I assume that counting the number of nuclei that are positive for these markers is difficult, but it would be good to get a sense of how representative these images are and how many discs were analyzed for each condition in B-M.

      For each condition, at least 5 discs were imaged, but we imaged up to 15 discs in some cases. We tried to choose a representative disc for each condition after looking at all of them. All discs imaged under each condition are shown below; the disc chosen for the figure is indicated with an asterisk. All scale bars are 100 μm.

      Author response image 2.

      Images for discs shown in Manuscript Figure 1, panels B, C

      Author response image 3.

      Images for discs shown in Manuscript Figure 1, panels D, E

      Author response image 4.

      Images used in Manuscript Figure 1, F, G

      Author response image 5.

      Images used in Manuscript Figure 1H, I

      Author response image 6.

      Images used in Manuscript Figure 1J, K

      Author response image 7.

      Images used in Manuscript Figure 1L, M

      (2) Some of the figures are unclear.

      It is unclear to us exactly which figures the Reviewer is referring to. Perhaps this is the same issue mentioned below in “Recommendations for the authors”. We address it below.

      Reviewer #1 (Recommendations for the authors):

      (1) Regarding Figure 1, what is stained in blue? Is it DAPI? If so, this should be added to the figure legend.

      Thank you for pointing out this omission. This has been addressed in the revised manuscript.

      It is very difficult to see blue on black, so could the authors please outline the discs?

      Alternatively, they could show DAPI in green and the markers (pH2Av, etc) in magenta.

      We used DAPI (blue) as a way of outlining the discs. While we appreciate the reviewer’s concern, after reviewing the images, we found that the blue is clearly visible when the document is viewed on the screen. It is less obvious if the document is printed on some kinds of printers. Since boosting this channel would make the signal from the other channels more difficult to see, we left the images as they were.

      (2) Figure 3, Figure Supplement 2, panel B. It is not possible to read the gene names in the panel's current form. Please break this up into 4 lines (as much as possible from the current 2).

      Thank you for this suggestion. We have done this in the revised manuscript.

      Reviewer #2 (Public review):

      This manuscript investigates the question of cellular heterogeneity using the response of Drosophila wing imaginal discs to ionizing radiation as a model system. A key advance here is the focus on quantitatively expressing various measures of heterogeneity, leveraging single-cell RNAseq approaches. To achieve this goal, the manuscript creatively uses a metric from the social sciences called the HHI to quantify the spatial heterogeneity of expression of individual genes across the identified cell clusters. Inter- and intra-regional levels of heterogeneity are revealed. Some highlights include the identification of spatial heterogeneity in the expression of ligands and transcription factors after IR. Expression of some of these genes shows dependence on p53. An intriguing finding, made possible by using an alternative clustering method focusing on cell cycle progression, was the identification of a high-trbl subset of cells characterized by concordant expression of multiple apoptosis, DNA damage repair, ROS-related genes, certain ligands, and transcription factors, collectively representing HIX genes. This high-trbl set of cells may correspond to an IR-induced G2/M arrested cell state.

      Overall, the data presented in the manuscript are of high quality but are largely descriptive. This study is therefore perceived as a resource that can serve as an inspiration for the field to carry out follow-up experiments.

      Thank you for your assessment of the work.

      Reviewer #2 (Recommendations for the authors):

      I suggest two major points for improvement:

      (1) It is important to test whether manipulation of trbl levels (i.e., overexpression, knockdown, mutation) would result in measurable biological outcomes after IR, such as altered HIX gene expression, altered cell cycle progression, or both. This may help disentangle the question of whether high trbl expression and correlated HIX gene expression are a cause or consequence of G2/M stalling.

      We have described these experiments at the beginning of this Response to Reviews document when addressing the comments made by the Reviewing Editor. Please see Figure 7, figure supplements 1 and 2. These experiments suggest that upregulation of trbl offers some protection from radiation-induced death, yet it is itself insufficient to induce expression of two HIX genes tested. As we have also described earlier, three different RNAi lines tested did not reduce trbl upregulation after irradiation.

      (2) A more extensive characterization of the high-trbl cell state would also be appropriate, particularly in terms of their relationship to the cell cycle.

      We attempted to address this issue in two ways. First, we used the expression of a trbl-gfp transgene and RNA in-situ hybridization experiments to visualize the distribution of the high-trbl cells (shown in new manuscript figure, Figure 6-figure supplement 3). When examining trbl RNA in irradiated discs, there is no obvious demarcation between cells that express high levels of trbl and other cells. This is also apparent in the UMAP shown in Figure 6A and A’. Most cells seem to express trbl; cells in the “high trbl” cluster simply express more trbl than others. We observed cells expressing trbl and PCNA as well as cells expressing only one of those two genes at detectable levels. Thus, it was not possible to distinguish the “high trbl” cells from other cells by this approach.

      We decided instead to focus on examining the expression of other cell-cycle genes in the high-trbl cluster. We have added a paragraph in the Results section that details our findings. Many transcriptional changes are indeed consistent with stalling in G2, such as high levels of trbl and low levels of string (stg). Additionally, the interpretation that the cells are likely in G2 is consistent with reduced levels of genes that are normally expressed at other stages of the cell cycle: G1 genes such as E2f1 and Dp, S-phase genes such as several Mcm genes, PCNA and RnrS, and genes that encode mitotic proteins such as polo, Incenp and claspin. There are, however, several anomalies, such as slightly increased expression of the early-G1 cyclin, CycD, and the retinoblastoma ortholog Rbf. Thus, at least as assessed by the transcriptome, this cluster may not correspond to a cell state that is found under normal physiological conditions.

      (3) Minor: p. 12, line 3. Figure 5A is mentioned, but it seems that it should be 4A instead.

      Thank you for pointing this out. We have addressed this in our revisions.

      Reviewer #3 (Public review):

      Strengths:

      Overall, the manuscript makes a compelling case for heterogeneity in gene expression changes that occur in response to uniform induction of damage by X-rays in a single-layer epithelium. This is an important finding that would be of interest to researchers in the field of DNA damage responses, regeneration, and development.

      Weaknesses:

      This work would be more useful to the field if the authors could provide a more comprehensive discussion of both the impact and the limitations of their findings, as explained below.

      Propidium iodide staining was used as a quality control step to exclude cells with a compromised cell membrane. But this would exclude dead/dying cells that result from irradiation. What fraction of the total do these cells represent? Based on the literature, including works cited by the authors, up to 85% of cells die at 4000R, but this likely happens over a longer period than 4 hours after irradiation. Even if only half of the 85% are PI-positive by 4 hr, this still removes about 40% of the cell population from analysis. The remaining cells that manage to stay alive (excluding PI) at 4 hours and included in the analysis may or may not be representative of the whole disc. More relevant time points that anticipate apoptosis at 4 hr may be 2 hr after irradiation, at which time pro-apoptotic gene expression peaks (Wichmann 2006). Can the authors rule out the possibility that there is heterogeneity in apoptosis gene expression, but cells with higher expression are dead by 4 hours, and what is left behind (and analyzed in this study) may be the ones with more uniform, lower expression? I am not asking the authors to redo the study with a shorter time point, but to incorporate the known schedule of events into their data interpretation.

      We thank the reviewer for these important comments. The generation of single-cell RNA-seq data from irradiated cells is tricky. Many cells have already died. Even those that do not incorporate propidium iodide are likely in early stages of apoptosis or are physiologically unhealthy and likely made it through our FACS filters. Indeed, in irradiated samples up to 57% of sequenced cells were not included in our analysis since their RNA content seemed to be of low quality. It is therefore likely that our data are biased towards cells that are less damaged. As advised by the reviewer, we will include a clearer discussion of these issues as well as the time course of events and how our analysis captures RNA levels only at a single time point.

      If cluster 3 is G1/S, cluster 5 is late S/G2, and cluster 4 is G2/M, what are clusters 0, 1, and 2 that collectively account for more than half of the cells in the wing disc? Are the proportions of clusters 3, 4, and 5 in agreement with prior studies that used FACS to quantify wing disc cells according to cell cycle stage?

      Work by others (Ruiz-Losada et al., 2021, PMID:34824391) has shown that almost 80% of cells have a 4C DNA content 4 h after 4,000 rad X-ray irradiation. The high-trbl cluster accounts for only 18% of cells and can therefore account for a minority of cells with a 4C DNA content.

      Thus clusters 0, 1 and 2 could potentially contain other populations that also have a 4C DNA content. Importantly, similar proportions of cells in these clusters are also observed in unirradiated discs.
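
      The arithmetic behind this argument can be made explicit with a short sketch. It assumes, as an upper bound, that every cell in the high-trbl cluster has a 4C DNA content; the function name is hypothetical and the two fractions are the ones quoted above.

```python
def max_share_of_4c(frac_cluster, frac_4c):
    """Upper bound on the share of 4C cells that one cluster can explain,
    assuming every cell in that cluster has a 4C DNA content."""
    if not 0 < frac_4c <= 1:
        raise ValueError("frac_4c must be in (0, 1]")
    return min(1.0, frac_cluster / frac_4c)


frac_4c = 0.80         # cells with 4C DNA content 4 h after irradiation (Ruiz-Losada et al., 2021)
frac_high_trbl = 0.18  # cells in the high-trbl cluster

share = max_share_of_4c(frac_high_trbl, frac_4c)
remaining = frac_4c - frac_high_trbl  # 4C cells that must lie in other clusters

print(f"high-trbl cells explain at most {share:.1%} of 4C cells")
print(f"at least {remaining:.0%} of all cells are 4C but outside the high-trbl cluster")
```

      Under this bound, the high-trbl cluster explains less than a quarter of the 4C population, so most 4C cells must fall in the other clusters.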

      We expect that clusters 1 and 2 are largely comprised of cells in G2/M. Together, these clusters are marked by some genes previously found to be higher in FACS separated G2 cells compared to G1 cells (Liang et al., 2014, PMID: 24684830). These genes include Det, aurA, and ana1. Strangely, cluster 0 is not strongly marked by any of the 175 cell cycle genes used in our clustering (eff being the strongest marker) and has a lower-than-average expression of 165/175 cell cycle genes. Cluster 0 is however marked by the genes ac and sc, which are known to be expressed in proneuronal cell clusters interspersed throughout the disc that stall in G2 and form mitotically quiescent domains (Usui & Kimura 1992, Development, 116 (1992), pp. 601-610 (no PMID); Nègre et al., 2003, PMID: 12559497). Given these observations, we hypothesize that cluster 0 is largely comprised of stalled G2 cells like those found in ac/sc-expressing proneural clusters.

      The EdU data in Figure 1 is very interesting, especially the persistence in the hinge. The authors speculate that this may be due to cells staying in S phase or performing a higher level of repair-related DNA synthesis. If so, wouldn't you expect 'High PCNA' cells to overlap with the hinge clusters in Figures 6G-G'? Again, no new experiments are needed. Just a more thorough discussion of the data.

      We have found that the locations of elevated PCNA expression do not always correlate with the location of EdU incorporation either by examining scRNA-seq data or by using HCR to detect PCNA. PCNA expression is far more widespread as we now show in Figure 6-figure supplement 3.

      Trbl/G2/M cluster shows Ets21C induction, while the pattern of Ets21C induction as detected by HCR in Figures 5H-I appears in localized clusters. I thought G2/M cells are not spatially confined. Are Ets21C+ cells in Figure 5 in G2/M? Can the overlap be confirmed, for example, by co-staining for Trbl or a G2/M marker with Ets21C?

      The data show that the high-trbl cells are higher in Ets21C transcripts relative to other cell-cycle-based clusters after irradiation. This does not imply that high-trbl-cells in all regions of the disc upregulate Ets21C equally. Ets21C expression is likely heterogeneous in both ways – by location in the disc and by cell-cycle state.

      Induction of dysf in some but not all discs is interesting. What were the proportions? Any possibility of a sex-linked induction that can be addressed by separating male and female larvae?

      We can separate the cells in our dataset into male and female cells by expression of lncRNA:roX1/2. When we do this, we see X-ray induced dysf expressed similarly in both male and female cells. We think that it is therefore unlikely that this difference in expression can be attributed to cell sex. Another possibility is that dysf upregulation might be acutely sensitive to the developmental stage of the disc. This would require experiments with very precisely-staged larvae. We have not investigated this further as it is not a central issue in our paper.
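
      The kind of comparison described above can be sketched as follows. This is an illustration only, not our analysis code: the per-cell count dictionaries, the function names and the threshold of one count are all hypothetical, and the sketch assumes that any detectable roX1 or roX2 transcript marks a cell as male.

```python
from collections import defaultdict


def assign_sex(rox1, rox2, min_counts=1):
    """Call a cell 'male' if its combined roX1 + roX2 count reaches min_counts."""
    return "male" if rox1 + rox2 >= min_counts else "female"


def fraction_expressing(cells, gene, min_counts=1):
    """Fraction of cells of each sex with at least min_counts of `gene`."""
    totals = defaultdict(int)
    positive = defaultdict(int)
    for counts in cells:
        sex = assign_sex(counts.get("roX1", 0), counts.get("roX2", 0))
        totals[sex] += 1
        if counts.get(gene, 0) >= min_counts:
            positive[sex] += 1
    return {sex: positive[sex] / totals[sex] for sex in totals}


# Hypothetical per-cell transcript counts (illustrative numbers only).
cells = [
    {"roX1": 12, "roX2": 3, "dysf": 2},
    {"roX1": 7, "roX2": 0, "dysf": 0},
    {"roX1": 0, "roX2": 0, "dysf": 1},
    {"roX1": 0, "roX2": 0, "dysf": 0},
]
result = fraction_expressing(cells, "dysf")
print(result)
```

      Because transcript dropout in single-cell data can leave a male cell with no detected roX reads, a simple threshold like this one may mislabel some male cells as female; similar fractions in the two groups are nevertheless what one would expect if induction is not sex-linked.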

      Reviewer #3 (Recommendations for the authors):

      Please check the color-coding in Figure 1A. The region marked as pouch appears to include hinge folds that express Zfh2 (a hinge marker) in Figure 2A (even after accounting for low Zfh2 expression in part of the pouch).

      We have corrected this and have marked the pouch region based on the analysis of expression of different hinge and pouch markers by Ayala-Camargo et al. 2013 (PMID 2398534).

      The statement 'Furthermore, within tissues, stem cells are most sensitive while differentiated cells are relatively radioresistant' needs to be qualified, as there are differences in radiosensitivity of adult versus embryonic stem cells (e.g., PMID: 30588339)

      We thank the reviewer for bringing this point to our attention and for pointing us to an article that addresses this issue in detail. We appreciate that our statement was rather simplistic – we have modified it and added two additional references.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Negreira, G. et al clearly presented the challenges of conducting genomic studies in unicellular pathogens and of addressing questions related to the balance between genome integrity and instability, pivotal for survival under the stressful conditions these organisms face and for their evolutionary success. This underlies the need for powerful approaches to perform single-cell DNA analyses suited to the small and plastic Leishmania genome. Accordingly, their goal was to develop such a novel method and demonstrate its robustness.

      In this study, the authors combined semi-permeable capsules (SPCs) with primary template-directed amplification (PTA) and adapted the system to the Leishmania genome, which is about 100 times smaller than the human genome and exhibits remarkable plasticity and mosaic aneuploidy. Given the size and organization of the Leishmania genome, the challenges were substantial; nevertheless, the authors successfully demonstrated that PTA not only works for Leishmania but also represents a significantly improved whole-genome amplification (WGA) method compared with standard approaches. They showed that SPCs provide a superior alternative for cell encapsulation, increasing throughput. The methodology enabled high-resolution karyotyping and the detection of fine-scale copy number variations (CNVs) at the single-cell level. Furthermore, it allowed discrimination between genotypically distinct cells within mixed populations.

      Strengths:

      This is a high-impact study that will likely contribute to our understanding of DNA replication and the genetic plasticity of Leishmania, including its well-documented aneuploidy, somy variations, CNVs, and SNPs - all key elements for elucidating various aspects of the parasite's biology, such as genome evolution, genetic exchange, and mechanisms of drug resistance.

      Overall, the authors clearly achieved their objectives, providing a solid rationale for the study and demonstrating how this approach can advance the investigation of Leishmania's small, plastic genome and its frequent natural strain mixtures within hosts. This methodology may also prove valuable for genomic studies of other single-celled organisms.

      We thank the reviewer for the positive feedback and appreciation of the potential applications for the methodology we describe here.

      Weaknesses:

      The discussion section could be enriched to help readers understand the significance of the work, for instance, by more clearly pointing out the obstacles to a better understanding of DNA replication in Leishmania. Or else, when they discuss the results obtained at the level of nucleotide information and the relevance of being able to compare, in their case, the two strains, they could refer to the implications of this level of precision to those studying clonal strains or field isolates, drug resistance or virulence in a more detailed way.

      We thank the reviewer for the suggestions. Indeed, single-cell DNA sequencing has successfully revealed cell-to-cell variability in replication timing and fork progression in mammalian cells[1,2] and we believe that the SPC-PTA workflow could be used in similar studies in Leishmania to complement bulk-based observations[3,4]. Regarding nucleotide information, it is indeed of high relevance to detect minor circulating variants with potential virulence impact and/or effect on drug resistance which could be missed by bulk sequencing. This includes the ability to detect co-occurring variants with potential epistatic effects. These topics will be further developed in the revised version. Finally, we will explicitly discuss how this methodology can be applied beyond Leishmania, to investigate genome plasticity, adaptation, and evolutionary processes in other organisms.

      Reviewer #2 (Public review):

      Summary:

      Negreira et al. present an application of a novel single-cell genomics approach to investigate the genetic heterogeneity of Leishmania parasites. Leishmania, while also representing a major global disease with hundreds of thousands of cases annually, serves as a model to test the rigor of the sequencing strategy. Its complex karyotypic nature necessitates a method that is capable of resolving natural variation to better understand genome dynamics. Importantly, an earlier single-cell genomics platform (10x Chromium) is no longer available, and new methods need to be evaluated to fill in this gap.

      The study was designed to evaluate whether a capsule-based cell capture method combined with primary template-directed amplification (PTA) could maintain levels of genomic heterogeneity represented in an equal mixture of two Leishmania strains. This was a high bar, given the relatively small protozoan genome and prior studies that showed limitations of single-cell genomics, especially for gene-level copy number changes. Overall, the study found that semi-permeable capsules (SPC) are an effective way to isolate high-quality single cells. Additionally, short reads from amplified genomes effectively maintained the relative levels of variation in the two strains on the chromosome, gene copy, and individual base level. Thus, this method will be useful to evaluate adaptive strategies of Leishmania. Many researchers will also refer to these studies to set up SPC collection and PTA methods for their organism of choice.

      Strengths:

      (1) The use of SPC and PTA in a non-bacterial organism is novel. The study displays the utility of these methods to isolate and amplify single genomes to a level that can be sequenced, despite being a motile organism with a GC-rich genome.

      (2) The authors clearly outlined their optimization strategy and provided numerous quality-control metrics that inspire confidence in the success of achieving even chromosomal coverage relative to ploidy.

      (3) The use of two distinct Leishmania strains with known clonal status provided strong evidence that PTA-based amplification could reflect genome differences and displayed the utility of the method for studies of rare genotypes.

      (4) Evaluating the SPCs pre- and post-amplification with microscopy is a practical and robust way of determining the success of SPC formation and PTA.

      (5) The authors show that the PTA-based approach easily resolved major genotypic ploidy in agreement with a prior 10x Chromium-based study. The new method had improved resolution of drug resistance genotypes in the form of both copy-number variations and single-nucleotide polymorphisms.

      (6) In general, the authors are very thorough in describing the methods, including those used to optimize PTA lysis and amplification steps (fresh vs frozen cells, naked DNA vs sorted cells, etc). This demonstrates a depth of knowledge about the procedure and leaves few unanswered questions.

      (7) The custom, multifaceted, computational assessment of coverage evenness is a major strength of the study and demonstrates that the authors acknowledge potential computational factors that could impact the analysis.

      We deeply appreciate the positive and encouraging feedback on our manuscript.

      Weaknesses:

      (1) The rationale behind some experimental/analysis choices is not well-described. For example, the rationale behind methanol fixation and heat-lysis is unclear. Additionally, the choice of various methods to assess "evenness" is not justified (e.g. why are multiple methods needed? What is the strength of each method?). Also, there is no justification for using 100k reads for subsampling. Finally, what exactly constitutes a "confidently-called SNP"?

      The methanol fixation prior to lysis is part of the original protocol described in the Single-Microbe Genome Barcoding Kit manual and was meant to facilitate lysis and DNA denaturation in bacterial cells (for which the kit was originally developed). However, in our preliminary tests with bulk samples – described in the supplementary material – we noticed a strong negative effect on lysis efficiency/DNA recovery when parasites were fixed with methanol. Thus, we decided to test the effect of skipping this step in the single-cell DNA workflow. We kept the SPC_STD1 sample to have a safe control where the full workflow described in the kit manual was followed.

      As we were unsure if the standard lysis (25 °C for 15 minutes) would work efficiently for Leishmania, we included the heat-lysis (99 °C for 15 minutes) as well as the longer incubation lysis (25 °C for 1 h). These modifications were listed as validated alternatives in the kit's manual.

      The 100k reads threshold was chosen based on the number of reads found in the 'true cell' with the lowest read count.

      Regarding variant calling, a variant was considered confidently called if it was covered, at single-cell level, by at least one deduplicated read with Phred quality above Q30 and mapping quality (MAPQ) also above 30.

      In the revised version, we will include these explanations and improve the explanation of the metrics used to estimate coverage quality.
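
      The criterion for a confidently called variant stated above can be written as a small predicate. This is a sketch of the rule as described, not our pipeline code; the `ReadSupport` record and the function name are hypothetical, and the thresholds are treated as strict ("above" 30).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadSupport:
    base_quality: int     # Phred score of the base at the variant position
    mapping_quality: int  # MAPQ of the read
    is_duplicate: bool    # True if the read was flagged as a duplicate


def is_confident(reads, min_baseq=30, min_mapq=30):
    """True if at least one deduplicated read supports the variant with
    base quality and mapping quality both strictly above the thresholds."""
    return any(
        not r.is_duplicate
        and r.base_quality > min_baseq
        and r.mapping_quality > min_mapq
        for r in reads
    )


# Hypothetical reads covering one variant position in one cell.
print(is_confident([ReadSupport(37, 60, False)]))  # passes both thresholds
print(is_confident([ReadSupport(37, 60, True)]))   # duplicate read is ignored
```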

      (2) In the methods, the STD protocol lists a 15-minute amplification at 45C whereas the PTA protocol involves 10h at 37C. This is a dramatic difference in incubation time and should be addressed when comparing results from the two methods. It is not really a fair comparison when you look at coverage levels; of course, a 10-hour incubation is going to yield more reads than a 15-minute incubation.

      We agree with the reviewer that the longer incubation period of PTA might explain the higher read count seen in the PTA samples, although the differences in amplification kinetics (linear in PTA, exponential in STD) and potential differences in amplification saturation points make it difficult to compare them. For instance, an updated version of PTA (ResolveDNA V2) uses a shorter amplification time (2.5 h) and achieves similar amplification levels compared to the 10 h incubation time, suggesting PTA amplification saturates well before 10 h. In any case, all quality check metrics were computed with the cells subsampled to 100k reads to mitigate the effect of read count differences on the data quality.
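
      Subsampling every cell to the same depth can be sketched as follows. The sketch assumes uniform sampling without replacement from a cell's read identifiers; the function name and the fixed seed are hypothetical choices for illustration, not a description of the tool we used.

```python
import random


def subsample_reads(read_ids, n=100_000, seed=0):
    """Draw n reads uniformly without replacement; return all reads if there are n or fewer."""
    read_ids = list(read_ids)
    if len(read_ids) <= n:
        return read_ids
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(read_ids, n)


# Hypothetical cell with 250,000 reads, reduced to the 100k threshold.
cell_reads = range(250_000)
kept = subsample_reads(cell_reads)
print(len(kept))
```

      Equalizing depth in this way means that differences in coverage evenness between conditions reflect the amplification itself and not simply the number of reads sequenced per cell.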

      (3) There is a lack of quantitative evaluations of the SPCs. e.g. How many capsules were evaluated to assess doublets? How many capsules were detected as Syto5 positive in a successful vs an unsuccessful experiment?

      We agree with the reviewer, but during experimental execution SPCs were only assessed qualitatively via microscopy, following the Single-cell microbe DNA barcoding kit manual. No quantitative analysis was done and therefore we do not have these data. Doublet detection was done in silico, based on the detection of SPCs containing mixed genomes from the two strains used in the study, as described in the Materials and Methods. As pointed out by another reviewer, this only allows the detection of inter-strain doublets. In the revised version, we explain this and add an estimation of total doublets based on the inter-strain doublet rate.
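
      One way such an estimate could be obtained is sketched below. It assumes that cells are co-encapsulated at random, so that a doublet drawn from a mixture with strain fractions p and 1 - p is inter-strain with probability 2p(1 - p); it ignores capsules holding three or more cells. The function name and the example rate are hypothetical, and this is an illustration of the reasoning, not necessarily the calculation used in the revision.

```python
def estimate_total_doublet_rate(inter_strain_rate, frac_a=0.5):
    """Scale the observed inter-strain doublet rate to all doublets,
    assuming random pairing of cells from a two-strain mixture."""
    frac_b = 1.0 - frac_a
    p_inter = 2.0 * frac_a * frac_b  # probability that a random doublet is inter-strain
    if p_inter <= 0:
        raise ValueError("both strains must be present in the mixture")
    return inter_strain_rate / p_inter


# Hypothetical example: 2% of capsules contain genomes from both strains.
total = estimate_total_doublet_rate(0.02)
print(f"estimated total doublet rate: {total:.1%}")
```

      For a 50/50 mixture, half of all doublets are expected to be inter-strain, so the total rate is simply twice the observed inter-strain rate.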

      (4) The authors do not address some of the amplification results obtained under various conditions. For example, why did temperature-based lysis of STD4 lead to amplification failure? Also, what is the reason for fewer "true" cells (higher background) in the PTA samples compared to the STD samples? Is this related to issues with barcoding or, alternatively, substandard amplification as indicated by lower read amounts in some capsules (knee plots in Figure 1C)?

      After exchange with the technical support team of the SPC generator kit, it was clarified that the heat lysis done in STD4 should have had a shorter incubation time (10 minutes instead of 15 minutes). We suspect that the longer incubation time, combined with the higher temperature and the harsh lysis condition with 0.8M KOH might have damaged SPCs and therefore DNA might have leaked out of them before WGA. In the microscopy images, SPCs in STD4 show a swollen aspect not seen in the other samples. In the revised version we will explain this more clearly.

      (5) The paper presents limited biological relevance. Without this, the paper describes an improvement in genome amplification methods and some proof-of-concept analyses. Using a 1:1 mixture of parasites with different genotypes, the authors display the utility of the method to resolve genetic diversity, but they don't seek to understand the limits of detecting this diversity. For some, the authors do not comment on the mixed karyotypes from the HU3 cells (Figure 3F) other than to state that this line was not clonal. For CNVs, the two loci evaluated were detected at relatively high copy number (according to Figure 4C, they are between 4 and 20 copies). Thus, the sensitivity of CNV detection from this data remains unclear; can this approach detect lower-level CNVs like duplications, or minor CNVs that do not show up in every cell?

      As described above we will include more discussion on potential biological relevance of the method in the revised version of the manuscript. In the revised version we will attempt to use dedicated bioinformatic tools to discover de novo CNVs, as per the suggestion of other reviewers. This might also allow us to determine the detection limit of the methodology for CNVs.

      (6) The authors state that Leishmania can carry extrachromosomal copies of important genes. There is no discussion about how the presence of these molecules would affect the amplification steps and CNV detection. For example, the phi29 enzyme is very processive with circular molecules; does its presence lead to overamplification and overrepresentation in the data? Is this evident in the current study? This information would be useful for organisms that carry this type of genetic element.

      We believe our data, which are based on short-read sequencing, do not allow us to differentiate between intra-chromosomal CNVs and linear or circular episomal CNVs, so we cannot determine whether circular CNVs are over-amplified. Of note, we have previously demonstrated that the M-locus CNV in chromosome 36 is intrachromosomal, not circular (episomal)[5].

      (7) The manuscript is missing a comparison with other similar studies in the field. For example, how does this coverage level compare to those achieved for other genomes? Can this method achieve amplification levels needed to assess larger genomes? Has there been any evaluation of base composition effects since Leishmania is a GC-rich genome?

      We believe the SPC-PTA workflow can be applied to organisms with larger genomes as PTA was developed specifically for mammalian cells[6], and also because, in our hands, it outperformed the 10X scDNA solution, which was developed for mammals.

      We believe direct comparison with other studies regarding coverage levels is difficult because other steps in the workflow apart from the WGA, such as the library preparation (PCR-based in our case), as well as genome features like GC content, size, and presence of repetitive regions, can also affect coverage levels and evenness. One strength of our approach was the use of a single sample (the 50/50 mix between two L. donovani strains) for all conditions, thus removing potential parasite-specific biases. In addition, the application of a multiplexing system during barcoding allowed us to combine all samples prior to library preparation, thus removing potential differences introduced by this step.

      Regarding the effect of GC content, we did notice a positive bias in all samples in regions with higher GC content, which had to be corrected in silico. This is the opposite of the negative bias observed in a previous study[7], likely due to differences in WGA and/or library preparation. In the revised version, we will include a supplementary figure showing the GC bias.
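
      As an illustration of what an in silico GC correction can look like, the following minimal sketch rescales per-bin read counts by the median count of bins with similar GC content. It is not the procedure used in the manuscript; the function name `correct_gc_bias`, the number of GC strata, and the median-scaling scheme are all assumptions made for this example.

```python
from statistics import median


def correct_gc_bias(counts, gc_fractions, n_strata=10):
    """Rescale bin counts so every GC stratum has the same median count.

    counts       -- read counts per genomic bin (hypothetical input)
    gc_fractions -- GC fraction (0..1) of each bin, same order as counts
    n_strata     -- number of equal-width GC strata (assumed value)
    """
    if len(counts) != len(gc_fractions):
        raise ValueError("counts and gc_fractions must have the same length")

    def stratum(gc):
        # Map a GC fraction to a stratum index, keeping gc == 1.0 in range.
        return min(int(gc * n_strata), n_strata - 1)

    global_median = median(counts)

    # Collect the counts that fall into each GC stratum.
    by_stratum = {}
    for count, gc in zip(counts, gc_fractions):
        by_stratum.setdefault(stratum(gc), []).append(count)
    stratum_median = {k: median(v) for k, v in by_stratum.items()}

    corrected = []
    for count, gc in zip(counts, gc_fractions):
        m = stratum_median[stratum(gc)]
        # Leave the count unscaled if the stratum median is zero.
        corrected.append(count * global_median / m if m > 0 else float(count))
    return corrected
```

      With a positive GC bias, bins in high-GC strata have inflated medians and are therefore scaled down, while low-GC bins are scaled up.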

      (8) Cost is mentioned as a benefit of the SPC platform, and savings are achieved when working in a plate format, but no details are included on how this was evaluated.

      In the revised version we will provide precise cost estimates and the rationale for the estimation.

      (9) The Zenodo link for custom scripts does not exist, and code cannot be evaluated.

      The full Zenodo link (https://doi.org/10.5281/zenodo.17094083) will be included in the revised version.

      Reviewer #3 (Public review):

      Summary

      In this manuscript, Negreira et al. propose a new scDNAseq method, using semi-permeable capsules (SPCs) and primary template-directed amplification (PTA). The authors optimize several metrics to improve their predictions, such as determining GC bias, Intra-Chromosomal fluctuation (ICF -metric to differentiate replicative and non-replicative cells) and Intra-chromosomal coefficient of variation (ICCV - chromosome read distribution). The coverage evenness was evaluated using the Gini index and the median absolute pairwise difference between the counts of two consecutive bins. They validate the proposed method using two Leishmania donovani strains isolated from different countries, BPK081 (low genomic variability) and HU3 (high genomic variability). Then, they showed that the method outperforms WGA and has similar accuracy to the discontinued 10X-scDNA (10X Genomics), further improving on short CNV identification. The authors also show that the method can identify somy variations, insertions/deletions and SNP variations across cells. This is a timely and very relevant work that has a wide applicability in copy number variation assessment using single-cell data.
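
      The two evenness metrics named in this summary can be sketched as follows. This is a minimal illustration following the wording above (differences between raw counts of consecutive bins), not the authors' code; the function names and the example inputs are hypothetical.

```python
from statistics import median


def gini_index(counts):
    """Gini index of per-bin read counts: 0 for perfectly even coverage,
    approaching 1 when all reads fall in a single bin."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one bin with a non-zero count")
    # Standard rank-weighted formula on ascending values (ranks start at 1).
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def median_abs_pairwise_difference(counts):
    """Median of |count[i+1] - count[i]| over consecutive bins."""
    if len(counts) < 2:
        raise ValueError("need at least two bins")
    return median(abs(b - a) for a, b in zip(counts, counts[1:]))
```

      Lower values of either metric indicate more even coverage along the genome for a given cell.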

      Strengths

      I really appreciate this work. My congratulations to the authors. All my comments below only aim to improve an already solid manuscript.

      We thank the reviewer for the enthusiasm and positive feedback.

      Weaknesses

      (1) Data availability: Although the authors provide a Zenodo link, the data is restricted. I also could not access the GitHub link in the Zenodo website: https://github.com/gabrielnegreira/2025_scDNA_paper. The authors should make these files available.

      Both the Zenodo (https://doi.org/10.5281/zenodo.17094083) and the GitHub (https://github.com/gabrielnegreira/2025_scDNA_paper) repositories are now publicly available.

      (2) 2-SPC-PTA and SPC-STD cell count comparison: The authors have consistently proven that the SPC-PTA method was superior to SPC-STD. However, there are a few points that should be clarified regarding the SPC-PTA results. Is there an explanation for the lower proportion of SPC to true cells success in SPC-STD, which reflects the bimodal distribution for the reads per cell in SPC-PTA2 and a three-to-multimodal distribution in SPC-PTA1 in Figure 1B? Also, in Table 1, does the number of reads reflect the number of reads in all sequenced SPCs or only in the true cells? If it is in the SPCs, I suggest that the authors add a new column in the table with the "Number of reads in true cells" to account for this discrepancy.

      The reason for the higher presence of 'background' SPCs in the PTA samples is not clear, but we hypothesize that it could be due to PTA favoring amplification of small, free-floating DNA molecules that might have been trapped in cell-free SPCs, as PTA works with shorter amplicons. Also, the longer incubation time used in PTA (10 h) might have allowed enhanced amplification of low quantities of free-floating DNA to detectable levels. Regarding Table 1, it indeed shows only the total number of reads per sample. In the revised version we will add the suggested column to Table 1.

      (3) The authors should evaluate the results with a higher coverage for SPC-PTA. I understand that the authors subsampled the total reads to 100,000 to allow cross-sample comparisons, especially between SPC-STD and SPC-PTA. However, as they concluded that the SPC-PTA was far superior, and the samples SPC-PTA1 and SPC-PTA2 had an "elbow" of 650,493 and 448,041, respectively, it might be interesting to revisit some of the estimations using only SPC-PTA samples and a higher coverage cutoff, such as 400,000.

      We believe the 100,000 cutoff is already high for aneuploidy analysis, as we have successfully reconstructed parasite karyotypes with 20,000 reads per cell[8], so a higher cutoff will likely not improve it. For CNV analysis, in the revised version, we will try to identify de novo CNVs using dedicated bioinformatic tools, as per the other reviewers' suggestions. There, we will also test if a higher CNV detection sensitivity is achieved using the suggested 400,000 reads cutoff for the PTA samples.

      (4) Doublet detection: I suggest that the authors be a little more careful with their definition of doublets. The doublet detection was based on diagnostic SNPs from the two strains, BPK081 and HU3, which identify doublets between two very different and well-characterised strains. However, this method will probably not identify strain-specific doublets. This is of minor importance for cloned and stable strains with few passages, as BPK081, but might be more relevant in more heterogeneous strains, as HU3. Strain-specific doublets might also be relevant in other scenarios, as multiclonal infections with different populations from the same strain in the same geographic area. One positive point is that the "between strain doublet count" was low, so probably the within-strain doublet count should be low too. The manuscript would benefit from a discussion on this regard.

      We fully agree with the reviewer. We will make it clear in the revised version that we quantify inter-strain doublets only, and we will also provide an estimation of total doublets based on the inter-strain doublet rate.
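
      One common way to extrapolate from inter-strain doublets to total doublets is sketched below. It assumes that cells are co-encapsulated at random, so that with strain fractions p and 1 - p only a fraction 2p(1 - p) of all doublets are detectable as inter-strain doublets. This is an assumption made for illustration, not necessarily the estimation the authors will use, and the function name and example rate are hypothetical.

```python
def estimate_total_doublet_rate(inter_strain_rate, strain_a_fraction=0.5):
    """Estimate the total doublet rate from the observed inter-strain rate.

    Assumes random pairing of cells: the share of doublets that contain
    one cell of each strain is 2 * p * (1 - p), where p is the fraction
    of strain A in the mix.
    """
    p = strain_a_fraction
    detectable_share = 2.0 * p * (1.0 - p)
    if detectable_share <= 0.0:
        raise ValueError("both strains must be present in the mix")
    return inter_strain_rate / detectable_share


# Hypothetical example: 2% inter-strain doublets in a 50/50 mix.
print(estimate_total_doublet_rate(0.02))
```

      For a 50/50 mix, half of all doublets are expected to be inter-strain, so the total doublet rate is estimated as twice the observed inter-strain rate.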

      (5) Nucleotide sequence variants and phylogeny: I believe that a more careful description of the phylogenetic analysis and some limitations of the sequence variant identification would benefit the manuscript.

      (5.1) As described in the methods, the authors intentionally selected two fairly different Leishmania donovani strains, HU3 and BPK081, and confirmed that the sequence variant methodology can separate cells from each strain. It is a solid proof of concept. However, most of the multiclonal infections in natural scenarios would be caused by parasite populations that diverge by fewer SNPs, and will be significantly harder to detect. Hence, I suggest that a short discussion about this is important.

      We will add a short discussion clarifying the limitations, while noting that our data demonstrate the ability of the approach to resolve very closely related cells, as illustrated by the fine-scale genetic differences observed within the clonal BPK081 population and by the detection of rare variants at targeted loci. We will also emphasize that the sensitivity to detect closely related genotypes depends on sequencing depth and the genomic regions considered.

      (5.2) The authors should expand on the description of the phylogenetic tree. In the HU3 on Figure 5F left panel, most of the variation is observed in ~8 cells, which goes from position 0 to position ~28,000. Most of the other cells are in very short branches, from ~29,000 to 30,400 (5F right panel). Assuming that this representation is a phylogram, as the branches are short, these cells diverge by approximately 100-2000 SNPs. It is unexpected (but not impossible) that such ~8 divergent cells be maintained uniquely (or in very low counts) in the culture, unless this is a multiclonal infection. I would carefully investigate these cells. They might be doublets or have more missing data than other cells. I would also suggest that a quick discussion about this should be added to the manuscript.

      In the revised version we will improve the description of the phylogenetic analysis. We will also investigate the 8 mentioned cells in more depth to determine whether confounding factors might have led to their discrepancy. The possibility of multiclonal infection in HU3 is not excluded as this strain was not cloned after isolation.

      References:

      (1) Dileep, V. & Gilbert, D. M. Single-cell replication profiling to measure stochastic variation in mammalian replication timing. Nat. Commun. 9, 427 (2018).

      (2) Miura, H. et al. Single-cell DNA replication profiling identifies spatiotemporal developmental dynamics of chromosome organization. Nat. Genet. 51, 1356–1368 (2019).

      (3) Marques, C. A. et al. Genome-wide mapping reveals single-origin chromosome replication in Leishmania, a eukaryotic microbe. Genome Biol. 16, 230 (2015).

      (4) Damasceno, J. D. et al. Leishmania major chromosomes are replicated from a single high-efficiency locus supplemented by thousands of lower efficiency initiation events. Cell Rep. 44, 116094 (2025).

      (5) Imamura, H. et al. Evolutionary genomics of epidemic visceral leishmaniasis in the Indian subcontinent. eLife 5, e12613 (2016).

      (6) Gonzalez-Pena, V. et al. Accurate genomic variant detection in single cells with primary template-directed amplification. Proc. Natl. Acad. Sci. 118, e2024176118 (2021).

      (7) Imamura, H. et al. Evaluation of whole genome amplification and bioinformatic methods for the characterization of Leishmania genomes at a single cell level. Sci. Rep. 10, 15043 (2020).

      (8) Negreira, G. H. et al. High throughput single-cell genome sequencing gives insights into the generation and evolution of mosaic aneuploidy in Leishmania donovani. Nucleic Acids Res. 50, 293–305 (2022).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary: 

      This study is an evaluation of patient variants in the kidney isoform of AE1 linked to distal renal tubular acidosis. Drawing on observations in the mouse kidney, this study extends findings to autophagy pathways in a kidney epithelial cell line. 

      Strengths: 

      Experimental data are convincing and nicely done.

      Thank you

      Weaknesses: 

      Some data are lacking or not explained clearly. Mutations are not consistently evaluated throughout the study, which makes it difficult to draw meaningful conclusions.

      We have revised our manuscript to clarify some earlier explanations and provided rationale for focusing on specific variants throughout the study.

      Reviewer #2 (Public review):

      Context and significance: 

      Distal renal tubular acidosis (dRTA) can be caused by mutations in a Cl-/HCO3- exchanger (kAE1) encoded by the SLC4A1 gene. The precise mechanisms underlying the pathogenesis of the disease due to these mutations are unclear, but it is thought that loss of the renal intercalated cells (ICs) that express kAE1 and/or aberrant autophagy pathway function in the remaining ICs may contribute to the disease. Understanding how mutations in SLC4A1 affect cell physiology and cells within the kidney, a major goal of this study, is an important first step to unraveling the pathophysiology of this complex heritable kidney disease. 

      Summary: 

      The authors identify a number of new mutations in the SLC4A1 gene in patients with diagnosed dRTA that they use for heterologous experiments in vitro. They also use a dRTA mouse model with a different SLC4A1 mutation for experiments in mouse kidneys. Contrary to previous work that speculated dRTA was caused mainly by trafficking defects of kAE1, the authors observe that their new mutants (with the exception of Y413H, which they only use in Figure 1) traffic and localize at least partly to the basolateral membrane of polarized heterologous mIMCD3 cells, an immortalized murine collecting duct cell line. They go on to show that the remaining mutants induce abnormalities in the expression of autophagy markers and increased numbers of autophagosomes, along with an alkalinized intracellular pH. They also reported that cells expressing the mutated kAE1 had increased mitochondrial content coupled with lower rates of ATP synthesis. The authors also observed a partial rescue of the effects of kAE1 variants through artificially acidifying the intracellular pH. Taken together, this suggests a mechanism for dRTA independent of impaired kAE1 trafficking and dependent on intracellular pH changes that future studies should explore. 

      Strengths: 

      The authors corroborate their findings in cell culture with a well-characterized dRTA KI mouse and provide convincing quantification of their images from the in vitro and mouse experiments

      Thank you  

      Weaknesses: 

      The data largely support the claims as stated, with some minor suggestions for improving the clarity of the work. Some of the mutants induce different strengths of effects on autophagy and the various assays than others, and it is not clear why this is from the present manuscript, given that they propose pHi as the unifying mechanism

      We have modified our manuscript to discuss the various strengths of the mutants and emphasize that alteration of cytosolic pH by kAE1 variants may not be the only mechanism leading to dRTA.  

      Reviewer #3 (Public review):

      Summary: 

      The authors have identified novel dRTA causing SLC4A1 mutations and studied the resulting kAE1 proteins to determine how they cause dRTA. Based on a previous study on mice expressing the dRTA kAE1 R607H variant, the authors hypothesize that kAE1 variants cause an increase in intracellular pH, which disrupts autophagic and degradative flux pathways. The authors clone these new kAE1 variants and study their transport function and subcellular localization in mIMCD cells. The authors show increased abundance of LC3B II in mIMCD cells expressing some of the kAE1 variants, as well as reduced autophagic flux using eGFP-RFP-LC3. These data, as well as the abundance of autophagosomes, serve as the key evidence that these kAE1 mutants disrupt autophagy. Furthermore, the authors demonstrate that decreasing the intracellular pH abrogates the expression of LC3B II in mIMCD cells expressing mutant SLC4A1. Lastly, the authors argue that mitochondrial function, and specifically ATP synthesis, is suppressed in mIMCD cells expressing dRTA variants and that mitochondria are less abundant in AICs from the kidney of R607H kAE1 mice. While the manuscript does reveal some interesting new results about novel dRTA causing kAE1 mutations, the quality of the data to support the hypothesis that these mutations cause a reduction in autophagic flux can be improved. In particular, the precise method of how the western blots and the immunofluorescence data were quantified, with included controls, would enhance the quality of the data and offer more supportive evidence of the authors' conclusions. 

      Strengths: 

      The authors cloned novel dRTA causing kAE1 mutants into expression vectors to study the subcellular localization and transport properties of the variants. The immunofluorescence images are generally of high quality, and the authors do well to include multiple samples for all of their western blots.

      Thank you

      Weaknesses: 

      Inconsistent results are reported for some of the variants. For example, R295H causes intracellular alkalinization but also has no effect on intracellular pH when measured by BCECF. The authors also appear to have performed these in vitro studies on mIMCD cells that were not polarized, and therefore, the localization of kAE1 to the basolateral membrane seems unlikely, based upon images included in the manuscript. Additionally, there is no in vivo work to demonstrate that these kAE1 variants alter intracellular pH, including the R607H mouse, which is available to the authors. The western blots are of varying quality, and it is often unclear which of the bands are being quantified. For example, LAMP1 is reported at 100kDa, the authors show three bands, and it is unclear which one(s) are used to quantify protein abundance. Strikingly, the authors report a nonsensical value for their quantification of LC3B II in Figure 2, where the ratio of LC3B II to total LC3B (I + II) is greater than one. The control experiments with starvation and bafilomycin are not supportive and significantly reduce enthusiasm for the authors' findings regarding autophagy. There are labeling errors between the manuscript and the figures, which suggest a lack of vigilance in the drafting process.

      The R295H variant was identified in a dRTA patient and as such, it was important to report it. However, this is the first mutation located in the amino-terminus of the protein, which may be involved in protein-protein interactions, so other mechanisms may cause dRTA for this variant. We have therefore modified our manuscript to state that alteration of cytosolic pH may not be the only mechanism leading to dRTA. At this time, we are not able to measure cytosolic pH in vivo and hope to be able to do it in the future.

      In our revised manuscript, we also show cell surface biotinylation results supporting that plasma membrane abundance of the kAE1 S525F and R589H variants is not significantly different than WT in non-polarized mIMCD3 cells (Figure 3 A&B), in line with the predominant basolateral localization of the variants in polarized cells (Figure 1C). Therefore, these two mutant proteins are not mis-trafficked in non-polarized cells.  Finally, we have clarified which bands have been used for quantification and corrected quantifications (including ratio measurements).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) R295H is recessively inherited, whereas Y413H is dominantly inherited: this is interesting and may be linked to their cellular expression and function. Is this information known for the other mutations examined in this study? 

      The S525F and R589H dRTA variants have both been reported to exhibit autosomal dominant inheritance. This information is now updated in lines 146 and 158-159.

      (2) R589H expression levels are evaluated in the Western blot of Figure 1, but localization and activity are not examined in Figure 2. However, R589H is included in autophagy experiments shown in later figures. Similarly, mutant R607H is the subject of several experiments further into the manuscript, but no initial analysis is provided for this variant. 

      Protein abundance and localization of the R589H mutant in mIMCD3 cells have been shown in our previous publication in Supplementary Fig 5D and Supplementary Fig 2J [1]. This is now indicated on lines 158-159. Our previous paper also presented a detailed study of the R607H dRTA mutant, the mouse model corresponding to the human R589H mutation. This is now indicated on lines 70, 118-119 and 180. The present study builds upon those published findings.

      (3) This inconsistency is confusing, detracts from the usefulness of the study, and makes the comparative analysis of mutations incomplete. It is difficult to extrapolate from published studies in MDCK1 cells, which show different results on trafficking. 

      The mIMCD3 cell line, which more closely resembles the physiology of the mouse collecting duct than MDCK cells, was selected for this study and our previous one [1]. Accordingly, the results obtained are better aligned with in vivo evidence. In contrast, differences in mutant protein expression and localization observed in other cell lines, like the MDCK cells, are likely attributable to differences in their cellular origin. 

      (4) In Figure 2, could the authors explain why total LC3B is graphed for the data shown in mouse lysates, whereas the ratio of bands is analysed for cell lysates? Both sets of data show the two LC3B bands.

      Total LC3B levels were significantly increased in the mutant compared to WT; however, no significant difference was observed in the lipidation ratio. For this reason, that graph is not shown in the main paper but has been included in the Supplementary Figure 1D. 

      (5) In Figure 3, representative fluorescence images should be shown for all cell lines.

      We have now included representative immunofluorescence images for all cell lines in Figure 3C.

      (6) pH effects: Suggest that steady state pHi (Figure 3E) and rate of alkalization (Figure 1F) would be more effective together in Figure 1. The authors should show data for the effect of nigericin on cytoplasmic pH in Figure 3. If the rate of alkalinization in the mutant cells is reduced, shouldn't the intracellular steady state pH be more acidic? A cartoon depicting the transporter activity in the cell and the expected changes in pHi would be helpful. Is there a way to activate/inhibit NHE1 and rescue the effect of the mutant kAE1? It is unclear if the link between the mutant kAE1 and mitochondrial ATP production is a consequence of the intracellular pH or an indirect effect.

      We opted to keep the effect of nigericin on pHi in Supplementary Fig1A given that Figure 3 already contains 11 panels. Also, in intercalated cells, the kAE1 protein physiologically exports 1 molecule of bicarbonate in exchange for the import of 1 chloride ion; hence, a reduced transport activity would result in a more alkaline intracellular pH. To clarify this point, we have included a diagram in Figure 1E as suggested. However, to calculate the rate of intracellular alkalinisation, the transporter is functioning in the opposite direction, i.e. extruding chloride and importing bicarbonate (see methods protocol for transport assay). Therefore, in this assay (Figure 1G), a defective chloride/bicarbonate exchange activity results in a reduced rate of intracellular alkalinisation. This is now explained on lines 169-172.

      Disruption of NHE1 function would impair sodium homeostasis and, as such, potentially affect the activity of other proteins associated with acid-base balance and autophagy in collecting duct cells. Therefore, any resulting effects could not be confidently attributed specifically to the mutant kAE1. With nigericin, we aimed to alter pHi while affecting other ion concentrations as little as possible. Due to space considerations, Figure 1 has been reorganised to include the rate of alkalinisation and pHi (panels F and G).

      Reviewer #2 (Recommendations for the authors):

      (1) The authors could improve the readability of this manuscript for a general audience by clarifying and summarizing the respective phenotype(s)/effect(s) of the different mutants in some kind of table in the main figures. It is hard to keep track of the different disease mutants alongside the KI mouse mutations, as the text frequently discusses multiple mutants at a time. 

      As requested, we added two tables (Supplementary Tables 1 & 2) in Supplementary files summarizing the data obtained in this study. We hope this will help the readership to keep track of each variant’s phenotype.

      (2) The subtitle of the results section of Figure 2 should be reworded to reflect that whole kidney lysates are used for the KI mice and not the other mutants.

      As requested, the title in the Results section has been modified (lines 178-179).

      (3) More discussion of why the different mutants cause different strengths of phenotypes should be included.

      Different variants induce different degrees of functional defects, as seen in Figure 1F & G. The kAE1 R295H variant, the only dRTA-causing amino acid substitution in the amino-terminal cytosolic domain, does not affect the transporter’s function or the cells’ pHi. Therefore, this variant may cause dRTA via a different pathway than the transport-defective S525F or partially inactive R589H variants, which both affect pHi. Our study does not exclude that dRTA may be caused by defects other than pHi alterations, including defective protein-protein interactions. This discussion is now included in the manuscript on lines 386-391.

      Reviewer #3 (Recommendations for the authors):

      In general, I found the subject matter of this manuscript interesting and of value to the scientific community. The interpretation of the data and how much it supports the conclusion that "kAE1 variants increases pHi which alters mitochondrial function and leads to reduced cellular energy levels that eventually attenuate energy-dependent autophagic pathways" is largely incomplete. There are significant concerns about the quantification of Western blot data. Additionally, including the R607H variant in the in vitro experiments would improve the interpretation and extrapolation of in vitro data to the kidney.

      We apologize for the confusion with R589H and R607H variants. The R607H mutant is the murine ortholog to the human R589H dRTA variation. To clarify this, we have added this information on line 180, in addition to lines 118-119 and line 70.

      Suggestions:

      (1) Can an anion replacement experiment be performed in the mIMCD cells (no Cl or no HCO3) to determine that bicarbonate transport through AE1 is responsible for the reduced ATP rates in Figure 5? Inclusion of WT +dox control would be helpful to convince the reader of the effects.

      Because Seahorse real-time cell metabolism ATP rate measurements require specific, patented buffers of unspecified composition, it was not possible to modify the Cl⁻ or HCO₃⁻ content during the ATP measurement assay. All cell lines, including empty vector (EV) cells, were treated with doxycycline; thus, WT + dox was already included. The empty vector cell line treated with doxycycline served as a control to exclude specific effects of doxycycline on mitochondrial activity. This is now clarified in Figure 5 legend, lines 655-656.

      (2) Can the authors measure pHi in fresh kidney sections from the R607H mouse?

      Unfortunately, we are not currently able to measure pHi in fresh kidney sections. Although we recognize that this measurement would greatly benefit our study, establishing a new collaboration to perform it would significantly delay the publication of this work; therefore, these results will not be available for the present manuscript.

      (3) Does pH 7.0 media have any effect on autophagy, as shown in Figure 3? Why was pH 6.6 selected?

      The idea was to artificially acidify pHi in mutant cell lines (that have a steady state alkaline pHi) and assess whether this acidification corrects autophagy defects. We first determined that incubation in cell culture medium at pH 6.6 with 0.033 µM nigericin (final potassium concentration: 168 mM) for 2 hours provided optimal conditions, i.e. ensuring cell viability over the 2-hour period while effectively lowering intracellular pH to 6.9, as demonstrated in Supplementary Figure 1A-C.

      (4) In vitro experiments should be performed on polarized cells with kAE1 properly inserted in the basolateral membrane. Experiments on subconfluent, non-polarized cells do not support the hypothesis that transport functions of AE1 initiate the cascade of events attributed to these SLC4A1 mutations.

      To address this point, we have performed cell surface biotinylations on 70-80 % confluent mIMCD3 cells expressing kAE1 WT, S525F or R589H mutants and show that cell surface abundance of the mutants is not significantly different from the WT protein. This is now shown in Figure 3 A&B. As cell surface biotinylation provides a more quantitative assessment of protein cell surface abundance, we have removed the immunofluorescence images from non-polarised cells and replaced them with representative immunoblots from a cell surface biotinylation assay.

      Concerns:

      (1) No information about the B1 ATPase antibody used.

      Now provided in Supplementary Material, ATP6V1B1 Antibody from Bicell cat#20901.

      (2) No actin band in Figure 1E (as prepared).

      Actin bands are provided for each blot in Figure 1D.

      (3) Figures 1E and 1F are labelled wrong in the figure versus the results section. 

      Thank you for letting us know, this is now corrected.

      (4) The cortical sections shown in Figure 4 for the KI/KI do not appear to have the morphology of a CCD. The authors may want to consider including glomeruli to convince the reader of the localization of the tubules. Same concern with Figure 5G and I. The WT image in 5G does not have the morphology of a CCD. Principal cells should be predominant, and ICs should be dispersed.

      Both Figures 4 and 5 have been updated with images showing glomeruli (light blue “G” on figure) with neighbouring and dispersed IC staining.

      (5) The quantification of LAMP1 in Figure 4 is unclear. How did the authors determine the boundary of AICs, and how did they calculate the volume of lysosomes? If a z-stack was used, how are the authors sure that their 10 µm section includes the entire AIC?

      The quantification of LAMP1 is detailed under “Image analysis”, then “Volocity” sections in Supplementary Material. The boundary of A-IC was manually detected in Volocity based on the presence of the H⁺-ATPase before Volocity analysis for lysosomal volume as described in the Methods.

      The 10 micron sections are expected to include full AIC as well as partial AIC, but the frequency of these events should be the same between WT and variants’ sections, therefore they were all included in the analysis if cells displayed H⁺-ATPase signal.

      (6) Figure 5: There is no description of how ATP rates are calculated from the provided traces.

      We used the Agilent Seahorse XF ATP rate assay kit for this experiment. In this assay, the total ATP rate is the sum of the ATP production rates from glycolysis and oxidative phosphorylation. Glycolysis releases protons in a 1:1 ratio with ATP; hence, the glycolytic ATP rate is calculated from the glycolytic proton efflux rate (glycoPER). GlycoPER is determined by subtracting the respiration-linked proton efflux, measured by inhibiting complexes I and III, from the total proton efflux. This information is now added to Supplementary Material, in the “Metabolic Flux analysis” section.
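
      The arithmetic described above can be sketched as follows. The glycolytic part follows the response directly. The mitochondrial part is not described in this response: the oligomycin-sensitive oxygen consumption and the P/O ratio of 2.75 are assumptions based on our understanding of the assay and should be checked against the kit documentation. All function and parameter names are hypothetical.

```python
def glycolytic_atp_rate(total_per, respiration_linked_per):
    """Glycolytic ATP rate (pmol ATP/min).

    glycoPER = total proton efflux - respiration-linked proton efflux,
    and glycolysis yields protons and ATP in a 1:1 ratio.
    """
    glyco_per = total_per - respiration_linked_per
    return glyco_per


def mitochondrial_atp_rate(basal_ocr, oligomycin_ocr, p_o_ratio=2.75):
    """Mitochondrial ATP rate (pmol ATP/min).

    Assumption: ATP-linked respiration is the drop in oxygen consumption
    after oligomycin, converted with 2 O atoms per O2 and an assumed
    P/O ratio (ATP made per oxygen atom reduced).
    """
    atp_linked_ocr = basal_ocr - oligomycin_ocr
    return atp_linked_ocr * 2.0 * p_o_ratio


def total_atp_rate(total_per, respiration_linked_per, basal_ocr, oligomycin_ocr):
    """Total ATP rate = glycolytic + mitochondrial ATP production rates."""
    return (glycolytic_atp_rate(total_per, respiration_linked_per)
            + mitochondrial_atp_rate(basal_ocr, oligomycin_ocr))
```

      Expressing both contributions in pmol ATP/min is what allows them to be summed into a single total ATP rate and compared between cell lines.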

      (7) Figure labels in Figure 5 are wrong. It seems 5H (as presented) should actually be labeled 5G. In 5H (G?), why did some cells not have any TOM20 pixel intensity for S525F and R589H variants?

      Confocal image acquisition in this experiment was kept under the same settings to allow comparison between samples. Therefore, some cells show dimmer fluorescence than others. In the Figure 5 panels, all cells showed TOM20 pixel intensity. The Figure 5H panel has been relabelled Figure 5G.

      (8) In Figure 2, the summary graphs show analysis of more samples than are visible on the included western blots. What is the rationale for this? Why does S525F have 9 samples in BafA1 while R295H only has 3 (2H)? Yet, R295H has 6 samples in 2I. In 2D, S525F has at least 9 samples. Explain.

      Figure 2A-C shows representative immunoblots from several independently conducted experiments. Therefore, the final number of samples is higher than shown in Figure 2. This is now indicated in the Figure 2 legend, line 603. It became clear quite early in our study that the recessive kAE1 R295H variant does not behave similarly to the other variants studied, maybe because it affects the cytosolic domain, so we did not perform as many replicates for this variant as we did for the others. However, we felt it was valuable to the research community to report the characterization of this variant and decided to keep it in our study.

      (9) In general, the actin loading does not appear to be equal between samples. And some figures show the same actin blot twice (2A, C) while some show independent actin bands for LC3B and p62. Equal loading seems a fairly significant control, considering the importance of quantification in the figures.

      In addition to performing protein assays, we systematically conduct immunoblots with an anti-β-actin antibody to control for loading variability. When possible, two or three proteins, including actin, are detected on the same blot, provided their molecular weights differ enough. This sometimes results in β-actin being used as a loading control for two different proteins, as seen in Figure 2A and 2C. This is now indicated on lines 605–606.

      (10) In the Supplemental Figure 2, which band is being quantified for mature CTSD at 33kDa? Same for intermediate CTSD. The quantification of V-ATPase seems questionable based on the actin variance shown in the blot. Surely the ratio of the fourth sample is greater than 1.

      Supplementary Figure 2 has been updated to include arrows indicating which band was selected for the quantification. After verifying the measurements of band intensities from the “Image Lab” quantification software, we confirm the results, including that the fourth KI/KI sample has a ratio of 0.78 (Adj Total Band Vol (Int), lanes 10). Screenshots of the quantifications are attached below.

      Author response image 1.

      Author response image 2.

      (11) Why are the experiments performed on non-confluent IMCD cells? Figure 1D shows good basolateral localization of AE1, yet the other experiments in the manuscript appear to use IMCD cells in low confluent states, without proper localization of AE1. Figure 3A shows AE1 dispersed throughout the cytoplasm. Why have the authors decided to study the effects of an anion exchanger without it being properly localized to the basolateral membrane? Shouldn't all experiments be performed in polarized IMCDs? If AE1 isnt properly in the membrane, and the cells do not have defined apico-basolateral polarity, then what role can AE1-mediated intracellular pH change have on the results of the experiments? Were the pHi experiments in 3E performed on polarized cells? Or even 1F?

      To address this point, we have performed cell surface biotinylations on 70-80 % confluent mIMCD3 cells expressing kAE1 WT, S525F or R589H mutants and show that cell surface abundance of the mutants is not significantly different from the WT protein. This is now shown in Figure 3A & B. As it provides a more quantitative assessment of protein cell surface abundance, we have removed the immunofluorescence images from non-polarised cells and replaced them with a representative immunoblot from a cell surface biotinylation assay.

      (12) As mentioned in the public comments, how is the ratio A/(A+B) greater than 1? With A and B > 0. In Figure 3, the data is reasonable, but in Figure 2, the data is simply impossible. What is the explanation for this phenomenon? Why was this presentation of data approved? Is it supposedly a fold of WT, like 2K and 2L? Is the reader also to believe that total LC3B is 2-fold greater in KI/KI mice, as shown in 2K? My eyes, though not densitometry equipment, cannot confirm this. The actin bands are not equal. Yet again, there are 4 lanes of KI/KI mice, but the quantification shows 5 samples.

      The ratios in figure 2D, 2F, 2H and 2L have been re-calculated and corrected. As indicated above, immunoblots are representative and quantification of additional blots has been included in the graphs.

      (12) Spelling error Figure 4B: cels.

      Corrected

      References 

      (1) Mumtaz, R. et al. Intercalated Cell Depletion and Vacuolar H+-ATPase Mistargeting in an Ae1 R607H Knockin Model. Journal of the American Society of Nephrology 28, 1507–1520 (2017).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Lahtinen et al. evaluated the association between polygenic scores and mortality. This question has been intensely studied (Sakaue 2020 Nature Medicine, Jukarainen 2022 Nature Medicine, Argentieri 2025 Nature Medicine), where most studies use PRS as an instrument to attribute death to different causes. The presented study focuses on polygenic scores of non-fatal outcomes and separates the cause of death into "external" and "internal". The majority of the results are descriptive, and the data doesn't have the power to distinguish effect sizes of the interesting comparisons: (1) differences between external vs. internal (2) differences between PGI effect and measured phenotype. I have two main comments:

      (1) The authors should clarify whether the p-value reported in the text will remain significant after multiple testing adjustment. Some of the large effects might be significant; for example, Figure 2C

      We have now added Benjamini-Hochberg multiple-testing adjusted p-values in the text each time we present nominal p-values. Additionally, supplementary tables S5 and S6 provide multiple-adjusted p-values for all analysed PGIs.
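
      For readers unfamiliar with the procedure, the Benjamini-Hochberg adjustment named above can be sketched as follows. This is a generic illustration using made-up p-values, not the code or the 35 PGI p-values used in the study; the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p-value), then
    monotonicity is enforced by taking a running minimum from the largest
    rank downwards.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative nominal p-values only.
nominal = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(nominal)
```

      Because the scaling factor depends on both the number of tests and the rank of each p-value, a nominally small p-value can become clearly non-significant once all tests in a family are taken into account.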

      Although this was not always the case, many comparisons remained significant after multiple testing adjustments, especially in Figure 2C that the reviewer commented on. In the revised version, we have placed more emphasis on describing these HRs that have low p-values after multiple-test adjustment. The revised text for Figure 2C in the Results section now reads:

      Panel C analyses mortality in three age-specific follow-up periods. The PGIs were more predictive of death in younger age groups, although the difference between the 25–64 and 65–79 age groups was small, except for the PGI of ADHD (HR=1.14, 95% CI 1.08; 1.21 for 25–64-year-olds; HR=1.04, 95% CI 1.00; 1.08 for 65–79-year-olds; p=0.008 for difference, p=0.27 after multiple-testing adjustment). PGIs predicted death only negligibly among those aged 80+, and the largest differences between the age groups 25–64 and 80+ were for PGIs of self-rated health (HR 0.87, 95% CI 0.82; 0.93 for 25–64-year-olds, HR 1.00, 95% CI 0.94; 1.04 for 80+ year-olds, p=2*10<sup>-4</sup> for difference, p=0.006 after multiple-testing adjustment), ADHD (HR 1.14, 95% CI 1.08; 1.21 for 25–64-year-olds, HR 0.99, 95% CI 0.95; 1.03 for 80+ year-olds, p=7*10<sup>-4</sup> for difference, p=0.012 after multiple-testing adjustment) and depressive symptoms (HR 1.12, 95% CI 1.06; 1.18 for 25–64-year-olds, HR 1.00, 95% CI 0.96; 1.04 for 80+ year-olds, p=0.002 for difference, p=0.032 after multiple-testing adjustment). Additionally, the difference in HRs between these age groups achieved significance after multiple testing adjustment at the conventional 5% level for PGIs of cigarettes per day, educational attainment, and ever smoking.
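
      The response does not state how the p-values for differences between hazard ratios were obtained. One common approximation, shown here only as an assumption-laden sketch, recovers standard errors from the reported 95% confidence intervals and applies a two-sided z-test to the difference in log hazard ratios. It treats the two estimates as independent, which may not hold for the actual data, and the function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of log(HR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z_crit)


def hr_difference_p(hr1, ci1, hr2, ci2):
    """Two-sided p-value for log(hr1) - log(hr2), assuming independence."""
    se1 = se_from_ci(*ci1)
    se2 = se_from_ci(*ci2)
    z = (math.log(hr1) - math.log(hr2)) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# ADHD PGI, ages 25-64 vs 65-79, using the rounded values quoted above.
p_adhd = hr_difference_p(1.14, (1.08, 1.21), 1.04, (1.00, 1.08))
```

      Because the inputs are rounded to two decimals, the result should only be expected to approximate the p-value reported in the text.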

      We have also included the recent study by Argentieri et al. (2025) in the literature review, which was missing from our previous version. We appreciate the reference. Other references mentioned were already included in the previous version of the manuscript.

      (note that the small prediction accuracy of PGI in older age groups has been extensively studied, see Jiang, Holmes, and McVean, 2021, PLoS Genetics).

      We would like to thank the reviewer for suggesting the relevant reference by Jiang et al. We have now expanded on the discussion of age-specific differences in the discussion section and included this reference.

      (2) The authors might check if PGI+Phenotype has improved performance over Phenotype only. This is similar to Model 2 in Table 1, but slightly different.

      The reviewer raises an interesting angle to approach the analysis. We have now added an analysis assessing the information criteria and the significance of improvement between nested models in Supplementary table S8. All the tested PGI+phenotype models show improvement over the phenotype-only model that is statistically significant at all conventional levels when tested by likelihood-ratio tests between nested models. Additionally, improvement was found when using Akaike and Bayesian (Schwarz) information criteria (albeit sometimes modest in size). We have added a passage in the results section briefly summarising this analysis:

      Supplementary table S8 presents information criteria and significance tests on corresponding models. Models with PGI+phenotype (Models 2a–f) showed improvement over models with the phenotype only (Models 1a, 1c, 1e, 1g, 1i, 1k, with a p=0.0006 or lower) in terms of both Akaike information criterion (AIC) as well as Bayesian (Schwarz) information criterion (BIC) with a p=0.0006 or lower in all comparisons. The full Model 4 again showed improvement over the model with all PGIs jointly (Model 3b, with a p=0.0002 or p=0.00002, depending on continuous/categorical phenotype measurement), which had a lower AIC but not BIC.
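
      The quantities compared in this passage follow standard definitions, sketched below for a pair of nested models that differ by one parameter. The log-likelihoods and sample size are illustrative placeholders, not values from Supplementary table S8, and the choice of n for the BIC (e.g. individuals versus events in a Cox model) is an assumption left to the caller.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2*logL (lower is better)."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian (Schwarz) information criterion: k*ln(n) - 2*logL."""
    return n_params * math.log(n_obs) - 2 * loglik


def lr_test_df1(loglik_reduced, loglik_full):
    """Likelihood-ratio statistic and p-value for ONE added parameter.

    For one degree of freedom, the chi-square tail probability equals
    erfc(sqrt(stat / 2)), so no external library is needed.
    """
    stat = 2 * (loglik_full - loglik_reduced)
    return stat, math.erfc(math.sqrt(stat / 2))


# Illustrative values: phenotype-only model vs PGI+phenotype model.
stat, p_value = lr_test_df1(loglik_reduced=-105.0, loglik_full=-100.0)
delta_aic = aic(-100.0, 4) - aic(-105.0, 3)
delta_bic = bic(-100.0, 4, 1000) - bic(-105.0, 3, 1000)
```

      BIC penalises each added parameter by ln(n) rather than 2, which is why a larger model can improve AIC without improving BIC, as reported for Model 4 versus Model 3b.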

      Reviewer #2 (Public review): 

      Summary:

      This study provides a comprehensive evaluation of the association between polygenic indices (PGIs) for 35 lifestyle and behavioral traits and all-cause mortality, using data from Finnish population- and family-based cohorts. The analysis was stratified by sex, cause of death (natural vs. external), age at death, and participants' educational attainment. Additional analyses focused on the six most predictive PGIs, examining their independent associations after mutual adjustment and adjustment for corresponding directly measured baseline risk factors.

      Strengths:

      Large sample size with long-term follow-up.

      Use of both population- and family-based analytical approaches to evaluate associations.

      Weaknesses:

      It is unclear whether the PGIs used for each trait represent the most current or optimal versions based on the latest GWAS data.

      To our reading, this comment is closely related to the “recommendations for the author” number 3 by reviewer 2, and we thus address them together. 

      If the Finnish data used in this study also contributed to the development of some of the PGIs, there is a risk of overestimating their associations with mortality due to overfitting or "double-dipping." Similar inflation of effect sizes has been observed in studies using the UK Biobank, which is widely used for PGI construction.

      To our reading, this comment is closely related to the “recommendations for the author” 4 by reviewer 2, and we thus address them together.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      (1) Cited reference 1 also investigated the PRS association with life span; cited reference 8 explains PRS association with healthy lifespan. Can authors be clearer about what is new in the context of these references? Specifically, what are the PGIs studied here that were not analyzed in the cited analyses?

      Although some previous studies on the topic do exist, our analysis arguably has novelty in touching upon several unstudied or scarcely studied themes. These include:

      A set of PGIs focusing on social, psychological, and behavioural phenotypes or PGIs for typically non-fatal health conditions.

      An assessment of direct genetic effects/ confounding with a within-sibship design.

      An assessment of potential heterogeneous effects by several socio-demographic characteristics.

      An analysis of external causes of deaths (which can be hypothesised to be particularly relevant here, given the choice of our PGIs not focusing directly on typical causes of death).

      A detailed assessment of the interplay of the most predictive PGIs with their corresponding phenotypes.

      We have substantially revised the Introduction section focusing on making these novel contributions more explicit.

      (2) In the Methods section, it is not very clear why the authors specifically study the "within-sibship" samples. Is this for avoiding nurturing effects from parental genotypes or for controlling assortative mating? The authors should clarify the rationale behind the design.

      The substance-related rationale behind this approach was briefly discussed in the Introduction section while in the Methods section, we focused more on the technical description of our analyses. However, it is certainly worthwhile to clarify to the reader why within-sibship methods have been used. The revised passage in the methods section now states:

      “In addition to this population sample, we used a within-sibship analysis sample to assess the extent of direct and indirect genetic associations captured by the PGIs, as discussed in the introduction.”

      (3) "Residual correlations of PGIs were no more than 0.050..." As a minor comment, since the PGI is a noisy variable, the correlation would be low; however, I don't think there are better ways to evaluate Cox assumptions, and in many cases, this assumption is not correct for strong predictors.

      Yes, these points are true. Overall, it is often implausible that empirical distributions exactly match distributional assumptions in statistical models. For example, it may not be realistic to expect that the mortality hazards across categories of independent variables stay exactly proportional during long mortality-follow-ups; some deviations from constant proportions are almost inevitable. However, there are reasonable grounds to argue that in case of moderate violations of the proportional hazards assumption, the estimates still remain interpretable for practical uses. They can be read as approximating average relative hazards over the study period (for discussion, see pages 42–47 in Allison P. 2014. Event history and survival analysis: Regression for longitudinal event data (second edition). Thousand Oaks: SAGE).

      (4) "PGI of ADHD (HR=1.08 95%CI 1.04;1.11 among men; HR=1.01 95%CI 0.97;1.05 among women; p=0.012 for difference)." Is this difference significant after multiple testing correction?

      We have presented multiple-testing adjusted p-values together with nominal ones in this and in all other instances where they are mentioned in the text. Additionally, Supplementary tables S5–S6 present multiple-adjusted p-values for each PGIs studied.

      (5) "Panel D displays that most PGIs had stronger associations with external (accidents, violent, suicide, and alcohol related deaths) than natural causes of death." Similar to the comment above, are there any results that are significantly different between internal and external?

      We have added the p-values of those variables that had larger differences in the revised text. Quoting from the revised article: “The HR differences between external and natural causes of death were nominally significant at the conventional 5% level for cannabis use (p=0.016), drinks per week (p=0.028), left out of social activity (p=0.029), ADHD (p=0.031), BMI (p=0.035) and height (p=0.049), but none of these differences remained significant after adjusting for 35 multiple tests.”

      (6) Table 1: The effect of the phenotype is stronger than the PGI; this is expected as the PGI is a weak predictor and can be considered a "noisy" measurement of the true genetic value (Becker 2021 Nature Human Behaviour). Is there a way to adjust for the impact of noise in the PGI at tagging genetic value and compare if the PGI effect is different from the phenotype effect?

      PGIs are certainly imperfect measures that contain a lot of noise. However, extracting new information from what is unknown is an extremely demanding exercise, and it is further complicated by, for example, the fact that we do not know the exact benchmark of total genetic effect we should be aiming at. Different methods of heritability estimation, for instance, often give dramatically differing results, for reasons that are still under scrutiny.

      We are thus not aware of a method that could provide a satisfactory answer to this challenging task.

      Reviewer #2 (Recommendations for the authors):

      (3) Justification and Selection of PGIs:

      For several traits, such as BMI, multiple polygenic indices (PGIs) are currently available. The criteria used to select specific PGIs for this study are not clearly described. A more systematic and reproducible approach-for example, leveraging metadata from the PGS Catalog-could strengthen the justification for PGI selection and enhance the study's generalizability.

      There are numerous PGIs developed in the extensive GWAS literature, but a finite set of PGIs always needs to be chosen for any analysis. The rationale behind our decision to include every PGI from the repository of Becker et al. 2021 (full reference in the manuscript, see also https://www.thessgac.org/pgi-repository) that was available for the Finnish data (including the possibility to exclude overlapping samples, see our response to the next comment for more discussion) was to provide rigorous analysis by limiting the researchers’ degrees of freedom in arbitrarily choosing PGIs. Although it would have been tempting to not use some PGIs that were not expected to substantially correlate with mortality, we believe that our conservative strategy increases the credibility of the reported p-values; in particular, the multiple-testing adjustment should now work as intended.

      We now also mention this rationale when discussing the chosen PGIs in the methods section: “As the independent variables of main interest, we used 35 different PGIs in the Polygenic Index repository by Becker et al., which were mainly based on GWASes using UK Biobank and 23andMe, Inc. data samples, but also other data collections. They were tailored for the Finnish data, i.e., excluding overlapping individuals between the original GWAS and our analysis and performing linkage-disequilibrium adjustment. We used every single-trait PGI defined in the repository (except for subjective well-being, for which we were unable to obtain a meta-analysis version that excluded the overlapping samples). By limiting the researchers’ freedom in selecting the measures, this conservative strategy should increase the validity of our estimates, particularly with regards to multiple-testing adjusted p-values.”

      (4) Overlap Between PGI Training Data and Study Sample:

      The authors should describe any overlap between the data used to develop the PGIs and the current study sample. If such overlap exists, it may lead to overestimation of effect sizes due to "double-dipping." A discussion of this issue and its potential implications is warranted, as similar concerns have been raised in studies using UK Biobank data.

      This is, fortunately, not a concern of our analysis. Overlapping samples were excluded in creating the PGIs that we used. We have now described this more clearly in the revised methods section.

      (1) Clarify the Methodology for Family-Based Cox Analysis:

      It is unclear what specific method was used to perform Cox regression in the family-based analysis. Please provide additional methodological details.

      We have described the method further and added an additional reference in the revision. The text now stands:

      “We compared these models to the corresponding within-sibship models, using the sibship identifier as the strata variable. This method employs a sibship-specific (instead of a whole-sample-wide baseline hazard in the population models) baseline hazard, and corresponds to a fixed-effects model in some other regression frameworks (e.g., linear model with sibship-specific intercepts)”
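
      To illustrate what a sibship-specific baseline hazard means in practice, the sketch below evaluates a stratified Cox partial log-likelihood for a single covariate. It is a simplified teaching example (Breslow-style handling of ties, toy records, hypothetical function name), not the software or data used in the study. The key point is that each risk set is formed only from members of the same sibship.

```python
import math
from collections import defaultdict


def stratified_partial_loglik(beta, records):
    """Stratified Cox partial log-likelihood for one covariate.

    records: iterable of (stratum_id, time, event, x), where event is 1 for
    death and 0 for censoring. Risk sets never cross strata, which is what
    gives every sibship its own baseline hazard.
    """
    strata = defaultdict(list)
    for stratum_id, time, event, x in records:
        strata[stratum_id].append((time, event, x))

    loglik = 0.0
    for members in strata.values():
        for time, event, x in members:
            if not event:
                continue
            # Only siblings still under follow-up at this death time count.
            at_risk = [xj for tj, _, xj in members if tj >= time]
            loglik += beta * x - math.log(sum(math.exp(beta * xj) for xj in at_risk))
    return loglik


# Toy data: two sibships of two, plus one person without a sibling in the data.
toy = [
    ("A", 5.0, 1, 1.0), ("A", 8.0, 0, 0.0),
    ("B", 3.0, 1, 0.0), ("B", 7.0, 1, 1.0),
    ("C", 4.0, 1, 2.0),
]
value_at_zero = stratified_partial_loglik(0.0, toy)
```

      In this formulation, a stratum with a single member contributes nothing to the likelihood, so only differences between siblings inform the estimate, analogous to a fixed-effects model with sibship-specific intercepts.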

      (2) Clarify Timing of Measured Risk Factors Relative to Follow-Up:

      The main text should provide more detailed information regarding the timing of data collection for directly measured risk factors. Specifically, it should be clarified whether the measurements used correspond to the first available data for each individual after the start of follow-up, or if a different criterion was applied.

      BMI, self-rated health, alcohol consumption and smoking status were measured at the baseline survey of each dataset. Education was registered as the highest completed degree up to the end of 2019. Depression was a composite of survey self-report (at the time of the baseline survey), as well as depression-related medicine purchases and hospitalizations over a two-year period before the start of the individual’s follow-up.

      We have added more comprehensive information on the measurement of the phenotypes of interest in Supplementary table 2, including the timing of the measurement.

    1. Author response:

      Point-by-point description of the revisions:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary

      In this article, the authors used synthetic TALE DNA-binding proteins, tagged with YFP, which were designed to target five specific repeat elements in the Trypanosoma brucei genome, including centromere- and telomere-associated repeats and those of a transposon element. The aim was to detect and identify, using YFP pulldown, specific proteins that bind to these repetitive sequences in T. brucei chromatin. The approach was validated using a TALE protein designed to target the telomere repeat (TelR-TALE), which detected many of the proteins previously implicated in telomeric functions. A TALE protein designed to target the 70 bp repeats that reside adjacent to the VSG genes (70R-TALE) detected proteins that function in DNA repair, and the protein designed to target the 177 bp repeat arrays (177R-TALE) identified kinetochore proteins associated with T. brucei megabase chromosomes, as well as with intermediate and mini-chromosomes, which implies that kinetochore assembly and segregation mechanisms are similar in all T. brucei chromosomes.

      Major comments:

      Are the key conclusions convincing?

      The authors reported that they have successfully used TALE-based affinity selection of proteins associated with repetitive sequences in the T. brucei genome. They claimed that this study has provided new information regarding the relevance of the repetitive regions in the genome to chromosome integrity, telomere biology, chromosomal segregation and immune evasion strategies. These conclusions are based on high-quality research that, in principle, merits publication, provided that the major concerns raised below are addressed before acceptance.

      (1) The authors used the TALE-YFP approach to examine the proteome associated with five different repetitive regions of the T. brucei genome and confirmed the binding of TALE-YFP with ChIP-seq analyses. Ultimately, they obtained the list of proteins that bound to the synthetic proteins by affinity purification and LC-MS analysis, and concluded that these proteins bind to different repetitive regions of the genome. Two control proteins, TRF-YFP and KKT2-YFP, were used to confirm the interactions. However, there is no experiment that confirms that the analysis gives insight into the role of any putative or new protein in telomere biology, VSG gene regulation or chromosomal segregation. The proteins that have already been reported by other studies are mentioned. Although the authors discovered many proteins in these repetitive regions, their role is as yet unknown. It is recommended to take one or more of the new putative proteins from the repetitive elements and (1) show whether or not they bind directly to the specific repetitive sequence (e.g., by EMSA); and (2) knock down one or a small sample of the newly discovered proteins, which may shed light on their function at the repetitive region, as a proof of concept.

      The main request from Referee 1 is for individual evaluation of protein-DNA interaction for a few candidates identified in our TALE-YFP affinity purifications, particularly using EMSA to identify binding to the DNA repeats used for the TALE selection. In our opinion, such an approach would not actually provide the validation anticipated by the reviewer. The power of TALE-YFP affinity selection is that it enriches for protein complexes that associate with the chromatin that coats the target DNA repetitive elements rather than only identifying individual proteins or components of a complex that directly bind to DNA assembled in chromatin.

      The referee suggests we express recombinant proteins and perform EMSA for selected candidates, but many of the identified proteins are unlikely to directly bind to DNA – they are more likely to associate with a combination of features present in DNA and/or chromatin (e.g. specific histone variants or histone post-translational modifications). Of course, a positive result would provide some validation but only IF the tested protein can bind DNA in isolation – thus, a negative result would be uninformative.

      In fact, our finding that KKT proteins are enriched using the 177R-TALE (minichromosome repeat sequence) identifies components of the trypanosome kinetochore known (KKT2) or predicted (KKT3) to directly bind DNA (Marciano et al., 2021; PMID: 34081090), and likewise the TelR-TALE identifies the TRF component that is known to directly associate with telomeric (TTAGGG)n repeats (Reis et al 2018; PMID: 29385523). This provides reassurance on the specificity of the selection, as does the lack of cross selectivity between different TALEs used (see later point 3 below). The enrichment of the respective DNA repeats quantitated in Figure 2B (originally Figure S1) also provides strong evidence for TALE selectivity.

      It is very likely that most of the components enriched on the repetitive elements targeted by our TALE-YFP proteins do not bind repetitive DNA directly. The TRF telomere binding protein is an exception – but it is the only obvious DNA binding protein amongst the many proteins identified as being enriched in our TelR-TALE-YFP and TRF-YFP affinity selections.

      The referee also suggests that follow up experiments using knockdown of the identified proteins found to be enriched on repetitive DNA elements would be informative. In our opinion, this manuscript presents the development of a new methodology previously not applied to trypanosomes, and referee 2 highlights the value of this methodological development which will be relevant for a large community of kinetoplastid researchers. In-depth follow-up analyses would be beyond the scope of this current study but of course will be pursued in future. To be meaningful such knockdown analyses would need to be comprehensive in terms of their phenotypic characterisation (e.g. quantitative effects on chromosome biology and cell cycle progression, rates and mechanism of recombination underlying antigenic variation, etc) – simple RNAi knockdowns would provide information on fitness but little more. This information is already publicly available from genome-wide RNAi screens (www.tritrypDB.org), with further information on protein location available from the genome-wide protein localisation resource (Tryptag.org). Hence basic information is available on all targets selected by the TALEs after RNAi knock down but in-depth follow-up functional analysis of several proteins would require specific targeted assays beyond the scope of this study.

      (2) NonR-TALE-YFP does not have a binding site in the genome, but YFP protein should still be expressed by T. brucei clones with NLS. The authors have to explain why there is no signal detected in the nucleus, while a prominent signal was detected near kDNA (see Fig.2). Why is the expression of YFP in NonR-TALE almost not shown compared to other TALE clones?

      The NonR-TALE-YFP immunolocalisation signal indeed is apparently located close to the kDNA and away from the nucleus. We are not sure why this is so, but the construct is sequence validated and correct. However, we note that artefactual localisation of proteins fused to a globular eGFP tag, compared to a short linear epitope V5 tag, near to the kinetoplast has been previously reported (Pyrih et al, 2023; PMID: 37669165).

      The expression of NonR-TALE-YFP is shown in Supplementary Fig. S2 in comparison to other TALE proteins. Although it is evident that NonR-TALE-YFP is expressed at lower levels than other TALEs (the different TALEs have different expression levels), it is likely that in each case the TALE proteins would be in relative excess.

      It is possible that the absence of a target sequence for the NonR-TALE-YFP in the nucleus affects its stability and cellular location. Understanding these differences is tangential to the aim of this study.

      However, importantly, NonR-TALE-YFP is not the only control used for specificity in our affinity purifications. Instead, the lack of cross-selection of the same proteins by different TALEs (e.g. TelR-TALE-YFP, 177R-TALE-YFP) and the lack of enrichment of any proteins of interest by the well expressed ingiR-TALE-YFP or 147R-TALE-YFP proteins each provide strong evidence for the specificity of the selection using TALEs, as does the enrichment of similar protein sets following affinity purification of the TelR-TALE-YFP and TRF-YFP proteins, which both bind telomeric (TTAGGG)n repeats. Moreover, control affinity purifications to assess background were performed using cells that completely lack an expressed YFP protein, which further supports specificity (Figure 6).

      We have added text to highlight these important points in the revised manuscript:

      Page 8:

      “However, the expression level of NonR-TALE-YFP was lower than other TALE-YFP proteins; this may relate to the lack of DNA binding sites for NonR-TALE-YFP in the nucleus.”

      Page 8:

      “NonR-TALE-YFP displayed a diffuse nuclear and cytoplasmic signal; unexpectedly the cytoplasmic signal appeared to be in the vicinity of the kDNA of the kinetoplast (mitochondria). We note that artefactual localisation of some proteins fused to an eGFP tag has previously been observed in T. brucei (Pyrih et al, 2023).”

      Page 10:

      “Moreover, a similar set of enriched proteins was identified in TelR-TALE-YFP affinity purifications whether compared with cells expressing no YFP fusion protein (No-YFP), the NonR-TALE-YFP or the ingiR-TALE-YFP as controls (Fig. S7B, S8A; Tables S3, S4). Thus, the most enriched proteins are specific to TelR-TALE-YFP-associated chromatin rather than to the TALE-YFP synthetic protein module or other chromatin.”

      (3) As a proof of concept, the author showed that the TALE method determined the same interacting partners enrichment in TelR-TALE as compared to TRF-YFP. And they show the same interacting partners for other TALE proteins, whether compared with WT cells or with the NonR-TALE parasites. It may be because NonR-TALE parasites have almost no (or very little) YFP expression (see Fig. S3) as compared to other TALE clones and the TRF-YFP clone. To address this concern, there should be a control included, with proper YFP expression.

      See response to point 2, but we reiterate that the ingiR-TALE-YFP and 147R-TALE-YFP proteins are well expressed (western blot: original Fig. S3, now Fig. S2), yet few proteins are detected as being enriched or as corresponding to those enriched in TelR-TALE-YFP or TRF-YFP affinity purifications (see Fig. S9). Therefore, the ingiR-TALE-YFP and 147R-TALE-YFP proteins provide good additional negative controls for specificity as requested. To further reassure the referee we have also included additional volcano plots which compare TelR-TALE-YFP, 70R-TALE-YFP or 177R-TALE-YFP to the ingiR-TALE-YFP affinity selection (new Figure S8). As with No-YFP or NonR-TALE-YFP controls, the use of ingiR-TALE-YFP as a negative control demonstrates that known telomere-associated proteins are enriched in TelR-TALE-YFP affinity purification, RPA subunits are enriched with 70R-TALE-YFP and kinetochore KKT proteins are enriched with 177R-TALE-YFP. These analyses demonstrate specificity in the proteins enriched following affinity purification of our different TALE-YFPs and provide support to strengthen our original findings.

      We now refer to the use of No-YFP, NonR-TALE-YFP, and ingiR-TALE-YFP as controls for comparison to TelR-TALE-YFP, 70R-TALE-YFP or 177R-TALE-YFP in several places:

      Page 10:

      “Moreover, a similar set of enriched proteins was identified in TelR-TALE-YFP affinity purifications whether compared with cells expressing no YFP fusion protein (No-YFP), the NonR-TALE-YFP or the ingiR-TALE-YFP as controls (Fig. S7B, S8A; Tables S3, S4).”

      Page 11:

      “Thus, the nuclear ingiR-TALE-YFP provides an additional chromatin-associated negative control for affinity purifications with the TelR-TALE-YFP, 70R-TALE-YFP and 177R-TALE-YFP proteins (Fig. S8).”

      “Proteins identified as being enriched with 70R-TALE-YFP (Figure 6D) were similar in comparisons with either the No-YFP, NonR-TALE-YFP or ingiR-TALE-YFP as negative controls.”

      Top Page 12:

      “The same kinetochore proteins were enriched regardless of whether the 177R-TALE proteomics data was compared with No-YFP, NonR-TALE or ingiR-TALE-YFP controls.”

      Discussion Page 13:

      “Regardless, the 147R-TALE and ingiR-TALE proteins were well expressed in T. brucei cells, but their affinity selection did not significantly enrich for any relevant proteins. Thus, 147R-TALE and ingiR-TALE provide reassurance for the overall specificity for proteins enriched in TelR-TALE, 70R-TALE and 177R-TALE affinity purifications.”

      (4) After the artificial expression of repetitive sequence binding five-TALE proteins, the question is if there is any competition for the TALE proteins with the corresponding endogenous proteins? Is there any effect on parasite survival or health, compared to the control after the expression of these five TALEs YFP protein? It is recommended to add parasite growth curves, for all the TALE proteins expressing cultures.

      Growth curves for cells expressing TelR-TALE-YFP, 177R-TALE-YFP and ingiR-TALE-YFP are now included (New Fig S3A). No deficit in growth was evident while passaging the 70R-TALE-YFP, 147R-TALE-YFP and NonR-TALE-YFP cell lines (indeed they grew slightly better than controls).

      The following text has been added page 8:

      “Cell lines expressing representative TALE-YFP proteins displayed no fitness deficit (Fig. S3A).”

      (5) Since the experiments were performed using whole-cell extracts without prior nuclear fractionation, the authors should consider the possibility that some identified proteins may have originated from compartments other than the nucleus. Specifically, the detection of certain binding proteins might reflect sequence homology (or partial homology) between mitochondrial DNA (maxicircles and minicircles) and repetitive regions in the nuclear genome. Additionally, the lack of subcellular separation raises the concern that cytoplasmic proteins could have been co-purified due to whole cell lysis, making it challenging to discern whether the observed proteome truly represents the nuclear interactome.

      In our experimental design, we confirmed bioinformatically that the repeat sequences targeted were not represented elsewhere in the nuclear or mitochondrial genome (kDNA). The absence of subcellular fractionation could result in some cytoplasmic protein selection, but this is unlikely to be significant: each TALE targets a specific DNA sequence but is otherwise identical, so cross-selection of the same contaminating protein set would be anticipated if there were significant non-specific binding, which was not observed. We have previously successfully affinity selected 15 chromatin modifiers and identified associated proteins without major issues concerning cytoplasmic protein contamination (Staneva et al 2021 and 2022; PMID: 34407985 and 36169304). Of course, the possibility that some proteins are contaminants will need to be borne in mind in any future follow-up analysis of proteins of interest that we identified as being enriched on specific types of repetitive element in T. brucei. Proteins that are also detected in negative controls or negative affinity selections such as No-YFP, NonR-TALE-YFP, ingiR-TALE or 147R-TALE must be disregarded.

      (6) Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      As mentioned earlier, the authors claimed that this study has provided new information concerning telomere biology, chromosomal segregation mechanisms, and immune evasion strategies. But there are no experiments that provide a role for any unknown or known protein in these processes. Thus, it is suggested to select one or two proteins of choice from the list and validate their direct binding to repetitive region(s), and their role in that region of interaction.

      As highlighted in response to point 1 the suggested validation and follow up experiments may well not be informative and are beyond the scope of the methodological development presented in this manuscript. Referee 2 describes the study in its current form as “a significant conceptual and technical advancement” and “This approach enhances our understanding of chromatin organization in these regions and provides a foundation for investigating the functional roles of associated proteins in parasite biology.”

      The Referee’s phrase ‘validate their direct binding to repetitive region(s)’ here may also mean to test if any of the additional proteins that we identified as being enriched with a specific TALE protein actually display enrichment over the repeat regions when examined by an orthogonal method. A key unexpected finding was that kinetochore proteins including KKT2 are enriched in our affinity purifications of the 177R-TALE-YFP that targets 177bp repeats (Figure 6F). By conducting ChIP-seq for the kinetochore specific protein KKT2 using YFP-KKT2 we confirmed that KKT2 is indeed enriched on 177bp repeat DNA but not flanking DNA (Figure 7). Moreover, several known telomere-associated proteins are detected in our affinity selections of TelR-TALE-YFP (Figure 6B, Fig. S6; see also Reis et al, 2018 Nuc. Acids Res. PMID: 29385523; Weisert et al, 2024 Sci. Reports PMID: 39681615).

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The answer to this question depends on what the authors want to present as the achievements of the present study. If the achievement of the paper is the creation of a new tool for discovering new proteins associated with the repeat regions, I recommend that they add proof of direct interactions between a sample of the newly discovered proteins and the relevant repeats, as a proof of concept, as discussed above. However, if the authors wish to claim that the study achieved new functional insights for these interactions, they will have to expand the study, as mentioned above, to support the proof of concept.

      See our response to point 1 and the point we labelled ‘6’ above.

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      I think that they are realistic. If the authors decide to check the capacity of a small sample of proteins (previously unknown as repetitive-region-binding proteins) to interact directly with the repeated sequence, it will add substantially to the study (e.g., by EMSA; estimated time: 1 month). If the authors also decide to check the function of at least one such newly detected protein (e.g., by KD), I estimate that this will take 3-6 months.

      As highlighted previously, the proposed EMSA experiment may well be uninformative for protein complex components identified in our study or for isolated proteins that directly bind DNA in the context of a complex and chromatin. RNAi knockdown data and cell location data (as well as developmental expression and orthology data) are already available through tritrypdb.org and tryptag.org.

      Are the data and the methods presented in such a way that they can be reproduced? Yes

      Are the experiments adequately replicated, and statistical analysis adequate?

      The authors did not mention replicates. There is no statistical analysis mentioned.

      The figure legends indicate that all volcano plots of TALE affinity selections were derived from three biological replicates. Cutoffs used for significance: P < 0.05 (Student's t-test).
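
      As an illustration of how such a cutoff places one protein on a volcano plot, the following is a minimal sketch only, not the analysis pipeline used in the manuscript. It assumes log2-transformed intensities for three replicates per group and an equal-variance two-sample t-test; the function name and input format are hypothetical.

```python
import math
from statistics import mean, variance

# Two-sided 5% critical value of Student's t with 4 degrees of freedom
# (3 + 3 replicates - 2); |t| above this value corresponds to P < 0.05.
T_CRIT_DF4 = 2.776


def volcano_point(bait, control):
    """Return (log2 enrichment, t statistic, significant?) for one protein.

    bait and control are assumed to hold log2 intensities from three
    biological replicates each (hypothetical input format).
    """
    if len(bait) != 3 or len(control) != 3:
        raise ValueError("the critical value above assumes 3 replicates per group")
    # Pooled variance of the equal-variance (Student's) two-sample t-test.
    pooled = (2 * variance(bait) + 2 * variance(control)) / 4
    if pooled == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    diff = mean(bait) - mean(control)  # difference of log2 values = log2 fold change
    t = diff / math.sqrt(pooled * (1 / 3 + 1 / 3))
    return diff, t, abs(t) > T_CRIT_DF4
```

      The x-axis of a volcano plot would then carry the first returned value and the y-axis a transformation of the P value; only the threshold decision is shown here.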

      For ChIP-seq, two biological replicates were analysed for each cell line expressing the specific YFP tagged protein of interest (TALE or KKT2). This is now stated in the relevant figure legends – apologies for this oversight. The resulting data are available for scrutiny at GEO: GSE295698.

      Minor comments:

      Specific experimental issues that are easily addressable.

      The following suggestions can be incorporated:

      (1) Page 18: in the Materials and Methods section the authors mention four drugs: blasticidin, phleomycin, G418 and hygromycin. It is recommended to mention the purpose of using these selective drugs for the parasite. If clonal selection has been done, then it should also be mentioned.

      We erroneously added information on several drugs used for selection in our laboratory. In fact, all TALE-YFP constructs carry the bleomycin resistance gene, which we select for using phleomycin. Also, clones were derived by limiting dilution immediately after transfection. We have amended the text accordingly:

      Page 17/18:

      “Cell cultures were maintained below 3 x 10⁶ cells/ml. Phleomycin 2.5 µg/ml was used to select transformants containing the TALE construct BleoR gene.”

      “Electroporated bloodstream cells were added to 30 ml HMI-9 medium and two 10-fold serial dilutions were performed in order to isolate clonal Phleomycin resistant populations from the transfection. 1 ml of transfected cells were plated per well on 24-well plates (1 plate per serial dilution) and incubated at 37°C and 5% CO2 for a minimum of 6 h before adding 1 ml media containing 2X concentration Phleomycin (5 µg/ml) per well.”

      (2) In the method section the authors mentioned that there is only one site for binding of NonR-TALE in the parasite genome. But in Fig. 1C, the authors showed zero binding site. So, there is one binding site for NonR-TALE-YFP in the genome or zero?

      We thank the reviewer for pointing out this discrepancy. We have checked the latest Tb427v12 genome assembly for predicted NonR-TALE binding sites and there are no exact matches. We have corrected the text accordingly.

      Page 7:

      “A control NonR-TALE protein was also designed which was predicted to have no target sequence in the T. brucei genome.”

      Page 17:

      “A control NonR-TALE predicted to have no recognised target in the T. brucei genome was designed as follows: BLAST searches were used to identify exact matches in the TREU927 reference genome. Candidate sequences with one or more match were discarded.”
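
      The filtering step described in this passage can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run: it replaces BLAST with a plain exact-string search on both strands, and the function names and toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_exact_matches(target, genome):
    """Count exact occurrences of target on either strand, overlaps included."""
    target, genome = target.upper(), genome.upper()
    # A set avoids double counting when the target is its own reverse complement.
    queries = {target, reverse_complement(target)}
    total = 0
    for query in queries:
        pos = genome.find(query)
        while pos != -1:
            total += 1
            pos = genome.find(query, pos + 1)
    return total


def keep_unmatched(candidates, genome):
    """Discard every candidate with one or more exact match in the genome."""
    return [c for c in candidates if count_exact_matches(c, genome) == 0]
```

      For example, with the toy genome `"ACGTTTAGGGTTAGGGAC"`, the candidates `"TTAGGG"` and `"CCCTAA"` are discarded (two matches each, on opposite strands) while `"GATTACA"` is kept.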

      (3) The authors used two different anti-GFP antibodies, one from Roche and the other from Thermo Fisher. Why were two different antibodies used for the same protein?

      We have found that only some anti-GFP antibodies are effective for affinity selection of associated proteins, whereas others are better suited for immunolocalisation. The respective suppliers’ antibodies were optimised for each application.

      (4) Page 6: in the introduction, the authors give the number of total VSG genes as 2,634. Is it known how many of them are pseudogenes?

      This value corresponds to the number reported by Cosentino et al. 2021 (PMID: 34541528) for subtelomeric VSGs, which is similar to the value reported by Muller et al 2018 (PMID: 30333624) (2486), both in the same strain of trypanosomes as used by us. Based on the earlier analysis by Cross et al (PMID: 24992042), 80% of the identified VSGs in their study (2584) are pseudogenes. This approximates to the estimation by Cosentino of 346/2634 (13%) being fully functional VSG genes at subtelomeres, or 17% when considering VSGs at all genomic locations (433/2872).

      (5) I found several typos throughout the manuscript.

      Thank you for raising this. We have read through the manuscript several times and hopefully corrected all outstanding typos.

      (6) Fig. 1C: Table: below TOTAL 2nd line: the number should be 1838 (rather than 1828)

      Corrected- thank you.

      - Are prior studies referenced appropriately? Yes

      - Are the text and figures clear and accurate? Yes

      - Do you have suggestions that would help the authors improve the presentation of their data and conclusions? Suggested above

      Reviewer #1 (Significance):

      Describe the nature and significance of the advance (e.g., conceptual, technical, clinical) for the field:

      This study represents a significant conceptual and technical advancement by employing a synthetic TALE DNA-binding protein tagged with YFP to selectively identify proteins associated with five distinct repetitive regions of T. brucei chromatin. To the best of my knowledge, it is the first report to utilize TALE-YFP for affinity-based isolation of protein complexes bound to repetitive genomic sequences in T. brucei. This approach enhances our understanding of chromatin organization in these regions and provides a foundation for investigating the functional roles of associated proteins in parasite biology. Importantly, any essential or unique interacting partners identified could serve as potential targets for therapeutic intervention.

      - Place the work in the context of the existing literature (provide references, where appropriate). I agree with the information that has already been described in the submitted manuscript regarding the potential contribution of the resulting data and the established technology to the study of VSG expression, kinetochore mechanisms and telomere biology.

      - State what audience might be interested in and influenced by the reported findings. These findings will be of particular interest to researchers studying the molecular biology of kinetoplastid parasites and other unicellular organisms, as well as scientists investigating chromatin structure and the functional roles of repetitive genomic elements in higher eukaryotes.

      - (1) Define your field of expertise with a few keywords to help the authors contextualize your point of view. Protein-DNA interactions/ chromatin/ DNA replication/ Trypanosomes

      - (2) Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. None

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary

      Carloni et al. comprehensively analyze which proteins bind repetitive genomic elements in Trypanosoma brucei. For this, they perform mass spectrometry on custom-designed, tagged programmable DNA-binding proteins. After extensively verifying their programmable DNA-binding proteins (using bioinformatic analysis to infer target sites, microscopy to measure localization, ChIP-seq to identify binding sites), they present, among others, two major findings: 1) 14 of the 25 known T. brucei kinetochore proteins are enriched at 177bp repeats. As T. brucei's 177bp repeat-containing intermediate-sized and mini-chromosomes lack centromere repeats but are stable over mitosis, Carloni et al. use their data to hypothesize that a 'rudimentary' kinetochore assembles at the 177bp repeats of these chromosomes to segregate them. 2) 70bp repeats are enriched with the Replication Protein A complex, which, notably, is required for homologous recombination. Homologous recombination is the pathway used for recombination-based antigenic variation of the 70bp-repeat-adjacent variant surface glycoproteins.

      Major Comments

      None. The experiments are well-controlled, claims well-supported, and methods clearly described. Conclusions are convincing.

      Thank you for these positive comments.

      Minor Comments

      (1) Fig. 2 - I couldn't find an uncropped version showing multiple cells. If it exists, it should be linked in the legend or main text; Otherwise, this should be added to the supplement.

      The images presented represent reproducible analyses and were independently verified by two of the authors. Although wider field of view images do not provide the resolution to be informative on cell location, as requested we have provided uncropped images in new Fig. S4 for all the cell lines shown in Figure 2A.

      In addition, we have included as supplementary images (Fig. S3B) additional images of TelR-TALE-YFP, 177R-TALE-YFP and ingiR-TALE-YFP localisation to provide additional support for their observed locations presented in Figure 1. The sets of cells and images presented in Figure 2A and in Fig. S3B were prepared and obtained by different authors, independently and reproducibly validating the location of the tagged protein.

      (2) I think Suppl. Fig. 1 is very valuable, as it is a quantification and summary of the ChIP-seq data. I think the authors could consider making this a panel of a main figure. For the main figure, I think the plot could be trimmed down to only show the background and the relevant repeat for each TALE protein, leaving out the non-target repeats. (This relates to minor comment 6.) Also, I believe, it was not explained how background enrichment was calculated.

      We are grateful for the reviewer’s positive view of original Fig. S1 and appreciate the suggestion. We have now moved this analysis to part B of main Figure 2 in the revised manuscript – now Figure 2B. We have also provided additional details in the Methods section on the approaches used to assess background enrichment.

      Page 19:

      “Background enrichment calculation

      The genome was divided into 50 bp sliding windows, and each window was annotated based on overlapping genomic features, including CIR147, 177 bp repeats, 70 bp repeats, and telomeric (TTAGGG)n repeats. Windows that did not overlap with any of these annotated repeat elements were defined as "background" regions and used to establish the baseline ChIP-seq signal. Enrichment for each window was calculated using bamCompare, as log₂(IP/Input). To adjust for background signal amongst all samples, enrichment values for each sample were further normalized against the corresponding No-YFP ChIP-seq dataset.”

      Note: While revising the manuscript we also noticed that the script had a normalization error. We have therefore included a corrected version of these analyses as Figure 2B (old Fig. S1).
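
      The calculation described in the Methods text above can be sketched in a few lines. This is a toy re-implementation for illustration only: the actual analysis used bamCompare, whereas here per-window read counts are assumed to be already available, the windows are treated as non-overlapping 50 bp bins, a pseudocount of 1 is assumed to avoid division by zero, and normalisation against No-YFP is assumed to be a subtraction of log2 ratios. All names are hypothetical.

```python
import math

WINDOW = 50  # window size in bp, as stated in the Methods text


def log2_ratio(ip, inp, pseudocount=1.0):
    """log2(IP/Input) for one window; the pseudocount is an assumption."""
    return math.log2((ip + pseudocount) / (inp + pseudocount))


def label_window(start, repeats):
    """Return the repeat type overlapping [start, start + WINDOW), else 'background'.

    repeats is a list of (start, end, type) tuples with half-open coordinates.
    """
    end = start + WINDOW
    for r_start, r_end, r_type in repeats:
        if start < r_end and r_start < end:
            return r_type
    return "background"


def normalised_enrichment(sample, no_yfp, repeats):
    """Mean No-YFP-normalised log2(IP/Input) per window class.

    sample and no_yfp map window start -> (IP count, Input count).
    """
    values = {}
    for start, (ip, inp) in sample.items():
        ctrl_ip, ctrl_inp = no_yfp[start]
        value = log2_ratio(ip, inp) - log2_ratio(ctrl_ip, ctrl_inp)
        values.setdefault(label_window(start, repeats), []).append(value)
    return {label: sum(v) / len(v) for label, v in values.items()}
```

      With one window overlapping a 177 bp repeat (IP 7, Input 1) and one background window (IP 1, Input 1), both flat in the No-YFP control, the sketch returns 2.0 for the repeat class and 0.0 for background.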

      (3) Generally, I would plot enrichment on a log2 axis. This concerns several figures with ChIP-seq data.

      Our ChIP-seq enrichment is calculated by bamCompare. The resulting enrichment values are indeed log2 (IP/Input). We have made this clear in the updated figures/legends.

      (4) Fig. 4C - The violin plots are very hard to interpret, as the plots are very narrow compared to the line thickness, making it hard to judge the actual volume. For example, in Centromere 5, YFP-KKT2 is less enriched than 147R-TALE over most of the centromere with some peaks of much higher enrichment (as visible in panel B), however, in panel C, it is very hard to see this same information. I'm sure there is some way to present this better, either using a different type of plot or by improving the spacing of the existing plot.

      We thank the reviewer for this suggestion; we have elected to provide a Split-Violin plot instead. This improves the presentation of the data for each centromere. The original violin plot in Figure 4C has been replaced with this Split-Violin plot (still Figure 4C).

      (5) Fig. 6 - The panels are missing an x-axis label (although it is obvious from the plot what is displayed).

      Maybe the "WT NO-YFP vs" part that is repeated in all the plot titles could be removed from the title and only be part of the x-axis label?

      In fact, to save space the X axis was labelled inside each volcano plot, but we neglected to indicate that values are on a log2 scale indicating enrichment. This has been rectified – see Figure 6, and Fig. S7, S8 and S9.

      (6) Fig. 7 - I would like to have a quantification for the examples shown here. In fact, such a quantification already exists in Suppl. Figure 1. I think the relevant plots of that quantification (YFP-KKT2 over 177bp-repeats and centromere-repeats) with some control could be included in Fig. 7 as panel C. This opportunity could be used to show enrichment separated out for intermediate-sized, mini-, and megabase-chromosomes. (relates to minor comment 2 & 8)

      The CIR147 sequence is found exclusively on megabase-sized chromosomes, while the 177 bp repeats are located on intermediate- and mini-sized chromosomes. Due to limitations in the current genome assembly, it is not possible to reliably classify all chromosomes into intermediate- or mini- sized categories based on their length. Therefore, original Supplementary Fig. S1 presented the YFP-KKT2 enrichment over CIR147 and 177 bp repeats as a representative comparison between megabase chromosomes and the remaining chromosomes (corrected version now presented as main Figure 2B). Additionally, to allow direct comparison of YFP-KKT2 enrichment on CIR147 and 177 bp repeats we have included a new plot in Figure 7C which shows the relative enrichment of YFP-KKT2 on these two repeat types.

      We have added the following text, page 12:

      “Taking into account the relative number of CIR147 and 177 bp repeats in the current T. brucei genome (Cosentino et al., 2021; Rabuffo et al., 2024), comparative analyses demonstrated that YFP-KKT2 is enriched on both CIR147 and 177 bp repeats (Figure 7C).”

      (7) Suppl. Fig. 8 A - I believe there is a mistake here: KKT5 occurs twice in the plot, the one in the overlap region should be KKT1-4 instead, correct?

      Thanks for spotting this. It has been corrected.

      (8) The way that the authors mapped ChIP-seq data is potentially problematic when analyzing the same repeat type in different regions of the genome. The authors assigned reads that had multiple equally good mapping positions to one of these mapping positions, randomly.

      This is perfectly fine when analysing repeats by their type, independent of their position on the genome, which is what the authors did for the main conclusions of the work.

      However, several figures show the same type of repeat at different positions in the genome. Here, the authors risk that enrichment in one region of the genome 'spills' over to all other regions with the same sequence. Particularly, where they show YFP-KKT2 enrichment over intermediate- and mini-chromosomes (Fig. 7) due to the spillover, one cannot be sure to have found KKT2 in both regions.

      Instead, the authors could analyze only uniquely mapping reads / read-pairs where at least one mate is uniquely mapping. I realize that with this strict filtering, data will be much more sparse. Hence, I would suggest keeping the original plots and adding one more quantification where the enrichment over the whole region (e.g., all 177bp repeats on intermediate-/mini-chromosomes) is plotted using the unique reads (this could even be supplementary). This also applies to Fig. 4 B & C.

      We thank the reviewer for their thoughtful comments. Repetitive sequences are indeed challenging to analyze accurately, particularly in the context of short read ChIP-seq data. In our study, we aimed to address YFP-KKT2 enrichment not only over CIR147 repeats but also on 177 bp repeats, using both ChIP-seq and proteomics using synthetic TALE proteins targeted to the different repeat types. We appreciate the referee's suggestion to consider uniquely mapped reads; however, in the updated genome assembly, the 177 bp repeats are frequently immediately followed by long stretches of 70 bp repeats which can span several kilobases. The size and repetitive nature of these regions exceed the resolution limits of ChIP-seq. It is therefore difficult to precisely quantify enrichment across all chromosomes.

      Additionally, the repeat sequences are highly similar, and relying solely on uniquely mapped reads would result in the exclusion of most reads originating from these regions, significantly underestimating the relative signals. To address this, we used Bowtie2 with settings that allow multi-mapping, assigning reads randomly among equivalent mapping positions, but ensuring each read is counted only once. This approach is designed to evenly distribute signal across all repetitive regions and preserve a meaningful average.
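
      The counting scheme described above can be illustrated with a toy model. This is a sketch of the idea only and does not reproduce Bowtie2's internals: each read is assumed to come with a list of equally good mapping positions, one of which is chosen at random, so that every read contributes exactly one count. The function name and locus labels are hypothetical.

```python
import random
from collections import Counter


def assign_multimappers(read_hits, seed=0):
    """Assign each read to one of its equally good positions, counting it once.

    read_hits is a list in which each element lists the candidate positions
    of one read; uniquely mapping reads have a single candidate.
    """
    rng = random.Random(seed)  # fixed seed so the toy example is repeatable
    counts = Counter()
    for hits in read_hits:
        counts[rng.choice(hits)] += 1
    return counts


# 1000 reads that map equally well to two copies of a repeat,
# plus 10 reads from a unique locus.
reads = [["repeat_copy_A", "repeat_copy_B"]] * 1000 + [["unique_locus"]] * 10
counts = assign_multimappers(reads)
```

      In this model the 1000 repeat-derived reads are split roughly evenly between the two copies regardless of which copy they originated from. The total signal per repeat type is therefore preserved, while signal per individual copy reflects an average, which is why the per-chromosome distribution cannot be resolved from these data.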

      Single molecule methods such as DiMeLo (Altemose et al. 2022; PMID: 35396487) will need to be developed for T. brucei to allow more accurate and chromosome specific mapping of kinetochore or telomere protein occupancy at repeat-unique sequence boundaries on individual chromosomes.

      Reviewer #2 (Significance):

      This work is of high significance for chromosome/centromere biology, parasitology, and the study of antigenic variation. For chromosome/centromere biology, the conceptual advancement of different types of kinetochores for different chromosomes is a novelty, as far as I know. It would certainly be interesting to apply this study as a technical blueprint for other organisms with minichromosomes or chromosomes without known centromeric repeats. I can imagine a broad range of labs studying other organisms with comparable chromosomes to take note of and build on this study. For parasitology and the study of antigenic variation, it is crucial to know how intermediate- and mini-chromosomes are stable through cell division, as these chromosomes harbor a large portion of the antigenic repertoire. Moreover, this study also found a novel link between the homologous repair pathway and variant surface glycoproteins, via the 70bp repeats. How and at which stages during the process, 70bp repeats are involved in antigenic variation is an unresolved, and very actively studied, question in the field. Of course, apart from the basic biological research audience, insights into antigenic variation always have the potential for clinical implications, as T. brucei causes sleeping sickness in humans and nagana in cattle. Due to antigenic variation, T. brucei infections can be chronic.

      Thank you for supporting the novelty and broad interest of our manuscript.

      My field of expertise / Point of view:

      I'm a computer scientist by training and am now a postdoctoral bioinformatician in a molecular parasitology laboratory. The laboratory is working on antigenic variation in T. brucei. The focus of my work is on analyzing sequencing data (such as ChIP-seq data) and algorithmically improving bioinformatic tools.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors assess the role of map3k1 in adult Planaria through whole body RNAi for various periods of time. The authors' prior work has shown that neoblasts (stem cells that can regenerate the entire body) for various tissues are intermingled in the body. Neoblasts divide to produce progenitors that migrate within a "target zone" to the "differentiated target tissues" where they differentiate into a specific cell type. Here the authors show that map3k1-i animals have ectopic eyes that form along the "normal" migration path of eye progenitors (Fig. 1), ectopic neurons and glands along the AP axis (Fig. 2) and pharynx in ectopic anterior positions (Fig. 3). The rest of the study shows that positional information is largely unaffected by loss of map3k1 (Fig. 4,5). However, loss of map3k1 leads to premature differentiation of progenitors along their normal migratory route (Fig. 6). They also show that an ill-defined "long-term" whole body depletion of map3k1 results in mis-specified organs and teratomas.

      Strengths:

      (1) The study has appropriate controls, sample sizes and statistics.

      (2) The work appears to be high-quality.

      (3) The conclusions are supported by the data.

      (4) Planaria is a good system to analyze the function of map3k1, which exists in mammals but not in other invertebrates.

      Weaknesses:

      (1) The paper is largely descriptive with no mechanistic insights. 

      The mechanistic insights we aim to address are primarily at the cellular systems level – how adult progenitor cells produce pattern. Specifically, we uncovered strong evidence that regulation of differentiation is an active process occurring in migratory progenitors and that this regulation is a major component of pattern formation during the adult processes of tissue turnover and regeneration. The map3k1 phenotype provided a tool used to reveal the existence of this regulation, and to understand the patterning abnormalities prevented by this regulatory mechanism. We updated the text in several places to make clearer some of this emphasis. For example, in the Discussion: "We suggest that differentiation is restricted during migratory targeting as an essential component of pattern formation, with the map3k1 RNAi phenotype indicating the existence and purpose of this element of patterning." 

      Naturally, identifying a particular molecule involved in this process is of interest for understanding molecular mechanism; this would allow for comparison to other cellular systems in other organisms and would focus future molecular inquiry. Future molecular studies into the mechanism of Map3k1 regulation and its downstream signaling will be fascinating as next steps towards understanding the process at the molecular level more deeply. We also added some discussion considering the types of upstream activation cues that could potentially be associated with Map3k1 regulation to suppress differentiation. 

      (2) Given the severe phenotypes of long-term depletion of map3k1, it is important that this exact timepoint is provided in the methods, figures, figure legends and results. 

      We removed the use of the term “long-term” and instead added timepoints used to all figure legends. We also added a summary of timepoints used in the methods section and included RNAi timepoint labels in figures where a phenotype progression over time is relevant to interpretation. For timecourses, we also added suitable time information to text in the results. 

      (3) Figure 1C, the ectopic eyes are difficult to see, please add arrows. 

      To improve visualization, we replaced the example animal in the original Figure 1C with one that has a stronger phenotype, including arrows pointing to every ectopic event. Additionally, we included magnified images of optic cup cells and photoreceptor neurons in the trunk and tail region. This is now Figure 1B.

      (4) line 217 - why does the n=2/12 animals not match the values in Figure 3B, which is 11/12 and 12/12. The numbers don't add up. Please correct/explain. 

      Figure 3B in the submitted version (3/18 had cells in the tail) included more scored animals than the total reported in the text (2/12 had cells in the tail): the figure counted 6 animals from a replicate experiment (1/6 showed cells in the tail) that had not been added to the text during writing. The results are the same; only the sample sizes noted in those two locations differed, and we have fixed this issue. In the updated Figure 3, the order of presentation has shifted (e.g., prior 3B is now in 3C and Figure 3_figure supplement 1). We made sure to add numbers to all figure panels. 

      (5) Figure panels do not match what is written in the results section. There is no Figure 6E. Please correct.

      Thank you for catching this. We have gone through figures and text after editing to make sure that text callouts are appropriately matched to the figures. 

      Reviewer #2 (Public review):

      Summary:

      The flatworm planarian Schmidtea mediterranea is an excellent model for understanding cell fate specification during tissue regeneration and adult tissue maintenance. Planarian stem cells, known as neoblasts, are continuously deployed to support cellular turnover and repair tissues damaged or lost due to injury. This reparative process requires great precision to recognize the location, timing, and cellular fate of a defined number of neoblast progeny. Understanding the molecular mechanisms driving this process could have important implications for regenerative medicine and enhance our understanding of how form and function are maintained in long-lived organisms such as humans. Unfortunately, the molecular basis guiding cell fate and differentiation remains poorly understood.

      In this manuscript, Canales et al. identified the role of the map3k1 gene in mediating the differentiation of progenitor cells at the proper target tissue. The map3k1 function in planarians appears evolutionarily conserved as it has been implicated in regulating cell proliferation, differentiation, and cell death in mammals. The results show that the downregulation of map3k1 with RNAi leads to spatial patterning defects in different tissue types, including the eye, pharynx, and the nervous system. Intriguingly, long-term map3k1-RNAi resulted in ectopic outgrowths consistent with teratomas in planarians. The findings suggest that map3k1 mediates signaling, regulating the timing and location of cellular progenitors to maintain correct patterning during adult tissue maintenance.

      Strengths:

      The authors provide an entry point to understanding molecular mechanisms regulating progenitor cell differentiation and patterning during adult tissue maintenance.

      The diverse set of approaches and methods applied to characterize map3k1 function strengthens the case for conserved evolutionary mechanisms in a selected number of tissue types. The creativity using transplantation experiments is commendable, and the findings with the teratoma phenotype are intriguing and worth characterizing.

      Thank you to the reviewer for the positive feedback.

      Weaknesses:

      The article presents a provocative idea related to the importance of positional control for organs and cells, which is at least in part regulated by map3k1. Nonetheless, the role of map3k1 or its potential interaction with regulators of the anterior-posterior, mediolateral axes, and PCGs is somewhat superficial. The authors could elaborate or even speculate more in the discussion section on the different scenarios incorporating these axial modulators into the map3k1 model presented in Figure 8.

      First, to strengthen the support for our finding that positional information is largely unaffected in map3k1 RNAi animals, we added data regarding the expression of additional relevant position control genes (PCGs) –ndl-4, ptk7, sp5, and wnt11-1 – to the PCG panel in Figure 5. The expression domain of ndl-4, an FGF receptor-like protein family member that contributes to head patterning and anterior pole maintenance, was normal in map3k1 RNAi. wnt11-1, a PCG with expression concentrated in the posterior end of the animal and with expression dependent on general Wnt activity, was also normal in map3k1 RNAi animals. ptk7, RNAi of which can result in supernumerary pharynges, also showed normal expression in map3k1 RNAi animals. Finally, sp5, a Wnt-activated gene with expression in the tail, also showed normal expression in map3k1 RNAi animals. 

      Second, to further support the conclusion that cells are not suitably responding to positional information after map3k1 RNAi, which we argue normally dictates where differentiation should occur, we added examples of differentiated cell types that are ectopically positioned within an atypical PCG expression domain for that cell type (Figure 5C). This underscores that following map3k1 RNAi the PCG expression domains do not change, but the pattern of differentiated cell types relative to these domains does shift. We also added data showing that regenerating tails had a proper wntP-2 gradient, but an anterior regenerating pharynx appeared outside of this wntP-2<sup>+</sup> zone and inside of an ndl-5<sup>+</sup> zone (Figure 5- figure supplement 1E). We added some discussion of these new data in the Figure 5 results section. We also noted, regarding independent recent map3k1 work (Lo, 2025), some evidence exists that a minor posterior shift in ndl-5 expression can occur after map3k1 RNAi.

      Next, we added a new element to the model figure (Figure 8B) depicting that PCG expression domains remain normal after map3k1 RNAi, with ectopic differentiation occurring in an incorrect positional information environment. We refer to this new panel in the discussion: "We suggest that map3k1 is not required for the spatial distribution of progenitor-extrinsic differentiation-promoting cues themselves, but for progenitors to be restricted from differentiating until these cues are received (Figure 8B)."; we then follow this statement with a summary in the Discussion of six pieces of evidence that support this model.

      Finally, we added some additional text to the discussion section about candidate mechanisms by which extrinsic cues could potentially regulate Map3k1, pointing to potential future inquiry directions. We suggest that map3k1 is not involved in regulating PCG activity domains themselves, but instead acts as a brake on differentiation within migratory progenitors through active signaling. This brake is then lifted when the progenitors hit their correct PCG-based migratory target, or when they hit their target tissue. How that occurs mechanistically is unknown. One scenario is that each progenitor is tuned to respond to a particular PCG-regulated environment (such as a particular ECM or signaling environment) to generate a molecular change that inactivates Map3K1 signaling, such as by inactivating or disengaging an RTK signal. Alternatively, the migratory process in progenitors could engage the Map3K1 signal, enabling signal cessation with arrival at a target location. When Map3K1 is active it could result in a transcriptional state that prevents full expression of differentiated factors required for maturation, tissue incorporation, and cessation of migration. These considerations are now added to the discussion.

      The article can be improved by addressing inconsistencies and adding details to the results, including the main figures and supplements. This represents one of the most significant weaknesses of this otherwise intriguing manuscript. Below are some examples of a few figures, but the authors are expected to pay close attention to the remaining figures in the paper.

      Details associated with the number of animals per experiment, statistical methods used, and detailed descriptions of figures appear inconsistent or lacking in almost all figures. In some instances, the percentage of animals affected by the phenotype is shown without detailing the number of animals in the experiment or the number of repeats. Figures and their legends throughout the paper lack details on what is represented and sometimes are mislabeled or unrelated. 

      We endeavored to ensure that these noted details are present throughout the legends and figures for all figure panels.

      Specifically, the arrows in Figure 1A are different colors. Still, no reasoning is given for this, and in the exact figure, the top side (1A) shows the percentages and the number of animals below. 

      The only reason for the different colored arrows was visibility. To avoid confusion, we now use white arrows for all FISH images in Figure 1, and wherever else possible. We also replaced the percentages with n numbers in the bottom left corner of the live images in Figure 1A. 

      Conversely, in Figures 1B, C, and D, no details on the number of animals or percentages are shown, nor an explanation of why opsin was used in Figure 1A but not 1B. 

      The original Figure 1B represented a few different examples of ectopic eye/eye cell patterns in the map3k1 RNAi animals to demonstrate the variable and disorganized nature of the phenotype. To address this, we added further explanation in the legend. We also merged 1A and 1B for simplicity of interpretation. opsin was used in Figure 1A to label cell bodies of photoreceptors. anti-Arrestin was used in the example FISH images to see if these cells were interconnected via projections, which we now clarify in the legend and in the text. 

      Is Figure 1B missing an image for the respective control? Figure 1C needs details regarding what the two smaller boxes underneath are. 

      The control for Figure 1B was in Figure 1A; the merger of Figures 1A/B should address this. Boxes in Figure 1C were labelled with numbers corresponding to the image above them.

      Figure 1C could use an AP labeling map in 10 days (e.g., AP6 has one optic cup present). Figure 1C and F counts do not match. 

      We added a cartoon to 1C to show the region imaged. Note that the 36d trunk image (now Fig. 1B) has now been replaced with a full animal image and magnified boxes that show locations of example ectopic cells. That cell in 1C was categorized as in AP5. Note that additional animals were analyzed and data added to the distribution (now Fig. 1D). 

      In Figure 1C, we do not know the number of animals tested, controls used, the scale bar sizes in the first two images, nor the degree of magnification used despite the pharynx region appearing magnified in the second image.  Figure 1C is also shown out of chronological order; 36 days post RNAi is shown before 10 days post RNAi. Moreover, the legends for Figures 1C and 1D are swapped.

      We have endeavored to ensure sample numbers, control images, and appropriate scale bar notation in legends are present for all images. Figure 1C has now been split into two panels: Figure 1B and Figure 1C. It does not follow a chronological order in presentation for the following logic flow: we uncover and describe the phenotype; then, with knowledge of the defect, we walk back to see how early the phenotype starts after RNAi and what the pattern of ectopic cell distribution is when the phenotype starts to emerge (using the knowledge of which cells are affected from the overt phenotype described in 1A/B). 

      Additionally, Figure 1F and many other figures throughout the paper lack overall statistical considerations. Furthermore, Figure 1F has three components, but only one is labeled. Labeling each of them individually and describing them in the corresponding figure legend may be more appropriate.

      The main point of the graphs in 1F (now 1D) was the overt overall pattern difference with the wild-type, which never has ectopic eye cells in the midbody or tail, and that the ectopic eye cells progress throughout the entire AP axis. However, we concur that a statistical test is a reasonable thing to show here and that is now included in the legend. The 3 components (in Figure 1F, now Figure 1D) were kept together with one figure label (D) for simplicity, but were rearranged (top and bottom) with a cartoon to the side and with modified labeling for extra clarity. 

      Figure 2C shows images of gene expression for two genes, but the counts are shown for only one in Figure 2D. It is challenging to follow the author's conclusions without apparent reasoning and by only displaying quantitative considerations for one case but not the other. These inconsistencies are also observed in different figures. 

      In Figure 2C, FISH images of cintillo+ and dd_17258+ neurons are shown to display the specificity of this effect to some neurons and not others. Because cintillo+ cells did not expand at all (n=24/24 animals), the counts for them would all be zero values. We only counted data for dd_17258 cells because it was the neuron that expanded compared to the control animals. We have now added a note in the legend explaining this.

      In Figure 2D, 24/24 animals were reported to show the phenotype, but only eight were counted (is there a reason for this?).

      8 animals were used to quantitatively characterize the spread of cells along the AP axis, as it was deemed an adequate sample size to capture the change in distribution of 17258+ cells from being head restricted to being present throughout the body. Through multiple cohorts of animals in replicates, a total of 24/24 examined animals showed this expansion phenotype. Double FISH experiments were additionally carried out using dd_17258 and various PCGs; these data are now included in Figure 5C, and these animals were added to the total counts regarding quantitative analysis of the phenotype in Figure 2D. 

      In Figure 2E, the expression for three genes is shown, with some displaying anterior and posterior regions while others only show the anterior picture. Is there a particular reason for this? 

      The original first panel in Figure 2E showed an example of a non-expanding gland cell type, dd_9223, which is very restricted to the head in both control and map3k1 RNAi animals. Because we did not observe a phenotype for this cell type (no cells in any control or map3k1 RNAi animal tails), we only included tail images of cell types that showed an abnormal phenotype with clear expansion to the posterior (dd_8476 and dd_7131). However, we have now included tail images of dd_9223 cells and added data for dd_9223 to the graph in Figure 2E. 

      Also, in Figure 2F, the counts are shown for only the posterior region of two genes out of the three displayed in Figure 2E. It is unclear why the authors do not show counts for the anterior areas considered in Figure 2E. Furthermore, the legend for Figure 2D is missing, and the legend for 2F is mislabeled as a description for Figure 2D.

      We now include tail images for dd_9223 in Figure 2E to show that there are no ectopic cells in tails. We did not originally include counts of dd_9223 because there was no phenotype observed. dd_7131 and dd_8476 cell types appeared in the posterior of even control animals at a low frequency, unlike dd_9223 cells. However, we have now added counts for dd_9223 tail regions to the graph. We did not count the anterior regions of the animal because our goal was to document the visible phenotype (ectopic cells in the tail) not only with an example image, but also with a graph of the number of cells in the tail and a statistical test. Legends have been updated with correct details.

      Supplement Figure 1 B reports data up to 6 weeks, but no text in the manuscript or supplement mentions any experiment going up to 6 weeks. There are no statistics for data in Supplement Figure 1E. Any significance between groups is unclear.

      More details about the RNAi feeding schedules have been added in the methods section. All RNAi timepoints are now specified in the legends. Figure 1F and Figure 1- figure supplement 1E (additional data: ovo<sup>+</sup>; smedwi-1<sup>-</sup> cell counts) and their legends now mention the statistical tests performed, and annotations (ns, not significant) or p values have been added to the graphs. For simplicity, we decided to include all smedwi-1+ counts together rather than splitting them into low and high smedwi-1+ cells, because we were not making any claims about low versus high cells. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It would be good to acknowledge in the discussion the recent paper from the Petersen lab on map3k1, published in PLoS Genet 2025, especially if the results differ between the two labs.

      We added reference/discussion regarding the recent PLoS Genetics Lo, 2025 map3k1 paper at several suitable points in the manuscript.

      Reviewer #2 (Recommendations for the authors):

      Please pay close attention to the description of experimental details and the consistency throughout the paper. It seems like the reader has to assume or come across information that is not readily available from the text or the legends in the paper. This is an interesting paper with intriguing findings. However, the version presented here appears rushed or put together on the fly.

      Thank you for your thorough feedback. We have endeavored to ensure all appropriate details are present in figures and/or figure legends.

    1. Author response:

      We thank all reviewers for their overall assessment, thoughtful comments, and suggestions. We are working to address each reviewer’s comment in detail. In this provisional response, we provide clarifications regarding our experimental approach and the novelty of our work, and include additional analyses that we have performed since the submission of the manuscript. We are also happy to report that we have now shared the raw data, intermediate analysis files, and the complete repository to facilitate replication of the analysis and figures.

      Code repo: github.com/LorenFrankLab/ms_stim_analysis

      Data repo: dandiarchive.org/dandiset/001634

      Docker containers (see GitHub repo for use instructions):

      Database: https://hub.docker.com/r/samuelbray32/spyglass-db-ms_stim_analysis

      Python notebooks: https://hub.docker.com/r/samuelbray32/spyglass-hub-ms_stim_analysis

      (1) Novelty and contrast with earlier manipulations:

      We thank the reviewers for suggesting that we explicitly contrast our results with prior pharmacological (Wang et al., 2016; Wang et al., 2015; Koenig et al., 2011; Brandon et al., 2014), systemic (Robbe & Buzsáki, 2009; Petersen & Buzsáki, 2020), and behavioral (Drieu et al., 2018) manipulations that also assessed some of the physiological features we evaluated. We will add a discussion of these studies, which will help us emphasize both the insights and discrepancies observed using these prior approaches. We will also more clearly explain the novelty and importance of our specific approach for temporally and physiologically precise manipulation. Specifically, our approach (closed-loop theta-phase stimulation during locomotion) provides a level of physiological specificity that made it possible to dissociate theta-state dynamics from other hippocampal processes. This in turn allowed us to address a question that has remained unresolved across prior studies: Are hippocampal spatial sequences during locomotion (i.e., theta sequences) necessary to learn a novel hippocampal-dependent task?

      (2) Additional analysis on SWRs during rest:

      Since submitting the manuscript, we have conducted additional analyses of the rate and length of SWRs in the rest box and found that both are also indistinguishable between targeted and control animals (effect of manipulation between control and targeted animals; rSWR rate: p=0.45; rSWR length: p=0.94, mixed effect model). We also find evidence for sequential neural representations in the rest box, when the encoding was performed in the behavioral arena. Example trajectories are shown below. These results are consistent with our observations on SWR rate, length, and content in the behavioral arena. Additionally, we are in the process of evaluating and quantifying the results of decoding the rSWRs and will include those in the next version of the manuscript.

      Author response image 1.

      Sequential replay events observed in the rest box

      (3) Theta sequence measurement in the absence of theta:

      In the next version of the manuscript, we will explicitly explain why our manipulation makes it more appropriate to measure sequential hippocampal representations during locomotion (i.e., theta sequences) without using the theta oscillation or a relatively large, epoch-averaged sliding window as a reference. The key insight here is that our manipulation suppresses theta and thus makes it difficult or impossible to accurately identify theta phase. We understand that theta-phase based approaches were used in prior work; however, these prior analyses may have confounded the absence of hippocampal theta sequences during locomotion with the inability to detect theta oscillatory phase reliably. We will show that our method, clusterless Bayesian decoding in which we estimate the decoded position at every 2 ms timestep, is indeed able to capture endogenous hippocampal sequences without any requirement to align to theta oscillations, thus providing an unbiased estimate of the rhythmicity of hippocampal spatial representations.

      (4) Additional analysis on place cell stability and tuning:

      We thank the reviewer for this question. For the KL divergence analysis, we imposed a spike-count criterion (100 spikes for each interval type: stimulation-off, stimulation-on, and the stimulus sub-interval) and a coverage criterion (the 50% HPD of each unit’s spatial firing distribution was contained within 40 cm on the linear track and 100 cm on the w-track). These criteria were chosen to ensure that spatial tuning curves were sufficiently well sampled and localized to allow reliable estimation of KL divergence, which is particularly sensitive to noise arising from low spike counts or diffuse firing. Based on the reviewer’s suggestion, we relaxed both the spike-count and the spatial coverage criteria to include more weakly tuned place cells and replicated our results (p = .146). Further, we also evaluated the stability of place field order between stimulation-on and stimulation-off conditions using more standard methods (as in Wang et al., 2015; Spearman correlation of place field order, control vs. targeted, p = .920, t-test). These results are consistent with our observations about place field stability during stimulation-off and stimulation-on conditions (Fig. 2F).
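
      The two stability measures described above can be illustrated with a minimal Python sketch. It is not the analysis code from the repository: the function names, the smoothing constant eps, the use of base-2 logarithms, and the choice to return None for under-sampled units are all assumptions made for illustration. Only the 100-spike criterion is taken from the text.

```python
import math

MIN_SPIKES = 100  # per-interval spike-count criterion quoted in the response


def normalize(counts, eps=1e-9):
    """Convert per-bin spike counts into a probability distribution over position.

    eps is an assumed smoothing constant that keeps empty bins out of log(0).
    """
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def kl_divergence(counts_off, counts_on):
    """KL(off || on) in bits; None when either interval is under-sampled."""
    if min(sum(counts_off), sum(counts_on)) < MIN_SPIKES:
        return None
    p, q = normalize(counts_off), normalize(counts_on)
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def field_order_correlation(peaks_off, peaks_on):
    """Correlate place-field peak positions (one per unit) across conditions."""
    return spearman(peaks_off, peaks_on)
```

      In practice a library routine such as scipy.stats.spearmanr would typically replace the hand-written rank correlation; it is spelled out here only to keep the sketch dependency-free.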

      Author response image 2.

      Spearman correlation of place field order during stimulation-on and stimulation-off conditions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public review):

      Summary: 

      The authors provide a resource to the systems neuroscience community, by offering their Python-based CLoPy platform for closed-loop feedback training. In addition to using neural feedback, as is common in these experiments, they include a capability to use real-time movement extracted from DeepLabCut as the control signal. The methods and repository are detailed for those who wish to use this resource. Furthermore, they demonstrate the efficacy of their system through a series of mesoscale calcium imaging experiments. These experiments use a large number of cortical regions for the control signal in the neural feedback setup, while the movement feedback experiments are analyzed more extensively.

      Strengths:

      The primary strength of the paper is the availability of their CLoPy platform. Currently, most closed-loop operant conditioning experiments are custom built by each lab and carry a relatively large startup cost to get running. This platform lowers the barrier to entry for closed-loop operant conditioning experiments, in addition to making the experiments more accessible to those with less technical expertise.

      Another strength of the paper is the use of many different cortical regions as control signals for the neurofeedback experiments. Rodent operant conditioning experiments typically record from the motor cortex and maybe one other region. Here, the authors demonstrate that mice can volitionally control many different cortical regions not limited to those previously studied, recording across many regions in the same experiment. This demonstrates the relative flexibility of modulating neural dynamics, including in non-motor regions.

      Finally, adapting the closed-loop platform to use real-time movement as a control signal is a nice addition. Incorporating movement kinematics into operant conditioning experiments has been a challenge due to the increased technical difficulties of extracting real-time kinematic data from video data at a latency where it can be used as a control signal for operant conditioning. In this paper they demonstrate that the mice can learn the task using their forelimb position, at a rate that is quicker than the neurofeedback experiments.

      Weaknesses:

      There are several weaknesses in the paper that diminish the impact of its strengths. First, the value of the CLoPy platform is not clearly articulated to the systems neuroscience community. Similarly, the resource could be better positioned within the context of the broader open-source neuroscience community. For an example of how to better frame this resource in these contexts, I recommend consulting the pyControl paper. Improving this framing will likely increase the accessibility and interest of this paper to a less technical neuroscience audience, for instance by highlighting the types of experimental questions CLoPy can enable.

      We appreciate the editor’s feedback regarding the clarity of the CLoPy platform's value and its positioning within the broader neuroscience community. We agree and understand the importance of effectively communicating the utility of CLoPy to both the systems neuroscience field and the wider open-source neuroscience community.

      To address this, we have revised the introduction and discussion sections of the manuscript to more clearly articulate the unique contributions of the CLoPy platform. Specifically:

      (1) We have emphasized how CLoPy can address experimental questions in systems neuroscience by highlighting its ability to enable real-time closed-loop experiments, such as investigating neural dynamics during behavior or studying adaptive cortical reorganization after injury. These examples are aimed at demonstrating its practical utility to the neuroscience audience.

      (2) We have positioned CLoPy within the broader open-source neuroscience ecosystem, drawing comparisons to similar resources like pyControl. We describe how CLoPy complements existing tools by focusing on real-time optical feedback and integration with genetically encoded indicators, which are becoming increasingly popular in systems neuroscience. We also emphasize its modularity and ease of adoption in experimental settings with limited resources.

      (3) To make the manuscript more accessible to a less technically inclined audience, we have restructured certain sections to focus on the types of experiments CLoPy enables, rather than the technical details of the implementation.

      We have consulted the pyControl paper, as suggested, and have used it as a reference point to improve the framing of our resource. We believe these changes will increase the accessibility and appeal of the paper to a broader neuroscience audience.

      While the dataset contains an impressive amount of animals and cortical regions for the neurofeedback experiment, and an analysis of the movement-feedback experiments, my excitement for these experiments is tempered by the relative incompleteness of the dataset, as well as its description and analysis in the text. For instance, in the neurofeedback experiment, many of these regions only have data from a single mouse, limiting the conclusions that can be drawn. Additionally, there is a lack of reporting of the quantitative results in the text of the document, which is needed to better understand the degree of the results. Finally, the writing of the results section could use some work, as it currently reads more like a methods section.

      Thank you for your thoughtful and constructive feedback on our manuscript. We appreciate the time and effort you took to review our work and provide detailed suggestions for improvement. Below, we address the key points raised in your review:

      (1) Dataset Completeness: We acknowledge that, for some cortical regions, the neurofeedback experiments include data from only a single mouse, while other regions are represented by several animals. This was due to practical constraints during the study, and we understand the limitations this poses for drawing broad conclusions. We felt it was still important to include these data sets with smaller sample sizes, as they might be useful for others pursuing this direction in the future. To address this, we have revised the text to explicitly acknowledge these limitations and clarify that the results for some regions are exploratory in nature. We believe our flexible tool will provide a means for our lab and others to include more animals representing additional cortical regions in future studies. Importantly, we have included all raw and processed data as well as code for future analysis.

      (2) Quantitative Results: We recognize the importance of reporting quantitative results in the text for better clarity and interpretation. In response, we have added more detailed descriptions of the quantitative findings from both the neurofeedback and movement-feedback experiments. These include effect sizes, statistical measures, and key numerical results to provide a clearer understanding of the degree and significance of the observed effects.

      (3) Results Section Writing: We appreciate your observation that parts of the results section read more like a methods section. To improve clarity and focus, we have restructured the results section to present the findings in a more concise and interpretative manner, while moving overly detailed descriptions of experimental procedures to the methods section.

      Suggestions for improved or additional experiments, data or analyses:

      Not necessary for this paper, but it would be interesting to see if the CLNF group could learn without auditory feedback.

      This is a great suggestion and certainly something that could be done in the future.

      There are no quantitative results in the results section. I would add important results to help the reader better interpret the data. For example, in: "Our results indicated that both training paradigms were able to lead mice to obtain a significantly larger number of rewards over time," You could show a number, with an appropriate comparison or statistical test, to demonstrate that learning was observed.

      Thank you for pointing this out. We now report quantitative values in the Results text as well as in the figure legends, and quote the revised sentences here: “A ΔF/F0 threshold value was calculated from a baseline session on day 0 that would have allowed 25% performance. Starting from this basal performance of around 25% on day 1, mice (CLNF No-rule-change, N=23, n=60 and CLNF Rule-change, N=17, n=60) were able to discover the task rule and perform above 80% over ten days of training (Figure 4A, RM ANOVA p=2.83e-5), and Rule-change mice even learned a change in ROIs or rule reversal (Figure 4A, RM ANOVA p=8.3e-10, Table 5 for different rule changes). There were no significant differences between male and female mice (Supplementary Figure 3A).”
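
      As a minimal sketch of how such a baseline-derived threshold could be chosen, the snippet below picks the value that 25% of day-0 trials would have exceeded. The per-trial peak values, the helper name `baseline_threshold`, and the use of a percentile are illustrative assumptions, not the procedure implemented in the manuscript's code.

```python
import numpy as np

TARGET_SUCCESS = 0.25  # baseline performance stated in the response


def baseline_threshold(trial_peaks, target_success=TARGET_SUCCESS):
    """Threshold that a fraction `target_success` of baseline trials would exceed.

    Hypothetical helper: assumes one peak dF/F0 value per baseline trial.
    """
    return float(np.percentile(trial_peaks, 100 * (1 - target_success)))


# Hypothetical day-0 data: peak dF/F0 of the target signal in each of 60 trials.
day0_peaks = np.linspace(0.01, 0.60, 60)
threshold = baseline_threshold(day0_peaks)

# Fraction of baseline trials that would have been rewarded at this threshold.
success_rate = float(np.mean(day0_peaks > threshold))
```

      Under these assumptions, the threshold falls between the 45th and 46th ranked trial, so 15 of the 60 baseline trials would have been rewarded.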

      For: "Performing this analysis indicated that the Raspberry Pi system could provide reliable graded feedback within ~63 ± 15 ms for CLNF experiments." The LED test shows the sending of the signal, but the actual delay for the audio generation might be longer. This is also longer than the 50 ms mentioned in the abstract.

      We appreciate the reviewer’s insightful comment. The latency reported (~63 ms) was measured using the LED test, which captures the time from signal detection to output triggering on the Raspberry Pi GPIO. We agree that the total delay for auditory feedback generation could include an additional latency component related to the digital-to-analog conversion and speaker response. In our setup, we employ a fast Audiostream library written in C to generate the audio signal and expect the delay contribution to be negligible compared to the GPIO latency. Although we did not measure it, this additional delay could be quantified with an oscilloscope-based pilot measurement. We have updated the manuscript to clarify that the 63 ± 15 ms value reflects the GPIO-triggered output latency, and we have revised the abstract to accurately state the delay as “~63 ms” rather than 50 ms. This ensures consistency and avoids underestimation of the latency. We have corrected the LED latency for CLNF and CLMF experiments in the abstract as well.

      It could be helpful to visualize an individual trial for each experiment type, for instance how the audio frequency changes as movement speed / calcium activity changes.

      We have added Supplementary Figure 8 that contains this data where you can see the target cortical activity trace, target paw speed, rewards, along with the audio frequency generated.

      The sample sizes are small (n=1) for a few groups. I am excited by the variety of regions recorded, so it could be beneficial for the authors to collect a few more animals to beef up the sample sizes.

      We've acknowledged that some of the sample sizes are small. Importantly, we have included raw and processed data as well as code for future analysis. We felt it was still important to include these data sets with smaller sample sizes as they might be useful for others pursuing this direction in the future.

      I am curious as to why 60 trials sessions were used. Was it mostly for the convenience of a 30 min session, or were the animals getting satiated? If the former, would learning have occurred more rapidly with longer sessions?

      This is a great observation, and the answer is that it was mostly due to logistical reasons. We tried not to keep animals head-fixed for more than 45 minutes in each session, as they become less engaged during long head-fixed sessions. After head-fixing them, it takes about 15 minutes to get the experiment going, and therefore 30-40 minute recorded sessions seemed appropriate before they stopped being engaged or became satiated in the task. We provided supplemental water after the sessions and observed that they consumed it, so they were not fully satiated during the sessions even when they performed well in the task and got maximum rewards. We also had inter-trial rest periods of 10 s that lengthened the sessions. We think it would be interesting to explore the relationship between session duration (number of trials) and task learning progression over the days in a separate study.

      Figure 4E is interesting, it seems like the changes in the distribution of deltaF was in both positive and negative directions, instead of just positive. I'd be curious as to the author's thoughts as to why this is the case. Relatedly, I don't see Figure 4E, and a few other subplots, mentioned in the text. As a general comment, I would address each subplot in the text.

      We have split Figure 4 into two to keep the figures more readable. Previous Figure 4E-H are now Figure 5A-D in the revised manuscript. The online real-time CLNF sessions used a moving-window average to calculate ΔF/F<sub>0</sub>, whereas the figures were generated by averaging over the whole recorded sessions. We have added text in Methods under the “Online ΔF/F<sub>0</sub> calculation” and “Offline ΔF/F<sub>0</sub> calculation” sections making clear how we perform ΔF/F<sub>0</sub> normalization based on average fluorescence over the entire session. Using this method of normalization does increase the baseline so that some peaks appear to be below zero. Additionally, it is unclear what strategy animals are employing to achieve the rule-specific target activity. The task did not constrain them to have a specific strategy for cortical activation: they were rewarded as long as they crossed the threshold in the target ROI(s). For example, in 2-ROI experiments, to increase ROI1-ROI2 target activity, they could increase activity of ROI1 relative to ROI2 or decrease activity of ROI2 relative to ROI1; both would have led to a reward as long as the result crossed the threshold.
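
      To illustrate the normalization point, the toy example below (a sketch with made-up fluorescence values, not our recorded data) shows how using the whole-session mean as F0 pushes quiet frames below zero whenever the session contains transients. The function name `offline_dff` is hypothetical.

```python
import numpy as np


def offline_dff(trace):
    """dF/F0 with F0 taken as the mean fluorescence of the whole session."""
    trace = np.asarray(trace, dtype=float)
    f0 = trace.mean()
    return (trace - f0) / f0


# Made-up trace: flat fluorescence of 100 with one transient of 140.
trace = [100.0, 100.0, 100.0, 140.0, 100.0, 100.0]
dff = offline_dff(trace)

# The transient raises the session mean above the quiet level, so the quiet
# frames end up negative while the normalized trace averages to zero.
quiet_value = dff[0]
peak_value = dff[3]
```

      Here the session mean exceeds the quiet level of 100, so every quiet frame maps to a negative ΔF/F0 even though the underlying fluorescence never dropped.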

      We have now addressed and added reference to the figures in the text in Results under “Mice can explore and learn an arbitrary task, rule, and target conditions” and “Mice can rapidly adapt to changes in the task rule” sections - thanks for pointing this out.

      For: "In general, all ROIs assessed that encompassed sensory, pre-motor, and motor areas were capable of supporting increased reward rates over time," I would provide a visual summary showing the learning curves for the different types of regions.

      We have rewritten this section to emphasize that these conclusions were based on pooled data from multiple regions of interest. The sample sizes for each type of region are different and some are missing. We believe it would be incomplete and not comparable to present this as a regular analysis since the sample sizes were not balanced. We would be happy to dive deeper into this and point to the raw and processed dataset if anyone would like to explore this further via GitHub or other queries.

      Relatedly, I would further explain the fast vs slow learners, and if they mapped onto certain regions.

      Mice were categorized into fast or slow learners based on the slope of learning over days (reward progression over the days) as shown in Supplementary Figure 3C,D. Our initial aim was not to probe cortical regions that led to fast vs slow learning but this was a grouping we did afterwards. Based on the analysis we did, the fast learners included the sensory (V1), somatosensory (BC, HL), and motor (M1, M2) areas, while the slow learners included the motor (M1, M2), and higher order (TR, RL) cortical areas. Testing all dorsal cortical areas would be prudent to establish their role in fast or slow learning and it is an interesting future direction.
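
      A minimal sketch of a slope-based grouping of this kind is given below. The linear fit, the median split, and the function names are assumptions made for illustration; the criterion actually used is the one shown in Supplementary Figure 3C,D.

```python
import numpy as np


def learning_slope(rewards_per_day):
    """Slope of a straight-line fit to rewards across training days."""
    days = np.arange(1, len(rewards_per_day) + 1)
    slope, _intercept = np.polyfit(days, rewards_per_day, deg=1)
    return float(slope)


def split_learners(rewards_by_mouse):
    """Label each mouse 'fast' or 'slow' by a median split on slope (assumed cutoff)."""
    slopes = {mouse: learning_slope(r) for mouse, r in rewards_by_mouse.items()}
    cutoff = np.median(list(slopes.values()))
    return {mouse: ("fast" if s > cutoff else "slow") for mouse, s in slopes.items()}


# Hypothetical reward counts over four days for three mice.
example = {
    "m1": [15, 30, 45, 60],
    "m2": [15, 20, 25, 30],
    "m3": [15, 25, 35, 45],
}
labels = split_learners(example)
```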

      Also I would make the labels for these plots (e.g. Supp Fig3) more intuitive, versus the acronyms currently used.

      We have made more expressive labels and explained the acronyms below the Supplementary Figure 3.

      The CLMF animals showed a decrease in latency across learning, what about the CLNF animals? There is currently no mention in the text or figures.

      We have now incorporated the CLNF task latency data into both the Results text and Figure 4C. Briefly, task latency decreased as performance improved, increased following a rule change, and then decreased again as the animals relearned the task. The previous Figure 4C has been updated to Figure 4D, and the former Figure 4D has been moved to Supplementary Figure 4E.

      Reviewer #2 (Public review):

      Summary:

      In this work, Gupta & Murphy present several parallel efforts. On one side, they present the hardware and software they use to build a head-fixed mouse experimental setup that they use to track in "real-time" the calcium activity in one or two spots at the surface of the cortex. On the other side, they present another setup that they use to take advantage of the "real-time" version of DeepLabCut with their mice. The hardware and software that they used/developed is described at length, both in the article and in a companion GitHub repository. Next, they present experimental work that they have done with these two setups, training mice to max out a virtual cursor to obtain a reward, by taking advantage of auditory tone feedback that is provided to the mice as they modulate either (1) their local cortical calcium activity, or (2) their limb position.

      Strengths:

      This work illustrates the fact that, thanks to readily available experimental building blocks, body movement tracking and calcium imaging can be carried out using readily available components, including imaging the brain using an incredibly cheap consumer electronics RGB camera (RGB Raspberry Pi Camera). It is a useful source of information for researchers that may be interested in building a similar setup, given the highly detailed overview of the system. Finally, it further confirms previous findings regarding the operant conditioning of the calcium dynamics at the surface of the cortex (Clancy et al. 2020) and suggests an alternative based on DeepLabCut to the motor tasks that aim to image the brain at the mesoscale during forelimb movements (Quarta et al. 2022).

      Weaknesses:

      This work covers 3 separate research endeavors: (1) The development of two separate setups and their corresponding software. (2) A study that is highly inspired by the Clancy et al. 2020 paper on the modulation of the local cortical activity measured through a mesoscale calcium imaging setup. (3) A study of the mesoscale dynamics of the cortex during forelimb movement learning. Sadly, the analyses of the physiological data appear incomplete, and more generally the paper tends to offer overstatements regarding several points:

      In contrast to the introductory statements of the article, closed-loop physiology in rodents is a well-established research topic. Beyond auditory feedback, this includes optogenetic feedback (O'Connor et al. 2013, Abbasi et al. 2018, 2023), electrical feedback in hippocampus (Girardeau et al. 2009), and much more.

      We have included and referenced these papers in our introduction section (quoted below) and rephrased the part where our previous text indicated there are fewer studies involving closed-loop physiology.

      “Some related studies have demonstrated the feasibility of closed-loop feedback in rodents, including hippocampal electrical feedback to disrupt memory consolidation (Girardeau et al. 2009), optogenetic perturbations of somatosensory circuits during behavior (O'Connor et al. 2013), and more recent advances employing targeted optogenetic interventions to guide behavior (Abbasi et al. 2023).”

      The behavioral setups that are presented are representative of the state of the art in the field of mesoscale imaging/head fixed behavior community, rather than a highly innovative design. In particular, the closed-loop latency that they achieve (>60 ms) may be perceived by the mice. This is in contrast with other available closed-loop setups.

      We thank the reviewer for this thoughtful comment and fully agree that our closed-loop latency is larger than that achieved in some other contemporary setups. Our primary aim in presenting this work, however, is not to compete with the lowest possible latencies, but to provide an open-source, accessible, and flexible platform that can be readily adopted by a broad range of laboratories. By building on widely available and lower-cost components, our design lowers the barrier of entry for groups that wish to implement closed-loop imaging and behavioral experiments, while still achieving latencies well within the range that can support many biologically meaningful applications.

      For example, our latency (~60 ms) remains compatible with experimental paradigms such as:

      Motor learning and skill acquisition, where sensorimotor feedback on the scale of tens to hundreds of milliseconds is sufficient to modulate performance.

      Operant conditioning and reward-based learning, in which reinforcement timing windows are typically broader and not critically dependent on sub-20 ms latencies.

      Cortical state dependent modulation, where feedback linked to slower fluctuations in brain activity (hundreds of milliseconds to seconds) can provide valuable insight.

      Studies of perception and decision-making, in which stimulus response associations often unfold on behavioral timescales longer than tens of milliseconds.

      We believe that emphasizing openness, affordability, and flexibility will encourage widespread adoption and adaptation of our setup across laboratories with different research foci. In this way, our contribution complements rather than competes with ultra-low-latency closed-loop systems, providing a practical option for diverse experimental needs.

      Through the paper, there are several statements that point out how important it is to carry out this work in a closed-loop setting with an auditory feedback, but sadly there is no "no feedback" control in cortical conditioning experiments, while there is a no-feedback condition in the forelimb movement study, which shows that learning of the task can be achieved in the absence of feedback.

      We fully agree that such a control would provide valuable insight into the contribution of feedback to learning in the CLNF paradigm. In designing our initial experiments, we envisioned multiple potential control conditions, including No-feedback and Random-feedback. However, our first and primary objective was to establish whether mice could indeed learn to modulate cortical ROI activation through auditory feedback, and to further investigate this across multiple cortical regions. For this reason, we focused on implementing the CLNF paradigm directly, without the inclusion of these additional control groups. To broaden the applicability of the system, we subsequently adapted the platform to the CLMF experiments, where we did incorporate a No-feedback group. These results, as the reviewer notes, strengthen the evidence for the role of feedback in shaping task performance. We agree that the inclusion of a No-feedback control group in the CLNF paradigm will be crucial in future studies to further dissect the specific contribution of feedback to cortical conditioning.

      The analysis of the closed-loop neuronal data behavior lacks controls. Increased performance can be achieved by modulating actively only one of the two ROIs; this is not clearly analyzed (for instance, by looking at the timing of the calcium signal modulation across the two ROIs). It seems that overall ROIs 1 and 2 covary, in contrast to Clancy et al. 2020. How can this be explained?

      We agree that the possibility of increased performance being driven by modulation of a single ROI is an important consideration. Our study indeed began with 1-ROI closed-loop experiments. In those early experiments, while we did observe animals improving performance across days, we realized that daily variability in ongoing cortical GCaMP activity could lead to fluctuations in threshold-crossing events. The 2-ROI design was subsequently introduced to reduce this variability, as the target activity was defined as the relative activity between the two ROIs (e.g., ROI1 – ROI2). This approach offered a more stable signal by normalizing ongoing fluctuations. In our analysis of the early 2-ROI experiments, we observed that animals adopted diverging strategies to achieve threshold crossings. Specifically, some animals increased activity in ROI1 relative to ROI2, while others decreased activity in ROI2 to accomplish the same effect. Once discovered, each animal consistently adhered to its chosen strategy throughout subsequent training sessions. This was an early and intriguing observation, but as the experiments were not originally designed to systematically test this effect, we limited our presentation to the analysis of a small number of animals (shown in Figure 11). We have added details about this observation in our Results section as well, quoted below:

      “In the 2-ROI experiment where the task rule required “ROI1 - ROI2” activity to cross a threshold for reward delivery, mice displayed divergent strategies. Some animals predominantly increased ROI1 activity, whereas others reduced ROI2 activity, both approaches leading to successful threshold crossing (Figure 11)”.

      We hope this clarifies how the use of two ROIs helps explain the apparent covariation of the signals, and why some divergence from the observations of Clancy et al. (2020) may be expected.
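
      The difference rule can be sketched as follows. The threshold and ΔF/F0 values are hypothetical and the helper `crosses_threshold` is not part of our released code; the example only shows that raising ROI1 or lowering ROI2 both satisfy an “ROI1 - ROI2” rule, whereas activity that changes equally in both ROIs does not.

```python
import numpy as np


def crosses_threshold(dff_roi1, dff_roi2, threshold):
    """True where the ROI1 - ROI2 target activity exceeds the reward threshold."""
    return (np.asarray(dff_roi1) - np.asarray(dff_roi2)) > threshold


threshold = 0.05  # hypothetical reward threshold in dF/F0 units

# Strategy A: raise ROI1 while ROI2 stays near baseline.
raise_roi1 = crosses_threshold([0.08], [0.00], threshold)

# Strategy B: lower ROI2 while ROI1 stays near baseline.
lower_roi2 = crosses_threshold([0.00], [-0.08], threshold)

# Equal changes in both ROIs cancel in the difference and earn no reward.
covarying = crosses_threshold([0.08], [0.08], threshold)
```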

      Reviewer #3 (Public review):

      Summary:

      The study demonstrates the effectiveness of a cost-effective closed-loop feedback system for modulating brain activity and behavior in head-fixed mice. The authors have tested a real-time closed-loop feedback system in head-fixed mice with two types of graded feedback: 1) Closed-loop neurofeedback (CLNF), where feedback is derived from neuronal activity (calcium imaging), and 2) Closed-loop movement feedback (CLMF), where feedback is based on observed body movement. It is a Python-based open-source system that the authors call CLoPy. The authors also claim to provide all software, hardware schematics, and protocols to adapt it to various experimental scenarios. The system is capable and can be adapted to a wide range of use cases.

      The authors have shown that their system can control both positive (water drop) and negative (buzzer-vibrator) reinforcement. This study also shows that, using the closed-loop system, mice performed better, learned an arbitrary task, and adapted to a change in the rule. By integrating real-time feedback based on cortical GCaMP imaging and behavior tracking, the authors have provided strong evidence that such closed-loop systems can be instrumental in exploring the dynamic interplay between brain activity and behavior.

      Strengths:

      Simplicity of feedback systems designed. Simplicity of implementation and potential adoption.

      Weaknesses:

      Long latencies, due to slow Ca2+ dynamics and slow imaging (15 FPS), may limit the application of the system.

      We appreciate the reviewer’s comment and agree that latency is an important factor in our setup. The latency arises partly from the inherent slow kinetics of calcium signaling and GCaMP6s, and partly from the imaging rate of 15 FPS (one frame every ~67 ms). These limitations can be addressed in several ways: for example, using faster calcium indicators such as GCaMP8f, or adapting the system to electrophysiological signals, which would require additional processing capacity. In our implementation, image acquisition was fixed at 15 FPS to enable real-time frame processing (256 × 256 resolution) on Raspberry Pi 4B devices. With newer hardware, such as the Raspberry Pi 5, substantially higher acquisition and processing rates are feasible (although we have not yet benchmarked this extensively). More powerful platforms such as Nvidia Jetson or conventional PCs would further support much faster data acquisition and processing.

      Major comments:

      (1) Page 5 paragraph 1: "We tested our CLNF system on Raspberry Pi for its compactness, general-purpose input/output (GPIO) programmability, and wide community support, while the CLMF system was tested on an Nvidia Jetson GPU device." Can these programs and hardware be integrated with a Windows-based system and a microcontroller (Arduino/Teensy)? For broad adaptability, that is what a lot of labs would already have (please comment/discuss).

      While we tested our CLNF system on a Raspberry Pi (chosen for its compactness, GPIO programmability, and large user community) and our CLMF system on an Nvidia Jetson GPU device (to leverage real-time GPU-based inference), the underlying software is fully written in Python. This design choice makes the system broadly adaptable: it can be run on any device capable of executing Python scripts, including Windows-based PCs, Linux machines, and macOS systems. For hardware integration, we have confirmed that the framework works seamlessly with microcontrollers such as Arduino or Teensy, requiring only minor modifications to the main script to enable sending and receiving of GPIO signals through those boards. In fact, we are already using the same system in an in-house project on a Linux-based PC where an Arduino is connected to the computer to provide GPIO functionality. Furthermore, the system is not limited to Raspberry Pi or Arduino boards; it can be interfaced with any GPIO-capable devices, including those from Adafruit and other microcontroller platforms, depending on what is readily available in individual labs. Since many neuroscience and engineering laboratories already possess such hardware, we believe this design ensures broad accessibility and ease of integration across diverse experimental setups.

      (2) Hardware Constraints: The reliance on Raspberry Pi and Nvidia Jetson (which is expensive) for real-time processing could introduce latency issues (~63 ms for CLNF and ~67 ms for CLMF). This latency might limit precision for faster or more complex behaviors, which the authors should address in the discussion section.

      In our system, we measured latencies of ~63 ms for CLNF and ~67 ms for CLMF. While such latencies indeed limit applications requiring millisecond precision, such as fast whisker movements, saccades, or fine-reaching kinematics, we emphasize that many relevant behaviors, including postural adjustments, limb movements, locomotion, and sustained cortical state changes, occur on timescales that are well within the capture range of our system. Thus, our platform is appropriate for a range of mesoscale behavioral studies, a point that merits further discussion. It is also important to note that these latencies are not solely dictated by hardware constraints. A significant component arises from the inherent biological dynamics of the calcium indicator (GCaMP6s) and calcium signaling itself, which introduce slower temporal kinetics independent of processing delays. Newer variants, such as GCaMP8f, offer faster response times and could further reduce effective biological latency in future implementations.

      With respect to hardware, we acknowledge that Raspberry Pi provides a low-cost solution but contributes to modest computational delays, while Nvidia Jetson offers faster inference at higher cost. Our choice reflects a balance between accessibility, cost-effectiveness, and performance, making the system deployable in many laboratories. Importantly, the modular and open-source design means the pipeline can readily be adapted to higher-performance GPUs or integrated with electrophysiological recordings, which provide higher temporal resolution. Finally, we agree with the reviewer that the issue of latency highlights deeper and interesting questions regarding the temporal requirements of behavior classification. Specifically, how much data (in time) is required to reliably identify a behavior, and what is the minimum feedback delay necessary to alter neural or behavioral trajectories? These are critical questions for the design of future closed-loop systems and ones that our work helps frame.

      We have added a slightly modified version of our response above in the discussion section under “Experimental applications and implications”.

      (3) Neurofeedback Specificity: The task focuses on mesoscale imaging and ignores finer spatiotemporal details. Sub-second events might be significant in more nuanced behaviors. Can this be discussed in the discussion section?

      This is a great point, and we have added the following to the discussion section: “In the case of CLNF we have focused on regional cortical GCaMP signals that are relatively slow in kinetics. While such changes are well suited for transcranial mesoscale imaging assessment, it is possible that cellular 2-photon imaging (Yu et al. 2021) or preparations that employ cleared crystal skulls (Kim et al. 2016) could resolve more localized and higher frequency kinetic signatures.”

      (4) The activity over 6s is being averaged to determine if the threshold is being crossed before the reward is delivered. This is a rather long duration of time during which the mice may be exhibiting stereotyped behaviors that may result in the changes in DFF that are being observed. It would be interesting for the authors to compare (if data is available) the behavior of the mice in trials where they successfully crossed the threshold for reward delivery and in those trials where the threshold was not breached. How is this different from spontaneous behavior and behaviors exhibited when they are performing the test with CLNF? 

      We would like to emphasize that we are not directly averaging activity over 6 s to compare against the reward threshold. Instead, the preceding 6 s of activity is used solely to compute a dynamic baseline for ΔF/F<sub>0</sub> (ΔF/F<sub>0</sub> = (F – F<sub>0</sub>)/F<sub>0</sub>). Here, F<sub>0</sub> is calculated as the mean fluorescence intensity over the prior 6 s window and is updated continuously throughout the session. This baseline is then subtracted from the instantaneous fluorescence signal to detect relative changes in activity. The reward threshold is therefore evaluated against these baseline-corrected ΔF/F<sub>0</sub> values at the current time point, not against an average over 6 s. This moving-window baseline correction is a standard approach in calcium imaging analyses, as it helps control for slow drifts in signal intensity, bleaching effects, or ongoing fluctuations unrelated to the behavior of interest. Thus, the 6-s window is not introducing a temporal lag in reward assignment but is instead providing a reference to detect rapid increases in cortical activity. We have added the term dynamic baseline to the Methods to clarify.
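
      The dynamic baseline described above can be sketched as follows, assuming the 15 FPS imaging rate (so a 6 s window spans 90 frames). The class name `DynamicBaseline` and the fluorescence values are illustrative only and are not taken from the CLoPy code base.

```python
from collections import deque

import numpy as np

FPS = 15                         # imaging rate stated in the response
WINDOW_S = 6                     # baseline window stated in the response
WINDOW_FRAMES = FPS * WINDOW_S   # 90 frames


class DynamicBaseline:
    """Running F0 over the preceding window (hypothetical helper)."""

    def __init__(self, window_frames=WINDOW_FRAMES):
        self.history = deque(maxlen=window_frames)

    def update(self, f):
        """Return dF/F0 for the newest sample, or None until the window is full."""
        if len(self.history) < self.history.maxlen:
            self.history.append(f)
            return None
        f0 = float(np.mean(self.history))  # mean of the prior 6 s only
        self.history.append(f)             # slide the window forward by one frame
        return (f - f0) / f0


baseline = DynamicBaseline()
warmup = [baseline.update(100.0) for _ in range(WINDOW_FRAMES)]  # fill the window
jump = baseline.update(110.0)      # instantaneous value against the 6 s baseline
after_jump = baseline.update(100.0)
```

      In this sketch the current sample is compared with the mean of the preceding window and is never itself averaged over 6 s, so a sudden rise is detected on the frame in which it occurs.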

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      Additional suggestions for improved or additional experiments, data or analyses.

      For: "Looking closely at their reward rate on day 5 (day of rule change), they had a higher reward rate in the second half of the session as compared to the first half, indicating they were adapting to the rule change within one session." It would be helpful to see this data, and would be good to see within-session learning on the rule change day

      Thank you for pointing this out. We had missed referencing the figure in the text, and have now added a citation to Supplementary Figure 4A, which shows the cumulative rewards for each day of training. As seen in the plot for day 5, the cumulative rewards are comparable to those on day 1, with most rewards occurring during the second half of the session.

      For: "These results suggest that motor learning led to less cortical activation across multiple regions, which may reflect more efficient processing of movement-related activity," it could also be the case that the behaviour became more stereotyped over learning, which would lead to more concentrated, correlated activity. To test this, it would be good to look at the limb variability across sessions. Similarly, if it is movement-related, there should be good decoding of limb kinematics.

      Indeed, we observed that behavior became more stereotyped over the course of learning, as shown in Supplementary Figure 4C, 4D. One plausible explanation for the reduction in cortical activation across multiple regions is that behavior itself became more stereotyped, a possibility we have explored in the manuscript. Specifically, forelimb movements during the trial became increasingly correlated as mice improved on the task, particularly in the groups that received auditory feedback (Rule-change and No-rule-change groups; Figure 8). As movements became more correlated, overall body movements during trials decreased and aligned more closely with the task rule (Figure 9D). This suggests that reduced cortical activity may in part reflect changes in behavior. Importantly, however, in the Rule-change group, we observed that on the day of the rule switch (day 5), when the target shifted from the left to the right forelimb, cortical activity increased bilaterally (Figure 9A–C). This finding highlights our central point: groups that received feedback (Rule-change and No-rule-change) were able to identify the task rule more effectively, and both their behavior and cortical activity became more specifically aligned with the rule compared to the No-feedback group. We agree with the reviewers that additional analyses along these lines would be valuable future directions. To facilitate this, we have included the movement data for readers who may wish to pursue further analyses; details can be found under “Data and code availability” in the Methods section. However, given the limited sample sizes in our dataset and the need to keep the manuscript focused on the central message, we felt that including these additional analyses here would risk obscuring the main findings.

      For: "We believe the decrease in ΔF/F0peak is unlikely to be driven by changes in movement, as movement amplitudes did not decrease significantly during these periods (Figure 7D CLMF Rule-change)." I would formally compare the two conditions. This is an important control. Also, another way to see if the change in deltaF is related to movement would be to see if you can predict movement from the deltaF.

      Figure 7D in the previous version is Figure 9D in the current revision of the manuscript. We have assessed this for the examples shown by graphing the movement data; unfortunately, there is not enough of that data to do a group analysis of movement magnitude. We suggest that this would be an excellent future direction that would take advantage of the flexible open-source nature of our tool.

      Recommendations for improving the writing and presentation.

      In the abstract there is no mention of the rationale for the project, or the resulting significance. I would modify this to increase readership by the behavioral neuroscience community. Similarly, the introduction also doesn't highlight the value of this resource for the field. Again, I think the pyControl paper does a good job of this. For readability, I would add more subheadings earlier in the results, to separate the different technical aspects of the system.

      We have revised the introduction to include the rationale for the project, its potential implications, and its relevance for translational research. We have also framed the work within the broader context of the behavioral and systems neuroscience community. We greatly appreciate this suggestion, as we believe it enhances the clarity and accessibility of the manuscript for the community.

      For: "While brain activity can be controlled through feedback, other variables such as movements have been less studied, in part because their analysis in real time is more challenging." I would highlight research that has studied the control of behavior through feedback, such as the Mathis paper where mice learn to pull a joystick to a virtual box, and adapt this motion to a force perturbation.

      We have added a citation to the Mathis paper and describe this as an additional form of feedback. The text is quoted below:

      “Opportunities also exist in extending real time pose classification (Forys et al. 2020; Kane et al. 2020) and movement perturbation (Mathis et al. 2017) to shape aspects of an animal’s motor repertoire.”

      Some of the results content would be better suited for the methods, one example: "A previous version of the CLNF system was found to have non-linear audio generation above 10 kHz, partly due to problems in the audio generation library and partly due to the consumer-grade speaker hardware we were employing. This was fixed by switching to the Audiostream (https://github.com/kivy/audiostream) library for audio generation and testing the speakers to make sure they could output the commanded frequencies"

      This is now moved to the Methods section.

      For: "There are reports of cortical plasticity during motor learning tasks, both at cellular and mesoscopic scales (17-19), supporting the idea that neural efficiency could improve with learning," not sure I agree with this, the studies on cortical plasticity are usually to show a neural basis for the learning observed, efficiency is separate from this.

      We have modified this statement to remove the concept of efficiency "There are reports of cortical plasticity during motor learning tasks, both at cellular and mesoscopic scales (17-19).”

      The paragraph that opens "Distinct task- and reward-related cortical dynamics" that describes the experiment should appear in the previous section, as the data is introduced there.

      We have moved the paragraphs in question into the previous section, where we present the data and other experimental details. This makes the text more readable and better contextualized.

      I would present the different ROI rules with better descriptors and visualization to improve the readability.

      We have added Supplementary Figure 7, which provides visualizations of the ROIs across all task rules used in the CLNF experiments.

      Minor corrections to the text and figures.

      Figure 1 is a little crowded, combining the CLNF and CLMF experiments, I would turn this into a 2 panel figure, one for each, similar to how you did figure 2.

      We have revised Figure 1 to include two panels, one for CLNF and one for CLMF. The colored components indicate elements specific to each setup, while the uncolored components represent elements shared between CLNF and CLMF. Relevant text in the manuscript is updated to refer to these figures.

      For Figure 2, the organization of the CLMF section is not intuitive for the reader. I would reorder it so it has a similar flow as the CLNF experiment.

      We have revised the figure by updating the layout of panel B (CLMF) to align with panel A (CLNF), thereby creating a more intuitive and consistent flow between the panels. We appreciate this helpful suggestion, which we believe has substantially improved the clarity of the figure. The corresponding text in the manuscript has also been updated to reflect these changes.

      For Figure 3, highlight that C and E are examples. They also seem a little out of place, so they could even be removed.

      We have now explicitly labeled Figures 3C and 3E as representative examples (in the figure legend and on the figure itself). We believe including these panels provides helpful context for readers: Figure 3C illustrates how the ROIs align on the dorsal cortical brain map with segmented cortical regions, while Figure 3E shows example paw trajectories in three dimensions, allowing visualization of the movement patterns observed during the trials.

      In the plots, I would add sample sizes, for instance, in CLNF learning curve in Figure 4A, how many animals are in each group? 

      We have labeled Figure 4 with number of animals used in CLNF (No-rule-change, N=23; Rule-change, N=17), and CLMF (Rule-change, N=8; No-rule-change, N=4; No-feedback, N=4).

      Also, Figure 7 for example, which figures are single-sessions, versus across animals? For Figure 7c, what time bin is the data taken from?

      We have now clarified this in all the figures. Figure 7 in the previous version is Figure 9 in the current updated manuscript. Figure 9A is from individual sessions on different days from the same mouse. Figure 9B is the group-average, reward-centered ΔF/F<sub>0</sub> activity in different cortical regions (Rule-change, N=8; No-rule-change, N=4; No-feedback, N=4). Figure 9C shows average ΔF/F<sub>0</sub> peak values obtained within a window from -1 s to +1 s centered on the reward point (N=8).
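
      As a minimal illustration of the windowed peak measure described for Figure 9C (not the analysis code used in the study), the sketch below takes the maximum ΔF/F<sub>0</sub> value within ±1 s of the reward frame and averages it across trials. The function names, the list-based trace format, and the `frame_rate_hz` parameter are assumptions made for this example.

```python
def peak_in_window(trace, reward_index, frame_rate_hz, half_window_s=1.0):
    """Return the maximum dF/F0 value within +/- half_window_s of the reward frame.

    trace: sequence of dF/F0 samples for one trial (hypothetical format).
    reward_index: sample index at which the reward was delivered.
    frame_rate_hz: imaging frame rate; an assumed parameter, not taken from the study.
    """
    half = int(round(half_window_s * frame_rate_hz))
    start = max(0, reward_index - half)               # clip the window at the trace start
    stop = min(len(trace), reward_index + half + 1)   # clip the window at the trace end
    return max(trace[start:stop])


def mean_peak(traces, reward_indices, frame_rate_hz, half_window_s=1.0):
    """Average the reward-centered peak across trials."""
    peaks = [
        peak_in_window(t, r, frame_rate_hz, half_window_s)
        for t, r in zip(traces, reward_indices)
    ]
    return sum(peaks) / len(peaks)
```

      Clipping the window at the edges of the trace keeps the measure defined for rewards that fall close to the start or end of a recording.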

      It says "punish" in Figure 3, but there is no punishment?

      Yes, the task did not involve punishment. Each trial resulted in either a success, which is followed by a reward, or a failure, which is followed by a buzzer sound. To better reflect these outcomes, we have updated Figure 3 and replaced the labels “Reward” with “Success” and “Punish” with “Failure.”

      The regression on 5c doesn't look quite right, also this panel is not mentioned in the text.

      The figure referred to by the reviewer as Figure 5 is now presented as Figure 6 in the revised manuscript. Regarding the reviewer’s observation about the regression line in the left panel of Figure 5C, the apparent misalignment arises because the majority of the data points are densely clustered at the center of the scatter plot, where they overlap substantially. The regression line accurately reflects this concentration of overlapping data. To improve clarity, we have updated the figure and ensured that it is now appropriately referenced in the Results section.

      Reviewer #2 (Recommendations for the authors):

      (1) There would be many interesting observations and links between the peripheral and cortical studies if there was a body video available during the cortical study. Is there any such data available?

      We agree that a detailed analysis of behavior during the CLNF task would be necessary to explore any behavioral correlates of success in the task. Unfortunately, we do not have sufficient video of the whole body to perform such an analysis.

      (2) The text (p. 24) states: [intracortical GCAMP transients measured over days became more stereotyped in kinetics and were more correlated (to each other) as the task performance increased over the sessions (Figure 7E).] But I cannot find this quantification in the figures or text?

      Figure 7 in the previous version of the manuscript now appears as Figure 9. In this figure, we present cortical activity across selected regions during trials, and in Figure 9E we highlight that this activity becomes more correlated. Since we did not formally quantify variability, we have removed the previous claim that the activity became stereotyped and revised the text in the updated manuscript accordingly.

      Typos:

      10-serest c (page 13)

      Inverted color codes in figure 4E vs F

      Reviewer #3 (Recommendations for the authors):

      We have mostly attempted to limit the feedback to suggestions and posed a few questions that might be interesting to explore given the dataset the authors have collected.

      Comments:

      In closed-loop systems, latency is a primary concern, and the authors have successfully tested the latency of the system (Delay): the time from detection of an event to the reaction was less than 67 ms.

      We have commented on the issues and limitations caused by latency, and potential future directions to overcome these challenges in responses to some of the previous comments.

      Additional major comments:

      "In general, all ROIs assessed that encompassed sensory, pre-motor, and motor areas were capable of supporting increased reward rates over time (Figure 4A, Animation 1)." Fig 4A is merely showing change in task performance over time and does not have information regarding the changes observed specific to CLNF for each ROI.

      We acknowledge that the sample size for individual ROI rules was not sufficient for meaningful comparisons. To address this limitation, we pooled the data across all the rules tested. The manuscript includes a detailed list of the rules along with their corresponding sample sizes for transparency.

      A ΔF/F<sub>0</sub> threshold value was calculated from a baseline session on day 0 that would have allowed 25% performance. Starting from this basal performance of around 25% on day 1, mice (CLNF No-rule-change, n=28 and CLNF Rule-change, n=13). It is unclear what the replicates here are. Trials or mice? The corresponding Figure legend has a much smaller n value.

      Thank you for pointing this out. We realized that we had not indicated the sample replicates in the figure, and the use of n instead of N for the number of animals may have been misleading. We have now corrected the notation and clarified this information in the figure to resolve the discrepancy.
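
      To make the quoted threshold rule concrete, here is a hedged sketch of one way a threshold that "would have allowed 25% performance" could be derived from baseline data: take the per-trial peak ΔF/F<sub>0</sub> values from the day 0 session and choose the value that the top 25% of trials reach. This is an assumed procedure for illustration only, and the names `baseline_threshold` and `success_rate` are hypothetical.

```python
def baseline_threshold(baseline_peaks, target_success=0.25):
    """Pick the threshold that the top `target_success` fraction of baseline trials reach.

    baseline_peaks: per-trial peak dF/F0 values from a baseline session
    (hypothetical input; how the study summarizes trials is not assumed here).
    """
    ordered = sorted(baseline_peaks, reverse=True)
    # Number of baseline trials that should count as successes (at least one).
    n_success = max(1, int(round(target_success * len(ordered))))
    return ordered[n_success - 1]


def success_rate(peaks, threshold):
    """Fraction of trials whose peak meets or exceeds the threshold."""
    return sum(p >= threshold for p in peaks) / len(peaks)
```

      With tied values at the threshold, the realized baseline success rate can be slightly above the target, so the 25% figure is best read as approximate.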

      What were the replicates for each ROI pairs evaluated?

      Each ROI rule and number of mice and trials are listed in Table 5 and Table 6.

      Our analysis revealed that certain ROI rules (see description in methods) lead to a greater increase in success rate over time than others (Supplementary Figure 3D). The Supplementary figures 3C and 3D are blurry and could use higher resolution images. 

      We have increased the font size of the text that was previously difficult to read and re-exported the figure at a higher resolution (300 DPI). We believe these changes will resolve the issue.

      Also, it would help the reader if a visual representation of the ROI pairs were provided, instead of the text view. One interesting question is whether there are anatomical biases to fast vs slow learning pairs (Directionality - anterior/posterior, distance between the selected ROIs etc). This could be interesting to tease apart.

      We have added Supplementary Figure 7, which provides visualizations of the ROIs across all task rules used in the CLNF experiments. While a detailed investigation of the anatomical basis of fast versus slow learning cortical ROIs is beyond the scope of the present study, we agree that this represents an exciting future direction for further research.

      How distant should the ROIs be to achieve increased task performance?

      We appreciate this insightful question. We did not specifically test this scenario. In our study, we selected 0.3 × 0.3 mm ROIs centered on the standard AIBS mouse brain atlas (CCF). At this resolution, ROIs do not overlap, regardless of their placement in a two-ROI experiment. Furthermore, because our threshold calculations are based on baseline recordings, we expect the system would function for any combination of ROI placements. Nonetheless, exploring this systematically would be an interesting avenue for future experiments.

      Figures:

      I would leave out some of the methodological details such as the protocol for water restriction (Fig. 3) out of the legend. This will help with readability.

      We have removed some of the methodological details, including those mentioned above, from the legend of Figure 3 in the updated manuscript.

      Fig 1 and Fig 2: In my opinion, It would be easier for the reader if the current Fig. 2, which provides a high level description of CLNF and CLBF is presented as Fig. 1. The current Fig. 1, goes into a lot of methodological implementation details, and also includes a lot of programming jargon that is being introduced early in the paper that is hard to digest early on in the paper's narrative.

      Thank you for the suggestion. In the new manuscript, Figure 1 and Figure 2 have been swapped.

      Higher-resolution images/ plots are needed in many instances. Unsure if this is the pdf compression done by the manuscript portal that is causing this.

      All figures were prepared in vector graphics format using the open-source software Inkscape. For this manuscript, we exported the images at 300 DPI, which is generally sufficient for publication-quality documents. The submission portal may apply additional processing, which could have resulted in a reduction in image quality. We will carefully review the final submission files and ensure that all figures are clear and of high quality.

      The authors repeatedly show ROI specific analysis M1_L, F1_R etc. It will be helpful to provide a key, even if redundant in all figures to help the reader.

      We have now included keys to all such abbreviations in all the figures.

      There are also instances of editorialization and interpretation e.g., "Surprisingly, the "Rule-change" mice were able to discover the change in rule and started performing above 70% within a day of the rule change, on day 6" that would be more appropriate in the main body of the paper.

      Thank you for pointing this out. We have removed this sentence from the figure legend, since it is already discussed in the Results.

      Minor comments

      (1) The description of Figure 1 is hard to follow and can be described better based on how the information is processed and executed in the system from source to processing and back. Using separated colors (instead of shaded of grey) for the neuro feedback and movement feedback would help as well. Common components could have a different color. The specification like the description of the config file should come later.

      Figure 1 in the previous version is Figure 2 in the updated version. Following suggestions from the other reviewers, we have made the figure easier to understand by splitting it into two panels with color coding: green for CLNF-specific parts and pink for CLMF-specific parts, while shared parts are left uncolored.

      (2) Page 20 last paragraph:

      Authors are neglecting that the rule change is done one day prior and the results that you see in the second half on the 6th day are not just because of the first half of the 6th day instead combined training on the 5th day (rule change) and then the first half of the 6th day. Rephrasing this observation is essential.

      We have revised the text for clarity to indicate that the performance increase observed on day 6 is not necessarily attributable to training on that day. In fact, we noted and mentioned that mice began to perform the task better during the second half of the session on day 5 itself.

      (3)  The method section description of the CLMF setup (Page no 39 first paragraph) is more detailed, a diagram of this setup would make it easy to follow and a better read.

      We have made changes to the CLMF setup (Figure 1B) and CLMF schematic (Figure 2B) to make it easier to understand parts of the setup and flow of control.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Bansal et al. present a study on the fundamental blood and nectar feeding behaviors of the critical disease vector, Anopheles stephensi. The study encompasses not just the fundamental changes in blood feeding behaviors of the crucially understudied vector, but then uses a transcriptomic approach to identify candidate neuromodulation pathways which influence blood feeding behavior in this mosquito species. The authors then provide evidence through RNAi knockdown of candidate pathways that the neuromodulators sNPF and Rya modulate feeding either via their physiological activity in the brain alone or through joint physiological activity along the brain-gut axis (but critically not the gut alone). Overall, I found this study to be built on tractable, well-designed behavioral experiments.

      Their study begins with a well-structured experiment to assess how the feeding behaviors of A. stephensi change over the course of its life history and in response to its age, mating, and oviposition status. The authors are careful and validate their experimental paradigm in the more well-studied Ae. aegypti, and are able to recapitulate the results of prior studies, which show that mating is a prerequisite for blood feeding behaviors in Ae. aegypti. Here they find A. stephensi, like other Anopheline mosquitoes, has a more nuanced regulation of its blood and nectar feeding behaviors.

      The authors then go on to show in a Y-maze olfactometer that, to some degree, changes in blood feeding status depend on behavioral modulation to host cues, and this is not likely to be a simple change to the biting behaviors alone. I was especially struck by the swap in valence of the host cues for the blood-fed and mated individuals, which had not yet oviposited. This indicates that there is a change in behavior that is not simply desensitization to host cues while navigating in flight, but something much more exciting is happening.

      The authors then use a transcriptomic approach to identify candidate genes in the blood-feeding stages of the mosquito's life cycle to identify a list of 9 candidates that have a role in regulating the host-seeking status of A. stephensi. Then, through investigations of gene knockdown of candidates, they identify the dual action of RYa and sNPF as candidate neuromodulators of host-seeking in this species. Overall, I found the experiments to be well-designed. I found the molecular approach to be sound. While I do not think the molecular approach is necessarily an all-encompassing mechanism identification (owing mostly to the fact that genetic resources are not yet available in A. stephensi as they are in other dipteran models), I think it sets up a rich line of research questions for the neurobiology of mosquito behavioral plasticity and comparative evolution of neuromodulator action.

      We appreciate the reviewer’s detailed summary of our work. We thank them for their positive comments and agree with them on the shortcomings of our approach.

      Strengths:

      I am especially impressed by the authors' attention to small details in the course of this article. As I read and evaluated this article, I continued to think about how many crucial details could potentially have been missed if this had not been the approach. The attention to detail paid off in spades and allowed the authors to carefully tease apart molecular candidates of blood-seeking stages. The authors' top-down approach to identifying RYamide and sNPF starting from first principles behavioral experiments is especially comprehensive. The results from both the behavioral and molecular target studies will have broad implications for the vectorial capacity of this species and comparative evolution of neural circuit modulation.

      We really appreciate the reviewer's recognition of the attention to detail we have tried to bring to this work. Thank you!

      Weaknesses:

      There are a few elements of data visualizations and methodological reporting that I found confusing on a first few read-throughs. Figure 1F, for example, was initially confusing as it made it seem as though there were multiple 2-choice assays for each of the conditions. I would recommend removing the "X" marker from the x-axis to indicate the mosquitoes did not feed from either nectar, blood, or neither in order to make it clear that there was one assay in which mosquitoes had access to both food sources, and the data quantify if they took both meals, one meal, or no meals.

      We thank the reviewer for flagging the schematic in figure 1F. As suggested, we have removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose in the assay. For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data, as it does not capture the variability in the data.

      I would also like to know more about how the authors achieved tissue-specific knockdown for RNAi experiments. I think this is an intriguing methodology, but I could not figure out from the methods why injections either had whole-body or abdomen-specific knockdown.

      The tissue-specific knockdown (abdomen only or abdomen+head) emerged from initial standardisations where we were unable to achieve knockdown in the head unless we used higher concentrations of dsRNA and did the injections in older females. We realised that this gave us the opportunity to isolate the neuronal contribution of these neuropeptides in the phenotype produced. Further optimisations revealed that injecting dsRNA into 0-10 h old females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 4-day-old females resulted in knockdowns in both tissues. Moreover, head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts.

      We had mentioned the knockdown conditions for tissue-specific knockdowns (time of injection and amount of dsRNA injected) in the methods, but realise now that this did not explain our approach well enough. We have now edited the section to state our methodology more clearly (see lines 932-948).

      I also found some interpretations of the transcriptomic to be overly broad for what transcriptomes can actually tell us about the organism's state. For example, the authors mention, "Interestingly, we found that after a blood meal, glucose is neither spent nor stored, and that the female brain goes into a state of metabolic 'sugar rest', while actively processing proteins (Figure S2B, S3)".

      This would require a physiological measurement to actually know. It certainly suggests that there are changes in carbohydrate metabolism, but there are too many alternative interpretations to make this broad claim from transcriptomic data alone.

      We thank the reviewer for pointing this out and agree with them. We have now edited our statement to read:

      “Instead, our data suggests altered carbohydrate metabolism after a blood meal, with the female brain potentially entering a state of metabolic 'sugar rest' while actively processing proteins (Figure S2B, S3). However, physiological measurements of carbohydrate and protein metabolism will be required to confirm whether glucose is indeed neither spent nor stored during this period.” See lines 271-277.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bansal et al examine and characterize feeding behaviour in Anopheles stephensi mosquitoes. While sharing some similarities to the well-studied Aedes aegypti mosquito, the authors demonstrate that mated females, but not unmated (virgin) females, exhibit suppression in their blood-feeding behaviour. Using brain transcriptomic analysis comparing sugar-fed, blood-fed, and starved mosquitoes, several candidate genes potentially responsible for influencing blood-feeding behaviour were identified, including two neuropeptides (short NPF and RYamide) that are known to modulate feeding behaviour in other mosquito species. Using molecular tools, including in situ hybridization, the authors map the distribution of cells producing these neuropeptides in the nervous system and in the gut. Further, by implementing systemic RNA interference (RNAi), the study suggests that both neuropeptides appear to promote blood-feeding (but do not impact sugar feeding), although the impact was observed only after both neuropeptide genes underwent knockdown.

      Strengths and/or weaknesses:

      Overall, the manuscript was well-written; however, the authors should review carefully, as some sections would benefit from restructuring to improve clarity. Some statements need to be rectified as they are factually inaccurate.

      Below are specific concerns and clarifications needed in the opinion of this reviewer:

      (1) What does "central brains" refer to in abstract and in other sections of the manuscript (including methods and results)? This term is ambiguous, and the authors should more clearly define what specific components of the central nervous system was/were used in their study.

      Central brain, or mid brain, is a commonly used term to refer to brain structures/neuropils without the optic lobes (For example: https://www.nature.com/articles/s41586-024-07686-5). In this study we have focused our analysis on the central brain circuits involved in modulating blood-feeding behaviour and have therefore excluded the optic lobes. As optic lobes account for nearly half of all the neurons in the mosquito brain (https://pmc.ncbi.nlm.nih.gov/articles/PMC8121336/), including them would have disproportionately skewed our transcriptomic data toward visual processing pathways. 

      We have indicated this in figure 3A and in the methods (see lines 800-801, 812). We have now also clarified it in the results section for neurotranscriptomics to avoid confusion (see lines 236-237).

      (2) The abstract states that two neuropeptides, sNPF and RYamide are working together, but no evidence is summarized for the latter in this section.

      We thank the reviewer for pointing this out. We have now added the statement “This occurs in the context of the action of RYa in the brain” to the end of the abstract, for a complete summary of our proposed model.

      (3) Figure 1

      Panel A: This should include mating events in the reproductive cycle to demonstrate differences in the feeding behavior of Ae. aegypti.

      Our data suggest that mating can occur at any time between eclosion and oviposition in An. stephensi and between eclosion and blood feeding in Ae. aegypti. Adding these into (already busy) 1A, would cloud the purpose of the schematic, which is to indicate the time points used in the behavioural assays and transcriptomics.

      Panel F: In treatments where insects were not provided either blood or sugar, how is it that some females and males had fed? Also, it is unclear why the y-axis label is % fed when the caption indicates this is a choice assay. Also, it is interesting that sugar-starved females did not increase sugar intake. Is there any explanation for this (was it expected)?

      We apologise for the confusion. The experiment is indeed a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. The x-axis indicates the choice made by the mosquitoes, not the choice provided in the assay, and the y-axis indicates the percentage of males or females that made each particular choice. We have now removed the “X” markers from the x-axis and revised the axis label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      In this assay, we scored females only for the presence or absence of each meal type (blood or sugar) and are therefore unable to comment on whether sugar-starved females consumed more sugar than sugar-sated females. However, when sugar-starved, a higher proportion of females consumed both blood and sugar, while fewer fed on blood alone.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data as it does not capture the variability in the data.
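
      The scoring scheme described above, in which each mosquito is scored for the presence or absence of each meal type and then assigned to one of four choices, can be sketched as follows. This is an illustrative tally only; the record format and function names are assumptions and do not reflect how the data were actually processed.

```python
from collections import Counter

# The four possible outcomes on the x-axis ("choice made").
CHOICES = ("both", "blood only", "sugar only", "neither")


def classify(took_blood, took_sugar):
    """Map presence/absence of each meal type to one of the four choices."""
    if took_blood and took_sugar:
        return "both"
    if took_blood:
        return "blood only"
    if took_sugar:
        return "sugar only"
    return "neither"


def percent_by_choice(records):
    """records: list of (took_blood, took_sugar) booleans, one pair per mosquito.

    Returns the percentage of mosquitoes that made each choice (the y-axis value).
    """
    counts = Counter(classify(blood, sugar) for blood, sugar in records)
    total = len(records)
    return {choice: 100.0 * counts[choice] / total for choice in CHOICES}
```

      Because every mosquito falls into exactly one category, the four percentages sum to 100 within a replicate, which is what the stacked representation displays.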

      (4) Figure 3

      In the neurotranscriptome analysis of the (central) brain involving the two types of comparisons, can the authors clarify what "excluded in males" refers to? Does this imply that only genes not expressed in males were considered in the analysis? If so, what about co-expressed genes that have a specific function in female feeding behaviour?

      This is indeed correct. We reasoned that since blood feeding is exclusive to females, we should focus our analysis on genes that were specifically upregulated in them. As the reviewer points out, it is very likely that genes commonly upregulated in males and females may also promote blood feeding and we will miss out on any such candidates based on our selection criteria. 

      (5) Figure 4

      The authors state that there is more efficient knockdown in the head of unfed females; however, this is not accurate since they only get knockdown in unfed animals, and no evidence of any knockdown in fed animals (panel D). This point should be revised in the results text as well.

      Perhaps we do not understand the reviewer’s point or there has been a misunderstanding. In figure 4D, we show that while there is more robust gene knockdown in unfed females, blood-fed females also showed modest but measurable knockdowns ranging from 5-40% for RYamide and 2-21% for sNPF. 

      Relatedly, blood-feeding is decreased when both neuropeptide transcripts are targeted compared to uninjected (panel C) but not compared to dsGFP injected (panel E). Why is this the case if authors showed earlier in this figure (panel B) that dsGFP does not impact blood feeding?

      We realise this concern stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens. 4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomens. We have now added a schematic in the plots to make this clearer.

      In addition, do the uninjected and dsGFP-injected relative mRNA expression data reflect combined RYa and sNPF levels? Why is there no variation in these data,…

      In these qPCRs, we calculated relative mRNA expression using the delta-delta Ct method (see line 975). For each neuropeptide its respective control was used. For simplicity, we combined the RYa and sNPF control data into a single representation. The value of this control is invariant because this method sets the control baseline to a value of 1.
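
      For readers unfamiliar with why the control value is fixed at 1, the following sketch works through the standard delta-delta Ct calculation, assuming 100% amplification efficiency (a fold change of 2 per cycle). The reference (housekeeping) gene and the Ct values used to call it are hypothetical; the source does not specify them here.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_control, ct_reference_control):
    """Relative mRNA expression by the delta-delta Ct method.

    ct_target / ct_reference: Ct values of the target and reference gene
    in the treated sample (for example, a dsRNA-injected female).
    ct_target_control / ct_reference_control: the same two Ct values in
    the control sample. All inputs are hypothetical example values.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    # Assumes perfect doubling per cycle (100% primer efficiency).
    return 2.0 ** (-delta_delta_ct)
```

      Comparing the control against itself gives a delta-delta Ct of 0 and therefore a relative expression of exactly 1, whereas a treated sample whose target Ct is one cycle later than the control's (with the reference gene unchanged) gives 0.5, that is, a 50% knockdown.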

      …and how do transcript levels of RYa and sNPF compare in the brain versus the abdomen (the presentation of data doesn't make this relationship clear).

      The reviewer is correct in pointing out that we have not clarified this relationship in our current presentation. While we have not performed absolute mRNA quantifications, we extracted relative mRNA levels from qPCR data of 96h old unmanipulated control females. We observed that both sNPF and RYa transcripts are expressed at much lower levels in the abdomens, as compared to those in the heads, as shown in Author response Image 1 below. 

      Author response image 1.

      (6) As an overall comment, the figure captions are far too long and include redundant text presented in the methods and results sections.

      We thank the reviewer for flagging this and have now edited the legends to remove redundancy.  

      (7) Criteria used for identifying neuropeptides promoting blood-feeding: statement that reads "all neuropeptides, since these are known to regulate feeding behaviours". This is not accurate since not all neuropeptides govern feeding behaviors, while certainly a subset do play a role.

      We agree with the reviewer that not all neuropeptides regulate feeding behaviours. Our statement refers to the screening approach we used: in our shortlist of candidates, we chose to validate all neuropeptides.

      (8) In the section beginning with "Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels...", the authors state that there was no change in blood-feeding and later state the opposite. The wording should be clarified as it is unclear.

      Thank you for pointing this out. We were referring to an unchanged proportion of the blood fed females. We have now edited the text to the following: 

      “Two neuropeptides - sNPF and RYa - showed about 25% and 40% reduced mRNA levels in the heads but the proportion of females that took blood meals remained unchanged”. See lines 338-340.

      (9) Just before the conclusions section, the statement that "neuropeptide receptors are often ligand-promiscuous" is unjustified. Indeed, many studies have shown in heterologous systems that high concentrations of structurally related peptides, which are not physiologically relevant, might cross-react and activate a receptor belonging to a different peptide family; however, the natural ligand is often many times more potent (in most cases, orders of magnitude) than structurally related peptides. This is certainly the case for various RYamide and sNPF receptors characterized in various insect species.

      We agree with the reviewer and apologise for the mistake. We have now removed the statement.

      (10) Methods

      In the dsRNA-mediated gene knockdown section, the authors could more clearly describe how much dsRNA was injected per target. At the moment, the reader must carry out calculations based on the concentrations provided and the injected volume range provided later in this section.

      We have now edited the section to reflect the amount of dsRNA injected per target. Please see lines 921-931.

      It is also unclear how tissue-specific knockdown was achieved by performing injection on different days/times. The authors need to explain/support, and justify how temporal differences in injection lead to changes in tissue-specific expression. Does the blood-brain barrier limit knockdown in the brain instead, while leaving expression in the peripheral organs susceptible?

      To achieve tissue-specific knockdowns of sNPF and RYa, we optimised both the time of injection as well as the dsRNA concentration to be injected. Injecting dsRNA into 0-10h females produced abdomen-specific knockdowns without affecting head expression, whereas injections into 96h old females resulted in knockdowns in both tissues. Head knockdowns in older females required higher dsRNA concentrations, with knockdown efficiency correlating with the amount injected. In contrast, abdominal knockdowns in younger females could be achieved even with lower dsRNA amounts, reflecting the lower baseline expression of sNPF in abdomens compared to heads and the age-dependent increase in head expression (as confirmed by qPCR). It is possible that the blood-brain barrier also limits the dsRNA entering the brain, thereby requiring higher amounts to be injected for head knockdowns. 

      We have now edited this section to state our methodology more clearly (see lines 932-948).

      For example, in Figure 4, the data support that knockdown in the head/brain is only effective in unfed animals compared to uninjected animals, while there is no evidence of knockdown in the brain relative to dsGFP-injected animals. Comparatively, evidence appears to show stronger evidence of abdominal knockdown mostly for the RYa transcript (>90%) while still significantly for the sNPF transcript (>60%).

      As we explained earlier, this concern likely stems from our representation of the data. Since we had earlier determined that dsGFP-injected females fed similarly to uninjected females (fig 4B), we used these controls interchangeably in subsequent experiments. To avoid confusion, we have now only used the label ‘control’ in figure 4 (and supplementary figure S9) and specified which control was used for each experiment in the legend.

      In addition to this, we wanted to clarify that fig 4C and 4E are independent experiments. 4C is the behaviour corresponding to when the neuropeptides were knocked down in both heads and abdomens.  4E is the behaviour corresponding to when the neuropeptides were knocked down in only the abdomen. We have now added a schematic in the plots to make this clearer.

      Reviewer #3 (Public review):

      Summary:

      This manuscript investigates the regulation of host-seeking behavior in Anopheles stephensi females across different life stages and mating states. Through transcriptomic profiling, the authors identify differential gene expression between "blood-hungry" and "blood-sated" states. Two neuropeptides, sNPF and RYamide, are highlighted as potential mediators of host-seeking behavior. RNAi knockdown of these peptides alters host-seeking activity, and their expression is anatomically mapped in the mosquito brain (sNPF and RYamide) and midgut (sNPF only).

      Strengths:

      (1) The study addresses an important question in mosquito biology, with relevance to vector control and disease transmission.

      (2) Transcriptomic profiling is used to uncover gene expression changes linked to behavioral states.

      (3) The identification of sNPF and RYamide as candidate regulators provides a clear focus for downstream mechanistic work.

      (4) RNAi experiments demonstrate that these neuropeptides are necessary for normal host-seeking behavior.

      (5) Anatomical localization of neuropeptide expression adds depth to the functional findings.

      Weaknesses:

      (1) The title implies that the neuropeptides promote host-seeking, but sufficiency is not demonstrated (for example, with peptide injection or overexpression experiments).

      Demonstrating sufficiency would require injecting sNPF peptide or its agonist. To date, no small-molecule agonists (or antagonists) that selectively mimic sNPF or RYa neuropeptides have been identified in insects. An NPY analogue, TM30335, has been reported to activate the Aedes aegypti NPY-like receptor 7 (NPYLR7; Duvall et al., 2019), which is also activated by sNPF peptides at higher doses (Liesch et al., 2013). Unfortunately, the compound is no longer available because its manufacturer, 7TM Pharma, has ceased operations. Synthesising the peptides is a possibility that we will explore in the future.

      (2) The proposed model regarding central versus peripheral (gut) peptide action is inconsistently presented and lacks strong experimental support.

      The best way to address this would be to conduct tissue-specific manipulations, the tools for which are not available in this species. Our approach to achieve head+abdomen and abdomen only knockdown was the closest we could get to achieving tissue specificity and allowed us to confirm that knockdown in the head was necessary for the phenotype. However, as the reviewer points out, this did not allow us to rule out any involvement of the abdomen. This point has been addressed in lines 364-371.

      (3) Some conclusions appear premature based on the current data and would benefit from additional functional validation.

      The most definitive way of demonstrating necessity of sNPF and RYa in blood feeding would be to generate mutant lines. While we are pursuing this line of experiments, they lie beyond the scope of a revision. In their absence, we relied on the knockdown of the genes using dsRNA. We would like to posit that despite only partial knockdown, mosquitoes do display defects in blood-feeding behaviour, without affecting sugar-feeding. We think this reflects the importance of sNPF in promoting blood feeding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Overall, I found this manuscript to be well-prepared, visually the figures are great and clearly were carefully thought out and curated, and the research is impactful. It was a wonderful read from start to finish. I have the following recommendations:

      Thank you very much, we are very pleased to hear that you enjoyed reading our manuscript!

      (1) For future manuscripts, it would make things significantly easier on the reviewer side to submit a format that uses line numbers.

      We sincerely apologise for the oversight. We have now incorporated line numbers in the revised manuscript.

      (2) There are a few statements in the text that I think may need clarification or might be outside the bounds of what was actually studied here. For example, in the introduction "However, mating is dispensable in Anophelines even under conditions of nutritional satiety". I am uncertain what is meant by this statement - please clarify.

      We apologise for the lack of clarity in the statement and have now deleted it since we felt it was not necessary.

      (3) Typo/Grammatical minutiae:

      (a) A small idiosyncrasy of using hyphens in compound words should also be fixed throughout. Typically, you don't hyphenate if the words are being used as a noun, as in the case: e.g. "Age affects blood feeding.". However, you would hyphenate if the two words are used as a compound adjective "Age affects blood-feeding behavior". This may not be an all-inclusive list, but here are some examples where hyphens need to either be removed or added. Some examples:

      "Nutritional state also influences other internal state outputs on blood-feeding": blood-feeding -> blood feeding

      "... the modulation of blood-feeding": blood-feeding -> blood feeding

      "For example, whether virgin females take blood-meals...": blood-meals -> blood meals

      ".... how internal and external cues shape meal-choice"-> meal choice

      "blood-meal" is often used throughout the text, but is correctly "blood meal" in the figures.

      There are many more examples throughout.

      We apologise for these errors and appreciate the reviewer’s keen eye. We have now fixed them throughout the manuscript.  

      (b) Figure 1 Caption has a typo: "co-housed males were accessed for sugar-feeding" should be "co-housed males were assessed for sugar feeding"

      We apologise for the typo and thank the reviewer for spotting it. We have now corrected this.  

      (c) It would be helpful in some other figure captions to more clearly label which statement is relevant to which part of the text. For example, in Figure 4's caption.

      "C,D. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head (C). Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected blood-fed and unfed females, as compared to that in uninjected females, analysed via qPCR (D)."

      I found re-referencing C and D at the end of their statements makes it look as though C precedes the "Relative mRNA expression" and on a first read through, I thought the figure captions were backwards. I'd recommend reformatting here and throughout consistently to only have the figure letter precede its relevant caption information, e.g.:

      "C. Blood-feeding and sugar-feeding behaviour of females when both RYa and sNPF are knocked down in the head. D. Relative mRNA expressions of RYa and sNPF in the heads of dsRYa+dssNPF - injected blood-fed and unfed females, as compared to that in uninjected females, analysed via qPCR."

      We have now edited the legends as suggested.

      Reviewer #2 (Recommendations for the authors):

      Separately from the clarifications and limitations listed above, the authors could strengthen their study and the conclusions drawn if they could rescue the behavioural phenotype observed following knockdown of sNPF and RYamide. This could be achieved by injection of either sNPF or RYa peptide independently or combined following knockdown to validate the role of these peptides in promoting blood-feeding in An. stephensi. Additionally, the apparent (but unclear) regionalized (or tissue-specific) knockdown of sNPF and RYamide transcripts could be visualized and verified by implementing HCR in situ hyb in knockdown animals (or immunohistochemistry using antibodies specific for these two neuropeptides). 

      In a follow up of this work, we are generating mutants and peptides for these candidates and are planning to conduct exactly the experiments the reviewer suggests.

      Reviewer #3 (Recommendations for the authors):

      The loss-of-function data suggest necessity but not sufficiency. Synthetic peptide injection in non-host-seeking (blood-fed mated or juvenile) mosquitoes would provide direct evidence for peptide-induced behavioral activation. The lack of these experiments weakens the central claim of the paper that these neuropeptides directly promote blood feeding.

      As noted above, we plan to synthesise the peptide to test rescue in a mutant background and sufficiency.  

      Some of the claims about knockdown efficiency and interpretation are conflicting; the authors dismiss Hairy and Prp as candidates due to 30-35% knockdown, yet base major conclusions on sNPF and RYamide knockdowns with comparable efficiencies (25-40%). This inconsistency should be addressed, or the justification for different thresholds should be clearly stated.

      We have not defined any specific knockdown efficacy thresholds in the manuscript, as these can vary considerably between genes, and in some cases, even modest reductions can be sufficient to produce detectable phenotypes. For example, knockdown efficiencies of even as low as about 25% - 40% gave us observable phenotypes for sNPF and RYa RNAi (Figure S9B-G).

      No such phenotypes were observed for Hairy (30%) or Prp (35%) knockdowns. Either these genes are not involved in blood feeding, or the knockdown was not sufficient for these specific genes to induce phenotypes. We cannot distinguish between these scenarios. 

      The observation that knockdown animals take smaller blood meals is interesting and could reflect a downstream effect of altered host-seeking or an independent physiological change. The relationship between meal size and host-seeking behavior should be clarified.

      We agree with the reviewer that the reduced meal size observed in sNPF and RYa knockdown animals could result from their inability to seek a host or from an independent effect on blood meal intake. Unfortunately, we did not measure host-seeking in these animals. We plan to distinguish between these possibilities using mutants in future work.

      Several figures are difficult to interpret due to cluttered labeling and poorly distinguishable color schemes. Simplifying these and improving contrast (especially for co-housed vs. virgin conditions) would enhance readability. 

      We regret that the reviewer found the figures difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition). Wherever mated females were used, we have now appended “(m)” to the annotations and consistently depicted these females with striped abdomens in all the schematics. We believe these changes will improve clarity and readability.

      The manuscript does not clearly justify the use of whole-brain RNA sequencing to identify peptides involved in metabolic or peripheral processes. Given that anticipatory feeding signals are often peripheral, the logic for brain transcriptomics should be explained.

      The reviewer is correct in pointing out that feeding signals could also emerge from peripheral tissues. Signals from these tissues – in response to both changing nutritional and reproductive states – are then integrated by the central brain to modulate feeding choices. For example, in Drosophila, increased protein intake is mediated by central brain circuitry including those in the SEZ and central complex (Munch et al., 2022; Liu et al., 2017; Goldschmidt et al., 202ti). In the context of mating, male-derived sex peptide further increases protein feeding by acting on a dedicated central brain circuitry (Walker et al., 2015). We therefore focused on the central brain for our studies.

      The proposed model suggests brain-derived peptides initiate feeding, while gut peptides provide feedback. However, gut-specific knockdowns had no effect, undermining this hypothesis. Conversely, the authors also suggest abdominal involvement based on RNAi results. These contradictions need to be resolved into a consistent model.

      We thank the reviewer for raising this point and recognise their concern. Our reasons for invoking an involvement of the gut were two-fold:

      (1) We find increased sNPF transcript expression in the entero-endocrine cells of the midgut in blood-hungry females, which returns to baseline after a blood-meal (Fig. 4L, M).

      (2) While the abdomen-only knockdowns did not affect blood feeding, every effective head knockdown that affected blood feeding also abolished abdominal transcript levels (Fig. S9C, F). (Achieving a head-only reduction proved impossible because (i) systemic dsRNA delivery inevitably reaches the abdomen and (ii) abdominal expression of both peptides is low, leaving little dynamic range for selective manipulation.) Consequently, we can only conclude the following: 1) that brain expression is required for the behaviour, 2) that we cannot exclude a contributory role for gut-derived sNPF. We have discussed this in lines 364-371.

      The identification of candidate receptors is promising, but the manuscript would be significantly strengthened by testing whether receptor knockdowns phenocopy peptide knockdowns. Without this, it is difficult to conclude that the identified receptors mediate the behavioral effects.

      We agree that functional validation of the receptors would strengthen the evidence for sNPF and RYa-mediated control of blood feeding in An. stephensi. We selected these receptors based on sequence homology. A possibility remains that sNPF neuropeptides activate more than one receptor, each modulating a distinct circuit, as shown in the case of Drosophila Tachykinin (https://pmc.ncbi.nlm.nih.gov/articles/PMC10184743/). This will mean a systematic characterisation and knockdown of each of them to confirm their role. We are planning these experiments in the future.  

      The authors compared the percentage changes in sugar-fed and blood-fed animals under sugar-sated or sugar-starved conditions. Figure 1F should reflect what was discussed in the results.

      Perhaps this concern stems from our representation of the data in figure 1F? We have now edited the x-axis and revised its label from “choice of food” to “choice made” to better reflect what food the mosquitoes chose to take.

      For clarity, we have now also plotted the same data as stacked graphs at the bottom of Fig. 1F, which clearly shows the proportion of mosquitoes fed on each particular choice. We avoid the stacked graph as the sole representation of this data because it does not capture the variability in the data.

      Minor issues:

      (1) The authors used mosquitoes with belly stripes to indicate mated females. To be consistent, the post-oviposition females should also have belly stripes.

      We thank the reviewer for pointing this out. We have now edited all the figures as suggested.

      (2) In the first paragraph on the right column of the second page, the authors state, "Since females took blood-meals regardless of their prior sugar-feeding status and only sugar-feeding was selectively suppressed by prior sugar access." Just because the well-fed animals ate less than the starved animals does not mean their feeding behavior was suppressed.

      Perhaps there has been a misunderstanding in the experimental setup of figure 1F, probably stemming from our data representation. The experiment is a choice assay in which sugar-starved or sugar-sated females, co-housed with males, were provided simultaneous access to both blood and sugar, and were assessed for the choice made (indicated on the x-axis): both blood and sugar, blood only, sugar only, or neither. We scored females only for the presence or absence of each meal type (blood or sugar) and did not quantify the amount consumed.
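
      The scoring scheme described above can be sketched as follows. This is an illustrative tabulation only: the function name, the category labels as strings, and the five example females are hypothetical and are not data from the study.

```python
from collections import Counter

def classify_choice(took_blood: bool, took_sugar: bool) -> str:
    """Map presence/absence of each meal type onto one of four choices."""
    if took_blood and took_sugar:
        return "both blood and sugar"
    if took_blood:
        return "blood only"
    if took_sugar:
        return "sugar only"
    return "neither"

# Hypothetical scores, one (took_blood, took_sugar) pair per female.
females = [(True, True), (True, False), (True, False), (False, True), (False, False)]

# Count females per choice, then convert the counts to proportions.
counts = Counter(classify_choice(blood, sugar) for blood, sugar in females)
proportions = {choice: n / len(females) for choice, n in counts.items()}

for choice, fraction in proportions.items():
    print(f"{choice}: {fraction:.0%}")
```

      Only the presence or absence of each meal enters the score, so the proportions say nothing about the amount consumed.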

      (3) The figure legend for Figure 1A and the naming convention for different experimental groups are difficult to follow. A simplified or consistently abbreviated scheme would help readers navigate the figures and text.

      We regret that the reviewer found the figure difficult to follow. We have now revised our annotations throughout the manuscript for enhanced readability. For example, “D1<sup>B</sup>” is now “D1<sup>PBM</sup>” (post-bloodmeal) and “D1<sup>O</sup>” is now “D1<sup>PO</sup>” (post-oviposition).

      (4) In the last paragraph of the Y-maze olfactory assay for host-seeking behaviour in An. stephensi in Methods, the authors state, "When testing blood-fed females, aged-matched sugar-fed females (blood-hungry) were included as positive controls where ever possible, with satisfactory results." The authors should explicitly describe what the criteria are for "satisfactory results".

      We apologise for the lack of clarity. We have now edited the statement to read:

      “When testing blood-fed females, age-matched sugar-fed females (blood-hungry) were included wherever possible as positive controls. These females consistently showed attraction to host cues, as expected.” See lines 786-790.

      (5) In the first paragraph of the dsRNA-mediated gene knockdown section in Methods, dsRNA against GFP is used as a negative control for the injection itself, but not for the potential off-target effect.

      We agree with the reviewer that dsGFP injections act as controls only for injection-related behavioural changes, and not for off-target effects of RNAi. We have now corrected the statement. See lines 919-920.

      To control for off-target effects, we could have designed multiple dsRNAs targeting different parts of a given gene. We regret not including these controls for potential off-target effects of dsRNAs injected. 

      (6) References numbers 48, 89, and 90 are not complete citations.

      We thank the reviewer for spotting these. We have now corrected these citations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      First, we thank the reviewers for the valuable and constructive reviews. Thanks to these, we believe the article has been considerably improved. We have organized our response to address points that are relevant to both reviewers first, after which we address the unique concerns of each individual reviewer separately. We briefly paraphrase each concern and provide comments for clarification, outlining the precise changes that we have made to the text.

      Common Concerns (R1 & R2):

      Can you clarify how NREM and REM sleep relate to the oneirogen hypothesis?

      Within the submission draft we tried to stay agnostic as to whether mechanistically similar replay events occur during NREM or REM sleep; however, upon a more thorough literature review, we think that there is moderately greater evidence in favor of Wake-Sleep-type replay occurring during REM sleep which is related to classical psychedelic drug mechanisms of action.

      First, we should clarify that replay has been observed during both REM and NREM sleep, and dreams have been documented during both sleep stages, though the characteristics of dreams differ across stages, with NREM dreams being more closely tied to recent episodic experience and REM dreams being more bizarre/hallucinatory (see Stickgold et al., 2001 for a review). Replay during sleep has been studied most thoroughly during NREM sharp-wave ripple events, in which significant cortical-hippocampal coupling has been observed (Ji & Wilson, 2007). However, it is critical to note that the quantification methods used to identify replay events in the hippocampal literature usually focus on identifying what we term ‘episodic replay,’ which involves a near-identical recapitulation of neural trajectories that were recently experienced during waking experimental recordings (Tingley & Peyrache, 2020). In contrast, our model focuses on ‘generative replay,’ where one expects only a statistically similar reproduction of neural activity, without any particular bias towards recent or experimentally controlled experience. This latter form of replay may look closer to the ‘reactivation’ observed in cortex by many studies (e.g. Nguyen et al., 2024), where correlation structures of neural activity similar to those observed during stimulus-driven experience are recapitulated. Under experimental conditions in which an animal is experiencing highly stereotyped activity repeatedly, over extended periods of time, these two forms of replay may be difficult to dissociate.

      Interestingly, though NREM replay has been shown to couple hippocampal and cortical activity, a similar study in waking animals administered psychedelics found hippocampal replay without any obvious coupling to cortical activity (Domenico et al., 2021). This could be because the coupling was not strong enough to produce full trajectories in the cortex (psychedelic administration did not increase ‘alpha’ enough), and that a causal manipulation of apical/basal influence in the cortex may be necessary to observe the increased coupling. Alternatively, as Reviewer 1 noted, it may be that psychedelics induce a form of hippocampus-decoupled replay, as one would expect from the REM stage of a recently proposed complementary learning systems model (Singh et al., 2022). 

      Evidence in favor of a similarity between the mechanism of action of classical psychedelics and the mechanism of action of memory consolidation/learning during REM sleep is actually quite strong. In particular, studies have shown that REM sleep increases the activity of soma-targeting parvalbumin (PV) interneurons and decreases the activity of apical dendrite-targeting somatostatin (SOM) interneurons (Niethard et al., 2021), that this shift in balance is controlled by higher-order thalamic nuclei, and that this shift in balance is critical for synaptic consolidation of both monocular deprivation effects in early visual cortex (Zhou et al., 2020) and for the consolidation of auditory fear conditioning in the dorsal prefrontal cortex (Aime et al., 2022). These last studies were not discussed in our previous text–we have added them, in addition to a more nuanced description of the evidence connecting our model to NREM and REM replay. 

      Relevant modifications: Page 4, 1st paragraph; Page 11, 1st paragraph.

      Can you explain how synaptic plasticity induced by psychedelics within your model relates to learning at a behavioral level?

      While the Wake-Sleep algorithm is a useful model for unsupervised statistical learning, it is not a model of reward or fear-based conditioning, which likely occur via different mechanisms in the brain (e.g. dopamine-dependent reinforcement learning or serotonin-dependent emotional learning). The Wake-Sleep algorithm is a ‘normative plasticity algorithm,’ that connects synaptic plasticity to the formation of structured neural representations, but it is not the case that all synaptic plasticity induced by psychedelic administration within our model should induce beneficial learning effects. According to the Wake-Sleep algorithm, plasticity at apical synapses is enhanced during the Wake phase, and plasticity at basal synapses is enhanced during the Sleep phase; under the oneirogen hypothesis, hallucinatory conditions (increased ‘alpha’) cause an increase in plasticity at both apical and basal sites. Because neural activity is in a fundamentally aberrant state when ‘alpha’ is increased, there are no theoretical guarantees that plasticity will improve performance on any objective: psychedelic-induced plasticity within our model could perhaps better be thought of as ‘noise’ that may have a positive or negative effect depending on the context.

      In particular, such ‘noise’ may be beneficial for individuals or networks whose synapses have become locked in a suboptimal local minimum. The addition of large amounts of random plasticity could allow a system to extricate itself from such local minima over subsequent learning (or with careful selection of stimuli during psychedelic experience), similar to simulated annealing optimization approaches. If our model were fully validated, this view of psychedelic-induced plasticity as ‘noise’ could have relevance for efforts to alleviate the adverse effects of PTSD, early life trauma, or sensory deprivation; it may also provide a cautionary note against repeated use of psychedelic drugs within a short time frame, as the plasticity changes induced by psychedelic administration under our model are not guaranteed to be good or useful in and of themselves without subsequent re-learning and compensation.
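
      The local-minimum intuition can be illustrated with a toy example. The sketch below is not the network model from the paper: the one-dimensional double-well loss, the learning rate, and the perturbation sizes are all hypothetical, and fixed perturbations stand in for random plasticity so that the outcome is reproducible.

```python
# Toy illustration: a perturbation followed by re-learning can move a system
# out of a shallow local minimum, but only if the perturbation is suitable.
# The loss below is a hypothetical double well, not the model from the paper.

def loss(w):
    return w**4 - 2 * w**2 + 0.5 * w

def grad(w):
    return 4 * w**3 - 4 * w + 0.5

def descend(w, lr=0.01, steps=2000):
    """Plain gradient descent, standing in for ordinary learning."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Ordinary learning from this start settles in the shallow well (w near 0.93).
stuck = descend(1.5)

# A large perturbation, followed by re-learning, reaches the deeper well.
escaped = descend(stuck - 1.5)

# A small perturbation is undone by re-learning: the system returns to the
# shallow well and nothing is gained.
returned = descend(stuck - 0.5)

print(f"stuck:    w={stuck:.2f}, loss={loss(stuck):.2f}")
print(f"escaped:  w={escaped:.2f}, loss={loss(escaped):.2f}")
print(f"returned: w={returned:.2f}, loss={loss(returned):.2f}")
```

      In this toy setting the perturbation lowers the loss only when it happens to carry the parameter past the barrier and is followed by further learning, which mirrors the point that such plasticity is not guaranteed to be useful on its own.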

      We should also note that we have deliberately avoided connecting the oneirogen hypothesis model to fear extinction experimental results that have been observed through recordings of the hippocampus or the amygdala (Bombardi & Giovanni, 2013; Jiang et al., 2009; Kelly et al., 2024; Tiwari et al., 2024). Both regions receive extensive innervation directly from serotonergic synapses originating in the dorsal raphe nucleus, which have been shown to play an important role in emotional learning (Lesch & Waider, 2012); because classical psychedelics may play a more direct role in modulating this serotonergic innervation, it is possible that fear conditioning results (in addition to the anxiolytic effects of psychedelics) cannot be attributed to a shift in balance between apical and basal synapses induced by psychedelic administration. We have provided a more detailed review of these results in the text, as well as more clarity regarding their relation to our model.

      Relevant modifications: Page 9, final paragraph; Page 12, final paragraph.

      Reviewer 1 Concerns:

      Is it reasonable to assign a scalar parameter ‘alpha’ to the effects of classical psychedelics? And is your proposed mechanism of action unique to classical psychedelics? E.g. Could this idea also apply to kappa opioid agonists, ketamine, or the neural mechanisms of hallucination disorders?

      We have clarified that within our model ‘alpha’ is a parameter that reflects the balance between apical and basal synapses in determining the activity of neurons in the network. For the sake of simplicity we used a single ‘alpha’ parameter, but realistically, each neuron would have its own ‘alpha’ parameter, and different layers or individual neurons could be affected differentially by the administration of any particular drug; therefore, our scalar ‘alpha’ value can be thought of as a mean parameter for all neurons, disregarding heterogeneity across individual neurons.
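
      As a hedged illustration of this simplification, the sketch below treats ‘alpha’ as the weight of a convex combination of basal and apical drive. The convex-combination form, the function name, and all numerical values are assumptions made for illustration and are not taken from the model's implementation.

```python
# Illustrative sketch: 'alpha' as the balance between basal and apical drive.
# The convex-combination form and all values are assumptions for illustration.

def mix(basal, apical, alpha):
    """Blend basal and apical drive per neuron; alpha is the apical weight."""
    return [(1 - a) * b + a * t for b, t, a in zip(basal, apical, alpha)]

basal = [1.0, 0.0, 0.5]    # hypothetical basal (stimulus-driven) input
apical = [0.0, 1.0, 0.5]   # hypothetical apical (top-down) input

# Heterogeneous case: each neuron has its own alpha.
per_neuron_alpha = [0.2, 0.4, 0.6]
heterogeneous = mix(basal, apical, per_neuron_alpha)

# Scalar simplification: every neuron uses the population mean of alpha.
mean_alpha = sum(per_neuron_alpha) / len(per_neuron_alpha)
homogeneous = mix(basal, apical, [mean_alpha] * len(basal))

print([round(x, 3) for x in heterogeneous])
print([round(x, 3) for x in homogeneous])
```

      Under these assumptions, alpha = 0 reproduces purely basal drive and alpha = 1 purely apical drive; the scalar version preserves the average balance while discarding neuron-to-neuron differences.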

      There are many different mechanisms that could theoretically affect this ‘alpha’ parameter, including: 5-HT2a receptor agonism, kappa opioid receptor binding, ketamine administration, or possibly the effects of genetic mutations underlying the pathophysiology of complex developmental hallucination disorders. We focused exclusively on 5-HT2a receptor agonism for this study because the mechanism is comparatively simple and extensively characterized, but similar mechanisms may well be responsible for the hallucinatory symptoms of a variety of drugs and disorders.

      Relevant modifications: Page 4, first paragraph; Page 13, first paragraph.

      Can you clarify the role of 5-HT2a receptor expression on interneurons within your model?

      While we mostly focused on the effects of 5-HT2a receptors on the apical dendrites of pyramidal neurons, these receptors are also expressed on soma-targeting parvalbumin (PV) interneurons. This expression on PV interneurons is consistent with our proposed psychedelic mechanism of action, because it could lead to a coordinated decrease in the influence of somatic and proximal dendritic inputs while increasing the influence of apical dendritic inputs. We have elaborated on this point, and moved the discussion earlier in the text.

      Relevant modifications: Page 1, 1st paragraph; Page 4, 2nd paragraph.

      Discussions of indigenous use of psychedelics over millennia may amount to over-romanticization.

      We ultimately decided to remove these discussions from the main text, as they had little bearing on the content of our work. Within the Ethics Declarations section we softened our claims from “millennia” to “centuries,” as indigenous psychedelic use over this latter period of time is well-substantiated.

      Relevant modifications: removed from introduction; modified Ethics Declarations

      You isolate the 5-HT2a agonism as the mechanism of action underlying ‘alpha’ in your model, but there exist 5-HT2a agonists that do not have hallucinatory effects (e.g. lisuride). How do you explain this?

      Lisuride has much-reduced hallucinatory effects compared to other psychedelic drugs at clinical doses (though it does indeed induce hallucinations at high doses; Marona-Lewicka et al., 2002), and we should note that serotonin (5-HT) itself is pervasive in the cortex without inducing hallucinatory effects during natural function. Similarly, MDMA is a partial agonist for 5-HT2a receptors, but it has much-reduced perceptual hallucination effects relative to classical psychedelics (Green et al., 2003) in addition to many other effects not induced by classical psychedelics.

      Therefore, while we argue that 5-HT2a agonism induces an increase in influence of apical dendritic compartments and a decrease in influence of basal/somatic compartments, and that this change induces hallucinations, we also note that there are many other factors that control whether or not hallucinations are ultimately produced, so that not all 5-HT2a agonists are hallucinogenic. There are two possible additional factors that could contribute to this phenomenon: 5-HT receptor binding affinity and cellular membrane permeability.

      Importantly, many 5-HT2a receptor agonists are also 5-HT1a receptor agonists (e.g. serotonin itself and lisuride), while MDMA has also been shown to increase serotonin, norepinephrine, and dopamine release (Green et al., 2003). While 5-HT2a receptor agonism has been shown to reduce sensory stimulus responses (Michaiel et al., 2019), 5-HT1a receptor agonism inhibits spontaneous cortical activity (Azimi et al., 2020); thus one might expect the net effect of administering serotonin or a nonselective 5-HT receptor agonist to be widespread inhibition of a circuit, as has been observed in visual cortex (Azimi et al., 2020). Therefore, selective 5-HT2a agonism is critical for the induction of hallucinations according to our model, though any intervention that jointly excites pyramidal neurons’ apical dendrites and inhibits their basal/somatic compartments across a broad enough area of cortex would be predicted to have a similar effect. Lisuride has a much higher binding affinity for 5-HT1a receptors than, for instance, LSD (Marona-Lewicka et al., 2002).

      Secondly, it has recently been shown that both the head-twitch effect (a coarse behavioral readout of hallucinations in animals) and the plasticity effects of psychedelics are abolished when administering 5-HT2a agonists that are impermeable to the cellular membrane because of high polarity, and that these effects can be rescued by temporarily rendering the cellular membrane permeable (Vargas et al., 2023). This suggests that the critical hallucinatory effects of psychedelics (apical excitation according to our model) may be mediated by intracellular 5-HT2a receptors. Notably, serotonin itself is not membrane permeable in the cortex.

      Therefore, either of these two properties could play a role in whether a given 5-HT2a agonist induces hallucinatory effects. We have provided an extended discussion of these nuances in our revision.

      Relevant modifications: Page 1, paragraph 2.

      Your model proposes that an increase in top-down influence on neural activity underlies the hallucinatory effects of psychedelics. How do you explain experimental results that show increases in bottom-up functional connectivity (either from early sensory areas or the thalamus)?

      Firstly, we should note that our proposed increase in top-down influence is a causal, biophysical property, not necessarily a statistical/correlative one. As such, we will stress that the best way to test our model is via direct intervention in cortical microcircuitry, as opposed to correlative approaches taken by most fMRI studies, which have shown mixed results with regard to this particular question. Correlative approaches can be misleading due to dense recurrent coupling in the system, and due to the coarse temporal and spatial resolution provided by noninvasive recording technologies (changes in statistical/functional connectivity do not necessarily correspond to changes in causal/mechanistic connectivity, i.e. correlation does not imply causation).

      Two experimental results that appear to contradict our hypothesis deserve special consideration. The first shows an increase in directional thalamic influence on the distributed cortical networks after psychedelic administration (Preller et al., 2018). To explain this, we note that this study does not distinguish between lower-order sensory thalamic nuclei (e.g. the lateral and medial geniculate nuclei receiving visual and auditory stimuli respectively) and the higher-order thalamic nuclei that participate in thalamocortical connectivity loops (Whyte et al., 2024). Subsequent more fine-grained studies have noted an increase in influence of higher-order thalamic nuclei on the cortex (Pizzi et al., 2023; Gaddis et al., 2022), and in fact extensive causal intervention research has shown that classical psychedelics (and 5-HT2a agonism) decrease the influence of incoming sensory stimuli on the activity of early sensory cortical areas, indicating decoupling from the sensory thalamus (Evarts et al., 1955; Azimi et al., 2020; Michaiel et al., 2019). The increased influence of higher-order thalamic nuclei is consistent with both the cortico-striatal-thalamo-cortical (CSTC) model of psychedelic action and the oneirogen hypothesis, since higher-order thalamic inputs modulate the apical dendrites of pyramidal neurons in cortex (Whyte et al., 2024).

      The second experimental result notes that DMT induces traveling waves during resting state activity that propagate from early visual cortex to deeper cortical layers (Alamia et al., 2020). There are several possibilities that could explain this phenomenon: 1) it could be due to the aforementioned difficulties associated with directed functional connectivity analyses, 2) it could be due to a possible high binding affinity for DMT in the visual cortex relative to other brain areas, or 3) it could be due to increases in apical influence on activity caused by local recurrent connectivity within the visual cortex which, in the absence of sensory input, could lead to propagation of neural activity from the visual cortex to the rest of the brain. This last possibility is closest to the model proposed by Ermentrout & Cowan (1979), which we believe would be best explained within our framework by a topographically connected recurrent network architecture trained on video data; this is a potentially fruitful direction for future research.

      Relevant modifications: Page 9, paragraph 1; Page 10, final paragraph; Page 11, final paragraph.

      Shouldn’t the hallucinations generated by your model look more ‘psychedelic,’ like those produced by the DeepDream algorithm?

      We believe that the differences in hallucination visualization quality between our Wake-Sleep-trained models and DeepDream are mostly due to differences in the scale and power of the models used across these two studies. We are confident that with more resources (and potentially theoretical innovations to improve the Wake-Sleep algorithm’s performance) the produced hallucination visualizations could become more realistic.

      We note that more powerful generative models trained with backpropagation are able to produce surreal images of comparable quality (Rezende et al., 2014; Goodfellow et al., 2020; Vahdat & Kautz, 2020), though these have not yet been used as a model of psychedelic hallucinations. However, the DeepDream model operates on top of large pretrained image processing models, and does not provide a biologically mechanistic/testable interpretation of its hallucination effects. When training smaller models with a local synaptic plasticity rule (as opposed to backpropagation), the hallucination effects are less visually striking due to the reduced quality of our trained generative model, though they are still strongly tied to the statistics of sensory inputs, as quantified by our correlation similarity metric (Fig. 5b).

      To demonstrate that our proposed hallucination mechanism is capable of producing more complex hallucinations in larger, more powerful models, we employed our same hallucination generation mechanism in a pretrained Very Deep Variational Autoencoder (VDVAE) (Child et al., 2021), which is a hierarchical variational autoencoder with a nearly identical structure compared to our Wake-Sleep-trained networks, with both a bottom-up inference pathway and a top-down generative pathway that maps cleanly onto our multicompartmental neuron model. VDVAEs are trained on the same objective function as our Wake-Sleep-trained networks, but using the backpropagation algorithm. The VDVAE models were able to generate much more complex hallucinations (emergence of complex geometric patterns, smooth deformations of objects and faces), whose complexity arguably exceeds those produced by the DeepDream algorithm. Therefore while the VDVAEs are less biologically realistic (they do not learn via local synaptic plasticity), they function as a valuable high-level model of hallucination generation that complements our Wake-Sleep-trained approach. As further validation, we were also able to replicate our key results and testable predictions with these models.

      Relevant modifications: Results section “Modeling hallucinations in large-scale pretrained networks”; Figure 6, S7, S8; Page 12, paragraph 3; Methods section “Generating hallucinations in hierarchical variational autoencoders.”

      Your model assumes domination by entirely bottom-up activity during the ‘wake’ phase, and domination entirely by top-down activity during ‘sleep,’ despite experimental evidence indicating that a mixture of top-down and bottom-up inputs influence neural activity during both stages in the brain. How do you explain this?

      Our use of the Wake-Sleep algorithm, in which top-down inputs (Sleep) or bottom-up inputs (Wake) dominate network activity, is an oversimplification made within our model for computational and theoretical reasons. Models that receive a mixture of top-down and bottom-up inputs during ‘Wake’ activity do exist (in particular the closely related Boltzmann machine (Ackley et al., 1985)), but these models are considerably more computationally costly to train due to a need to run extensive recurrent network relaxation dynamics for each input stimulus. Further, these models do not generalize as cleanly to processing temporal inputs. For this reason, we focused on the Wake-Sleep algorithm, at the cost of some biological realism, though we note that our model should certainly be extended to support mixed apical-basal waking regimes. We have added a discussion of this in our ‘Model Limitations’ section.
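
      For readers less familiar with the algorithm, the strict alternation described above can be sketched with a toy single-layer network. This is a generic textbook-style Wake-Sleep update, not the implementation used in the study: the layer sizes, learning rate, fixed prior, omission of bias terms, and all names (`wake_step`, `sleep_step`, `R`, `G`) are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    """Draw binary unit states from Bernoulli probabilities p."""
    return (rng.random(p.shape) < p).astype(float)

def wake_step(x, R, G, lr=0.05):
    """Wake: bottom-up input alone sets hidden activity;
    the top-down weights G learn to predict the input (updated in place)."""
    h = sample(sigmoid(R @ x))
    G += lr * np.outer(x - sigmoid(G @ h), h)

def sleep_step(R, G, prior, lr=0.05):
    """Sleep: top-down sampling alone generates a 'dreamed' input;
    the bottom-up weights R learn to infer its cause (updated in place)."""
    h = sample(prior)
    x_dream = sample(sigmoid(G @ h))
    R += lr * np.outer(h - sigmoid(R @ x_dream), x_dream)

n_vis, n_hid = 6, 3                 # hypothetical layer sizes
R = np.zeros((n_hid, n_vis))        # recognition (bottom-up) weights
G = np.zeros((n_vis, n_hid))        # generative (top-down) weights
prior = np.full(n_hid, 0.5)         # fixed prior over hidden units (assumption)

data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]], dtype=float)
for _ in range(200):
    for x in data:
        wake_step(x, R, G)
        sleep_step(R, G, prior)
```

      Each phase uses only one direction of input to set activity, which is what makes training cheap compared with models that must relax recurrent dynamics for every stimulus.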

      Relevant modifications: Page 12, paragraph 4.

      Your model proposes that 5-HT2a agonism enhances glutamatergic transmission, but this is not true in the hippocampus, which shows decreases in glutamate after psychedelic administration.

      We should note that our model suggests only compartment-specific increases in glutamatergic transmission; as such, our model does not predict any particular directionality for measures of glutamatergic transmission that include signaling at both apical and basal compartments in aggregate, as was measured in the provided study (Mason et al., 2020).

      You claim that your model is consistent with the Entropic Brain theory, but you report increases in variance, not entropy. In fact, it has been shown that variance decreases while entropy increases under psychedelic administration. How do you explain this discrepancy?

      Unfortunately, ‘entropy’ and ‘variance’ are heavily overloaded terms in the noninvasive imaging literature, and the particularities of the method employed can exert a strong influence on the reported effects. The reduction in variance reported by (Carhart-Harris et al., 2016) is a very particular measure: they are reporting the variance of resting state synchronous activity, averaged across a functional subnetwork that spans many voxels; as such, the reduction in variance in this case is a reduction in broad, synchronous activity. We do not have any resting state synchronous activity in our network due to the simplified nature of our model (particularly an absence of recurrent temporal dynamics), so we see no reduction in variance in our model due to these effects.

      Other studies estimate ‘entropy’ or network state disorder via three different methods that we have been able to identify. 1) (Carhart-Harris et al., 2014) uses a different measure of variance: in this case, they subtract out synchronous activity within functional subnetworks, and calculate variability across units in the network. This measure reports increases in variance (Fig. 6), and is the closest measure to the one we employ in this study. 2) (Lebedev et al., 2016) uses sample entropy, which is a measure of temporal sequence predictability. It is specifically designed to disregard highly predictable signals, and so one might imagine that it is a measure that is robust to shared synchronous activity (e.g. resting state oscillations). 3) (Mediano et al., 2024) uses Lempel-Ziv complexity, which is, like sample entropy, a measure of sequence diversity; in this case the signal is binarized before calculation, which makes this method considerably different from ours. All three of the preceding methods report increases in sequence diversity, in agreement with our quantification method. Our best explanation for the variance reduction reported by (Carhart-Harris et al., 2016) is therefore a reduction in low-rank synchronous activity in subnetworks during resting state.
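
      To make the third measure concrete, the following is a minimal sketch of Lempel-Ziv (1976) complexity applied to a binarized signal. It is not the pipeline of any cited study: thresholding at the signal mean, the function names, and the absence of any normalization are assumptions made for illustration.

```python
import numpy as np

def binarize(signal):
    """Threshold a 1-D signal at its mean (assumed rule) and return a '0'/'1' string."""
    x = np.asarray(signal, dtype=float)
    threshold = x.mean()
    return "".join("1" if v > threshold else "0" for v in x)

def lempel_ziv_complexity(bits):
    """Count the phrases in the Lempel-Ziv (1976) parsing of a symbol string.

    Each new phrase is the shortest substring starting at position i
    that cannot be copied from the text preceding its last symbol.
    """
    i, n, count = 0, len(bits), 0
    while i < n:
        length = 1
        # Extend the phrase while it can still be found earlier in the string.
        while i + length <= n and bits[i:i + length] in bits[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

regular = lempel_ziv_complexity("0000000000000000")   # highly predictable
mixed = lempel_ziv_complexity("0001101001000101")     # more diverse
```

      A constant string parses into very few phrases while a more varied string parses into more, which is the sense in which the measure tracks sequence diversity rather than amplitude variance.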

      As for whether the entropy increase is meaningful: we share Reviewer 1’s concern that increases in entropy could simply be due to a higher degree of cognitive engagement during resting state recordings, due to the presence of sensory hallucinations or due to an inability to fall asleep. This could explain why entropy increases are much more minimal relative to non-hallucinating conditions during audiovisual task performance (Siegel et al., 2024; Mediano et al., 2024). However, we can say that our model is consistent with the Entropic Brain Theory without including any form of ‘cognitive processing’: we observe increases in variability during resting state in our model, but we observe highly similar distributions of activity when averaging over a wide variety of sensory stimulus presentations (Fig. 5b-c). This is because variability in our model is not due to unstructured noise: it corresponds to an exploration of network states that would ordinarily be visited by some stimulus. Therefore, when averaging across a wide variety of stimuli, the distribution of network states under hallucinating or non-hallucinating conditions should be highly similar.

      One final point of clarification: here we are distinguishing Entropic Brain Theory from the REBUS model–the oneirogen hypothesis is consistent with the increase in entropy observed experimentally, but in our model this entropy increase is not due to increased influence of bottom-up inputs (it is due instead to an increase in top-down influence). Therefore, one could view the oneirogen hypothesis as consistent with EBT, but inconsistent with REBUS.

      Relevant modifications: Page 10, paragraph 1.

      You relate your plasticity rule to behavioral-timescale plasticity (BTSP) in the hippocampus, but plasticity has been shown to be reduced in the hippocampus after psychedelic administration. Could you elaborate on this connection?

      When we were establishing a connection between our ‘Wake-Sleep’ plasticity rule and BTSP learning, the intended connection was exclusively to the mathematical form of the plasticity rule, in which activity in the apical dendrites of pyramidal neurons functions as an instructive signal for plasticity in basal synapses (and vice versa): we will clarify this in the text. Similarly, we point out that such a plasticity rule tends to result in correlated tuning between apical and basal dendritic compartments, which has been observed in hippocampus and cortex: this is intended as a sanity check of our mapping of the Wake-Sleep algorithm to cortical microcircuitry, and has limited further bearing on the effects of psychedelics specifically.

      Reduction in plasticity in the hippocampus after psychedelic administration could be due to a complementary learning systems-type model, in which the hippocampus becomes partly decoupled from the cortex during REM sleep (Singh et al., 2022); were this to be the case, it would not be incompatible with our model, which is mostly focused on the cortex. Notably, potentiating 5-HT2a receptors in the ventral hippocampus does not induce the head-twitch response, though it does produce anxiolytic effects (Tiwari et al., 2024), indicating that the hallucinatory and anxiolytic effects of classical psychedelics may be partly decoupled.

      Reviewer 2 Concerns:

      Could you provide visualizations of the ‘ripple’ phenomenon that you’re referring to?

      In our revised submission, ‘ripple’ phenomena are now visible in two places: Fig 2c-d, and Fig 6 (rows 2 and 3). Because the VDVAE models used to generate Figure 6 produce higher quality generated images, the ripples appearing in these plots are likely more prototypical, but it is not easy to evaluate the quality of these visualizations relative to subjective hallucination phenomena.

      Could you provide a more nuanced description of alternative roles for top-down feedback, beyond being used exclusively for learning as depicted in your model?

      For the sake of simplicity, we only treat top-down inputs in our model as a source of an instructive teaching signal, the originator of generative replay events during the Sleep phase, and as the mechanism of hallucination generation. However, as discussed in a response to a previous question, in the cortex pyramidal neurons receive and respond to a mixture of top-down and bottom-up processing.

      There are a variety of theories for what role top-down inputs could play in determining network activity. To name several, top-down input could function as: 1) a denoising/pattern completion signal (Kadkhodaie & Simoncelli, 2021), 2) a feedback control signal (Podlaski & Machens, 2020), 3) an attention signal (Lindsay, 2020), 4) ordinary inputs for dynamic recurrent processing that play no specialized role distinct from bottom-up or lateral inputs except to provide inputs from higher-order association areas or other sensory modalities (Kar et al., 2019; Tugsbayar et al., 2025). Though our model does not include these features, they are perfectly consistent with our approach.

      In particular, denoising/pattern completion signals in the predictive coding framework (closely related to the Wake-Sleep algorithm) also play a role as an instructive learning signal (Salvatori et al., 2021); and top-down control signals can play a similar role in some models (Gilra & Gerstner, 2017; Meulemans et al., 2021). Thus, options 1 and 2 are heavily overlapping with our approach, and are a natural consequence of many biologically plausible learning algorithms that minimize a variational free energy loss (Rao & Ballard, 1997; Ackley et al., 1985). Similarly, top-down attentional signals can exist alongside top-down learning signals, and some models have argued that such signals can be heavily overlapping or mutually interchangeable (Roelfsema & van Ooyen, 2005). Lastly, generic recurrent connectivity (from any source) can be incorporated into the Wake-Sleep algorithm (Dayan & Hinton, 1996), though we avoided doing this in the present study due to an absence of empirical architecture exploration in the literature and the computational complexity associated with training on time series data.

      To conclude, there are a variety of alternative functions proposed for top-down inputs onto pyramidal neurons in the cortex, and we view these additional features as mutually compatible with our approach; for simplicity we did not include them in our Wake-Sleep-trained model, but we believe that these features are unlikely to interfere with our testable predictions or empirical results. In fact, the pretrained VDVAE models that we worked with do include top-down influence during the Wake-stage inference process, and these models recapitulated our key results and testable predictions (Fig. S8).

      Relevant modifications: Fig. S8; Page 12, paragraph 4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the editor and reviewers for their constructive questions, valuable feedback, and for approving our manuscript. We truly appreciate the opportunity to improve our work based on their insightful comments. Before addressing the editor’s and each referee’s remarks individually, we provide below a point-by-point response summarizing the revisions made.

      Duplication of control groups across experiments

      We appreciate the reviewers’ concern regarding the potential duplication of control groups. In the revised manuscript, we have explicitly clarified how control groups were allocated across experiments. These details are now clearly indicated in the Materials and Methods section to avoid any ambiguity (Page 15, Lines 453-455): “Furthermore, knockout animals and those treated with pharmacological inhibitors or neutralizing antibodies shared the same control groups (chow and HFCD), as required by the animal ethics committee.”

      Validation of the MASLD model

      To strengthen the metabolic characterization of our MASLD model, we have now included additional parameters, including liver weight, Picrosirius staining and blood glucose measurements. These data are presented as new graphs in the revised manuscript and support the metabolic relevance of the HFCD diet model (Supplementary Figure S1). The corresponding description has been added to the Results section (Page 5, Lines 116-117) as follows: “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C)”

      Assessment of liver injury in RagKO and anti-NK1.1 mice

      We fully agree that assessment of liver injury is essential for these models. For mice treated with anti-NK1.1, ALT levels are shown in Figure 4G, confirming increased liver injury after treatment. Regarding Rag⁻/⁻ mice, the animals exhibit exacerbation of liver injury when fed a HFCD diet and challenged with LPS (Page 7, Lines 183–184). The corresponding description has been added to the Results section (Page 7, Lines 175-176) as follows: “Interestingly, Rag1-deficient animals under the HFCD remained susceptible to the LPS challenge (Fig. 4C) with exacerbation of liver injury (Fig. 4D)”

      Discussion of limitations

      We have expanded the Discussion section to provide a more comprehensive and balanced perspective on the limitations of our model and experimental approach (Pages 13-14, Lines 401–414): “Our study presents several limitations that should be acknowledged and discussed. First, we cannot entirely rule out the possibility that our mice deficient in pro-inflammatory components exhibit reduced responsiveness to LPS. However, our ex vivo analyses using splenocytes from these animals revealed a preserved cytokine production following LPS stimulation. These results suggest that the in vivo differences observed are primarily driven by the MAFLD condition rather than by intrinsic defects in LPS sensitivity. Second, the absence of publicly available single-cell RNA-seq datasets from MAFLD subjects under endotoxemic or septic conditions limited our ability to perform direct translational comparisons. To overcome this, we analyzed existing MAFLD patients and experimental MAFLD datasets, which consistently demonstrated upregulation of IFN-γ and TNF-α inflammatory pathways in MAFLD. In line with these findings, our murine model revealed TNF-α⁺ myeloid and IFN-γ⁺ NK cell populations, thereby reinforcing the validity and translational relevance of our results.” This revision highlights the constraints of the MASLD model, the inherent variability among in vivo experiments, and the interpretative limitations related to immunodeficient mouse strains.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In Figure 4 the authors are showing the number of IFN+ positive CD4, CD8, and NK 1.1+ cells. Could they show from total IFNg production, how much it goes specifically on NK cells and how much on other cell populations since NK1.1 is NK but also NKT and gamma delta T cell marker? Also, in Figure 2E the authors see a substantial increase in IFNg signal in T cells.

      While we did not specifically assess IFNγ production in NKT cells or other minor populations, our data indicate that the NK1.1+CD3+ cells (NKT cells) cited on Page 7, Lines 188-192 were essentially absent in the liver tissue of LPS-challenged animals, as shown in Supplementary Figures 3C and S10. The corresponding description has been added to the Results section (Page 7, Lines 188-192) as follows: “We observed that the number of NK cells increased in the liver tissue of PBS-treated MAFLD mice compared with mice fed a control diet (Fig. 4E). LPS challenge increased the accumulation of NK1.1+CD3− NK cells in the liver tissue of MAFLD mice and the absence of NK1.1+CD3+ NKT cells (Fig. S3C and 4E)”.

      This absence was consistent across all experimental conditions, corroborating our focus on NK1.1+CD3− cells as the primary source of NK1.1-associated IFNγ production. Furthermore, data demonstrated in Figure 2E illustrate the presence of IFNγ primarily in NK cells. Therefore, the observed IFNγ signal, attributed to NK1.1+ cells, predominantly reflects conventional NK cells, with minimal contribution from NKT or γδ T cells.

      (2) In Figure 4C, the authors state that the results suggest that T and B cells do not contribute to susceptibility to LPS challenge. However, they observe a drop in survival compared to chow+LPS. Are the authors certain there is no statistical significance there?

      The observed decrease in survival is consistent with our expectations, as T and B cells are not the primary source of interferon-gamma (IFNγ) in this context. Even in their absence, animals remain susceptible to LPS challenge due to the presence of other IFNγ-producing cells that drive the observed lethality. We have carefully re-examined the statistical analysis and confirm that it was correctly performed.  

      (3) Since the survival curve and rate are exactly the same (60%) in Figures 3F, 3G, 4C, 4F, 5G, and 5H I would just like to double-check that the authors used different controls for each experiment.

      The number of mice used in each experiment was carefully determined to ensure sufficient statistical power while fully complying with the limits established by our institutional Animal Ethics Committee. To minimize animal use, the same control group was shared across multiple survival experiments. Despite using shared controls, the total number of animals per experimental group was adequate to produce robust and reproducible survival outcomes. All groups were properly randomized, and the shared control data were rigorously incorporated into statistical analyses. This strategy allowed us to maintain both ethical standards and the scientific rigor of our findings.

      (4) In Figure 5 the authors are saying that it is neutrophils but not monocytes mediate susceptibility of animals with NAFLD to endotoxemia. However, CXCR2i depletion and CCR2 knock out mice affect both monocytes/macrophages and neutrophils. And in Figures 5E, 5G, and 5H they see that a) LPS+CXCR2i decreases liver damage more than LPS+anti Ly6G, b) HFCD mice challenged with LPS and treated with anti-LY6G do not rescue survival to levels of CHOW LPS and c) anti Ly6G treatment helps less than CXCR2i. Therefore, from both knock out mice and depletion experiments the authors can conclude that most likely monocytes (but potentially also other cells) together with neutrophils are substantial for the development of endotoxemic shock in choline-deficient high-fat diet model.

      While neutrophils express CCR2, our data clearly show that CCR2 deficiency does not impair neutrophil migration, as demonstrated in Supplemental Figures 5A and 5B (added to the manuscript, Page 8, Lines 213–217). The corresponding description has been added to the Results section (Page 8, Lines 213–217) as follows: “Interestingly, animals deficient in monocyte migration (CCR2-/-) showed a high mortality rate compared to wild type after LPS challenge and neutrophil migration is not altered (Fig. S5A and Fig. S5B).” In contrast, CCR2 deficiency primarily affects monocyte recruitment, yet in our experimental conditions, monocyte depletion or CCR2 knockout did not significantly alter the severity of endotoxemic shock, indicating that monocytes play a minimal role in mediating susceptibility in HFCD-fed mice.

      To specifically investigate neutrophils, we used pharmacological blockade of CXCR2 to inhibit migration and antibody-mediated neutrophil depletion. Both approaches have consistently demonstrated that neutrophils are critical factors in endotoxemic shock.

      These findings support our conclusion that neutrophils are the primary cellular contributors to susceptibility in HFCD-fed mice during endotoxemia, with monocytes making a negligible contribution under the tested conditions.

      (6) In Figure 6A (but also others with PD-L1) did the authors do isotype control? And can they show how much of PD1+ population goes on neutrophils, and how much on all the other populations?

      To address this issue, we performed additional analyses to assess the distribution of PD-L1 expression on CD45+CD11B+ leukocytes. These new results, detailed on Page 9, lines 245-250, and now presented in Supplemental Figure 6, demonstrate that PD-L1 expression is predominantly enriched in neutrophils compared to other immune subsets. This observation further reinforces our conclusion that neutrophils represent a major source of PD-L1 in our experimental model.

      To ensure the robustness of these findings, we also included FMO controls for PD-L1 staining in the newly added Supplemental Figure S6. These controls validate the specificity of our gating strategy and confirm the reliability of the detected PD-L1 signal. The corresponding description has been added to the Results section (Page 9, Lines 245-250) as follows: “First, we observed that only the MAFLD diet caused a significant increase in PD-L1 expression in CD45+CD11b+ leukocytes after LPS challenge (Fig. S6C). We observed that within this population, neutrophils predominate in their expression when compared to monocytes (Fig. S6A, Fig. S6B, and Fig. S6D). Furthermore, PD-L1+ neutrophils showed an exacerbated migration of PD-L1+ neutrophils towards the liver (Fig. 6A and 6B)”

      (7) In Figure 6D it is interesting that there is not an increase in PD-L1+ neutrophils in LPS HFCD IFNg+/+ mice in comparison to LPS chow IFNg+/+ mice, since those should be like WT mice (Figure 6A going from 50% to 97%) and so an increase should be seen?

      The apparent difference between Figures 6A and 6D likely reflects inter-experimental variability rather than a biological discrepancy. Although the absolute percentages of PD-L1⁺ neutrophils varied slightly among independent experiments, the overall phenotype and trend were consistently maintained, namely that PD-L1 expression on neutrophils is enhanced in response to LPS stimulation and modulated by IFNγ signaling. Thus, the data shown in Figure 6D are representative of this consistent phenotype despite minor quantitative variation.

      (8) In Figure 7 do the authors have isotype control for TNFa because gating seems a bit random so an isotype control graph would help a lot as supplementary information, in order to make the figure more persuasive

      To address the concern regarding gating in Figure 7, we have included the FMO control for TNFα as a histogram in Supplementary Figure 8G. This control reaffirms the accuracy and reliability of our gating strategy for TNFα, further supporting the robustness of our data. The corresponding description has been added to the Results section (Page 9, Lines 272-274) as follows: “We observed an exacerbated TNF-α expression by PD-L1+ neutrophils from MAFLD when compared to control chow animals (Fig. 7A, Fig. 7B, Fig. 7D, and Fig. 8SG)”.

      (9) Figure 6C IFNg+/+ mice on CHOW +LPS is same as Figure 8E mice chow +LPS but just with different numbers. Can the authors explain this?

      Although the data points in Figures 6C and 8E may appear similar, we confirm that they originate from entirely independent experiments and represent distinct datasets. To enhance clarity and avoid any potential confusion, we have adjusted the figure presentation and sizing in the revised manuscript. These changes make it clear that the datasets, while comparable, are derived from separate experimental replicates.

      (10) Figure 1E chow B6+LPS is the same as Figure 5D B6+LPS but should they be different since those should be two different experiments?

      We confirm that Figures 1E and 5D correspond to data obtained from independent experiments. Although the experimental conditions were similar, each dataset was generated and analyzed separately to ensure the reproducibility and robustness of our results.

      Reviewer #2 (Recommendations for the authors):

      (1) Why did you look at kidney injury in Figure 1D? I think this should be explained a little.

      We assessed kidney injury alongside ALT, a marker of liver damage, because both the liver and kidneys are among the primary organs affected during sepsis and endotoxemia. This rationale has been added to the manuscript (page 5, lines 129–131): “Remarkably, compared to the Chow group, HFCD mice exposed to LPS did not show greater changes in other organs commonly affected by endotoxemia, such as the kidneys (Figure 1D).” By evaluating markers of injury in both organs, we aimed to determine whether our physiopathological condition was liver-specific or indicative of broader systemic injury.

      (2) I know Figure 2C isn't your data, but why are there so few NK cells, considering NK cells are a resident liver cell type? Doesn't that also bring into question some of your data if there are so few NK cells? And the IFNG expression (2E) looks to mostly come from T-cells (CD8?).

      The data shown in Figure 2C were reanalyzed from a separate NAFLD model based on a 60% high-fat diet. Although this model differs from ours, the observed low number of NK cells is consistent with expectations for animals subjected solely to a hyperlipidic diet, which primarily provides an inflammatory stimulus that promotes recruitment rather than maintaining high baseline NK cell numbers.

      In our experimental model, these observations align with published data. Specifically, liver tissue from NAFLD animals typically exhibits low baseline NK cell numbers, but upon LPS challenge, there is a marked increase in NK cell recruitment to the liver. This dynamic illustrates the interplay between diet-induced inflammation and immune cell recruitment in our experimental context and supports the interpretation of our IFNγ data.

      (3) In your methods, I think you didn't explain something. You said LPS was administered to 5–6 week old mice, but that HFCD diet was started in 5-6 week old mice and lasted 2 weeks, then LPS was administered. So LPS administration happened when the mice were 7-8 weeks old, right?

      We thank the reviewer for pointing out this inconsistency in our Methods section. The reviewer is correct: the HFCD diet was initiated in 5–6-week-old mice, and LPS was administered after 2 weeks on the diet, such that LPS challenge occurred when the mice were 7–8 weeks old.

      We have revised the Methods section (pages 15–16, lines 474–480) to clarify this timeline and ensure it is accurately described in the manuscript. The corresponding description has been added to the Materials and Methods section (Page 14, Lines 436-442) as follows: “Lipopolysaccharide (LPS; Escherichia coli (O111:B4), L2630, Sigma-Aldrich, St. Louis, MO, USA) was administered intraperitoneally (i.p.; 10 mg/kg) in C57BL/6, CCR2 -/-, IFN-/-, and TNFR1R2 -/- mice. The HFCD was initiated in 5–6 week-old mice, and LPS was administered after 2 weeks on the diet, meaning that LPS administration occurred when the mice were 7–8 weeks old, with body weights ranging from 22 to 26 g. LPS was previously solubilized in sterile saline and frozen at -70°C. The animals were euthanized 6 hours after LPS administration”.

      (4) Throughout the manuscript, I would consider changing the term NAFLD to something else. I think HFCD diet is a closer model to NASH, so there needs to be some discussion on that. And the field is changing these terms, so NAFLD is now MASLD and NASH is now MASH.

      We appreciate the reviewer’s comment regarding the terminology and disease classification. In our experimental conditions, the animals were subjected to a high-fat, choline-deficient (HFCD) diet for only two weeks, a period considered very early in the progression of diet-induced liver disease. At this stage, histological analysis revealed lipid accumulation in hepatocytes without evidence of hepatocellular injury, inflammation, or fibrosis. Therefore, our model more closely resembles the metabolic-associated fatty liver disease (MAFLD, formerly NAFLD) stage rather than the more advanced metabolic-associated steatohepatitis (MASH, formerly NASH).

      Indeed, prolonged exposure to HFCD diets, typically 8 to 16 weeks, is required to induce the inflammatory and fibrotic features characteristic of MASH. Since our objective was to study the initial metabolic and immune alterations preceding overt liver injury, we believe that using the term MAFLD more accurately reflects the pathological stage represented in our model. Accordingly, we have revised the text to align with the updated nomenclature and disease context.

      (6) I am concerned about over interpretation of the publicly available RNA-seq data in Figure 2. This data comes from human NAFLD patients with unknown endotoxemia and mouse models using a traditional high-fat diet model. So it is hard to compare these very disparate datasets to yours. Also, if these datasets have elevated IFNG, why does your model require LPS injection?

      We thank the reviewer for their thoughtful comments regarding the interpretation of the RNA-seq data presented in Figure 2. We would like to clarify that the human NAFLD datasets referenced in our study do not specifically include patients with endotoxemia; rather, they focus on individuals with NAFLD alone.

      Comparing data from human and murine MAFLD models, we observed that NK cells, T cells, and neutrophils are present and contribute to the hepatic inflammatory environment. Our reanalysis indicates that the elevations of IFNγ and TNF in NAFLD are primarily derived from NK cells, T cells, and myeloid cells, respectively.

      In our experimental model, LPS administration was used to evaluate whether these immune populations, particularly NK cells, are further potentiated under a hyperinflammatory state, leading to exacerbated IFNγ production. This approach allows us to determine whether increased IFNγ contributes to worsening outcomes in NAFLD, providing mechanistic insights that cannot be obtained from static human or traditional mouse datasets alone.

      (7) The zoom-ins for the histology (for example, Figure 1E) don't look right compared to the dotted square. The shape and area expanded don't match. And the cells in the zoom-in don't look exactly the same either.

      We have thoroughly re-examined the histological sections and the corresponding zoom-ins, including the example in Figure 1E. Upon verification, we confirm that the zoom-ins accurately represent the highlighted areas indicated by the dotted squares. The apparent discrepancies in shape or cellular appearance are likely due to minor differences in orientation or cropping during figure preparation. Nevertheless, the content and regions depicted are consistent with the original sections.  

      (8) Did the authors measure myeloid infiltration in the CCR2-/- mice? Did you measure Neutrophil infiltration in the TNF-Receptor KO mice?

      Analysis of CD45+ cell migration in CCR2 knockout mice, as shown in Supplemental Figure 5C and 5D, demonstrates that the absence of CCR2 does not impair overall leukocyte migration. Similarly, assessment of neutrophil migration in TNF receptor (TNFR1/2) knockout mice, presented in Supplemental Figure 8A, shows that neutrophil trafficking is not affected in these animals. These results indicate that the respective knockouts do not compromise the migration of the analyzed immune populations, supporting the interpretations presented in our study.

      (9) Regarding Methods for RNA-seq Analysis. Was the Mitochondrial percentage cutoff 0.8%, because that seems low. And was there not a Padj or FDR cutoff for the differential expression?

      The mitochondrial percentage in our scRNA-seq analysis reflects the proportion of mitochondrial gene expression per cell, which serves as a quality control metric. A low mitochondrial gene expression percentage, such as the 0.8% cutoff used here, is indicative of highly viable cells.

      For differential gene expression analysis, we employed the FindMarkers function in Seurat with standard parameters: adjusted p-value (Padj) < 0.05 and log2 fold change > 0.25 for upregulated genes, and adjusted p-value < 0.05 with log2 fold change < -0.25 for downregulated genes. These thresholds ensure robust identification of differentially expressed genes while balancing sensitivity and specificity.
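
      To make the stated thresholds concrete, the filtering logic can be sketched in plain Python. This is an illustration only and not the Seurat code used in the study: the function name `classify_gene`, the gene labels, and the numeric values are hypothetical, and only the cutoffs (adjusted p-value < 0.05, |log2 fold change| > 0.25) come from the text above.

```python
def classify_gene(padj, log2fc, padj_cutoff=0.05, lfc_cutoff=0.25):
    """Label a gene 'up', 'down', or 'ns' (not significant).

    A gene is called only if its adjusted p-value is below the cutoff
    AND its log2 fold change exceeds the cutoff in either direction.
    """
    if padj < padj_cutoff and log2fc > lfc_cutoff:
        return "up"
    if padj < padj_cutoff and log2fc < -lfc_cutoff:
        return "down"
    return "ns"


# Hypothetical differential-expression results: gene -> (padj, log2FC).
results = {
    "GeneA": (0.001, 1.20),   # significant, strongly increased
    "GeneB": (0.200, 2.00),   # large change but not significant
    "GeneC": (0.010, -0.60),  # significant, decreased
    "GeneD": (0.030, 0.10),   # significant but change too small
}

calls = {gene: classify_gene(padj, lfc) for gene, (padj, lfc) in results.items()}
print(calls)
```

      Requiring both criteria at once means that a gene with a large fold change but a weak adjusted p-value (GeneB in this hypothetical table) is not reported, and neither is a statistically significant gene whose change is below the effect-size cutoff (GeneD).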

      (10) Regarding Methods for Flow Cytometry. How were IFNG and TNF staining performed? Was this an intracellular stain? Did you need to block secretion? TNF and IFNG antibodies have the same fluorophore (PE), so were these stainings and analyses performed separately?

      Six hours after LPS challenge, non-parenchymal liver cells were isolated using Percoll gradient centrifugation. Because the animals were in a hyperinflammatory state induced by LPS, no in vitro stimulation was performed; all staining was carried out immediately after cell isolation. Detection of IFNγ and TNF was performed via intracellular staining using the Foxp3 staining kit (eBioscience). Because both antibodies were conjugated to PE, IFN-γ and TNF-α staining and analyses were conducted in separate experiments. These distinct staining protocols and analyses are detailed in Supplemental Figures 10 and 11. The corresponding description has been added to the Materials and Methods section (Page 16, Lines 490-493) as follows: “As animals were already in a hyperinflammatory state, no additional in vitro stimulation was required. Intracellular detection of IFN-γ and TNF-α was conducted using the Foxp3 staining kit (eBioscience). Since both antibodies were conjugated to PE, staining and analyses were performed in separate experiments”.

      Reviewer #3 (Recommendations for the authors):

      (1) Achieving an NAFLD model/disease is the starting point of this study. I understand that a two-week HFCD diet period was applied due to the decrease in lymphocyte numbers. Was it enough to initiate NAFLD then? Or is it a milder metabolic disease? Which parameters have been evaluated to accept this model as a NAFLD model?

      Indeed, the two-week HFCD diet induces an early-stage form of NAFLD, characterized by initial fat accumulation in the liver without significant hepatic injury. While this represents a milder metabolic phenotype, it is sufficient to study the inflammatory and immune responses associated with NAFLD. To validate this model, we assessed multiple parameters: liver weight, blood glucose levels, and collagen deposition. These measurements confirmed the presence of early-stage NAFLD features in the animals, providing a relevant and reliable context for investigating susceptibility to endotoxemia and immune cell dynamics. They are shown in Supplementary Figure 1, and the following text was included in the manuscript (Page 5, Lines 116-117): “Mice fed HFCD showed no increase in liver weight and collagen deposition as evidenced by Picrosirius staining (Fig. S1A and Fig. S1C)”.

      (2) It is true that the CD274 gene (encoding PD-L1) and the IFNGR2 gene, corresponding to the IFNγ receptor, are among the upregulated genes when authors analyzed the publicly available RNA-seq data but they are not the most significantly elevated genes. What is the reasoning behind this cherry-picking? Why are other high DEGs not analyzed but these two are analyzed?

      We highlighted the expression of the IFN-γ receptor (IFNGR2) and CD274 (encoding PD-L1) in the publicly available RNA-seq data to align and corroborate these findings with the key results observed later in our study. To avoid redundancy, we chose to present these genes in the initial figures as they are directly relevant to the subsequent analyses. Regarding the broader analysis of human RNA-seq data, our primary objective was to identify enriched biological processes and pathways, which served as a foundation for the focus and direction of this study.

      (3) Figures 3C-3G: I understand that IFNg-/- and TNFR1R2a-/- mice are not showing elevated liver damage but it may simply be because of the non-responsiveness to the LPS challenge. I suggest using a different challenge or recovery experiments with the cytokines to show that the challenge is successful and results are caused by NAFLD, truly. The same goes for Figure 6: Looking at Figure 6D one may think that IFNg deficiency alters the LPS response independent of the diet condition (or NAFLD condition).

      We appreciate the reviewer’s insightful comment and fully understand the concern regarding the potential non-responsiveness of IFN-γ⁻/⁻ and TNFR1R2a⁻/⁻ mice to the LPS challenge. To address this point and confirm that these knockout animals are indeed responsive to LPS stimulation, we conducted an additional set of ex vivo experiments.

      Specifically, WT and cytokine-deficient (IFN-γ⁻/⁻) mice were fed either Chow or HFCD for two weeks, after which spleens were collected, and splenocytes were challenged in vitro with LPS. We then quantified TNF, IFN, and IL-6 production to confirm that these mice are capable of mounting cytokine responses upon LPS stimulation.

      Due to current breeding limitations and a temporary issue in colony maintenance of TNF-deficient mice, we were unable to include TNFR1R2a⁻/⁻ animals in this additional experiment. Nevertheless, we prioritized performing the analysis with the available knockout line to avoid leaving this important point unaddressed.

      These additional data demonstrate that IFN-γ-deficient mice remain responsive to LPS, reinforcing that the differences observed in vivo are related to the NAFLD condition rather than a lack of LPS responsiveness.

      (4) Figure 1 vs Figure 4: Rag-/- mice seem more susceptible to LPS-derived death even after normal conditions. But If I compare the survival data between Figure 1 and Figure 4, Rag-/- HFCD diet mice seem to be doing better than wt mice after LPS treatment. (1 day survival vs 2 days survival). How do you explain these different outcomes?

      We thank the reviewer for this insightful question regarding the survival data in Figures 1 and 4. Although there is a one-day difference in survival outcomes, Rag-/- mice consistently exhibit increased susceptibility to LPS-induced mortality; variability between independent experiments can influence the exact survival timing. Nonetheless, across all experiments, Rag-/- mice display a reproducible phenotype of heightened sensitivity to LPS challenge, which is supported by multiple independent observations in our study.

      (5) How do you explain Figure 4J in connection to the observation presented with Figure 7: TNFa tissue levels, even though significant, seem very similar between the conditions?

      We would like to clarify that the animals in this study are in a metabolic syndrome state, with early-stage NAFLD characterized by hepatic fat accumulation without significant tissue injury, as shown in Figure 1C.

      Under these conditions, the LPS challenge triggers an exacerbated inflammatory response, leading to increased secretion of IFN-γ and TNF-α, primarily from NK cells and neutrophils. While TNFα levels may appear visually similar across conditions, the HFCD mice exhibit a heightened predisposition for an amplified immune response compared to chow-fed mice. This difference is consistent with the functional outcomes observed in our study and highlights the diet-specific sensitization of the immune system.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:  

      Reviewer #1 (Public review):  

      Summary:  

      The image analysis pipeline is tested in analysing microscopy imaging data of gastruloids of varying sizes, for which an optimised protocol for in toto image acquisition is established based on whole mount sample preparation using an optimal refractive index matched mounting media, opposing dual side imaging with two-photon microscopy for enhanced laser penetration, dual view registration, and weighted fusion for improved in toto sample data representation. For enhanced imaging speed in a two-photon microscope, parallel imaging was used, and the authors performed spectral unmixing analysis to avoid issues of signal cross-talk.  

      In the image analysis pipeline, different pre-treatments are done depending on the analysis to be performed (for nuclear segmentation - contrast enhancement and normalisation; for quantitative analysis of gene expression - corrections for optical artifacts inducing signal intensity variations). Stardist3D was used for the nuclear segmentation. The study analyses properties of gastruloid nuclear density, patterns of cell division, morphology, deformation, and gene expression.  

      Strengths:  

      The methods developed are sound, well described, and well-validated, using a sample challenging for microscopy, gastruloids. Many of the established methods are very useful (e.g. registration, corrections, signal normalisation, lazy loading bioimage visualisation, spectral decomposition analysis), facilitate the development of quantitative research, and would be of interest to the wider scientific community.

      We thank the reviewer for this positive feedback.

      Weaknesses:  

      A recommendation should be added on when or under which conditions to use this pipeline. 

      We thank the reviewer for this valuable feedback. We added the following text in the revised version, lines 418 to 474: “In general, the pipeline is applicable to any tissue, but it is particularly useful for large and dense 3D samples, such as organoids, embryos, explants, spheroids, or tumors, that are typically composed of multiple cell layers and have a thickness greater than 50 µm”.

      “The processing and analysis pipeline are compatible with any type of 3D imaging data (e.g. confocal, 2 photon, light-sheet, live or fixed)”.

      “Spectral unmixing to remove signal cross-talk of multiple fluorescent targets is typically more relevant in two-photon imaging due to the broader excitation spectra of fluorophores compared to single-photon imaging. In confocal or light-sheet microscopy, alternating excitation wavelengths often circumvents the need for unmixing. Spectral decomposition performs even better with true spectral detectors; however, these are usually not non-descanned detectors, which are more appropriate for deep tissue imaging. Our approach demonstrates that simultaneous cross-talk-free four-color two-photon imaging can be achieved in dense 3D specimen with four non-descanned detectors and co-excitation by just two laser lines. Depending on the dispersion in optically dense samples, depth-dependent apparent emission spectra need to be considered”.

      “Nuclei segmentation using our trained StarDist3D model is applicable to any system under two conditions: (1) the nuclei exhibit a star-convex shape, as required by the StarDist architecture, and (2) the image resolution is sufficient in XYZ to allow resampling. The exact sampling required is object- and system-dependent, but the goal is to achieve nearly isotropic objects with diameters of approximately 15 pixels while maintaining image quality. In practice, images containing objects that are natively close to or larger than 15 pixels in diameter should segment well after resampling. Conversely, images with objects that are significantly smaller along one or more dimensions will require careful inspection of the segmentation results”.
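
      The resampling criterion in this paragraph can be illustrated with a short calculation. The sketch below is not taken from the Tapenade code base: the function name `resampling_factors`, the voxel sizes, and the nuclear diameter are hypothetical values chosen for illustration, and only the target of roughly 15 pixels per nucleus along each axis comes from the text.

```python
def resampling_factors(voxel_size_zyx_um, nucleus_diameter_um, target_diameter_px=15.0):
    """Return per-axis zoom factors (Z, Y, X).

    After rescaling each axis by its factor, a nucleus of the given physical
    diameter spans about `target_diameter_px` pixels along every axis, which
    also makes the objects approximately isotropic in pixel units.
    """
    factors = []
    for voxel_um in voxel_size_zyx_um:
        current_diameter_px = nucleus_diameter_um / voxel_um
        factors.append(target_diameter_px / current_diameter_px)
    return tuple(factors)


# Hypothetical acquisition: 2.0 µm steps in Z, 0.5 µm pixels in XY,
# and nuclei about 7.5 µm across.
factors = resampling_factors((2.0, 0.5, 0.5), nucleus_diameter_um=7.5)
print(factors)

# Axes with a factor above 1 must be upsampled: the object is natively
# smaller than the target along that axis, which is the case the text
# flags as requiring careful inspection of the segmentation.
needs_inspection = [axis for axis, f in zip("ZYX", factors) if f > 1.0]
print(needs_inspection)
```

      In this hypothetical case the nuclei already span 15 pixels in XY, so only the Z axis is upsampled (by a factor of 4), and it is along Z that segmentation quality should be checked most carefully.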

      “Normalization is broadly applicable to multicolor data when at least one channel is expected to be ubiquitously expressed within its domain. Wavelength-dependent correction requires experimental calibration using a ubiquitous signal at each wavelength. Importantly, this calibration only needs to be performed once for a given set of experimental conditions (e.g., fluorophores, tissue type, mounting medium)”.

      “Multi-scale analysis of gene expression and morphometrics is applicable to any 3D multicolor image. This includes both the 3D visualization tools (Napari plugins) and the various analytical plots (e.g., correlation plots, radial analysis). Multi-scale analysis can be performed even with imperfect segmentation, as long as segmentation errors tend to cancel out when averaged locally at the relevant spatial scale. However, systematic errors—such as segmentation uncertainty along the Z-axis due to strong anisotropy—may accumulate and introduce bias in downstream analyses. Caution is advised when analyzing hollow structures (e.g., curved epithelial monolayers with large cavities), as the pipeline was developed primarily for 3D bulk tissues, and appropriate masking of cavities would be needed”.

      Reviewer #2 (Public review):  

      Summary:  

      This study presents an integrated experimental and computational pipeline for high-resolution, quantitative imaging and analysis of gastruloids. The experimental module employs dual-view two-photon spectral imaging combined with optimized clearing and mounting techniques to image whole-mount immunostained gastruloids. This approach enables the acquisition of comprehensive 3D images that capture both tissue-scale and single-cell level information.  

      The computational module encompasses both pre-processing of acquired images and downstream analysis, providing quantitative insights into the structural and molecular characteristics of gastruloids. The pre-processing pipeline, tailored for dual-view two-photon microscopy, includes spectral unmixing of fluorescence signals using depth-dependent spectral profiles, as well as image fusion via rigid 3D transformation based on content-based block-matching algorithms. Nuclei segmentation was performed using a custom-trained StarDist3D model, validated against 2D manual annotations, and achieving an F1 score of 85+/-3% at a 50% intersection-over-union (IoU) threshold. Another custom-trained StarDist3D model enabled accurate detection of proliferating cells and the generation of 3D spatial maps of nuclear density and proliferation probability. Moreover, the pipeline facilitates detailed morphometric analysis of cell density and nuclear deformation, revealing pronounced spatial heterogeneities during early gastruloid morphogenesis.  
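
      For readers unfamiliar with the metric quoted above, the following sketch shows one common way to compute an F1 score at a 50% IoU threshold. It is a simplified illustration and not the evaluation code used by the authors: objects are represented as hypothetical sets of pixel indices, matching is greedy and one-to-one, and the function names `iou` and `f1_at_iou` are assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two objects given as sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_at_iou(gt_objects, pred_objects, threshold=0.5):
    """F1 score where a prediction counts as a true positive only if it is
    matched one-to-one to a ground-truth object with IoU >= threshold."""
    # All candidate pairs, best overlap first.
    pairs = sorted(
        ((iou(g, p), i, j)
         for i, g in enumerate(gt_objects)
         for j, p in enumerate(pred_objects)),
        reverse=True,
    )
    matched_gt, matched_pred = set(), set()
    for score, i, j in pairs:
        if score < threshold:
            break  # remaining pairs overlap too little to count
        if i not in matched_gt and j not in matched_pred:
            matched_gt.add(i)
            matched_pred.add(j)
    tp = len(matched_gt)
    fp = len(pred_objects) - tp  # predictions without a match
    fn = len(gt_objects) - tp    # ground-truth objects that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy example: two annotated nuclei, two predicted nuclei.
gt = [{0, 1, 2, 3}, {10, 11, 12, 13}]
pred = [{1, 2, 3, 4}, {20, 21}]
score = f1_at_iou(gt, pred)
print(score)
```

      In this toy case one prediction overlaps an annotated nucleus with IoU 0.6 and is accepted, while the other prediction and the second annotated nucleus remain unmatched, giving one true positive, one false positive, and one false negative.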

      All computational tools developed in this study are released as open-source, Python-based software.  

      Strengths:  

      The authors applied two-photon microscopy to whole-mount deep imaging of gastruloids, achieving in toto visualization at single-cell resolution. By combining spectral imaging with an unmixing algorithm, they successfully separated four fluorescent signals, enabling spatial analysis of gene expression patterns.  

      The entire computational workflow, from image pre-processing to segmentation with a custom-trained StarDist3D model and subsequent quantitative analysis, is made available as open-source software. In addition, user-friendly interfaces are provided through the open-source, community-driven Napari platform, facilitating interactive exploration and analysis.

      We thank the reviewer for this positive feedback.

      Weaknesses:  

      The computational module appears promising. However, the analysis pipeline has not been validated on datasets beyond those generated by the authors, making it difficult to assess its general applicability.

      We agree that applying our analysis pipeline to published datasets—particularly those acquired with different imaging systems—would be valuable. However, only a few high-resolution datasets of large organoid samples are publicly available, and most of these either lack multiple fluorescence channels or represent 3D hollow structures. Our computational pipeline consists of several independent modules: spectral filtering, dual-view registration, local contrast enhancement, 3D nuclei segmentation, image normalization based on a ubiquitous marker, and multiscale analysis of gene expression and morphometrics. We added the following sentences to the Discussion, lines 418 to 474, and completed the discussion on applicability with a table showing the purpose, requirements, applicability and limitations of each step of the processing and analysis pipeline.

      “Spectral filtering has already been applied in other systems (e.g. [7] and [8]), but is here extended to account for imaging depth-dependent apparent emission spectra of the different fluorophores. In our pipeline, we provide code to run spectral filtering on multichannel images, integrated in Python. In order to apply the spectral filtering algorithm utilized here, spectral patterns of each fluorophore need to be calibrated as a function of imaging depth, which depend on the specific emission windows and detector settings of the microscope”.

      “Image normalization using a wavelength-dependent correction also requires calibration on a given imaging setup to measure the difference in signal decay among the different fluorophore species. To our knowledge, the calibration procedures for spectral-filtering and our image-normalization approach have not been performed previously in 3D samples, which is why validation on published datasets is not readily possible. Nevertheless, they are described in detail in the Methods section, and the code used, from the calibration measurements to the corrected images, is available open-source at the Zenodo link in the manuscript”.

      Dual-view registration, local contrast enhancement, and multiscale analysis of gene expression and morphometrics are not limited to organoid data or our specific imaging modalities. To evaluate our 3D nuclei segmentation model, we tested it on diverse systems, including gastruloids stained with the nuclear marker Draq5 from Moos et al. [1]; breast cancer spheroids; primary ductal adenocarcinoma organoids; human colon organoids and HCT116 monolayers from Ong et al. [2]; and zebrafish tissues imaged by confocal microscopy from Li et al. [3]. These datasets were acquired using either light-sheet or confocal microscopy, with varying imaging parameters (e.g., objective lens, pixel size, staining method). The results are added in the manuscript, Fig. S9b.

      Besides, the nuclei segmentation component lacks benchmarking against existing methods.  

      We agree with the reviewer that a benchmark against existing segmentation methods would be very useful. We tried different pre-trained models:

      CellPose, which we tested in a previous paper ([4]), performed poorly compared to our trained StarDist3D model.

      DeepStar3D ([2]) is only available in the software 3DCellScope. We could not benchmark the model on our data, because the free and accessible version of the software is limited to small datasets. An image of a single whole-mount gastruloid with one channel, having dimensions (347,467,477) was too large to be processed, see screenshot below. The segmentation model could not be extracted from the source code and tested externally because the trained DeepStar3D weights are encrypted.

      Author response image 1.

      Screenshot of the 3DCellScope software. We could not perform 3D nuclei segmentation of a whole-mount gastruloid because the image size was too large to be processed.

      AnyStar ([5]), which is a model trained from the StarDist3D architecture, did not perform well on our data because of the heterogeneous stainings. Basic pre-processing such as median and Gaussian filtering did not improve the results and led to wrong segmentation of touching nuclei. AnyStar was demonstrated to segment colon organoids well in Ong et al., 2025 ([2]), but the nuclei were more homogeneously stained. Our Hoechst staining displays bright chromatin spots that are incorrectly labeled as individual nuclei.

      Cellos ([6]), another model trained from StarDist3D, also did not perform well. The objects used for training and to validate the results are sparse and not touching, so the predicted segmentation has a lot of false negatives even when lowering the probability threshold to detect more objects. Additionally, the network was trained with an anisotropy of (9,1,1), based on images with low z resolution, so it performed poorly on almost isotropic images. Adapting our images to the network’s anisotropy results in an imprecise segmentation that cannot be used to measure 3D nuclei deformations.

      We tried both Cellos and AnyStar predictions on a gastruloid image from Fig. S2 of our main manuscript.  The results are added in the manuscript, Fig. S9b. Fig3 displays the results qualitatively compared to our trained model Stardist-tapenade.

      Author response image 2.

      Qualitative comparison of two published segmentation models versus our model. We show one slice from the XY plane for simplicity. Segmentations are displayed with their contours only. (Top left) Gastruloid stained with Hoechst, image extracted from Fig S2 of our manuscript. (Top right) Same image overlaid with the prediction from the Cellos model, showing many false negatives. (Bottom left) Same image overlaid with the prediction from our Stardist-tapenade model. (Bottom right) Same image overlaid with the prediction from the AnyStar model; false positives are indicated with a red arrow.

      CellPose-SAM is a recent model that builds on the CellPose framework. The pre-trained model performs well on gastruloids imaged using our pipeline, and performs better than StarDist3D at segmenting elongated objects such as deformed nuclei. The performances are qualitatively compared in Fig. S9a and S10. We also demonstrate how using local contrast enhancement improves the results of CellPose-SAM (Fig. S10a), showing the versatility of the Tapenade pre-processing module. Tissue-scale, packing-related metrics from CellPose-SAM labels qualitatively match those from Stardist-tapenade, as shown in Fig. 10c and d.

      Appraisal:  

      The authors set out to establish a quantitative imaging and analysis pipeline for gastruloids using dual-view two-photon microscopy, spectral unmixing, and a custom computational framework for 3D segmentation and gene expression analysis. This aim is largely achieved. The integration of experimental and computational modules enables high-resolution in toto imaging and robust quantitative analysis at the single-cell level. The data presented support the authors' conclusions regarding the ability to capture spatial patterns of gene expression and cellular morphology across developmental stages.  

      Impact and utility:  

      This work presents a compelling and broadly applicable methodological advance. The approach is particularly impactful for the developmental biology community, as it allows researchers to extract quantitative information from high-resolution images to better understand morphogenetic processes. The data are publicly available on Zenodo, and the software is released on GitHub, making them highly valuable resources for the community.  

      We thank the reviewer for this positive feedback.

      Reviewer #3 (Public review):

      Summary  

      The paper presents an imaging and analysis pipeline for whole-mount gastruloid imaging with two-photon microscopy. The presented pipeline includes spectral unmixing, registration, segmentation, and a wavelength-dependent intensity normalization step, followed by quantitative analysis of spatial gene expression patterns and nuclear morphometry on a tissue level. The utility of the approach is demonstrated by several experimental findings, such as establishing spatial correlations between local nuclear deformation and tissue density changes, as well as the radial distribution pattern of mesoderm markers. The pipeline is distributed as a Python package, notebooks, and multiple napari plugins.  

      Strengths  

      The paper is well-written with detailed methodological descriptions, which I think would make it a valuable reference for researchers performing similar volumetric tissue imaging experiments (gastruloids/organoids). The pipeline itself addresses many practical challenges, including resolution loss within tissue, registration of large volumes, nuclear segmentation, and intensity normalization. Especially the intensity decay measurements and wavelength-dependent intensity normalization approach using nuclear (Hoechst) signal as reference are very interesting and should be applicable to other imaging contexts. The morphometric analysis is equally well done, with the correlation between nuclear shape deformation and tissue density changes being an interesting finding. The paper is quite thorough in its technical description of the methods (which are a lot), and their experimental validation is appropriate. Finally, the provided code and napari plugins seem to be well done (I installed a selected list of the plugins and they ran without issues) and should be very helpful for the community.

      We thank the reviewer for their positive feedback and appreciation of our work.

      Weaknesses  

      I don't see any major weaknesses, and I would only have two issues that I think should be addressed in a revision:  

      (1) The demonstration notebooks lack accompanying sample datasets, preventing users from running them immediately and limiting the pipeline's accessibility. I would suggest including a (selective) demo dataset that can be used to run the notebooks (e.g. for spectral unmixing) and/or providing easily accessible demo input sample data for the napari plugins (I saw that there is some sample data for the processing plugin, so this maybe could already be used for the notebooks?).

      We thank the reviewer for this relevant suggestion. The 7 notebooks were updated to automatically download sample tests. The different parts of the pipeline can now be run immediately:

      https://github.com/GuignardLab/tapenade/tree/chekcs_on_notebooks/src/tapenade/notebooks

      (2) The results for the morphometric analysis (Figure 4) seem to be only shown in lateral (xy) views without the corresponding axial (z) views. I would suggest adding this to the figure and showing the density/strain/angle distributions for those axial views as well.

      A morphometric analysis based on the axial views was added as Fig. S6a of the manuscript, complementary to the XY views.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):  

      In lines 64 and 65, it is mentioned that confocal and light-sheet microscopy remain limited to samples under 100μm in diameter. I would recommend revising this sentence. In the paper of Moos and colleagues (also cited in this manuscript; PMID: 38509326), gastruloid samples larger than 100μm are imaged in toto with an open-top dual-view and dual-illumination light-sheet microscope, and live cell behaviour is analysed. Another example, if considering also multi-angle systems, is the impressive work of McDole and colleagues (PMID: 30318151), in which one of the authors of this manuscript is a corresponding author. There, multi-angle light sheet microscopy is used for in toto imaging and reconstruction of post-implantation mouse development (samples much larger than 100μm). Some multi-sample imaging strategies have been developed for this type of imaging system, though not to the sample number extent allowed by the Viventis LS2 system or the Bruker TruLive3D imager, which have higher image quality limitations.

      We thank the reviewer for this remark. As reported in their paper, Moos et al. used dual-view light-sheet microscopy to image gastruloids, which are particularly dense and challenging tissues, with whole-mount samples of approximately 250 µm in diameter. Nevertheless, their image quality metric (DCT) shows a rapid twofold decrease within 50 µm depth (Extended Fig 5.h), whereas with two-photon microscopy, our image quality metric (FRC-QE) decreases by a factor of two over 150 µm in non-cleared samples (PBS) (see Fig. 2c). While these two measurements (FRC-QE versus DCT) are not directly comparable, the observed difference reflects the superior depth performance of two-photon microscopy, owing in part to the use of non-descanned detectors. In our case, imaging was performed with Hoechst, a blue fluorophore suboptimal for deep imaging, whereas in the Moos dataset (Draq5, far-red), the configuration was more favorable for imaging in depth, which further supports our conclusion.

      In McDole et al., tissues reaching 250 µm were imaged from 4 views but, to our knowledge, the images do not reach a cellular-scale resolution compatible with cell segmentation in deeper layers.

      We replaced the sentence ‘However, light-sheet and confocal imaging approaches remain limited to relatively small organoids typically under 100 micrometers in diameter’ with the following (line 64):

      “While advances in light-sheet microscopy have extended imaging depth in organoids, maintaining high image quality throughout thick samples remains challenging. In practice, quantitative analyses are still largely restricted to organoids under roughly 100 µm in diameter”.

      It is worth mentioning that two-photon microscopes are much more widely available than light sheet microscopes, and light sheet systems with 2-photon excitation are even less accessible, which makes the described workflow of Gros and colleagues have a wide community interest.  

      We thank the reviewer for this remark, and added this suggestion at line 74:

      “Finally, two-photon microscopes are typically more accessible than light-sheet systems and allow for straightforward sample mounting, as they rely on procedures comparable to standard confocal imaging”.

      Reviewer #2 (Recommendations for the authors):  

      Suggestions:  

      A comparison with established pre-trained models for 3D organoid image segmentation (e.g., Cellos[1], AnyStar[2], and DeepStar3D[3], all based on StarDist3D) would help highlight the advantages of the authors' custom StarDist3D model, which has been specifically optimized for two-photon microscopy images.  

      (1)  Cellos: https://doi.org/10.1038/s41467-023-44162-6

      (2)  AnyStar: https://doi.org/10.1109/WACV57701.2024.00742

      (3)  DeepStar3D: https://doi.org/10.1038/s41592-025-02685-4

      We agree with the reviewer that a benchmark against existing segmentation methods is very useful. This is addressed in the revised version, as detailed above (Figure 3).

      Recommendations:  

      Please clarify the following point. In line 195, the authors state, "This allowed us to detect all mitotic nuclei in whole-mount samples for any stage and size." Does this mean that the custom-trained StarDist3D model can detect 100% of mitotic nuclei? It was not clear from the manuscript, figures, or videos how this was validated. Given the reported performance scores of the StarDist3D model for detecting all nuclei, claiming 100% detection of mitotic nuclei seems surprisingly high.

      We thank the reviewer for this comment. As detailed in the Methods section, the detection score reaches 82%, and only the complete pipeline (detection + minimal manual curation) allows us to detect all mitotic nuclei. To make this clearer, the following clarification was added to the Results section:

      “To detect division events, we stained gastruloids with phosphohistone H3 (ph3) and trained a separate custom Stardist3D model using 3D annotations of nuclei expressing ph3 (see Methods III H). This model together allowed us to detect nearly all mitotic nuclei in whole-mount samples for any stage and size (Fig.3f and Suppl.Movie 4), and we used minimal manual curation to correct remaining errors.”

      Minor corrections:  

      It appears that Figures 4-6 are missing from the submitted version, but they can be found in the manuscript available on bioRxiv.

      We thank the reviewer for this remark. This was corrected immediately, and Figures 4 to 6 were added.

      In line 185, is the intended phrase "by comparing the 2D predictions and the 2D sliced annotated segments..."? 

      To improve clarity, we replaced the initial sentence:

      “The f1 score obtained by comparing the 3D prediction and the 3D ground-truth is well approximated by the f1 score obtained by comparing the 2D annotations and the 2D sliced annotated segments, with at most a 5% difference between the two scores.” by

      “The f1 score obtained in 3D (3D prediction compared with the 3D ground-truth) is well approximated by the f1 score obtained in 2D (2D predictions compared with the 2D sliced annotated segments). The difference between the 2 scores was at most 5%.”
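
      As an illustration of the comparison described above, the following is a minimal sketch of an instance-level f1 score that works on either 3D label volumes or 2D label slices. It is not taken from the manuscript or from the Tapenade package: the function name `f1_score_instances`, the greedy matching strategy, and the IoU threshold of 0.5 are assumptions made for this example.

```python
import numpy as np

def f1_score_instances(pred, truth, iou_threshold=0.5):
    """Instance-level F1 between two integer label arrays (2D or 3D, 0 = background).

    A predicted object counts as a true positive when it overlaps a
    not-yet-matched ground-truth object with IoU >= iou_threshold
    (greedy matching; the threshold is an assumption of this sketch).
    """
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    pred_ids = [i for i in np.unique(pred) if i != 0]
    truth_ids = [i for i in np.unique(truth) if i != 0]
    matched_truth = set()
    tp = 0
    for p in pred_ids:
        p_mask = pred == p
        # Only ground-truth objects that overlap this prediction can match it.
        for t in np.unique(truth[p_mask]):
            if t == 0 or t in matched_truth:
                continue
            t_mask = truth == t
            intersection = np.logical_and(p_mask, t_mask).sum()
            union = np.logical_or(p_mask, t_mask).sum()
            if intersection / union >= iou_threshold:
                matched_truth.add(t)
                tp += 1
                break
    fp = len(pred_ids) - tp   # predictions without a matching ground-truth object
    fn = len(truth_ids) - tp  # ground-truth objects that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0
```

      Calling the function once on the full 3D arrays and once on matching 2D slices (for example `pred[z]` and `truth[z]`) gives the two scores whose difference is discussed in the revised sentence.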

      Reviewer #3 (Recommendations for the authors):

      (1) How is the "local neighborhood volume" defined, and how was it computed?

      The reviewer is referring to this paragraph (the term is underscored):

      “To probe quantities related to the tissue structure at multiple scales, we smooth their signal with a Gaussian kernel of width σ, with σ defined as the spatial scale of interest. From the segmented nuclei instances, we compute 3D fields of cell density (number of cells per unit volume), nuclear volume fraction (ratio of nuclear volume to local neighborhood volume), and nuclear volume at multiple scales.”

      To improve clarity, the phrasing has been revised: the term local neighborhood volume has been replaced by local averaging volume, and a reference to the Methods section has been added.

      “From the segmented nuclei instances, we compute 3D fields of cell density (number of cells per unit volume), nuclear volume fraction (ratio of space occupied by nuclear volume within the local averaging volume, as defined in the Methods III I), and nuclear volume at multiple scales.”
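
      To make the notion of a local averaging volume concrete, here is a minimal sketch of how such fields could be computed from a label volume with a Gaussian kernel of width σ. It is an illustration under stated assumptions, not the Tapenade implementation: the helper names `gaussian_smooth` and `density_fields`, the truncation of the kernel at 3σ, the zero padding at the borders, and the use of one unit impulse per nucleus centroid are all choices made for this example.

```python
import numpy as np

def gaussian_smooth(volume, sigma):
    """Separable Gaussian smoothing with a normalized kernel truncated at 3*sigma.

    Borders are zero-padded; each axis must be at least as long as the kernel.
    """
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    out = np.asarray(volume, dtype=float)
    assert min(out.shape) >= kernel.size, "volume too small for this sigma"
    for axis in range(out.ndim):
        out = np.apply_along_axis(np.convolve, axis, out, kernel, mode="same")
    return out

def density_fields(labels, sigma, voxel_volume=1.0):
    """Return (cell density, nuclear volume fraction) fields at scale sigma (in voxels)."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    ids = ids[ids != 0]
    # One unit impulse at the (rounded) centroid voxel of each nucleus.
    centroids = np.zeros(labels.shape, dtype=float)
    for i in ids:
        coords = np.argwhere(labels == i)
        center = tuple(np.round(coords.mean(axis=0)).astype(int))
        centroids[center] += 1.0
    # Smoothed impulses give cells per voxel; divide by voxel volume for cells per unit volume.
    cell_density = gaussian_smooth(centroids, sigma) / voxel_volume
    # Smoothed binary mask gives the fraction of the averaging volume occupied by nuclei.
    volume_fraction = gaussian_smooth(labels > 0, sigma)
    return cell_density, volume_fraction
```

      Because the kernel is normalized, the cell density integrates to the number of nuclei (away from the image borders), and the volume fraction stays between 0 and 1.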

      (2) In the definition of inertia tensor (18), isn't the inner part normally defined in the reversed way (delta_i,j - ...)?

      We thank the reviewer for noticing this error, which we fixed in the manuscript.
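
      For reference, the standard convention raised by the reviewer, I_ij = Σ_k (|r_k|² δ_ij − r_k,i r_k,j), can be sketched for a segmented nucleus as follows. This is an illustrative example rather than the code used in the manuscript: the function name `inertia_tensor`, the unit mass per voxel, and the `spacing` parameter are assumptions.

```python
import numpy as np

def inertia_tensor(mask, spacing=(1.0, 1.0, 1.0)):
    """Inertia tensor of a binary 3D mask about its centroid (unit mass per voxel)."""
    coords = np.argwhere(np.asarray(mask) > 0) * np.asarray(spacing, dtype=float)
    r = coords - coords.mean(axis=0)  # voxel positions relative to the centroid
    # I_ij = sum_k ( |r_k|^2 * delta_ij - r_k,i * r_k,j )
    return (r ** 2).sum() * np.eye(3) - r.T @ r
```

      With this sign convention the tensor is positive semi-definite. Reversing the inner part flips the sign of the eigenvalues but leaves the eigenvectors, and therefore the principal axes, unchanged.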

      (3) For intensity normalization, the paper uses the Hoechst signal density as a proxy for a ubiquitous nuclei signal. I would assume that this is problematic, for eg, dividing cells (which would overestimate it). Would using the average Hoechst signal per nucleus mask (as segmentation is available) be a better proxy?

      We agree that this idea is appealing if one assumes a clear relationship between nuclear volume and Hoechst intensity. However, since cell and nuclear volumes vary substantially with differentiation state (see Fig. 4), such a normalization approach would introduce additional biases at large spatial scales. We believe that the most robust improvement would instead consist in masking dividing cells during the normalization procedure, as these events could be detected and excluded from the computation.

      Nonetheless, we believe the method proposed by the reviewer could prove relevant for other types of data, so we will implement this recommendation in the code available in the Tapenade package.
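
      A minimal sketch of the per-nucleus proxy discussed in this exchange is given below, combined with the option of excluding nuclei flagged as dividing. It is hypothetical and does not reflect the Tapenade API: the function name `mean_intensity_per_nucleus` and the `exclude_ids` parameter are assumptions introduced for illustration.

```python
import numpy as np

def mean_intensity_per_nucleus(labels, intensity, exclude_ids=()):
    """Mean intensity inside each labeled nucleus (label 0 = background).

    exclude_ids lists labels to skip, e.g. nuclei detected as dividing
    (hypothetical parameter for this sketch).
    """
    flat_labels = np.asarray(labels).ravel().astype(np.intp)
    flat_intensity = np.asarray(intensity, dtype=float).ravel()
    sums = np.bincount(flat_labels, weights=flat_intensity)  # summed signal per label
    counts = np.bincount(flat_labels)                        # voxel count per label
    excluded = set(int(i) for i in exclude_ids)
    return {
        int(i): sums[i] / counts[i]
        for i in np.nonzero(counts)[0]
        if i != 0 and int(i) not in excluded
    }
```

      The resulting per-nucleus values could then be interpolated or smoothed into a reference field, which is a separate design choice not covered by this sketch.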

      (4) Figures 4-6 were part of the Supplementary Material, but should be included in the main text?

      We thank the reviewer for this remark. This was corrected immediately, and Figures 4-6 were added.

      We also noticed a missing reference to Fig. S3 in the main text, so we added lines 302 to 307 to comment on the wavelength dependency of the normalization method. We improved the description of Fig. 6, which lacked clarity (lines 316 to 321 and line 327).

      (1) Moos, F., Suppinger, S., de Medeiros, G., Oost, K.C., Boni, A., Rémy, C., Weevers, S.L., Tsiairis, C., Strnad, P. and Liberali, P., 2024. Open-top multisample dual-view light-sheet microscope for live imaging of large multicellular systems. Nature Methods, 21(5), pp.798-803.

      (2) Ong, H. T.; Karatas, E.; Poquillon, T.; Grenci, G.; Furlan, A.; Dilasser, F.; Mohamad Raffi, S. B.; Blanc, D.; Drimaracci, E.; Mikec, D.; Galisot, G.; Johnson, B. A.; Liu, A. Z.; Thiel, C.; Ullrich, O.; OrgaRES Consortium; Racine, V.; Beghin, A. (2025). Digitalized organoids: integrated pipeline for high-speed 3D analysis of organoid structures using multilevel segmentation and cellular topology.  Nature Methods, 22(6), pp.1343-1354

      (3) Li, L., Wu, L., Chen, A., Delp, E.J. and Umulis, D.M., 2023. 3D nuclei segmentation for multi-cellular quantification of zebrafish embryos using NISNet3D. Electronic Imaging, 35, pp.1-9.

      (4) Vanaret, J., Dupuis, V., Lenne, P. F., Richard, F., Tlili, S., & Roudot, P. (2023). A detector-independent quality score for cell segmentation without ground truth in 3D live fluorescence microscopy. IEEE Journal of Selected Topics in Quantum Electronics, 29(4:Biophotonics), 1-12.

      (5) Dey, N., Abulnaga, M., Billot, B., Turk, E. A., Grant, E., Dalca, A. V., & Golland, P. (2024). AnyStar: Domain randomized universal star-convex 3D instance segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 7593-7603).

      (6) Mukashyaka, P., Kumar, P., Mellert, D. J., Nicholas, S., Noorbakhsh, J., Brugiolo, M., ... & Chuang, J. H. (2023). High-throughput deconvolution of 3D organoid dynamics at cellular resolution for cancer pharmacology with Cellos. Nature Communications, 14(1), 8406.

      (7) Rakhymzhan, A., Leben, R., Zimmermann, H., Günther, R., Mex, P., Reismann, D., ... & Niesner, R. A. (2017). Synergistic strategy for multicolor two-photon microscopy: application to the analysis of germinal center reactions in vivo. Scientific reports, 7(1), 7101.

      (8) Dunsing, V., Petrich, A., & Chiantia, S. (2021). Multicolor fluorescence fluctuation spectroscopy in living cells via spectral detection. Elife, 10, e69687.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review):

      We thank Reviewer #1 for their thoughtful and constructive feedback. We found the suggestions particularly helpful in refining the conceptual framework and clarifying key aspects of our interpretations.

      Summary:

      This paper investigates the potential link between amygdala volume and social tolerance in multiple macaque species. Through a comparative lens, the authors considered tolerance grade, species, age, sex, and other factors that may contribute to differing brain volumes. They found that amygdala, but not hippocampal, volume differed across tolerance grades, such that high-tolerance species showed larger amygdala than low-tolerance species of macaques. They also found that less tolerant species exhibited increases in amygdala volume with age, while more tolerant species showed the opposite. Given their wide range of species with varied biological and ecological factors, the authors' findings provide new evidence for changes in amygdala volume in relation to social tolerance grades. Contributions from these findings will greatly benefit future efforts in the field to characterize brain regions critical for social and emotional processing across species.

      Strengths:

      (1) This study demonstrates a concerted and impressive effort to comparatively examine neuroanatomical contributions to sociality in monkeys. The authors impressively collected samples from 12 macaque species with multiple datapoints across species age, sex, and ecological factors. Species from all four social tolerance grades were present. Further, the age range of the animals is noteworthy, particularly the inclusion of individuals over 20 years old - an age that is rare in the wild but more common in captive settings. 

      (2) This work is the first to report neuroanatomical correlates of social tolerance grade in macaques in one coherent study. Given the prevalence of macaques as a model of social neuroscience, considerations of how socio-cognitive demands are impacted by the amygdala are highly important. The authors' findings will certainly inform future studies on this topic.

      (3) The methodology and supplemental figures for acquiring brain MRI images are well detailed. Clear information on these parameters is crucial for future comparative interpretations of sociality and brain volume, and the authors do an excellent job of describing this process in full.

      Weaknesses:

      (1) The nature vs. nurture distinction is an important one, but it may be difficult to draw conclusions about "nature" in this case, given that only two data points (from grades 3 and 4) come from animals under one year of age (Method Figure 1D). Most brains were collected after substantial social exposure (typically post age 1 or 1.5), so the data may better reflect developmental changes due to early life experience rather than innate wiring. It might be helpful to frame the findings more clearly in terms of how early experiences shape development over time, rather than as a nature vs. nurture dichotomy.

      We agree with the reviewer that presenting our findings through a strict nature vs. nurture dichotomy was potentially misleading. We have revised the introduction and the discussion (e.g. lines 85-95 and 363-365) to clarify that we examined how neurodevelopmental trajectories differ across social grades, with the caveat that very young individuals are absent from our sample. We now explicitly mention that our results may reflect both early species-typical biases and experience-dependent maturation.

      We positioned our study on social tolerance in a comparative neuroscience framework and introduced a tentative working model that articulates behavioral traits, cognitive dimensions, and their potential subcortical neural substrates.

      Drawing upon 18 behavioral traits identified in Thierry’s comparative analyses (Thierry, 2021, 2007), we organize these traits into three core dimensions: socio-cognitive demands, behavioral inhibition, and the predictability of the social environment (Table 1). This conceptualization does not aim to redefine social tolerance itself, but rather to provide a structured basis for testing neuroanatomical hypotheses related to social style variability. It echoes recent efforts to bridge behavioral ecology and cognitive neuroscience by linking specific mental abilities – such as executive functions or metacognition – with distinct prefrontal regions shaped by social and ecological pressures (Bouret et al., 2024).

      “Cross-fostering experiments (De Waal and Johanowicz, 1993), along with our own results, suggest that social tolerance grades reflect both early, possibly innate predispositions and later environmental shaping”.

      (2) It would be valuable to clarify how the older individuals, especially those 20+ years old, may have influenced the observed age-related correlations (e.g., positive in grades 1-2, negative in grades 3-4). Since primates show well-documented signs of aging, some discussion of the potential contribution of advanced age to the results could strengthen the interpretation.

      We thank the reviewer for highlighting this important point. In our dataset, younger and older subjects are underrepresented, but they are distributed across all subgroups. Therefore, we do not think that they could drive the interaction effect we are reporting. In our sample, amygdala volume tended to increase with age in intolerant species and decrease in tolerant species. We included a new analysis (Figure 4) that provides a clearer assessment of when social grades 1 vs 4 differed in terms of amygdala and hippocampus volume. While our model accounts for age continuously, we agree that age-related variation deserves cautious interpretation and requires longitudinal designs in future studies.

      We also added the following statements in the discussion (lines 386-391):

      “Due to a limited sample size of our study, this crossing trend, already accounted for by our continuous age model, should be further investigated. These results call for cautious interpretation of age-related variation and further emphasize the importance of longitudinal studies integrating both behavioral, cognitive and anatomical data in non-human primates, which would help to better understand the link between social environment and brain development (Song et al., 2021)”.

      (3) The authors categorize the behavioral traits previously described in Thierry (2021) into 3 self-defined cognitive requirements, however, they do not discuss under what conditions specific traits were assigned to categories or justify why these cognitive requirements were chosen. It is not fully clear from Thierry (2021) alone how each trait would align with the authors' categories. Given that these traits/categories are drawn on for their neuroanatomical hypotheses, it is important that the authors clarify this. It would be helpful to include a table with all behavioral traits with their respective categories, and explain their reasoning for selecting each cognitive requirement category.

      Thank you for this important suggestion. We have extensively revised the introduction to explain how we derived the three cognitive dimensions (socio-cognitive demands, behavioral inhibition, and predictability of the social environment) from the scientific literature. We now provide a complete overview of the 18 behavioral traits described in Thierry’s framework and their cognitive classification in a dedicated table, along with hypothesized neural correlates. We have also mentioned the traits that were not classified in our framework, along with a short justification for this choice. We believe this addition significantly improves the transparency and intelligibility of our conceptual approach.

      “The concept of social tolerance, central to this comparative approach, has sometimes been used in a vague or unidimensional way. As Bernard Thierry (2021) pointed out, the notion was initially constructed around variations in agonistic relationships – dominance, aggressiveness, appeasement or reconciliation behaviors – before being expanded to include affiliative behaviors, allomaternal care or male–male interactions (Thierry, 2021). These traits do not necessarily align along a single hierarchical axis but rather reflect a multidimensional complexity of social style, in which each trait may have co-evolved with others (Thierry, 2021, 2000; Thierry et al., 2004). Moreover, the lack of a standardized scientific definition has sometimes led to labeling species as “tolerant” or “intolerant” without explicit criteria (Gumert and Ho, 2008; Patzelt et al., 2014). These behavioral differences are characterized by different styles of dominance (Balasubramaniam et al., 2012), severity of agonistic interactions (Duboscq et al., 2014), nepotism (Berman and Thierry, 2010; Duboscq et al., 2013; Sueur et al., 2011) and submission signals (De Waal and Luttrell, 1985; Rincon et al., 2023), among the 18 covariant behavioral traits described in Thierry's classification of social tolerance (Thierry, 2021, 2017, 2000)”.

      “To ground the investigation of social tolerance in a comparative neuroanatomical framework, we introduce a tentative working model that articulates behavioral traits, cognitive dimensions, and their potential subcortical neural substrates. Drawing upon 18 behavioral traits identified in Thierry’s comparative analyses (Thierry, 2021, 2007), we organized these traits into three core dimensions: socio-cognitive demands, behavioral inhibition, and the predictability of the social environment (Table 1). This conceptualization does not aim to redefine social tolerance itself, but rather to provide a structured basis for testing neuroanatomical hypotheses related to social style variability. It echoes recent efforts to bridge behavioral ecology and cognitive neuroscience by linking specific mental abilities – such as executive functions or metacognition – with distinct prefrontal regions shaped by social and ecological pressures (Bouret et al., 2024; Testard 2022)”.

      (4) One of the main distinctions the authors make between high social tolerance species and low tolerance species is the level of complex socio-cognitive demands, with more tolerant species experiencing the highest demands. However, socio-cognitive demands can also be very complex for less tolerant species because they need to strategically balance behaviors in the presence of others. The relationships between socio-cognitive demands and social tolerance grades should be viewed in a more nuanced and context-specific manner. 

      We fully agree. We did not mean that intolerant species live in a ‘simple’ social environment, but rather that the social environment of more tolerant species is markedly more demanding. Evidence supporting this statement includes their more efficient social networks (Sueur et al., 2011) and more complex communicative skills (e.g. tolerant macaques displayed higher levels of vocal diversity and flexibility than intolerant macaques in social situations with high uncertainty; Rebout et al., 2020).

      In the revised version (lines 106-122), we now highlight that socio-cognitive challenges arise across the tolerance spectrum, including in less tolerant species where strategic navigation of rigid hierarchies and risk-prone interactions is required. We hope that this addition offers a more balanced and nuanced framing of socio-cognitive demands across macaque societies.

      “The first category, socio-cognitive demands, refers to the cognitive resources needed to process, monitor, and flexibly adapt to complex social environments. Linking those parameters to neurological data is at the core of the social brain theory to explain the expansion of the neocortex in primates (Dunbar). Macaque social systems require advanced abilities in social memory, perspective-taking, and partner evaluation (Freeberg et al., 2012). This is particularly true in tolerant species, where the increased frequency and diversity of interactions may amplify the demands on cognitive tracking and flexibility. Tolerant macaque species typically live in larger groups with high interaction frequencies, low nepotism, and a wider range of affiliative and cooperative behaviors, including reconciliation, coalition-building, and signal flexibility (REF). Tolerant macaque species also exhibit a more diverse and flexible vocal and facial repertoire than intolerant ones, which may help reduce ambiguity and facilitate coordination in dense social networks (Rincon et al., 2023; Scopa and Palagi, 2016; Rebout 2020). Experimental studies further show that macaques can use facial expressions to anticipate the likely outcomes of social interactions, suggesting a predictive function of facial signals in managing uncertainty (Micheletta et al., 2012; Waller et al., 2016). Even within less tolerant species, like M. mulatta, individual variation in facial expressivity has been linked to increased centrality in social networks and greater group cohesion, pointing to the adaptive value of expressive signaling across social styles (Whitehouse et al., 2024)”.

      (5) While the limitations section touches on species-related considerations, the issue of individual variability within species remains important. Given that amygdala volume can be influenced by factors such as social rank and broader life experience, it might be useful to further emphasize that these factors could introduce meaningful variation across individuals. This doesn't detract from the current findings but highlights the importance of considering life history and context when interpreting subcortical volumes, particularly in future studies.

      We have now emphasized this point in the limitations section (lines 441-456). While our current dataset does not allow us to fully control for individual-level variables across all collection centers, we recognize that factors such as rank, social exposure, and individual life history may influence subcortical volumes.

      “Although we explained some interspecies variability, adding subjects to our database will increase statistical power and will help address potential confounding factors such as age or sex in future studies. One will benefit from additional information about each subject. While considered in our modelling, the social living and husbandry conditions of the individuals in our dataset remain poorly documented. The living environment has been considered, and the size of social groups for certain individuals, particularly for individuals from the CdP, have been recorded. However, these social characteristics have not been determined for all individuals in the dataset. As previously stated, the social environment has a significant impact on the volumetry of certain regions. Furthermore, there is a lack of data regarding the hierarchy of the subjects under study and the stress they experience in accordance with their hierarchical rank and predictability of social outcomes position (McCowan et al., 2022)”.

      Reviewer #2 (Public review):

      We thank Reviewer #2 for their thoughtful remarks and for acknowledging the value of our comparative approach despite its inherent constraints.

      Summary:

      This comparative study of macaque species and the type of social interaction is both ambitious and inevitably comes with a lot of caveats. The overall conclusion is that more intolerant species have a larger amygdala. There are also opposing development profiles regarding amygdala volume depending on whether it is a tolerant or intolerant species.

      To achieve any sort of power, they have combined data from 4 centres, which have all used different scanning methods, and there are some resolution differences. The authors have also had to group species into 4 classifications - again to assist with any generalisations and power. They have focused on the volumes of two structures, the amygdala and the hippocampus, which seems appropriate. Neither structure is homogeneous and so it may well be that a targeted focus on specific nuclei or subfields would help (the authors may well do this next) - but as the variables would only increase further along with the number of potential comparisons, alongside small group numbers, it seems only prudent to treat these findings as preliminary. That said, it is highly unlikely that large numbers of macaque brains will become available in the near future.

      This introduction is by way of saying that the study achieves what it sets out to do, but there are many reasons to see this study as preliminary. The main message seems to be twofold: (1) that more intolerant species have relatively larger amygdalae, and (2) that with development, there is an opposite pattern of volume change (increasing with age in intolerant species and decreasing with age in tolerant species). Finding 1 is the opposite of that predicted in Table 1 - this is fine, but it should be made clearer in the Discussion that this is the case, otherwise the reader may feel confused. As I read it, the authors have switched their prediction in the Discussion, which feels uncomfortable. 

      We thank the reviewer for this important observation. In the original version, Table 1 presented simplified direct predictions linking social tolerance grades to amygdala and hippocampus volumes. We recognize that this formulation may have created confusion. In the revised manuscript, we have thoroughly restructured the table and its accompanying rationale. Table 1 now better reflects our conceptual framework grounded in three cognitive dimensions (sociocognitive demands, behavioral inhibition, and social predictability), each linked to behavioral traits and associated neural hypotheses based on published literature. This updated framework, detailed in lines 144-169 of the introduction, provides a more nuanced basis for interpreting our results and avoids the inconsistencies previously noted. The Discussion was also revised accordingly (lines 329-255) to clarify where our findings diverge from the original predictions and to explore alternative explanations based on social complexity. Rather than directly predicting amygdala size from social tolerance grades, we propose that variation in volume emerges from differing combinations of cognitive pressures across species.

      It is inevitable that the data in a study of this complexity are all too prone to post hoc considerations, in which the authors indulge. In the case of Grade 1 species, the individuals have a lot to learn, especially if they are not top of the hierarchy, but at the same time, there are fewer individuals in the troop, making predictions very tricky. As noted above, I am concerned by the seemingly opposite predictions in Table 1 and those in the Discussion regarding tolerance and amygdala volume. (It may be that the predictions in Table 1 are the opposite of how I read them, in which case the Table and preceding text need to align.)

      To facilitate the interpretation of our Bayesian modelling, we have selected a more focused ROI in our automatic segmentation procedure of the Hippocampus (from Hippocampal Formation to Hippocampus) and have added a new analysis (Figure 4) that helps to properly test whether the hippocampus significantly differs between species from social grade 1 vs 4. The present analysis found that this is the case in adult monkeys. This is therefore consistent with our hypothesis that amygdala volumes are principally explained by heightened sociocognitive demands in more tolerant species.

      We also acknowledge the reviewer’s concerns about the limited generalizability due to our sample. The challenges of comparative neuroimaging in non-human primates—especially when using post-mortem datasets—are substantial. Given the ethical constraints and the rarity of available specimens, increasing the number of individuals or species is not feasible in the short term. However, we have made all data and code publicly available and clearly stated the limitations of our sample in the manuscript. Despite these constraints, we believe our dataset offers an unprecedented comparative perspective, particularly due to the inclusion of rare and tolerant species such as M. tonkeana, M. nigra, and M. thibetana, which have never been included in structural MRI studies before. We hope this effort will serve as a foundation for future collaborative initiatives in primate comparative neuroscience.

      Reviewer #3 (Public review):

      We thank Reviewer #3 for their thoughtful and detailed review. Their comments helped us refine both the conceptual and interpretative aspects of the manuscript. We respond point by point below.

      Summary:

      In this study, the authors were looking at neurocorrelates of behavioural differences within the genus Macaca. To do so, they engaged in real-world dissection of dead animals (unconnected to the present study) coming from a range of different institutions. They subsequently compare different brain areas, here the amygdala and the hippocampus, across species. Crucially, these species have been sorted according to different levels of social tolerance grades (from 1 to 4). 12 species are represented across 42 individuals. The sampling process has weaknesses ("only half" of the species contained by the genus, and Macaca mulatta, the rhesus macaque, representing 13 of the total number of individuals), but also strengths (the species are decently well represented across the 4 grades) for the given purpose and for the amount of work required here. I will not judge the dissection process as I am not a neuroanatomist, and I will assume that the different interventions do not alter volume in any significant ways / or that the different conditions in which the bodies were kept led to the documented differences across species. 

      25 brains were extracted by the authors themselves, who are highly experienced with this procedure. Overall, we believe that the dissection protocols did not alter total brain volume. Despite our expertise, we had some difficulty avoiding damage to the cerebellum. Therefore, this region was not included in our analysis. We also noted that this brain region was damaged or absent in the Prime-DE dataset.

      Several protocols were used to prepare and store tissue. It could have impacted the total brain volume.

      We agree that differences in tissue preparation and storage could potentially affect total brain volume. Therefore, we explicitly included the main sample preparation variable — whether brains had been previously frozen — as a covariate in our model. This factor did not explain our results. Moreover, Figures 1D and 1I display the frozen status and its correlation with the amygdala and hippocampus ratios, respectively. Figure 2 shows the parameters of the model and the posterior distributions for the frozen status and total brain volume effects.

      There are two main results of the study. First, in line with their predictions, the authors find that more tolerant macaque species have larger amygdala, compared to the hippocampus, which remains undifferentiated across species. Second, they also identify developmental effects, although with different trends: in tolerant species, the amygdala relative volume decreases across the lifespan, while in intolerant species, the contrary occurs. The results look quite strong, although the authors could bring up some more clarity in their replies regarding the data they are working with. From one figure to the other, we switch from model-calculated ratio to model-predicted volume. Note that if one was to sample a brain at age 20 in all the grades according to the model-predicted volumes, it would not seem that the difference for amygdala would differ much across grades, mostly driven with Grade 1 being smaller (in line with the main result), but then with Grade 2 bigger than Grade 3, and then Grade 4 bigger once again, but not that different from Grade 2.

      Overall, despite this, I think the results are pretty strong, the correlations are not to be contested, but I also wonder about their real meaning and implications. This can be seen under 3 possible aspects:

      (1)  Classification of the social grade

      While it may be familiar to readers of Thierry and collaborators, or to researchers of the macaque world, there is no list included of the 18 behavioral traits used to define the three main cognitive requirements (socio-cognitive demands, predictability of the environment, inhibitory control). It would be important to know which of the different traits correspond to what, whether they overlap, and crucially, how they are realized in the 12 study species, as there could be drastic differences from one species to the next. For now, we can only see from Table S1 where the species align to, but it would be a good addition to have them individually matched to, if not the 18 behavioral traits, at least the 3 different broad categories of cognitive requirements.

      We fully agree with this observation. In the revised version of the manuscript, we now include a detailed conceptual table listing all 18 behavioral traits from Thierry’s framework. For each trait, we provide its underlying social implications, its associated cognitive dimension (when applicable), and the hypothesized neural correlate. 

      While some traits could arguably have been classified in several cognitive dimensions (e.g. reconciliation rate), we preferred to assign each to a unique dimension for clarity. Additionally, the introduction (lines 95-169 + Table 1) now explains how each trait was evaluated based on existing literature and assigned to one of the three proposed cognitive categories: socio-cognitive demands, behavioral inhibition, or social unpredictability. This structure offers a clearer and more transparent basis for the neuroanatomical hypotheses tested in the study.

      “Navigating social life in primate societies requires substantial cognitive resources: individuals must not only track multiple relationships, but also regulate their own behavior, anticipate others’ reactions, and adapt flexibly to changing social contexts. Taking advantage of databases of magnetic resonance imaging (MRI) structural scans, we conducted the first comparative study integrating neuroanatomical data and social behavioral data from closely related primate species of the same genus to address the following questions: To what extent can differences in volumes of subcortical brain structures be correlated with varying degrees of social tolerance? Additionally, we explored whether these dispositions reflect primarily innate features, shaped by evolutionary processes, or acquired through socialization within more or less tolerant social environments”.

      “The first category, socio-cognitive demands, refers to the cognitive resources needed to process, monitor, and flexibly adapt to complex social environments. Linking those parameters to neurological data is at the core of the social brain theory to explain the expansion of the neocortex in primates (Dunbar). Macaque social systems require advanced abilities in social memory, perspective-taking, and partner evaluation (Freeberg et al., 2012). This is particularly true in tolerant species, where the increased frequency and diversity of interactions may amplify the demands on cognitive tracking and flexibility. Tolerant macaque species typically live in larger groups with high interaction frequencies, low nepotism, and a wider range of affiliative and cooperative behaviors, including reconciliation, coalition-building, and signal flexibility (REF). Tolerant macaque species also exhibit a more diverse and flexible vocal and facial repertoire than intolerant ones, which may help reduce ambiguity and facilitate coordination in dense social networks (Rincon et al., 2023; Scopa and Palagi, 2016; Rebout 2020). Experimental studies further show that macaques can use facial expressions to anticipate the likely outcomes of social interactions, suggesting a predictive function of facial signals in managing uncertainty (Micheletta et al., 2012; Waller et al., 2016). Even within less tolerant species, like M. mulatta, individual variation in facial expressivity has been linked to increased centrality in social networks and greater group cohesion, pointing to the adaptive value of expressive signaling across social styles (Whitehouse et al., 2024)”.

      “The second category, inhibitory control, includes traits that involve regulating impulsivity, aggression, or inappropriate responses during social interactions. Tolerant macaques have been shown to perform better in tasks requiring behavioral inhibition and also express lower aggression and emotional reactivity in both experimental and natural contexts (Joly et al., 2017; Loyant et al., 2023). These features point to stronger self-regulation capacities in species with egalitarian or less rigid hierarchies. More broadly, inhibition – especially in its strategic form (self-control) – has been proposed to play a key role in the cohesion of stable social groups. Comparative analyses across mammals suggest that this capacity has evolved primarily in anthropoid primates, where social bonds require individuals to suppress immediate impulses in favour of longer-term group stability (Dunbar and Shultz, 2025). This view echoes the conjecture of Passingham and Wise (2012), who proposed that the emergence of prefrontal area BA10 in anthropoids enabled the kind of behavioural flexibility needed to navigate complex social environments (Passingham et al., 2012)”.

      “The third category, social environment predictability, reflects how structured and foreseeable social interactions are within a given society. In tolerant species, social interactions are more fluid and less kin-biased, leading to greater contextual variation and role flexibility, which likely imply a sustained level of social awareness. In fact, as suggested by recent research, such social uncertainty and prolonged incentives are reflected by stress-related physiology: tolerant macaques such as M. tonkeana display higher basal cortisol levels, which may be indicative of a chronic mobilization of attentional and regulatory resources to navigate less predictable social environments (Sadoughi et al., 2021)”.

      “Each behavioral trait was individually evaluated based on existing empirical literature regarding the types of cognitive operations it likely involves. When a primary cognitive dimension could be identified, the trait was assigned accordingly. However, some behaviors – such as maternal protection, allomaternal care, or delayed male dispersal – do not map neatly onto a single cognitive process. These traits likely emerge from complex configurations of affective and social-motivational systems, and may be better understood through frameworks such as attachment theory (Suomi, 2008), which emphasizes the integration of social bonding, emotional regulation, and contextual plasticity. While these dimensions fall beyond the scope of the present framework, they offer promising directions for future research, particularly in relation to the hypothalamic and limbic substrates of social and reproductive behavior”.

      “Rather than forcing these traits into potentially misleading categories, we chose to leave them unclassified within our current cognitive framework. This decision reflects both a commitment to conceptual clarity and the recognition that some behaviors emerge from a convergence of cognitive demands that cannot be neatly isolated. This tripartite framework, leaving aside reproductive-related traits, provides a structured lens through which to link behavioral diversity to specific cognitive processes and generate neuroanatomical predictions”.

      (2) Issue of nature vs nurture

      Another way to look at the debate between nature vs nurture is to look at phylogeny. For now, there is no phylogenetic tree that shows where the different grades are realized. For example, it would be illuminating to know whether more related species, independently of grades, have similar amygdala or hippocampus sizes. Then the question will go to the details, and whether the grades are realized in particular phylogenetic subdivisions. This would go in line with the general point of the authors that there could be general species differences.

      As pointed out by Thierry and collaborators, the social tolerance concept is already grounded in a phylogenetic framework, as social tolerance matches the phylogenetic tree of these macaque species, suggesting a biological basis for these behavioral observations. Given the modest sample size and uneven species representation, we opted not to adopt tools such as Phylogenetic Generalized Least Squares (PGLS) in our analysis. Our primary aim in this study was to explore neuroanatomical variation as a function of social traits, not to perform a phylogenetic comparative analysis per se. That said, we now explicitly acknowledge this limitation in the Discussion and indicate that future work using larger datasets and phylogenetic methods will be essential to disentangle social effects from evolutionary relatedness. We hope that making our dataset openly available will facilitate such future analyses.

      With respect to nurture, it is likely more complicated: one needs to take into account the idiosyncrasies of the life of the individual. For example, some of the cited literature in humans or macaques suggests that the bigger the social network, the bigger the brain structure considered. Right, but this finding is at the individual level with a documented life history. Do we have any of this information for any of the individuals considered (this is likely out of the scope of this paper to look at this, especially for individuals that did not originate from CdP)?

      We appreciate this insightful observation. Indeed, findings from studies in humans and nonhuman primates showing associations between brain structure and social network size typically rely on detailed life history and behavioral data at the individual level. Unfortunately, such fine-grained information was not consistently available across our entire sample. While some individuals from the Centre de Primatologie (CdP) were housed in known group compositions and social settings, we did not have access to longitudinal social data (such as rank, grooming rates, or network centrality) that would allow for robust individual-level analyses. We now acknowledge this limitation more clearly in the Discussion (lines 436-443), and we fully agree that future work combining neuroimaging with systematic behavioral monitoring will be necessary to explore how species-level effects interact with individual social experience.

      (3) Issue of the discussion of the amygdala's function

      The entire discussion/goal of the paper, states that the amygdala is connected to social life. Yet, before being a "social center", the amygdala has been connected to the emotional life of humans and non-humans alike. The authors state L333/34 that "These findings challenge conventional expectations of the amygdala's primary involvement in emotional processes and highlight the complexity of the amygdala's role in social cognition". First, there is no dichotomy between social cognition and emotion. Emotion is part of social cognition (unless we and macaques are robots). Second, there is nowhere in the paper a demonstration that the differences highlighted here are connected to social cognition differences per se. For example, the authors have not tested, say, if grade 4 species are more afraid of snakes than grade 1 species. If so, one could predict they would also have a bigger amygdala, and they would probably also find it in the model. My point is not that the authors should try to correlate any kind of potential aspect that has been connected to the amygdala in the literature with their data (see for example the nice review by Domínguez-Borràs and Vuilleumier, https://doi.org/10.1016/B978-0-12-823493-8.00015-8), but they should refrain from saying they have challenged a particular aspect if they have not even tested it. I would rather engage the authors to try and discuss the amygdala as a multipurpose center, that includes social cognition and emotion.

      We thank the reviewer for this important and nuanced point. We have revised the manuscript to adopt a more cautious and integrative tone regarding the function of the amygdala. In the revised Discussion (lines 341-355), we now explicitly state that the amygdala is involved in a broad range of processes—emotional, social, and affective—and that these domains are deeply intertwined. Rather than proposing a strict dissociation, we now suggest that the amygdala supports integrated socio-emotional functions that are mobilized differently across social tolerance styles. We also cite recent relevant literature (e.g., Domínguez-Borràs & Vuilleumier, 2021) to support this view and have removed any claim suggesting we challenge the emotional function of the amygdala per se. Our aim is to contribute to a richer understanding of how affective and social processes co-construct structural variation in this region.

      Strengths:

      Methods & breadth of species tested.

      Weaknesses:

      Interpretation, which can be described as 'oriented' and should rather offer additional views.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Private Comments:

      (1) Table 1 should be formatted for clarity i.e., bolded table headers, text realignment, and spacing. It was not clear at first glance how information was organized. It may also be helpful to place behavioral traits as the first column, seeing that these traits feed into the author's defined cognitive requirements.

      We have reformatted Table 1 to improve clarity and readability. Behavioral traits now appear in the first column, followed by cognitive dimensions and hypothesized neural correlates. Column headers have been bolded and alignment has been standardized.

      (2) Figures could include more detail to help with interpretations. For example, Figure 3 should define values included on the x-axis in the figure caption, and Figure 4 should explain the use of line, light color, and dark color. Figure 1 does not have a y-axis title.

      The figures have been revised and legends completed to ensure more clarity.

      (3) Please proofread for typos throughout.

      The manuscript has been carefully proofread, and all typographical and grammatical errors have been corrected. These changes are visible in the tracked version.

      Reviewer #2 (Recommendations for the authors):

      Specific comments:

      (1) Given all of the variability, would it not be a good idea to just compare (eg in the supplemental) the macaque data from just the Strasbourg centre for M. mulatta and M. tonkeana? I appreciate the ns will be lower, but other matters are more standardized.

      We fully understand the reviewer’s suggestion to restrict the comparison to data collected at a single site in order to minimize inter-site variability. However, as noted, such an analysis would come at the cost of statistical power, as the number of individuals per species within a single center is small. For example, while M. tonkeana is well represented at the Strasbourg centre, only one individual of M. mulatta is available from the same site. Thus, a restricted comparison would severely limit the interpretability of results, particularly for age-related trajectories. To address variability, we included acquisition site and brain preservation method as covariates or predictors where appropriate, and we have been cautious in our interpretations. We also now emphasize in the Methods and Discussion the value of future datasets with more standardized acquisition protocols across species and centers. We hope that by openly sharing our data and workflow, we can contribute to this broader goal.

      (2) I have various minor edits:

      (a) L 25 abstract - Specify what is meant by 'opposite trend'; the reader cannot infer what this is.

      Modified in line 25-28: “Unexpectedly, tolerant species exhibited a decrease in relative amygdala volume across the lifespan, contrasting with the age-related increase observed in intolerant species—a developmental pattern previously undescribed in primates.”

      (b) L67 - The reference 'Manyprimates' needs fixing as it does in the references section.

      After double-checking, we confirm that Manyprimates studies are international collaborative efforts that are meant to be cited this way (https://manyprimates.github.io/#pubs).

      (c) L74 - Taking not Taken.

      This typo has been corrected.

      (d) L129 - It says 'total volume', but this is corrected total volume?

      We have clarified in the figure legends that the “total brain volume” used in our analyses excludes the cerebellum and the myelencephalon, as specified in our image preprocessing protocol. This ensures consistency across individuals and institutions.

      (e) L138 - Suddenly mentions 'frozen condition' without any prior explanation - this needs explaining in the legend - also L144.

      We have added an explanation of the ‘frozen condition’ variable in the relevant figure legend.

      (f) L166 - Results - it would be helpful to remind readers what Grade 1 signifies, ie intolerant species.

      We now include a brief reminder in the Results section that Grade 1 corresponds to socially intolerant species, to help readers unfamiliar with the classification (Lines 240-251).

      (g) Figure 4 - Provide the ns for each of the 4 grades to help appreciate the meaningfulness of the curves, etc.

      The number of subjects has been added to the Figure, and a new analysis in the revised manuscript helps to appreciate the meaningfulness of some of these curves.

      (h) L235 - 'we had assumed that species of high social tolerance grade would have presented a smaller amygdala in size compared to grade 1'. But surely this is the exact opposite of what is predicted in Table 1 - ie, the authors did not predict this as I read the paper (Unless Table 1 is misleading/ambiguous and needs clarification).

      As discussed in our responses to Reviewers #2 and #3, we have restructured both Table 1 and the Discussion to ensure consistency. We now explicitly state that the findings diverge from our initial inhibitory-control-based prediction and propose alternative interpretations based on sociocognitive demands.

      (i) L270 - 'This observation' which?? Specify.

      We have replaced ‘this observation’ with a precise reference to the observed developmental decrease in amygdala volume in tolerant species.

      (j) L327 - 'groundbreaking' is just hype given that there are so many caveats - I personally do not like the word - novel is good enough.

      We have replaced the word ‘groundbreaking’ with ‘novel’ to adopt a more measured and appropriate tone in the discussion.

      (3) I might add that I am happy with the ethics regarding this study. 

      Thanks, we are also happy that we were able to study macaque brains from different species using opportunistic samplings along with already available data. We are collectively making progress on this!

      (4) Finally, I should commend the authors on all the additional information that they provide re gender/age/species. Given that there are twice as many females as males, it would be good to know if this affects the findings. I am not a primatologist, so I don't know, for example, if the females in Grade 1 monkeys are just as intolerant as the males?

      We thank the reviewer for this thoughtful comment. We now explicitly mention the female-biased sex ratio in the Methods section and report in the Results (Figure 2, Figure 3) that sex was included as a covariate in our Bayesian models. While a small effect of sex was found for hippocampal volume, no effect was observed for the amygdala. Given the strong imbalance in our dataset (2:1 female-to-male ratio), we refrained from drawing any conclusion about sex-specific patterns, as these would require larger and more balanced samples. Although we did not test for sex-by-grade interactions, we agree that this question—especially regarding whether females and males express social style differences similarly across grades—represents an important direction for future comparative work.

      Reviewer #3 (Recommendations for the authors):

      I found the article well-written, and very easy to follow, so I have little ways to propose improvements to the article to the authors, besides addressing the various major points when it comes to interpretation of the data.

      One list I found myself wanting was in fact the list of the social tolerance grades, and the process by which they got selected into 3 main bags of socio-cognitive skills. Then it would become interesting to see how each of the 12 species compares within both the 18 grades (maybe once again out of the scope of this paper, there are likely reviews out there that already do that, but then the authors should explicitly mention so in the paper: X, 19XX have compared 15 out of 18 traits in YY number of macaque species); and within the 3 major subcognitive requirements delineated by the authors, maybe as an annex?

      We thank the reviewer for this thoughtful suggestion. In the revised manuscript, we now include a detailed table (Table 1) that lists the 18 behavioral traits derived from Thierry’s framework, along with their associated cognitive dimension and hypothesized neuroanatomical correlate. While we did not create a matrix mapping each of the 12 species across all 18 traits due to space and data availability constraints, we agree this is an important direction that should be tackled by primatologists. We now include a sentence (lines 87-90) in the manuscript to guide readers to previous comparative reviews (e.g., Thierry, 2000; Thierry et al., 2004, 2021) that document the expression of these traits across macaque species. We also clarify that our three cognitive categories are conceptual tools intended to structure neuroanatomical predictions, and not formal clusters derived from quantitative analyses.

      In the annex, it would also be good to have a general summarizing excel/R file for the raw data, with important information like age, sex, and the relevant calculated volumes for each individual. The folders available following the links do not make it an easy task for a reader to find the raw data in one place.

      We fully agree with the reviewer on the importance of data accessibility. We have now uploaded an additional supplementary file in .csv format on our OSF repository, which includes individual-level metadata for all 42 macaques: species, sex, age, social grade, total brain volume, amygdala volume, and hippocampus volume. The link to this file is now explicitly mentioned in the Data Availability section. We hope this will facilitate comparisons with other datasets and improve usability for the community. In addition, we provide in a supplementary table the raw data that were used for our Bayesian modelling (see below).

      The availability of the raw data would also clear up one issue, which I believe results from the modelling process: it looks odd on Figure 2, that volume ratios, defined as the given brain area volume divided by the total brain volume, give values above 1 (especially for the hippocampus). As such, the authors should either modify the legend or the figure. In general, it would be nicer to have the "real values" somewhere easily accessible, so that they can be compared more broadly with: 1) other macaque species to address questions relevant to the species; 2) other primates to address other questions that are surely going to arise from this very interesting work!

      We thank the reviewer for pointing this out. The ratio values in Figure 1 correspond to the proportion of the regional volume (amygdala or hippocampus) relative to the total brain volume, excluding the cerebellum and myelencephalon. As such, values above 0.01 (i.e., above 1% of the brain volume) are expected for these structures and do not indicate an error. We have updated the figure legend to clarify this point explicitly. In addition, we have now made a cleaned .csv file available via OSF, containing all raw volumetric data and metadata in a format that facilitates cross-species or cross-study comparisons. This replaces the previous folder-based structure, which may have been less accessible.
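
      As a minimal illustration of the ratio described above, the sketch below divides a regional volume by total brain volume. The function name and both example volumes are hypothetical placeholders chosen for illustration, not values from the dataset; the sketch only shows that a ratio slightly above 0.01 corresponds to slightly more than 1% of brain volume.

```python
def volume_ratio(region_volume_mm3: float, total_brain_volume_mm3: float) -> float:
    """Return a regional volume as a fraction of total brain volume.

    Following the response above, total brain volume is assumed to exclude
    the cerebellum and myelencephalon.
    """
    if total_brain_volume_mm3 <= 0:
        raise ValueError("total brain volume must be positive")
    return region_volume_mm3 / total_brain_volume_mm3


# Hypothetical volumes, for illustration only (not taken from the dataset).
example_region_mm3 = 900.0
example_total_mm3 = 75000.0

example_ratio = volume_ratio(example_region_mm3, example_total_mm3)
example_percent = 100.0 * example_ratio  # the same ratio expressed as a percentage
print(f"ratio = {example_ratio:.3f} ({example_percent:.1f}% of brain volume)")
```

      With these placeholder numbers the ratio is 0.012, that is, 1.2% of brain volume, which is the kind of value the updated legend describes.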

      Typos:

      L233: delete 'in'

      L430: insert space in 'NMT template(Jung et al., 2021).'

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Recommendations for the authors):

      (1) My primary concern is that in some of the studies, there are not enough data points to be totally convincing. This is particularly apparent in the low z-force condition of Figure 1C.

      We agree that adequate sampling is essential for drawing robust conclusions. To address this concern, we performed a post hoc sensitivity analysis to assess the statistical power of our dataset. Given our sample sizes (N = 85 and 45) and observed variability, the experiment had 80% power (α = 0.05) to detect a difference in stall force of approximately 0.36 pN (Cohen’s d ≈ 0.38). The actual difference observed between conditions was 0.25 pN (d ≈ 0.26), which lies below the minimum detectable effect size. Thus, the non-significant result (p = 0.16) likely reflects that any true difference, if present, is smaller than the experimental sensitivity, rather than a lack of sufficient sampling.

      Importantly, both measured stall forces fall within the reported range for kinesin-1 in the literature, supporting that the dataset is representative and the measurements are reliable.

      (2) I'm also concerned about Figure 2B. Does each data point in the three graphs represent only a single event? If so, this should probably be repeated several more times to ensure that the data are robust.

      Each data point shown corresponds to the average of many processive runs, ranging from 32 to 167. This has been updated in the figure caption accordingly.

      (3) Figure 3. I'm surprised that the authors could not obtain a higher occupancy of the multivalent DNA tether with kinesin motors. They were adding up to a 30X higher concentration of kinesin, but still did not achieve stoichiometric labeling. The reasons for this should be discussed. This makes interpretation of the mechanical data much tougher. For instance, only 6-7% of the beads would be driven by three kinesins. Unless the movement of hundreds of beads were studied, I think it would be difficult to draw any meaningful insight, since most of the events would be reflective of beads with only one or sometimes two kinesins bound. I think more discussion is required to describe how these data were treated.

      The mass-photometry data in Figure 3B were acquired in the presence of a 3-fold molar excess of kinesin (Supplemental Figure 4) relative to the DNA chassis. In comparison, optical trapping studies were performed at a 10-20-fold molar excess of kinesin, resulting in a substantially higher percentage of chassis with multiple motors. The reason why we had to perform mass photometry measurements at lower molar excess than the optical trap is that at higher kinesin concentrations, the “kinesin-only” peak dominated and obscured 2- or 3-kinesin-bound species, preventing reliable fitting of the mass photometry data. 

      We have now used the mass photometry measurements to extrapolate occupancies under trapping conditions. We estimate 76-93% of 2-motor chassis are bound to two kinesins and ~70% of 3-motor chassis are bound to three kinesins under our trapping conditions. Moreover, the mean forces in Figures 3C–D exceed those expected for a single kinesin, consistent with occupancy substantially greater than one motor per chassis.

      We wrote: “To estimate the percentage of chassis with two and three motors bound, we performed mass photometry measurements at a 3-fold molar excess of kinesin to the chassis, as higher ratios would obscure the distinction of complexes from the kinesin-only population. Assuming there is no cooperativity among the binding sites, we modeled motor occupancy using a Binomial distribution (Figure 3_figure supplement 2). We observed 17-29% of particles corresponded to the two-motor species on the 2-motor chassis in mass photometry, indicating that 45-78% of the 2-motor chassis was bound to two kinesins. Similarly, 15% and 40% of the 3-motor chassis were bound to two and three kinesins, respectively.

      In optical trapping assays, we used 10-fold and 20-fold molar excess of kinesin for 2-motor and 3-motor chassis, respectively, to substantially increase the percentage of the chassis carried by multiple kinesins. Under these conditions, we estimate 76-93% of the 2-motor chassis were bound to two kinesins, and 30% and 70% of 3-motor chassis were bound to two and three kinesins, respectively.”

      “Multi-motor trapping assays were performed similarly using 10x and 20x kinesin for 2- and 3-motor chassis, respectively. To estimate the percentage of chassis with multiple motors, we used the probability of kinesin binding to a site on a chassis from mass photometry in 3x excess condition to compute an effective dissociation constant where r is the molar ratio of kinesin to chassis. Single-site occupancy at higher molar excesses of kinesin was calculated using this parameter.”

      We also added Figure 3_figure supplement 2 to explain our Binomial model.
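
      The extrapolation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the equation for the effective dissociation constant is not reproduced in the response, so the hyperbolic form p = r / (r + K) used here is an assumption, and all function names are hypothetical. Under that assumption, the sketch takes the fraction of fully occupied chassis measured at one molar excess, infers a single-site occupancy, and predicts the fully occupied fraction at a higher excess. It is only checked against the 2-motor figures quoted above (45-78% at 3x, 76-93% at 10x).

```python
import math


def site_occupancy_from_full(full_fraction, n_sites):
    """Single-site occupancy p, assuming independent sites so P(all bound) = p**n."""
    return full_fraction ** (1.0 / n_sites)


def effective_kd(p, r):
    """Effective dissociation constant K (in units of molar ratio).

    Assumes p = r / (r + K); this functional form is an assumption,
    not taken from the response.
    """
    return r * (1.0 - p) / p


def occupancy_at_ratio(k_eff, r):
    """Single-site occupancy at molar ratio r under the same assumed form."""
    return r / (r + k_eff)


def binomial_pmf(k, n, p):
    """Probability that exactly k of n independent sites are occupied."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def extrapolate_full_occupancy(full_fraction, n_sites, r_measured, r_target):
    """Predict the fully occupied fraction at r_target from a measurement at r_measured."""
    p = site_occupancy_from_full(full_fraction, n_sites)
    k_eff = effective_kd(p, r_measured)
    p_target = occupancy_at_ratio(k_eff, r_target)
    return binomial_pmf(n_sites, n_sites, p_target)


if __name__ == "__main__":
    # 2-motor chassis: bounds quoted for the 3x condition, extrapolated to 10x.
    for measured in (0.45, 0.78):
        predicted = extrapolate_full_occupancy(measured, 2, 3.0, 10.0)
        print(f"{measured:.2f} at 3x -> {predicted:.2f} at 10x")
```

      The same `binomial_pmf` helper gives the fraction of chassis carrying fewer motors than sites, which is the quantity a reader needs when judging how many single-motor events contaminate a multi-motor data set.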

      (4) Page 5, 1st paragraph. Here, the authors are comparing time constants from stall experiments to data obtained with dynein from Ezber et al. This study used the traditional "one bead" trapping approach with dynein bound directly to the bead under conditions where it would experience high z-forces. Thus, the comparison between the behavior of kinesin at low z-forces is not necessarily appropriate. Has anyone studied dynein's mechanics under low z-force regimes?

      We thank the reviewer for catching a citation error. The text has been corrected to reference Elshenawy et al. 2020, which reported stall time constants for mammalian dynein. 

      To our knowledge, dynein’s mechanics under explicitly low z-force conditions have not yet been reported; however, given the more robust stalling behavior of dynein and greater collective force generation, the cited paper was chosen to compare low z-force kinesin to a motor that appears comparatively unencumbered by z-forces. Our study adds to growing evidence that high z-forces disproportionately limit kinesin performance. 

      For clarification, we modified that sentence as follows: “These time constants are comparable to those reported for minus-end-directed dynein under high z-forces”.

      Reviewer #2 (Recommendations for the authors):

      (1) P3 pp2, a DNA tensiometer cannot control the force, but it can measure it; get the distance between the two ends of the tensiometer, and apply WLC.

      The text has been updated to more accurately reflect the differences between optical trapping and kinesin motility against a DNA tensiometer with a fixed lattice position.

      (2) Fig. 2b, SEM is a poor estimate of error for exponentially distributed run lengths. Other methods, like bootstrapping an exponential distribution fit, may provide a more realistic estimate.

      Run lengths were plotted as an inverse cumulative distribution function and fitted to a single exponential decay (Supplementary Figure S3). The plotted value represents the fitted decay constant (characteristic run length) ± SE (standard error of the fit), not the arithmetic mean ± SEM. Velocity values are reported as mean ± SEM. Detachment rate was computed as velocity divided by run length, except at 6 and 10 pN hindering loads, where minimal forward displacement necessitated fitting run-time decays directly. In those cases, the plotted detachment rate equals the inverse of the fitted time constant. The figure caption has been updated accordingly.
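
      As a rough illustration of the procedure described above, the sketch below builds an inverse cumulative distribution from a list of run lengths, extracts a characteristic run length, and converts it to a detachment rate as velocity divided by run length. It is a simplification under stated assumptions: it uses a log-linear least-squares fit through the origin rather than whatever nonlinear fitting routine the authors used, and the function names and synthetic data are hypothetical.

```python
import math


def survival_points(run_lengths):
    """Return (x, S(x)) pairs for the inverse cumulative distribution.

    Midpoint plotting positions keep every survival value strictly
    between 0 and 1 so that its logarithm is defined.
    """
    xs = sorted(run_lengths)
    n = len(xs)
    return [(x, 1.0 - (i + 0.5) / n) for i, x in enumerate(xs)]


def fit_decay_constant(run_lengths):
    """Fit ln S(x) = -x / L by least squares through the origin; return L."""
    pts = survival_points(run_lengths)
    sxy = sum(x * math.log(s) for x, s in pts)
    sxx = sum(x * x for x, _ in pts)
    return -sxx / sxy


def detachment_rate(velocity, run_length):
    """Detachment rate (1/s) from velocity (nm/s) and characteristic run length (nm)."""
    return velocity / run_length


if __name__ == "__main__":
    # Synthetic run lengths placed on exact exponential quantiles (L = 800 nm).
    true_l = 800.0
    runs = [-true_l * math.log(1.0 - (i + 0.5) / 50) for i in range(50)]
    fitted = fit_decay_constant(runs)
    print(f"fitted run length: {fitted:.1f} nm")
    print(f"detachment rate at 600 nm/s: {detachment_rate(600.0, fitted):.3f} 1/s")
```

      For the conditions with minimal forward displacement, the same fit can be applied to run times instead of run lengths, in which case the detachment rate is the inverse of the fitted time constant, as stated in the response.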

      (3) Kinesin-1 is covalently bound to a DNA oligo, which then attaches to the DNA chassis by hybridization. This oligo is 21 nt with a relatively low GC%. At what force does this oligo unhybridize? Can the authors verify that their stall force measurements are not cut short by the oligo detaching from the chassis?

      The 21-nt attachment oligo (38% GC) is predicted to have a ΔG at 37 °C of approximately −25 kcal/mol, or approximately 42 kT. If we assume this is the approximate amount of work required to unhybridize the oligo, we would expect the rupture force to be >15 pN. This significantly exceeds the stall force of a single kinesin. Since the stalling events rarely exceed a few seconds, it is unlikely that our oligos quickly detach from the chassis under such low forces.

      Furthermore, optical trapping experiments are tuned such that no more than 30% of beads display motion within several minutes after they are brought near microtubules. After stalling events, the motor dissociates from the MT, and the bead snaps back to the trap center. Most beads robustly reengage with the microtubule, typically within 10 s, suggesting that the same motor chassis reengages with the microtubule after microtubule detachment. Successive runs of the same bead typically have similar stall forces, suggesting that the motors do not disengage from the chassis under resistive forces exerted by the trap.
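
      The unit conversion behind the free-energy estimate above can be checked directly. The sketch below is only a conversion aid with hypothetical function names; it uses the exact SI values of the Boltzmann and Avogadro constants and makes no assumption about the unbinding distance. With these constants, 25 kcal/mol corresponds to roughly 42 kT when kT is evaluated near room temperature (25 °C) and roughly 41 kT at 37 °C, so the quoted figure holds to within rounding.

```python
BOLTZMANN_J_PER_K = 1.380649e-23   # exact SI value
AVOGADRO_PER_MOL = 6.02214076e23   # exact SI value
JOULE_PER_KCAL = 4184.0            # thermochemical kilocalorie


def kcal_per_mol_to_joule(energy_kcal_per_mol):
    """Energy per molecule in joules."""
    return energy_kcal_per_mol * JOULE_PER_KCAL / AVOGADRO_PER_MOL


def kcal_per_mol_to_kt(energy_kcal_per_mol, temperature_k):
    """Energy per molecule in units of kT at the given temperature."""
    kt = BOLTZMANN_J_PER_K * temperature_k
    return kcal_per_mol_to_joule(energy_kcal_per_mol) / kt


def kcal_per_mol_to_pn_nm(energy_kcal_per_mol):
    """Energy per molecule in pN*nm (1 pN*nm = 1e-21 J)."""
    return kcal_per_mol_to_joule(energy_kcal_per_mol) / 1e-21


if __name__ == "__main__":
    for celsius in (25.0, 37.0):
        kt_units = kcal_per_mol_to_kt(25.0, celsius + 273.15)
        print(f"25 kcal/mol at {celsius:.0f} C: {kt_units:.1f} kT")
    print(f"25 kcal/mol = {kcal_per_mol_to_pn_nm(25.0):.0f} pN*nm")
```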

      (4) Figure 1, a justification or explanation should be provided for why events lower than 1.5 pN were excluded. It appears arbitrary.

      Single-motor stall-force measurements used a trap stiffness of 0.08–0.10 pN/nm. At this stiffness, a 1.5 pN force corresponds to 15–19 nm bead displacement, roughly two kinesin steps, and events below this threshold could not be reliably distinguished from Brownian noise. For this reason, forces < 1.5 pN were excluded.

      In Methods, we wrote “Only peak forces above 1.5 pN (corresponding to a 15-19 nm bead displacement) were analyzed to clearly distinguish runs from the tracking noise.”
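
      The displacement range quoted above follows from Hooke's law for the trap (displacement = force / stiffness). A minimal sketch, with a hypothetical function name and an assumed kinesin step size of about 8 nm (a commonly cited value that the response does not state explicitly):

```python
KINESIN_STEP_NM = 8.0  # assumed approximate step size, not given in the response


def bead_displacement_nm(force_pn, stiffness_pn_per_nm):
    """Bead displacement from the trap center for a given force and trap stiffness."""
    return force_pn / stiffness_pn_per_nm


if __name__ == "__main__":
    threshold_pn = 1.5
    for stiffness in (0.08, 0.10):
        dx = bead_displacement_nm(threshold_pn, stiffness)
        steps = dx / KINESIN_STEP_NM
        print(f"k = {stiffness:.2f} pN/nm: {dx:.1f} nm, about {steps:.1f} steps")
```

      At the two ends of the stated stiffness range this gives displacements of 15 nm and just under 19 nm, in line with the "roughly two kinesin steps" description.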

      (5) Figure 2b, is the difference in velocity statistically significant?

      The difference in velocity is statistically significant for most conditions. We did not compare velocities for -10 and -6 pN as these conditions resulted in little forward displacement. However, the p-values for all of the other conditions are -4 pN: 0.0026, -2 pN: 0.0001, -1 pN: 0.0446, +0.5 pN: 0.3148, +2 pN: 0.0001, +3 pN: 0.1191, +4 pN: 0.0004.

      (6) The number of measurements for each experimental datapoint should be provided in the corresponding figure caption. SEM is used, but N is not reported in the caption.

      Figure captions have now been updated to report the number of trajectories (N) for each data point.

      Reviewer #3 (Recommendations for the authors):  

      (1) The method of DNA-tethered motor trapping to enable low z-force is not entirely novel, but adapted from Urbanska (2021) for use in conventional optical trapping laboratories without reliance on microfluidics. However, I appreciate that they have fully established it here to share with the community. The authors could strengthen their methods section by being transparent about protein weight, protein labelling, and DNA ladders shown in the supplementary information. What organism is the protein from? Presumably human, but this should be specified in the methods. While the figures show beautiful data and exemplary traces, the total number of molecules analysed or events is not consistently reported. Overall, certain methodological details should be made sufficient for reproducibility.

      We appreciate the reviewer’s attention to methodological clarity. The constructs used are indeed human kinesin-1, KIF5B. The Methods now specify protein origin, molecular weights, and labeling details, and all figure captions report the number of trajectories analyzed to ensure reproducibility.

      (2) The major limitation the study presents is overarching generalisability, starting with the title. I recommend that the title be specific to kinesin-1. 

      The title has been revised to specify kinesin-1. 

      The study uses two constructs: a truncated K560 for conventional high-force assays, and full-length Kif5b for the low z-force method. However, for the multi-motor assay, the authors use K560 with the rationale of preventing autoinhibition due to binding with DNA, but that would also have limited characterisation in the single-molecule assay. Overall, the data generated are clear, high-quality, and exciting in the low z-force conditions. But why have they not compared or validated their findings with the truncated construct K560? This is especially important in the force-feedback experiments and in comparison with Andreasson et al. and Carter et al., who use Drosophila kinesin-1. Could kinesin-1 across organisms exhibit different force-detachment kinetics? It is quite possible. 

      Construct choice was guided by physiological relevance and considerations of autoinhibition: K560 was used for high z-force single-motor assays. The results of these assays are consistent with conventional bead assays performed by Andreasson et al. and Carter et al. using kinesin from a different organism. Therefore, we do not believe there are major differences between force properties of Drosophila and human kinesin-1.

      For low z-force assays, we used full-length KIF5B, which has nearly identical velocity and stall force to K560 in standard bead assays. We used this construct for low z-force assays because it has a longer and more flexible stalk than K560 and better represents the force behavior of kinesin under physiological conditions. We then used constitutively-active K560 motors for multi-motor experiments to avoid potential complications from autoinhibition of full-length kinesin.

      Similarly, the authors test backward slipping of Kif5b and K560 and measure dwell times in multi-motor assays. Why not detail the backward slippage kinetics of Kif5b and any step-size impact under low z-forces? For instance, with the traces they already have, the authors could determine slip times, distances, and frequency in horizontal force experiments. Overall, the manuscript could be strengthened by analysing both constructs more fully.

      Slip or backstep analyses were not performed on single-motor data because such events were rare; kinesin typically detached rather than slipped. In contrast, multi-motor assays exhibited frequent slip events corresponding to the detachment of individual motors, which were analyzed in detail.

      We wrote “In comparison, slipping events were rarely observed in beads driven by a single motor, suggesting that kinesin typically detaches rather than slipping back on the microtubule under hindering loads.”

      Appraisal and impact:

      This study contributes to important and debated evidence on kinesin-1 force-detachment kinetics. The authors conclude that kinesin-1 exhibits a slip-bond interaction with the microtubule under increasing forces, while other recent studies (Noell et al. and Kuo et al.), which also use low z-force setups, conclude catch-bond behaviour under hindering loads. I find the results not fully aligned with their interpretation. The first comparison of low z-forces in their setup with Noell et al. (2024), based on stall times, does not hold, because it is an apples-to-oranges comparison. Their data show a stall time constant of 2.52 s, which is comparable to the 3 s reported by Noell et al., but the comparison is made with a weighted average of 1.49 s. The authors do report that detachment rates are lower in low z-force conditions under unloaded scenarios. So, to completely rule out catch-bond-like behaviour is unfair. That said, their data quality is good and does show that higher hindering forces lead to higher detachment rates. However, on closer inspection, the range of 0-5 pN shows either a decrease or no change in detachment rate, which suggests that under a hindering force threshold, catch-bond-like or ideal-bond-like behaviour is possible, followed by slip-bond behaviour, which is amazing resolution. Under assisting loads, the slip-bond character is consistent, as expected. Overall, the study contributes to an important discussion in the biophysical community and is needed, but requires cautious framing, particularly without evidence of motor trapping in a high microtubule-affinity state rather than genuine bond strengthening.

      We are not completely ruling out the catch bond behavior in our manuscript. As the reviewer pointed out, our results are consistent with the asymmetric slip bond model, whereas DNA tensiometer assays are more consistent with the catch bond behavior. The advantage of our approach is the capability to directly control the magnitude and direction of load exerted on the motor in the horizontal axis and measure the rate at which the motor detaches from the microtubule as it walks under constant load. In comparison, DNA tensiometer assays cannot control the force, but measure the time it takes the motor to fall off from the microtubule after a brief stall. The extension of the DNA tether is used to estimate the force exerted on the motor during a stall in those assays. The slight disadvantage of our method is the presence of low z-forces, whereas DNA tensiometer assays are expected to have little to no z-force. We wrote that the discrepancy between our results can be attributed to the presence of low z-forces in our DNA-tethered trapping assembly, which may result in a higher-than-normal detachment rate under high hindering loads, thereby resulting in less asymmetry in the force-detachment kinetics. We also added that this discrepancy can be addressed by future studies that directly control and measure horizontal force and measure the motor detachment rate in the absence of z-forces. Optical trapping assays with small nanoparticles (Sudhakar et al. Science 2021) may be well suited to conclusively reveal the bond characteristics of kinesin under hindering loads.

      Reviewing Editor Comments:

      The reviewers are in agreement with the importance of the findings and the quality of the results. The use of the DNA tether reduces the z-force on the motor and provides biologically relevant insight into the behavior of the motor under load. The reviewers' suggestions are constructive and focus on bolstering some of the data points and clarifying some of the methodological approaches. My major suggestion would be to clarify the rationale for concluding that kinesin-1 exhibits slip-bond behavior with increasing force in light of the work of Noell (10.1101/2024.12.03.626575) and Kuo et al (2022 10.1038/s41467-022-31069-x), both of which take advantage of DNA tethers.

      Please see our response to the previous comment. In the revised manuscript, we first clarified that our results are in agreement with previous theoretical (Khataee & Howard, 2019) and experimental studies (Kuo et al., 2022; Noell et al., 2024; Pyrpassopoulos et al., 2020) that kinesin exhibits slower detachment under hindering load. This asymmetry became clear when the z-force was reduced or eliminated. 

      We clarified the differences between our results and DNA tensiometer assays and provided a potential explanation for these discrepancies. We also proposed that future studies might be required to fully distinguish between asymmetric slip, ideal, or catch bonding of kinesin under hindering loads.

      We wrote:

      “Our results agree with the theoretical prediction that kinesin exhibits higher asymmetry in force-detachment kinetics without z-forces (Khataee & Howard, 2019), and are consistent with optical trapping and DNA tensiometer assays that reported more persistent stalling of kinesin in the absence of z-forces (Kuo et al., 2022; Noell et al., 2024; Pyrpassopoulos et al., 2020).

      Force-detachment kinetics of protein-protein interactions have been modeled as either a slip, ideal, or catch bond, which exhibit an increase, no change, or a decrease in detachment rate, respectively, under increasing force (Thomas et al., 2008). Slip bonds are most commonly observed in biomolecules, but studies on cell adhesion proteins reported a catch bond behavior (Marshall et al., 2003). Although previous trapping studies of kinesin reported a slip bond behavior (Andreasson et al., 2015; Carter & Cross, 2005), recent DNA tensiometer studies that eliminated the z-force showed that the detachment rate of the motor under hindering forces is lower than that of an unloaded motor walking on the microtubule (Kuo et al., 2022; Noell et al., 2024), consistent with the catch bond behavior. Unlike these reports, we observed that the stall duration of kinesin is shorter than the motor run time under unloaded conditions, and the detachment rate of kinesin increases with the magnitude of the hindering force. Therefore, our results are more consistent with the asymmetric slip bond behavior. The difference between our results and the DNA tensiometer assays (Kuo et al., 2022; Noell et al., 2024) can be attributed to the presence of low z-forces in our DNA-tethered optical trapping assays, which may increase the detachment rate under high hindering forces. Future studies that could directly control hindering forces and measure the motor detachment rate in the absence of z-forces would be required to conclusively reveal the bond characteristics of kinesin under hindering loads.”

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This paper undertakes an important investigation to determine whether movement slowing in microgravity is due to a strategic conservative approach or rather due to an underestimation of the mass of the arm. While the experimental dataset is unique and the coupled experimental and computational analyses comprehensive, the authors present incomplete results to support the claim that movement slowing is due to mass underestimation. Further analysis is needed to rule out alternative explanations.

      We thank the editor and reviewers for the thoughtful and constructive comments, which helped us substantially improve the manuscript. In this revised version, we have made the following key changes:

      - Directly presented the differential effect of microgravity in different movement directions, showing its quantitative match with model predictions.

      - Showed that changing cost function with the idea of conservative strategy is not a viable alternative.

      - Showed our model predictions remain largely the same after adding Coriolis and centripetal torques.

      - Discussed alternative explanations including neuromuscular deconditioning, friction, body stability, etc.

      - Detailed the model description and moved it to the main text, as suggested.

      Our point-to-point response is numbered to facilitate cross-referencing.

      We believe the revisions and the responses adequately address the reviewers’ concerns, and that the new analysis results strengthen our conclusion that mass underestimation is the major contributor to movement slowing in microgravity.

      Reviewer #1 (Public review):

      Summary:

      This article investigates the origin of movement slowdown in weightlessness by testing two possible hypotheses: the first is based on a strategic and conservative slowdown, presented as a scaling of the motion kinematics without altering its profile, while the second is based on the hypothesis of a misestimation of effective mass by the brain due to an alteration of gravity-dependent sensory inputs, which alters the kinematics following a controller parameterization error.

      Strengths:

      The article convincingly demonstrates that trajectories are affected in 0g conditions, as in previous work. It is interesting, and the results appear robust. However, I have two major reservations about the current version of the manuscript that prevent me from endorsing the conclusion in its current form.

      Weaknesses:

      (1) First, the hypothesis of a strategic and conservative slow down implicitly assumes a similar cost function, which cannot be guaranteed, tested, or verified. For example, previous work has suggested that changing the ratio between the state and control weight matrices produced an alteration in movement kinematics similar to that presented here, without changing the estimated mass parameter (Crevecoeur et al., 2010, J Neurophysiol, 104 (3), 1301-1313). Thus, the hypothesis of conservative slowing cannot be rejected. Such a strategy could vary with effective mass (thus showing a statistical effect), but the possibility that the data reflect a combination of both mechanisms (strategic slowing and mass misestimation) remains open.

      Response (1): Thank you for raising this point. The basic premise of this concern is that changing the cost function to implement strategic slowing can reproduce our empirical findings, so the alternative hypothesis that we aimed to refute in the paper remains possible. At the least, it could co-exist with our hypothesis of mass underestimation. In the revision, we show that changing the cost function only, as suggested here, cannot produce the behavioral patterns observed in microgravity.

      As suggested, we modified the relative weighting of the state and control cost matrices (i.e., Q and R in the cost function Eq 15) without considering mass underestimation. While this cost function scaling can decrease peak velocity – a hallmark of strategic slowing – it also inevitably leads to later peak timings. This is opposite to our robust findings: the taikonauts consistently “advanced” their peak velocity and peak acceleration in time. Note, these model simulation patterns have also been shown in Crevecoeur et al. (2010), the paper mentioned by the reviewer (see their Figure 7B).

      We systematically changed the ratio between the state and control weight matrices in the simulation, as suggested. We divided Q and multiplied R by the same factor α, the cost-function scaling parameter defined in Crevecoeur et al. (2010). This adjustment models a shift in movement strategy in microgravity, and we tested a wide range of α to cover a reasonable parameter space. Simulation results for α = 3 and α = 0.3 are shown in Figure 1—figure supplement 2 and Figure 1—figure supplement 3 respectively. As expected, with α = 3 (higher control effort penalty), peak velocities and accelerations are reduced, but their timing is delayed. Conversely, with α = 0.3, both peak amplitude and timing increase. Hence, changing the cost function to implement a conservative strategy cannot produce the kinematic pattern observed in microgravity, which is a combination of movement slowing and peak timing advance.

      Therefore, we conclude that a change in optimal control strategy alone is insufficient to explain our empirical findings. Logically speaking, we cannot refute the possibility of strategic slowing, which can still exist on top of the mass underestimation we proposed here. However, our data does not support its role in explaining the slowing of goal-directed hand reaching in microgravity. We have added these analyses to the Supplementary Materials and expanded the Discussion to address this point.

      (2) The main strength of the article is the presence of directional effects expected under the hypothesis of mass estimation error. However, the article lacks a clear demonstration of such an effect: indeed, although there appears to be a significant effect of direction, I was not sure that this effect matched the model's predictions. A directional effect is not sufficient because the model makes clear quantitative predictions about how this effect should vary across directions. In the absence of a quantitative match between the model and the data, the authors' claims regarding the role of misestimating the effective mass remain unsupported.

      Response (2): First, we have to clarify that our study does not aim to quantitatively fit the observed hand trajectories. The two-link arm model simulates an ideal case of moving a point mass (effective mass) on a horizontal plane without friction (Todorov, 2004; 2005). In contrast, in the experiment, participants moved their hand on a tabletop without vertical arm support, so the movement was not strictly planar and was affected by friction. Thus, this kind of model can only illustrate qualitative differences between conditions, as in the majority of similar modeling studies (e.g., Shadmehr et al., 2016). In our study, qualitative simulation means the model is intended to reproduce the directional differences between conditions, not exact numeric values, in key kinematic measures. Specifically, it should capture how the peak velocity and acceleration amplitudes and their timings differ between normal gravity and microgravity (particularly under the mass-underestimation assumption).

      Second, the reviewer rightfully pointed out that the directional effect is essential for our theorization of the importance of mass underestimation. However, the directional effect has two aspects, which were not clearly presented in our original manuscript. We now clarify both here and in the revision. The first aspect is that key kinematic variables (peak velocity/acceleration and their timing) are affected by movement direction, even before any potential microgravity effect. This is shown by the ranking order of directions for these variables (Figure 1C-H). The direction-dependent ranking, confirmed by pre-flight data, indicates that effective mass is a determining factor for reaching kinematics, which motivated us to study its role in eliciting movement slowing in space. This was what our original manuscript emphasized and clearly presented.

      The second aspect is that the hypothetical mass underestimation might also differentially affect movements in different directions. This was not clearly presented in the original manuscript. However, we would not expect a quantitative match between model predictions and empirical data, for the reasons mentioned above. We now show this directional ranking in microgravity-elicited kinematic changes in both model simulations and empirical data. The overall trend is that the microgravity effect indeed differs between directions, and the model predictions and the data showed a reasonable qualitative match (Author response image 1 below).

      As shown in Author response image 1, for amplitude changes (Δ peak speed, Δ peak acceleration) both the model and the mean of the empirical data show the same directional ordering (45° > 90° > 135°) in pre-in and post-in comparisons. For timing (Δ peak-speed time, Δ peak-acceleration time), which we consider the most diagnostic, the same directional ranking was observed. We found only one deviation: the predicted sign (earlier peaks) was confirmed at 90° and 135°, but not at 45°. As discussed in Response (6), the absence of timing advance at 45° may reflect limitations of our simplified model, which did not consider that the 45° direction is essentially a single-joint reach. Taken together, the directional pattern is largely consistent with the model predictions based on mass underestimation. The model successfully reproduces the directional ordering of the amplitude measures (peak velocity and peak acceleration). It also captures the sign of the timing changes in two out of the three directions. We added these new analysis results in the revision and expanded the Discussion accordingly.

      The details of our analysis on directional effects: We compared the model predictions (Author response image 1, left) with the experimental data (Author response image 1, right) across the three tested directions (45°, 90°, 135°). In the experimental data panels, both Δ(pre-in) (solid bars) and Δ(post-in) (semi-transparent bars) with standard error are shown. The directional trends are remarkably similar between model prediction and actual data. The post-in comparison is less aligned with model prediction; we postulate that the incomplete after-flight recovery (i.e., post data had not returned to pre-flight baselines) might obscure the microgravity effect. Incomplete recovery has also been shown in our original manuscript: peak speed and peak acceleration did not fully recover in post-flight sessions when compared to pre-flight sessions. To further quantify the correspondence between model and data, we performed repeated-measures correlation (rm-corr) analyses. We found significant within-subject correlations for three of the four metrics. For pre–in, Δ peak speed time (r_rm = 0.627, t(23) = 3.858, p < 0.001), Δ peak acceleration time (r_rm = 0.591, t(23) = 3.513, p = 0.002), and Δ peak acceleration (r_rm = 0.573, t(23) = 3.351, p = 0.003) were significant, whereas Δ peak speed was not (r_rm = 0.334, t(23) = 1.696, p = 0.103). These results thus show that the directional effect, as predicted by our model, is observed both before spaceflight and in spaceflight (the pre-in comparison).

      Author response image 1.

      Directional comparison between model predictions and experimental data across the three reach directions (45°, 90°, 135°). Left: model outputs. Right: experimental data shown as Δ relative to the in-flight session; solid bars = Δ(in − pre) and semi-transparent bars = Δ(in − post). Colors encode direction consistently across panels (e.g., 45° = darker hue, 90° = medium, 135° = lighter/orange). Panels (clockwise from top-left): Δ peak speed (cm/s), Δ peak speed time (ms), Δ peak acceleration time (ms), and Δ peak acceleration (cm/s²). Bars are group means; error bars denote standard error across participants.

      Citations:

      Todorov, E. (2004). Optimality principles in sensorimotor control. Nature Neuroscience, 7(9), 907.

      Todorov, E. (2005). Stochastic optimal control and estimation methods adapted to the noise characteristics of the sensorimotor system. Neural Computation, 17(5), 1084–1108.

      Shadmehr, R., Huang, H. J., & Ahmed, A. A. (2016). A Representation of Effort in Decision-Making and Motor Control. Current Biology, 26(14), 1929–1934.

      In general, both the hypotheses of slowing motion (out of caution) and misestimating mass have been put forward in the past, and the added value of this article lies in demonstrating that the effect depended on direction. However, (1) a conservative strategy with a different cost function can also explain the data, and (2) the quantitative match between the directional effect and the model's predictions has not been established.

      We agree that both hypotheses have been put forward before; however, they are competing hypotheses that have not been resolved. Furthermore, the mass underestimation hypothesis has so far been a conjecture without solid evidence; previous reports on underestimation of object mass cannot be directly translated to underestimation of body mass. As detailed in our responses above, we have shown that a conservative strategy implemented via a different cost function cannot reproduce the key findings in our dataset, thereby supporting the alternative hypothesis of mass underestimation. Moreover, we found qualitative agreement between the model predictions and the experimental data in terms of directional effects, which further strengthens our interpretation.

      Specific points:

      (1) I noted a lack of presentation of raw kinematic traces, which would be necessary to convince me that the directional effect was related to effective mass as stated.

      Response (3): We are happy to include exemplary speed and acceleration trajectories. Kinematic profiles from one example participant are shown in Figure 2—figure supplement 6.

      (2) The presentation and justification of the model require substantial improvement; the reason for their presence in the supplementary material is unclear, as there is space to present the modelling work in detail in the main text. Regarding the model, some choices require justification: for example, why did the authors ignore the nonlinear Coriolis and centripetal terms?

      Response (4): Great suggestion. In the revision, we have moved the model into the main text and added further justification for using this simple model.

      We initially omitted the nonlinear Coriolis and centripetal terms in order to start with a minimal model. Importantly, excluding these terms does not affect the model’s main conclusions. In the revision we added simulations that explicitly include these terms. The full explanation and simulations are provided in Supplementary Note 2 (we placed this material in the Supplementary to limit the amount of main text devoted to the model). More explanations can also be found in our response to Reviewer 2 (Response (6)). The results indicate that, although these velocity-dependent forces show some directional anisotropy, their contribution is substantially smaller than that of the included inertial component; specifically, they have only a negligible impact on the predicted peak amplitudes and peak times.

      (3) The increase in the proportion of trials with subcomponents is interesting, but the explanatory power of this observation is limited, as the initial percentage was already quite high (from 60-70% during the initial study to 70-85% in flight). This suggests that the potential effect of effective mass only explains a small increase in a trend already present in the initial study. A more critical assessment of this result is warranted.

      Response (5): Thank you for your thoughtful comment. You are correct that the increase in the percentage of trials with submovements is modest, but a more critical change was observed in the timing between submovement peaks—specifically, the inter-peak interval (IPI). These intervals became longer during flight. Taken together with the percentage increase, the submovement changes significantly predicted the increase in movement duration, as shown by our linear mixed-effects model, which indicated that IPI increased.

      Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model tends to add confidence to the proposed conclusions. That being said, I have several comments that could be addressed to consolidate interpretations and improve clarity.

      Main comments:

      (1) Mass underestimation

      a) While this interpretation is supported by data and analyses, it is not clear whether this gives a complete picture of the underlying phenomena. The two hypotheses (i.e., mass underestimation vs deliberate speed reduction) can only be distinguished in terms of velocity/acceleration patterns, which should display specific changes during the flight with a mass underestimation. The experimental data generally shows the expected changes but for the 45° condition, no changes are observed during flight compared to the pre- and post-phases (Figure 4). In Figure 5E, only a change in the primary submovement peak velocity is observed for 45°, but this finding relies on a more involved decomposition procedure. It suggests that there is something specific about 45° (beyond its low effective mass). In such planar movements, 45° often corresponds to a movement which is close to single-joint, whereas 90° and 135° involve multi-joint movements. If so, the increased proportion of submovements in 90° and 135° could indicate that participants had more difficulties in coordinating multi-joint movements during flight. Besides inertia, Coriolis and centripetal effects may be non-negligible in such fast planar reaching (Hollerbach & Flash, Biol Cyber, 1982) and, interestingly, they would also be affected by a mass underestimation (thus, this is not necessarily incompatible with the author's view; yet predicting the effects of a mass underestimation on Coriolis/centripetal torques would require a two-link arm model). Overall, I found the discrepancy between the 45° direction and the other directions under-exploited in the current version of the article. In sum, could the corrective submovements be due to a misestimation of Coriolis/centripetal torques in the multi-joint dynamics (caused specifically -or not- by a mass underestimation)?

      Response (6): Thank you for raising these important questions. We unpacked the whole paragraph into two concerns: 1) the possibility that misestimation of Coriolis and centripetal torques might lead to corrective submovements, and 2) the under-exploited weak effect in the 45° direction. These two concerns are valid but addressable, and they do not change our general conclusions based on our empirical findings (see Supplementary Note 2: Coriolis and centripetal torques have minimal impact).

      Possible explanation for the 45° discrepancy

      We agree with the reviewer that the 45° direction likely involves more single-joint (elbow-dominant) movement, whereas the 90° and 135° directions require greater multi-joint (elbow + shoulder) coordination. This is particularly relevant when the workspace is near the body midline (e.g., Haggard & Richardson, 1995), as is the case in our experimental setup. To demonstrate this, we examined the curvature of the hand trajectories across directions. Using cumulative curvature (positive = counterclockwise), we obtained average values of 6.484° ± 0.841°, 1.539° ± 0.462°, and 2.819° ± 0.538° for the 45°, 90°, and 135° directions, respectively. The significantly larger curvature in the 45° condition suggests that these movements deviate more from a straight-line path, a hallmark of more elbow-dominant movements.
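      The exact definition of cumulative curvature is not spelled out here. As an illustration only, one plausible reading is the sum of signed turning angles between successive displacement vectors of the hand path, with counterclockwise turns counted as positive. The sketch below implements that assumed definition; the function name and the example paths are hypothetical.

```python
import math


def cumulative_curvature_deg(points):
    """Sum of signed turning angles (in degrees) along a 2D path.

    `points` is a sequence of (x, y) samples. Counterclockwise turns
    are positive, clockwise turns are negative.
    """
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        ax, ay = x1 - x0, y1 - y0  # previous displacement
        bx, by = x2 - x1, y2 - y1  # next displacement
        cross = ax * by - ay * bx
        dot = ax * bx + ay * by
        # atan2(cross, dot) is the signed angle from a to b.
        total += math.atan2(cross, dot)
    return math.degrees(total)


# A straight 12 cm reach has zero cumulative curvature.
straight = [(0.0, 0.01 * i) for i in range(13)]

# A counterclockwise quarter circle sampled every degree: 90 segments,
# hence 89 turns of 1 degree each.
arc = [(math.cos(math.radians(a)), math.sin(math.radians(a)))
       for a in range(0, 91)]
```

      With this definition a perfectly straight reach gives zero, and traversing the same arc in the opposite direction flips the sign.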

      Importantly, this curvature pattern was present in both the pre-flight and in-flight phases, indicating that it is a general movement characteristic rather than a microgravity-induced effect. Thus, the 45° reaches are less suitable for modeling with a simplified two-link arm model compared to the other two directions. We believe this is the main reason why the model predictions based on effective mass become less consistent with the empirical data for the 45° direction.

      We have now incorporated this new analysis in the Results and discussed it in the revised Discussion.

      Citation: Haggard, P., Hutchinson, K., & Stein, J. (1995). Patterns of coordinated multi-joint movement. Experimental Brain Research, 107(2), 254-266.

      b) Additionally, since the taikonauts are tested after 2 or 3 weeks in flight, one could also assume that neuromuscular deconditioning explains (at least in part) the general decrease in movement speed. Can the authors explain how to rule out this alternative interpretation? For instance, weaker muscles could account for slower movements within a classical time-effort trade-off (as more neural effort would be needed to generate a similar amount of muscle force, thereby suggesting a purposive slowing down of movement). Therefore, could some neuromuscular deconditioning combined with a difficulty in coordinating multi-joint movements in weightlessness (due to a misestimation of Coriolis/centripetal torques) provide an alternative explanation for the observed results (slowing down + more submovements)?

      Response (7): Neuromuscular deconditioning is indeed an effect of spaceflight; thank you for bringing this up, as we omitted discussion of this confound in our original manuscript. A prolonged stay in microgravity can reduce muscle strength, but this is mostly limited to the lower limbs. For example, a recent well-designed large-sample study showed that while lower-leg muscles had significant strength reductions, no changes in mean upper-body strength were found (Scott et al., 2023), consistent with previous propositions that muscle weakness is less pronounced for upper-limb muscles than for postural and lower-limb muscles (Tesch et al., 2005). Furthermore, muscle weakness is unlikely to play a major role here since our reaching task involves small movements (~12 cm) with joint torques of ~2 N·m. Of course, we cannot completely rule out a contribution of muscle weakness; we can only postulate, based on the task itself (12 cm reaching) and the systematic microgravity effects (the increase in submovements, the increase in inter-submovement intervals, and their significant prediction of movement slowing), that muscle weakness is unlikely to be a major contributor to the movement slowing.

      The reviewer suggests that poor coordination in microgravity might contribute to slowing down + more submovements. This is also a possibility, but we did not find evidence to support it. First, there is no clear evidence or reports about poor coordination for simple upper-limb movements like reaching investigated here. Note that reaching or aiming movement is one of the most studied tasks among astronauts. Second, we further analyzed our reaching trajectories and found no sign of curvature increase, a hallmark of poor coordination of Coriolis/centripetal torques, in our large collection of reaching movements. We probably have the largest dataset of reaching movements collected in microgravity thus far, given that we had 12 taikonauts and each of them performed about 480 to 840 reaching trials during their spaceflight. We believe the probability of Type II error is quite low here.

      Citation: Tesch, P. A., Berg, H. E., Bring, D., Evans, H. J., & LeBlanc, A. D. (2005). Effects of 17-day spaceflight on knee extensor muscle function and size. European journal of applied physiology, 93(4), 463-468.

      Scott, J., Feiveson, A., English, K., et al. (2023). Effects of exercise countermeasures on multisystem function in long duration spaceflight astronauts. npj Microgravity, 9(11).

      (2) Modelling

      a) The model description should be improved as it is currently a mix of discrete time and continuous time formulations. Moreover, an infinite-horizon cost function is used, but I thought the authors used a finite-horizon formulation with the prefixed duration provided by the movement utility maximization framework of Shadmehr et al. (Curr Biol, 2016). Furthermore, was the mass underestimation reflected both in the utility model and the optimal control model? If so, did the authors really compute the feedback control gain with the underestimated mass but simulate the system with the real mass? This is important because the mass appears both in the utility framework and in the LQ framework. Given the current interpretations, the feedforward command is assumed to be erroneous, and the feedback command would allow for motor corrections. Therefore, it could be clarified whether the feedback command also misestimates the mass or not, which may affect its efficiency. For instance, if both feedforward and feedback motor commands are based on wrong internal models (e.g., due to the mass underestimation), one may wonder how the astronauts would execute accurate goal-directed movements.

      b) The model seems to be deterministic in its current form (no motor and sensory noise). Since the framework developed by Todorov (2005) is used, sensorimotor noise could have been readily considered. One could also assume that motor and sensory noise increase in microgravity, and the model could inform on how microgravity affects the number of submovements or endpoint variance due to sensorimotor noise changes, for instance.

      c) Finally, how does the model distinguish the feedforward and feedback components of the motor command that are discussed in the paper, given that the model only yields a feedback control law? Does 'feedforward' refer to the motor plan here (i.e., the prefixed duration and arguably the precomputed feedback gain)?

      Response (8): We thank the reviewer for raising these important and technically insightful points regarding our modeling framework. We first clarify the structure of the model and key assumptions, and then address the specific questions in points (a)–(c) below.

      We used Todorov’s (2005) stochastic optimal control method to compute a finite-horizon LQG policy under sensory noise and signal-dependent motor noise (state noise set to zero). The cost function is given in the updated Methods. The resulting time-varying gains {L<sub>k</sub>, K<sub>k</sub>} correspond to the feedforward mapping and the feedback correction gain, respectively.

      The control law is expressed in terms of the control input u<sub>k</sub>, the nominal planned state, the estimated state, the feedforward (nominal) control L<sub>k</sub> associated with the planned trajectory, and the time-varying feedback gain K<sub>k</sub> that corrects deviations from the plan.
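      As a sketch only, a control law consistent with these verbal definitions could be written as follows. The symbols $\bar{x}_k$ (nominal planned state) and $\hat{x}_k$ (estimated state), and the sign convention, are assumptions made for illustration and may differ from the notation in the revised manuscript.

```latex
u_k = L_k\,\bar{x}_k + K_k\left(\hat{x}_k - \bar{x}_k\right)
```

      Under this assumed form, disabling feedback corrections amounts to dropping the second term, leaving the nominal feedforward command $u_k = L_k\,\bar{x}_k$.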

      To define the motor plan for comparison with behavior, we simulate the deterministic open-loop trajectory by turning off noise and disabling feedback corrections. In this framework, “feedforward” refers to this nominal motor plan. Thus, sensory and signal-dependent noise influence the computed policy (via the gains), but are not injected when generating the nominal trajectory. This mirrors the minimum-jerk practice used to obtain nominal kinematics in prior utility-based work (Shadmehr et al., 2016), while optimal control provides a more physiologically grounded nominal plan. In the revision, we have updated the equations, provided more modeling details, and moved the model description to the main text to reduce possible confusion.

      In the implementation of the “mass underestimation” condition, the mass used to compute the policy is the underestimated mass, whereas the actual mass is used when simulating the feedforward trajectories. Corrective submovements are analyzed separately and are not required for the planning-deficit findings reported here.
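      The mismatch between the planning mass and the simulated mass can be illustrated with a much simpler stand-in than the LQG model. The sketch below plans an open-loop force profile for a point mass from a minimum-jerk acceleration profile (not the authors' optimal-control plan) and then applies that force to the true mass. The masses (1.0 and 0.8, arbitrary units) are hypothetical; the 12 cm distance and roughly 350 ms duration are taken from the text.

```python
def min_jerk_accel(t, distance, duration):
    """Acceleration of a minimum-jerk reach at time t."""
    tau = t / duration
    return distance / duration**2 * (60 * tau - 180 * tau**2 + 120 * tau**3)


def simulate_open_loop(plan_mass, true_mass, distance=0.12, duration=0.35,
                       dt=1e-4):
    """Plan forces for `plan_mass`, then apply them to `true_mass`.

    Returns (peak_speed, final_position). No feedback and no noise are
    included, so this captures the feedforward part only.
    """
    n_steps = int(round(duration / dt))
    speed = position = peak_speed = 0.0
    for i in range(n_steps):
        t = (i + 0.5) * dt  # evaluate the plan at the step midpoint
        force = plan_mass * min_jerk_accel(t, distance, duration)
        speed += force / true_mass * dt  # the true mass sets the dynamics
        position += speed * dt
        peak_speed = max(peak_speed, speed)
    return peak_speed, position


# Correct internal model versus a hypothetical 20% mass underestimation.
peak_ok, end_ok = simulate_open_loop(plan_mass=1.0, true_mass=1.0)
peak_under, end_under = simulate_open_loop(plan_mass=0.8, true_mass=1.0)
```

      In this sketch the planned forces are too small for the true mass, so peak speed and displacement scale by the ratio of planned to true mass, leaving an undershoot that later corrections would have to cover. Because the movement duration is fixed here, the sketch cannot reproduce the timing shifts, which in the model described above arise from also using the underestimated mass to set the planned duration.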

      Answers to the three specific questions:

      a) We mistakenly wrote a continuous-time infinite-horizon cost function in our original manuscript, whereas our controller is actually implemented as a discrete-time finite-horizon LQG with a terminal cost, over a horizon set by the utility-based optimal movement duration T<sub>opt</sub>. The underestimated mass is used in both the utility model (to determine T<sub>opt</sub>) and in the control computation (i.e., internal model), while the true mass is used when simulating the movement. This mismatch captures the central idea of feedforward planning based on an incorrect internal model.

      b) As described, our model includes signal-dependent motor noise and sensory noise, following Todorov (2005). We also evaluated whether increased noise levels in microgravity could account for the observed behavioral changes. Simulation results showed that increasing either source of noise did not alter the main conclusions or reverse the trends in our key metrics. Moreover, our experimental data showed no significant increase in endpoint variability in microgravity (see analyses and results in Figure 2—figure supplement 3 & 4), making it unlikely that increased sensorimotor noise alone accounts for the observed slowing and submovement changes.

      c) In our framework, the time-varying gains {L<sub>k</sub>, K<sub>k</sub>} define the feedforward and feedback components of the control policy. While both gains are computed based on a stochastic optimal control formulation (including noise), for comparison with behavior we simulate only the nominal feedforward plan, by turning off both noise and feedback. This defines a deterministic open-loop trajectory, which we use to capture planning-level effects such as peak timing shifts under mass underestimation. Feedback corrections via gains exist in the full model but are not involved in these specific analyses. We clarified this modeling choice and its behavioral relevance in the revised text.

      We have updated the equations and moved the model description into the main text in the revised manuscript to avoid confusion.

      (3) Brevity of movements and speed-accuracy trade-off

      The tested movements are much faster (average duration approx. 350 ms) than similar self-paced movements that have been studied in other works (e.g., Wang et al., J Neurophysiology, 2016; Berret et al., PLOS Comp Biol, 2021, where movements can last about 900-1000 ms). This is consistent with the instructions to reach quickly and accurately, in line with a speed-accuracy trade-off. Was this instruction given to highlight the inertial effects related to the arm's anisotropy? One may however, wonder if the same results would hold for slower self-paced movements (are they also with reduced speed compared to Earth performance?). Moreover, a few other important questions might need to be addressed for completeness: how to ensure that astronauts did remember this instruction during the flight? (could the control group move faster because they better remembered the instruction?). Did the taikonauts perform the experiment on their own during the flight, or did one taikonaut assume the role of the experimenter?

      Response (9): Thanks for highlighting the brevity of movements in our experiment. Our intention in emphasizing fast movements is to rigorously test whether movement is indeed slowed down in microgravity. The observed prolonged movement duration clearly shows that microgravity affects people’s movement duration, even when they are pushed to move fast. The second reason for using fast movement is to highlight that feedforward control is affected in microgravity. Mass underestimation specifically affects feedforward control in the first place, shown by the microgravity-related changes in peak velocity/acceleration. Slow movement would inevitably have online corrections that might obscure the effect of mass underestimation. Note that movement slowing is not only observed in our speed-emphasized reaching task, but also in whole-arm pointing in other astronauts’ studies (Berger, 1997; Sangals, 1999), which have been quoted in our paper. We thus believe these findings are generalizable.

      Regarding the consistency of instructions: all our experiments conducted in the Tiangong space station were monitored in real time by experimenters in the control center located in Beijing. The task instructions were presented on the initial display of the data acquisition application and ample reading time was allowed. All the pre-, in-, and post-flight test sessions were administered by the same group of personnel with the same instructions. It is common for astronauts to serve as both participants and experimenters, and they were well trained for this type of role on the ground. Note that we had multiple pre-flight test sessions to familiarize them with the task. All these rigorous measures were in place to obtain high-quality data. In the revision, we included these experimental details for readers who are not familiar with space studies, and provided the rationale for emphasizing fast movements.

      Citations:

      Berger, M., Mescheriakov, S., Molokanova, E., Lechner-Steinleitner, S., Seguer, N., & Kozlovskaya, I. (1997). Pointing arm movements in short- and long-term spaceflights. Aviation, Space, and Environmental Medicine, 68(9), 781–787.

      Sangals, J., Heuer, H., Manzey, D., & Lorenz, B. (1999). Changed visuomotor transformations during and after prolonged microgravity. Experimental Brain Research, 129(3), 378–390.

      (4) No learning effect

      This is a surprising effect, as mentioned by the authors. Other studies conducted in microgravity have indeed revealed an optimal adaptation of motor patterns in a few dozen trials (e.g., Gaveau et al., eLife, 2016). Perhaps the difference is again related to single-joint versus multi-joint movements. This should be better discussed given the impact of this claim. Typically, why would a "sensory bias of bodily property" persist in microgravity and be a "fundamental constraint of the sensorimotor system"?

      Response (10): We believe that the presence or absence of adaptation between our study and Gaveau et al.’s study cannot be simply attributed to single-joint versus multi-joint movements. Their adaptation concerned incorporating microgravity into movement control to minimize effort, whereas ours concerned accurately perceiving body mass. Gaveau et al.’s task involved large-amplitude vertical reaching, a scenario in which gravity strongly affects joint torques and movement execution. Thus, adaptation to microgravity can lead to better execution, providing a strong incentive for learning. By contrast, our task consisted of small-amplitude horizontal movements, where the gravitational influence on biomechanics is minimal.

      More importantly, we believe the lack of adaptation for mass underestimation is not totally surprising. When an inertial change is perceived (such as an extra weight attached to the forearm, as in previous motor adaptation studies), people can adapt their reaching within tens of trials. In that case, sensory cues are veridical, as they correctly signal the inertial perturbation. However, in microgravity, reduced gravitational pull and proprioceptive inputs constantly inform the controller that the body mass is less than its actual magnitude. In other words, sensory cues in space are misleading for estimating body mass. The resulting sensory bias prevents the sensorimotor system from adapting. Our initial explanation on this matter was too brief; we expanded it in the revised Discussion.

      Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude for the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited, and the manuscript is well written.

      Weaknesses:

      Nevertheless, I am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      First, I would like to point out an apparent (at least to me) divergence between the predictions and the observed data. Figures 1 and S1 show that the difference between predicted values for the 3 movement directions is almost linear, with predictions for 90º midway between predictions for 45º and 135º. The effective mass at 90º appears to be much closer to that of 45º than to that of 135º (Figure S1A). But the data shown in Figure 2 and Figure 3 indicate that movements at 90º and 135º are grouped together in terms of reaction time, movement duration, and peak acceleration, while both differ significantly from those values for movements at 45º.

      Furthermore, in Figure 4, the change in peak acceleration time and relative time to peak acceleration between 1g and 0g appears to be greater for 90º than for 135º, which appears to me to be at least superficially in contradiction with the predictions from Figure S1. If the effective mass is the key parameter, wouldn't one expect as much difference between 90º and 135º as between 90º and 45º? It is true that peak speed (Figure 3B) and peak speed time (Figure 4B) appear to follow the ordering according to effective mass, but is there a mathematical explanation as to why the ordering is respected for velocity but not acceleration? These inconsistencies weaken the author's conclusions and should be addressed.

      Response (11): Indeed, the model predicts an almost equal separation between 45° and 90° and between 90° and 135°, while the data indicate that the spacing between 45° and 90° is much smaller than between 90° and 135°. We do not regard this divergence as evidence undermining our main conclusion, for two reasons. 1) The model is a simplification of the actual situation. For example, the model simulates an ideal case of moving a point mass (effective mass) without friction and without considering Coriolis and centripetal torques. 2) Our study does not make quantitative predictions of all the key kinematic measures; that would require model fitting, parameter estimation, and posture-constrained reaching experiments. Instead, our study uses well-established (though simplified) models to qualitatively predict the overall behavioral pattern we would observe. For this purpose, our results are well in line with our expectations: though we did not find equal spacing between direction conditions, we do confirm that the key kinematic measures (Figure 2 and Figure 3, as questioned) show consistent directional trends between model predictions and empirical data. We added new analysis results on this matter: the directional effect we observed (how the key measures changed in microgravity across direction conditions) is significantly correlated with our model predictions in most cases. Please check our detailed response (2) above. These results are also added in the revision.

      We also highlight in the revision that the purpose of our modeling is not to quantitatively predict reaching behaviors in space, but to show qualitatively that mass underestimation and the conservative control strategy lead to divergent predictions about key kinematic measures of fast reaching.
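      For readers unfamiliar with the notion of effective mass, the sketch below computes the direction-dependent effective mass at the hand of a planar two-link arm from the joint-space inertia matrix and the hand Jacobian. It illustrates the general technique only; all segment parameters, the posture, and the direction convention are hypothetical and are not the values used in the manuscript.

```python
import math


def effective_mass(direction_deg, q1, q2, l1=0.30, l2=0.33, m1=1.9, m2=1.5,
                   r1=0.13, r2=0.16, i1=0.02, i2=0.03):
    """Effective mass at the hand of a planar two-link arm.

    q1, q2: shoulder and elbow angles (rad). l: segment lengths (m),
    m: segment masses (kg), r: distances to segment centers of mass (m),
    i: segment moments of inertia (kg m^2). All defaults are hypothetical.
    """
    # Joint-space inertia matrix M(q) of the two-link arm.
    c2 = math.cos(q2)
    m11 = i1 + i2 + m1 * r1**2 + m2 * (l1**2 + r2**2 + 2 * l1 * r2 * c2)
    m12 = i2 + m2 * (r2**2 + l1 * r2 * c2)
    m22 = i2 + m2 * r2**2
    det = m11 * m22 - m12**2
    a11, a12, a22 = m22 / det, -m12 / det, m11 / det  # entries of M^-1

    # Hand Jacobian J(q).
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    j11, j12 = -l1 * s1 - l2 * s12, -l2 * s12
    j21, j22 = l1 * c1 + l2 * c12, l2 * c12

    # Unit vector u along the movement direction, and w = J^T u.
    ux = math.cos(math.radians(direction_deg))
    uy = math.sin(math.radians(direction_deg))
    w1 = j11 * ux + j21 * uy
    w2 = j12 * ux + j22 * uy

    # Effective mass = 1 / (u^T J M^-1 J^T u).
    mobility = a11 * w1 * w1 + 2 * a12 * w1 * w2 + a22 * w2 * w2
    return 1.0 / mobility


# Hypothetical posture: shoulder at 45 degrees, elbow flexed by 90 degrees.
posture = (math.radians(45), math.radians(90))
masses = {d: effective_mass(d, *posture) for d in (45, 90, 135)}
```

      The effective mass depends on both posture and direction, which is why equal angular spacing between directions need not translate into equal spacing in effective mass.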

      Then, to strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treat the arm as a second-order low-pass filter (Equation 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback, and other parameters. Indeed, Fisk et al. showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth, and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limbs' damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for unadapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      Response (12): We agree that muscle properties, tonic excitation level, and proprioception-mediated reflexes all contribute to reaching control. Fisk et al.’s (1993) study indeed showed that arm movement kinematics change, possibly owing to lower muscle tone and/or damping. However, reduced muscle damping and reduced spindle activity are more likely to affect feedback-based movements. In Fisk et al.’s study, people performed continuous arm movements with eyes closed; thus their movements largely relied on proprioceptive control. Our major findings concern feedforward control, i.e., the reduced and “advanced” peak velocity/acceleration in discrete, ballistic reaching movements. Note that peak acceleration occurs as early as approximately 90-100 ms into the movement, clearly showing that feedforward control is affected, a different effect from Fisk et al.’s findings. It is unlikely that people “advanced” their peak velocity/acceleration because they anticipated a need for more corrective movements later. Thus, underestimation of body mass remains the most plausible explanation.

      (2) The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact are expected to be quite different than those on the ground. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth, gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors. Is there some way to discount or control for these potential effects?

      Response (13): We agree that friction might play a role here, but normal interaction with a touch screen typically involves friction forces between 0.1 N and 0.5 N (e.g., Ayyildiz et al., 2018). We believe that the directional variation of the friction is even smaller than 0.1 N, which is very small compared with the force used to accelerate the arm for the reaching movement (10-15 N). Thus, friction anisotropy is unlikely to explain our data. Because our readers might have the same concern, we have added some discussion of the possible effect of friction.

      Citation: Ayyildiz M, Scaraggi M, Sirin O, Basdogan C, Persson BNJ. Contact mechanics between the human finger and a touchscreen under electroadhesion. Proc Natl Acad Sci U S A. 2018 Dec 11;115(50):12668-12673.

      (3) The carefully crafted modelling of the limb neglects, nevertheless, the potential instability of the base of the arm. While the taikonauts were able to use their left arm to stabilize their bodies, it is not clear to what extent active stabilization with the contralateral limb can reproduce the stability of the human body seated in a chair in Earth gravity. Unintended motion of the shoulder could account for a smaller-than-expected displacement of the hand in response to the initial feedforward command and/or greater propensity for errors (with a greater need for corrective submovements) in 0g. The direction of movement with respect to the anchoring point could lead to the dependence of the observed effects on movement direction. Could this be tested in some way, e.g., by testing subjects on the ground while standing on an unstable base of support or sitting on a swing, with the same requirement to stabilize the torso using the contralateral arm?

      Response (14): Body stabilization is always a challenge for human movement studies in space. We minimized its potential confounding effects by using left-hand grasping and foot straps for postural support throughout the experiment. We think shoulder instability is an unlikely explanation because unexpected shoulder instability should not affect the feedforward (early) part of the ballistic reaching movement: the reduced peak acceleration and its early peak were observed at about 90-100 ms after movement initiation. This effect is too early to be explained by an unexpected stability issue. This argument is now mentioned in the revised Discussion.

      The arguments for an underestimation of body mass would be strengthened if the authors could address these points in some way.

      Recommendations for the authors:

      Reviewing Editor Comments:

      General recommendation

      Overall, the reviewers agreed this is an interesting study with an original and strong approach. Nonetheless, there were significant weaknesses identified. The main criticism is that there is insufficient evidence for the claim that the movement slowing is due to mass underestimation, rather than other explanations for the increased feedback corrections. To bolster this claim, the reviewers have requested a deeper quantitative analysis of the directional effect and comparison to model predictions. They have also suggested that a 2-dof arm model could be used to predict how mass underestimation would influence multi-joint kinematics, and this should be compared to the data. Alternatively, or additionally, a control experiment could be performed (described in the reviews). We do realize that some of these options may not be feasible or practical. Ultimately, we leave it to you to determine how best to strengthen and solidify the argument for mass underestimation, rather than other causes.

      As an alternative approach, you could consider tempering the claim regarding mass underestimation and focus more on the result that slower movements in microgravity are not simply a feedforward, rescaling of the movement trajectories, but rather, have greater feedback corrections. In this case, the reviewers feel it would still be critical to explain and discuss potential reasons for the corrections beyond mass underestimation.

      We hope that these points are addressable, either with new analyses, experiments, or with a tempering of the claims. Addressing these points would help improve the eLife assessment.

      Reviewer #1 (Recommendations for the authors):

      (1) Move model descriptions to the main text to present modelling choices in more detail

      Response (15): Thank you for the suggestion. We have moved the model descriptions to the main text to present the modeling choices in more detail and to allow readers to better cross-reference the analyses.

      (2) Perform quantitative comparisons of the directional effect with the model's predictions, and add raw kinematic traces to illustrate the effect in more detail.

      Response (16): Thanks for the suggestion. We have added a figure showing raw kinematics from a representative participant; please refer to Response (2) above for the comparisons of the directional effect.

      (3) Explore the effect of varying cost parameters in addition to mass estimation error to estimate the proportion of data explained by the underestimation hypothesis.

      Response (17): Thank you for the suggestion. This has already been done; please see Response (1) above.

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1) It must be justified early on why reaction times are being analyzed in this work. I understood later that it is to rule out any global slowing down of behavioral responses in microgravity.

      Response (18): Exactly. The RT results show that there was no global slowing down. Contrary to the conservative-strategy hypothesis, taikonauts did not show generalized slowing; they actually had faster reaction times during spaceflight, which is incompatible with a generalized slowing strategy. Thanks for pointing this out; we now justify the RT analysis early in the text.

      (2) Since the results are presented before the methods, I suggest stressing from the beginning that the reaching task is performed on a tablet and mentioning the instructions given to the participants, to improve the reading experience. The "beep" and "no beep" conditions also arise without obvious justification while reading the paper.

      Response (19): Great suggestions. We now provide some experimental details and rationales at the beginning of the Results.

      (3) Figure 1C: The vel profiles are not returning to 0 at the end, why? Is it because the feedback gain is computed based on the underestimated mass or because a feedforward controller is applied here? Is it compatible with the experimental velocity traces?

      Response (20): Figure 1C shows the forward simulation under the optimal control policy. In our LQG formulation, the terminal velocity is softly penalized (finite weight) rather than hard-constrained to zero; with a fixed horizon, the optimal solution can therefore end with a small residual velocity.

      In the behavioral data, the hand does come to rest: this is achieved by corrective submovements during the homing phase.

      (4) Left-skewed -> I believe this is right-skewed since the peak velocity is earlier.

      Response (21): Yes, it should be right-skewed; thanks for pointing that out.

      (5) What was the acquisition frequency of the positional data points? (on the tablet).

      Response (22): The sampling frequency is 100 Hz. Thanks for pointing that out; we’ve added this information to the Methods.

      (6) Figure S1. The planned duration seems to be longer than in the experiment (it is more around 500 ms for the 135-degree direction in simulation versus less than 400 ms in the experiment). Why?

      Response (23): We apologize for a coding error that inadvertently multiplied the body-mass parameter by an extra factor, making the simulated mass too high. We have corrected the code, rerun the simulations, and updated Figures 1 and S1; all qualitative trends remain unchanged, and the revised movement durations (≈300–400 ms) are closer to the experimental values.

      (7) After Equation 13: "The control law is given by". This is not the control law, which should have a feedback form u=K*x in the LQ framework. This is just the dynamic equations for the auxiliary state and the force. Please double-check the model description.

      Response (24): Thank you for pointing this out. We have updated and refined all model equations and descriptions, and moved the model description from the Supplementary Materials to the main text; please see the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) I have a concern about the interpretation of the anisotropic "equivalent mass". From my understanding, the equivalent mass would be what an external actor would feel as an equivalent inertia if pushing on the end effector from the outside. But the CNS does not push on the arm with a pure force generator acting at the hand to effectuate movement. It applies torque around the joints by applying forces across joints with muscles, causing the links of the arm to rotate around the joints. If the analysis is carried out in joint space, is the effective rotational inertia of the arm also anisotropic with respect to the direction of the movement of the hand? In other words, can the authors reassure me that the simulations are equivalent to an underestimation of the rotational inertia of the links when applied to the joints of the limb? It could be that these are mathematically the same; I have not delved into the mathematics to convince myself either way. But I would appreciate it if the authors could reassure me on this point.

      Response (25): Thank you for raising this point. In our work, “equivalent mass” denotes the operational-space inertia projected along the hand-movement direction u, computed as:

      This formulation describes the effective mass perceived at the end effector along a given direction, and is standard in operational-space control.
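
For reference, the projected operational-space inertia is commonly written in the form below in the operational-space control literature. Apart from the direction $u$, the symbols are assumptions introduced for illustration: $J(q)$ denotes the arm Jacobian at joint configuration $q$, $M(q)$ the joint-space inertia matrix, and $\Lambda(q)$ the operational-space inertia matrix. The expression is a sketch of the standard definition and may differ in notation from the one used in the manuscript.

```latex
\Lambda(q) = \left( J(q)\, M(q)^{-1} J(q)^{\top} \right)^{-1},
\qquad
m_u = \left( u^{\top} \Lambda(q)^{-1} u \right)^{-1}
    = \frac{1}{u^{\top} J(q)\, M(q)^{-1} J(q)^{\top} u},
\qquad \lVert u \rVert = 1 .
```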

      Although the motor command may be coded as either joint torques or endpoint forces in the CNS, the resulting execution is equivalent whichever way it is specified, since force and torque are related through the arm Jacobian by $\tau = J^{\top} F$. For small excursions as investigated here, this makes the directional anisotropy in endpoint inertia consistent with the anisotropy of the effective joint-space inertia required to produce the same endpoint motion. Conceptually, therefore, our “mass underestimation” manipulation in operational space corresponds to underestimating the required joint-space inertia mapped through the Jacobian. Since our behavioral data are hand positions, the operational-space representation is the most direct and appropriate choice for modeling.
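
To make the directional dependence concrete, the following is a minimal sketch of how a direction-specific effective mass could be computed for a planar two-link arm, using the standard projection 1 / (uᵀ J M⁻¹ Jᵀ u). Everything numerical here is a hypothetical placeholder rather than a value from the manuscript: the segment masses, lengths, inertias, the posture, and all function names are assumptions, so the ordering of directions it produces should not be read as the authors' result.

```python
import math

# Hypothetical segment parameters for a planar two-link arm (not from the manuscript).
ARM = {
    "m1": 1.9, "m2": 1.5,      # segment masses (kg)
    "l1": 0.30, "l2": 0.33,    # segment lengths (m)
    "lc1": 0.13, "lc2": 0.16,  # distances from joint to segment center of mass (m)
    "i1": 0.014, "i2": 0.019,  # segment moments of inertia about the center of mass (kg m^2)
}

def inertia_matrix(q2, p=ARM):
    """Joint-space inertia matrix M(q); for a planar arm it depends only on the elbow angle."""
    c2 = math.cos(q2)
    m11 = (p["i1"] + p["i2"] + p["m1"] * p["lc1"] ** 2
           + p["m2"] * (p["l1"] ** 2 + p["lc2"] ** 2 + 2 * p["l1"] * p["lc2"] * c2))
    m12 = p["i2"] + p["m2"] * (p["lc2"] ** 2 + p["l1"] * p["lc2"] * c2)
    m22 = p["i2"] + p["m2"] * p["lc2"] ** 2
    return [[m11, m12], [m12, m22]]

def jacobian(q1, q2, p=ARM):
    """Hand Jacobian J(q) mapping joint velocities to endpoint velocity."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-p["l1"] * s1 - p["l2"] * s12, -p["l2"] * s12],
            [p["l1"] * c1 + p["l2"] * c12, p["l2"] * c12]]

def effective_mass(q1, q2, direction_deg, p=ARM):
    """Effective endpoint mass along a unit direction u: 1 / (u^T J M^-1 J^T u)."""
    M = inertia_matrix(q2, p)
    J = jacobian(q1, q2, p)
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    m_inv = [[M[1][1] / det, -M[0][1] / det],
             [-M[1][0] / det, M[0][0] / det]]
    theta = math.radians(direction_deg)
    u = (math.cos(theta), math.sin(theta))
    # v = J^T u, so that u^T J M^-1 J^T u = v^T M^-1 v
    v = (J[0][0] * u[0] + J[1][0] * u[1], J[0][1] * u[0] + J[1][1] * u[1])
    inv_mass = (v[0] * (m_inv[0][0] * v[0] + m_inv[0][1] * v[1])
                + v[1] * (m_inv[1][0] * v[0] + m_inv[1][1] * v[1]))
    return 1.0 / inv_mass

# Hypothetical posture: shoulder at 45 degrees, elbow flexed 90 degrees.
Q1, Q2 = math.radians(45.0), math.radians(90.0)
masses = {d: effective_mass(Q1, Q2, d) for d in (45, 90, 135)}
for direction, mass in masses.items():
    print(f"direction {direction:3d} deg: effective mass {mass:.3f} kg")
```

Because the projection is quadratic in u, reversing the direction leaves the effective mass unchanged, while rotating it changes the value whenever the arm is away from a singular posture.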

      (2) I would also like to suggest one more level of analysis to test their hypothesis. The authors decomposed the movements into submovements and measured the prevalence of corrective submovements in weightlessness vs. normal gravity. The increase in corrective submovements is consistent with the hypothesis of a misestimation of limb mass, leading to an unexpectedly smaller displacement due to the initial feedforward command, leading to the need for corrections, leading to an increased overall movement duration. According to this hypothesis, however, the initial submovement, while resulting in a smaller than expected displacement, should have the same duration as the analogous movements performed on Earth. The authors could check this by analyzing the duration of the extracted initial submovements.

      Response (26): We appreciate the reviewer’s suggestion regarding the analysis of the initial submovement duration. In our decomposition framework, each submovement is modeled as a symmetric log-normal (bell-shaped) component, such that the time to peak speed is always half of the component duration. Thus, the initial submovement duration is directly reflected in the initial submovement peak-speed time already reported in our original manuscript (Figure 5F).

      However, we respectfully disagree with the assumption that mass underestimation would necessarily yield the same submovement duration as on Earth. Under mass underestimation, the movement is effectively under-actuated, and the initial submovement can terminate prematurely, leading to a shorter duration. This is indeed what we observed in the data. Therefore, our reported metrics already address the reviewer’s proposal and support the conclusion that mass underestimation reduces the initial submovement duration in microgravity. Per your suggestion, we have added a sentence explaining to the reader that the initial submovement peak-speed time reflects the duration of the initial submovement.

      Some additional minor suggestions:

      (1) I believe that it is important to include the data from the control subjects, in some form, in the main article. Perhaps shading behind the main data from the taikonauts to show similarities or differences between groups. It is inconvenient to have to go to the supplementary material to compare the two groups, which is the main test of the experiment.

      Response (27): Thank you for the suggestion. For all the core performance variables, the control group showed flat patterns, with no changes across test sessions at all. Thus, including these figures (together with null statistical results) in the main text would obscure our central message, especially given the expanded length of the revised manuscript (we added model details and new analysis results). Instead, following eLife’s format, we have reorganized the Supplementary Material so that each experimental figure has a corresponding supplementary figure showing the control data. This way, readers can quickly locate the control results and directly compare them with the experimental data, while keeping the main text focused.

      (2) "Importantly, sensory estimate of bodily property in microgravity is biased but evaded from sensorimotor adaptation, calling for an extension of existing theories of motor learning." Perhaps "immune from" would be a better choice of words.

      Response (28): Thanks for the suggestion; we have edited our text accordingly.

      (3) "First, typical reaching movement exhibits a symmetrical bell-shaped speed profile, which minimizes energy expenditure while maximizing accuracy according to optimal control principles (Todorov, 2004)." While Todorov's analysis is interesting and well accepted, it might be worthwhile citing the original source on the phenomenon of bell-shaped velocity profiles that minimize jerk (derivative of acceleration) and therefore, in some sense, maximize smoothness. Flash and Hogan, 1985.

      Response (29): Thanks for the suggestion; we have added the citation for the minimum-jerk model (Flash and Hogan, 1985).

      (4) "Post-hoc analyses revealed slower reaction times for the 45° direction compared to both 90° (p < 0.001, d = 0.293) and 135° (p = 0.003, d = 0.284). Notably, reactions were faster during the in-flight phase compared to pre-flight (p = 0.037, d = 0.333), with no significant difference between in-flight and post-flight phases (p = 0.127)." What can one conclude from this?

      Response (30): Although these decreases reached statistical significance, their magnitudes were small. The parallel pattern across groups suggests the effect is not driven by microgravity, but is more plausibly a mild learning/practice effect. We now mention this in the Discussion.

      (5) "In line with predictions, peak acceleration appeared significantly earlier in the 45° direction than other directions (45° vs. 90°, p < 0.001, d = 0.304; 45° vs. 135°, p < 0.001, d = 0.271)." Which predictions? Because the effective mass is greater at 45º? Could you clarify the prediction?

      Response (31): We should have been more specific here; thank you for raising this. The predictions are those concerning peak acceleration timing (shown in Figure 1H). We have modified this sentence to read:

      “In line with model predictions (Figure 1H), ….”.

      (6) Figure 2: Why do 45º movements have longer reaction times but shorter movement durations?

      Response (32): We appreciate your careful reading of the results. We believe this is possibly due to flexible motor control across conditions and trials, i.e., people tend to move faster when they react more slowly. This is reflected in the across-direction comparisons (as spotted by the reviewer here), and it also holds within and across participants: for both groups, we found a significant negative correlation between movement duration (MD) and reaction time (RT), both across and within individuals (Figure 2—figure supplement 5). This finding indicates that participants moved faster when their RT was slower, and vice versa. This flexible motor adjustment, likely due to the task requirement for rapid movements, remained consistent during spaceflight.

    1. Author response:

      In response to the comments raised, we outline below the revisions we plan to strengthen the manuscript.

      First, we will expand the Introduction and Discussion sections to provide clearer comparison with prior experimental and computational studies of ectodomain tilting, MPER–TMD conformational heterogeneity, and membrane deformation, and to discuss how our simulations reproduce and extend these earlier observations.

      Second, we plan to add analyses that more directly assess the coupling between ectodomain and TMD motions. We will also revise the text to emphasize the limits imposed by sampling and model dependence and to discuss the potential benefits of enhanced sampling methods.

      Third, we will clarify the rationale for the chosen membrane composition and discuss how differences in lipid content between host plasma membranes and HIV virions may influence bilayer properties and Env dynamics.

      Fourth, we will supplement the Methods section to improve clarity and address issues of citation throughout the manuscript.

      Finally, we intend to deposit MD trajectories to a public research data repository to the extent permitted by available storage capacity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors present a novel usage of fluorescence lifetime imaging microscopy (FLIM) to measure NAD(P)H autofluorescence in the Drosophila brain, as a proxy for cellular metabolic/redox states. This new method relies on the fact that both NADH and NADPH are autofluorescent, with a different excitation lifetime depending on whether they are free (indicating glycolysis) or protein-bound (indicating oxidative phosphorylation). The authors successfully use this method in Drosophila to measure changes in metabolic activity across different areas of the fly brain, with a particular focus on the main center for associative memory: the mushroom body.

      Strengths:

      The authors have made a commendable effort to explain the technical aspects of the method in accessible language. This clarity will benefit both non-experts seeking to understand the methodology and researchers interested in applying FLIM to Drosophila in other contexts.

      Weaknesses:

      (1) Despite being statistically significant, the learning-induced change in f-free in α/β Kenyon cells is minimal (a decrease from 0.76 to 0.73, with a high variability). The authors should provide justification for why they believe this small effect represents a meaningful shift in neuronal metabolic state.

      We agree with the reviewer that the observed f_free shift averaged per individual, while statistically significant, is small. However, to our knowledge, this is the first study to investigate a physiological (i.e., not pharmacologically induced) variation in neuronal metabolism using FLIM. As such, there are no established expectations regarding the amplitude of the effect. In the revised manuscript, we have included an additional experiment involving the knockdown of ALAT in α/β Kenyon cells, which further supports our findings. We have also expanded the discussion to present two potential reasons why this effect may appear modest.

      (2) The lack of experiments examining the effects of long-term memory (after spaced or massed conditioning) seems like a missed opportunity. Such experiments could likely reveal more drastic changes in the metabolic profiles of KCs, as a consequence of memory consolidation processes.

      We agree with the reviewer that investigating the effects of long-term memory on metabolism represents a valuable avenue for future work. An intrinsic difficulty of autofluorescence measurement, however, is identifying the cellular origin of the observed changes. In this respect, long-term memory formation is not an ideal case study, as its essential feature is expected to be a metabolic activation localized to Kenyon cells’ axons in the mushroom body vertical lobes (as shown in Comyn et al., 2024), where many different neuron subtypes send intricate processes. This is why we chose to focus first on middle-term memory, where changes at the level of the cell bodies could be expected from our previous work (Rabah et al., 2022). Nevertheless, our pioneering exploration of the applicability of NAD(P)H FLIM to in vivo monitoring of brain metabolism now paves the way for extending it to other forms of memory.

      (3) The discussion is mostly just a summary of the findings. It would be useful if the authors could discuss potential future applications of their method and new research questions that it could help address.

      The discussion has been expanded by adding interpretations of the findings and remaining challenges.

      Reviewer #2 (Public review):

      This manuscript presents a compelling application of NAD(P)H fluorescence lifetime imaging (FLIM) to study metabolic activity in the Drosophila brain. The authors reveal regional differences in oxidative and glycolytic metabolism, with a particular focus on the mushroom body, a key structure involved in associative learning and memory. In particular, they identify metabolic shifts in α/β Kenyon cells following classical conditioning, consistent with their established role in energy-demanding middle- and long-term memories.

      These results highlight the potential of label-free FLIM for in-vivo neural circuit studies, providing a powerful complement to genetically encoded sensors. This study is well-conducted and employs rigorous analysis, including careful curve fitting and well-designed controls, to ensure the robustness of its findings. It should serve as a valuable technical reference for researchers interested in using FLIM to study neural metabolism in vivo. Overall, this work represents an important step in the application of FLIM to study the interactions between metabolic processes, neural activity, and cognitive function.

      Reviewer #3 (Public review):

      This study investigates the characteristics of the autofluorescence signal excited by 740 nm 2-photon excitation, in the range of 420-500 nm, across the Drosophila brain. The fluorescence lifetime (FL) appears bi-exponential, with a short 0.4 ns time constant followed by a longer decay. The lifetime decay and the resulting parameter fits vary across the brain. The resulting maps reveal anatomical landmarks, which simultaneous imaging of genetically encoded fluorescent proteins helps to identify. Past work has shown that the autofluorescence decay time course reflects the balance of the redox enzyme NAD(P)H vs. its protein-bound form. The ratio of free-to-bound NADPH is thought to indicate relative glycolysis vs. oxidative phosphorylation, and thus shifts in the free-to-bound ratio may indicate shifts in metabolic pathways. The basics of this measure have been demonstrated in other organisms, and this study is the first to use the FLIM module of the STELLARIS 8 FALCON microscope from Leica to measure autofluorescence lifetime in the brain of the fly. Methods include registering the brains of different flies to a common template and masking out anatomical regions of interest using fluorescence proteins.

      The analysis relies on fitting an FL decay model with two free parameters, f_free and t_bound. F_free is the fraction of the normalized curve contributed by a decaying exponential with a time constant of 0.4 ns, thought to represent the FL of free NADPH or NADH, which apparently cannot be distinguished. T_bound is the time constant of the second exponential, with scalar amplitude = (1-f_free). The T_bound fit is thought to represent the decay time constant of protein-bound NADPH but can differ depending on the protein. The study shows that across the brain, T_bound can range from 0 to >5 ns, whereas f_free can range from 0.5 to 0.9 (Figure 1a). These methods appear to be solid, the full range of fits are reported, including maximum likelihood quality parameters, and can be benchmarks for future studies.
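
The two-parameter decay model described above can be sketched as follows. This is a minimal illustration rather than the authors' fitting pipeline: the coarse grid-search least-squares fit, the time axis, the synthetic parameter values, and all function names are assumptions, and the sketch ignores the instrument response and photon-counting noise that a real FLIM fit would need to handle.

```python
import math

TAU_FREE_NS = 0.4  # fixed short lifetime attributed to free NAD(P)H (value quoted in the text)

def biexp_decay(t_ns, f_free, tau_bound_ns, tau_free_ns=TAU_FREE_NS):
    """Normalized decay: f_free weights the fast term, (1 - f_free) the slow term."""
    return (f_free * math.exp(-t_ns / tau_free_ns)
            + (1.0 - f_free) * math.exp(-t_ns / tau_bound_ns))

def fit_grid(times_ns, observed):
    """Hypothetical fit: coarse grid search for (f_free, tau_bound) minimizing squared error."""
    best = None
    for i in range(50, 91):        # f_free from 0.50 to 0.90 in steps of 0.01
        f_free = i / 100.0
        for j in range(5, 61):     # tau_bound from 0.5 to 6.0 ns in steps of 0.1 ns
            tau_bound = j / 10.0
            sse = sum((biexp_decay(t, f_free, tau_bound) - y) ** 2
                      for t, y in zip(times_ns, observed))
            if best is None or sse < best[0]:
                best = (sse, f_free, tau_bound)
    return best[1], best[2]

# Noise-free synthetic decay with made-up parameters, sampled every 0.1 ns.
times = [0.1 * k for k in range(125)]
synthetic = [biexp_decay(t, 0.75, 3.0) for t in times]
f_hat, tau_hat = fit_grid(times, synthetic)
print(f"recovered f_free = {f_hat:.2f}, tau_bound = {tau_hat:.1f} ns")
```

Because the two amplitudes sum to one, the curve is normalized to 1 at t = 0, which is why only two free parameters remain once the short lifetime is fixed.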

      The authors measure the properties of NADPH-related autofluorescence of Kenyon cells (KCs) of the fly mushroom body. The results from the three main figures are:

      (1) Somata and calyx of mushroom bodies have a longer average tau_bound than other regions (Figure 1e);

      (2) The f_free fit is higher for the calyx (input synapses) region than for KC somata (Figure 2b);

      (3) The average across flies of average f_free fits in alpha/beta KC somata decreases from 0.734 to 0.718.

      Based on the first two findings, an accurate title would be "Autofluorescence lifetime imaging reveals regional differences in NADPH state in Drosophila mushroom bodies."

      The third finding is the basis for the title of the paper and the support for this claim is unconvincing. First, the difference in alpha/beta f_free (p-value of 4.98E-2) is small compared to the measured difference in f_free between somas and calyces. It's smaller even than the difference in average soma f_free across datasets (Figure 2b vs c). The metric is also quite derived; first, the model is fit to each (binned) voxel, then the distribution across voxels is averaged and then averaged across flies. If the voxel distributions of f_free are similar to those shown in Supplementary Figure 2, then the actual f_free fits could range between 0.6-0.8. A more convincing statistical test might be to compare the distributions across voxels between alpha/beta vs alpha'/beta' vs. gamma KCs, perhaps with bootstrapping and including appropriate controls for multiple comparisons.

      The difference observed is indeed modest relative to the variability of f_free measurements in other contexts. The fact that the difference observed between the somata region and the calyx is larger is not necessarily surprising. Indeed, these areas have different anatomical compositions that may result in different basal metabolic profiles. This is suggested by Figure 1b, which shows that the cortex and neuropil have different metabolic signatures. Differences in average f_free values in the somata region can indeed be observed between naive and conditioned flies. However, all comparisons in the article were performed between groups of flies imaged within the same experimental batches, ensuring that external factors were largely controlled for. No such control exists for comparisons across experimental batches, which makes it difficult to extract meaningful information from the comparison between naive and conditioned flies.

      We agree with the reviewer that the choice of the metric was not well justified in the original manuscript. In the revised manuscript, we illustrate the reasons for this choice with the example of the comparison of f_free in alpha/beta neurons between unpaired and paired conditioning (Dataset 8). First, the idea of averaging across voxels is supported by the fact that the distributions of decay parameters within a single image are predominantly unimodal. Examples for Dataset 8 are now provided in the new Sup. Figure 14. Second, an interpretable comparison between multiple groups of distributions is, to our knowledge, not straightforward to implement. This is now discussed in the Supplementary information. To measure interpretable differences in the shapes of the distributions, we computed the first three moments of the distributions of f_free for Dataset 8 and compared the values obtained between conditions (see Supplementary information and new Sup. Figure 15). Third, averaging across individuals gives each experimental subject the same weight in the comparisons.
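
The two summary steps mentioned above, per-image moments of the voxel distribution and equal weighting of each fly, can be illustrated with a short sketch. It assumes that "first three moments" means mean, variance, and skewness computed with population formulas; the function names and the voxel values are hypothetical and are not taken from Dataset 8.

```python
def moments(values):
    """Return (mean, variance, skewness) of a sample, using population formulas."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0.0:
        return mean, 0.0, 0.0
    skew = (sum((v - mean) ** 3 for v in values) / n) / var ** 1.5
    return mean, var, skew

def fly_level_mean(voxels_by_fly):
    """Average the per-fly means, so each fly counts once regardless of voxel count."""
    per_fly = [sum(v) / len(v) for v in voxels_by_fly.values()]
    return sum(per_fly) / len(per_fly)

def pooled_mean(voxels_by_fly):
    """Average over all voxels pooled together, so flies with more voxels weigh more."""
    pooled = [x for v in voxels_by_fly.values() for x in v]
    return sum(pooled) / len(pooled)

# Made-up f_free values for two flies with different numbers of voxels.
example = {
    "fly_a": [0.70, 0.72],
    "fly_b": [0.78, 0.80, 0.82, 0.80],
}
print("moments of fly_b:", moments(example["fly_b"]))
print("fly-level mean:", fly_level_mean(example))
print("pooled mean:", pooled_mean(example))
```

In this toy example the pooled mean is pulled toward the fly with more voxels, whereas the fly-level mean weights both animals equally, which is the property the response appeals to.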

      I recommend the authors address two concerns. First, what degree of fluctuation in autofluorescence decay can we expect over time, e.g. over circadian cycles? That would be helpful in evaluating the magnitude of changes following conditioning. And second, if the authors think that metabolism shifts to OXPHOS over glycolysis, are there further genetic manipulations they could make? They test LDH knockdown in gamma KCs, why not knock it down in alpha/beta neurons? The prediction might be that if it prevents the shift to OXPHOS, the shift in f_free distribution in alpha/beta KCs would be attenuated. The extensive library of genetic reagents is an advantage of working with flies, but it comes with a higher standard for corroborating claims.

      In the present study, we used control groups to account for broad fluctuations induced by external factors such as the circadian cycle. We agree with the reviewer that a detailed characterization of circadian variations in the decay parameters would be valuable for assessing the magnitude of conditioning-induced shifts. We have integrated this relevant suggestion into the Discussion. Unfortunately, conducting such an investigation lies beyond the scope and means of the current project.

      In line with the suggestion of the reviewer, we have included a new experiment to test the influence of the knockdown of ALAT on the conditioning-induced shift measured in alpha/beta neurons. This choice is motivated in the new manuscript. The obtained result shows that no shift is detected in the mutant flies, in accordance with our hypothesis.

      FLIM as a method is not yet widely prevalent in fly neuroscience, but recent demonstrations of its potential are likely to increase its use. Future efforts will benefit from the description of the properties of the autofluorescence signal to evaluate how autofluorescence may impact measures of FL of genetically engineered indicators.

      Recommendations for the authors

      Reviewer #1 (Recommendations for the authors):

      (1) Y axes in Figures 1e, 2c, 3b,c are misleading. They must start at 0.

      Although we agree that making the Y axes start at 0 is preferable, in our case it makes it difficult to observe the dispersion of the data at the same time (your next suggestion). To make it clearer to the reader that the axes do not start at 0, a broken Y-axis is now displayed in every concerned figure.

      (2) These same plots should have individual data points represented, for increased clarity and transparency.

      Individual data points were added to all boxplots.

      Reviewer #2 (Recommendations for the authors):

      I am evaluating this paper as a fly neuroscientist with experience in neurophysiology, including calcium imaging. I have little experience with FLIM but anticipate its use growing as more microscopes and killer apps are developed. From this perspective, I value the opportunity to dig into FLIM and try to understand this autofluorescence signal. I think the effort to show each piece of the analysis pipeline is valuable. The figures are quite beautiful and easy to follow. My main suggestion is to consider moving some of the supplemental data to the main figures. eLife allows unlimited figures; moving key pieces of the pipeline to the main figures would make for smoother reading and emphasize the technical care taken in this study.

      We thank the reviewer for their feedback. Following their advice, we have moved panels from the supplementary figures to the main text (see new Figure 2).

      Unfortunately, the scientific questions and biological data do not rise to the typical standard in the field to support the claims in the title, "In vivo autofluorescence lifetime imaging of the Drosophila brain captures metabolic shifts associated with memory formation". The authors also clearly state what the next steps are: "hypothesis-driven approaches that rely on metabolite-specific sensors" (Intro). The advantage of fly neuroscience is the extensive library of genetic reagents that enable perturbations. The key manipulation in this study is the electric shock conditioning paradigm that subtly shifts the distribution of a parameter fit to an exponential decay in the somas of alpha/beta KCs vs others. This feels like an initial finding that deserves follow-up; but is it a large enough result to motivate a future student to pick this project up? The larger effect appears to be the gradients in f_free across KCs overall (Figure 2b). How does this change with conditioning?

      We acknowledge that the observed metabolic shift is modest relative to the variability of f_free and agree that additional corroborating experiments would further strengthen this result. Nevertheless, we believe it remains a valid and valuable finding that will be of interest to researchers in the field. The reviewer is right to point out that the gradient across KCs is larger in magnitude; however, the fact that this technique can also report experience-dependent changes, in addition to innate heterogeneities across different cell types, is a major incentive for researchers interested in applying NAD(P)H FLIM in the future. For this reason, we consider it appropriate to retain mention of the memory-induced shift in the title, while making it less assertive and adding a reference to the structural heterogeneities of f_free revealed in the study. We have also rephrased the abstract to adopt a more cautious tone and expanded the discussion to clarify why a low-magnitude shift in f_free can still carry biological significance in this context. Finally, we have added a new data set involving the knockdown of ALAT in Kenyon cells, to further support the relevance of our observation to memory formation, despite its small magnitude. We believe that these elements together form a good basis for future investigations and that the manuscript merits publication in its present form.

      Together, I would recommend reshaping the paper as a methods paper that asks the question, what are the spatial properties of NADPH FL across the brain? The importance of this question is clear in the context of other work on energy metabolism in the MBs. 2P FLIM will likely always have to account for autofluorescence, so this will be of interest. The careful technical work that is the strength of the manuscript could be featured, and whether conditioning shifts f_free could be a curio that might entice future work.

      By transferring panels from the supplementary figures to the main text (see new Figure 2), as suggested by Reviewer 2, we have reinforced the methodological part of the manuscript. For the reasons explained above, however, we still mention the ‘biological’ findings in the title and abstract.

      Minor recommendations on science:

      Figure 2C. Plotting either individual data points or distributions would be more convincing.

      Individual data points were added to all boxplots.

      There are a few mentions of glia. What are the authors' expectations for metabolic pathways in glia vs. neurons? Are glia expected to use one more than the other? The work by Rabah suggests it should be different and perhaps complementary to neurons. Can a glial marker be used in addition to KC markers? This seems crucial to being able to distinguish metabolic changes in KC somata from those in glia.

      Drosophila cortex glia are thought to play a role similar to that of astrocytes in vertebrates (see Introduction). From that perspective, we expect cortex glia to display a higher level of glycolysis than neurons. The work by Rabah et al. is consistent with this hypothesis. Reviewer 2 is right to point out that using a glial marker would be interesting. However, current technical limitations make such experiments challenging. These limitations are now described in the Discussion.

      The question of whether KC somata positions are stereotyped can probably be answered in other ways as well. For example, the KCs are in the FAFB connectomic data set and the hemibrain. How do the somata positions compare?

      The reviewer’s suggestion is indeed interesting. However, the FAFB and hemibrain connectomic datasets are based on only two individual flies, which probably limits their suitability for assessing the stereotypy of KC subtype distributions. In addition, aligning our data with the FAFB dataset would represent substantial additional work.

      The free parameter tau_bound is mysterious if it can be influenced by the identity of the protein. Are there candidate NADPH binding partners that have a spatial distribution in confocal images that could explain the difference between somas and calyx?

      There are indeed dozens of NADH- or NADPH-binding proteins. For this reason, in all studies implementing exponential fitting of metabolic FLIM data, tau_bound is considered a complex combination of the contributions from many different proteins. In addition, one should keep in mind that the number of cell types contributing to the autofluorescence signal in the mushroom body calyx (Kenyon cells, astrocyte-like and ensheathing glia, APL neurons, olfactory projection neurons, dopamine neurons) is much higher than in the somas (only Kenyon cells and cortex glia). This could also contribute to the observed difference. Hence, focusing on intracellular heterogeneities of potential NAD(P)H binding partners seems premature at this stage.
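
      For readers less familiar with metabolic FLIM, the point about tau_bound can be illustrated with the two-component decay model commonly used for NAD(P)H autofluorescence. This is only a sketch, assuming a biexponential fit; the symbols I_0 and tau_free are introduced here for illustration, and the exact expression used in the manuscript (including any convolution with the instrument response) may differ.

      ```latex
      % Illustrative biexponential decay (assumed form, not taken from the manuscript):
      % f_free is the fraction of the signal attributed to free NAD(P)H,
      % tau_free and tau_bound are the lifetimes of the free and protein-bound pools.
      \[
        I(t) = I_0 \left[ f_{\mathrm{free}} \, e^{-t/\tau_{\mathrm{free}}}
               + \left( 1 - f_{\mathrm{free}} \right) e^{-t/\tau_{\mathrm{bound}}} \right]
      \]
      ```

      In such a model, tau_bound is a single effective lifetime standing in for all protein-bound species, which is why it cannot readily be assigned to one binding partner.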

      The phrase "noticeable but not statistically significant" is misleading.

      We agree with the reviewer and have removed “noticeable but” from the sentence in the new version of the manuscript.

      Minor recommendations on presentation:

      The Introduction can be streamlined.

      We agree that some parts of the Introduction can seem a bit long for experts in a particular field. However, we think that this level of detail makes the article easily accessible to neuroscientists working on Drosophila and other animal models but not necessarily with FLIM, as well as to experts in energy metabolism who may be familiar with FLIM but not with Drosophila neuroscience.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Wang et al. reports the potential involvement of an asymmetric neurocircuit in the sympathetic control of liver glucose metabolism.

      Strengths:

      The concept that the contralateral brain-liver neurocircuit preferentially regulates each liver lobe may be interesting.

      Weaknesses:

      However, the experimental evidence presented did not support the study's central conclusion.

      We sincerely thank the reviewer for recognizing the conceptual novelty of our work and for constructive comments aimed at enhancing its rigor and clarity. In response, we will carry out targeted experiments to address the points raised, including: (i) further characterization of LPGi projections to vagal and sympathetic circuits; (ii) evaluation of potential pancreatic involvement; and (iii) validation of the specificity of chemogenetic activation within the proposed circuit. We anticipate completing the revised version within 8 weeks.

      (1) Pseudorabies virus (PRV) tracing experiment:

      The liver not only possesses sympathetic innervations but also vagal sensory innervations. The experimental setup failed to distinguish whether the PRV-labeling of LPGi (Lateral Paragigantocellular Nucleus) is derived from sympathetic or vagal sensory inputs to the liver.

      Thank you for raising this important point. We fully agree that the liver receives both sympathetic and vagal sensory innervation, and we acknowledge that PRV-based tracing alone does not definitively distinguish between these two pathways. This represents a limitation of the original experimental design.

      Based on established anatomical literature as well as our experimental observations, vagal sensory neuron cell bodies reside in the nodose ganglion (NG), and their central projections terminate predominantly in the nucleus of the solitary tract (NTS) (Nature. 2023;623(7986):387-396; Curr Biol. 2020;30(20):3986-3998.e5.), which is located in the dorsomedial medulla. In contrast, the LPGi, together with other sympathetic-related nuclei, is predominantly distributed in the ventral medulla (Cell Metab. 2025;37(11):2264-2279.e10; Nat Commun. 2022;13(1):5079.).

      To directly assess the contribution of vagal sensory pathways, we will perform an additional PRV tracing experiment using two groups of mice: one with bilateral nodose ganglion (NG) removal and a sham-operated control group. Identical PRV injections will be delivered to the liver in both groups, and PRV labeling in the LPGi will be quantitatively compared. Preservation of LPGi labeling following NG ablation would indicate that PRV transmission occurs primarily via sympathetic, rather than vagal sensory, pathways. These data will be incorporated into the revised manuscript and are expected to be completed within 3 weeks.

      (2) Impact on pancreas:

      The celiac ganglia not only provide sympathetic innervations to the liver but also to the pancreas, the central endocrine organ for glucose metabolism. The chemogenetic manipulation of LPGi failed to consider a direct impact on the secretion of insulin and glucagon from the pancreas.

      Thank you for this important comment. We agree that the celiac ganglia (CG) provide sympathetic innervation not only to the liver but also to the pancreas, which plays a central role in glucose homeostasis through the secretion of both insulin and glucagon. Therefore, the potential pancreatic effects of LPGi chemogenetic manipulation warrant careful consideration.

      To address this concern, we examined circulating glucagon levels following chemogenetic manipulation of the LPGi. As shown in the Supplementary Figure below, plasma glucagon (GCG) concentrations were not significantly altered at 30, 60, 90, or 120 minutes compared with control mice (n = 6), indicating that LPGi manipulation does not measurably affect glucagon secretion under our experimental conditions.

      We acknowledge that insulin secretion was not assessed in the study, which represents an important limitation given the pancreatic innervation of the CG. To further strengthen our interpretation, we are performing additional experiments in newly prepared mice to measure circulating insulin levels following LPGi manipulation. These data together with Author response image 1 below will be included in the revised manuscript upon completion.

      Author response image 1.

      Plasma concentrations of GCG in mice following activation of LPGi GABAergic neurons.

      (3) Neuroanatomy of the brain-liver neurocircuit:

      The current study and its conclusion are based on a speculative brain-liver sympathetic circuit without the necessary anatomical information downstream of LPGi.

      Thank you for raising this important point. A clear anatomical definition of the downstream pathways linking the brain to the liver is essential for interpreting the proposed brain-liver sympathetic circuit.

      The present study (Figure 4A) provides direct anatomical evidence supporting the organization of the brain–liver sympathetic neurocircuit. These observations are consistent with our recent detailed characterization of the brain-liver sympathetic circuit published in Cell Metabolism (Cell Metab. 2025;37(11):2264–2279). In this circuit, LPGi GABAergic neurons inhibit GABAergic neurons in the caudal ventrolateral medulla (CVLM). Inhibition of these CVLM neurons reduces GABAergic suppression of rostral ventrolateral medulla (RVLM) neurons, which are key excitatory drivers of sympathetic tone. RVLM neurons project to sympathetic preganglionic neurons in the sympathetic chain (Syc). These neurons synapse with postganglionic sympathetic neurons in ganglia such as the celiac-superior mesenteric ganglion (CG-SMG). Postganglionic sympathetic fibers then innervate the liver, releasing NE to activate hepatic β2-adrenergic receptors and stimulate HGP.

      Together, these data establish a coherent anatomical basis for the proposed brain-liver sympathetic pathway and clarify the downstream organization relevant to the functional experiments presented here.

      Author response image 2.

      Tracing scheme (Left) and whole-mount imaging (Right) of PRV-labeled brain-liver neurocircuit. Scale bars, 3,000 μm (whole mount) or 1,000 μm (optical sections).

      (4) Local manipulation of the celiac ganglia:

      The left and right ganglia of mice are not separate from each other but rather anatomically connected. The claim that the local injection of AAV in the left or right ganglion without affecting the other side is against this basic anatomical feature.

      Thank you for raising this important anatomical point. We fully acknowledge that the left and right celiac ganglia (CG) in mice are interconnected, and that unilateral viral injection could theoretically affect the contralateral side. The celiac–superior mesenteric ganglion (CG-SMG) complex serves as a major sympathetic hub that regulates visceral organ functions. Recent transcriptomic, anatomical, and functional studies have revealed that the CG-SMG is not a homogeneous structure but is composed of molecularly and functionally distinct neuronal populations. These populations exhibit specialized projection patterns and regulate different aspects of gastrointestinal physiology, supporting a model of modular sympathetic control (Nature. 2025 Jan;637(8047):895-902). We were therefore aware of this issue from the initial stages of these experiments.

      To minimize unintended spread to the contralateral CG, we took two complementary approaches.

      First, we optimized the injection strategy by using an extremely small injection volume (100 nL per site), a very slow infusion rate (50 nL/min), and fine glass micropipettes. With these refinements, contralateral viral spread was rarely observed.

      Second, and importantly, all animals included in the final analyses were subjected to post hoc anatomical verification. After completion of the experiments, the CG were collected, sectioned, and examined for viral expression. As shown in Supplementary Figure 5F, only mice in which viral expression was strictly confined to the targeted CG, with no detectable infection in the contralateral ganglion, were included in the presented data.

      Together, these measures ensure that the reported effects are attributable to local manipulation of the intended CG. We will ensure that the Methods section more explicitly details these technical precautions and that the legend for Figure S5F clearly states its role in validating injection specificity.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Wang and colleagues aims to determine whether the left and right LPGi differentially regulate hepatic glucose metabolism and to reveal decussation of hepatic sympathetic nerves.

      The authors used tissue clearing to identify sympathetic fibers in the liver lobes, then injected PRV into the hepatic lobes. Five days post-injection, PRV-labeled neurons in the LPGi were identified. The results indicated contralateral dominance of premotor neurons and partial innervation of more than one lobe. Then the authors activated each side of the LPGi, resulting in a greater increase in blood glucose levels after right-sided activation than after left-sided activation, as well as changes in protein expression in the liver lobes. These data suggested modulation of HGP (hepatic glucose production) in a lobe-specific manner. Chemical denervation of a particular lobe did not affect glucose levels due to compensation by the other lobes. In addition, nerve bundles decussate in the hepatic portal region.

      We thank the reviewer for the thorough and constructive evaluation of our manuscript. In direct response, we will undertake comprehensive revisions to enhance the rigor and clarity of the study, including: (i) correcting ambiguous or misleading terminology pertaining to anatomical resolution and sympathetic circuit organization; (ii) expanding the Methods section with complete experimental details, improved image presentation, and explicit justification of our viral and genetic approaches; and (iii) strengthening data interpretation by addressing issues related to sparse PRV labeling, projection heterogeneity, and the functional implications of double-labeled neurons. All revisions are expected to be completed within 8 weeks.

      Strengths:

      The manuscript is timely and relevant. It is important to understand the sympathetic regulation of the liver and the contribution of each lobe to hepatic glucose production. The authors use state-of-the-art methodology.

      Weaknesses:

      (1) The wording/terminology used in the manuscript is misleading, and it is not used in the proper context. For instance, the goal of the study is "to investigate whether cerebral hemispheres differentially regulate hepatic glucose metabolism..." (see abstract); however, the authors focus on the brainstem (a single structure without hemispheres). Similarly, symmetric is not the best word for the projections.

      We thank the reviewer for raising these critical points regarding terminology and conceptual framing. We acknowledge that certain phrases in our original manuscript may have been overly broad or ambiguous, particularly in describing the scope of sympathetic heterogeneity and the specificity of neural projections. Due to practical constraints and the scope of our study, our investigation is focused on the brainstem, which represents the final common pathway for these lateralized commands. We acknowledge that terms referring to the cerebral hemispheres do not accurately describe our study.

      We are revising the manuscript to ensure accurate and consistent terminology and will submit the revised version with these corrections.

      (2) Sparse labeling of liver-related neurons was shown in the LPGi (Figure 1). It would be ideal to have lower magnification images to show the area. Higher quality images would be necessary, as it is difficult to identify brainstem areas. The low number of labeled neurons in the LPGi after five days of inoculation is surprising. Previous findings showed extensive labeling in the ventral brainstem at four days post-inoculation (Desmoulins et al., 2025). Unfortunately, it is not possible to compare the injection paradigm/methods because the PRV inoculation is missing from the methods section. If the PRV is different from the previously published viral tracers, time-dependent studies to determine the order of neurons and the time course of infection would be necessary.

      We sincerely thank the reviewer for these detailed and constructive comments regarding the PRV tracing experiments. We fully agree that careful presentation and interpretation of the anatomical data are essential for ensuring rigor and transparency. We address each point in detail below.

      (1) Image magnification and anatomical context of LPGi labeling

      We agree that the original images did not sufficiently convey the broader anatomical context of the LPGi. In the revised manuscript, we will replace the original panels in Figure 1 with new images that include lower-magnification overviews of the brainstem, alongside higher-magnification views of the LPGi. These images will clearly delineate the LPGi with respect to established anatomical landmarks and atlas boundaries. Image contrast and resolution will also be optimized to allow unambiguous identification of PRV-labeled neurons and surrounding structures.

      (2) Sparse LPGi labeling at 5 days post-injection and methodological details

      We apologize for the omission of the detailed PRV injection protocol in the original Methods section. We deliberately used small-volume, focal injections (1 µL per liver lobe) to minimize viral spread and to restrict labeling to circuits specifically connected to the targeted hepatic region. Under these conditions, early-stage or intermediate-order upstream nuclei such as the LPGi are expected to exhibit relatively sparse labeling compared to more proximal autonomic nuclei. The full protocol will be added to the Methods, including the PRV strain, viral titer, injection volume, precise injection coordinates, and surgical procedures.

      (3) Not all LPGi cells are liver-related. Was the entire LPGi population stimulated, or was it done in a cell-type-specific manner? What was the strain, sex, and age of the mice? What was the rationale for using the particular viral constructs?

      We thank the reviewer for this insightful and important question. We agree that not all neurons within the LPGi are liver-related, and we apologize that our rationale was not clearly articulated in the original manuscript.

      (1) Our decision to target GABAergic neurons in the LPGi using Gad1-Cre mice was based on prior experimental evidence rather than an assumption about the entire LPGi population. In our previous study (Cell Metab. 2025;37(11):2264-2279.e10), we performed single-cell RNA sequencing on retrogradely labeled LPGi neurons following liver tracing. These analyses revealed that the majority of liver-projecting LPGi neurons are GABAergic in nature. Based on these findings, we chose to selectively manipulate GABAergic neurons in the LPGi rather than the entire LPGi neuronal population, in order to achieve greater cellular specificity and to minimize potential confounding effects arising from heterogeneous neuron types within this region. We regret that this rationale was not clearly described in the original submission and have now revised the manuscript to explicitly state this reasoning.

      (2) In addition, we apologize for the omission of mouse strain, sex, and age information in the Methods section. These details will be added in full.

      (3) We selected AAV-based viral vectors, specifically the AAV9 serotype, due to their well-established efficiency in transducing neurons in the brainstem, relatively low toxicity, and widespread use in circuit-level chemogenetic and optogenetic studies. When combined with Cre-dependent viral constructs in Gad1-Cre mice, this approach enabled selective and reliable manipulation of LPGi GABAergic neurons.

      (4) The authors should consider the effect of stimulation of double-labeled neurons (innervating more than one lobe) and potential confounding effects regarding other physiological functions.

      We thank the reviewer for raising this important point. We agree that neurons innervating more than one liver lobe could, in principle, introduce potential confounding effects and may reflect higher-order integrative autonomic neurons.

      This consideration is consistent with a key finding of the cited study: the celiac-superior mesenteric ganglion (CG-SMG) contains molecularly distinct sympathetic neuron populations (e.g., RXFP1+ vs. SHOX2+) that exhibit complementary organ projections and separate, non‑overlapping functions. Specifically, RXFP1+ neurons innervate secretory organs (pancreas, bile duct) to regulate secretion, while SHOX2+ neurons innervate the gastrointestinal tract to control motility. This functional segregation supports the concept of specialized autonomic modules rather than a uniform “fight or flight” response, reinforcing the need for careful interpretation of circuit-specific manipulations (Nature. 2025;637(8047):895-902; Neuron. Published online December 10, 2025).

      In our PRV tracing experiments, the proportion of double-labeled neurons was relatively small, suggesting that the majority of labeled LPGi neurons preferentially associate with individual hepatic lobes. Nevertheless, we recognize that activation of this minority population could contribute to broader physiological effects beyond strictly lobe-specific regulation. We also acknowledge that the absence of single-cell-level resolution in the current study limits our ability to further dissect the functional heterogeneity of these projection-defined neurons. We will explicitly state both points as limitations in the revised manuscript. We thank the reviewer for highlighting this important conceptual consideration.

      (5) The authors state that "central projections directly descend along the sympathetic chain to the celiac-superior mesenteric ganglia". What they mean is unclear. Do the authors refer to pre-ganglionic neurons or premotor neurons? How does it fit with the previous literature?

      We thank the reviewer for pointing out this imprecise wording. We agree that the original phrasing was anatomically inaccurate and potentially confusing. The pathways we intended to describe involve brainstem premotor neurons that project to sympathetic preganglionic neurons in the spinal cord. These preganglionic neurons then innervate neurons in the celiac–superior mesenteric ganglia, which in turn provide postganglionic input to the liver.

      We are revising the manuscript to clearly distinguish premotor from preganglionic neurons and to describe this pathway in a manner consistent with the established organization of sympathetic autonomic circuits reported in the previous literature. The revised wording will explicitly reflect this hierarchical relay structure.

      (6) How was the chemical denervation completed for the individual lobes?

      We thank the reviewer for raising this important methodological concern. We agree that potential diffusion of 6-OHDA is a critical issue when performing lobe-specific chemical denervation, and we apologize that our original description did not sufficiently clarify how this was controlled.

      In the revised Methods section, we will provide a detailed description of the denervation procedure, including the injection volume and concentration of 6-OHDA, as well as the physical separation and isolation of individual hepatic lobes during application to minimize diffusion to adjacent tissue.

      To directly assess the specificity of the chemical denervation, we included immunofluorescence and Western blot analyses demonstrating a selective reduction of sympathetic markers in the targeted lobe, with minimal effects on non-targeted lobes. These results support the effectiveness and relative spatial confinement of the 6-OHDA treatment under our experimental conditions.

      We thank the reviewer for highlighting this point, which has helped us improve both the clarity and rigor of the manuscript.

      (7) The Western Blot images look like they are from different blots, but there are no details provided regarding protein amount (loading) or housekeeping. What was the reason to switch beta-actin and alpha-tubulin? In Figures 3F-G, the GS expression is not a good representative image. Were chemiluminescence or fluorescence antibodies used? Were the membranes reused?

      We thank the reviewer for this careful and detailed evaluation of the Western blot data. We apologize that insufficient methodological detail was provided in the original submission.

      (1) We would like to clarify that the protein bands shown within each panel were derived from the same membrane. To improve transparency, we will provide full, uncropped images of the corresponding membranes in the supplementary materials. In addition, detailed information regarding protein loading amounts, gel conditions, and housekeeping controls will be added to the Methods section.

      (2) The use of different loading controls (β-actin or α-tubulin) reflects a technical consideration rather than an experimental inconsistency. In our experiments, the molecular weight of TH (62 kDa) was too close to that of α-tubulin (55 kDa); β-actin (42 kDa) was therefore used to avoid band overlap and to ensure accurate quantification.

      (3) Regarding the GS signal shown in Figures 3F–G, we agree that the original representative image was suboptimal. This appears to be related to antibody performance rather than sample quality. To address this, we are repeating the GS Western blot using a newly validated antibody. The original tissue samples had been aliquoted and stored at −80 °C, allowing reliable re-analysis. This work will be completed within 8 weeks.

      (4) All Western blot experiments were detected using chemiluminescence, and membrane stripping and reprobing procedures are now explicitly described in the Methods section.

      We thank the reviewer for highlighting these issues; addressing them will significantly improve the rigor and clarity of our data presentation.

      (8) Key references using PRV for liver innervation studies are missing (Stanley et al, 2010 [PMID: 20351287]; Torres et al., 2021 [PMID: 34231420]; Desmoulins et al., 2025 [PMID: 39647176]).

      We thank the reviewer for pointing out these important and highly relevant references that were inadvertently omitted in our initial submission. The studies by Stanley et al. (Proc Natl Acad Sci U S A, 2010), Torres et al. (Am J Physiol Regul Integr Comp Physiol, 2021), and Desmoulins et al. (Auton Neurosci, 2025) represent key PRV-based retrograde tracing work that has mapped central neural circuits innervating the liver and thus provide essential context for our anatomical analyses.

      We agree that inclusion of these studies is necessary to properly situate our findings within the existing literature. Accordingly, we will incorporate citations to these references in the revised manuscript and discuss their relationship to our results.

      Reviewer #3 (Public review):

      Summary:

      This study found a lobe-specific, lateralized control of hepatic glucose metabolism by the brain and provides anatomical evidence for sympathetic crossover at the porta hepatis. The findings are particularly insightful to the researchers in the field of liver metabolism, regeneration, and tumors.

      Strengths:

      Increasing evidence suggests spatial heterogeneity of the liver across many aspects of metabolism and regenerative capacity. The current study has provided interesting findings: neuronal innervation of the liver also shows anatomical differences across lobes. The findings could be particularly useful for understanding liver pathophysiology and treatment, such as metabolic interventions or transplantation.

      Weaknesses:

      Inclusion of detailed method and Discussion:

      We sincerely thank the reviewer for the positive and constructive feedback, which will significantly enhance both the methodological rigor and the broader biological interpretation of our study. In direct response, we will revise the Discussion to elaborate on the potential physiological advantages of a lateralized and lobe-specific pattern of liver innervation. Furthermore, we will expand the Methods section to include a comprehensive description of the quantitative analysis applied to PRV-labeled neurons. Together, these revisions will strengthen the manuscript’s clarity, depth, and relevance to researchers in hepatic metabolism, regeneration, and disease. We expect to complete all updates within 8 weeks.

      (1) Quantitative results for PRV-labeled neurons are presented; please include the specific quantification methods.

      We thank the reviewer for this helpful suggestion. We will add a detailed description of the quantitative methods used to analyze PRV-labeled neurons in the revised Methods section. This includes information on the counting criteria, the brain regions analyzed, how the regions of interest were delineated, and the normalization procedures applied to obtain the reported neuron counts.

      (2) The Discussion can be expanded to include potential biological advantages of this complex lateralized innervation pattern.

      We appreciate the reviewer’s suggestion. We will expand the Discussion to include a paragraph addressing the potential biological significance of lateralized liver innervation. We will highlight that this asymmetric organization could allow for more precise, lobe-specific regulation of hepatic metabolism, enable integration of distinct physiological signals, and potentially provide robustness against perturbations. These points will be discussed in the revised manuscript.

      Reviewer #4 (Public review):

      Summary:

      The studies here are highly informative in terms of anatomical tracing and sympathetic nerve function in the liver related to glucose levels, but given that they are performed in a single species, it is challenging to translate them to humans, or to determine whether these neural circuits are evolutionarily conserved. Dual-labeling anatomical studies are elegant, and the addition of chemogenetic and optogenetic studies is mechanistically informative. Denervation studies lack appropriate controls, and the role of sensory innervation in the liver is overlooked.

      We sincerely appreciate the reviewer's thoughtful evaluation and fully agree that findings derived from a single-species model must be interpreted with caution in relation to human physiology. In direct response, we will revise the manuscript to explicitly clarify that all experimental data were obtained in mice and to provide a discussion of the limitations regarding direct extrapolation to humans. Concurrently, we will expand the Discussion section by integrating our findings with recent human and translational studies, including a multicenter clinical trial demonstrating that catheter-based endovascular denervation of the celiac and hepatic arteries significantly improved glycemic control in patients with poorly controlled type 2 diabetes, without major adverse events (Signal Transduct Target Ther. 2025;10(1):371). While our current work focuses on defining the anatomical organization and functional asymmetry of this circuit in mice, the clinical findings suggest that the core principles, sympathetic control of hepatic glucose metabolism via CG-liver pathways, may be conserved and of translational relevance. Additionally, we will clarify the interpretation of tyrosine hydroxylase labeling and expand the discussion of hepatic sensory and parasympathetic innervation, acknowledging their important roles in liver–brain communication and identifying them as key directions for future research. Collectively, these revisions will provide a more balanced, clinically informed, and rigorous framework for interpreting our findings, and we aim to complete all updates within 8 weeks.

      Specific Weaknesses - Major:

      (1) The species name should be included in the title.

      We thank the reviewer for this suggestion. We agree that the species should be clearly indicated. The findings presented in this study were obtained in mice using tissue clearing and whole-organ imaging approaches. Due to technical limitations, these observations are currently limited to mice. We will update the title and clarify the species used throughout the manuscript.

      (2) Tyrosine hydroxylase was used to mark sympathetic fibers in the liver, but this marker also labels a portion of sensory fibers that need to be ruled out in whole-mount imaging data.

      We thank the reviewer for pointing this out. We acknowledge that tyrosine hydroxylase (TH) labels not only sympathetic fibers but also a subset of sensory fibers. We will note this limitation in the revised manuscript. In addition, ongoing experiments using retrograde PRV labeling from the liver, combined with sectioning, are being used to distinguish sympathetic fibers from vagal and dorsal root ganglion–derived sensory fibers. These data will be included in a forthcoming update of the manuscript and are expected to be completed in approximately 6 weeks.

      (3) Chemogenetic and optogenetic data demonstrating hyperglycemia should be described in the context of prior work demonstrating liver nerve involvement in these processes. There is only a brief mention in the Discussion currently, but comparing methods and observations would be helpful.

      We thank the reviewer for this suggestion. Previous studies largely relied on electrical stimulation to modulate liver innervation, which provides relatively coarse control of neural activity (Eur J Biochem. 1992;207(2):399-411). By contrast, our use of chemogenetic and optogenetic approaches allows selective, cell-type–specific manipulation of LPGi neurons. We will revise the Discussion to place our functional data in the context of prior work, highlighting how these more precise approaches improve understanding of the contribution of liver-innervating neurons to hyperglycemia.

      (4) Sympathetic denervation with 6-OHDA can drive compensatory increases to tissue sensory innervation, and this should be measured in the liver denervation studies to implicate potential crosstalk, especially given the increase in LPGi cFOS that may be due to afferent nerve activity. Compensatory sympathetic drive may not be the only culprit, though it is clearly assumed to be. The sensory or parasympathetic/vagal innervation of the liver is altogether ignored in this paper and could be better described in general.

      We thank the reviewer for this insightful comment and agree that chemical sympathetic denervation with 6-OHDA may induce compensatory changes in non-sympathetic hepatic inputs, including sensory and parasympathetic (vagal) innervation. As the reviewer correctly points out, increased LPGi cFOS activity may reflect afferent nerve engagement rather than solely compensatory sympathetic drive.

      More broadly, we agree that the central nervous system functions as an integrated homeostatic network that continuously processes diverse afferent signals, including hepatic sensory and vagal inputs, as well as other interoceptive cues. From this perspective, the LPGi cFOS changes observed in our study likely represent one component of a complex integrative response rather than evidence for a single dominant pathway.

      We acknowledge that the present study did not directly assess hepatic sensory or parasympathetic innervation, which represents a limitation in scope. In the revised manuscript, we will expand the Discussion to explicitly note this limitation and provide a more balanced consideration of potential crosstalk among sympathetic, sensory, and parasympathetic pathways in shaping LPGi activity following hepatic denervation.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Although the findings are interesting, this reviewer has major concerns about the experimental design, methodology, results, and interpretation of the data. Experimental details are lacking, including basic information (age, sex, strain of mice, procedures, magnification, etc.).

      We thank the reviewer for this important recommendation. We agree that comprehensive reporting of experimental details is essential for rigor and reproducibility.

      In the revised manuscript, we will add complete information regarding mouse strain, sex, age, and sample size for each experiment. In addition, detailed descriptions of surgical procedures, viral constructs, injection parameters, imaging magnification, and analysis methods will be incorporated into the Methods section.

      These revisions ensure that all experiments are described with sufficient technical detail and clarity to allow accurate interpretation and replication of our findings.

      Reviewer #3 (Recommendations for the authors):

      Addressing a few questions might help:

      (1) The study found that liver-associated LPGi neurons are predominantly GABAergic. It would be informative to molecularly characterize the PRV-traced, liver-projecting LPGi neurons to determine their neurochemical phenotypes.

      We thank the reviewer for this insightful suggestion. We agree that molecular characterization of liver-projecting LPGi neurons is important for understanding their functional identity.

      This issue has been addressed in detail in our recent study (Cell Metab. 2025;37(11):2264-2279.e10), in which we performed single-cell RNA sequencing on retrogradely traced LPGi neurons connected to the liver. These analyses demonstrated that the majority of liver-projecting LPGi neurons are GABAergic, with a defined transcriptional profile distinct from neighboring non–liver-related populations.

      Based on these findings, the current study selectively targets GABAergic LPGi neurons using Gad1-Cre mice. We are now explicitly referencing and summarizing these molecular results in the revised manuscript to clarify the neurochemical identity of the PRV-traced LPGi neurons.

      (2) Is it possible to do a local microinjection of a sodium channel blocker (e.g., lidocaine) or an adrenergic receptor antagonist into the porta hepatis? That would potentially provide additional evidence for the porta hepatis as the functional crossover point.

      We appreciate the reviewer’s thoughtful suggestion. While pharmacological blockade at the porta hepatis could modulate local neural activity, the proposed approach may not fully capture the distinction between ipsilateral and contralateral inputs, and may not conclusively establish neural crossover at this particular site.

      In our view, the anatomical evidence provided by whole-mount tissue clearing, dual-labeled tracing, and direct visualization of decussating nerve bundles at the porta hepatis offers a more definitive demonstration of sympathetic crossover. Pharmacological blockade would affect both crossed and uncrossed fibers simultaneously and therefore would not specifically resolve the anatomical organization of this decussation.

      Nevertheless, we agree that functional interrogation of the porta hepatis represents an interesting direction for future work, and we will now acknowledge this possibility in the Discussion.

      (3) Is it possible to investigate the effects of unilateral LPGi manipulation or ablation of one side of CG/SMG on liver metabolism, such as hyperglycemia?

      We thank the reviewer for this important suggestion. We agree that unilateral ablation or silencing of the CG-SMG could provide additional insight into lateralized sympathetic control of liver metabolism.

      However, precise and selective ablation of one side of the CG-SMG through 6-OHDA without affecting the contralateral ganglion or adjacent autonomic structures remains technically challenging, particularly given the anatomical connectivity between the two sides. We are currently optimizing approaches to achieve reliable unilateral manipulation.

      If successful within the revision timeframe, we will include these experiments and corresponding metabolic analyses in the revised manuscript. If not, we will explicitly discuss this experimental limitation and the predicted metabolic consequences of unilateral CG-SMG ablation as an important direction for future studies. We expect to complete this work within 6 weeks.

      Reviewer #4 (Recommendations for the authors):

      In the abstract and elsewhere, the use of the term 'sympathetic release' is unclear - do you mean release of nerve products, such as the neurotransmitter norepinephrine? This should be more clearly defined.

      We thank the reviewer for pointing out this ambiguity. We agree that the term “sympathetic release” was imprecise. In the revised manuscript, we will explicitly refer to the release of sympathetic neurotransmitters, primarily norepinephrine, from postganglionic sympathetic fibers.

      We will revise the wording throughout the manuscript to ensure accurate and consistent terminology and to avoid potential confusion regarding the underlying neurobiological mechanisms.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors investigate the effects of Miro1 on VSMC biology after injury. Using conditional knockout animals, they provide the important observation that Miro1 is required for neointima formation. They also confirm that Miro1 is expressed in human coronary arteries. Specifically, in conditions of coronary diseases, it is localized in both media and neointima, and, in atherosclerotic plaque, Miro1 is expressed in proliferating cells.

      However, the role of Miro1 in VSMC in CV diseases is poorly studied, and the data available are limited; therefore, the authors decided to deepen this aspect. The evidence that Miro-/- VSMCs show impaired proliferation and an arrest in S phase is solid and further sustained by restoring Miro1 to control levels, normalizing proliferation. Miro1 also affects mitochondrial distribution, which is strikingly changed after Miro1 deletion. Both effects are associated with impaired energy metabolism due to the ability of Miro1 to participate in MICOS/MIB complex assembly, influencing mitochondrial cristae folding. Interestingly, the authors also show the interaction of Miro1 with NDUFA9, globally affecting super complex 2 assembly and complex I activity.

      Finally, these important findings also apply to human cells and can be partially replicated using a pharmacological approach, proposing Miro1 as a target for vasoproliferative diseases.

      Strengths:

      The discovery of Miro1 relevance in neointima formation is compelling, as well as the evidence in VSMC that MIRO1 loss impairs mitochondrial cristae formation, expanding observations previously obtained in embryonic fibroblasts.

      The identification of MIRO1 interaction with NDUFA9 is novel and adds value to this paper. Similarly, the findings that VSMC proliferation requires mitochondrial ATP support the new idea that these cells do not rely mostly on glycolysis.

      Weaknesses:

      (1) Figure 3:

      I appreciate the system used to assess mitochondrial distribution; however, I believe that time-lapse microscopy to evaluate mitochondrial movements in real time should be mandatory. The experimental timing is compatible with time-lapse imaging, and these experiments will provide a quantitative estimation of the distance travelled by mitochondria and the fraction of mitochondria that change position over time. I also suggest evaluating mitochondrial shape in control and MIRO1-/- VSMC to assess whether MIRO1 absence could impact mitochondrial morphology, altering fission/fusion machinery, since mitochondrial shape could differently influence the mobility.

      Mitochondrial motility experiments. WT and Miro1-/- VSMCs were transiently transfected with mito-ds-red and untargeted GFP adenoviruses to fluorescently label mitochondria and cytosol, respectively. Live-cell fluorescence confocal microscopy was used to acquire mitochondrial images at one-minute intervals over a 25-30-minute period. WT cells exhibited dynamic reorganization of the mitochondrial network, whereas Miro1-/- VSMCs displayed minimal mitochondrial movement, characterized only by limited oscillatory behavior without network remodeling (Supplemental Video 1).

      Mitochondrial shape (form factor) was assessed by confocal microscopy in WT and Miro1-/- VSMCs. Analysis of the mitochondrial form factor (defined as the ratio of mitochondrial length to width) during cell cycle progression revealed morphological changes in wild type (WT) cells, characterized by an increase in form factor. In contrast, Miro1-/- cells exhibited no significant alterations in mitochondrial morphology (Figure 3- Figure supplement 1B).
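
      The form factor described above (mitochondrial length divided by width) can be illustrated with a minimal sketch. It assumes that per-mitochondrion length and width measurements have already been extracted from the confocal images; the function names and the input layout are hypothetical and do not represent the analysis pipeline used in the study.

```python
from statistics import mean


def form_factor(length_um: float, width_um: float) -> float:
    """Form factor as defined in the response: length divided by width."""
    if width_um <= 0:
        raise ValueError("width must be positive")
    return length_um / width_um


def mean_form_factor(measurements):
    """Average form factor over (length, width) pairs for one cell or condition."""
    return mean(form_factor(length, width) for length, width in measurements)
```

      Under this definition, a higher value indicates a more elongated mitochondrion, so an increase during cell cycle progression corresponds to elongation of the network.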

      (2) Figure 6:

      The evidence of MIRO1 ablation on cristae remodeling is solid; however, considering that the mechanism proposed to explain the finding is the modulation of MICOS/MIB complex, as shown in Figure 6D, I suggest performing EM analysis in each condition. In my mind, Miro1 KK and Miro1 TM should lead to different cristae phenotypes according to the different impact on MICOS/MIB complex assembly. Especially, Miro1 TM should mimic Miro1 -/- condition, while Miro1 KK should drive a less severe phenotype. This would supply a good correlation between Miro1, MICOS/MIB complex formation and cristae folding.

      I also suggest performing supercomplex assembly and complex I activity with each plasmid to correlate MICOS/MIB complex assembly with the respiratory chain efficiency.

      Complex I activity assays revealed that overexpression of MIRO1-WT fully restored enzymatic activity in MIRO1-/- cells, whereas MIRO1-KK provided partial rescue. In contrast, a MIRO1 mutant lacking the transmembrane domain failed to restore activity and resembled the Miro1-/- phenotype (Figure 6- Figure supplement 2).

      The Complex I activity in each Miro1 mutant correlated with the degree of MICOS/MIB complex assembly in pulldown assays, implying a functional link between Miro1 and mitochondrial cristae organization.

      Moreover, an in-gel Complex V activity assay was performed to evaluate the enzymatic activity of mitochondrial ATP synthase in a native gel following electrophoresis. To normalize the activity signal, a Blue Native PAGE of the same samples was probed for the ATP5F1 subunit. A modest, yet statistically significant reduction in Complex V activity was observed in Miro1-/- cells (Figure 6- Figure supplement 1).
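
      The normalization step described above can be sketched as follows, assuming that band intensities for the in-gel activity signal and for the ATP5F1 blot have already been quantified by densitometry. The function names and the choice to express values relative to the mean of the control group are illustrative assumptions, not the authors' documented procedure.

```python
def normalize_to_loading(activity, loading):
    """Divide each activity band intensity by the matching ATP5F1 band intensity."""
    if len(activity) != len(loading):
        raise ValueError("activity and loading must have the same length")
    return [a / l for a, l in zip(activity, loading)]


def relative_to_control(values, control_values):
    """Express normalized values as a fraction of the mean control (e.g. WT) value."""
    control_mean = sum(control_values) / len(control_values)
    return [v / control_mean for v in values]
```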

      (3) I noticed that none of the in vitro findings have been validated in an in vivo model. I believe this represents a significant gap that would be valuable to address. In your animal model, it should not be too complex to analyze mitochondria by electron microscopy to assess cristae morphology. Additionally, supercomplex assembly and complex I activity could be evaluated in tissue homogenates to corroborate the in vitro observations.

      We appreciate the reviewer’s comment. However, our currently available samples have been processed for light microscopy and are therefore not suitable for embedding for electron microscopy.

      (4) I find the results presented in Figure S7 somewhat unclear. The authors employ a pharmacological strategy to reduce Miro1 and validate the findings previously obtained with the genetic knockout model. They report increased mitophagy and a reduction in mitochondrial mass. However, in my opinion, these changes alone could significantly impact cellular metabolism. A lower number of mitochondria would naturally result in decreased ATP production and reduced mitochondrial respiration. This, in turn, weakens the proposed direct link between Miro1 deletion and impaired metabolic function or altered electron transport chain (ETC) activity. I believe this section would benefit from additional experiments and a more in-depth discussion.

      We initially conducted experiments using the MIRO1 reducer to explore the translational potential of our findings and to provide a foundation for in vivo studies. However, despite multiple attempts, we were unable to demonstrate a significant effect of the MIRO1 reducer, delivered via a Pluronic gel, on the mitochondria of the vascular wall. Of note, the role of MIRO1 in mitophagy has been well established in several studies (for example, PMID: 34152608), which show that genetic deletion of Miro1 delays the translocation of the E3 ubiquitin ligase Parkin onto damaged mitochondria, thereby reducing mitochondrial clearance in fibroblasts and cultured neurons. Furthermore, loss of Miro1 in the hippocampus and cortex increases mitofusin levels, with the appearance of hyperfused mitochondria and activation of the integrated stress response. MIRO1 deletion in genetic models therefore does not substantially reduce mitochondrial mass but instead causes mitochondrial hyperfusion. The rationale for developing the MIRO1 reducer stems from genetic forms of Parkinson’s disease (PD), where Miro1 is retained in PD cells but degraded in healthy cells following mitochondrial depolarization (PMID: 31564441). Consequently, the degradation of mutant MIRO1 by the reducer does not phenocopy the effects of genetic MIRO1 depletion. We believe the data with the reducer demonstrate that MIRO1 can be acutely targeted in vitro, but that the mechanism of action differs from that of chronic genetic deletion: as the reviewer points out, the reduction of mitochondrial mass may lead to decreased ATP levels, potentially reducing cell proliferation. In fact, we observe somewhat increased mitochondrial length in MIRO1-/- cells. We acknowledge that this is complex and have revised the paragraph to clarify the use of the MIRO1 reducer.

      Reviewer #2 (Public review):

      Summary:

      This study identifies the outer mitochondrial GTPase MIRO1 as a central regulator of vascular smooth muscle cell (VSMC) proliferation and neointima formation after carotid injury in vivo and PDGF-stimulation ex vivo. Using smooth muscle-specific knockout male mice, complementary in vitro murine and human VSMC cell models, and analyses of mitochondrial positioning, cristae architecture, and respirometry, the authors provide solid evidence that MIRO1 couples mitochondrial motility with ATP production to meet the energetic demands of the G1/S cell cycle transition. However, a component of the metabolic analyses is suboptimal and would benefit from more robust methodologies. The work is valuable because it links mitochondrial dynamics to vascular remodeling and suggests MIRO1 as a therapeutic target for vasoproliferative diseases, although whether pharmacological targeting of MIRO1 in vivo can effectively reduce neointima after carotid injury has not been explored. This paper will be of interest to those working on VSMCs and mitochondrial biology.

      Strengths:

      The strength of the study lies in its comprehensive approach, assessing the role of MIRO1 in VSMC proliferation in vivo, ex vivo, and importantly in human cells. The study provides mechanistic links between MIRO1-mediated regulation of mitochondrial mobility and optimal respiratory chain function to cell cycle progression and proliferation. Finally, the findings are potentially clinically relevant given the presence of MIRO1 in human atherosclerotic plaques and the available small-molecule MIRO1 reducer.

      Weaknesses:

      (1) There is a consistent lack of reporting across figure legends, including group sizes, n numbers, how many independent experiments were performed, or whether the data is mean +/- SD or SEM, etc. This needs to be corrected.

      These data were added in the revised manuscript.

      (2) The in vivo carotid injury experiments are in male mice fed a high-fat diet; this should be explicitly stated in the abstract, as it's unclear if there are any sex- or diet-dependent differences. Is VSMC proliferation/neointima formation different in chow-fed mice after carotid injury?

      This is an important point, and we appreciate the feedback. In this model, the transgene is located on the Y chromosome. As a result, only male mice can be studied. However, in our previous experiments, we have not observed any sex-dependent changes in neointimal formation. Additionally, please note that smooth muscle cell proliferation in neointimal formation is enhanced in models of cholesterol-fed mice on a high-fat diet.

      (3) The main body of the methods section is thin, and it's unclear why the majority of the methods are in the supplemental file. The authors should consider moving these to the main article, especially in an online-only journal.

      We thank the reviewer for this suggestion. We moved the methods to the main manuscript.

      (4) Certain metabolic analyses are suboptimal, including ATP concentration and Complex I activity measurements. Measuring ATP/ADP and ATP/AMP ratios to assess energy charge status (by luminometer or mass spectrometry) and using high-resolution respirometry (Oroboros) to determine mitochondrial complex I activity in permeabilized VSMCs would be more informative.

      ATP/ADP and ATP/AMP ratios were assessed on samples from WT and Miro1-/- VSMCs using an ATP/ADP/AMP Assay Kit (Cat#: A-125; Biomedical Research Service, University at Buffalo, New York). Miro1-/- samples exhibited reduced ATP levels accompanied by elevated concentrations of ADP and AMP. As a result, both ATP/ADP and ATP/AMP ratios were significantly lower in MIRO1-/- cells compared to WT, indicating impaired cellular energy homeostasis (Figure 5B, C).
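
      For readers unfamiliar with these readouts, the ratios and the related adenylate energy charge raised by the reviewer can be computed as in the sketch below. It assumes ATP, ADP, and AMP concentrations are available in the same units; the energy charge uses the standard Atkinson formula, (ATP + 0.5 × ADP) / (ATP + ADP + AMP). The function name is hypothetical and no values from the study are reproduced here.

```python
def adenylate_status(atp: float, adp: float, amp: float) -> dict:
    """Return ATP/ADP, ATP/AMP, and adenylate energy charge for one sample."""
    if min(atp, adp, amp) < 0:
        raise ValueError("concentrations must be non-negative")
    if adp == 0 or amp == 0:
        raise ValueError("ADP and AMP must be non-zero to form the ratios")
    total = atp + adp + amp
    return {
        "atp_adp": atp / adp,
        "atp_amp": atp / amp,
        # Atkinson adenylate energy charge, bounded between 0 and 1
        "energy_charge": (atp + 0.5 * adp) / total,
    }
```

      A fall in ATP with a concurrent rise in ADP and AMP lowers all three quantities, which is the pattern described for the Miro1-/- samples.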

      (5) The statement that 'mitochondrial mobility is not required for optimal ATP production' is poorly supported. XF Seahorse analysis should be performed with nocodazole and also following MIRO1 reconstitution +/- EF hands.

      To evaluate the metabolic effects of Nocodazole, we conducted Seahorse metabolic assays on vascular smooth muscle cells (VSMCs) under various conditions. We used WT VSMCs, Miro1-/- VSMCs, and Miro1-/- VSMCs that expressed either MIRO1-WT, KK, or ΔTM mutants. Our results demonstrate that Nocodazole exposure did not compromise mitochondrial respiratory activity. However, Miro1-/- VSMCs displayed a trend toward reduced basal and maximal mitochondrial respiration when compared to WT cells. This deficit was only partially corrected by the expression of the MIRO1-KK mutant. In contrast, reintroducing MIRO1-WT through adenoviral delivery fully restored mitochondrial respiration to normal levels (Figure 5- Figure supplement 1).

      (6) The authors should consider moving MIRO1 small molecule data into the main figures. A lot of value would be added to the study if the authors could demonstrate that therapeutic targeting of MIRO1 could prevent neointima formation in vivo.

      We appreciate the reviewer's comment and attempted the suggested in vivo experiments using the commercially available Miro1 reducer. For these experiments, we used a pluronic gel to deliver the reducer to the adventitial area surrounding the carotid artery. Despite numerous attempts to optimize the experimental conditions, we were unable to reliably detect a significant effect of the reducer on mitochondria in the vascular wall.

      Reviewer #3 (Public review):

      Summary:

      This study addresses the role of MIRO1 in vascular smooth muscle cell proliferation, proposing a link between MIRO1 loss and altered growth due to disrupted mitochondrial dynamics and function. While the findings are potentially useful for understanding the importance of mitochondrial positioning and function in this specific cell type within health and disease contexts, the evidence presented appears incomplete, with key bioenergetic and mechanistic claims lacking adequate support.

      Strengths:

      (1) The study focuses on an important regulatory protein, MIRO1, and its role in vascular smooth muscle cell (VSMC) proliferation, a relatively underexplored context.

      (2) It explores the link between smooth muscle cell growth, mitochondrial dynamics, and bioenergetics, which is a potentially significant area for both basic and translational biology.

      (3) The use of both in vivo and in vitro systems provides a potentially useful experimental framework to interrogate MIRO1 function in this context.

      Weaknesses:

      (1) The central claim that MIRO1 loss impairs mitochondrial bioenergetics is not convincingly demonstrated, with only modest changes in respiratory parameters and no direct evidence of functional respiratory chain deficiency.

      (2) The proposed link between MIRO1 and respiratory supercomplex assembly or function is speculative, lacking mechanistic detail and supported by incomplete or inconsistent biochemical data.

      (3) Key mitochondrial assays are either insufficiently controlled or poorly interpreted, undermining the strength of the conclusions regarding oxidative phosphorylation.

      (4) The study does not adequately assess mitochondrial content or biogenesis, which could confound interpretations of changes in respiratory activity.

      (5) Overall, the evidence for a direct impact of MIRO1 on mitochondrial respiratory function in the experimental setting is weak, and the conclusions overreach the data.

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Throughout the manuscript, the authors incorrectly use "mobility" to describe the active transport of mitochondria. The appropriate term is "mitochondrial motility," which refers to active, motor-driven movement. "Mobility" implies passive diffusion and is not scientifically accurate in this context.

      (2) "Super complex" should be consistently written as "supercomplex," in line with accepted mitochondrial biology terminology.

      We thank the reviewer for this comment and revised the text accordingly.

      (3) A significant limitation of the in vivo model is the mild phenotype observed, which is expected from an inducible knockout system. The authors should clarify whether a constitutive, tissue-specific knockout was considered and, if not, whether embryonic lethality or another limitation prevented its generation.

      This genetic model was originally developed by Dr. Janet Shaw at the University of Utah. In the original publication, Miro1 was constitutively knocked out in neurons. Germline inactivation of Miro1 was achieved by crossing mice harboring the Miro1F allele with a mouse line expressing Cre recombinase under the control of the hypoxanthine-guanine phosphoribosyltransferase (HPRT) promoter. Mating Miro1+/− mice resulted in Miro1−/− animals, which were cyanotic and died shortly after birth. Due to this outcome, we opted to develop an inducible, smooth muscle-specific model. Additionally, we considered testing whether the acute use of an inhibitor or a knockdown system targeting Miro1 could be evaluated as a potential therapeutic approach.

      (4) In Figure 1A and S1A, the authors use Western blotting to validate the knockout in the aorta and IHC in carotid arteries. The choice of different methods does not seem justified, and qPCR data are shown only for the aorta. IHC appears to be suboptimal for assessing MIRO1 levels in vascular tissue due to high autofluorescence, and IHC in Figure S1A is merely qualitative, with no quantification provided.

      We present complementary approaches to validate the deletion of Miro1. For Western blot analysis, we used the aorta because it provides more material for analysis. The autofluorescence observed via immunofluorescence is characteristic of elastin fibers within the media layer, making our results typical for this technique. As shown in Figure 1- Figure supplement 1, our data demonstrate a significant decrease, if not a complete knockout, of the target protein specifically in smooth muscle cells.

      (5) In Figure 1G, the bottom left panel (magnification) shows a lower green signal than the top left panel, suggesting these may have been collected with different signal intensity. This raises concerns about image consistency and representation.

      Top images in Figure 1G are taken at magnification 63x. Bottom images were made at magnification 20x. The intensity is different between the two magnifications, but similar between genotypes.

      (6) In Figure S3, the sampling is uncontrolled: the healthy subject and the patient differ markedly in age. The claim of colocalization is not substantiated with any quantitative analysis.

      As outlined in the Methods section, our heart samples were obtained from LVAD patients or explanted hearts from transplant recipients. Due to the limited availability of such samples, there is indeed a difference in age between the healthy subject and the patient. While we acknowledge this limitation, the scarcity of samples made it challenging to control for age. Additionally, we determined that performing a quantitative analysis of colocalization would not yield robust or meaningful data given the constraints of our sample size and variability. 

      (7) Figure S4A lacks statistical analysis, which is necessary for interpreting the data shown.

      This appears to be a misunderstanding. In this manuscript, we do present statistically significant differences and focus on those that are biologically meaningful. Specifically, we highlight differences between PDGF treatment and no treatment within the same genotype, as well as differences between the two genotypes under the same treatment condition (control or PDGF treatment). In this particular case, the only statistically significant difference is between WT+PDGF and SM-Miro1-/-, but since this is not a meaningful comparison, it is not shown. Please note that this approach applies to all figures in the manuscript. Including all comparisons, whether statistically significant or not and whether biologically meaningful or not, may appear rigorous but, in our opinion, ultimately detracts from the main message of this paper.

      (8) The authors state, "given the generally poor proliferation of VSMCs from SM-MIRO1-/- mice, in later experiments we used VSMCs from MIRO1fl/fl mice and infected them with adenovirus expressing cre." This is not convincing, especially since in vivo cre efficiency is generally lower than in vitro. Moreover, the methods indicate that "VSMCs from littermate controls were subjected to the same procedure with empty vector control adenovirus," yet in Figure 2A, the control appears to be MIRO1fl/fl VSMCs transduced with Ad-EV. The logic and consistency of the controls used need clarification.

      For the initial experiments, cells were explanted from SM-MIRO1-/- mice (Figure 2- Figure supplement 1). In these mice, Cre recombination had occurred in vivo, and the cells exhibited very poor growth. In fact, their growth was so limited that we decided not to pursue this experimental approach after three independent experiments.

      For subsequent experiments, cells were explanted from Miro1fl/fl mice and passaged several times, which allowed us to generate the number of cells required for the experiments (Figure 2B). Once sufficient Miro1fl/fl cells were obtained, they were treated with adenovirus expressing Cre, as described in the Methods section. Control cells were treated with an empty vector adenovirus. To clarify, the control cells are Miro1fl/fl cells infected with an empty vector adenovirus, while the MIRO1-/- cells are Miro1fl/fl cells infected with adenovirus expressing Cre. The statement that “littermate controls were used” is incorrect; in fact, Miro1fl/fl cells from the same preparation were infected with either an empty vector adenovirus or adenovirus expressing Cre. As mentioned, the knockdown was confirmed by Western blotting.

      (9) Figure 2C shows a growth delay in MIRO1-/- cells. Have the authors performed additional time points to determine when these cells return to G1 and quantify the duration of the lag?

      This is an excellent suggestion. So far, we have not performed this experiment.

      (10) In the 24 h time point of Figure 2C, MIRO1-/- cells appear to be cycling, yet no cyclin E signal is detected. How do the authors explain this inconsistency? Additionally, in Figure 2H, the quantification of cyclin E is unreliable, given that lanes 3 and 4 show no detectable signal.

      We agree with the reviewer—the inconsistency is driven by the exposure of the immunoblot presented. We revisited the data, reviewed the quantification, and performed an additional experiment. We are now presenting an exposure that demonstrates levels of cyclin E (Figure 2G).

      (11) In Figure 3D, the authors present mitochondrial probability map vs. distance from center curves. How was the "center" defined in this analysis? Were radial distances normalized across cells (e.g., to the cell radius or maximum extent)? If not, variation in cell and/or nucleus size or shape could significantly affect the resulting profiles. No statistical analysis is provided for this assessment, which undermines its quantitative value. Furthermore, the rationale behind the use of mito95 values is not clearly explained.

      The center refers to the center of the microchip's Y-shaped pattern, to which each cell is attached. Since all Y-shapes on the chip are identical in size, normalization is not required. The size of the optimal Y-shapes was tested as recommended by CYTOO. For further context, please refer to the papers by the Kittler group.

      Additionally, a graph demonstrating the percentage of mitochondria localized at specific distances can be produced for any given distance. Notably, the further from the center of the chip, the more pronounced the differences become.

      (12) The authors apply a 72 h oligomycin treatment to assess proliferation and a 16 h treatment to measure ATP levels. This discrepancy in experimental design is not justified in the manuscript. The length of treatment directly impacts the interpretation of the data in Figures 4C, 4D, and 4E, and needs to be addressed.

      Thank you for this comment. We have performed additional experiments to align these time points. In the revised manuscript, we now present proliferation and ATP production measured at the same time point (Figure 4A, B for proliferation and ATP levels).

      (13) The manuscript repeatedly suggests that MIRO1 loss causes a defect in mitochondrial ATP production, yet no direct demonstration of a bioenergetic defect is provided. The claim relies on a modest decrease in supercomplex species (of undefined composition) and a mild reduction in complex I activity that does not support a substantial OXPHOS defect. Notably, the respirometry data in Figure 5I do not align with the BN-PAGE results in Figure 6I. There is increasing evidence that respiratory chain supercomplexes do not confer a catalytic advantage. The authors should directly assess the enzymatic activities of all respiratory complexes. Reported complex I activity in MIRO1-/- cells appears rotenone-like (virtually zero, figure 3K) or ~30% residual (Figure 3L), suggesting a near-total loss of functional complex I, which is not reflected in the BN-PAGE. Additionally, complex I activity has not been normalized to a mitochondrial reference, such as citrate synthase.

      Given that we work in primary cells and are limited by the number of cells we can generate, we concentrated on ETC1 and 5 and performed experiments in cells after expression of MIRO1 WT and MIRO1 mutants (Figure 6- Figure supplement 1). Please note that the addition of rotenone abolishes the slope of NADH consumption (Figure 6- Figure supplement 2F).

      While the ETC1 activity is measured in Fig. 6K, the blue native gel shown in Figure 6I is performed without substrate and thus, indicative of protein complex abundance rather than complex activity.

      In additional experiments, we normalized the activity to citrate synthase as requested.

      (14) In the methods section, the complex I activity assay is incorrectly described: complex I is a NADH dehydrogenase, so the assay measures NADH oxidation, not NADPH.

      We thank the reviewer for this comment and have revised the manuscript accordingly.

      (15) The authors have not assessed mitochondrial mass, which is a critical omission. Differences in mitochondrial biogenesis or content could underlie several observed phenotypes and should be controlled for.

      A qPCR assay was used to assess mitochondrial DNA copy number in WT and Miro1-/- VSMCs. We determined the abundance of COX1 and MT-RNR1 DNA as mitochondrial gene targets and NDUFV DNA as the nuclear reference gene. While the results in Miro1-/- cells were highly variable, no statistically significant reduction of copy numbers was detected (Figure 3- Figure supplement 1B).
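
      For readers unfamiliar with this type of readout, the sketch below shows one common way a relative mitochondrial DNA copy number is derived from qPCR threshold cycles. It is an illustration only and is not taken from the manuscript: the function name, the assumption of 100% amplification efficiency, the two nuclear copies per cell, and the example Ct values are all hypothetical.

```python
def relative_mtdna_copy_number(ct_mito, ct_nuclear, nuclear_copies_per_cell=2):
    """Estimate mtDNA copies per cell from qPCR threshold cycles.

    Assumes (hypothetically) perfect doubling per cycle, so a Ct difference
    of n cycles corresponds to a 2**n fold difference in template abundance.
    """
    delta_ct = ct_nuclear - ct_mito  # mitochondrial target crosses threshold earlier
    return nuclear_copies_per_cell * 2 ** delta_ct


# Hypothetical Ct values for a mitochondrial target and a nuclear reference gene.
example_copies = relative_mtdna_copy_number(ct_mito=18.0, ct_nuclear=26.0)
```

      In practice each mitochondrial target would be normalized to the nuclear reference in this way for every sample before genotypes are compared.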

      (16) Complex IV signal is missing in Figure 6I. Its omission is not acknowledged or explained.

      Thank you for this comment. We believe this is due to a technical issue. Complex IV can be challenging to detect consistently, as its visibility is highly dependent on sample preparation conditions. In this specific case, we suspect that the buffer used during the isolation process may have influenced the detection of Complex IV.

      (17) Figure 6D does not appear representative of the quantifications shown. C-MYC signal is visibly reduced in the mutant, consistent with the lower levels of interactors such as Sam50 and NDUFA9. Additionally, the SDHA band is aligned at the bottom of the blot box. The list of antibodies used, and their catalog number is missing, or it was not provided to the reviewers. It seems plausible that the authors used a cocktail antibody set (e.g., Abcam ab110412), which includes anti-NDUFA9. This would contradict the claim of reduced complex I and SC levels, as the steady-state levels of NDUFA9 appear unchanged.

      We acknowledge that the expression of the myc-MIRO1 mutant is lower compared to myc-MIRO1 WT or myc-MIRO1 KK. Achieving identical expression levels when overexpressing multiple MIRO1 constructs is challenging. We agree that the lower expression of this mutant contributes to a reduced pull-down. Our quantification shows a reduction in association, although it is not statistically significant.

      A list of the antibodies was provided in the Methods section.

      We would like to clarify that we did not use an antibody cocktail in our experiments.

      (18) The title of Figure 6, "Loss of Miro1 leads to dysregulation of ETC activity under growth conditions," is vague. The term "dysregulation" should be replaced with a more specific mechanistic descriptor-what specific regulatory defect is meant?

      We thank the reviewer for this suggestion and have rephrased the title.

      (19) In the results text for Figure 6, the authors state: "These data demonstrate that MIRO1 associates with MIB/MICOS and that this interaction promotes the formation of mitochondrial super complexes and the activity of ETC complex I." This conclusion is speculative and not mechanistically supported by the data presented.

      We appreciate the reviewer's feedback. We have revised the text to clarify the relationship between MIRO1, MIB/MICOS, supercomplex formation, and ETC activity. The updated text now states: “These data demonstrate that MIRO1 associates with MIB/MICOS. Additionally, MIRO1 promotes the formation of mitochondrial supercomplexes and enhances the activity of ETC complex I.”

      (20) In Figure 7A, it is unclear what the 3x siControl/siMiro1 pairs represent-are these different cell lines or technical replicates of the same line? No loading control is shown. If changes in mitochondrial protein abundance are being evaluated, using COX4 as a loading control is inappropriate. The uneven COX4 signal across samples further complicates interpretation.

      Please note that we used primary cells, not cell lines. The three siControl/siMiro1 pairs represent independent cell isolations, each transfected with either siControl or siMIRO1 mRNA. While the possibility of a difference in mitochondrial mass is an interesting question, the primary objective of this experiment is to demonstrate that the technique effectively results in the knockdown of Miro1, which is exclusively localized to mitochondria and not present in the cytosol. As such, we believe that Cox4 serves as a reasonable loading control. Although Miro1 knockdown may lead to a reduction in mitochondrial mass, the focus of this experiment is not to assess mitochondrial mass but to confirm the reduction in Miro1 protein levels on mitochondria. We also performed anti-VDAC immunoblots on the same membranes as an alternative loading control (Author response image 1).

      Author response image 1.

      (21) Figure 7G is difficult to interpret. Why did the authors choose to use a sensor-based method instead of the chemiluminescent assay to measure ATP in these samples?

      Both methods were employed to assess ATP levels in human samples. ATP measurements obtained with the luminescent assay are provided.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors have addressed the recruitment and firing patterns of motor units (MUs) from the long and lateral heads of the triceps in the mouse. They used their newly developed Myomatrix arrays to record from these muscles during treadmill locomotion at different speeds, and they used template-based spike sorting (Kilosort) to extract units. Between MUs from the two heads, the authors observed differences in their firing rates, recruitment probability, phase of activation within the locomotor cycle, and interspike interval patterning. Examining different walking speeds, the authors find increases in both recruitment probability and firing rates as speed increases. The authors also observed differences in the relation between recruitment and the angle of elbow extension between motor units from each head. These differences indicate meaningful variation between motor units within and across motor pools and may reflect the somewhat distinct joint actions of the two heads of triceps.

      Strengths:

      The extraction of MU spike timing for many individual units is an exciting new method that has great promise for exposing the fine detail in muscle activation and its control by the motor system. In particular, the methods developed by the authors for this purpose seem to be the only way to reliably resolve single MUs in the mouse, as the methods used previously in humans and in monkeys (e.g. Marshall et al. Nature Neuroscience, 2022) do not seem readily adaptable for use in rodents.

      The paper provides a number of interesting observations. There are signs of interesting differences in MU activation profiles for individual muscles here, consistent with those shown by Marshall et al. It is also nice to see fine-scale differences in the activation of different muscle heads, which could relate to their partially distinct functions. The mouse offers greater opportunities for understanding the control of these distinct functions, compared to the other organisms in which functional differences between heads have previously been described.

      The Discussion is very thorough, providing a very nice recounting of a great deal of relevant previous results.

      We thank the Reviewer for these comments.

      Weaknesses:

      The findings are limited to one pair of muscle heads. While an important initial finding, the lack of confirmation from analysis of other muscles acting at other joints leaves the general relevance of these findings unclear.

      The Reviewer raises a fair point. While outside the scope of this paper, future studies should certainly address a wider range of muscles to better characterize motor unit firing patterns across different sets of effectors with varying anatomical locations. Still, the importance of results from the triceps long and lateral heads should not be understated as this paper, to our knowledge, is the first to capture the difference in firing patterns of motor units across any set of muscles in the locomoting mouse.

      While differences between muscle heads with somewhat distinct functions are interesting and relevant to joint control, differences between MUs for individual muscles, like those in Marshall et al., are more striking because they cannot be attributed potentially to differences in each head's function. The present manuscript does show some signs of differences for MUs within individual heads: in Figure 2C, we see what looks like two clusters of motor units within the long head in terms of their recruitment probability. However, a statistical basis for the existence of two distinct subpopulations is not provided, and no subsequent analysis is done to explore the potential for differences among MUs for individual heads.

      We agree with the Reviewer and have revised the manuscript to better examine potential subpopulations of units within each muscle as presented in Figure 2C. We performed Hartigan’s dip test on motor units within each muscle to test for multimodal distributions. For both muscles, p > 0.05, so we cannot reject the null hypothesis that the units in each muscle come from a unimodal distribution. However, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.

      Still, the limited sample size warrants further data collection and analysis since the varying properties across motor units may lead to different activation patterns. Given these results, we have edited the text as follows:

      “A subset of units, primarily in the long head, were recruited in under 50% of the total strides and with lower spike counts (Figure 2C). This distribution of recruitment probabilities might reflect a functionally different subpopulation of units. However, the distribution of recruitment probabilities were not found to be significantly multimodal (p>0.05 in both cases, Hartigan’s dip test; Hartigan, 1985). However, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.”

      The statistical foundation for some claims is lacking. In addition, the description of key statistical analysis in the Methods is too brief and very hard to understand. This leaves several claims hard to validate.

      We thank the Reviewer for these comments and have clarified the text related to key statistical analyses throughout the manuscript, as described in our other responses below.

      Reviewer #2 (Public review):

      The present study, led by Thomas and collaborators, aims to describe the firing activity of individual motor units in mice during locomotion. To achieve this, they implanted small arrays of eight electrodes in two heads of the triceps and performed spike sorting using a custom implementation of Kilosort. Simultaneously, they tracked the positions of the shoulder, elbow, and wrist using a single camera and a markerless motion capture algorithm (DeepLabCut). Repeated one-minute recordings were conducted in six mice at five different speeds, ranging from 10 to 27.5 cm·s⁻¹.

      From these data, the authors reported that:

      (1) a significant portion of the identified motor units was not consistently recruited across strides,

      (2) motor units identified from the lateral head of the triceps tended to be recruited later than those from the long head,

      (3) the number of spikes per stride and peak firing rates were correlated in both muscles, and

      (4) the probability of motor unit recruitment and firing rates increased with walking speed.

      The authors conclude that these differences can be attributed to the distinct functions of the muscles and the constraints of the task (i.e., speed).

      Strengths:

      The combination of novel electrode arrays to record intramuscular electromyographic signals from a larger muscle volume with an advanced spike sorting pipeline capable of identifying populations of motor units.

      We thank the Reviewer for this comment.

      Weaknesses:

      (1) There is a lack of information on the number of identified motor units per muscle and per animal.

      The Reviewer is correct that this information was not explicitly provided in the prior submission. We have therefore added Table 1 that quantifies the number of motor units per muscle and per animal.

      (2) All identified motor units are pooled in the analyses, whereas per-animal analyses would have been valuable, as motor units within an individual likely receive common synaptic inputs. Such analyses would fully leverage the potential of identifying populations of motor units.

      Please see our answer to the following point, where we address questions (2) and (3) together.

      (3) The current data do not allow for determining which motor units were sampled from each pool. It remains unclear whether the sample is biased toward high-threshold motor units or representative of the full pool.

      We thank the Reviewer for these comments. To clarify how motor unit responses were distributed across animals and muscle targets, we updated or added the following figures:  

      Figure 2C

      Figure 4–figure supplement 1

      Figure 5–figure supplement 2

      Figure 6–figure supplement 2

      These provide a more complete look at the range of activity within each motor pool, suggesting that we do measure from units with different activation thresholds within the same motor pool, rather than this variation being due to cross-animal differences. For example, Figure 2C illustrates that motor units from the same muscle and animal show a wide variety of recruitment probabilities. However, the limited number of motor units recorded from each individual animal does not allow a statistically rigorous test for examining cross-animal differences.

      (4) The behavioural analysis of the animals relies solely on kinematics (2D estimates of elbow angle and stride timing). Without ground reaction forces or shoulder angle data, drawing functional conclusions from the results is challenging.

      The Reviewer is correct that we did not measure muscular force generation or ground reaction forces in the present study. Although outside the scope of this study, future work might employ buckle force transducers as used in larger animals (Biewener et al., 1988; Karabulut et al., 2020) to examine the complex interplay between neural commands, passive biomechanics, and the complex force-generating properties of muscle tissue.

      Major comments:

      (1) Spike sorting

      The conclusions of the study rely on the accuracy and robustness of the spike sorting algorithm during a highly dynamic task. Although the pipeline was presented in a previous publication (Chung et al., 2023, eLife), a proper validation of the algorithm for identifying motor unit spikes is still lacking. This is particularly important in the present study, as the experimental conditions involve significant dynamic changes. Under such conditions, muscle geometry is altered due to variations in both fibre pennation angles and lengths.

      This issue differs from electrode drift, and it is unclear whether the original implementation of Kilosort includes functions to address it. Could the authors provide more details on the various steps of their pipeline, the strategies they employed to ensure consistent tracking of motor unit action potentials despite potential changes in action potential waveforms, and the methods used for manual inspection of the spike sorting algorithm's output?

      This is an excellent point and we agree that the dynamic behavior used in this investigation creates potential new challenges for spike sorting. In our analysis, Kilosort 2.5 provides key advantages in comparing unit waveforms across multiple channels and in detecting overlapping spikes. We modified this version of Kilosort to construct unit waveform templates using only the channels within the same muscle (Chung et al., 2023), as clarified in the revised Methods section (see “Electromyography (EMG)”):

      “A total of 33 units were identified across all animals. Each unit’s isolation was verified by confirming that no more than 2% of inter-spike intervals violated a 1 ms refractory limit. Additionally, we manually reviewed cross-correlograms to ensure that each waveform was only reported as a single motor unit.”
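
      The refractory-period criterion quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and the example spike train are hypothetical, and only the 1 ms limit and the 2% threshold come from the quoted text.

```python
def isi_violation_fraction(spike_times_s, refractory_s=0.001):
    """Fraction of inter-spike intervals shorter than the refractory limit."""
    times = sorted(spike_times_s)
    isis = [later - earlier for earlier, later in zip(times, times[1:])]
    if not isis:
        return 0.0  # fewer than two spikes: no intervals to violate
    violations = sum(1 for isi in isis if isi < refractory_s)
    return violations / len(isis)


def passes_isolation(spike_times_s, refractory_s=0.001, max_fraction=0.02):
    """True if no more than max_fraction of intervals violate the limit."""
    return isi_violation_fraction(spike_times_s, refractory_s) <= max_fraction


# Hypothetical spike times in seconds: one 0.5 ms interval among four intervals.
example = [0.000, 0.010, 0.0105, 0.030, 0.050]
example_fraction = isi_violation_fraction(example)
```

      A unit whose violation fraction exceeds the threshold would be flagged for manual review, for instance by inspecting its cross-correlograms with other units.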

      The Reviewer is correct that our ability to precisely measure a unit’s activity based on its waveform will depend on the relationship between the embedded electrode and the muscle geometry, which alters over the course of the stride. As a follow-up to the original text, we have included new analyses to characterize the waveform activity throughout the experiment and stride (also in Methods):

      “We further validated spike sorting by quantifying the stability of each unit’s waveform across time (Figure 1–figure supplement 1). First, we calculated the median waveform of each unit across every trial to capture long-term stability of motor unit waveforms. Additionally, we calculated the median waveform through the stride binned in 50 ms increments using spiking from a single trial. This second metric captures the stability of our spike sorting during the rapid changes in joint angles that occur during the burst of an individual motor unit. In doing so, we calculated each motor unit’s waveforms from the single channel in which that unit’s amplitude was largest and did not attempt to remove overlapping spikes from other units before measuring the median waveform from the data. We then calculated the correlation between a unit’s waveform over either trials or bins in which at least 30 spikes were present. The high correlation of a unit waveform over time, despite potential changes in the electrodes’ position relative to muscle geometry over the dynamic task, provides additional confidence in both the stability of our EMG recordings and the accuracy of our spike sorting.”
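
      The waveform-stability check described in the quoted passage could look roughly like the sketch below. It is an illustration rather than the authors' implementation: the data layout (a dictionary mapping each trial or stride bin to a list of spike waveforms), the function names, and the toy waveforms are assumptions, while the 30-spike minimum and the use of median waveforms follow the quoted text.

```python
import math
from itertools import combinations
from statistics import mean, median

MIN_SPIKES = 30  # groups with fewer spikes are skipped, as in the quoted text


def median_waveform(waveforms):
    """Sample-by-sample median over a list of equal-length spike waveforms."""
    return [median(samples) for samples in zip(*waveforms)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def waveform_stability(waveforms_by_group, min_spikes=MIN_SPIKES):
    """Pairwise correlations between median waveforms of trials or stride bins."""
    medians = [
        median_waveform(waveforms)
        for waveforms in waveforms_by_group.values()
        if len(waveforms) >= min_spikes
    ]
    return [pearson(a, b) for a, b in combinations(medians, 2)]


# Hypothetical data: the same waveform shape at two amplitudes, plus a sparse trial.
template = [0.0, 1.0, -2.0, 0.5, 0.0]
groups = {
    "trial_1": [template] * 30,
    "trial_2": [[0.8 * s for s in template]] * 30,
    "trial_3": [template] * 5,  # excluded: fewer than 30 spikes
}
correlations = waveform_stability(groups)
```

      Because correlation is insensitive to amplitude scaling, this measure captures changes in waveform shape; amplitude drift would need to be tracked separately.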

      (2) Yield of the spike sorting pipeline and analyses per animal/muscle

      A total of 33 motor units were identified from two heads of the triceps in six mice (17 from the long head and 16 from the lateral head). However, precise information on the yield per muscle per animal is not provided. This information is crucial to support the novelty of the study, as the authors claim in the introduction that their electrode arrays enable the identification of populations of motor units. Beyond reporting the number of identified motor units, another way to demonstrate the effectiveness of the spike sorting algorithm would be to compare the recorded EMG signals with the residual signal obtained after subtracting the action potentials of the identified motor units, using a signal-to-residual ratio.

      Furthermore, motor units identified from the same muscle and the same animal are likely not independent due to common synaptic inputs. This dependence should be accounted for in the statistical analyses when comparing changes in motor unit properties across speeds and between muscles.

      We thank the Reviewer for this comment. Regarding motor unit yield, as described above the newly-added Table 1 displays the yield from each animal and muscle.

      Regarding spike sorting, while signal-to-residual is often an excellent metric, it is not ideal for our high-resolution EMG signals since isolated single motor units are typically superimposed on a “bulk” background consisting of the low-amplitude waveforms of other motor units. Because these smaller units typically cannot be sorted, it is challenging to estimate the “true” residual after subtracting (only) the largest motor unit, since subtracting each sorted unit’s waveform typically has a very small effect on the RMS of the total EMG signal. To further address concerns regarding spike sorting quality, we added Figure 1–figure supplement 1 that demonstrates motor units’ consistency over the experiment, highlighting that the waveform maintains its shape within each stride despite muscle/limb dynamics and other possible sources of electrical noise or artifact.

      Finally, the Reviewer is correct that individual motor units in the same muscle are very likely to receive common synaptic inputs. These common inputs may be reflected in sparsely recruited motor units being active in overlapping rather than different strides. Indeed, in the following text added to the Results, we show that motor units are recruited with higher probability when additional units are recruited.

      “Probabilistic recruitment is correlated across motor units

      Our results show that the recruitment of individual motor units is probabilistic even within a single speed quartile (Figure 5A-C) and predicts body movements (Figure 6), raising the question of whether the recruitment of individual motor units are correlated or independent. Correlated recruitment might reflect shared input onto the population of motor units innervating the muscle (De Luca, 1985; De Luca & Erim, 1994; Farina et al., 2014). For example, two motor units, each with low recruitment probabilities, may still fire during the same set of strides. To assess the independence of motor unit recruitment across the recorded population, we compared each unit’s empirical recruitment probability across all strides to its conditional recruitment probability during strides in which another motor unit from the same muscle was recruited (Figure 7). Doing this for all motor unit pairs revealed that motor units in both muscles were biased towards greater recruitment when additional units were active (p<0.001, Wilcoxon signed-rank tests for both the lateral and long heads of triceps). This finding suggests that probabilistic recruitment reflects common synaptic inputs that covary together across locomotor strides.”
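
      The comparison between empirical and conditional recruitment probabilities described above can be sketched as follows. This is a hedged illustration, not the authors' analysis code: the input format (one boolean per stride for each unit of a given muscle), the function name, and the two example units are hypothetical.

```python
def recruitment_probabilities(recruited):
    """Compare each unit's overall and conditional recruitment probability.

    recruited maps a unit name to a list of booleans, one per stride, with
    all units from the same muscle sharing the same strides. Returns tuples
    (unit, other, P(unit), P(unit | other recruited)).
    """
    results = []
    for unit, own in recruited.items():
        p_all = sum(own) / len(own)
        for other, theirs in recruited.items():
            if other == unit:
                continue
            n_other = sum(theirs)
            if n_other == 0:
                continue  # conditional probability undefined
            both = sum(a and b for a, b in zip(own, theirs))
            results.append((unit, other, p_all, both / n_other))
    return results


# Hypothetical recruitment across eight strides for two sparsely recruited units.
example_units = {
    "u1": [True, True, False, False, False, False, False, False],
    "u2": [True, True, True, False, False, False, False, False],
}
pairs = recruitment_probabilities(example_units)
```

      The paired values for each muscle could then be compared with a paired non-parametric test such as `scipy.stats.wilcoxon`, which is the kind of test the quoted passage reports.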

      (3) Representativeness of the sample of identified motor units

      However, to draw such conclusions, the authors should exclusively compare motor units from the same pool and systematically track violations of the recruitment order. Alternatively, they could demonstrate that the motor units that are intermittently active across strides correspond to the smallest motor units, based on the assumption that these units should always be recruited due to their low activation thresholds.

      One way to estimate the size of motor units identified within the same muscle would be to compare the amplitude of their action potentials, assuming that all motor units are relatively close to the electrodes (given the selectivity of the recordings) and that motoneurons innervating more muscle fibres generate larger motor unit action potentials.

      We thank the Reviewer for this comment. Below, we provide more detailed analyses of the relationships between motor unit spike amplitude and the recruitment probability as well as latency (relative to stride onset) of activation.

      We generated the figures below to illustrate the relationship between the amplitude of motor units and their firing properties. As suspected, units with larger-amplitude waveforms fired with lower probability and produced their first spikes later in the stride. If we were comfortable assuming that larger spike amplitudes mean higher-force units, then this would be consistent with a key prediction of the size principle (i.e., that higher-force units are recruited later). However, we are hesitant to base any conclusions on this assumption or emphasize this point with a main-text figure, since EMG signal amplitude may also vary due to the physical properties of the electrode and distance from muscle fibers. Thus, it is possible that a large motor unit may have a smaller waveform amplitude relative to the rest of the motor pool.

      Author response image 1.

      Relation between motor unit amplitude and (A) recruitment probability and (B) mean first spike time within the stride. Colored lines indicate the outcome of linear regression analyses.

      Currently, the data seem to support the idea that motor units that are alternately recruited across strides have recruitment thresholds close to the level of activation or force produced during slow walking. The fact that recruitment probability monotonically increases with speed suggests that the force required to propel the mouse forward exceeds the recruitment threshold of these "large" motor units. This pattern would primarily reflect spatial recruitment following the size principle rather than flexible motor unit control.

      We thank the Reviewer for this comment. We agree with this interpretation, particularly in relation to the references suggested in later comments, and have added the following text to the Discussion to better reflect this argument:

      “To investigate the neuromuscular control of locomotor speed, we quantified speed-dependent changes in both motor unit recruitment and firing rate. We found that the majority of units were recruited more often and with larger firing rates at faster speeds (Figure 5, Figure5–figure supplement 1). This result may reflect speed-dependent differences in the common input received by populations of motor neurons with varying spiking thresholds (Henneman et al., 1965). In the case of mouse locomotion, faster speeds might reflect a larger common input, increasing the recruitment probability as more neurons, particularly those that are larger and generate more force, exceed threshold for action potentials (Farina et al., 2014).”

      (4) Analysis of recruitment and firing rates

      The authors currently report active duration and peak firing rates based on spike trains convolved with a Gaussian kernel. Why not report the peak of the instantaneous firing rates estimated from the inverse of the inter-spike interval? This approach appears to be more aligned with previous studies conducted to describe motor unit behaviour during fast movements (e.g., Desmedt & Godaux, 1977, J Physiol; Van Cutsem et al., 1998, J Physiol; Del Vecchio et al., 2019, J Physiol).

      We thank the Reviewer for this comment. In the revised Discussion (see ‘Firing rates in mouse locomotion compared to other species’) we reference several examples of previous studies that quantified spike patterns based on the instantaneous firing rate. We chose to report the peak of the smoothed firing rate because that quantification includes strides with zero spikes or only one spike, which occur regularly in our dataset (and for which ISI rate measures, which require two spikes to define an instantaneous firing rate, cannot be computed). Regardless, in the revised Figure 4B, we present an analysis that uses inter-spike intervals as suggested, which yielded similar ranges of firing rates as the primary analysis.
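
      The difference between the two rate measures can be illustrated with a short sketch. The function names, the 2.5 ms kernel width, and the 0.5 ms time step are assumptions made for illustration and are not taken from the manuscript. The point is that an ISI-based rate is undefined for strides with fewer than two spikes, whereas a smoothed rate is defined for any spike count, including zero.

```python
import math


def peak_isi_rate(spike_times_s):
    """Peak instantaneous rate (Hz) as 1 / shortest inter-spike interval.

    Returns None when the stride contains fewer than two spikes, because no
    inter-spike interval exists in that case.
    """
    if len(spike_times_s) < 2:
        return None
    ts = sorted(spike_times_s)
    isis = [t1 - t0 for t0, t1 in zip(ts, ts[1:])]
    return 1.0 / min(isis)


def peak_smoothed_rate(spike_times_s, sigma_s=0.0025, dt_s=0.0005):
    """Peak (Hz) of the spike train convolved with a unit-area Gaussian kernel.

    sigma_s and dt_s are illustrative choices. Returns 0.0 for an empty stride.
    """
    if not spike_times_s:
        return 0.0
    start = min(spike_times_s) - 4 * sigma_s
    stop = max(spike_times_s) + 4 * sigma_s
    norm = 1.0 / (sigma_s * math.sqrt(2 * math.pi))
    n_steps = int(round((stop - start) / dt_s)) + 1
    peak = 0.0
    for i in range(n_steps):
        t = start + i * dt_s
        rate = sum(norm * math.exp(-0.5 * ((t - s) / sigma_s) ** 2)
                   for s in spike_times_s)
        peak = max(peak, rate)
    return peak


# Strides with zero, one, and three spikes (times in seconds, hypothetical)
for stride in ([], [0.10], [0.00, 0.01, 0.03]):
    print(len(stride), peak_isi_rate(stride), round(peak_smoothed_rate(stride), 1))
```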

      (5) Additional analyses of behaviour

      The authors currently analyse motor unit recruitment in relation to elbow angle. It would be valuable to include a similar analysis using the angular velocity observed during each stride. More broadly, comparing stride-by-stride changes in firing rates with changes in elbow angular velocity would further strengthen the final analyses presented in the results section.

      We thank the Reviewer for this comment. To address this, we have modified Figure 6 and the associated Supplemental Figures to show relationships in unit activation with both the range of elbow extension and the range of elbow velocity for each stride. These new Supplemental Figures show that the trends shown in main text Figure 6C and 6E (which show data from all speed quartiles on the same axes) are also apparent in both the slower and faster quartiles individually, although single-quartile statistical tests (with smaller sample size than the main analysis) do not reach statistical significance in all cases.

      Reviewer #3 (Public review):

      Summary:

      Using the approach of Myomatrix recording, the authors report that:

      (1) Motor units are recruited differently in the two types of muscles.

      (2) Individual units are probabilistically recruited during the locomotion strides, whereas the population bulk EMG has a more reliable representation of the muscle.

      (3) The recruitment of units was proportional to walking speed.

      Strengths:

      The new technique provides a unique data set, and the data analysis is convincing and well-performed.

      We thank the Reviewer for the comment.

      Weaknesses:

      The implications of "probabilistical recruitment" should be explored, addressed, and analyzed further.

      Comments:

      One of the study's main findings (perhaps the main finding) is that the motor units are "probabilistically" recruited. The authors do not define what they mean by probabilistically recruited, nor do they present an alternative scenario to such recruitment or discuss why this would be interesting or surprising. However, on page 4, they do indicate that the recruitment of units from both muscles was only active in a subset of strides, i.e., they are not reliably active in every step.

      If probabilistic means irregular spiking, this is not new. Variability in spiking has been seen numerous times, for instance in human biceps brachii motor units during isometric contractions (Pascoe, Enoka, Exp physiology 2014) and elsewhere. Perhaps the distinction the authors are seeking is between fluctuation-driven and mean-driven spiking of motor units as previously identified in spinal motor networks (see Petersen and Berg, eLife 2016, and Berg, Frontiers 2017). Here, it was shown that a prominent regime of irregular spiking is present during rhythmic motor activity, which also manifests as a positive skewness in the spike count distribution (i.e., log-normal).

      We thank the Reviewer for this comment and have clarified several passages in response. The Reviewer is of course correct that irregular motor unit spiking has been described previously and may reflect motor neurons’ operating in a high-sensitivity (fluctuation-driven) regime. We now cite these papers in the Discussion (see ‘Firing rates in mouse locomotion compared to other species’). Additionally, the revision clarifies that “probabilistically” - as defined in our paper - refers only to the empirical observation that a motor unit spikes during only a subset of strides, either when all locomotor speeds are considered together (Figure 2) or separately (Figure 5A-C):

      “Motor units in both muscles exhibited this pattern of probabilistic recruitment (defined as a unit’s firing on only a fraction of strides), but with differing distributions of firing properties across the long and lateral heads (Figure 2).”

      “Our findings (Figure 4) highlight that even with the relatively high firing rates observed in mice, there are still significant changes in firing rate and recruitment probability across the spikes within bursts (Figure 4B) and across locomotor speeds (Figure 5F). Future studies should more carefully examine how these rapidly changing spiking patterns derive from both the statistics of synaptic inputs and intrinsic properties of motor neurons (Manuel & Heckman, 2011; Petersen & Berg, 2016; Berg, 2017).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, there are several issues with the statistics that need to be corrected to properly support the claims made in the paper.

      The authors compare the fractions of MUs that show significant variation across locomotor speeds in their firing rate and recruitment probability. However, it is not statistically founded to compare the results of separate statistical tests based on different kinds of measurements and thus have unconstrained differences in statistical power. The comparison of the fractional changes in firing rates and recruitment across speeds that follow is helpful, though in truth, by contemporary standards, one would like to see error bars on these estimates. These could be generated using bootstrapping.

      The Reviewer is correct, and we have revised the manuscript to better clarify which quantities should or should not be compared, including the following passage (see “Motor unit mechanisms of speed control” in Results):

      “Speed-dependent increases in peak firing rate were therefore also present in our dataset, although in a smaller fraction of motor units (22/33) than changes in recruitment probability (31/33). Furthermore, the mean (± SE) magnitude of speed-dependent increases was smaller for spike rates (mean rate<sub>fast</sub>/rate<sub>slow</sub> of 111% ± 20% across all motor units) than for recruitment probabilities (mean p(recruitment)<sub>fast</sub>/p(recruitment)<sub>slow</sub> of 179% ± 3% across all motor units). While fractional changes in rate and recruitment probability are not readily comparable given their different upper limits, these findings could suggest that while both recruitment and peak rate change across speed quartiles, increased recruitment probability may play a larger role in driving changes in locomotor speed.”
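
      The error bars requested by the Reviewer could be obtained with a percentile bootstrap over units. The sketch below is illustrative only: the function name, the choice of 1000 resamples, and the per-unit ratios are hypothetical and are not values from the study.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Resamples the units with replacement n_boot times and returns
    (sample mean, lower bound, upper bound).
    """
    rng = random.Random(seed)
    n = len(values)
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(values), lower, upper


# Hypothetical per-unit ratios of fast-quartile to slow-quartile firing rate
ratios = [1.05, 1.20, 0.95, 1.10, 1.30, 1.00, 1.15]
mean, lower, upper = bootstrap_mean_ci(ratios)
print(f"mean ratio = {mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```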

      The description in the Methods of the tests for variation in firing rates and recruitment probability across speeds are extremely hard to understand - after reading many times, it is still not clear what was done, or why the method used was chosen. In the main text, the authors quote p-values and then state "bootstrap confidence intervals," which is not a statistical test that yields a p-value. While there are mathematical relationships between confidence intervals and statistical tests such that a one-to-one correspondence between them can exist, the descriptions provided fall short of specifying how they are related in the present instance. For this reason, and those described in what follows, it is not clear what the p-values represent.

      Next, the authors refer to fitting a model ("a Poisson distribution") to the data to estimate firing rate and recruitment probability, that the model results agree with their actual data, and that they then bootstrapped from the model estimates to get confidence intervals and compute p-values. Why do this? Why not just do something much simpler, like use the actual spike counts, and resample from those? I understand that it is hard to distinguish between no recruitment and just no spikes given some low Poisson firing rate, but how does that challenge the ability to test if the firing rates or the number of spiking MUs changes significantly across speeds? I can come up with some reasons why I think the authors might have decided to do this, but reasoning like this really should be made explicit.

      In addition, the authors should provide an unambiguous description of the model, perhaps using an equation and a description of how it was fit. For the bootstrapping, a clear description of how the resampling was done should be included. The focus on peak firing rate instead of mean (or median) firing rate should also be justified. Since peaks are noisier, I would expect the statistical power to be lower compared to using the mean or median.

      We thank the Reviewer for the comments and have revised and expanded our discussion of the statistical tests employed. We expanded and clarified our description of these techniques in the updated Methods section:

      “Joint model of rate and recruitment

      We modeled the recruitment probability and firing rate based on empirical data to best characterize firing statistics within the stride. Particularly, this allowed for multiple solutions to explain why a motor unit would not spike within a stride. From the empirical data alone, strides with zero spikes would have been assumed to have no recruitment of a unit. However, to create a model of motor unit activity that includes both recruitment and rate, it must be possible that a recruited unit can have a firing rate of zero. To quantify the firing statistics that best represent all spiking and non-spiking patterns, we modeled recruitment probability and peak firing rate along the following piecewise function:

      where y denotes the observed peak firing rate on a given stride (determined by convolving motor unit spike times with a Gaussian kernel as described above), p denotes the probability of recruitment, and λ denotes the expected peak firing rate from a Poisson distribution of outcomes. Thus, an inactive unit on a given stride may be the result of either non-recruitment or recruitment with a stochastically zero firing rate. The above equations were fit by minimizing the negative log-likelihood of the parameters given the data.
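
      One piecewise form consistent with the definitions above is a zero-inflated Poisson model. This is an assumption inferred from the verbal description (recruitment with probability p, Poisson-distributed outcomes with mean λ, and zeros arising from either source); the authors' Eq. 1 and 2 may be parameterized differently, and treating y as a non-negative integer is likewise an assumption.

```latex
P(y \mid p, \lambda) =
\begin{cases}
(1 - p) + p\, e^{-\lambda}, & y = 0 \\[4pt]
p\, \dfrac{\lambda^{y} e^{-\lambda}}{y!}, & y > 0
\end{cases}
\qquad
-\log L(p, \lambda) = -\sum_{i=1}^{N} \log P(y_i \mid p, \lambda)
```

      Under this form, the first case makes explicit that a stride with y = 0 can come either from non-recruitment, with probability 1 - p, or from a recruited unit whose Poisson outcome is zero, with probability p e^{-λ}.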

      “Permutation test for joint model of rate and recruitment and type 2 regression slopes

      To quantify differences in firing patterns across walking speeds, we subdivided each mouse’s total set of strides into speed quartiles and calculated rate (𝜆, Eq. 1 and 2, Fig. 5A-C) and recruitment probability terms (p, Eq. 1 and 2, Fig. 5D-F) for each unit in each speed quartile. Here we calculated the difference in both the rate and recruitment terms across the fastest and slowest speed quartiles (p<sub>fast</sub>-p<sub>slow</sub> and 𝜆<sub>fast</sub>-𝜆<sub>slow</sub>). To test whether these model parameters were significantly different depending on locomotor speed, we developed a null model combining strides from both the fastest and slowest speed quartiles. After pooling strides from both quartiles, we randomly distributed the pooled set of strides into two groups with sample sizes equal to the original slow and fast quartiles. We then calculated the null model parameters for each new group and found the difference between like terms. To estimate the distribution of possible differences, we bootstrapped this result using 1000 random redistributions of the pooled set of strides. Following the permutation test, the 95% confidence interval of this final distribution reflects the null hypothesis of no difference between groups. Thus, the null hypothesis can be rejected if the true difference in rate or recruitment terms exceeds this confidence interval.

      We followed a similar procedure to quantify cross-muscle differences in the relationship between firing parameters. For each muscle, we estimated the slope across firing parameters for each motor unit using type 2 regression. In this case, the true difference was the difference in slopes between muscles. To test the null hypothesis that there was no difference in slopes, the null model reflected the pooled set of units from both muscles. Again, slopes were calculated for 1000 random resamplings of this pooled data to estimate the 95% confidence interval.”
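
      The permutation procedure for the rate and recruitment terms can be sketched as follows. This is not the authors' implementation. It assumes a zero-inflated Poisson model fit by expectation-maximization (one way of minimizing the negative log-likelihood), integer-valued per-stride observations, and hypothetical function names and toy data.

```python
import math
import random


def fit_zip(counts, n_iter=200):
    """Fit a zero-inflated Poisson by EM; returns (p, lam).

    p is the recruitment probability and lam is the Poisson mean when recruited.
    """
    n = len(counts)
    total = sum(counts)
    if total == 0:
        return 0.0, 0.0  # never active: treated as never recruited
    n_zero = sum(1 for c in counts if c == 0)
    n_pos = n - n_zero
    p, lam = n_pos / n, total / n_pos
    for _ in range(n_iter):
        # E-step: probability that a zero stride came from a recruited unit
        z = p * math.exp(-lam) / ((1 - p) + p * math.exp(-lam))
        # M-step: update parameters from the expected number of recruited strides
        n_rec = n_pos + n_zero * z
        p, lam = n_rec / n, total / n_rec
    return p, lam


def outside_null_interval(observed, null_values):
    """True if `observed` lies outside the central 95% of the null values."""
    null_sorted = sorted(null_values)
    lower = null_sorted[int(0.025 * len(null_sorted))]
    upper = null_sorted[int(0.975 * len(null_sorted)) - 1]
    return observed < lower or observed > upper


def permutation_test(slow, fast, n_perm=1000, seed=0):
    """Test fast-minus-slow differences in p and lam against shuffled strides."""
    rng = random.Random(seed)
    p_s, lam_s = fit_zip(slow)
    p_f, lam_f = fit_zip(fast)
    observed = (p_f - p_s, lam_f - lam_s)
    pooled = list(slow) + list(fast)
    null_p, null_lam = [], []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random redistribution of the pooled strides
        ps, ls = fit_zip(pooled[:len(slow)])
        pf, lf = fit_zip(pooled[len(slow):])
        null_p.append(pf - ps)
        null_lam.append(lf - ls)
    return (observed,
            outside_null_interval(observed[0], null_p),
            outside_null_interval(observed[1], null_lam))


# Toy strides (hypothetical): per-stride values in the slow and fast quartiles
slow = [0] * 30 + [2] * 5 + [3] * 5
fast = [0] * 5 + [4] * 15 + [5] * 20
observed, p_significant, lam_significant = permutation_test(slow, fast)
print(observed, p_significant, lam_significant)
```

      Because each shuffled group keeps the original quartile sample sizes, the null distribution reflects the variability expected from stride assignment alone.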

      The argument for delayed activation of the lateral head is interesting, but I am not comfortable saying the nervous system creates a delay just based on observations of the mean time of the first spike, given the potential for differential variability in spike timing across muscles and MUs. One way to make a strong case for a delay would be to show aggregate PSTHs for all the spikes from all the MUs for each of the two heads. That would distinguish between a true delay and more gradual or variable activation between the heads.

      This is a good point and we agree that the claim made about the nervous system is too strong given the results. Even with Author response image 2 below that the Reviewer suggested, there is still not enough evidence to isolate the role of the nervous system in the muscles’ activation.

      Author response image 2.

      Aggregate peristimulus time histogram (PSTH) for all motor unit spike times in the long head (top) and lateral head (bottom) within the stride.
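
      An aggregate histogram of this kind can be built by pooling spike times, measured relative to stride onset, across all units of a muscle. The sketch below is a minimal illustration; the function name, the 10 ms bin width, the analysis window, and the spike times are assumptions rather than values from the study.

```python
def aggregate_psth(spike_times_ms_by_unit, bin_ms=10.0, window_ms=250.0):
    """Pool spike times (ms from stride onset) across units into one histogram.

    Returns a list of spike counts, one per bin of width bin_ms covering
    [0, window_ms). Spikes outside the window are ignored.
    """
    n_bins = int(window_ms // bin_ms)
    counts = [0] * n_bins
    for spike_times in spike_times_ms_by_unit.values():
        for t in spike_times:
            if 0.0 <= t < window_ms:
                counts[int(t // bin_ms)] += 1
    return counts


# Toy spike times (hypothetical) for two units of one muscle head
long_head = {"u1": [12.0, 18.0, 35.0], "u2": [15.0, 41.0]}
print(aggregate_psth(long_head, bin_ms=10.0, window_ms=50.0))
```

      Dividing each bin count by the number of strides and the bin width would convert the histogram to a rate in spikes per second, which makes the two heads easier to compare when they were recorded over different numbers of strides.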

      In the ideal case, we would have more simultaneous recordings from both muscles to make a more direct claim on the delay. Still, within the current scope of the paper, to correct this and better describe the difference in timing of muscle activity, we edited the text to the following:

      “These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, the motor pool for the long head becomes active roughly 100 ms before the motor pool supplying the lateral head during locomotion (Figure 3C).”

      The results from Marshall et al. 2022 suggest that the recruitment of some MUs is not just related to muscle force, but also the frequency of force variation - some of their MUs appear to be recruited only at certain frequencies. Figure 5C could have shown signs of this, but it does not appear to. We do not really know the force or its frequency of variation in the measurements here. I wonder whether there is additional analysis that could address whether frequency-dependent recruitment is present. It may not be addressable with the current data set, but this could be a fruitful direction to explore in the future with MU recordings from mice.

      We agree that this would be a fruitful direction to explore; however, the Reviewer is correct that this is not easily addressable with the current dataset. As the Reviewer points out, stride frequency increases with increased speed, potentially offering the opportunity to examine how motor unit activity varies with the frequency, phase, and amplitude of locomotor movements. However, given our lack of force data (either joint torques or ground reaction forces), we cannot dissociate the frequency/phase/amplitude of skeletal kinematics from the frequency/phase/amplitude of muscle force. Marshall et al. (2022) mitigated these issues by using an isometric force-production task. Therefore, while we agree that it would be a major contribution to extend such investigations to whole-body movements like locomotion, given the complexities described above we believe this is a project for the future, and beyond the scope of the present study.

      Minor:

      Page 5: "Units often displayed no recruitment in a greater proportion of strides than for any particular spike count when recruited (Figures 2A, B)," - I had to read this several times to understand it. I suggest rephrasing for clarity.

      We have changed the text to read:

      “Units demonstrated a variety of firing patterns, with some units producing 0 spikes more frequently than any non-zero spike count (Figure 2A, B),...”

      Figure 3 legend: "Mean phase (± SE) of motor unit burst duration across all strides.": It is unclear what this means - durations are not usually described as having a phase. Do we mean the onset phase?

      We have changed the text to read:

      “Mean phase ± SE of motor unit burst activity within each stride”

      Page 9: "suggesting that the recruitment of individual motor units in the lateral and long heads might have significant (and opposite) effects on elbow angle in strides of similar speed (see Discussion)." I wouldn't say "opposite" here - that makes it sound like the authors are calling the long head a flexor. The authors should rephrase or clarify the sense in which they are opposite.

      This is a fair point and we agree we should not describe the muscles as ‘opposite’ when both muscles are extensors. We have removed the phrase ‘and opposite’ from the text.

      Page 11: "in these two muscles across in other quadrupedal species" - typo.

      We have corrected this error.

      Page 16: This reviewer cannot decipher after repeated attempts what the first two sentences of the last paragraph mean. - “Future studies might also use perturbations of muscle activity to dissociate the causal properties of each motor unit’s activity from the complex correlation structure of locomotion. Despite the strong correlations observed between motor unit recruitment and limb kinematics (Fig. 6, Supplemental Fig. 3), these results might reflect covariations of both factors with locomotor speed rather than the causal properties of the recorded motor unit.”

      For better clarity, we have changed the text to read:

      “Although strong correlations were observed between motor unit recruitment and limb kinematics during locomotion (Figure 6, Figure 6–figure supplement 1), it remains unclear whether such correlations actually reflect the causal contributions that those units make to limb movement. To resolve this ambiguity, future studies could use electrical or optical perturbations of muscle contraction levels (Kim et al., 2024; Lu et al., 2024; Srivastava et al., 2015, 2017) to test directly how motor unit firing patterns shape locomotor movements. The short-latency effects of patterned motor unit stimulation (Srivastava et al., 2017) could then reveal the sensitivity of behavior to changes in muscle spiking and the extent to which the same behaviors can be performed with many different motor commands.”

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Introduction:

      (1) "Although studies in primates, cats, and zebrafish have shown that both the number of active motor units and motor unit firing rates increase at faster locomotor speeds (Grimby, 1984; Hoffer et al., 1981, 1987; Marshall et al., 2022; Menelaou & McLean, 2012)." I would remove Marshall et al. (2022) as their monkeys performed pulling tasks with the upper limb. You can alternatively remove locomotor from the sentence and replace it with contraction speed.

      Thank you for the comment. While we intended to reference this specific paper to highlight the rhythmic activity in muscles, we agree that this deviates from ‘locomotion’ as it is referenced in the other cited papers which study body movement. We have followed the Reviewer’s suggestion to remove the citation to Marshall et al.

      (2) "The capability and need for faster force generation during dynamic behavior could implicate motor unit recruitment as a primary mechanism for modulating force output in mice."

      The authors could add citations to this sentence, of works that showed that recruitment speed is the main determinant of the rate of force development (see for example Dideriksen et al. (2020) J Neurophysiol; J. L. Dideriksen, A. Del Vecchio, D. Farina, Neural and muscular determinants of maximal rate of force development. J Neurophysiol 123, 149-157 (2020)).

      Thank you for pointing out this important reference. We have included this as a citation as recommended.

      Results:

      (3) "Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in the triceps brachii (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units (Figure 1E) as described previously (Chung et al., 2023)."

      This sentence can be misleading for the reader as the array used by the researchers has 4 threads of 8 electrodes. Would it be possible to specify the number of electrodes implanted per head of interest? I assume 8 per head in most mice (or 4 bipolar channels), even if that's not specifically written in the manuscript.

      Thank you for the suggestion. As described above, we have added Table 1, which includes all array locations, and we edited the statement referenced in the comment as follows:

      “Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in forelimb muscles (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units in the triceps brachii long and lateral heads (Table 1, Figure 1E) as described previously (Chung et al., 2023).”

      (4) "These findings demonstrate that despite the overlapping biomechanical functions of the long and lateral heads of the triceps, the nervous system creates a consistent, approximately 100 ms delay (Figure 3C) between the activation of the two muscles' motor neuron pools. This timing difference suggests distinct patterns of synaptic input onto motor neurons innervating the lateral and long heads."

      Both muscles don't have fully overlapping biomechanical functions, as one of them also acts on the shoulder joint. Please be more specific in this sentence, saying that both muscles are synergistic at the elbow level rather than "have overlapping biomechanical functions".

      We agree with the above reasoning and that our manuscript should be clearer on this point. We edited the above text in accordance with the Reviewer suggestion as follows:

      "These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, …”  

      (5) "Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role."

      It is difficult to draw such an affirmative conclusion on the synaptic inputs from the data presented by the authors. The differences in firing rates may solely arise from other factors than distinct synaptic inputs, such as the different intrinsic properties of the motoneurons or the reception of distinct neuromodulatory inputs.

      To better explain our findings, we adjusted the above text in the Results (see “Motor unit firing patterns in the long and lateral heads of the triceps”):

      “Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role.”

      We also included the following distinction in the Discussion (see “Differences in motor unit activity patterns across two elbow extensors”) to address the other plausible mechanisms mentioned.

      “The large differences in burst timing and spike patterning across the muscle heads suggest that the motor pools for each muscle receive distinct inputs. However, differences in the intrinsic physiological properties of motor units and neuromodulatory inputs across motor pools might also make substantial contributions to the structure of motor unit spike patterns (Martínez-Silva et al., 2018; Miles & Sillar, 2011).”

      (6) "We next examined whether the probabilistic recruitment of individual motor units in the triceps and elbow extensor muscle predicted stride-by-stride variations in elbow angle kinematics."

      I'm not sure that the wording is appropriate here. The analysis does not predict elbow angle variations from parameters extracted from the spiking activity. It rather compares the average elbow angle between two conditions (motor unit active or not active).

      We thank the Reviewer for this comment and agree that the wording could be improved here to better reflect our analysis. To lower the strength of our claim, we replaced usage of the word ‘predict’ with ‘correlates’ in the above text and throughout the paper when discussing this result.

      Methods:

      (7) "Using the four threads on the customizable Myomatrix array (RF-4x8-BHS-5), we implanted a combination of muscles in each mouse, sometimes using multiple threads within the same muscle. [...] Some mice also had threads simultaneously implanted in their ipsilateral or contralateral biceps brachii although no data from the biceps is presented in this study."

      A precise description of the localisation of the array (muscles and the number of arrays per muscle) for each animal would be appreciated.

      (8) "A total of 33 units were identified and manually verified across all animals." A precise description of the number of motor units concurrently identified per muscle and per animal would be appreciated. Moreover, please add details on the manual inspection. Does it involve the manual selection of missing spikes? What are the criteria for considering an identified motor unit as valid?

      As discussed earlier, we added Table 1 to the main text to provide the details mentioned in the above comments.

      Regarding spike sorting, given the very large number of spikes recorded, we did not rely on manually adjusting mislabeled spikes. Instead, as described in the revised Methods section, we verified unit isolation by ensuring units had >98% of spikes outside of 1 ms of each other. Moreover, as described above, we have added new analyses (Figure 1–figure supplement 1) confirming the stability of motor unit waveforms across both the duration of individual recording sessions (roughly 30 minutes) and across the rapid changes in limb position within individual stride cycles (roughly 250 ms).
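
      An isolation check of this kind can be sketched as follows. The interpretation used here, that more than 98% of inter-spike intervals must be at least 1 ms long, is an assumption about how the criterion was applied, and the function names and spike times are hypothetical.

```python
def fraction_isolated(spike_times_ms, refractory_ms=1.0):
    """Fraction of inter-spike intervals that are at least refractory_ms long."""
    ts = sorted(spike_times_ms)
    isis = [t1 - t0 for t0, t1 in zip(ts, ts[1:])]
    if not isis:
        return 1.0  # fewer than two spikes: no interval can violate the limit
    return sum(1 for isi in isis if isi >= refractory_ms) / len(isis)


def passes_isolation(spike_times_ms, threshold=0.98, refractory_ms=1.0):
    """True if the unit exceeds the isolation threshold."""
    return fraction_isolated(spike_times_ms, refractory_ms) > threshold


# Toy spike trains (hypothetical, in ms): one clean, one with a 0.5 ms interval
clean_unit = [0.0, 5.0, 10.0, 15.0]
contaminated_unit = [0.0, 5.0, 10.0, 10.5, 20.0]
print(passes_isolation(clean_unit), passes_isolation(contaminated_unit))
```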

      Reviewer #3 (Recommendations for the authors):

      Figure 2 (and supplement) show spike count distributions with strong positive skewness, which is in accordance with the prediction of a fluctuation-driven regime. I suggest plotting these on a logarithmic x-axis (in addition to the linear axis), which should reveal a bell-shaped distribution, maybe even Gaussian, in a majority of the units.

      We thank the Reviewer for the suggestion. We present the requested analysis below, which shows bell-shaped distributions for some (but not all) distributions. However, we believe that investigating why some replotted distributions are Gaussian and others are not falls beyond the scope of this paper, and likely requires a larger dataset than the one we were able to obtain.

      Author response image 3.

      Spike count distributions for each motor unit on a logarithmic x-axis.

      Why not more data? I tried to get an overview of how much data was collected.

      Supplemental Figure 1 has all the isolated units, which amounts to 38 (are the colors the two muscle types?). Given there are 16 leads in each myomatrix, in two muscles, of six mice, this seems like a low yield. Could the authors comment on the reasons for this low yield?

      Regarding motor unit yield, even with multiple electrodes per muscle and a robust sorting algorithm, we often isolated only a few units per muscle. This yield likely reflects two factors. First, because of the highly dynamic nature of locomotion and high levels of muscle contraction, isolating individual spikes reliably across different locomotor speeds is inherently challenging, regardless of the algorithm being employed. Second, because the results of spike-train analyses can be highly sensitive to sorting errors, we have only included the motor units that we can sort with the highest possible confidence across thousands of strides.

      Minor:

      Figure captions especially Figure 6: The text is excessively long. Can the text be shortened?

      We thank the Reviewer for this comment. Generally, we seek to include a description of the methods and results within the figure captions, but we concede that we can condense the information in some cases. In a number of cases, we have moved some of the descriptive text from the caption to the Methods section.

      References

      Berg, R. W. (2017). Neuronal Population Activity in Spinal Motor Circuits: Greater Than the Sum of Its Parts. Frontiers in Neural Circuits, 11. https://doi.org/10.3389/fncir.2017.00103

      Biewener, A. A., Blickhan, R., Perry, A. K., Heglund, N. C., & Taylor, C. R. (1988). Muscle Forces During Locomotion in Kangaroo Rats: Force Platform and Tendon Buckle Measurements Compared. Journal of Experimental Biology, 137(1), 191–205. https://doi.org/10.1242/jeb.137.1.191

      Chung, B., Zia, M., Thomas, K. A., Michaels, J. A., Jacob, A., Pack, A., Williams, M. J., Nagapudi, K., Teng, L. H., Arrambide, E., Ouellette, L., Oey, N., Gibbs, R., Anschutz, P., Lu, J., Wu, Y., Kashefi, M., Oya, T., Kersten, R., … Sober, S. J. (2023). Myomatrix arrays for high-definition muscle recording. eLife, 12, RP88551. https://doi.org/10.7554/eLife.88551

      De Luca, C. J. (1985). Control properties of motor units. Journal of Experimental Biology, 115(1), 125–136. https://doi.org/10.1242/jeb.115.1.125

      De Luca, C. J., & Erim, Z. (1994). Common drive of motor units in regulation of muscle force. Trends in Neurosciences, 17(7), 299–305. https://doi.org/10.1016/0166-2236(94)90064-7

      Farina, D., Negro, F., & Dideriksen, J. L. (2014). The effective neural drive to muscles is the common synaptic input to motor neurons. The Journal of Physiology, 592(16), 3427–3441. https://doi.org/10.1113/jphysiol.2014.273581

      Hartigan, P. M. (1985). Algorithm AS 217: Computation of the Dip Statistic to Test for Unimodality. Applied Statistics, 34(3), 320. https://doi.org/10.2307/2347485

      Henneman, E., Somjen, G., & Carpenter, D. O. (1965). FUNCTIONAL SIGNIFICANCE OF CELL SIZE IN SPINAL MOTONEURONS. Journal of Neurophysiology, 28(3), 560–580. https://doi.org/10.1152/jn.1965.28.3.560

      Karabulut, D., Dogru, S. C., Lin, Y.-C., Pandy, M. G., Herzog, W., & Arslan, Y. Z. (2020). Direct Validation of Model-Predicted Muscle Forces in the Cat Hindlimb During Locomotion. Journal of Biomechanical Engineering, 142(5), 051014. https://doi.org/10.1115/1.4045660

      Kim, J. J., Wyche, I. S., Olson, W., Lu, J., Bakir, M. S., Sober, S. J., & O’Connor, D. H. (2024). Myo-optogenetics: Optogenetic stimulation and electrical recording in skeletal muscles. https://doi.org/10.1101/2024.06.21.600113

      Lu, J., Zia, M., Baig, D. A., Yan, G., Kim, J. J., Nagapudi, K., Anschutz, P., Oh, S., O’Connor, D., Sober, S. J., & Bakir, M. S. (2024). Opto-Myomatrix: μLED integrated microelectrode arrays for optogenetic activation and electrical recording in muscle tissue. https://doi.org/10.1101/2024.07.01.601601

      Manuel, M., & Heckman, C. J. (2011). Adult mouse motor units develop almost all of their force in the subprimary range: A new all-or-none strategy for force recruitment? Journal of Neuroscience, 31(42), 15188–15194. https://doi.org/10.1523/JNEUROSCI.2893-11.2011

      Marshall, N. J., Glaser, J. I., Trautmann, E. M., Amematsro, E. A., Perkins, S. M., Shadlen, M. N., Abbott, L. F., Cunningham, J. P., & Churchland, M. M. (2022). Flexible neural control of motor units. Nature Neuroscience, 25(11), 1492–1504. https://doi.org/10.1038/s41593-022-01165-8

      Martínez-Silva, M. de L., Imhoff-Manuel, R. D., Sharma, A., Heckman, C. J., Shneider, N. A., Roselli, F., Zytnicki, D., & Manuel, M. (2018). Hypoexcitability precedes denervation in the large fast-contracting motor units in two unrelated mouse models of ALS. eLife, 7(2007), 1–26. https://doi.org/10.7554/eLife.30955

      Miles, G. B., & Sillar, K. T. (2011). Neuromodulation of Vertebrate Locomotor Control Networks. Physiology, 26(6), 393–411. https://doi.org/10.1152/physiol.00013.2011

      Petersen, P. C., & Berg, R. W. (2016). Lognormal firing rate distribution reveals prominent fluctuation–driven regime in spinal motor networks. eLife, 5. https://doi.org/10.7554/elife.18805

      Srivastava, K. H., Elemans, C. P. H., & Sober, S. J. (2015). Multifunctional and Context-Dependent Control of Vocal Acoustics by Individual Muscles. The Journal of Neuroscience, 35(42), 14183–14194. https://doi.org/10.1523/JNEUROSCI.3610-14.2015

      Srivastava, K. H., Holmes, C. M., Vellema, M., Pack, A. R., Elemans, C. P. H., Nemenman, I., & Sober, S. J. (2017). Motor control by precisely timed spike patterns. Proceedings of the National Academy of Sciences of the United States of America, 114(5), 1171–1176. https://doi.org/10.1073/pnas.1611734114

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Mazar & Yovel 2025 dissect the inverse problem of how echolocators in groups manage to navigate their surroundings despite intense jamming using computational simulations.

      The authors show that despite the 'noisy' sensory environments that echolocating groups present, agents can still access some amount of echo-related information and use it to navigate their local environment. It is known that echolocating bats have strong small and large-scale spatial memory that plays an important role for individuals. The results from this paper also point to the potential importance of an even lower-level, short-term role of memory in the form of echo 'integration' across multiple calls, despite the unpredictability of echo detection in groups. The paper generates a useful basis to think about the mechanisms in echolocating groups for experimental investigations too.

      Strengths:

      The paper builds on a biologically well-motivated and parametrised 2D acoustics and sensory simulation setup to investigate the various key parameters of interest.

      The 'null-model' of echolocators not being able to tell apart objects & conspecifics while echolocating still shows agents successfully emerge from groups - even though the probability of emergence drops severely in comparison to cognitively more 'capable' agents. This is nonetheless an important result showing the direction-of-arrival of a sound itself is the 'minimum' set of ingredients needed for echolocators navigating their environment.

      The results generate an important basis in unraveling how agents may navigate in sensorially noisy environments with a lot of irrelevant and very few relevant cues.

      The 2D simulation framework is simple and computationally tractable enough to perform multiple runs to investigate many variables - while also remaining true to the aim of the investigation.

      Weaknesses:

      Authors have not yet provided convincing justification for the use of different echolocation phases during emergence and in cave behaviour. In the previous modelling paper cited for the details - here the bat-agents are performing a foraging task, and so the switch in echolocation phases is understandable. While flying with conspecifics, the lab's previous paper has shown what they call a 'clutter response' - but this is not necessarily the same as going into a 'buzz'-type call behaviour. As pointed out by another reviewer - the results of the simulations may hinge on the fact that bats are showing this echolocation phase-switching, and thus improving their echo-detection. This is not necessarily a major flaw - but something for readers to consider in light of the sparse experimental evidence at hand currently.

      The use of echolocation phases—defined as the sequential search, approach, and buzz call patterns—has been documented not only during foraging but also in tasks such as landing, obstacle avoidance, clutter navigation, and drinking. Bat call structure has been shown to vary systematically with object proximity, not exclusively in response to prey. During obstacle avoidance, phase transitions were observed, with approach calls emitted in grouped sequences and with reduced durations (Gustafson & Schnitzler, 1979; Schnitzler et al., 1987). In landing contexts, bats have been reported to emit short-duration calls and decrease inter-pulse intervals—buzz-like patterns also observed during prey capture— suggesting shared acoustic strategies across behaviors (Hagino et al., 2007; Hiryu et al., 2008; Melcón et al., 2007, 2009). Comparable patterns have been reported during drinking maneuvers, where “drinking buzzes” have been proposed to guide a precise approach to the water surface, analogous to landing buzzes (Griffiths, 2013; Russo et al., 2016). In response to environmental complexity, bats were found to shorten calls and increase repetition rates when navigating cluttered spaces compared to open ones (Falk et al., 2014; Kalko & Schnitzler, 1993).

      Moreover, field recordings from our study of Rhinopoma microphyllum (Goldshtein et al., 2025) revealed shortened call durations and inter-pulse intervals during dense group flight outside the cave during emergence, patterns consistent with the terminal-approach phase that is typical when a bat comes very close to an object (another bat in this case). Author response image 1 shows an approach sequence recorded from a tagged bat approximately 20 meters from the cave entrance, with self-generated echolocation calls marked. These bats use an inter-pulse interval of ca. 20 ms when a reflective object (here, another bat) is nearby.

      Author response image 1.

      These results provide direct evidence that bats actively employ approach-phase echolocation during swarming, likely to avoid collision with other bats. This supports the view that echolocation phase transitions are a general proximity-based sensing strategy, adapted across a variety of behavioral scenarios and not limited to hunting alone.

      In our simulations, bats predominantly emitted calls in the approach phase, with only rare occurrences of buzz-phase calls.

      See lines 355-363 in the revised manuscript.

      The decision to model direction-of-arrival with such high angular resolution (1-2 degrees) is not entirely justifiable - and the authors may wish to do simulation runs with lower angular resolution. Past experimental paradigms haven't really separated out target-strength as a confounding factor for angular resolution (e.g. see the cited Simmons et al. 1983 paper). Moreover, to this reviewer's reading of the cited paper - it is not entirely clear how this experiment provides source-data to support the DoA-SNR parametrisation in this manuscript. The cited paper has two array-configurations, both of which are measured to have similar received levels upon ensonification. A relationship between angular resolution and signal-to-noise ratio is understandable perhaps - and one can formulate such a relationship, but here the reviewer asks that the origin/justification be made clear. On an independent line, also see the recent contrasting results of Geberl, Kugler, Wiegrebe 2019 (Curr. Biol.) - who suggest even poorer angular resolution in echolocation.

      We thank the reviewer for raising this important point. The acuity of 1.5–3° in horizontal direction-of-arrival (DoA) estimation is based on the classical work of Simmons et al. (1983) with Eptesicus fuscus. Similar precision was later supported by Erwin et al. (2001), who modeled azimuth estimation from measured interaural intensity differences (IIDs), reporting an average error of 0.2° with a standard deviation of ~2.2°, consistent with the behavioral data found by Simmons. The decline in acuity with increasing arrival angle has also been demonstrated in behavioral and physiological studies of binaural IID processing (Erwin et al., 2001; Fay, 1995; Razak, 2012; Wohlgemuth et al., 2016). The error model itself was first introduced in our earlier work (Mazar & Yovel, 2020).

      Importantly, Geberl et al. (2019) examined the resolution of weak targets masked by nearby strong flankers and found poor spatial discrimination of ~45 degrees; however, they were studying a detection problem, rather than the horizontal acuity of azimuth estimation. Indeed, our model assumes there is no spatial discrimination at all.

      Overall, while our DoA–SNR parametrization can certainly be critiqued and alternative parameterizations could be tested in future work, we believe it reflects a reasonable and empirically supported assumption. 

      Reviewer #2 (Public review):

      This manuscript describes a detailed model for bats flying together through a fixed geometry. The model considers elements which are faithful to both bat biosonar production and reception and the acoustics governing how sound moves in air and interacts with obstacles. The model also incorporates behavioral patterns observed in bats, like one-dimensional feature following and temporal integration of cognitive maps. From a simulation study of the model and comparison of the results with the literature, the authors gain insight into how often bats may experience destructive interference of their acoustic signals and those of their peers, and how much such interference may actually negatively affect the groups' ability to navigate effectively. The authors use generalized linear models to test the significance of the effects they observe.

      The work relies on a thoughtful and detailed model which faithfully incorporates salient features, such as acoustic elements like the filter for a biological receiver and temporal aggregation as a kind of memory in the system. At the same time, the authors abstract features that are complicating without being expected to give additional insights, as can be seen in the choice of a two-dimensional rather than three-dimensional system. I thought that the level of abstraction in the model was perfect, enough to demonstrate their results without needless details. The results are compelling and interesting, and the authors do a great job discussing them in the context of the biological literature.

      With respect to the first version of the manuscript, the authors have remedied all my outstanding questions or concerns in the current version. The new supplementary figure 5 is especially helpful in understanding the geometry.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Data Availability: This reviewer lauds the authors for switching from a private commercial folder requiring login to one that does not. At the cost of being overly pedantic - the GitHub repository is not a long-term archival resource. The ideal solution is to upload the code in an academic repository (Zenodo, OSF, etc.) to periodically create a 'static snapshot' of code for archival, while also hosting a 'live' version on GitHub.

      We have uploaded the code to a Zenodo repository and updated the link in the paper:

      How bats exit a crowded colony when relying on echolocation only - a modeling approach

      In one of the rebuttals to Reviewer #3, the authors have cited a wrong paper (Beleyur & Goerlitz 2019) while discussing broad bandwidth calls improving detection, and may wish to correct this if possible on record.

      We have removed the incorrect citation from the revised version of the manuscript.

      Specific comments on the 2nd manuscript:

      Figure 5: Table 1 says 1, 2, 5, 10, 20, 40, 100 bats were simulated (lines 138-139) but the conclusion (line 398) says '1 to 100 bats' per 3 m². However, the X-axis only stops at 40 and says 'number of bats', while the legend says bats/3 m². What is actually being plotted? Moreover, in the entire paper there is a constant back-and-forth between density and # of bats - perhaps it is explained beforehand, but it is a bit unsettling - and more can be done to clarify these two conventions.

      While most parameters were tested across the full range of 1 to 100 bats per 3 m², a subset of conditions—including misidentification, multi-call clustering, wall target strength, and conspecific target strength—were simulated only up to 40 bats due to significantly longer run-times. This is now clarified in both the main text and the Table 1 caption.

      In our simulations, the primary parameter was the number of bats placed within a 3 m² starting area, which directly determined the initial density (bats per 3 m²). Throughout the manuscript, we use “number of bats” to refer to the simulation input, while “density” denotes the equivalent ecological measure. Figure 5 and related captions have been revised accordingly to note these conventions and to indicate when results are shown only up to 40 bats (see lines 120–122, 314-317 in the revised text).
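
      As a small worked illustration of the two conventions described above, the conversion between the simulation input (number of bats in the starting area) and a per-square-metre density can be sketched as follows. The group sizes are those listed in the reviewer's comment and the 3 m² area is the one stated in the response; the function and variable names are hypothetical and are not taken from the authors' code.

```python
# Starting area stated in the response (bats are placed within 3 m^2).
START_AREA_M2 = 3.0

# Group sizes listed in the reviewer's comment on Table 1.
bat_counts = [1, 2, 5, 10, 20, 40, 100]


def density_per_m2(n_bats, area_m2=START_AREA_M2):
    """Convert a number of bats in the starting area to bats per square metre."""
    return n_bats / area_m2


# Map each simulated group size to its initial density.
densities = {n: density_per_m2(n) for n in bat_counts}

for n, d in densities.items():
    print(f"{n:>3} bats in {START_AREA_M2:g} m^2 -> {d:.2f} bats/m^2")
```

      Under this reading, "number of bats" and "bats per 3 m²" are numerically identical, and only a per-m² density requires rescaling.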

      Table 1: This was made considerably difficult to read given the visual clutter - and I hope I've understood these changes correctly.

      What is in the square brackets of the effect-size (e.g. first row with values 'Exit prob. (%)' says -0.37/bat [63:100] ? What does this 63:100 refer to?

      What is the 'process flag'

      Values in square brackets indicate the minimum and maximum values of the metric across the tested range (e.g., [63:100] shows the range of exit probabilities observed across different bat densities).

      The term “process flag” has been replaced with “with and without multi-call clustering” for clarity.

      Both the table layout and caption have been revised to reduce visual clutter and to make these conventions clearer to the reader. 

      Lines 562-3: "In our study, due to the dense cave environment, the bats are found to operate in the approach phase nearly all of the time, which is consistent with natural cave emergence behavior" - bats are 'found to' implies there is some experimental data or it is an emergent property. See above for the point questioning the implementation of multiple echolocation phases in the model, but also - here the bat-agents are allowed to show different phases and thus they do so -- it is a constraint of the implementation and not a result per se given the size of the cave and the number of bats involved...

      We removed the sentence from the Methods section, since it could be misinterpreted as an experimental finding rather than a model outcome. Instead, we now discuss this in the Discussion, clarifying that the predominance of the approach phase arises from the cluttered cave environment in our simulations, which is consistent with natural emergence behavior (see lines 355-363). In this context, the use of echolocation phases is presented as a biologically plausible modeling choice rather than an empirical result.

      Lines 659-660: The parametrisation between DoA and SNR is supposedly found in 'Equation 10' - which this reviewer could not find in the manuscript

      The equation was accidentally omitted in the previous revision and has now been reinserted into the manuscript. It defines how direction-of-arrival (DoA) error depends on SNR and azimuth angle (see lines 603-605).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The key discovery of the manuscript is that genetically wild type females descended from Khdc3 mutants show abnormal gene expression relating to hepatic metabolism, which persists over multiple generations and passes through both female and male lineages. The authors also find dysregulation of hepatically-metabolized molecules in the blood of these wild type mice with Khdc3 mutant ancestry. These data provide solid evidence further supporting that phenotypes can be transmitted across multiple generations without altering DNA sequence, supporting the involvement of epigenetic mechanisms. The authors further performed exploratory studies on the small RNA profiles in the oocytes of Khdc3-null females and their wild type descendants, suggesting that altered small RNA expression could be a contributor to the observed phenotype transmission, although this has not been functionally validated.

      Reviewer #2 (Public review):

      Summary:

      This manuscript aimed to investigate the non-genetic impact of KHDC3 mutation on liver metabolism. To do that, the authors analyzed the female liver transcriptome of genetically wild type mice descended from female ancestors with a mutation in the Khdc3 gene. They found that genetically wild type females descended from Khdc3 mutants have hepatic transcriptional dysregulation which persists over multiple generations in the progenies descended from female ancestors with a mutation in the Khdc3 gene. This transcriptomic deregulation was associated with dysregulation of hepatically-metabolized molecules in the blood of these wild type mice with female mutational ancestry. Furthermore, to determine whether small non-coding RNA could be involved in the maternal non-genetic transmission of the hepatic transcriptomic deregulation, they performed small RNA-seq of oocytes from Khdc3-/- mice and genetically wild type female mice descended from female ancestors with a Khdc3 mutation and claimed that oocytes of wild type female offspring from Khdc3-null females have dysregulation of multiple small RNAs.

      Finally, they claimed that their data demonstrates that ancestral mutation in Khdc3 can produce transgenerational inherited phenotypes.

      However, at this stage and considering the information provided in the paper, I think that these conclusions are too preliminary. Indeed, several controls/experiments need to be added to reach those conclusions.

      Additional context you think would help readers interpret or understand the significance of the work

      Line 25: this first sentence is very strong and needs to be documented in the introduction.

      Line 48: Reference 5 is not appropriate since the paper shows the remodeling of small RNA during post-testicular maturation of mammalian sperm and their sensitivity to the environment. Please change it.

      Line 51: "implies" is too strong and should be replaced by "suggests"

      Line 67: reference is missing

      Database, the accession numbers are lacking.

      References showing the maternal transmission of non-genetically inherited phenotypes in mice via small RNA need to be added

      Line 378: All RNA-Seq and small RNA-Seq data are available in the NCBI GEO

      We have changed references as requested, and updated portions of the introduction in order to mention specifically genes that seem to regulate an RNA-based genetic nurture effect.  We are not aware of any published work that has demonstrated maternal transmission of non-genetic phenotypes via small RNAs; if the reviewer has a specific reference in mind, we would be happy to read it and add it to our manuscript.  We did add a few sentences describing why this work has primarily been performed in males/fathers.

      Reviewer #1 (Recommendations for the authors):

      (1) In addition to the altered hepatic gene expression and metabolites, did the authors notice any overall phenotypes? including body weight, overall growth, eating behavior, etc?

      We have added information on more general phenotypes of the mice, including litter size, birth weights, and weights at 3 and 8 weeks of age.  We have also performed a metabolic analysis of WT****** mice at 8 months of age.  Overall, there are no striking differences in the WT* mice in these broad phenotypic measures, and also no indication that a smaller litter size or larger birthweight are the drivers of our observed hepatic abnormalities.

      (2) When analyzing the small RNAs, the authors mentioned that they have mapped the reads against rRNAs. This should have resulted in the identification of many rRNA-derived small RNAs (rsRNAs). The authors should also perform analyses on the differential expression of rsRNAs in this context. Both tsRNAs and rsRNAs have been shown to be involved in epigenetic inheritance (at least in sperm) (Nat Cell Biol 2018, PMID: 29695786).

      In the oocyte small RNA data, we did not notice many differences in either piRNAs or rRNAs between either the WT and KO oocytes, or the WT and WT** oocytes.  The most significant differences by far were in miRNA and tsRNA.  We have added that we do not see any differences in rRNAs.

      Reviewer #2 (Recommendations for the authors):

      To support your conclusion, you should include the following Data/experiments:

      (1) In the abstract, you wrote "Our results demonstrate that ancestral mutation in Khdc3 can produce transgenerational inherited phenotypes". The full phenotypic description (weight at birth, at 3 weeks, and at 8 weeks old, phenotype of the liver...) of each progeny should be carefully described/analyzed.

      Female KHDC3-deficient mice showed reduced fertility with smaller litters. Given the fact that litter size influences early growth and adult physiology (DOI: 10.1016/j.cmet.2020.07.014), all the metabolic effects observed in the paper could be the result of the litter size. Information about the litter size should be provided. Without this information, it is difficult to evaluate the non-genetic impact of KHDC3 mutation on the metabolism of the progenies.

      We have added information on more general phenotypes of the mice, including litter size, birth weights, and weights at 3 and 8 weeks of age (Figure 3). We have also performed a metabolic analysis of WT****** mice at 8 months of age.  Overall, there were no striking differences in the WT* mice in these broad phenotypic measures, and also no indication that a smaller litter size or larger birthweight are the drivers of our observed hepatic abnormalities.

      We have also added a new figure in order to examine the mechanism of transmission of our observed transcriptional abnormalities (Figure 5).  By transferring serum from WT* mice into wild type recipients, we observe alterations to hepatic gene expression, suggesting that serum-based molecules are driving the altered non-genetic factors in the oocyte.  This lends further support to the conclusion that the observed changes in WT* mice are from inherited germ cell abnormalities (informed by somatic metabolic abnormalities and communicated via blood), and not a consequence of litter sizes or growth rates.

      (2) In addition to the lack of phenotypic information of the progenies, the DEG for the small RNA-seq should be filtered on padj(FDR)<0.05 and not on pvalue<0.05. In Figure 4a, the legend is missing.

      We did not alter the filtering on the small RNA-Seq data. We are not focusing on any specific small RNA; rather, we are stating that these groups of small RNAs (miRNA, tsRNA) are dysregulated. Accordingly, we believe that using the unadjusted p-value is not inappropriate in this circumstance. The analysis was performed similarly to the 4-cell embryo RNA-Seq performed by Harris et al., Cell Reports (PMID 38573852).
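
      To make concrete the distinction the reviewer draws between filtering on a raw p-value and on an FDR-adjusted value (padj), the following is a minimal sketch of the Benjamini-Hochberg adjustment. It is not the authors' pipeline; the p-values are hypothetical and chosen only to show that the two filters can select different sets of features.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four small RNAs (illustrative only).
pvals = [0.001, 0.03, 0.04, 0.6]
padj = benjamini_hochberg(pvals)

# Compare the two filtering rules at the same 0.05 threshold.
raw_hits = [p < 0.05 for p in pvals]
fdr_hits = [q < 0.05 for q in padj]

print("raw p < 0.05:", raw_hits)
print("padj  < 0.05:", fdr_hits)
```

      In this toy case three features pass the raw threshold but only one survives adjustment, which is the kind of difference the reviewer's request is aimed at.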

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the encoding of forelimb movement parameters using a reach-to-grasp task in mice. The authors use a modified version of the water-reaching paradigm developed by Galinanes and Huber. Two-photon calcium imaging was then performed with GCaMP6f to measure activity across both the contralateral caudal forelimb area (CFA) and the forelimb portion of primary somatosensory cortex (fS1) as mice perform the reaching behavior. Established methods were used to extract the activity of imaged neurons in layer 2/3, including methods for deconvolving the calcium indicator's response function from fluorescence time series. Video-based limb tracking was performed to track the positions of several sites on the forelimb during reaching and extract numerous low-level (joint angle) and high-level (reach direction) parameters. The authors find substantial encoding of parameters for both the proximal and distal parts of the limb across both CFA and fS1, with individual neurons showing heterogeneous parameter encoding. Limb movement can be decoded similarly well from both CFA and fS1, though CFA activity enables decoding of reach direction earlier and for a more extended duration than fS1 activity. Collectively, these results indicate involvement of a broadly distributed sensorimotor region in mouse cortex in determining low-level features of limb movement during reach-to-grasp.

      Strengths:

      The technical approach is of very high quality. In particular, the decoding methods are well designed and rigorous. The use of partial correlations to distinguish correlation between cortical activity and either proximal or distal limb parameters or either low- or high-level movement parameters was very nice. The limb tracking was also of extremely high quality, and critical here to revealing the richness of distal limb movement during task performance.

      The task itself also reflects an important extension of the original work by Galinanes and Huber. The demonstration of a clear, trackable grasp component in a paradigm where mice will perform hundreds of trials per day expands the experimental opportunities for the field. This is an exciting development.

      The findings here are important and the support for them is solid. The work represents an important step forward toward understanding the cortical origins of limb control signals. One can imagine numerous extensions of this work to address basic questions that have not been reachable in other model systems.

      Collectively, these strengths made this manuscript a pleasure to read and review.

      Thank you!

      Weaknesses:

      In the last section of the results, the authors purport to examine the representation of "higher-level target-related signals," using the decoding of reach direction. While I think the authors are careful in their phrasing here, I think they should be more explicit about what these signals could be reflecting. The "signals" here that are used to decode direction could relate to anything - low-level signals related to limb or postural muscles, or true high-level commands that dictate only what movement downstream motor centers should execute, rather than the muscle commands that dictate how. One could imagine using a partial correlation-type approach again here to extract a signal uncorrelated with all the measured low-level parameters, but there would still be all the unmeasured ones. Again, I think it is still ok to call these "high-level signals," but I think some explicit discussion of what these signals could reflect is necessary.

      Thank you for this excellent suggestion. We have followed both pieces of the reviewer’s advice. First, we performed the suggested analysis, partialing off the kinematics then performing target classification on the residuals. This is now Figure 6S1. The analysis revealed the presence of target-related information in the neural activity after subtracting off all linear correlations with kinematics, supporting our claims that higher-level information is present in both populations. The exact timing of classifier performances varied substantially across mice, potentially due to differences in reach-to-grasp strategy, kinematic tracking fidelity, and exact spatial locations of each recorded FOV. Following the second suggestion, we have made the relevant text more careful. We now conclude simply that higher-level signals, meaning those signals that are largely unrelated to forelimb joint angle kinematics, are present but with variable timing and strengths in each area. That text now reads:

      “Target decoding performance could result from truly higher-level signals that code abstractly for target location, or alternatively could be supported by strong encoding of kinematic variables that differed between targets. To disambiguate these possibilities, we refit the linear classifier to neural data after regressing off variance related to the joint angle kinematics. The strength and exact time course of the resulting target decoding varied somewhat across animals, but the earliest portion of target decoding performance persisted in all animals after the removal of kinematics and performance remained stronger for M1-fl than S1-fl (Fig. 6S1B). We thus conclude that higher-level signals are present in both areas, but differ in their exact timing and strength. However, we note that other possible signals, such as postural changes, could not be controlled for here.”
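
      The control analysis quoted above (regressing kinematics out of the neural activity, then decoding target from the residuals) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the data are synthetic, a single kinematic variable stands in for the full set of joint angles, a nearest-centroid rule stands in for the linear classifier, and all names are hypothetical.

```python
import random


def residualize(y, x):
    """Remove the least-squares linear fit of x (plus intercept) from y."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def nearest_centroid_accuracy(features, labels, train_idx, test_idx):
    """Fit class means on training trials and score held-out trials."""
    centroids = {}
    for c in set(labels):
        rows = [features[i] for i in train_idx if labels[i] == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    correct = 0
    for i in test_idx:
        dist = {c: sum((a - b) ** 2 for a, b in zip(features[i], mu))
                for c, mu in centroids.items()}
        correct += min(dist, key=dist.get) == labels[i]
    return correct / len(test_idx)


# Synthetic data: 200 trials, 2 targets, 1 kinematic variable, 5 "neurons".
rng = random.Random(0)
n_trials, n_neurons = 200, 5
targets = [i % 2 for i in range(n_trials)]
# The kinematic variable differs between targets on average.
kin = [0.5 * t + rng.gauss(0, 1) for t in targets]
# Each neuron carries a kinematic term plus a separate target term.
neural = [[1.0 * k + 0.8 * t + rng.gauss(0, 1) for _ in range(n_neurons)]
          for k, t in zip(kin, targets)]

# Regress the kinematic variable out of each neuron's activity.
columns = [residualize([row[j] for row in neural], kin)
           for j in range(n_neurons)]
residuals = [list(row) for row in zip(*columns)]

# Decode target from the residuals on held-out trials.
train_idx = list(range(0, n_trials // 2))
test_idx = list(range(n_trials // 2, n_trials))
accuracy = nearest_centroid_accuracy(residuals, targets, train_idx, test_idx)
print(f"target decoding accuracy on residuals: {accuracy:.2f}")
```

      Above-chance decoding from the residuals indicates target information that is not linearly explained by the measured kinematics; as the quoted text notes, unmeasured variables such as posture are not controlled by this procedure. For brevity the regression here is fit on all trials, whereas a stricter version would fit it on training trials only.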

      Related to this, I think the manuscript in general does not do an adequate job of explicitly raising the important caveats in interpreting parametric correlations in motor system signals, like those raised by Todorov, 2000. The authors do an expert job of handling the correlations, using PCA to extract uncorrelated components and using the partial correlation approach. However, more clarity about the range of possible signal types the recorded activity could reflect seems necessary.

      This is an important point, and our text could have unintentionally misled readers. We have now attempted to make this point explicit in the Discussion and in the Results for Figure 6. This Discussion text now reads:

      “Moreover, as is widely known (Todorov 2000), the exact role of these kinematically-related signals is challenging to determine from correlative measures alone; thus, determining whether these signals are used for direct movement control or instead indirectly reflect control performed elsewhere is left as a topic for future work.”

      The manuscript could also do a better job of clarifying relevant similarities and differences between the rodent and primate systems, especially given the claims about the rodent being a "first-class" system for examining the cellular and circuit basis of motor control, which I certainly agree with. Interspecies similarities and differences could be better addressed both in the Introduction, where results from both rodents and primates are intermixed (second paragraph), and in the Discussion, where more clarity on how results here agree and disagree with those from primates would be helpful. For example, the ratio of corticospinal projections targeting sensory and motor divisions of the spinal cord differs substantially between rodents and primates. As another example, the relatively high physical proximity between the typical neurons in mouse M1 and S1 compared to primates seems likely to yoke their activity together to a greater extent. There is also the relatively large extent of fS1 from which forelimb movements can be elicited through intracortical microstimulation at current levels similar to those for evoking movement from M1. All of these seem relevant in the context of findings that activity in mouse M1 and S1 are similar.

      We understand two points to address here. The first point is that we needed to be more careful to attribute previous results as being from the rodent vs. monkey. We agree. We have now revised several parts of the paper to make these distinctions clearer. The second point is about the potential benefit of a thorough review of the many ways in which primate and rodent sensorimotor systems differ. We entirely agree that this could be useful for the field. However, this is a sizable endeavor and doing it full justice is beyond what we know how to fit in the space allotted for framing our results here. We therefore sought a compromise, acknowledging how our results correspond to existing results in the primate without exhaustively accounting for how they differ. Future work will be necessary to more carefully disambiguate whether species-specific differences are due to biomechanical, neurological, ethological, or as-of-yet undetermined sources. We have incorporated your final specific points about what could produce similar information in M1 and S1 into the Discussion.

      “This may simply be a consequence of widely distributed representations of movement across mouse cortex (Musall et al. 2019; Steinmetz et al. 2019; Stringer et al. 2019), including forelimb somatosensory areas, or may be a consequence of the close physical proximity of M1-fl and S1-fl hindering development of functionally distinct representations (Tennant et al. 2011).”

      In addition, there are a number of other issues related to the interpretation of findings here that are not adequately addressed. These are described in the Recommendations for improvement.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Grier, Salimian, and Kaufman characterize the relationship between the activity of neurons in sensorimotor cortex and forelimb kinematics in mice performing a reach-to-grasp task. First, they train animals to reach to two cued targets to retrieve water reward, measure limb motion with high resolution, and characterize the stereotyped kinematics of the shoulder, elbow, wrist, and digits. Next, they find that inactivation of the caudal forelimb motor area severely impairs coordination of the limb and prevents successful performance of the task. They then use calcium imaging to measure the activity of neurons in motor and somatosensory cortex, and demonstrate that fine details of limb kinematics can be decoded with high fidelity from this activity. Finally, they show reach direction (left vs right target) can be decoded earlier in the trial from motor than from somatosensory cortex.

      Strengths:

      In my opinion, this manuscript is technically outstanding and really sets a new bar for motor systems neurophysiology in the mouse. The writing and figures are clear, and the claims are supported by the data. This study is timely, as there has been a recent trend towards recording large numbers of neurons across the brain in relatively uncontrolled tasks and inferring a widespread but coarse encoding of high-level task variables. The central finding here, that sensorimotor cortical activity reflects fine details of forelimb movement, argues against the resurgent idea of cortical equipotentiality, and in favor of a high degree of specificity in the responses of individual neurons and of the specialization of cortical areas.

      Thank you!

      Weaknesses:

      It would be helpful for the authors to be more explicit about which models of mouse cortical function their results support or rule out, and how their findings break new conceptual ground.

      We appreciate this feedback and have attempted to make these details clearer through changes to the Introduction and Discussion. One key change is noted below:

      “The presence of detailed kinematic signals in the sensorimotor cortex supports a model of mouse sensorimotor cortex in which M1-fl and S1-fl play a strong role in shaping the fine details of reaching and grasping movements.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      In addition to the weaknesses noted above, I suggest the authors also address the following:

      The last results section is generally lacking in statistical support for claims. Statistical support should be added.

      Thank you for pointing this out. We have added more statistical support to this section.

      The consideration in the Discussion of relevant previous findings and potential explanations for the distal limb signals in mouse sensorimotor cortex is somewhat lacking. There are several specific issues:

      (1) In contrast to the present study, the studies cited in regard to a lack of motor cortical involvement did not involve dexterous movements - in fact, Kawai et al. explicitly engineered a task that did not involve dexterity to distinguish the role of motor cortex in learning from its known role in dexterous movement execution. In Kawai et al., the authors note one rat who adopted a more dexterous approach to the lever pressing task; in this rat, a motor cortical lesion did cause a longer-lasting reduction in task performance. In additional experiments reported in Kawai's PhD thesis, performance of a dexterous task does erode with motor cortex lesion, as seen in other studies, like the early rodent reaching work of Whishaw and colleagues.

      (2) Other possible explanations for the persistence of non-dexterous tasks following motor cortical removal are compensation by, or redundant functionality in, other motor system regions.

      (3) It is also worth noting that stimulation in different regions of mouse M1 and S1 evokes, alternately, digit, wrist, and elbow movements in fairly similar proportions (Tennant, 2011), suggesting that descending pathways substantially target spinal circuits that control all forelimb joints.

      (4) It also seems relevant that although the recovery time course is longer, nonhuman primates also retain substantial hand control after motor cortical removal (e.g. Lashley, 1925; Glees and Cole, 1950; Passingham et al., 1983). Humans, of course, appear to be a different story.

      These are good points. We have tried to make the Discussion better reflect the tension in the literature, including with this new text:

      “However, several other previous results have indirectly suggested that M1 and S1 may be involved in the details of forelimb movement. Performance suffers with inactivation or lesioning of M1 and S1 in skilled, complex manual behaviors (Guo et al 2015, Mizes et al 2024, Whishaw et al 1990) or idiosyncratic use of digits to accomplish non-dexterous tasks (Kawai 2014). The sparing of non-dexterous tasks with these lesions may also reflect redundancy in control as opposed to irrelevance of M1 and S1. Nevertheless, our finding of low-level kinematic information in sensorimotor cortex supports a role for cortex beyond simply providing redundant high-level commands to these subcortical areas.”

      We have avoided mentioning points 3 and 4 in the paper; the stimulation results might follow from activating projections not normally involved in this behavior, and discussing primates in this context would require a long list of caveats. We agree that these points are worth thinking about, but are concerned that they are too circumstantial to include in interpreting the results formally.

      Although similar decoding performance is achieved using neurons from both CFA and fS1, I am left wondering whether you would do substantially better with CFA using activity at additional preceding time points, or when using exclusively time points from the past. The primary model used here appears to use neural signals from corresponding time points to decode limb parameters, but results seemingly could be different when using preceding time points as regressors.

      We appreciate this suggestion and have added the analysis to an additional supplementary panel for Figure 5 (Figure 5S3). Incorporating lags into the decoder via a Wiener filter does indeed improve the decoding performance, but this could simply be due to the increase in the number of predictor variables. This analysis did not, however, further disambiguate M1-fl and S1-fl: the performance improvement was similar across areas for both causal and acausal lag configurations. This could be a consequence of the time resolution of calcium imaging, so further experiments with electrophysiology would be required to rule this possibility out. We now note this new result:

      “Including additional causal (100 ms preceding) and/or acausal (100 ms preceding to 100 ms following) lags improved decoding performance modestly and similarly for both areas (Fig. 5S3E-F).”
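For readers unfamiliar with lagged decoders, the following is a minimal sketch of how a lagged design matrix for a Wiener-filter-style linear decoder might be built. The function name `lagged_design` and the choice of 3 frames (roughly 100 ms at the 31 Hz frame rate reported in this response) are assumptions made for illustration, not the implementation behind Figure 5S3.

```python
import numpy as np


def lagged_design(activity, lags):
    """Stack time-shifted copies of `activity` (time x neurons) side by side.

    A lag k > 0 pairs each sample with activity k frames in the past;
    k < 0 pairs it with activity k frames in the future. Samples that lack
    a complete set of lags are dropped, and the kept row indices are
    returned so the kinematics can be trimmed to match.
    """
    lags = list(lags)
    n_time = activity.shape[0]
    start = max(max(lags), 0)
    stop = n_time + min(min(lags), 0)
    rows = np.arange(start, stop)
    design = np.hstack([activity[rows - k] for k in lags])
    return design, rows


# Toy data: 10 frames, 2 neurons.
activity = np.arange(20).reshape(10, 2)

# Causal: current frame plus the 3 preceding frames.
causal, causal_rows = lagged_design(activity, [0, 1, 2, 3])

# Acausal: 3 frames before through 3 frames after.
acausal, acausal_rows = lagged_design(activity, range(-3, 4))

print(causal.shape, acausal.shape)
```

Each added lag multiplies the number of predictors by the number of neurons, which is why, as noted above, an improvement in decoding could simply reflect the larger model.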

      Related to this, I am also worried about the bleeding of signals across time here. If you deconvolve and interpolate between time points, the interpolation seemingly will pull information into the past, up to half the sampling period, which here is on the order of how long it takes signals to travel to and from the limb. The authors do not make any inappropriate claims about the neural signals here reflecting causes or consequences of what is happening at the limb, but readers (like me) will still try to draw these sorts of conclusions. Is it possible that, although decoding from instantaneous signals is similar for the two regions, the M1 signals are actually motor signals related to future limb state while the S1 signals are sensory consequences? Even if many of the relevant details related to conduction times are not known, perhaps the authors could clarify what can and can't be said related to causal interpretation here.

      Thank you for suggesting further explanation here. We agree that our interpretation could be made more specific. We have added text in the Discussion section to speak more directly to what can and cannot be concluded from our analyses. In short, it is hard to be certain of lags in calcium imaging data for many reasons, and using recording methods with finer temporal resolution (like electrophysiology) will be necessary for determining the precise temporal relationships between kinematics and neural activity. In the absence of these recordings, we limit our claim to kinematic information being present in M1-fl and S1-fl neural activity and leave determining the causal role of this information to future work.

      New clarifying text in the Discussion:

      “The use of calcium imaging further prevents strong conclusions about whether activity reflects future limb states or sensory consequences. Confirming this limitation, inclusion of lagged data in the decoding models, whether causal or acausal, resulted in similar performance changes in both areas.”

      An alternative reason why lift onset is less decodable in CFA is that CFA activates substantially before lift onset, as has been observed in previous rodent studies (Kargo and Nitz, 2004; Miri et al., 2017; Veuthey et al., 2020), perhaps as some sort of movement preparation. S1, on the other hand, may not have this early activity, and so may show a clearer transient at onset when the hand and limb start to move. This seems more likely than the explanations provided by the authors.

      This is a valid possible alternative explanation and we have updated the Discussion to reflect this. This difference in the structure of M1-fl activity versus S1-fl is apparent in the projections of Figure 6A, which show M1-fl projections more clearly aligned to cue-onset than S1-fl projections.

      “Our lift time decoding results are consistent with this view and align with recent observations characterizing mouse proprioceptive forelimb cortex (Alonso et al 2023), although an alternative explanation may be simply that M1-fl activates earlier than S1-fl during reaching (Kargo and Nitz 2004; Miri et al 2017; Veuthey et al 2020).”

      To better clarify relevant similarities and differences between the rodent and primate systems, the Introduction could include some of these similarities and differences exposed by the literature currently cited, and the Discussion could include an additional paragraph specifically relating findings here to previous observations in the primate.

      We appreciate the reviewer’s thoughtfulness on possible framings of our results. When writing this paper, framing was a major challenge for us and we drafted quite a few versions of the Introduction, including some that focused more on mouse-primate comparison. In the end, we decided the most critical function of the Intro was to set up our central question of “levels-of-sensorimotor-control”. The rich primate literature was valuable here, but getting into a protracted compare-and-contrast exercise quickly became a distraction from the point. Further, we sought to highlight the relevance and importance of the question answered in our work as the mouse has gained prominence for filling gaps that are challenging to address with primates. This paper serves as one of many early steps towards the ultimate goal of revealing general properties of sensorimotor cortical function with the mouse model. We have made some subtle changes to the Introduction that we hope will more clearly communicate this narrative.

      We agree that a Discussion paragraph directly relating our results to those in primates would benefit our conclusions and have added one:

      “These results expand our understanding of the rodent sensorimotor system and highlight similarities to nonhuman primates. We show here evidence in mice of detailed joint angle kinematic signals from the full forelimb in M1 and S1, as has been shown in macaque cortex during tasks involving reaching and grasping objects (Vargas-Irwin et al. 2010; Saleh et al. 2010, 2012; Goodman et al. 2019; Okorokova et al. 2020). Additionally, the earlier onset of movement-related activity in M1-fl compared to S1-fl is similar to macaque M1 and S1 (Tanji and Evarts 1976). Taken together these results suggest that the mouse can be employed to address questions traditionally explored in primates about how cortical activity encodes detailed movement commands.”

      Although this is outside the scope of the present study, it would be interesting to image descending projection neurons to see what signals are conveyed downstream, and to what targets. Some signals observed in layer 2/3 may not be strongly reflected in descending projections.

      We agree that recording from descending projection neurons in this task would be of deep interest – and also agree that these experiments are beyond the scope of the present study. We look forward to performing these additional experiments in future work.

      Minor:

      (1) The use of "CFA" and “fS1” is a bit confusing. S1, like M1, is defined primarily based on histological criteria, while CFA is defined by intracortical microstimulation. CFA contains a substantial fraction of fS1, seemingly most of it based on the maps shown in Tennant et al., 2011. This is not really a criticism, as the field has not reached any sort of consensus on this nomenclature yet.

      We are similarly unhappy with the inconsistency of the terminology in the field, and struggled with how not to make it worse. After much debate and consultation with colleagues, we decided to use “M1” and “S1” to evoke the century of literature on these areas; and “-fl” to indicate forelimb because it is more intuitive than “-ul” and avoids using the illegible “-ll” for hindlimb (relevant to our subsequent paper). For what we called M1-fl, we recorded where we did because anecdotally we saw similar responses across that swath; but note that this definition is also consistent with the definition of “MOp-ul” found with multimodal mapping by Munoz-Castaneda (2021), which extends a little anteriorly of MOp as defined by the Allen CCF. As the field continues to mature, we hope future work can converge on a set of shared terms.

      (2) Page 4: "Inactivations and lesions of M1 and S1 have shown that M1 is required for the execution of dexterous reach-to-grasp movements" - to me, earlier work from Whishaw and colleagues deserves to be cited here.

      We appreciate the suggestion and have updated the references in this section to better reflect the prior work from Whishaw and other researchers.

      (3) Page 5: "evoking sufficient trial-to-trial variability to avoid model overfitting." - what I think the authors are referring to here is a particular kind of "overfitting," the consequence of not exploring the full movement space, as opposed to model overfitting from issues with the model-fitting method itself. Rather than just saying overfitting, the authors could be clearer about what they are referring to.

      The reviewer is right; the phenomenon we intended to refer to is not properly termed overfitting. Specifically, we meant that data with restricted range does not necessarily express global structure, and models can therefore incorrectly fit them. For example, fitting a linear model to data including many periods of a sine wave will correctly show a zero-slope linear component, but fitting to only a portion of a single cycle will typically yield a nonzero slope. This is not overfitting, is not exactly underfitting (because the relevant structure is barely present in the data, as opposed to missed by an insufficiently powerful model), is not bias (the data are fit well), and is not even necessarily a problem (the local relationship may be what you are interested in). Yet, it does not reflect the larger structure of the data.
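The sine-wave example above can be checked numerically. The sketch below is illustrative only; the ranges and sample counts are arbitrary choices. It fits a straight line to many periods of a sine wave and to the first quarter of a single cycle.

```python
import numpy as np


def linear_slope(x, y):
    """Least-squares slope of a degree-1 polynomial fit."""
    slope, _intercept = np.polyfit(x, y, 1)
    return slope


# Fifty full periods: the linear component is essentially flat.
x_many = np.linspace(0.0, 2 * np.pi * 50, 20001)
slope_many = linear_slope(x_many, np.sin(x_many))

# First quarter of one cycle: the same model finds a clear positive slope.
x_part = np.linspace(0.0, np.pi / 2, 201)
slope_part = linear_slope(x_part, np.sin(x_part))

print(f"slope over 50 periods:     {slope_many:.5f}")
print(f"slope over quarter period: {slope_part:.3f}")
```

Both fits are reasonable descriptions of the data they were given; only the wide-range fit reflects the global structure, which is the motivation for sampling movement space broadly.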

      We do not know of a standard term for this phenomenon, so instead of dragging the reader through this tangential argument, we have tried to offer a simpler motivation for using multiple targets:

      “Assessing the relationship between neural activity and the details of movement requires striking a balance between achieving repeatable behavior and evoking sufficient trial-to-trial variability to broadly sample movement space”.

      (4) Page 5: Caudal Forelimb Area should not be capitalized.

      Obviated with the change in area nomenclature.

      (5) Page 7: "of linearly independent degrees of freedom" - for a neuroscience audience, I think it is better to explicitly mention that the resulting PCs are uncorrelated.

      We agree that this section could benefit from clarification. We have attempted to provide additional nuance to indicate what the analysis was intended to test.

      “Despite the strong coupling between the proximal and distal joint angles, rich variation remained in the action of different joints over time. The presence of strong correlations across joints suggested that the kinematics may be well described by a smaller number of independent degrees of freedom than the total number of recorded angles. To assess the number of linearly independent (uncorrelated) degrees of freedom amongst the 24 joint angles and velocities, we used double-cross-validated PCA (Yu et al. 2009; Methods; Fig. 3D), finding intermediate dimensionalities of 7 (median for joint angles) and 10 (velocities; Fig. 3E). This is consistent with the idea that joint angles across the limb are coordinated instead of controlled independently, and that this coordination is flexible enough over time to enable accurately performing reaching and grasping to different targets.”
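To make the idea of "fewer uncorrelated degrees of freedom than recorded angles" concrete, here is a simplified sketch using ordinary PCA with a cumulative-variance threshold. This is deliberately not the double-cross-validated procedure cited above; the synthetic data (24 simulated angles driven by 7 latent signals plus small noise) and the 95% threshold are assumptions chosen only to illustrate the concept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 24 correlated "joint angles" generated from 7 latent signals.
n_samples, n_latent, n_angles = 2000, 7, 24
latents = rng.normal(size=(n_samples, n_latent))
mixing = rng.normal(size=(n_latent, n_angles))
angles = latents @ mixing + 0.1 * rng.normal(size=(n_samples, n_angles))

# PCA via SVD of the mean-centered data; the principal components are
# mutually uncorrelated by construction.
centered = angles - angles.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)

# Smallest number of components whose cumulative variance reaches 95%.
n_components_95 = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)
print(f"components needed for 95% of variance: {n_components_95} of {n_angles}")
```

A fixed variance threshold is a cruder criterion than cross-validation, which instead asks how many components generalize to held-out data, so the two approaches need not give the same number.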

      (6) Page 7: In the Results, the authors should mention what indicator is being used, the imaging frame rate, and summarize briefly how cells were defined.

      Thank you for the suggestion. These details have been added to the relevant Results section for clarity.

      “To do so, we recorded neural activity from neurons in layer 2/3 M1-fl extending into the immediately adjacent secondary motor cortex (M2), and the forelimb region of S1 (S1-fl) using two-photon calcium imaging of GCaMP6f-expressing neurons in layer 2/3 (185-230 μm deep, imaged at 31 Hz, cells extracted with Suite2p (Pachitariu et al 2017)).”

      (7) Page 7: "corrected at n=2" - n doesn't typically refer to the number of tests, so for clarity I would say "corrected for dual tests."

      Thank you for pointing this out. We have corrected the text and added further explanation in the Methods of our approach to determining statistical significance across the targets and locking events.

      “P-values obtained through the ZETA were then Bonferroni corrected for dual tests when measuring the number of cells modulated to a given event and corrected for six tests (2 targets and 3 events) when measuring the overall number of modulated cells.”
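As a small worked example of the correction described in the quoted text, the sketch below applies a Bonferroni adjustment for two tests and for six tests (2 targets and 3 events). The raw p-values are made up for illustration, and the helper name `bonferroni` is hypothetical.

```python
def bonferroni(p_values, n_tests):
    """Multiply each p-value by the number of tests, capping at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values for three cells.
raw = [0.004, 0.02, 0.3]

per_event = bonferroni(raw, 2)   # one event, two targets
overall = bonferroni(raw, 6)     # 2 targets x 3 events

alpha = 0.05
significant_per_event = [p < alpha for p in per_event]
significant_overall = [p < alpha for p in overall]

print(per_event, significant_per_event)
print(overall, significant_overall)
```

With these made-up values, the second cell passes the two-test correction but not the six-test correction, which shows why the count of modulated cells depends on which family of tests is being corrected.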

      (8) Page 7: In the Results, when the decoding is introduced, it would be helpful to have a few details without having to hunt through the Methods. For example, were things regularized, how was cross-validation handled, etc?

      Thank you for the suggestion. These details have been added to the relevant Results section for clarity.

      A simple linear regression model related the single-trial joint angles at all time points to single-trial neural activity at the corresponding moments. The model was fit with ridge regression, the ridge penalty was determined via a heuristic (Karabatsos 2018), and performance was measured on held-out trials (80/20 train/test split, 50 folds).
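The decoding scheme summarized above can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic, the ridge penalty is a fixed placeholder rather than the heuristic cited in the text, and "50 folds" is interpreted here as 50 random 80/20 splits by trial. The function names are hypothetical.

```python
import numpy as np


def ridge_fit(x, y, penalty):
    """Closed-form ridge weights with an unpenalized intercept."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - x_mean, y - y_mean
    w = np.linalg.solve(xc.T @ xc + penalty * np.eye(x.shape[1]), xc.T @ yc)
    return w, y_mean - x_mean @ w


def r_squared(y, y_hat):
    """Coefficient of determination, computed separately per output column."""
    ss_res = np.sum((y - y_hat) ** 2, axis=0)
    ss_tot = np.sum((y - y.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot


def repeated_holdout(trial_x, trial_y, penalty, n_folds=50, test_frac=0.2, seed=0):
    """trial_x: trials x time x neurons; trial_y: trials x time x angles."""
    rng = np.random.default_rng(seed)
    n_trials = trial_x.shape[0]
    n_test = int(round(test_frac * n_trials))
    scores = []
    for _ in range(n_folds):
        order = rng.permutation(n_trials)
        test, train = order[:n_test], order[n_test:]
        # Split by trial, then flatten trials x time into samples, so no
        # time point from a held-out trial is seen during fitting.
        x_tr = trial_x[train].reshape(-1, trial_x.shape[2])
        y_tr = trial_y[train].reshape(-1, trial_y.shape[2])
        x_te = trial_x[test].reshape(-1, trial_x.shape[2])
        y_te = trial_y[test].reshape(-1, trial_y.shape[2])
        w, b = ridge_fit(x_tr, y_tr, penalty)
        scores.append(r_squared(y_te, x_te @ w + b))
    return np.mean(scores, axis=0)


# Synthetic example: 40 trials, 20 time points, 15 neurons, 3 joint angles.
rng = np.random.default_rng(1)
neural = rng.normal(size=(40, 20, 15))
true_w = rng.normal(size=(15, 3))
joint_angles = neural @ true_w + 0.5 * rng.normal(size=(40, 20, 3))

mean_r2 = repeated_holdout(neural, joint_angles, penalty=1.0)
print(np.round(mean_r2, 3))
```

Splitting by trial rather than by individual time point matters because neighboring frames within a trial are strongly correlated, and mixing them across train and test sets would inflate held-out performance.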

      (9) Page 8: I think it is worth noting how much mouse reaching involves shoulder rotation as opposed to movement in other joints, as this seems very different from primates.

      Thank you for pointing this out. We think this is mostly a task difference: our mice were in a quadrupedal stance, whereas monkeys are typically asked to reach from a sitting position. We now mention this in the Results. 

      “Reaching evoked particularly large rotation of the shoulder, likely because the mice reached from a quadrupedal position to targets on either side of the snout.”

      (10) Page 8: Should provide quantification to clarify what is meant by "closely tracked."

      We have updated the text to indicate that this claim was meant to be qualitative, and to more clearly highlight that the interest here is the first demonstration of the ability to reconstruct valid forelimb postures from decoded joint angles in the mouse. Quantifying the reconstruction properly would require substantially more manual data labeling, and the successful decoding itself demonstrates indirectly that the reconstructions are good enough to obtain the results of interest.

      Additionally, we reconstructed the skeletal representation of the forelimb from the decoded joint angles and found that, as intended, the reconstructed postures had strong qualitative resemblance to the true postures, even of “minor” angles like cylindrical paw deformation or digit splay (Fig. 5C,G).

      (11) Page 8: "Overall, these results suggest that instantaneous movement-related signals are similarly distributed across CFA and fS1." - I know we are being succinct here, but this sentence sounds like a non sequitur in the context of this paragraph - perhaps include a conclusion from the results in this paragraph first, then summarize the whole section.

      Thank you for the suggestion. We have updated this text to state the conclusions of this section more clearly.

      Overall, these results reveal that neural activity in M1-fl and S1-fl is closely related to the kinematic details of reach-to-grasp movements. The ability to decode substantial variance in proximal and distal joints suggests that this relationship extends to the entire forelimb and the similar performance obtained from each area suggests that this information is similarly distributed across M1-fl and S1-fl. 

      (12) Page 10: Mention of projections from fS1 does not explicitly specify their preferential targeting of the dorsal horn, which seems relevant.

      We appreciate the suggestion and have added this detail to the text.

      Rodent S1-fl is known to influence interneuron populations in the spinal cord through direct and indirect projections that predominantly target the dorsal horn (Ueno et al. 2018), thus these signals may also reflect S1-fl’s important role in modulating reflex circuits to coordinate sensory feedback with movement generation (Moreno-López et al. 2016; Moreno-Lopez et al. 2021; Seki et al. 2003).

      (13) Page 31: Labels on the figure indicating what blue and red stand for would be helpful.

      Thank you for the suggestion. Labels have been added to indicate left and right trials for Figure 5 C/F and Figure 6A.

      (14) Page 32: Legend does not include panel D.

      Thank you for catching this. The corresponding caption has been added.

      Reviewer #2 (Recommendations for the authors):

      (1) The Introduction could perhaps set the central question in starker relief. What specifically do the authors mean by high- vs low-level control? As suggested by the cited studies, this has been a fraught issue in primate work for decades, and I think a finer-grained framing of alternative hypotheses would help set up the results. For example, would better performance at decoding joint angles than paw position be evidence for lower-level control? The clarity of the Introduction might also be improved if the facts and unknowns were broken down by species throughout.

      We have tried to further improve the focus of the Introduction on the central question, clarify what we mean, and make clearer in the review of the literature which species a finding comes from.

      The clarifying text from the introduction is quoted below:

      Extensive motor mapping experiments in rodents have revealed that activating different parts of the sensorimotor cortex evokes movements of different body parts or different kinds of movements of the same body part, as it does in primates (for review, see (Harrison and Murphy 2014)). Yet it is unclear how the topography of stimulation-evoked movements relates to the roles of these areas during volitional actions. Perturbations during behavioral tasks in mice involving forelimb lever or reaching movements have provided a coarse-level understanding of how these areas contribute during behavior. Inactivations and lesions of M1 and S1 have shown that M1 is required for the execution of dexterous reach-to-grasp movements (Guo et al. 2015; Sauerbrei et al. 2020; Galiñanes et al. 2018; Wang et al. 2017; Whishaw et al. 1991; Whishaw 2000) and that S1 is essential for adapting learned movements to external perturbations of a joystick (Mathis et al. 2017). However, spinal cord projections from mouse M1 and S1 primarily target spinal interneurons rather than directly synapsing onto motor neurons (Gu et al. 2017; Ueno et al. 2018; Wang et al. 2017), suggesting cortical activity might play a more modulatory role. Further, stimulation of brainstem nuclei alone can evoke naturalistic forelimb actions, including realistic reaching movements involving coordinated flexion and extension of the proximal and distal limb (Esposito et al. 2014; Ruder et al. 2021; Yang et al. 2023). Taken together, these results have raised the question of what role mouse M1 and S1 play in the control of goal-directed forelimb movements. 

      One route to answering this question involves characterizing the signals present in mouse M1 and S1 during movement. If mouse M1 and S1 were to control only high-level aspects of forelimb movements, activity should be dominated by ‘abstract’ signals like target location and reflect little trial-to-trial variability in reach kinematics. If instead M1 and S1 control low-level movement features then activity should correlate strongly with forelimb joint angle kinematics and their trial-to-trial variation when reaching to different targets. While the presence of high- or low-level signals in a cortical area does not necessarily imply that they are causally responsible for these aspects of movement, characterizing what signals are present serves as a first step toward determining how these areas relate to movement.

      (2) The kinematics and calcium traces appear to be highly stereotyped across trials. If the population encodes joint angles, would one expect to find correlations between the neural and kinematic residuals after subtraction of the time-varying means? Some additional analysis and/or discussion on this point would be helpful, especially as there are only two targets.

      This is a great idea. As suggested, we implemented regression models on the residuals for each target in the new Figure 5S3. Figure 5S3 A and B show the performance when decoding the residuals for right trials and C and D show performance for left trials. Decoding performance remained well above chance, although it decreased because the models were predicting only the relatively small within-target variation. This analysis supports our claims from the main regression models in Figure 5 and 5S1-2, and also suggests that movements ipsilateral to the reaching limb (contralateral to the recording hemisphere) may be better encoded than movements contralateral to the reaching limb. We have added a reference to this additional residual analysis in the final paragraph of the decoding section of the Results:

      “Finally, we tested whether the ability to decode these many joint angles was a direct consequence of inter-joint correlations, and might not be indicative of the presence of “real” information about some of these joints. To do so, we fit partial correlation models that removed correlations between proximal and distal joints, or removed correlations of the joint angles with a high-level parameter – the overall distance of the paw centroid to the spout. Despite substantially lowering the behavioral variance, in each case the residuals could still be decoded from neural activity (Fig 5S2A-D). Similar decoding performance for M1-fl and S1-fl was obtained from models fit to decode single-trial residuals separately for left and right trials (Fig 5S3A-D), indicating that trial-to-trial variations on each basic movement were decodable from these populations.”
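The within-target residuals referred to above (single-trial data minus the time-varying mean for that target) could be computed along the following lines. This is a sketch with hypothetical names and synthetic data, not the code used for Figure 5S3.

```python
import numpy as np


def within_target_residuals(data, target):
    """Subtract each target's trial-averaged time course.

    data:   trials x time x features (joint angles or neural activity)
    target: one label per trial (e.g. 0 = left, 1 = right)
    """
    residuals = np.empty(data.shape, dtype=float)
    for label in np.unique(target):
        mask = target == label
        residuals[mask] = data[mask] - data[mask].mean(axis=0, keepdims=True)
    return residuals


# Synthetic example: 30 trials, 25 time points, 4 features.
rng = np.random.default_rng(0)
target = np.repeat([0, 1], 15)
template = rng.normal(size=(2, 25, 4))              # stereotyped per-target mean
data = template[target] + 0.2 * rng.normal(size=(30, 25, 4))

residuals = within_target_residuals(data, target)
print(np.abs(residuals[target == 0].mean(axis=0)).max())  # close to zero
```

After this subtraction, only trial-to-trial deviations from the stereotyped movement remain, so a decoder fit to paired neural and kinematic residuals cannot succeed merely by recovering the average trajectory for each target.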

      Along similar lines, binary classification is used to characterize cue-, lift-, and contact-responsive neurons. Is it possible to exploit trial-to-trial variation in the cue-lift and lift-contact latencies to extract the time-varying marginal effects of each event (e.g., using a GLM)?

      For the detection of single-cell modulations by different events, we have elected to retain our simple statistical test to determine modulation; in our experience, encoding models typically involve a surprising number of steps to get them to do what you actually intend. We leave more extensive encoding model-style analysis to future work, currently in progress.

      (3) The authors mention prior studies suggesting that the control of some forelimb tasks can be gradually transferred from the cortex to the subcortical centers. Have they performed the inactivation at different time points across learning, and if so, do they have evidence for a diminishing effect over time (e.g., blocking of both initiation and coordination early in training)? In addition, the effects of motor cortex inactivation are similar to, but slightly different from, effects shown in reaching tasks in prior studies. Some additional discussion on this point would be useful.

      Our inactivation experiments in this study were intended to coarsely demonstrate the involvement of mouse forelimb sensorimotor cortex in our task. We have not performed the inactivations over learning and leave such experiments to future work. 

      We agree that a little more clarity relating our results to previous ones was warranted. Previous studies (Guo et al. 2015 and Galinanes et al. 2018) have demonstrated inactivation impacts on similar tasks, but for thoroughness we sought to show the same for our task as it varied from the pellet and motorized water spout tasks in both training time and target configurations. Our results are strongly in line with those of Galinanes et al. 2018 which used a fairly similar water spout target configuration. In the inactivation experiments of that paper, 3 out of 13 animals with initiation-triggered inactivations were able to initiate reaching within a time window similar to control trials. Additionally, a proportion of trials across multiple mice proceeded with little perturbation from the inactivations. This is consistent with our observation that M1-fl inactivations may either abolish movement initiation or allow movement initiation but impair task completion on a trial-by-trial and animal-to-animal basis. Further work is required to determine what factors influence these differential responses to inactivation and to determine how these effects differ across task variations (i.e., pellet vs water spout). We have added a brief description of these nuances to the text for clarity. 

      “These inactivations blocked the execution of the reach to grasp sequence, preventing the animal from making contact with the spout during the 3-second laser stimulation period (Fig. 1F; 86.5% control trials with contact within 3 seconds of cue, 5.1% inactivation trials with contact, P < 10<sup>-191</sup>, Mann-Whitney U test, 2 mice, 495 stimulation trials). Interestingly, inactivation at the time of cue often did not prevent reach initiation (mouse 1: 54.7%, mouse 2: 34.2% of inactivation trials with lift within 3 seconds; 93.5%, 86.2% control trials). Yet the movement stalled once the paw and digits extended towards the spout, producing uncoordinated and unsuccessful reaching trajectories (Fig. 1I, two representative datasets). Taken together, these results support the involvement of M1-fl in the water-reaching task and suggest that the strength of inactivation effects may depend on specific task details like training time or target configuration (c.f. Galinanes et al. 2018).”

      Minor points

      (1) The rationale for the multiple comparisons procedure in identifying event-locked responses should be explained in more detail. If I understand correctly, the authors are not correcting for comparisons across ROIs, but instead control the family-wise error rate across brain regions and event types (dividing alpha by two or six). Why not instead control the false discovery rate across ROIs? 

      Thank you for pointing this out, it was confusing as written and we received a similar comment from Reviewer 1. We have fixed the wording now to make it clearer why we did this. We simply aimed to describe how many of the recorded neurons in each area were modulated by the task as a proxy for the engagement of these areas during the behavior, and to use this measure of modulation as a criterion for including the neuron in subsequent analysis. In other words, if the question had been “are any neurons in this area modulated by the task?” then correcting for the number of ROIs would be the correct method; but if the question is, “is this neuron probably modulated and therefore worth including in my decoder?” correcting for the number of ROIs will typically be much too conservative. Thus, we only sought to correct for the false discovery rate across events and targets for each ROI. We have added additional text in the methods to clarify these choices, below. Please also see response to (7) from Reviewer 1 above.

      “Note that we did not correct for the number of ROIs tested for two reasons. First, the goal of this testing was to serve as a criterion for inclusion in subsequent decoding analyses, not to determine whether any neurons in the area at all were modulated; and second, correcting for the number of ROIs would bias comparison between areas if different numbers of ROIs were recorded in one area vs. the other.”
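
      The selection logic described here can be illustrated with a short sketch. It assumes a Bonferroni-style division of alpha by the number of event and target tests within each ROI, following the reviewer's description of "dividing alpha by two or six"; the p-values and the function name `modulated_rois` are hypothetical.

```python
import numpy as np

def modulated_rois(p_values, alpha=0.05, correct_across_rois=False):
    """Flag ROIs with at least one significant event/target test.

    p_values: array (n_rois, n_tests), one column per event x target test.
    By default the threshold is divided by the number of tests per ROI only;
    with correct_across_rois=True it is also divided by the number of ROIs.
    """
    p_values = np.asarray(p_values, dtype=float)
    n_rois, n_tests = p_values.shape
    n_comparisons = n_tests * n_rois if correct_across_rois else n_tests
    return (p_values < alpha / n_comparisons).any(axis=1)

# Hypothetical p-values: 3 ROIs x 6 tests (3 events x 2 targets).
p = np.array([
    [0.001, 0.20, 0.500, 0.70, 0.90, 0.30],
    [0.020, 0.04, 0.600, 0.80, 0.10, 0.45],
    [0.300, 0.50, 0.007, 0.90, 0.65, 0.75],
])

selected = modulated_rois(p)                            # threshold 0.05 / 6
strict = modulated_rois(p, correct_across_rois=True)    # threshold 0.05 / 18
print(selected, strict)
```

      With the per-ROI threshold, the first and third ROIs are retained for decoding; correcting across ROIs as well would drop the third, which illustrates why the stricter correction can be too conservative for an inclusion criterion.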

      (2) It appears joint angles are treated as linear variables in the decoding analysis; is this correct? This seems reasonable as long as the range of motion is not too large, but the authors might briefly comment on the issue in the Methods. 

      Yes, all joint angles are treated as linear variables in the linear regression model. We observed empirically (as can be seen in Figure 3B and Figure 5B/F) that the joint angle variables were relatively constrained to specific ranges during the task, with no angles displaying substantial wrap-around during the reaching and grasping movements. It is true that use of nonlinear decoding would almost surely improve performance further. Future work could also compare decoding of joint angles with muscle forces, which correlate and which we made no effort to distinguish here. In this work, though, the demonstration of a substantial relationship between neural activity and kinematics already tells us that fine details of movement are present in the M1 and S1-fl populations, which is a critical fact to understand these areas and was not previously known. We now comment explicitly on this, as suggested.

      “Joint angle or velocity kinematics were linearly interpolated from their original 6.66 ms to 10 ms and smoothed with a Gaussian (15 ms s.d.). These angular variables were then treated linearly in decoding analyses as their ranges were relatively constrained during the reaching and grasping movements; although the true relationships are likely nonlinear, this serves as a sufficient approximation to demonstrate the presence of a relationship between neural activity and kinematics.”
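
      A minimal sketch of this preprocessing step, assuming a single joint-angle trace sampled every 6.66 ms, is given below. The function name `resample_and_smooth`, the kernel truncation at 4 s.d., and the edge padding are choices made for the example and are not taken from the paper.

```python
import numpy as np

def resample_and_smooth(angles, dt_in=6.66, dt_out=10.0, sigma_ms=15.0):
    """Linearly interpolate a joint-angle trace to a new time step, then
    smooth it with a Gaussian kernel. All times are in milliseconds."""
    angles = np.asarray(angles, dtype=float)
    t_in = np.arange(angles.size) * dt_in
    t_out = np.arange(0.0, t_in[-1] + 1e-9, dt_out)
    resampled = np.interp(t_out, t_in, angles)

    # Gaussian kernel truncated at +/- 4 s.d. and normalised to unit sum.
    sigma = sigma_ms / dt_out
    half_width = int(np.ceil(4 * sigma))
    k = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-0.5 * (k / sigma) ** 2)
    kernel /= kernel.sum()

    # Pad with edge values so the ends are not pulled towards zero.
    padded = np.pad(resampled, half_width, mode="edge")
    smoothed = np.convolve(padded, kernel, mode="valid")
    return t_out, smoothed

# Hypothetical trace: 150 samples of a slow oscillation around 30 degrees.
raw = 30.0 + 10.0 * np.sin(2 * np.pi * np.arange(150) * 6.66 / 500.0)
t_out, smoothed = resample_and_smooth(raw)
print(t_out[:3], smoothed[:3])
```

      Because the angles stay within a limited range and never wrap around, ordinary linear interpolation and averaging are adequate here; circular statistics would be needed if a joint angle crossed the wrap-around point.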

      (3) Are the limb pose estimates mirrored along the mediolateral axis? Figures 1C and 2D appear to show reaches to the left spout on the animal's right.

      Thank you for pointing out the ambiguity in the display of these data. The reach trajectories were not mirrored along the mediolateral axis, but they are displayed from the perspective of the behavioral imaging cameras as shown in Figure 1A. Thus the right target reaches (ipsilateral to the animal’s reaching arm) are on the left side of the camera image and the left target reaches (contralateral to the animal’s reaching arm) are on the right side of the image. We have clarified this in the figure captions.

    1. Author response:

      The following is the authors’ response to the previous reviews

      General recommendations (from the Reviewing Editor):

      The reviewers agreed that addressing some specific concerns would improve the clarity of the paper and the strength of the conclusions. These points are listed below, and described in more detail in the reviewer-specific 'Recommendations for Authors':

      We thank the editor and reviewers for the encouraging feedback and constructive comments. We provide our point-by-point response below.

      (1) The details of the new experiment including number of subjects and a description of the analysis should be provided in the main text.

      We now provide a detailed description of the methods (including the number of subjects; N = 30) and analyses for the new experiment. See our response to Reviewer 2 for more details.

      (2) It would be informative to see how the amplitude biases observed, agree with those found by Gordon et al. 1994.

      Addressed. Please see our response to Reviewer 1, comment 1.

      (3) Each of the models lead to different bias patterns. It would be very helpful to hear the author's interpretation, ideally with a mathematical explanation, of what leads to these distinct patterns.

      Addressed. Please see our response to Reviewer 1, comment 2.

      Reviewer #1 (Recommendations for the authors):

      (1) Most of my points have been addressed convincingly in this revision. The new experiment in which biases in movement amplitude were also determined is a welcome addition to the paper. However, I could not see the results of this study, as the authors did not include Fig. 4 in the manuscript, but repeated Fig. 3. That's unfortunate as I would have liked to see the similarity between the biases in direction and amplitude. Moreover, I would have liked to see how the amplitude biases agree with those found by Gordon et al. EBR (1994) 99:112-130, and to which extent Gordon et al.'s explanation can explain the pattern.

      We apologize for including the incorrect figure in the previous version of our manuscript. We did make a correction and submitted a corrected version, but it appears that it didn’t make its way to you. The correct Figure 4 is now in the manuscript.

      The motor biases in amplitude (extent) observed in Experiment 4 (Author response image 1) are qualitatively similar to the pattern reported by Gordon et al. 1994. While the exact peaks do not match perfectly, both datasets show a two-peaked pattern.

      Gordon et al. (1994) attributed the bias in amplitude to direction-dependent variation in movement speed which, in their view, arises from anisotropies in limb inertia. Specifically, moving the upper arm along its quasiorthogonal direction (i.e., rotation about the elbow) involves lower effective inertia than moving parallel to the upper-arm axis. Given the arm posture in both datasets, the upper limb points toward ~135°/315°, with the orthogonal direction corresponding to ~45°/225°. The two-peaked speed profiles in both our data (Author response image 1) and Gordon et al. are consistent with this prediction.

      Author response image 1.

      Gordon et al. (1994) noted that, while the extent bias function should mirror the speed bias function, the motor planning system might proactively compensate for the speed bias. Indeed, while the extent and speed bias functions are roughly aligned in their study, the two are misaligned in our Experiment 4. For example, the speed function peaks around 45°, which corresponds to a valley in the extent bias function. The difference between their data and ours could be due to a difference in the starting point configuration. However, their model predicts alignment of the speed and extent functions independent of starting point configuration. In contrast, the TR+TG model does predict our observed extent bias function and yields predictions about how this should change with different start point configurations. As such, while heterogeneity in movement speed may contribute to extent bias to some degree, we think the transformation bias and visual-target bias likely play a larger role in determining the extent bias observed at movement endpoint.

      We have added a discussion section about the bias function reported by Gordon et al. (1994) and their account in the manuscript (lines 482-493). We do not repeat it here, as the content largely overlaps with the response above.

      (2) One of the most important new insights from this study is that the three single-source models lead to different bias patterns, with 1, 2 or 4 peaks. However, what I miss in the paper is an intuitive explanation why they do so. Now, the models are described and their predictions are shown, but it remains unclear where these distinct patterns come from. As scientists, we want to understand things, so I would very much appreciate if the authors can provide such an intuitive explanation, for instance using a mathematical proof. That could also identify how general these patterns are, or if there are certain requirements for them to occur (such as a certain shape of the transformation bias).

      Note that the closed-form mathematical expression for the motor bias function is not straightforward. As such, the intuition comes primarily from inspection, that is, from the model simulations themselves, which we show in Figure 1 of the paper. Importantly, the model predictions are insensitive to the parameter values over a reasonable range. Thus, the number of peaks predicted by each model is a core distinguishing feature. We present in the Supplementary Results a formalized mathematical analysis to illustrate how different models produce different numbers of peaks in the movement-bias function.

      (3) I think it's a good idea to change the previous "Visual Bias" into a "Target Bias". This raises the question whether the "Proprioceptive Bias" should not be changed into a "Hand Bias" or "Start Bias"?

      While we appreciate the reviewer’s point here, we prefer the term “Proprioceptive Bias” given that this term has been used in the literature and provides a contrast with sources of bias arising from vision. “Hand Bias” and “Start Bias” seem more ambiguous.

      L51: I think "would fall short" should be replaced by "would overshoot".

      L127: I think "biased toward the vertical axis" should be replaced by "biased away from the vertical axis". Figure 3 still contains the old terminology like T+V. Please replace by the new terminology. L255: Replace "Exp 1a" by "Exp 1b".

      L376: Replace 60 by 6.

      L831-2: I hope the summed LL was maximized, not minimized.

      Thanks for catching the typos. We have corrected all of them.

      Reviewer #2 (Recommendations for the authors):

      I think that Experiment 4 does not mention how many participants performed the study. (Only in the response to the reviewers I found this)

      We have added information regarding the number of participants to Fig. 4 (N = 30).

      I am very happy that the authors added the biomechanical simulation into the paper. I am not convinced that this addressed my concerns exactly but it is an excellent addition and the authors have now adjusted the text appropriately.

      We appreciate the positive response to our additional assessment of biomechanical factors. We welcome any additional information on how we might fully address this issue.

      line 826: extend -> extent

      Corrected.

      Figure 4. I think that the authors have put the wrong figure here. I cannot see any data for extent. I would need to see this figure (or please correct me - but the caption doesn't match the figure and I don't see the results clearly. (I think the review might have the correct figure).

      We apologize for this mistake. We have now provided the correct Figure 4 in the paper (also included on the first page of the response letter).

      I am missing the detailed description on when the direction error and distance error were calculated for exp 4 - and what exactly was used? How did the authors examine the values without correction? What time point was used? Did I miss the analysis section for this?

      Participants were instructed to make fast, straight movements without any corrections and were given up to 1 s to complete the movement. Hand position was recorded once the movement speed dropped below 1 cm/s. On 99.8% of trials, movement speed did not increase once this threshold was passed, indicating that the participants adhered to the instructions. On the remaining trials, we detected a secondary corrective movement (increase in speed >5 cm/s). On these trials, we used the position recorded when the movement speed initially dropped below 1 cm/s as the endpoint position. The pattern of results would be the same were we to exclude these trials.

      This information has been added to the Methods section (line 661-666).
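
      The endpoint rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the search for the 1 cm/s crossing starts at peak speed, "increase in speed >5 cm/s" is read as speed rising above 5 cm/s after the endpoint, and the function names and the minimum-jerk test trajectories are hypothetical.

```python
import numpy as np

def find_endpoint(position, dt, stop_speed=1.0, correction_speed=5.0):
    """Return (endpoint_index, has_correction) for one reach.

    position: array (n_samples, 2) of hand positions in cm; dt in seconds.
    The endpoint is the first sample, after peak speed, at which speed drops
    below stop_speed (cm/s). A later rise above correction_speed (cm/s) is
    flagged as a secondary corrective movement.
    """
    position = np.asarray(position, dtype=float)
    velocity = np.gradient(position, dt, axis=0)
    speed = np.linalg.norm(velocity, axis=1)

    peak = int(np.argmax(speed))
    below = np.flatnonzero(speed[peak:] < stop_speed)
    if below.size == 0:
        return None, False  # the hand never slowed below the threshold
    end = peak + int(below[0])
    has_correction = bool(np.any(speed[end:] > correction_speed))
    return end, has_correction

def min_jerk(t, onset, duration, amplitude):
    """Minimum-jerk displacement profile, used only to build test data."""
    tau = np.clip((t - onset) / duration, 0.0, 1.0)
    return amplitude * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)

dt = 0.005
t = np.arange(0.0, 1.0, dt)
x_clean = min_jerk(t, 0.0, 0.5, 10.0)                 # single 10 cm reach
x_corrected = x_clean + min_jerk(t, 0.7, 0.2, 1.0)    # plus a late 1 cm correction

reach_clean = np.column_stack([x_clean, np.zeros_like(t)])
reach_corrected = np.column_stack([x_corrected, np.zeros_like(t)])

end_clean, corr_clean = find_endpoint(reach_clean, dt)
end_corr, corr_corr = find_endpoint(reach_corrected, dt)
print(end_clean, corr_clean, end_corr, corr_corr)
```

      In both simulated reaches the endpoint is taken at the first drop below threshold, so a later correction is flagged but does not move the recorded endpoint, in line with the rule stated above.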

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      SOM+ interneurons such as Martinotti cells target the apical tufts of pyramidals in the cortex. Since interneurons in general are strongly implicated in mediating rhythmic population activity over a range of timescales, it is quite appropriate to study the consequence of rhythmic inhibition provided by SOM+ interneurons for synaptic integration, including the phenomenon of dendritic spikes. However, using conclusions from a singular study (ref 22) to identify the beta band as the rhythm mediated by SOM+ is not very accurate. SOM+ interneurons have been implicated in regulating rhythms centered just below 30 Hz (refs 22, 21). It is a range that lies in the grey zone of the traditional definition of beta and gamma. However, it is significantly higher than the 16 Hz rhythms explored in this study. It thus remains unknown how a 25-30 Hz rhythmic inhibition (that has an experimentally suggested role for dendrite targeting SOM+ INs) in apical tufts regulates dendritic spikes.

      We agree with the reviewer that the rhythms arising from SOM+ interneurons can extend to frequencies higher than the 16 Hz analyzed in this study. To address this, we have conducted a new set of simulations where we delivered distal dendritic inhibition across a range of frequencies, from 0.5 to 80 Hz (see new Results section “Frequency specific effects of rhythmic inhibition on neuronal integration”). These results revealed, surprisingly, that at 30 Hz the ability of inhibitory rhythms to entrain Ca<sup>2+</sup> and NMDA spikes (but not Na<sup>+</sup> spikes) degrades. This suggests that beta rhythms in the 20-30 Hz range are operating at the highest frequency for which dendritically targeting inhibition will be effective. The implications are covered in the Discussion section “Interaction with microcircuitry”. They are:

      “Particularly in the visual cortex, SOM interneurons can generate a rhythm in the 25-30 Hz range [22]. We found this to be at the upper end of the frequency range for dendritic inhibitory rhythms to be effective in modulating NMDA and Ca<sup>2+</sup> spikes. If this rhythm solely recruited SOM interneurons, its effectiveness would be marginal. Potentially compensating for this, recent work has found that PV interneurons also participate in beta/low-gamma [23, 24] (but see [21, 22]). In our model, when beta rhythmic inhibition was delivered perisomatically on its own, we found that it was less able to entrain spiking and had an overall hyperpolarizing effect. However, if delivered in conjunction with the distal dendritic inhibition arising from SOM interneurons, this may strengthen entrainment.”

      Distal dendritic inhibition has been previously shown to be more effective in controlling dendritic spikes. However, given the slow timescale of dendritic spikes, it can be hypothesized that high-frequency rhythmic inhibition would be ineffective in entraining the dendritic spikes either in distal or proximal location, as demonstrated by 4H and 5F, and vice versa. A computational study can take this further by exploring the robustness of this hypothesis. By sticking to a single-frequency definition of what constitutes Gamma (64 Hz) and Beta (16 Hz) inhibition, the current exploration does support the core hypothesis. However, given the temporal dynamics of dendritic spikes, it is valuable to learn, for example, the upper bound of "Beta" range (13-30Hz) inhibition that fails to phasically modulate them. In addition to the reason stated in the earlier paragraph, Alpha band activity (8-12 Hz), has been implicated (e.g. van Kerkoerle, 2014) in signaling of inter-areal feedback to the superficial layer in the cortex, potentially targeting apical tufts of pyramidals from multiple layers and resulting in alpha-range rhythmic inhibition. To make the findings significant, it might therefore be more pertinent to understand the consequences of ~10Hz rhythmic inhibition (in addition to the ~25-30 Hz Beta/Gamma) in the apical tufts for phasic modulation of dendritic spikes.

      We added an additional set of simulations that address this in the Results section ‘Frequency specific effects of rhythmic inhibition on neuronal integration’. In general, we found that dendritic and perisomatic inhibitory rhythms at lower frequencies could entrain AP generation, but with less functional specialization. This is explored in our Discussion section ‘Interneuron specializations and rhythm timescales’.

      The differential effect of Gamma and Beta range inhibition on basal and apical excitatory clusters is not convincing from the information provided. The basal cluster appears to overlap with perisomatic inhibitory synapses. The description in the methods does not have enough information to negate the visual perception (ln 979-81). With this understanding, it is not surprising that the correlation between excitation and APs is high (during the trough of gamma) for basal and not apical excitation. A more comparable scenario would be a more distal location of the basal excitatory cluster.

      While we stated in the original manuscript that we were contrasting ‘basal’ vs. ‘apical’ clustered inputs, this terminology did not reflect our intent with these analyses. We meant to contrast proximal vs. distal dendritic clustered synaptic inputs, which the reviewer correctly noted is confounded in the apical vs. basal comparison. We have rewritten these results, their discussion, and corresponding figure, to clearly state that we are contrasting proximal vs. distal synaptic input.

      Reviewer #2:

      The weaknesses are probably in some of the parameterizations of inhibitory synaptic dynamics. A unitary peak conductance of 1nS is very high for inhibitory synapses. This high value could invariably skew some of the network-level predictions. The authors could obtain specific parameters from the Neocortical Collaboration Portal (https://bbp.epfl.ch/nmcportal/microcircuit.html), which is an incredible resource for cortical neurons and synapses.

      We appreciate the valuable resource mentioned by the reviewer and will consult it when constructing future models. Regarding the present one, our choice of peak conductance was based on previous studies, namely:

      Egger R, Narayanan RT, Guest JM, Bast A, Udvary D, Messore LF, Das S, de Kock CPJ, Oberlaender M (2020) Cortical output is gated by horizontally projecting neurons in the deep layers. Neuron 105, 122-137.e128.

      and

      Xiang Z, Huguenard JR, Prince DA (2002) Synaptic inhibition of pyramidal cells evoked by different interneuronal subtypes in layer v of rat visual cortex. J Neurophysiol 88, 740-750.

      The study by Egger et al. used an inhibitory peak conductance of 1 nS and was simulating circuitry very similar to ours. We validated these synapses in pilot simulations that sought to characterize the resulting IPSPs and IPSCs, and whose results can be seen in Table 1 of our methods. These synapses exhibited IPSCs whose peak amplitudes ranged over values (~24-162 pA) that agreed with the experimental literature, such as Xiang et al.

      Given this, we feel our parameterization of inhibitory synapses does not warrant any changes.

      Reviewer #3:

      What disappointed me a bit was the lack of a concise summary of what we learned beyond the fact that beta and gamma act differently on dendritic integration. The individual paragraphs of the discussion often are 80% summary of existing theories and only a single vague statement about how the results in this study relate. I think a summarizing schematic or similar would help immensely.

      We agree with the reviewer that a summary schematic would help the reader. This has been added to the manuscript as Figure 11. It demonstrates the principal findings of the paper and is referenced in the opening paragraph of the discussion section.

      Orthogonal to that, there were some points where the authors could have offered more depth on specific features. For example, the authors summarized that their "results suggest that the timescales of these rhythms align with the specialized impacts of SOM and PV interneurons on neuronal integration". Here they could go deeper and try to explain why SOM impact is specialized at slower time scales. (I think their results provide enough for a speculative outlook.)

      This discussion has been expanded under the section “Interneuron specializations and rhythm timescales”. The added text is:

      “So, while our results suggest that spatial targeting of SOM and PV interneurons aligns with the timescales of their network-level rhythms, it could also be that their timing and subcellular localization interact to produce specialized neuron-level functions [85]. For instance, NMDA and Ca<sup>2+</sup> spikes in the distal dendrites last for ~50 ms, making the slower beta rhythm more appropriate for bidirectionally controlling them. Both can be described as dynamical systems with distinct phases with differing sensitivity to inhibition. Ca<sup>2+</sup> spikes are dynamical events comprised of an initiation, plateau, and termination phase. Inhibition delivered during the plateau phase shortens their duration [86]. If the beta rhythm is comprised of cycling between periods of elevated excitation (increased NMDA spike generation) followed by elevated inhibition, then Ca<sup>2+</sup> spike initiation will tend to occur during the excitatory phase, and its plateau during the subsequent inhibitory phase. A plateau during the inhibitory phase will more quickly enter termination. This is bidirectional control. On the other hand, slower rhythms (e.g. 1 Hz) initiate Ca<sup>2+</sup> spikes during the excitatory phase that plateau and enter termination autonomously, before the inhibitory phase is reached. The same principle holds for NMDA spikes [87]. As a result, rhythms in the range from 15-30 Hz are optimal for synchronizing the onsets and offsets of dendritic spikes across a population of neurons.

      The integrative effects of gamma (>40 Hz) are also specialized. Low frequency inhibitory rhythms delivered to the soma tended to shift the membrane potential higher or lower with the rhythm’s phase, effectively bringing it closer or farther from AP generation but not changing the neuron’s sensitivity to fast synaptic inputs. In the gamma frequency range, this is reversed, with the mean membrane potential not varying with rhythm phase but with a shifting bias to positive or negative membrane potential fluctuations. In addition, the trough phase of gamma lowers the threshold for AP generation, while slower rhythms like beta only raise the threshold. Consequently, the timing of gamma is ideal for increasing the sensitivity of the neuron to rapid excitation. This agrees with the observation that gamma oscillations accompany rapid excitation-inhibition balancing [88].”
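
      The timescale argument in the quoted passage can be made concrete with a small calculation comparing the cycle length of a rhythm with the ~50 ms duration quoted for NMDA and Ca<sup>2+</sup> spikes. The frequencies listed are examples taken from the ranges discussed in this response, and the function name `rhythm_period_ms` is introduced only for this sketch.

```python
SPIKE_DURATION_MS = 50.0  # approximate dendritic spike duration quoted above

def rhythm_period_ms(frequency_hz):
    """Duration of one full cycle of a rhythm, in milliseconds."""
    return 1000.0 / frequency_hz

for f in (1, 16, 30, 64):
    period = rhythm_period_ms(f)
    ratio = period / SPIKE_DURATION_MS
    print(f"{f:>3} Hz: period {period:7.1f} ms = {ratio:5.2f} x spike duration")
```

      At 16-30 Hz one cycle is comparable in length to a single dendritic spike, whereas a 1 Hz cycle is far longer and a 64 Hz cycle fits several times within one spike, which is consistent with the reasoning in the quoted text.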

      We also extended our discussion section ‘Relevance to coding’ to explore how beta and gamma rhythms can support sparse vs. dense population coding, respectively. It reads:

      “One interpretation of rhythms arising from local inhibitory feedback is that they maintain the balance between excitation and inhibition. This can be thought of as a normalization operation that maintains activity within a set range. Normalization can be achieved either through a subtractive effect that raises the threshold for initiating an action potential, or a multiplicative effect that lowers the slope of the relationship between excitation and action potential firing rate. When considered at the population level, these normalization effects impact coding in different ways. Subtractive normalization increases sparsity by dropping out neurons whose excitation is below the raised threshold. Multiplicative normalization, however, encourages dense codes by scaling down firing rates and compressing the range of firing rates. This study found that while both perisomatic and distal dendritic inhibition produced subtractive effects, only perisomatic had a multiplicative effect. Tying this to beta and gamma, beta rhythms may encourage sparse population codes while gamma allows for dense.”
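
      The distinction between subtractive and multiplicative normalization can be illustrated with a threshold-linear rate function. This is a toy sketch, not the biophysical model used in the study; the function `firing_rate`, the excitation values, and the chosen threshold and gain are assumptions made for the example.

```python
import numpy as np

def firing_rate(excitation, threshold=0.0, gain=1.0):
    """Threshold-linear rate: zero below threshold, slope `gain` above it."""
    return gain * np.maximum(excitation - threshold, 0.0)

def fraction_active(rates):
    """Fraction of neurons in the population with a non-zero rate."""
    return float(np.mean(rates > 0))

# Hypothetical excitatory drive to a population of 11 neurons.
excitation = np.linspace(0.0, 10.0, 11)

baseline = firing_rate(excitation)
subtractive = firing_rate(excitation, threshold=4.0)   # raised threshold
multiplicative = firing_rate(excitation, gain=0.5)     # lowered slope

for name, rates in [("baseline", baseline),
                    ("subtractive", subtractive),
                    ("multiplicative", multiplicative)]:
    print(f"{name:>14}: active {fraction_active(rates):.2f}, max rate {rates.max():.1f}")
```

      Raising the threshold silences the weakly driven neurons and so yields a sparser code, while lowering the gain leaves the same neurons active but compresses their rates, matching the sparse versus dense distinction drawn in the quoted passage.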

      Beyond that, the authors invite the community to reappraise the role of gamma and beta in coding. This idea seems to be hindered by the fact that I cannot find a mention of a release of the model used in this work. The base pyramidal cell model is of course available from the original study, but it would be helpful for follow-up work to release the complete setup including excitatory and inhibitory synapses and their activation in the different simulation paradigms used. As well as code related to that.

      We have added a Code and Data Availability section that addresses this. It reads: “Simulation code is deposited at ModelDB at https://modeldb.science/2019883. The raw simulation data are available from DBH upon request. Analysis code is posted as a GitHub repo at https://github.com/dbheadley/InhibOnDendComp.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The Drosophila wing disc is an epithelial tissue, the study of which has provided many insights into the genetic regulation of organ patterning and growth. One fundamental aspect of wing development is the positioning of the wing primordia, which occurs at the confluence of two developmental boundaries, the anterior-posterior and the dorsal-ventral. The dorsal-ventral boundary is determined by the domain of expression of the gene apterous, which is set early in the development of the wing disc. For this reason, the regulation of apterous expression is a fundamental aspect of wing formation.

      In this manuscript, the authors used state-of-the-art genomic engineering and a bottom-up approach to analyze the contribution of a 463 base pair fragment of apterous regulatory DNA. They find compelling evidence about the inner structure of this regulatory DNA and the upstream transcription factors that likely bind to this DNA to regulate apterous early expression in the Drosophila wing disc.

      Strengths:

      This manuscript has several strengths concerning both the experimental techniques used to address the problem of gene regulation and the relevance of the subject. To identify the mode of operation of the 463 bp enhancer, the authors use a balanced combination of different experimental approaches. First, they use bioinformatic analysis (sequence conservation and identification of transcription factors binding sites) to identify individual modules within the 463 bp enhancer. Second, they identify the functional modules through genetic analysis by generating Drosophila strains with individual deletions. Each deletion is characterized by looking at the resulting adult phenotype and also by monitoring apterous expression in the mutant wing discs. They then use a clever method to interfere in a more dynamic manner with the function of the enhancer, by directing the expression of catalytically inactive Cas9 to specific regions of this DNA. Finally, they recur to a more classical genetic approach to uncover the relevance of candidate transcription factors, some of them previously known and others suggested by the bioinformatic analysis of the 463 bp sequence. This workflow is clearly reflected in the manuscript, and constitutes a great example of how to proceed experimentally in the analysis of regulatory DNA.

      We thank the reviewer for these positive comments on the manuscript.

      Weaknesses:

      There are several caveats with the data that might be construed as weaknesses. Some are intrinsic to this detailed analysis or to the experimental difficulties of dealing with the wing disc in its earliest stages; others are more conceptual and are offered here in case the authors may wish to consider them.

      (1) The primordium of the wing region of the wing imaginal disc is defined by the expression of the gene vestigial, which is regulated by inputs coming from the dorsal-ventral boundary (Notch and wg) and from the anterior-posterior boundary (Dpp). Having such a principal role in wing primordium specification and expansion, I am surprised that this manuscript does not mention this gene in the main text and only contains indirect references to it. I consider that the manuscript would have benefited a lot by including vestigial in the analysis, at least as a marker of early wing primordium. This might allow us to visualize directly the positioning of the primordium in the apterous mutants generated in this study, adding more verisimilitude to the interpretations that place this domain based on indirect evidence.

      Vg does indeed play a critical role in the formation of the wing disc, and it is an ideal marker for the identification of the wing pouch. In the updated version of the article, we have now followed the expression of vg in some of the OR463 mutants via immunostaining of the Vg protein (Supplementary Figure 6). Cells within posterior wing outgrowths in Δm1 flies were invariably positive for Vg. This result further supports our previous identification of these cells as pouch cells. In those mutants in which no cross-over between DV and AP was observed, vg expression was severely reduced or absent, indicating that the wing pouch had not been specified. We thank the reviewer for this experimental idea, which we believe strengthens the final manuscript.

      We have added to the text:

      “To identify the nature of the posterior outgrowths, we performed anti-Vestigial (Vg) antibody staining of Δm1 mutants (Supplementary Figure 6). Vg is a key regulator of wing specification and also participates in wing growth and patterning (Baena-Lopez & García-Bellido, 2006; Kim et al., 1996; Zecca & Struhl, 2007a). In those discs in which the stripe was extended and the P compartment was enlarged, Vg was detected throughout the outgrowth, supporting the wing pouch identity of this region (Supplementary Figure 6B). Hemizygous Δm3 mutants presented a highly reduced anti-Vg signal, which suggests that no wing pouch is specified in these mutants (Supplementary Figure 6C).”

      (2) The authors place some emphasis on the idea that their work addresses possible coordination between setting the D/V boundary and the A/P boundary:

      Abstract: "Thus, the correct establishment of ap expression pattern with respect to en must be tightly controlled", "...challenging the mechanism by which apE miss-regulation leads to AP defects." "Detailed mutational analyses using CRISPR/Cas revealed a role of apE in positioning the DV boundary with respect to the AP boundary"

      Introduction: "However, little is known about how the expression pattern of ap is set up with respect that of en. In other words, how is the DV boundary positioned with respect to the AP boundary?"

      "How such interaction between ap and the AP specification program arises is unknown."

      Results: "Some of these phenotypes are reminiscent of those reported for apBlot (Whittle, 1979) and point towards a yet undescribed crosstalk between ap early expression and the AP specification program."

      At the same time, they express the notion, with which this reviewer agrees, that all defects observed in A/P patterning arising as a result of apterous misregulation are due to the fact that in their mutants, apterous expression is lost mainly in the posterior dorsal compartment, bringing novel confrontations between the A/P and the D/V boundaries.

      To me, the key point is why the expression of apterous in different mutants of the OR463 enhancer affects only the posterior compartment. This should be discussed because it is far from obvious that apterous expression has different regulatory requirements in the anterior and posterior compartments.

      We agree with the reviewer that the differential effect of the mutations on the expression of ap in the A and P compartments is a key factor underlying our explanation of how the phenotypes arise. To clarify this point, we have now extended our first discussion point. Moreover, we have included further references to differential enhancer regulation in different wing disc compartments. In addition, we have discussed whether this effect reflects different regulation of the enhancer in the A and P compartments or regulation of downstream effectors.

      Added paragraph:

      “Although apE is active throughout the dorsal compartment, its disruption leads to a preferential loss of ap expression in posterior cells. The asymmetric effect of apE perturbation on the anterior and posterior compartments suggests that apE transcriptional control is not equivalent across the A/P axis. Compartment-dependent differences in enhancer regulation have also been documented in other developmental contexts; for example, the Distal-less DMX-R element is interpreted through distinct cofactor combinations (Sloppy paired anteriorly and Engrailed posteriorly) (Gebelein et al., 2004), and specific mutations within DMX-R preferentially disrupt enhancer function in anterior versus posterior cells. It is possible that apE is more sensitive to misregulation due to differential transcriptional regulation across compartments. Nevertheless, we cannot exclude the possibility that the posterior bias we observe arises not from enhancer logic per se, but from intrinsic differences in tissue architecture or the dynamics of boundary positioning during wing disc development.”

      (3) The description of gene expression in the wing disc of novel apterous mutants is only carried out in late third instar discs (Figs. 2, 3, 5, and 7). This is understandable given the technical difficulties of dealing with early discs, as those shown in the analysis of candidate apterous regulatory transcription factors (Fig. 4F, Fig. 6 C-D). However, because the effects of the mutants on apterous expression are expected to occur much earlier than the time of expression analysis, this fact should be discussed.

      We agree with the reviewer regarding the limitations of restricting our analysis to third instar larvae when assessing the expression of the OR463 enhancer. We have included a statement acknowledging this in the discussion:

      “It is important to acknowledge that all expression analyses were conducted in third-instar discs, a stage that follows the initial establishment of ap expression. Earlier effects are therefore inferred rather than directly observed, as imaging and staging of early discs present significant technical challenges due to their small size and fragility. A direct observation of the early wing disc across mutant conditions would likely help to clarify the role of the discovered factors during early ap expression.”

      Reviewer #2 (Public Review):

      In their manuscript, "Transcriptional control of compartmental boundary positioning during Drosophila wing development," Aguilar and colleagues do an exceptional job of exploring how tissue axes are established across Drosophila development. The authors perform a series of functional perturbations using mutational analyses at the native locus of apterous (ap), and perform tissue-specific enhancer disruption via dCas9 expression. This innovative approach allowed them to explore the spatio-temporal requirements of an apterous enhancer. Combining these techniques allowed the authors to explore the molecular basis of apterous expression, connecting the genotypes to the phenotypical effects of enhancer perturbations. To me, this paper was a beautiful example of what can be done using modern Drosophila genetics to understand classic questions in developmental biology and transcriptional regulation.

      In sum, this was a rigorous paper bridging scales from the molecular to phenotypes, with new insight into how enhancers control compartmental boundary positioning during Drosophila wing development.

      We would like to thank the reviewer for their positive and encouraging comments, as well as for the careful review of the manuscript and figures. We have adopted most of the suggestions in the new manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, the authors use the Drosophila wing as a model system and combine state-of-the-art genetic engineering to identify and validate the molecular players mediating the activity of one of the cis-regulatory enhancers of the apterous gene involved in the regulation of its expression domain in the dorsal compartment of the wing primordium during larval development.

      (1) The authors raise two very important questions in the Introduction: (1) who is locating the relative position of the AP and DV boundaries in the developing wing, and (2) who is responsible for the maintenance of the apterous expression domain late in larval development. None of these two questions have been responded to and, indeed, the summary of the work (as stated in the conclusions of the last paragraph of the Introduction) does not resolve any of these questions.

      We believe the results presented, together with those added during the revision, shed some light on the positioning of the boundary. We proposed that the combined integration of four TFs by the OR463 enhancer is fundamental for the correct positioning. Additionally, we proposed a model of how these positioning problems result in the phenotypes observed (Supplementary Figure 7, now also shown in Figure 2D). Our results indicate that ap expression in the PD quadrant is particularly sensitive to mutations in the enhancer, which we have now further elaborated on in the first part of the discussion. Together, we believe that our results do tackle the first problem posed in the introduction, while not completely solving it. As for the second question, we have tried to remove any suggestion that this article tries to explain later regulation of apterous. This misunderstanding probably arose from a sentence in the introduction, which has now been deleted. The means of the maintenance of ap expression in later stages has been partially explored previously (see Bieli et al., 2015) and is the subject of our current studies.

      (2) The authors have identified two different regions whose deletions give very interesting phenotypes in the adult wing (AP identity change & outgrowths, and loss of wing), and have bioinformatically identified and functionally verified 4 TFs that mediate the activity of these regions by their capacity to phenocopy the wing phenotype. While identification of the 2 TFs acting on the m1 is incremental with respect to previous work on the identification of the enhancer responsible for the early expression of Ap, identification of Antp and Grn does not explain the loss of function phenotype of the m3 enhancer. Do any of these results shed any light on the first two Qs? Do these results explain the compartment boundary position in the wing as stated in the title? Expression of lacZ reporter assays is fundamental to demonstrate their model of Figure 8. The reduction of the PD compartment is difficult to understand by the sole reduction in ap expression in this region (which has not been demonstrated).

      We agree that the identification of Antp and Grn does not by itself explain the loss-of-function phenotype of the m3 enhancer. However, these transcription factors represent the best current candidates for direct regulators for this enhancer. We have clarified in the text that Antp and Grn may not act as instructive inputs but rather play a permissive role in enabling ap expression through m3. Importantly, the dCas9-mediated perturbation experiments directly demonstrate that targeted manipulation of apE in this region is sufficient to produce the characteristic duplications, providing functional evidence that apE activity underlies the observed phenotypes. In addition, lacZ reporter assays confirm that apE expression is indeed affected in all cases where the experimental setup permitted detection. Together, these results validate that the observed morphological phenotypes stem from perturbation of apE activity and support the proposed model for enhancer regulation and its role in compartment boundary maintenance.

      (3) The authors state in one of the sections "Spatio-temporal analysis of apE via dCas9 ". No temporal manipulation of gene activity is shown. The authors should combine GAL4/UAS with the Gal80ts to demonstrate the temporal requirements of Antp/Grn and Pnt/Hth as depicted in their model of Figure 8.

      We agree with the reviewer that the temporal dimension was not explored in the first version of the manuscript (aside from the temporal constraints of the en-Gal4 driver). As suggested by the reviewer, we have now used a tub-Gal80ts allele to temporally control the enhancer perturbation and delimit its window of activity. The results are included in two new panels in Figure 3 (H and H’). The new data agree with the notion that the apE enhancer is important up to L2 stages but dispensable later in development. We have added the following paragraph to the text:

      “To define the developmental time window during which the apE enhancer remains sensitive to repression, we combined the temperature-sensitive tub-Gal80<sup>ts</sup> system with temporally controlled expression of dCas9. Animals carrying the en-Gal4, tub-Gal80<sup>ts</sup>, UAS-dCas9 and U6-OR463gRNA(4x) transgenes were maintained at 18 °C to suppress dCas9 expression. Independent sets of embryos were then shifted to 29 °C at successive developmental intervals ranging from 0 to 168 h after egg laying (AEL), so that dCas9 induction occurred at distinct time points in development (Figure 3H). Under these conditions, dCas9 transcription was induced only after the temperature shift, while the gRNAs were expressed constitutively. Wing phenotypes were quantified in adult progeny as a readout of apE enhancer perturbation. When dCas9 was expressed from embryonic or early larval stages (0–48 h AEL), nearly all wings (70–90%) displayed severe ap-like phenotypes, including posterior compartment duplication and loss of anterior–posterior boundary integrity. Shifting animals later (48–72 h AEL) still produced a majority (~66%) of abnormal wings, whereas induction after 72 h AEL resulted in progressively weaker effects and complete loss of phenotypes by 96 h AEL (Figure 3H’).

      These results delineate the developmental period during which apE activity is required for proper wing patterning. Perturbation during the first half of the second larval instar (≤ 96 h at 18 °C) was sufficient to elicit strong ap-like transformations, consistent with the enhancer being functionally required during early larval stages and becoming dispensable thereafter. The temporal decline in phenotype penetrance thus reflects the progressive loss of apE sensitivity to dCas9-mediated repression, providing a precise estimate of when its activity is no longer required for wing morphogenesis.”

      (4) The authors have not managed to explain the AP phenotype. Thus, this work opens many unresolved questions and does not resolve the title, which is a big overstatement. Thus, strengths (technically excellent), weakness (there is not much to learn about wing development and apterous regulation from these results besides the incremental identification of 4 additional TFs mediating the regulation of ap expression by their ability to phenocopy regulatory mutations of the apterous gene).

      As mentioned in response to Reviewer 1, we indeed have no concrete explanation for why the P compartment seems more sensitive to mutations. We have now further discussed this point (see the paragraph below, now included in the discussion). As for how the adult phenotypes arise from the mutant wing discs, we have a good idea (see Supplementary Figure 7 and Figure 2).

      We are pleased to hear that the reviewer considers our article technically valuable. Therefore, we have reformulated the title so that the technical merits play a bigger role in it:

      “in situ mutational screening and CRISPR interference demonstrate that the apterous Early enhancer is required for developmental boundary positioning”

      Paragraph added to the discussion:

      “Although apE is active throughout the dorsal compartment, its disruption leads to a preferential loss of ap expression in posterior cells. The asymmetric effect of apE perturbation on the anterior and posterior compartments suggests that apE transcriptional control is not equivalent across the A/P axis. Compartment-dependent differences in enhancer regulation have also been documented in other developmental contexts; for example, the Distal-less DMX-R element is interpreted through distinct cofactor combinations (Sloppy paired anteriorly and Engrailed posteriorly) (Gebelein et al., 2004), and specific mutations within DMX-R preferentially disrupt enhancer function in anterior versus posterior cells. It is possible that apE is more sensitive to misregulation due to differential transcriptional regulation across compartments. Nevertheless, we cannot exclude the possibility that the posterior bias we observe arises not from enhancer logic per se, but from intrinsic differences in tissue architecture or the dynamics of boundary positioning during wing disc development.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Formatting of references should be checked throughout the manuscript

      Reviewer #2 (Recommendations For The Authors):

      Here, I note a few points that would help clarify the manuscript and connect it with a broader community.

      Figure 1: it could help the reader to add the landing site genetic scheme to the main figure.

      In a first draft, that was exactly the original configuration, but after comparing both versions we determined that the presence of the landing site draws some focus away from the phenotypes.

      Figure 1: what species were used for the conservation alignment? Further details would be nice to add here.

      We have now added a section on bioinformatic analysis, which was missing in the original manuscript:

      Sequence conservation of the OR463 fragment within the ap upstream intergenic region was analysed across different dipteran species using the “Cons 124 Insects” multiple-alignment track of the D. melanogaster dm6 genome on the UCSC Genome Browser (Kent et al., 2002, https://genome.ucsc.edu). Conservation scores were obtained from phastCons (Siepel et al., 2005) and used to delineate conserved and less conserved blocks within OR463. Conserved transcription factor binding sites were predicted with MotEvo (Arnold et al., 2011), which defined four conserved modules (m1–m4) and six inter-modules (N1–N6). Additional motif analysis was performed using the JASPAR CORE Insecta database and the Target Explorer tool to cross-validate conserved binding-site predictions and refine motif assignments within the enhancer.
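
      As an illustration only, the step of delineating conserved and less conserved blocks from per-base conservation scores could be sketched as below. This is a minimal sketch, not the procedure used in the manuscript: the function name `conserved_blocks`, the score threshold of 0.8, and the minimum block length of 5 bp are all hypothetical choices, and the input is assumed to be a plain list of per-base scores between 0 and 1.

```python
from typing import List, Tuple


def conserved_blocks(
    scores: List[float],
    threshold: float = 0.8,  # hypothetical cutoff, not taken from the manuscript
    min_len: int = 5,        # hypothetical minimum block length in bp
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges of consecutive positions
    whose score is >= threshold and whose run length is >= min_len."""
    blocks: List[Tuple[int, int]] = []
    start = None  # start index of the run currently being scanned, if any
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new high-scoring run begins here
        else:
            # The run (if any) ended at position i - 1; keep it if long enough.
            if start is not None and i - start >= min_len:
                blocks.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_len:
        blocks.append((start, len(scores)))
    return blocks


# Toy example: two long conserved runs separated by low-scoring positions
# and a short (2 bp) high-scoring run that is discarded.
example_scores = [0.9] * 6 + [0.1] * 3 + [0.95] * 2 + [0.2] + [0.85] * 5
example_blocks = conserved_blocks(example_scores)
```

      Positions outside the returned ranges would correspond to the less conserved stretches; in practice the cutoff and minimum length would need to be chosen against the actual score distribution.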

      From Figure 2: I would consider moving the model or portions of it to a main figure. These models, while descriptive, really help make the manuscript more approachable. Note that eLife does not have forced figure requirements.

      We have adopted the reviewer’s suggestion and we are very grateful for it. We think the figure has greatly improved. The final figure now highlights a small part of the model, which is still included in the Supplementary Figure.

      Figure 5: This figure is fantastic, and the results are particularly important. I would recommend increasing the weight of the arrows from D to E, making it more obvious. Did the authors consider any temperature or other perturbations to look at robustness? They mention "robustness" a few times, and this could be an excellent system to explore a bit further. For panels F and G, it would be nice to have a bit of biochemistry here to test the spacing requirements' effects on the distances (but it's great phenotypical data, regardless).

      We have chosen a darker grey to highlight the lines.

      We appreciate the reviewer’s suggestions. With respect to robustness assays, such as temperature perturbations, we agree that the apE enhancer would be a suitable system for such experiments. However, these analyses would move the study beyond its current scope, which is focused on defining the regulatory logic of boundary positioning through mutational dissection and CRISPRi. We therefore prefer not to expand the work in this direction here, but we note that this would be an interesting avenue for future investigation.

      Similarly, biochemical assays probing spacing requirements would provide additional mechanistic insight but would represent a separate line of work. In this manuscript, we aimed to establish the functional consequences of motif spacing using in vivo genetic and phenotypic analyses, which we believe sufficiently support our conclusions.

      Thank you for the insight.

      Discussion: To the point "most point mutations or short deletions in enhancer regions have little effect on gene expression" I would push the authors to discuss their work in relation to Fuqua et al., (Nature 2020) and Kvon et al., (Cell 2020). Their work is consistent with enhancers being sensitive to mutations, and this warrants further discussion because it could be important for the transcription field.

      Hox genes as pioneer factors: I would recommend citing Loker et al., (Curr Biol 2021), as an example of Hox genes functioning as a pioneer factor.

      We thank the reviewer for this suggestion. We have now added a short paragraph in the Discussion noting how our observations may relate to the mutational patterns described in Fuqua et al. (2020) and Kvon et al. (2020), while keeping the interpretation tentative. The text now says:

      “Recent large-scale enhancer mutagenesis studies have shown that the mutational consequences within enhancers can vary widely. In some cases, many nucleotide positions appear tolerant to single-base changes and only a small subset of mutations produce clear functional effects (Kvon et al., 2020). In other enhancers, regulatory information is distributed more densely, and mutations at multiple positions can alter output (Fuqua et al., 2020). Together, these studies illustrate that enhancer sensitivity is not uniform but depends on enhancer-specific features such as motif organization, cooperativity, and redundancy. Within this broader landscape, the apE enhancer appears to represent a particularly sensitive case.”

      We also included a citation to Loker et al. (2021) in connection with the possible pioneer-like contribution of HOX input to apE.

      We would like to thank all reviewers for their effort.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      I read the paper by Parrotta et al with great interest. The authors are asking an interesting and important question regarding pain perception, which is derived from predictive processing accounts of brain function. They ask: If the brain indeed integrates information coming from within the body (interoceptive information) to comprise predictions about the expected incoming input and how to respond to it, could we provide false interoceptive information to modulate its predictions, and subsequently alter the perception of such input? To test this question, they use pain as the input and the sounds of heartbeats (falsified or accurate) as the interoceptive signal.

      Strengths:

      I found the question well-established, interesting, and important, with important implications and contributions for several fields, including neuroscience of prediction-perception, pain research, placebo research, and health psychology. The paper is well-written, the methods are adequate, and the findings largely support the hypothesis of the authors. The authors carried out a control experiment to rule out an alternative explanation of their finding, which was important.

      Weaknesses:

      I will list here one theoretical weakness or concern I had, and several methodological weaknesses.

      The theoretical concern regards what I see as a misalignment between a hypothesis and a result, which could influence our understanding of the manipulation of heartbeats, and its meaning: The authors indicate from prior literature and find in their own findings, that when preparing for an aversive incoming stimulus, heartbeats *decrease*. However, in their findings, manipulating the heartbeats that participants hear to be slower than their own prior to receiving a painful stimulus had *no effect* on participants' actual heartbeats, nor on their pain perceptions. What authors did find is that when listening to heartbeats that are *increased* in frequency - that was when their own heartbeats decreased (meaning they expected an aversive stimulus) and their pain perceptions increased.

      This is quite complex - but here is my concern: If the assumption is that the brain is collecting evidence from both outside and inside the body to prepare for an upcoming stimulus, and we know that *slowing down* of heartbeats predicts an aversive stimulus, why is it that participants responded with a change in pain perception and physiological response when they listened to *increased heartbeats* and not decreased? My interpretation is that the manipulation did not fool the interoceptive signals that the brain collects, but rather the more conscious experience of participants, which may then have been translated to fear/preparation for the incoming stimulus. As the authors indicate in the discussion (lines 704-705), participants do not *know* that decreased heartbeats indicate upcoming aversive stimulus, and I would even argue the opposite - the common knowledge or intuitive response is to increase alertness when we hear increased heartbeats, like in horror films or similar scenarios. Therefore, the unfortunate conclusion is that what the authors assume is a manipulation of interoception - to me seems like a manipulation of participants' alertness or conscious experience of possible danger. I hope the (important) distinction between the two is clear enough because I find this issue of utmost importance for the point the paper is trying to make. To summarize in one sentence - if it is decreased heartbeats that lead the brain to predict an approaching aversive input, and we assume the manipulation is altering the brain's interoceptive data collection, why isn't it responding to the decreased signal? My conclusion is that this is not in fact a manipulation of interoception, unfortunately.

      We thank the reviewer for their comment, which gives us the opportunity to clarify a theoretical point that we did not make sufficiently clear in the previous version of the manuscript. The reviewer suggests that a decreased heart rate itself might act as an internal cue for a forthcoming aversive stimulus, and questions why our manipulation of slower heartbeats then did not produce measurable effects.

      The central point is this: decreased heart rate is not a signal the brain uses to predict a threat, but is a consequence of the brain having already predicted the threat. This distinction is crucial. The well-known anticipatory decrease of heartrate serves an allostatic function: preparing the body in advance so that physiological responses to the actual stressor (such as an increase in sympathetic activation) do not overshoot. In other words, the deceleration is an output of the predictive model, not an input from which predictions are inferred. It would be maladaptive for the brain to predict threat through a decrease in heartrate, as this would then call for a further decrease, creating a potential runaway cycle.

      Instead, increased heart rate is a salient and evolutionarily conserved cue for arousal, threat, and pain. This association is reinforced both culturally - for example, through the use of accelerating heartbeats in films and media to signal urgency, as R1 mentions - and physiologically, as elevated heart rates reliably occur in response to actual (not anticipated) stressors. Decreased heartrates, in contrast, are reliably associated with the absence of stressors, for example during relaxation and before (and during) sleep. Thus, across various everyday experiences, increased (instead of decreased) heartrates are robustly associated with actual stressors, and there is no a priori reason to assume that the brain would treat decelerating heartrates as a cue for threat. As we argued in previous work, “the relationship between the increase in cardiac activity and the anticipation of a threat may have emerged from participants’ first-hand experience of increased heart rates to actual, not anticipated, pain” (Parrotta et al., 2024). The changes in heart rate and pain perception that we hypothesize (and observe) are therefore fully in line with the prior literature on the anticipatory compensatory heartrate response (Bradley et al., 2008, 2005; Colloca et al., 2006; Lykken et al., 1972; Taggart et al., 1976; Tracy et al., 2017; Skora et al., 2022), as well as with Embodied Predictive Coding models (Barrett & Simmons, 2015; Pezzulo, 2014; Seth, 2013; Seth et al., 2012), which assume that our body is regulated through embodied simulations that anticipate likely bodily responses to upcoming events, thereby enabling anticipatory or allostatic regulation of physiological states (Barrett, 2017).

      We now add further explanation to this point to the Discussion (lines 740-758) and Introduction (lines 145-148; 154-156) of our manuscript to make this important point clearer.

      Barrett, L. F., & Simmons, W. K. (2015). Interoceptive predictions in the brain. Nature Reviews Neuroscience, 16(7), 419-429.

      Barrett, L. F. (2017). The theory of constructed emotion: An active inference account of interoception and categorization. Social Cognitive and Affective Neuroscience, 12(1), 1-23.

      Bradley, M. M., Moulder, B., & Lang, P. J. (2005). When good things go bad: The reflex physiology of defense. Psychological Science, 16(6), 468-473.

      Bradley, M. M., Silakowski, T., & Lang, P. J. (2008). Fear of pain and defensive activation. Pain, 137(1), 156-163.

      Colloca, L., Petrovic, P., Wager, T. D., Ingvar, M., & Benedetti, F. (2010). How the number of learning trials affects placebo and nocebo responses. Pain, 151(2), 430-439.

      Lykken, D., Macindoe, I., & Tellegen, A. (1972). Preception: Autonomic response to shock as a function of predictability in time and locus. Psychophysiology, 9(3), 318-333.

      Taggart, P., Hedworth-Whitty, R., Carruthers, M., & Gordon, P. D. (1976). Observations on electrocardiogram and plasma catecholamines during dental procedures: The forgotten vagus. British Medical Journal, 2(6039), 787-789.

      Tracy, L. M., Gibson, S. J., Georgiou-Karistianis, N., & Giummarra, M. J. (2017). Effects of explicit cueing and ambiguity on the anticipation and experience of a painful thermal stimulus. PloS One, 12(8), e0183650.

      Parrotta, E., Bach, P., Perrucci, M. G., Costantini, M., & Ferri, F. (2024). Heart is deceitful above all things: Threat expectancy induces the illusory perception of increased heartrate. Cognition, 245, 105719.

      Pezzulo, G. (2014). Why do you fear the bogeyman? An embodied predictive coding model of perceptual inference. Cognitive, Affective & Behavioral Neuroscience, 14(3), 902-911.

      Seth, A., Suzuki, K., & Critchley, H. (2012). An Interoceptive Predictive Coding Model of Conscious Presence. Frontiers in Psychology, 2. https://www.frontiersin.org/articles/10.3389/fpsyg.2011.00395

      Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in Cognitive Sciences, 17(11), 565-573.

      Skora, L. I., Livermore, J. J. A., & Roelofs, K. (2022). The functional role of cardiac activity in perception and action. Neuroscience & Biobehavioral Reviews, 104655.

      I will add that the control experiment - with an exteroceptive signal (knocking of wood) manipulated in a similar manner - could be seen as evidence of the fact that heartbeats are regarded as an interoceptive signal, and it is an important control experiment, however, to me it seems that what it is showing is the importance of human-relevant signals to pain prediction/perception, and not directly proves that it is considered interoceptive. For example, it could be experienced as a social cue of human anxiety/fear etc, and induce alertness.

      The reviewer asks us to consider whether our measured changes in pain response happen not because the brain treats the heartrate feedback in Experiment 1 as an interoceptive stimulus, but because heartbeat sounds could have signalled threat on a more abstract, perhaps metacognitive or affective, level, in contrast to the less visceral control sounds in Experiment 2. We deem this highly unlikely for several reasons.

      First, as we point out in our response to Reviewer 3 (Point 3), if this were the case, the different sounds in both experiments should have induced overall (between-experiment) differences in pain perception and heart rate, induced by the (supposedly) generally more threatening heartbeat sounds. However, when we added such comparisons, no such between-experiment differences were obtained (see Results Experiment 2, and Supplementary Materials, Cross-experiment analysis between-subjects model). Instead, we only find a significant interaction between experiment and feedback (faster, slower). Thus, it is not the heartbeat sounds per se that induce the measured changes to pain perception, but the modulation of their rate; identical changes to the rate of non-heartbeat sounds produce no such effects. In other words, pain perception is sensitive to a change in heart rate feedback, as we predicted, instead of the overall presence of heartbeat sounds (as one would need to predict if heartbeat sounds had more generally induced threat or stress).

      Second, one may suspect that it is precisely the acceleration of heartrate feedback that could act as a cue to arousal, while accelerated exteroceptive feedback would not. However, if this were the case, one would need to predict a general heart rate increase with accelerated feedback, as this is the general physiological marker of increasing alertness and arousal (e.g. Tousignant-Laflamme et al., 2005; Terkelsen et al., 2005; for a review, see Forte et al., 2022). However, the data show the opposite, with real heartrates decreasing when the heartrate feedback increases. This result is again fully in line with the predicted interoceptive consequences of accelerated heartrate feedback, which mandates an immediate autonomic regulation, especially when preparing for an anticipated stressor.

      Third, our view is further supported by neurophysiological evidence showing that heartbeat sounds, particularly under the belief they reflect one’s own body, are not processed merely as generic aversive or “human-relevant” signals. For instance, Vicentin et al. (2024) showed that simulated faster heartbeat sounds elicited stronger EEG alpha-band suppression, indicative of increased cortical activation over frontocentral and right frontal areas, compatible with the localization of brain regions contributing to interoceptive processes (Kleint et al., 2015). Importantly, Kleint et al. also demonstrated via fMRI that heartbeat sounds, compared to acoustically matched tones, selectively activate bilateral anterior insula and frontal operculum, key hubs of the interoceptive network. This suggests that the semantic identity of the sound as a heartbeat is sufficient to elicit internal body representations, despite its exteroceptive nature. Further evidence comes from van Elk et al. (2014), who found that heartbeat sounds suppress the auditory N1 component, a neural marker of sensory attenuation typically associated with self-generated or predicted stimuli. The authors interpret this as evidence that the brain treats heartbeat sounds as internally predicted bodily signals, supporting interoceptive predictive coding accounts in which exteroceptive cues (i.e., auditory cardiac feedback) are integrated with visceral information to generate coherent internal body representations.

      Finally, it is worth noting that the manipulation of heartrate feedback in our study elicited measurable compensatory changes in participants’ actual heart rate. This is striking compared to our previous work (Parrotta et al., 2024), in which we used a design highly similar to the present one, combined with a very strong threat manipulation. Specifically, we presented participants with highly salient threat cues (knives directed at an anatomical depiction of a heart), which predicted forthcoming pain with 100% validity (compared to flowers, which predicted the absence of pain with 100% validity). In other words, these cues perfectly predicted actual pain, through highly visceral stimuli. Nevertheless, we found no measurable decrease in actual heartrate. From an abstract threat perspective, it is therefore striking that the much weaker manipulation of slightly increased or decreased heartrates we used here would induce such a change. The difference therefore suggests that the response here was caused not by an abstract feeling of threat, but by the brain treating the increased heartrate feedback as an interoceptive signal for (stressor-induced) sympathetic activation, which would then be immediately down-regulated.

      Together, we hope you agree that these considerations make a strong case against a non-specific, arousal or alertness-related explanation of our data. We now make this point clearer in the new paragraph of the Discussion (Accounting for general unspecific contributions, lines 796-830), and have added the relevant between-experiment comparisons to the Results of Experiment 2.

      Forte, G., Troisi, G., Pazzaglia, M., Pascalis, V. D., & Casagrande, M. (2022). Heart rate variability and pain: a systematic review. Brain sciences, 12(2), 153.

      Vicentin, S., Guglielmi, S., Stramucci, G., Bisiacchi, P., & Cainelli, E. (2024). Listen to the beat: behavioral and neurophysiological correlates of slow and fast heartbeat sounds. International Journal of Psychophysiology, 206, 112447.

      Kleint, N. I., Wittchen, H. U., & Lueken, U. (2015). Probing the interoceptive network by listening to heartbeats: an fMRI study. PloS one, 10(7), e0133164.

      Parrotta, E., Bach, P., Perrucci, M. G., Costantini, M., & Ferri, F. (2024). Heart is deceitful above all things: Threat expectancy induces the illusory perception of increased heartrate. Cognition, 245, 105719.

      Terkelsen, A. J., Mølgaard, H., Hansen, J., Andersen, O. K., & Jensen, T. S. (2005). Acute pain increases heart rate: differential mechanisms during rest and mental stress. Autonomic Neuroscience, 121(1-2), 101-109.

      Tousignant-Laflamme, Y., Rainville, P., & Marchand, S. (2005). Establishing a link between heart rate and pain in healthy subjects: a gender effect. The journal of pain, 6(6), 341-347.

      van Elk, M., Lenggenhager, B., Heydrich, L., & Blanke, O. (2014). Suppression of the auditory N1-component for heartbeat-related sounds reflects interoceptive predictive coding. Biological psychology, 99, 172-182.

      Several additional, more methodological weaknesses include the very small number of trials per condition - the methods mention 18 test trials per participant for the 3 conditions, with varying pain intensities, which are later averaged (and whether this is appropriate is a different issue). This means 6 trials per condition, and only 2 trials per condition and pain intensity. I thought that this number could be increased, though it is not a huge concern of the paper. It is, however, needed to show some statistics about the distribution of responses, given the very small trial number (see recommendations for authors). The sample size is also rather small, on the verge of "just right" to meet the required sample size according to the authors' calculations.

      We provide detailed responses to these points in the “Recommendations for The Authors” section, where each of these issues is addressed point by point in response to the specific questions raised.

      Finally, and just as important, the data exists to analyze participants' physiological responses (ECG) after receiving the painful stimulus - this could support the authors' claims about the change in both subjective and objective responses to pain. It could also strengthen the physiological evidence, which is rather weak in terms of its effect. Nevertheless, this is missing from the paper.

      This is indeed an interesting point, and we agree that analyzing physiological responses such as ECG following the painful stimulus could offer additional insights into the objective correlates of pain. However, it is important to clarify that the experiment was not designed to investigate post-stimulus physiological responses. Our primary focus was on the anticipatory processes leading up to the pain event. Notably, in the time window immediately following the stimulus - when one might typically expect to observe physiological changes such as an increase in heart rate - participants were asked to provide subjective ratings of their nociceptive experience. It is therefore not a “clean” interval that would lend itself to measurement, especially as a substantial body of evidence indicates that one’s heart rate is strongly modulated by higher-order cognitive processes, including attentional control, executive functioning, decision-making and action itself (e.g., Forte et al., 2021a; Forte et al., 2021b; Luque-Casado et al., 2016).

      This limitation is particularly important as the change in pain ratings induced by our heart rate manipulation is substantially smaller than the changes in heart rate induced by actual pain (e.g., Loggia et al., 2011). To confirm this for our study, we simply estimated how much change in heart rate is produced by a change in actual stimulus intensity in the initial no feedback phase of our experiment. There, we find that a change between stimulus intensities 2 and 4 induces an NPS change of 32.95 and a heart rate acceleration response of 1.19 (difference in heart rate response relative to baseline, Colloca et al., 2006), d = .52, p < .001. The change of NPS induced by our implicit heart rate manipulation, however, is only a seventh of this (4.81 on the NPS). This means that the expected effect size of heart rate acceleration produced by our manipulation would only be d = .17. A power analysis, using G*Power, reveals that a sample size of n = 266 would be required to detect such an effect, if it exists. Thus, while we agree that this is an exciting hypothesis to be tested, it requires a specifically designed study, and a much larger sample than was possible here.
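
      The scaling argument and the order of magnitude of the required sample can be sketched as follows. This is only an illustrative approximation: the function name `required_n`, the two-sided alpha of .05 and the power of .80 are assumptions (the G*Power settings are not stated above), and the normal approximation used here will land in the same range as the reported n = 266 without necessarily matching it.

```python
import math
from statistics import NormalDist


def required_n(d, alpha=0.05, power=0.80):
    """Approximate sample size for a two-sided one-sample (or paired) test.

    Uses the normal approximation n = ((z_{1-alpha/2} + z_{power}) / d) ** 2.
    alpha and power are assumed defaults, not values taken from the response.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / d) ** 2)


# Figures quoted in the response.
nps_change_actual = 32.95    # NPS change between stimulus intensities 2 and 4
nps_change_feedback = 4.81   # NPS change induced by the feedback manipulation

# Share of the "real pain" effect reproduced by the manipulation (about one seventh).
ratio = nps_change_feedback / nps_change_actual

# Sample size needed for the small expected effect quoted in the response (d = .17).
n_needed = required_n(0.17)
print(f"ratio = {ratio:.3f}, approximate n = {n_needed}")
```

      Under these assumed settings the required sample runs to a few hundred participants, which is the point the argument relies on.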

      Colloca, L., Benedetti, F., & Pollo, A. (2006). Repeatability of autonomic responses to pain anticipation and pain stimulation. European Journal of Pain, 10(7), 659-665.

      Forte, G., Morelli, M., & Casagrande, M. (2021a). Heart rate variability and decision-making: Autonomic responses in making decisions. Brain sciences, 11(2), 243.

      Forte, G., Favieri, F., Oliha, E. O., Marotta, A., & Casagrande, M. (2021b). Anxiety and attentional processes: the role of resting heart rate variability. Brain sciences, 11(4), 480.

      Loggia, M. L., Juneau, M., & Bushnell, M. C. (2011). Autonomic responses to heat pain: Heart rate, skin conductance, and their relation to verbal ratings and stimulus intensity. PAIN®, 152(3), 592-598.

      Luque-Casado, A., Perales, J. C., Cárdenas, D., & Sanabria, D. (2016). Heart rate variability and cognitive processing: The autonomic response to task demands. Biological psychology, 113, 83-90.

      I have several additional recommendations regarding data analysis (using an ANOVA rather than multiple t-tests, using raw normalized data rather than change scores, questioning the averaging across 3 pain intensities) - which I will detail in the "recommendations for authors" section.

      We provide detailed responses to these points in the “Recommendations for The Authors” section, where each of these issues is addressed point by point in response to the specific questions raised.

      Conclusion:

      To conclude, the authors have shown in their findings that predictions about an upcoming aversive (pain) stimulus - and its subsequent subjective perception - can be altered not only by external expectations, or manipulating the pain cue, as was done in studies so far, but also by manipulating a cue that has fundamental importance to human physiological status, namely heartbeats. Whether this is a manipulation of actual interoception as sensed by the brain is - in my view - left to be proven.

      Still, the paper has important implications in several fields of science ranging from neuroscience prediction-perception research, to pain and placebo research, and may have implications for clinical disorders, as the authors propose. Furthermore, it may lead - either the authors or someone else - to further test this interesting question of manipulation of interoception in a different or more controlled manner.

      I salute the authors for coming up with this interesting question and encourage them to continue and explore ways to study it and related follow-up questions.

      We sincerely thank the reviewer for the thoughtful and encouraging feedback. We hope our responses to your points below convince you a bit more that what we are measuring does indeed capture interoceptive processes, but we of course fully acknowledge that additional measures - for example from brain imaging (or computational modelling, see Reviewer 3) - could further support our interpretation, and we highlight this in the Limitations and Future directions section.

      Reviewer #2 (Public Review):

      In this manuscript, Parrotta et al. tested whether it is possible to modulate pain perception and heart rate by providing false HR acoustic feedback before administering electrical cutaneous shocks. To this end, they performed two experiments. The first experiment tested whether false HR acoustic feedback alters pain perception and the cardiac anticipatory response. The second experiment tested whether the same perceptual and physiological changes are observed when participants are exposed to a non-interoceptive feedback. The main results of the first experiment showed a modulatory effect for faster HR acoustic feedback on pain intensity, unpleasantness, and cardiac anticipatory response compared to a control (acoustic feedback congruent to the participant's actual HR). However, the results of the second experiment also showed an increase in pain ratings for the faster non-interoceptive acoustic feedback compared to the control condition, with no differences in pain unpleasantness or cardiac response.

      The main strengths of the manuscript are the clarity with which it was written, and its solid theoretical and conceptual framework. The researchers make an in-depth review of predictive processing models to account for the complex experience of pain, and how these models are updated by perceptual and active inference. They follow with an account of how pain expectations modulate physiological responses and draw attention to the fact that most previous studies focus on exteroceptive cues. At this point, they make the link between pain experience and heart rate changes, and introduce their own previous work showing that people may illusorily perceive a higher cardiac frequency when expecting painful stimulation, even though anticipating pain typically goes along with a decrease in HR. From here, they hypothesize that false HR acoustic feedback evokes more intense and unpleasant pain perception, although the actual HR actually decreases due to the orienting cardiac response. Furthermore, they also test the hypothesis that an exteroceptive cue will lead to no (or less) changes in those variables. The discussion of their results is also well-rooted in the existing bibliography, and for the most part, provides a credible account of the findings.

      Thank you for the clear and thoughtful review. We appreciate your positive comments on the manuscript’s clarity, theoretical framework, and interpretation of results.

      The main weaknesses of the manuscript lies in a few choices in methodology and data analysis that hinder the interpretation of the results and the conclusions as they stand.

      The first peculiar choice is the convoluted definition of the outcomes. Specifically, pain intensity and unpleasantness are first normalized and then transformed into variation rates (sic) or deltas, which makes the interpretation of the results unnecessarily complicated. This is also linked to the definitions of the smallest effect of interest (SESOI) in terms of these outcomes, which is crucial to determining the sample size and gauging the differences between conditions. However, the choice of SESOI is not properly justified, and strangely, it changes from the first experiment to the second.

      We thank the reviewer for this important observation. In the revised manuscript, we have made substantial changes and clarifications to address both aspects of this concern: (1) the definition of outcome variables and their normalization, and (2) the definition of the SESOI.

      First, as explained in our response to Reviewer #1, we have revised the analyses and removed the difference-based change scores from the main results, addressing concerns about interpretability. However, we retained the normalization procedure: all variables (heart rate, pain intensity, unpleasantness) are normalized relative to the no-feedback baseline using a standard proportional change formula, (X − bX)/bX, where X is the feedback-phase mean and bX is the no-feedback baseline. This is a widely used normalization procedure (e.g., Bartolo et al., 2013; Cecchini et al., 2020). This method controls for interindividual variability by expressing responses relative to each participant’s own baseline. The resulting normalized values are then used directly in all analyses, and not further transformed into deltas.
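
      As a minimal sketch of the proportional change formula described above: the function name `proportional_change` and the example heart-rate values are hypothetical and serve only to illustrate the arithmetic, not to reproduce the actual analysis pipeline.

```python
from statistics import mean


def proportional_change(feedback_values, baseline_values):
    """Return (X - bX) / bX for one participant and one variable.

    X is the mean over the feedback-phase trials and bX is the mean over the
    no-feedback (baseline) trials of the same participant.
    """
    x = mean(feedback_values)
    b_x = mean(baseline_values)
    if b_x == 0:
        raise ValueError("baseline mean is zero; proportional change is undefined")
    return (x - b_x) / b_x


# Hypothetical heart-rate values (beats per minute) for a single participant.
baseline_hr = [70, 72, 68]   # no-feedback phase, mean = 70
feedback_hr = [66, 68, 67]   # one feedback condition, mean = 67

normalized_hr = proportional_change(feedback_hr, baseline_hr)
print(f"normalized heart rate change: {normalized_hr:.4f}")
```

      A negative value indicates a decrease relative to the participant's own baseline, a positive value an increase, which is what makes the normalized scores comparable across participants.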

      To address potential concerns about this baseline correction approach and its interpretability, we also conducted a new set of supplementary analyses (now reported in the supplementary materials) that include the no-feedback condition explicitly in the models, rather than treating it as a baseline for normalization. These models confirm that our main effects are not driven by the choice of normalization and hold even when no-feedback is analyzed as an independent condition. The new analyses and results are now reported in the Supplementary Materials.

      Second, concerning the SESOI values and their justification: The difference in SESOI values between Experiment 1 and Experiment 2 reflects the outcome of sensitivity analyses conducted for each dataset separately, rather than a post-hoc reinterpretation of our results. Specifically, we followed current methodological recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018; Lakens, 2022), which advise against estimating statistical power based on previously published effect sizes, especially when working with novel paradigms or when effect sizes in the literature may be inflated or imprecise. Instead, we used the sensitivity analysis function in G*Power (Version 3.1) to determine the smallest effect size our design was capable of detecting with high statistical power (90%), given the actual sample size, test type, and alpha level used in each experiment. This is a prospective, design-based estimation rather than a post-hoc analysis of observed effects. The slight differences in SESOI are due to more participants falling below our exclusion criteria in Experiment 2, leading to slightly larger effect sizes that can be detected (d = 0.62 vs d = 0.57). Importantly, both experiments remain adequately powered to detect effects of a size commonly reported in the literature on top-down pain modulation. For instance, Iodice et al. (2019) reported effects of approximately d = 0.7, which is well above the minimum detectable thresholds of our designs.

      We have now clarified the logic in the Participant section of Experiment 1 (lines 193-218).
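
      The logic of such a sensitivity analysis can be sketched as follows. This is a rough illustration, not the G*Power computation itself: the function name `min_detectable_d`, the two-sided alpha of .05 and the paired (one-sample) test type are assumptions, and the sample sizes N = 34 and N = 29 are those given in Reviewer #3's summary. Because it uses a normal approximation, the sketch yields values slightly below those of an exact noncentral-t calculation.

```python
import math
from statistics import NormalDist


def min_detectable_d(n, alpha=0.05, power=0.90):
    """Smallest standardized effect detectable with the given n, alpha and power.

    Normal approximation for a two-sided one-sample (or paired) test:
    d = (z_{1-alpha/2} + z_{power}) / sqrt(n).
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / math.sqrt(n)


d_exp1 = min_detectable_d(34)  # Experiment 1, N = 34
d_exp2 = min_detectable_d(29)  # Experiment 2, N = 29
print(f"Experiment 1: d >= {d_exp1:.2f}; Experiment 2: d >= {d_exp2:.2f}")
```

      The smaller sample of Experiment 2 necessarily yields a larger minimum detectable effect, which is the reason for the different SESOI values in the two experiments.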

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562.

      Bartolo, M., Serrao, M., Gamgebeli, Z., Alpaidze, M., Perrotta, A., Padua, L., Pierelli, F., Nappi, G., & Sandrini, G. (2013). Modulation of the human nociceptive flexion reflex by pleasant and unpleasant odors. PAIN®, 154(10), 2054-2059.

      Cecchini, M. P., Riello, M., Sandri, A., Zanini, A., Fiorio, M., & Tinazzi, M. (2020). Smell and taste dissociations in the modulation of tonic pain perception induced by a capsaicin cream application. European Journal of Pain, 24(10), 1946-1955.

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      Furthermore, the researchers propose the comparison of faster vs. slower delta HR acoustic feedback throughout the manuscript when the natural comparison is the incongruent vs. the congruent feedback.

      We very much disagree that the natural comparison is congruent vs incongruent feedback. First, please note that congruency simply refers to whether the heartrate feedback was congruent with (i.e., matched) the participant’s heartrate measurements in the no feedback trials, or whether it was incongruent, and was therefore either faster or slower than this baseline frequency. As such, simply comparing congruent with incongruent feedback could only indicate that pain ratings change when the feedback does not match the real heart rate, irrespective of whether it is faster or slower. Such a test can therefore only reveal potential general effects of surprise or salience, when the feedback heartrate does not match the real one.

      We therefore assume that the reviewer specifically refers to the comparison of congruent vs incongruent faster feedback. However, this is not a good test either, as this comparison is, by necessity, confounded with the factor of surprise described above. In other words, if a difference were found, it would not be clear whether it emerges because, as we assume, faster feedback is represented as an interoceptive signal for threat, or simply because participants are surprised about heartrate feedback that diverges from their real heartrate. Note that even a non-significant result in the analogous comparison of congruent vs incongruent slower feedback would not be able to resolve this confound, as in null hypothesis testing the absence of a significant effect does, by definition, not indicate that there is no effect - only that it could not be detected here.

      Instead, the only possible test of our hypothesis is the one we have designed our experiment around and focussed on with our central t-test: the comparison of incongruent faster with incongruent slower feedback. This keeps any possible effects of surprise/salience from generally altered feedback constant and allows us to test our specific hypothesis: that real heart rates will decrease and pain ratings will increase when receiving false interoceptive feedback about increased compared to decreased heartrates. Note that this test of faster vs slower feedback is also statistically the most appropriate, as it collapses our prediction onto a single and highest-powered hypothesis test: As faster and slower heartrate feedback are assumed to induce effects in the opposite direction, the effect size of their difference is, by definition, double the average effect size of the two separate tests of faster vs congruent feedback and slower vs congruent feedback.
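
      The arithmetic behind the last point can be illustrated with a small sketch. The condition means below are hypothetical, the function name `contrast_summary` is introduced only for illustration, and the identity shown concerns raw mean differences; standardized effect sizes additionally depend on the variability of each contrast.

```python
import math


def contrast_summary(faster, congruent, slower):
    """Compare the direct faster-vs-slower contrast with the two separate contrasts.

    Both separate contrasts are expressed in the predicted direction
    (faster above congruent, congruent above slower).
    """
    faster_vs_congruent = faster - congruent
    congruent_vs_slower = congruent - slower
    faster_vs_slower = faster - slower
    mean_separate = (faster_vs_congruent + congruent_vs_slower) / 2
    return faster_vs_slower, mean_separate


# Hypothetical normalized pain ratings per feedback condition.
direct, mean_separate = contrast_summary(faster=0.12, congruent=0.02, slower=-0.04)
print(f"faster vs slower: {direct:.2f}; mean of separate contrasts: {mean_separate:.2f}")
```

      Because (faster − congruent) + (congruent − slower) = faster − slower, the direct contrast is always twice the mean of the two separate contrasts, whatever the congruent mean happens to be.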

      That being said, we also included comparisons with the congruent condition in our revised analysis, in line with the reviewer’s suggestion and previous studies. These analyses help explore potential asymmetries in the effect of false feedback. While faster feedback (both interoceptive and exteroceptive) significantly modulated pain relative to congruent feedback, the slower feedback did not, consistent with previous literature showing stronger effects for arousal-increasing cues (e.g., Valins, 1966; Iodice et al., 2019). To address this point, in the revised manuscript we have added a paragraph to the Data Analysis section of Experiment 1 (lines 405-437) to make this logic clearer.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      Iodice, P., Porciello, G., Bufalari, I., Barca, L., & Pezzulo, G. (2019). An interoceptive illusion of effort induced by false heart-rate feedback. Proceedings of the National Academy of Sciences, 116(28), 13897-13902.

      This could be influenced by the fact that the faster HR exteroceptive cue in experiment 2 also shows a significant modulatory effect on pain intensity compared to congruent HR feedback, which puts into question the hypothesized differences between interoceptive vs. exteroceptive cues. These results could also be influenced by the specific choice of exteroceptive cue: the researchers imply that the main driver of the effect is the nature of the cue (interoceptive vs. exteroceptive) and not its frequency. However, they attempt to generalize their findings using knocking wood sounds to all possible sounds, but it is possible that some features of these sounds (e.g., auditory roughness or loomingness) could be the drivers behind the observed effects.

      We appreciate this thoughtful comment. We agree that low-level auditory features can potentially introduce confounds in the experimental design, and we acknowledge the importance of distinguishing these factors from the higher-order distinction that is central to our study: whether the sound is perceived as interoceptive (originating from within the body) or exteroceptive (perceived as external). To this end, the knocking sound was chosen not for its specific acoustic profile, but because it lacked bodily relevance, thus allowing us to test whether the same temporal manipulations (faster, congruent, slower) would have different effects depending on whether the cue was interpreted as reflecting an internal bodily state or not. In this context, the exteroceptive cue served as a conceptual contrast rather than an exhaustive control for all auditory dimensions.

      Several aspects of our data make it unlikely that the observed effects are driven by unspecific acoustic characteristics of the sounds used in the exteroceptive and interoceptive experiments (see also our responses to Reviewer 1 and Reviewer 3 who raised similar points).

      First, if the knocking sound had inherent acoustic features that strongly influenced perception or physiological responses, we would expect it to have produced consistent effects across all feedback conditions (Faster, Slower, Congruent), regardless of the interpretive context. This would have manifested as an overall difference between experiments in the between-subjects analyses and in the supplementary mixed-effects models that included Experiment as a fixed factor. Yet, we observed no such main effects in any of our variables. Instead, significant differences emerged only in specific theoretically predicted comparisons (e.g., Faster vs. Slower), and critically, these effects depended on the cue type (interoceptive vs. exteroceptive), suggesting that perceived bodily relevance, rather than a specific acoustic property, was the critical modulator. In other words, any alternative explanation based on acoustic features would need to be able to explain why these acoustic properties would induce not an overall change in heart rate and pain perception (i.e., similarly across slower, faster, and congruent feedback), but a sensitivity to changes in the rate of this feedback: increasing pain ratings and decreasing heartrates for faster relative to slower feedback. We hope you agree that a simple effect of acoustic features would not predict such a sensitivity to the rate with which the sound was played.

      Please refer to our responses to Reviewers 1 and 2 for further aspects of the data that argue strongly against the possibility that other features associated with the sounds (e.g., alertness, arousal) could be responsible for the results, as the data pattern again goes in the opposite direction to that predicted by such accounts (e.g., faster heartrate feedback decreased real heartrates, instead of increasing them, as would be expected if accelerated heartrate feedback increased arousal).

      Finally, to further support this interpretation, we refer to neurophysiological evidence showing that heartbeat sounds are not processed as generic auditory signals, but as internal, bodily relevant cues, especially when believed to reflect one’s own physiological state. For instance, fMRI research (Kleint et al., 2015) shows that heartbeat sounds engage key interoceptive regions such as the anterior insula and frontal operculum more than acoustically matched control tones. EEG data (Vicentin et al., 2024) showed that faster heartbeat sounds produce stronger alpha suppression over frontocentral areas, suggesting enhanced processing in networks associated with interoceptive attention. Moreover, van Elk et al. (2014) found that heartbeat sounds attenuate the auditory N1 response, a neural signature typically linked to self-generated or predicted bodily signals. These findings consistently demonstrate that heartbeat sounds are processed as interoceptive and self-generated signals, which is in line with our rationale that the critical factor at play is whether the sound is semantically perceived as reflecting one’s own bodily state, rather than its physical properties.

      We now explicitly discuss these issues in the revised Discussion section (lines 740-758).

      Kleint, N. I., Wittchen, H. U., & Lueken, U. (2015). Probing the interoceptive network by listening to heartbeats: an fMRI study. PloS one, 10(7), e0133164.

      van Elk, M., Lenggenhager, B., Heydrich, L., & Blanke, O. (2014). Suppression of the auditory N1-component for heartbeat-related sounds reflects interoceptive predictive coding. Biological psychology, 99, 172-182.

      Vicentin, S., Guglielmi, S., Stramucci, G., Bisiacchi, P., & Cainelli, E. (2024). Listen to the beat: behavioral and neurophysiological correlates of slow and fast heartbeat sounds. International Journal of Psychophysiology, 206, 112447.

      Finally, it is noteworthy that the researchers divided the study into two experiments when it would have been optimal to test all the conditions with the same subjects in a randomized order in a single cross-over experiment to reduce between-subject variability. Taking this into consideration, I believe that the conclusions are only partially supported by the evidence. Despite of the outcome transformations, a clear effect of faster HR acoustic feedback can be observed in the first experiment, which is larger than the proposed exteroceptive counterpart. This work could be of broad interest to pain researchers, particularly those working on predictive coding of pain.

      We appreciate the reviewer’s suggestion regarding a within-subject crossover design. While such a design indeed offers increased statistical power by reducing interindividual variability (Charness, Gneezy, & Kuhn, 2012), we intentionally opted for a between-subjects design due to theoretical and methodological considerations specific to studies involving deceptive feedback. Most importantly, carryover effects are a major concern in deception paradigms. Participants exposed first to one type of feedback (e.g., interoceptive) and then to the other (e.g., exteroceptive) would be more likely to develop suspicion or adaptive strategies that would alter their responses. Such expectancy effects could contaminate results in a crossover design, particularly when participants realize that feedback is manipulated. In line with this idea, past studies on false cardiac feedback (e.g., Valins, 1966; Pennebaker & Lightner, 1980) often employed between-subjects or blocked designs to mitigate this risk.

      Pennebaker, J. W., & Lightner, J. M. (1980). Competition of internal and external information in an exercise setting. Journal of personality and social psychology, 39(1), 165.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      Reviewer #3 (Public Review):

      In their manuscript titled "Exposure to false cardiac feedback alters pain perception and anticipatory cardiac frequency", Parrotta and colleagues describe an experimental study on the interplay between false heart rate feedback and pain experience in healthy, adult humans. The experimental design is derived from Bayesian perspectives on interoceptive inference. In Experiment 1 (N=34), participants rated the intensity and unpleasantness of an electrical pulse presented to their middle fingers. Participants received auditory cardiac feedback prior to the electrical pulse. This feedback was congruent with the participant's heart rate or manipulated to have a higher or lower frequency than the participant's true heart rate (incongruent high/ low feedback). The authors find heightened ratings of pain intensity and unpleasantness as well as a decreased heart rate in participants who were exposed to the incongruent-high cardiac feedback. Experiment 2 (N=29) is equivalent to Experiment 1 with the exception that non-interoceptive auditory feedback was presented. Here, mean pain intensity and unpleasantness ratings were unaffected by feedback frequency.

      Strengths:

      The authors present interesting experimental data that was derived from modern theoretical accounts of interoceptive inference and pain processing.

      (1) The motivation for the study is well-explained and rooted within the current literature, whereby pain is the result of a multimodal, inferential process. The separation of nociceptive stimulation and pain experience is explained clearly and stringently throughout the text.

      (2) The idea of manipulating pain-related expectations via an internal, instead of an external cue, is very innovative.

      (3) An appropriate control experiment was implemented, where an external (non-physiological) auditory cue with parallel frequency to the cardiac cue was presented.

      (4) The chosen statistical methods are appropriate, albeit averaging may limit the opportunity for mechanistic insight, see weaknesses section.

      (5) The behavioral data, showing increased unpleasantness and intensity ratings after exposure to incongruent-high cardiac feedback, but not exteroceptive high-frequency auditory feedback, is backed up by ECG data. Here, the decrease in heart rate during the incongruent-high condition speaks towards a specific, expectation-induced physiological effect that can be seen as resulting from interoceptive inference.

      We thank the reviewer for their positive feedback. We are glad that the study’s theoretical foundation, innovative design, appropriate control conditions, and convergence of behavioral and physiological data were well received.

      Weaknesses:

      Additional analyses and/ or more extensive discussion are needed to address these limitations:

      (1) I would like to know more about potential learning effects during the study. Is there a significant change in ∆ intensity and ∆ unpleasantness over time; e.g. in early trials compared to later trials? It would be helpful to exclude the alternative explanation that over time, participants learned to interpret the exteroceptive cue more in line with the cardiac cue, and the effect is driven by a lack of learning about the slightly less familiar cue (the exteroceptive cue) in early trials. In other words, the heartbeat-like auditory feedback might be "overlearned", compared to the less naturalistic tone, and more exposure to the less naturalistic cue might rule out any differences between them w.r.t. pain unpleasantness ratings.

      We thank the reviewer for raising this important point. Please note that the repetitions in our task were relatively limited (6 trials per condition), which limits the potential influence of such differential learning effects between experiments. To address this concern, we performed an additional analysis, reported in the Supplementary Materials, using a Linear Mixed-Effects Model approach. This method allowed us to include "Trial" (the rank order of each trial) as a variable to account for potential time-on-task effects such as learning, adaptation, or fatigue (e.g., Möckel et al., 2015). All feedback conditions (no-feedback, congruent, faster, slower) and all stimulus intensity levels were included.

      Specifically, we tested the following models:

      Likert Pain Unpleasantness Ratings ~ Experiment × Feedback × StimInt × Trial + (StimInt + Trial | Subject)

      Numeric Pain Scale of Intensity Ratings ~ Experiment × Feedback × StimInt × Trial + (StimInt + Trial | Subject)
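For readers less familiar with this notation, the two model formulas above can be written out as follows. This is a sketch that assumes the formulas follow lme4-style syntax (the response does not name the software), in which `(StimInt + Trial | Subject)` denotes a by-subject random intercept plus correlated by-subject random slopes for stimulus intensity and trial order. Here $y_{ij}$ is the rating of subject $j$ on trial $i$, and $\mathbf{x}_{ij}$ collects the main effects and all interactions of Experiment, Feedback, StimInt and Trial; the symbols are introduced for illustration only.

```latex
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta}
        + u_{0j} + u_{1j}\,\mathrm{StimInt}_{ij} + u_{2j}\,\mathrm{Trial}_{ij}
        + \varepsilon_{ij},\\
(u_{0j},\, u_{1j},\, u_{2j})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

Under this reading, the learning-related question raised by the reviewer corresponds to the coefficients in $\boldsymbol{\beta}$ for the Trial × Experiment and Trial × Feedback × Experiment terms.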

      In both models, no significant interactions involving Trial × Experiment or Trial × Feedback × Experiment were found. Instead, we found generally larger effects in early trials compared to later ones (main effect of Trial within each Experiment), similar to other cognitive illusions where repeated exposure diminishes effects. Thus, although some unspecific changes over time may have occurred (e.g., due to general task exposure), these changes did not differ systematically across experimental conditions (interoceptive vs. exteroceptive) or feedback types. However, we are fully aware that the absence of significant higher-order interactions does not conclusively rule out the possibility of learning-related effects. It is possible that our models lacked the statistical power to detect more subtle or complex time-dependent modulations, particularly if such effects differ in magnitude or direction across feedback conditions.

      We report the full description of these analyses and results in the Supplementary materials 1. Cross-experiment analysis (between-subjects model).

      (2) The origin of the difference in Cohen's d (Exp. 1: .57, Exp. 2: .62) and subsequently sample size in the sensitivity analyses remains unclear; it would be helpful to clarify where these values are coming from (are they related to the effects reported in the results? If so, they should be marked as post-hoc analyses).

      Following recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018), we do not report theoretical power based on previously reported effect sizes, as this neglects uncertainty around effect size measurements, especially for new effects for which no reliable expected effect size estimates can be derived across the literature. Instead, the power analysis is based on a sensitivity analysis, conducted in G*Power (Version 3.1). Importantly, these are not post-hoc analyses, as they are not based on observed effect sizes in our study, but derived a priori. Sensitivity analyses estimate effect sizes that our design is well-powered (90%) to detect (i.e. given target power, sample size, type of test), for the crucial comparison between faster and slower feedback in both experiments (Lakens, 2022). Following recommendations, we also report the smallest effect size this test can in principle detect in our study (SESOI, Lakens, 2022). This yields effect sizes of d = .57 in Experiment 1 and d = .62 in Experiment 2 at 90% power and SESOIs of d = .34 and .37, respectively. Note that values are slightly higher in Experiment 2, as more participants were excluded based on our exclusion criteria. Importantly, detectable effect sizes in both experiments are smaller than reported effect sizes for comparable top-down effects on pain measurements of d = .7 (Iodice et al., 2019). We have now added more information to the power analysis sections to make this clearer (lines 208-217).

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562.

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.
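The logic of the sensitivity analysis in the response to point (2) can be illustrated with a rough sketch. It is not a reproduction of the G*Power computation: it assumes a two-tailed paired-samples test at alpha = .05, takes the sample sizes from the reviewer's summary (N = 34 and N = 29), and replaces the noncentral t distribution with a normal approximation. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detectable_effect_size(n, alpha=0.05, power=0.90):
    """Cohen's d detectable with the requested power (paired, two-tailed).

    Normal approximation: d = (z_{1 - alpha/2} + z_{power}) / sqrt(n).
    G*Power uses the noncentral t distribution, so its values differ slightly.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)


def smallest_significant_effect(n, alpha=0.05):
    """Smallest observed d that would reach significance (normal approximation)."""
    return NormalDist().inv_cdf(1 - alpha / 2) / sqrt(n)


# Sample sizes are an assumption taken from the reviewer's summary of the study.
for label, n in [("Experiment 1", 34), ("Experiment 2", 29)]:
    d_power = detectable_effect_size(n)
    d_min = smallest_significant_effect(n)
    print(f"{label}: N={n}, d at 90% power ~ {d_power:.2f}, "
          f"smallest significant d ~ {d_min:.2f}")
```

Because of the normal approximation, the printed values fall slightly below the reported d = .57 and d = .62. The sketch also shows why the smaller sample in Experiment 2 yields a larger detectable effect: both quantities scale with 1/sqrt(N) and do not depend on any observed effect size.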

      (3) As an alternative explanation, it is conceivable that the cardiac cue may have just increased unspecific arousal or attention to a larger extent than the exteroceptive cue. It would be helpful to discuss the role of these rather unspecific mechanisms, and how it may have differed between experiments.

      We thank the reviewer for raising this important point. We agree that, in principle, unspecific mechanisms such as increased arousal or attention driven by cardiac feedback could be an alternative explanation for the observed effects. However, several aspects of our data indicate that this is unlikely:

      (1) No main effect of Experiment on pain ratings:

      If the cardiac feedback had simply increased arousal or attention in a general (non-specific) way, we would expect a main effect of Experiment (i.e., interoceptive vs exteroceptive condition) on pain intensity or unpleasantness ratings, regardless of feedback frequency. However, such a main effect was never observed when we compared between experiments (see between-experiment t-tests in results, and in supplementary analyses). Instead, effects were specific to the manipulation of feedback frequency.

      (2) Heart rate as an arousal measure:

      Heart rate (HR) is a classical physiological index of arousal. If there had been an unspecific increase in arousal in the interoceptive condition, we would expect a main effect of Experiment on HR. However, no such main effect was found. Instead, our HR analyses revealed a significant interaction between feedback and experiment, suggesting that HR changes depended specifically on the feedback manipulation rather than reflecting a general arousal increase.

      (3) Arousal predicts faster, not slower, heart rates:

      In Experiment 1, faster interoceptive cardiac feedback led to a slowdown in heart rates both when compared to slower feedback and to congruent cardiac feedback. This is in line with the predicted compensatory response to faster heart rates. In contrast, if faster feedback had merely increased general arousal, heart rates should have increased instead of decreased, as indicated by several prior studies (Tousignant-Laflamme et al., 2005; Terkelsen et al., 2005; for a review, see Forte et al., 2022). This is the opposite of the pattern found in Experiment 1.

      Taken together, these findings indicate that the effects observed are unlikely to be driven by unspecific arousal or attention mechanisms, but rather are consistent with feedback-specific modulations, in line with our interoceptive inference framework.

      We have now integrated these considerations in the revised discussion (lines 796-830), and added the relevant between-experiment comparisons to the Results of Experiment 2 and the supplementary analysis.

      Terkelsen, A. J., Mølgaard, H., Hansen, J., Andersen, O. K., & Jensen, T. S. (2005). Acute pain increases heart rate: differential mechanisms during rest and mental stress. Autonomic Neuroscience, 121(1-2), 101-109.

      Tousignant-Laflamme, Y., Rainville, P., & Marchand, S. (2005). Establishing a link between heart rate and pain in healthy subjects: a gender effect. The journal of pain, 6(6), 341-347.

      Forte, G., Troisi, G., Pazzaglia, M., De Pascalis, V., & Casagrande, M. (2022). Heart rate variability and pain: a systematic review. Brain sciences, 12(2), 153.

      (4) The hypothesis (increased pain intensity with incongruent-high cardiac feedback) should be motivated by some additional literature.

      We thank the reviewer for this helpful suggestion. Please note that the current phenomenon was tested in this experiment for the first time. Therefore, there is no specific prior study that motivated our hypotheses; they were driven theoretically, and derived from our model of interoceptive integration of pain and cardiac perception. The idea that accelerated cardiac feedback (relative to decelerated feedback) will increase pain perception and reduce heart rates is grounded in embodied predictive coding frameworks. Accordingly, expectations and signals from different sensory modalities (sensory, proprioceptive, interoceptive) are integrated both to efficiently infer crucial homeostatic and physiological variables, such as hunger, thirst, and, in this case, pain, and to regulate the body’s own autonomic responses based on these inferences.

      Within this framework, the concept of an interoceptive schema (Tschantz et al., 2022; Iodice et al., 2019; Parrotta et al., 2024; Schoeller et al., 2022) offers the basis for understanding interoceptive illusions, wherein inferred levels of interoceptive states (i.e., pain) deviate from the actual physiological state. Cardiac signals conveyed by the feedback manipulation act as a misleading prior, shaping the internal generative model of pain. Specifically, an increased heart rate may signal a state of threat, establishing a prior expectation of heightened pain. Building on predictive models of interoception, we predict that this cardiac prior is integrated with interoceptive (i.e., actual nociceptive signal) and exteroceptive inputs (i.e., auditory feedback input), leading to a subjective experience of increased pain even when there is no corresponding increase in the nociceptive input.

      This idea is not completely new, but it is based on our previous findings of an interoceptive cardiac illusion driven by misleading priors about anticipated threat (i.e., pain). Specifically, in Parrotta et al. (2024), we tested whether a common false belief that heart rate increases in response to threat leads to an illusory perception of accelerated cardiac activity when anticipating pain. In two experiments, we asked participants to monitor and report their heartbeat while their ECG was recorded. Participants performed these tasks while visual cues reliably predicted a forthcoming harmless (low-intensity) vs. threatening (high-intensity) cutaneous electrical stimulus. We showed that anticipating a painful vs. harmless stimulus causes participants to report an increased cardiac frequency, which does not reflect their real cardiac response, but the common (false) belief that heart rates would accelerate under threat, reflecting the hypothesised integration of prior expectations and interoceptive inputs when estimating cardiac activity.

      Here we tested the counterpart of such a cardiac illusion. We reasoned that if cardiac interoception is shaped by expectations about pain, then the inverse should also be true: manipulating beliefs about cardiac activity (via cardiac feedback) in the context of pain anticipation should influence the perception of pain. Specifically, we hypothesized that presenting accelerated cardiac feedback would act as a misleading prior, leading to an illusory increase in pain experience, even in the absence of an actual change in nociceptive input.

      Moreover, in addition to the references already provided in the last version of the manuscript, there is ample prior research that provides more general support for such relationships. Specifically, studies have shown that providing mismatched cardiac feedback in contexts where cardiovascular changes are typically expected (i.e. sexual arousal, Rupp & Wallen, 2008; Valins, 1966; physical exercise, Iodice et al., 2019) can enhance the perception of interoceptive states associated with those experiences. Furthermore, findings that false cardiac feedback can influence emotional experience suggest that it is the conscious perception of physiological arousal, combined with the cognitive interpretation of the stimulus, that plays a key role in shaping emotional responses (Crucian et al., 2000).

      This point is now addressed in the revised Introduction, wherein additional references have been integrated (lines 157-170).

      Crucian, G. P., Hughes, J. D., Barrett, A. M., Williamson, D. J. G., Bauer, R. M., Bowers, D., & Heilman, K. M. (2000). Emotional and physiological responses to false feedback. Cortex, 36(5), 623-647.

      Iodice, P., Porciello, G., Bufalari, I., Barca, L., & Pezzulo, G. (2019). An interoceptive illusion of effort induced by false heart-rate feedback. Proceedings of the National Academy of Sciences, 116(28), 13897-13902.

      Parrotta, E., Bach, P., Perrucci, M. G., Costantini, M., & Ferri, F. (2024). Heart is deceitful above all things: Threat expectancy induces the illusory perception of increased heartrate. Cognition, 245, 105719.

      Rupp, H. A., & Wallen, K. (2008). Sex differences in response to visual sexual stimuli: A review. Archives of sexual behavior, 37(2), 206-218.

      Schoeller, F., Horowitz, A., Maes, P., Jain, A., Reggente, N., Moore, L. C., Trousselard, M., Klein, A., Barca, L., & Pezzulo, G. (2022). Interoceptive technologies for clinical neuroscience.

      Tschantz, A., Barca, L., Maisto, D., Buckley, C. L., Seth, A. K., & Pezzulo, G. (2022). Simulating homeostatic, allostatic and goal-directed forms of interoceptive control using active inference. Biological Psychology, 169, 108266.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      (5) The discussion section does not address the study's limitations in a sufficient manner. For example, I would expect a more thorough discussion on the lack of correlation between participant ratings and self-reported bodily awareness and reactivity, as assessed with the BPQ.

      We thank the reviewer for this valuable observation. In response, we have revised the Discussion section to explicitly acknowledge and elaborate on the lack of significant correlations between participants’ pain ratings and their self-reported bodily awareness and reactivity as assessed with the BPQ.

      We now clarify that the inclusion of this questionnaire was exploratory. While it would be theoretically interesting to observe a relationship between subjective pain modulation and individual differences in interoceptive awareness, detecting robust correlations between within-subject experimental effects and between-subjects trait measures such as the BPQ typically requires much larger sample sizes (often exceeding N = 200) due to the inherently low reliability of such cross-level associations (see Hedge, Powell & Sumner, 2018; the “reliability paradox”). As such, the absence of a significant correlation in our study does not undermine the conclusions we draw from our main findings. Future studies with larger samples will be needed to systematically address this question. We now acknowledge this point explicitly in the revised manuscript (lines 501-504; 832-851).

      Hedge, C., Powell, G., & Sumner, P. (2018). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 50(3), 1166-1186. https://doi.org/10.3758/s13428-017-0935-1

      (a) Some short, additional information on why the authors chose to focus on body awareness and supradiaphragmatic reactivity subscales would be helpful.

      We chose to focus on the body awareness and supradiaphragmatic reactivity subscales because these aspects are closely tied to emotional and physiological processing, particularly in the context of interoception. Body awareness plays a critical role in how individuals perceive and interpret bodily signals, which in turn affects emotional regulation and self-awareness. Supradiaphragmatic reactivity refers specifically to the reactivity of organs located above the diaphragm (i.e., the muscle that separates the chest cavity from the abdomen), including the heart, in contrast to the subdiaphragmatic reactivity subscale, which concerns organs below it. Our decision to include these subscales is further motivated by recent research, including the work by Petzschner et al. (2021), which demonstrates that the focus of attention can modulate the heartbeat-evoked potential (HEP), and that this modulation is predicted by participants’ responses on the supradiaphragmatic reactivity subscale. Thus, this subscale and the more general body awareness scale allow us to explore the interplay between bodily awareness, physiological reactivity, and emotional processing in our study. We now clarify this point in the revised version of the Methods - Body Perception Questionnaire (lines 384-393).

      (6) The analyses presented in this version of the manuscript allow only limited mechanistic conclusions - a computational model of participants' behavior would be a very strong addition to the paper. While this may be out of the scope of the article, it would be helpful for the reader to discuss the limitations of the presented analyses and outline avenues towards a more mechanistic understanding and analysis of the data. The computational model in [7] might contain some starting ideas.

      Thank you for your valuable feedback. We agree that a computational model would enhance the mechanistic understanding of our findings. While this is beyond the current scope, we now discuss the limitations of our analysis in the Limitations and Future directions section (lines 852-863). Specifically, we acknowledge that future studies could use computational models to better understand the interactions between physiological, cognitive, and perceptual factors.

      Some additional topics were not considered in the first version of the manuscript:

      (1) The possible advantages of a computational model of task behavior should be discussed.

      We agree that a computational model of task behavior could provide several advantages. By formalizing principles of predictive processing and active inference, such a model could generate quantitative predictions about how heart rate (HR) and feedback interact, providing a more precise understanding of their respective contributions to pain modulation. However, this is a first demonstration of a theoretically predicted phenomenon, and computationally modelling it is currently outside the scope of the article. We would be excited to explore this in the future. We have added a brief discussion of these potential advantages in the revised manuscript and suggest that future work could integrate computational modelling to further deepen our understanding of these processes (lines 852-890).

      (2) Across both experiments, there was a slightly larger number of female participants. Research suggests significant sex-related differences in pain processing [1,2]. It would be interesting to see what role this may have played in this data.

      Thank you for your insightful comment. While we acknowledge that sex-related differences in pain processing are well-documented in the literature, we do not have enough participants in our sample to test this in a well-powered way. As such, exploring the role of sex differences in pain perception will need to be addressed in future studies with more balanced samples. It would be interesting if more sensitive individuals, with a more precise representation of pain, also show smaller effects on pain perception. We have noted this point in the revised manuscript (lines 845-851) and suggest that future research could specifically investigate how sex differences might influence the modulation of pain and physiological responses in similar experimental contexts.

      (3) There are a few very relevant papers that come to mind which may be of interest. These sources might be particularly useful when discussing the roadmap towards a mechanistic understanding of the inferential processes underlying the task responses [3,4] and their clinical implications.

      Thank you for highlighting these relevant papers. We appreciate your suggestion and have now cited them in the Limitations and Future directions paragraph (lines 852-863).

      (4) In this version of the paper, we only see plots that illustrate ∆ scores, averaged across pain intensities - to better understand participant responses and the relationship with stimulus intensity, it would be helpful to see a more descriptive plot of task behavior (e.g. stimulus intensity and raw pain ratings)

      To directly address the reviewer’s request, we now provide additional descriptive plots in the supplementary material of the revised manuscript, showing raw pain ratings across different stimulus intensities and feedback conditions. These plots offer a clearer view of participant behavior without averaging across pain levels, helping to better illustrate the relationship between stimulus intensity and reported pain.

      Mogil, J. S. (2020). Qualitative sex differences in pain processing: emerging evidence of a biased literature. Nature Reviews Neuroscience, 21(7), 353-365. https://www.nature.com/articles/s41583-020-0310-6

      Sorge, R. E., & Strath, L. J. (2018). Sex differences in pain responses. Current Opinion in Physiology, 6, 75-81. https://www.sciencedirect.com/science/article/abs/pii/S2468867318300786?via%3Dihub

      Unal, O., Eren, O. C., Alkan, G., Petzschner, F. H., Yao, Y., & Stephan, K. E. (2021). Inference on homeostatic belief precision. Biological Psychology, 165, 108190.

      Allen, M., Levy, A., Parr, T., & Friston, K. J. (2022). In the body's eye: the computational anatomy of interoceptive inference. PLoS Computational Biology, 18(9), e1010490.

      Stephan, K. E., Manjaly, Z. M., Mathys, C. D., Weber, L. A., Paliwal, S., Gard, T., ... & Petzschner, F. H. (2016). Allostatic self-efficacy: A metacognitive theory of dyshomeostasis-induced fatigue and depression. Frontiers in human neuroscience, 10, 550.

      Friston, K. J., Stephan, K. E., Montague, R., & Dolan, R. J. (2014). Computational psychiatry: the brain as a phantastic organ. The Lancet Psychiatry, 1(2), 148-158.

      Eckert, A. L., Pabst, K., & Endres, D. M. (2022). A Bayesian model for chronic pain. Frontiers in Pain Research, 3, 966034.

      We thank the reviewer for highlighting these relevant references which have now been integrated in the revised version of the manuscript.

      Recommendations For The Authors: 

      Reviewer #1 (Recommendations For The Authors):

      At the time I was reviewing this paper, I could not think of a detailed experiment that would answer my biggest concern: Is this a manipulation of the brain's interoceptive data integration, or rather a manipulation of participants' alertness which indirectly influences their pain prediction?

      One incomplete idea that came to mind was delivering this signal in a more "covert" manner (though I am not sure it will suffice), or perhaps correlating the effect size of a participant with their interoceptive abilities, as measured in a different task or through a questionnaire. Another potential idea is to tell participants that this is someone else's HR that they hear and see if that changes the results (though this requires further thought). I leave it to the authors to think further, and perhaps this is to be answered in a different paper - but if so, I am sorry to say that I do not think the claims can remain as they are now, and the paper will need a revision of its arguments, unfortunately. I urge the authors to ask further questions if my point about the concern was not made clear enough for them to address or contemplate it.

      We thank the reviewer for raising this important point. As detailed in our previous response, this point invites an important clarification regarding the role of cardiac deceleration in threat processing. Rather than serving as an interoceptive input from which the brain infers the likelihood of a forthcoming aversive event, heart rate deceleration is better described as an output of an already ongoing predictive process, as it reflects an allostatic adjustment of the bodily state aimed at minimizing the impact of the predicted perturbation (e.g., pain) and preventing sympathetic overshoot. It would be maladaptive for the brain to use a decelerating heart rate as evidence of impending threat, since this would paradoxically trigger further parasympathetic activation, initiating a potentially destabilizing feedback loop. Conversely, increased heart rate represents an evolutionarily conserved cue for arousal, threat, and pain. Our results therefore align with the idea that the brain treats externally manipulated increases in cardiac signals as congruent with anticipated sympathetic activation, prompting a compensatory autonomic and perceptual response consistent with embodied predictive processing frameworks (e.g., Barrett & Simmons, 2015; Seth, 2013).

      We would also like to reiterate that our results cannot be explained by general differences induced by the heartbeat sounds relative to the exteroceptive sounds (see also our detailed comments to your point above, and our response to a similar point from Reviewer 3), for three main reasons.

      (1) No main effect of Experiment on pain ratings:

      If the cardiac feedback had simply increased arousal or attention in a general (non-specific) way, we would expect a main effect of Experiment (i.e., interoceptive vs exteroceptive condition) on pain intensity or unpleasantness ratings, regardless of feedback frequency. However, such a main effect was never observed. Instead, effects were specific to the manipulation of feedback frequency.

      (2) Heart rate as an arousal measure:

      Heart rate (HR) is a classical physiological index of arousal. If there had been an unspecific increase in arousal in the interoceptive condition, we would expect a main effect of Experiment on HR. However, no such main effect was found. Instead, our HR analyses revealed a significant interaction between feedback and experiment, suggesting that HR changes depended specifically on the feedback manipulation rather than reflecting a general arousal increase.

      (3) Arousal predicts faster, not slower, heart rates:

      In Experiment 1, faster interoceptive cardiac feedback led to a slowdown in heart rates both when compared to slower feedback and to congruent cardiac feedback. This is in line with the predicted compensatory response to faster heart rates. In contrast, if faster feedback had merely increased general arousal, heart rates should have increased instead of decreased, as indicated by several prior studies (for a review, see Forte et al., 2022). This is the opposite of the pattern found in Experiment 1.

      Taken together, these findings indicate that the effects observed are unlikely to be driven by unspecific arousal or attention mechanisms, but rather are consistent with feedback-specific modulations, in line with our interoceptive inference framework. We now integrate these considerations in the general discussion (lines 796-830).

      Barrett, L. F., & Simmons, W. K. (2015). Interoceptive predictions in the brain. Nature reviews neuroscience, 16(7), 419-429.

      Forte, G., Troisi, G., Pazzaglia, M., Pascalis, V. D., & Casagrande, M. (2022). Heart rate variability and pain: a systematic review. Brain sciences, 12(2), 153.

      Seth, A. K. (2013). Interoceptive inference, emotion, and the embodied self. Trends in Cognitive Sciences, 17(11), 565-573.

      Additional recommendations:

      Major (in order of importance):

      (1) Number of trials per participant, per condition: as I mentioned, having only 6 trials for each condition is very little. The minimum requirement to accept so few trials would be to show data about the distribution of participants' responses to these trials, both per pain intensity (which was later averaged across - another issue discussed later), and across pain intensities, and see that it allows averaging across and that it is not incredibly variable such that the mean is unreliable.

      We appreciate the reviewer’s concern regarding the limited number of trials per condition. This choice was driven by both theoretical and methodological considerations.

      First, as is common in body illusion paradigms (e.g., the Rubber Hand Illusion, Botvinick & Cohen, 1998; the Full Body Illusion, Ehrsson, 2007; the Cardio-visual full body illusion, Pratviel et al., 2022) only a few trials are typically employed due to the immediate effects these manipulations elicit. Repetition can reduce the strength of the illusion through habituation, increased awareness, or loss of believability.

      Second, the experiment was already quite long (1.5h to 2h per participant) and cognitively demanding. It would not have been feasible to expand it further without compromising data quality due to fatigue, attentional decline, or participant disengagement.

      Third, the need for a large number of trials is more relevant when using implicit measures such as response times or physiological indices, which are typically indirectly related to the psychological constructs of interest. In contrast, explicit ratings are often more sensitive and less noisy, and thus require fewer repetitions to yield reliable effects (e.g., Corneille et al., 2024).

      Importantly, we also addressed your concern analytically. We therefore ran linear mixed-effects model analyses across all dependent variables (see Supplementary Materials), with Trial (i.e., the rank order of each trial) included as a predictor to account for potential time-on-task effects such as learning, adaptation, or fatigue (e.g., Möckel et al., 2015). These models captured trial-by-trial variability and allowed us to test for systematic changes in heart rate (HR) and pain ratings, including interactions with feedback conditions (e.g., Kliegl et al., 2011; Baayen & Milin, 2010; Ambrosini et al., 2023). The consistent effects of Trial suggest that repetition dampens the illusion, reinforcing our decision to limit the number of exposures.

      In the interoceptive experiment, these analyses revealed a significant Feedback × Trial interaction (F(3, 711.19) = 6.16, p < .001), indicating that the effect of feedback on HR was not constant over time. As we suspected, and in line with other illusion-like effects, the difference between Faster and Slower feedback, which was significant early on (estimate = 1.68 bpm, p = .0007), decreased by mid-session (estimate = 0.69 bpm, p = .0048), and was no longer significant in later trials (estimate = 0.30 bpm, p = .4775). At the end of the session, HR values in the Faster and Slower conditions even numerically converged (Faster: M = 74.4, Slower: M = 74.1), and the non-significant contrast confirms that the difference had effectively vanished (for further details about slope estimation, see Supplementary material).

      The same pattern emerged for pain-unpleasantness ratings. A significant Feedback × Trial interaction (F(3, 675.33) = 3.44, p = .0165) revealed that the difference between Faster and Slower feedback was strongest at the beginning of the session and progressively weakened. Specifically, Faster feedback produced higher unpleasantness than Slower in early trials (estimate = -0.28, p = .0058) and mid-session (estimate = -0.19, p = .0001), but this contrast was no longer significant in the final trials, wherein all the differences between active feedback conditions vanished (all ps > .55).

      Finally, similar results were obtained for pain intensity ratings. A significant Feedback × Trial interaction (F(3, 669.15) = 9.86, p < .001) showed that the Faster vs Slower difference was greatest at the start of the session and progressively vanished over trials. In early trials Faster feedback exceeded Slower (estimate = -8.33, p = .0001); by mid-session this gap had shrunk to 4.48 points (p < .0001); and in the final trials it was no longer significant (all ps > .94).

      Taken together, our results show that the illusion induced by Faster relative to Slower feedback fades with repetition; adding further trials would likely have masked this key effect, which supports the methodological choice to restrict each condition to fewer exposures. To conclude, given that this is the first study to investigate an illusion of pain using heartbeat-based manipulation, we intentionally limited repeated exposures to preserve the integrity of the illusion. The use of mixed models as complementary analyses strengthens the reliability of our conclusions within these necessary design constraints. We now clarify this point in the Procedure paragraph (lines 328-335).

      Ambrosini, E., Peressotti, F., Gennari, M., Benavides-Varela, S., & Montefinese, M. (2023). Aging-related effects on the controlled retrieval of semantic information. Psychology and Aging, 38(3), 219.

      Baayen, R. H., & Milin, P. (2010). Analyzing reaction times. International Journal of Psychological Research, 3(2), 12-28.

      Botvinick, M., & Cohen, J. (1998). Rubber hands ‘feel’ touch that eyes see. Nature, 391(6669), 756-756.

      Corneille, O., & Gawronski, B. (2024). Self-reports are better measurement instruments than implicit measures. Nature Reviews Psychology, 3(12), 835–846.

      Ehrsson, H. H. (2007). The experimental induction of out-of-body experiences. Science, 317(5841), 1048-1048.

      Kliegl, R., Wei, P., Dambacher, M., Yan, M., & Zhou, X. (2011). Experimental effects and individual differences in linear mixed models: Estimating the relation of spatial, object, and attraction effects in visual attention. Frontiers in Psychology, 1, 238. https://doi.org/10.3389/fpsyg.2010.00238

      Möckel, T., Beste, C., & Wascher, E. (2015). The effects of time on task in response selection-an ERP study of mental fatigue. Scientific reports, 5(1), 10113.

      Pratviel, Y., Bouni, A., Deschodt-Arsac, V., Larrue, F., & Arsac, L. M. (2022). Avatar embodiment in VR: Are there individual susceptibilities to visuo-tactile or cardio-visual stimulations?. Frontiers in Virtual Reality, 3, 954808.

      (2) Using different pain intensities: what was the purpose of training participants on correctly identifying pain intensities? You state that the aim of having 5 intensities is to cause ambiguity. What is the purpose of making sure participants accurately identify the intensities? Also, why then only 3 intensities were used in the test phase? The rationale for these is lacking.

      We thank the reviewer for raising these important points regarding the use of different pain intensities. The purpose of using five levels during the calibration and training phases was to introduce variability and increase ambiguity in the participants’ sensory experience. This variability aimed to reduce predictability and prevent participants from forming fixed expectations about stimulus intensity, thereby enhancing the plausibility of the illusion. It also helped prevent habituation to a single intensity and made the manipulation subtler and more credible. We had no specific theoretical hypotheses about this manipulation. Regarding the accuracy training, although the paradigm introduced ambiguity, it was important to ensure that participants developed a stable and consistent internal representation of the pain scale. This step was essential to control for individual differences in sensory discrimination and to ensure that illusion effects were not confounded by participants’ inability to reliably distinguish between intensities.

      As for the use of only three pain intensities in the test phase, the rationale was to focus on a manageable subset that still covered a meaningful range of the stimulus spectrum. This approach followed the same logic as Iodice et al. (2019, PNAS), who used five (rather than all seven) intensity levels during their experimental session. Specifically, they excluded the extreme levels (45 W and 125 W) used during baseline, to avoid floor and ceiling effects and to ensure that each test intensity could be paired with both a “slower” and a “faster” feedback from an adjacent level. This would not have been possible at the extremes of the intensity range, where no adjacent level exists in one direction. We adopted the same strategy to preserve the internal consistency and plausibility of our feedback manipulation.

      We further clarified these points in the revised manuscript (lines 336-342).

      Iodice, P., Porciello, G., Bufalari, I., Barca, L., & Pezzulo, G. (2019). An interoceptive illusion of effort induced by false heart-rate feedback. Proceedings of the National Academy of Sciences, 116(28), 13897-13902.

      (3) Averaging across pain intensities: this is, in my opinion, not the best approach as by matching a participant's specific responses to a pain stimulus before and after the manipulation, you can more closely identify changes resulting from the manipulation. Nevertheless, the minimal requirement to do so is to show data of distributions of pain intensities so we know they did not differ between conditions per participant, and in general - as you indicate they were randomly distributed.

      We thank the reviewer for this thoughtful comment. The decision to average across pain intensities in our main analyses was driven by the specific aim of the study: we did not intend to determine at which exact intensity level the illusion was most effective, and the limited number of trials makes such an analysis difficult. Rather, we introduced variability in nociceptive input to increase ambiguity and reduce predictability in the participants’ sensory experience. This variability was critical for enhancing the plausibility of the illusion by preventing participants from forming fixed expectations about stimulus strength. Additionally, using a range of intensities helped to minimize habituation effects and made the feedback manipulation subtler and more credible.

      That said, we appreciate the reviewer’s point that matching specific responses before and after the manipulation at each intensity level could provide further insights into how the illusion operates across varying levels of nociceptive input. We therefore conducted supplementary analyses using linear mixed-effects models in which all three stimulus intensities were included as a continuous fixed factor. This allowed us to examine whether the effects of feedback were intensity-specific or generalized across different levels of stimulation.

      These analyses revealed that, in both the interoceptive and exteroceptive experiments, the effect of feedback on pain ratings was significantly modulated by stimulus intensity, as indicated by a Feedback × Stimulus Intensity interaction (Interoceptive: unpleasantness F(3, 672.32)=3.90, p=.0088; intensity ratings F(3, 667.07)=3.46, p=.016. Exteroceptive: unpleasantness F(3, 569.16)=8.21, p<.0001; intensity ratings F(3, 570.65)=3.00, p=.0301). The interaction term confirmed that the impact of feedback varied with stimulus strength, yet the pattern that emerged in each study diverged markedly.

      In the interoceptive experiment, the accelerated-heartbeat feedback (Faster) systematically heightened pain relative to the decelerated version (Slower) at every level of noxious input: for low-intensity trials Faster exceeded Slower by 0.22 ± 0.08 points on the unpleasantness scale (t = 2.84, p = .0094) and by 3.87 ± 1.69 units on the numeric intensity scale (t = 2.29, p = .0448); at the medium intensity the corresponding differences were 0.19 ± 0.05 (t = -4.02, p = .0001) and 4.52 ± 1.06 (t = 4.28, p < .0001); and even at the highest intensity, Faster still surpassed Slower by 0.17 ± 0.08 on unpleasantness (t = 2.21, p = .0326) and by 5.16 ± 1.67 on intensity (t = 3.09, p = .0032). This uniform Faster > Slower pattern indicates that the interoceptive manipulation amplifies perceived pain in a stimulus-independent fashion.

      The exteroceptive control experiment told a different story: the Faster-Slower contrast reached significance only at the most noxious setting (unpleasantness: estimate = 0.24 ± 0.07, t = -3.24, p = .0019; intensity: estimate = -5.14 ± 1.82, t = 2.83, p = .0072) and was absent at the medium level (intensity, p = .29; unpleasantness, p = .45), while at the lowest level Slower actually produced numerically higher unpleasantness (2.56 versus 2.40) and intensity ratings (44.7 versus 42.2).

      Thus, although both studies show that feedback effects depend on the actual nociceptive level of the stimulus, the results suggest that the faster vs. slower interoceptive feedback manipulation delivers a robust and intensity-invariant enhancement of pain, whereas the exteroceptive cue exerts a sporadic influence that surfaces solely under maximal stimulation.

      These new results are now included in the Supplementary Materials, where we report the detailed analyses for both the Interoceptive and Exteroceptive experiments on the Likert unpleasantness ratings and the numeric pain intensity ratings.

      (4) Sample size: It seems that the sample size was determined after the experiment was conducted, as the required N is identical to the actual N. I would be transparent about that, and say that retrospective sample size analyses support the ability of your sample size to support your claims. In general, a larger sample size than is required is always recommended, and if you were to run another study, I suggest you increase the sample size.

      As also addressed in our responses to your later comments (see our detailed reply regarding the justification of SESOI and power analyses), the power analyses reported here were not post-hoc power analyses based on obtained results. In line with current recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018), we did not base our analyses on previously reported effect sizes, as these can carry considerable uncertainty, particularly for novel effects where robust estimates are lacking. Instead, we used sensitivity analyses, conducted using the sensitivity analysis function in G*Power (Version 3.1). Sensitivity analyses allow us to report effect sizes that our design was adequately powered (90%) to detect, given the actual sample size, desired power level, and the statistical test used in each experiment (Lakens, 2022). Following further guidance (Lakens, 2022), we also report the smallest effect size of interest (SESOI) that these tests could reliably detect.

      This approach indicated that our design was powered to detect effect sizes of d = 0.57 in Experiment 1 and d = 0.62 in Experiment 2, with corresponding SESOIs of d = 0.34 and d = 0.37, respectively. The slightly higher value in Experiment 2 reflects the greater number of participants excluded (from an equal number originally tested) based on pre-specified criteria. Importantly, both experiments were well-powered to detect effects smaller than those typically reported in similar top-down pain modulation studies, where effect sizes around d = 0.7 have been observed (Iodice et al., 2019).

      We have now clarified this rationale in the revised manuscript, Experiment 1- Methods - Participants (lines 208-217).
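
      The logic of a sensitivity analysis can be illustrated with a minimal sketch. It is not the G*Power computation reported above: it assumes a two-sided paired t-test and uses a normal approximation instead of the noncentral t distribution, so its values will be slightly smaller than exact ones. The function name `min_detectable_dz` and the sample sizes are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_dz(n, alpha=0.05, power=0.90):
    """Approximate smallest standardized paired difference (d_z) that a
    two-sided paired t-test with `n` participants detects with `power`.

    Normal approximation: d = (z_{1-alpha/2} + z_{power}) / sqrt(n).
    An exact calculation would use the noncentral t distribution and
    return slightly larger values, especially for small n.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)


# Hypothetical sample sizes (not the sample sizes of the experiments).
for n in (20, 30, 40):
    print(n, round(min_detectable_dz(n), 2))
```

      The sketch makes the key property of a sensitivity analysis explicit: the detectable effect size depends only on the sample size, the alpha level and the desired power, not on the effects that were actually observed.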

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562. https://doi.org/10.1177/0956797617723724

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      (5) Analysis: the use of change scores instead of the actual scores is not recommended, as it is a loss of data, but could have been ignored if it didn't have a significant effect on the analyses conducted. Instead of conducting an RM-ANOVA of conditions (faster, slower, normal heartbeats) across participants, finding significant interaction, and then moving on to specific post-hoc paired comparisons between conditions, the authors begin with the change score but then move on to conduct the said paired comparisons without ever anchoring these analyses in an appropriate larger ANOVA. I strongly recommend the use of an ANOVA but if not, the authors would have to correct for multiple comparisons at the minimum.

      We thank the reviewer for their comment regarding the use of change scores. These were originally derived from the difference between the slower and faster feedback conditions relative to the congruent condition. In line with the reviewer’s recommendation, we have now removed these difference-based change scores from the main analysis. The results remain identical. Please note that we have retained the normalization procedure, relative to each participant’s initial baseline in the no feedback trials, as it is widely used in the interoceptive and pain literature (e.g., Bartolo et al., 2013; Cecchini et al., 2020; Riello et al., 2019). This approach helps to control for interindividual variability and baseline differences by expressing each participant’s response relative to their no-feedback baseline. As before, normalization was applied across all dependent variables (heart rate, pain intensity, and pain unpleasantness).

      To address the reviewer’s concern about statistical validity, we now first report a 1-factor repeated-measures ANOVA (Greenhouse-Geisser corrected) for each dependent variable, with feedback condition (slower, congruent, faster) as the within-subject factor.

      These show in each case a significant main effect, which we then follow with planned paired-sample t-tests comparing:

      Faster vs. slower feedback (our main hypothesis, as this contrast is expected to produce the largest difference and therefore the most powerful test of our hypothesis, see response to Reviewer 3),

      Faster vs. congruent and slower vs. congruent (to test for potential asymmetries, as suggested by previous false heart rate feedback studies).

      The rationale of these analyses is further discussed in the Data Analysis of Experiment 1 (lines 405-437).

      Although we report the omnibus one-factor RM-ANOVAs to satisfy conventional expectations, we note that such tests are not statistically necessary, nor even optimal, when the research question is fully captured by a priori, theory-driven contrasts. Extensive methodological work shows that, in this situation, going straight to planned contrasts maximises power without inflating Type I error and avoids the logical circularity of first testing an effect one does not predict (e.g., Rosenthal & Rosnow, 1985). In other words, an omnibus F is warranted only when one wishes to protect against unspecified patterns of differences. Here our hypotheses were precise (Faster ≠ Slower; potential asymmetry relative to Congruent), so the planned paired comparisons would have sufficed statistically. We therefore include the RM-ANOVAs solely for readers who expect to see them, but our inferential conclusions rest on the theoretically motivated contrasts.

      Rosenthal, R., & Rosnow, R. L. (1985). Contrast analysis. New York: Cambridge.

      (6) Correlations: were there correlations between subjects' own heartbeats (which are considered a predictive cue) and pain perceptions? This is critical to show that the two are in fact related.

      We thank the reviewer for this thoughtful suggestion. We agree that testing for a correlation between anticipatory heart rate responses and subjective pain ratings is theoretically relevant. However, we have not conducted this analysis in the current manuscript, as our study was not designed or powered to reliably detect such individual differences. As noted by Hedge, Powell, and Sumner (2018), robust within-subject experimental designs tend to minimize between-subject variability in order to detect clear experimental effects. This reduction in variance at the between-subject level limits the reliability of correlational analyses involving trait-like or individual response patterns. This issue, known as the reliability paradox, highlights that measures showing robust within-subject effects may not show stable individual differences. Correlations with other individual-level variables (like the subjective ratings used here) therefore require much larger samples to produce interpretable results than were available here (or are commonly used in the literature), typically more than 200 participants. For these reasons, we believe that running such an analysis in our current dataset would not yield informative results and could be misleading.

      We now explicitly acknowledge this point in the revised version of the manuscript (Limitations and future directions, lines 832-851) and suggest that future studies specifically designed to examine individual variability in anticipatory physiological responses and pain perception would be better suited to address this question.

      Hedge, C., Powell, G., & Sumner, P. (2018). The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behavior Research Methods, 50(3), 1166-1186. https://doi.org/10.3758/s13428-017-0935-1

      (7) The direct comparison between studies is great! and finally the use of ANOVA - but why without the appropriate post-hoc tests to support the bold claims in lines 542-544? This is needed. Same for 556-558.

      We apologize if our writing was not clear here, but the result of the ANOVAs fully warrants the claims in 542-544 (now lines 616-618) and 556-558 (now lines 601-603).

      In a 2x2 design, the interaction term is mathematically identical to comparing the difference induced by Factor 1 at one level of Factor 2 with the same difference induced at the other level of Factor 2. In our 2x2 analysis with the factors Experiment (Cardiac feedback, Exteroceptive feedback - between participants) and Feedback Frequency (faster, slower - within participants), the interaction therefore directly tests whether the effect of Feedback Frequency differs statistically (i.e., is larger or smaller) between the participants in the interoceptive and exteroceptive experiments. Thus, the conclusion that “faster feedback affected the perceptual bias more strongly in the Experiment 1 than in Experiment 2” captures the outcome of the significant interaction exactly. Indeed, this test would be statistically equivalent (and would produce identical p values) to a simple between-group t-test comparing each participant’s difference between the faster and slower feedback in the interoceptive group with the analogous differences in the exteroceptive group, as illustrated in standard examples of factorial analysis (see, e.g., Maxwell, Delaney and Kelley, 2017).

      Please note that, for the above reason, mathematically the conclusion of larger effects in one experiment than the other is licensed by the significant interaction even without follow-up t-tests. However, if the reader would like to see these tests, they are simply the main analysis results reported in each of the two experiment sections, where significant (t-test) differences between faster and slower feedback were induced with interoceptive cues (Experiment 1) but not exteroceptive cues (Experiment 2). Reporting them in the between-experiment comparison section again would therefore be redundant.
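
      The equivalence described above can be checked numerically. The following is a minimal sketch on simulated data, not our dataset or analysis script: it assumes equal group sizes to keep the sums of squares simple, and the function names, group labels and simulated values are hypothetical. It computes the interaction F of a 2 (group, between) x 2 (feedback, within) mixed ANOVA and the pooled-variance t statistic on the faster-minus-slower difference scores, and confirms that F equals t squared.

```python
import random
from math import sqrt
from statistics import mean


def interaction_F(group_a, group_b):
    """Interaction F of a 2 (group, between) x 2 (condition, within) mixed ANOVA.

    Each group is a list of (faster, slower) pairs, one pair per participant.
    Equal group sizes are assumed to keep the sums of squares simple.
    """
    assert len(group_a) == len(group_b)
    groups = [group_a, group_b]
    n_total = len(group_a) + len(group_b)
    grand = mean(v for g in groups for pair in g for v in pair)
    level_means = [mean(pair[w] for g in groups for pair in g) for w in (0, 1)]
    ss_inter = ss_error = 0.0
    for g in groups:
        g_mean = mean(v for pair in g for v in pair)
        cell = [mean(pair[w] for pair in g) for w in (0, 1)]
        for w in (0, 1):
            ss_inter += len(g) * (cell[w] - g_mean - level_means[w] + grand) ** 2
        for pair in g:
            subj = mean(pair)
            for w in (0, 1):
                ss_error += (pair[w] - subj - cell[w] + g_mean) ** 2
    # Interaction df = 1, error df = N - 2.
    return ss_inter / (ss_error / (n_total - 2))


def difference_t(group_a, group_b):
    """Pooled-variance independent-samples t on faster-minus-slower differences."""
    da = [f - s for f, s in group_a]
    db = [f - s for f, s in group_b]
    ss = sum((d - mean(da)) ** 2 for d in da) + sum((d - mean(db)) ** 2 for d in db)
    pooled_var = ss / (len(da) + len(db) - 2)
    return (mean(da) - mean(db)) / sqrt(pooled_var * (1 / len(da) + 1 / len(db)))


# Simulated ratings (hypothetical values, for illustration only).
random.seed(1)
intero = [(random.gauss(5, 1), random.gauss(4, 1)) for _ in range(20)]
extero = [(random.gauss(4.5, 1), random.gauss(4.5, 1)) for _ in range(20)]
F = interaction_F(intero, extero)
t = difference_t(intero, extero)
print(F, t ** 2)
```

      Because F with 1 and N - 2 degrees of freedom is the square of t with N - 2 degrees of freedom, the two tests necessarily return the same p value.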

      To avoid this lack of clarity, we have now re-written the Results section of each experiment. First, as noted above, we now precede our main hypothesis test - the crucial t-test comparing heart rate and pain ratings after faster vs slower feedback - with an ANOVA including all three levels (faster, congruent, slower feedback). Moreover, we removed the separate between-experiment comparison section. Instead, in the Results section of the exteroceptive Experiment 2, we now directly compare the (absent or reversed) effects of faster vs slower feedback, using a between-groups t-test, with the effects present in the interoceptive Experiment 1. This shows conclusively, and hopefully more clearly, that the effects in the two experiments differ. We hope that this makes the logic of our analyses clearer.

      Maxwell, S. E., Delaney, H. D., & Kelley, K. (2017). Designing experiments and analyzing data: A model comparison perspective. Routledge.

      (8) The discussion is missing a limitation paragraph.

      Thank you for the suggestion. We have now added a dedicated limitations paragraph in the Discussion section (lines 832-890).

      Additional recommendations:

      Minor (chronological order):

      (1) Sample size calculations for both experiments: what was the effect size based on? A citation or further information is needed. Also, clarify why the effect size differed between the two experiments.

      Please see above

      (2) "Participants were asked to either not drink coffee or smoke cigarettes" - either is implying that one of the two was asked. I suspect it is redundant as both were not permitted.

      The intention was to restrict both behaviors, so we have corrected the sentence to clarify that participants were asked not to drink coffee or smoke cigarettes before the session.

      (3) Normalization of ECG - what exactly was normalized, namely what measure of the ECG?

      The normalized measure was the heart rate, expressed in beats per minute (bpm). We now clarify this in the Data Analysis section of Experiment 1 (“Measures of the heart rate recorded with the ECG (beats per minute) in the feedback phase were normalized”).

      (4) Line 360: "Mean Δ pain unpleasantness ratings were analysed analogously" - this is unclear, if already described in methods then should be removed here, if not - should be further explained here.

      Thank you for your observation. We are no longer using change scores.

      (5) Lines 418-420: "Consequently, perceptual and cardiac modulations associated with the feedback manipulation should be reduced over the exposure to the faster exteroceptive sound." - why reduced and not unchanged? I didn't follow the logic.

      We chose the term “reduced” rather than “unchanged” to remain cautious in our interpretation. Statistically, the absence of a significant effect in one experiment does not necessarily mean that no effect is present; it simply means we did not detect one. For this reason, we avoided using language that would suggest complete absence of modulation. It also more closely matches the results of the between-experiment comparisons that we report in the Results section of Experiment 2, which can in principle only show that the effect in Experiment 2 was smaller than that of Experiment 1, not that it was absent. Even the TOST analysis that we use to show the absence of an effect can only show that any effect that is present is smaller than we could reasonably expect to detect with our experimental design, not that it is completely absent.

      Also, on a theoretical level, pain is a complex, multidimensional experience influenced not only by sensory input but also by cognitive, emotional, social and expectancy factors. For this reason, we considered it important to remain open to the possibility that other mechanisms beyond the misleading cardiac prior induced by the feedback might have contributed to the observed effects. If such other influences had contributed to the induced differences between faster and slower feedback in Experiment 1, some remainder of this difference could have been observed in Experiment 2 as well.

      Thus, for both statistical and theoretical reasons, we were careful to predict a reduction of the crucial difference, not its complete elimination. However, to allow for the possibility that effects could be completely eliminated, we now write that “perceptual and cardiac modulations associated with the feedback manipulation should be reduced or eliminated with exteroceptive feedback”.

      (6) Study 2 generation of feedback - was this again tailored per participants (25% above and beyond their own HR at baseline + gradually increasing or decreasing), or identical for everyone?

      Yes, in Study 2, the generation of feedback was tailored to each participant, mirroring the procedure of Experiment 1. Specifically, the feedback was set to be 25% above or below their baseline heart rate, with the feedback gradually increasing or decreasing. This individualized approach ensured that each participant experienced feedback relative to their own baseline heart rate. We now clarify this in the Methods section (lines 306-318).
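
      As an illustration only, the individualized schedule can be sketched as follows. The 25% offset and the gradual change come from the description above; the linear ramp, the number of steps and the function name `feedback_rates` are assumptions made for this sketch and do not describe the software used in the experiments.

```python
def feedback_rates(baseline_bpm, direction, n_steps=10, max_change=0.25):
    """Return a sequence of feedback rates (bpm) ramping away from baseline.

    direction: +1 for faster, -1 for slower, 0 for congruent feedback.
    The linear ramp and n_steps are illustrative assumptions.
    """
    if direction not in (-1, 0, 1):
        raise ValueError("direction must be -1, 0 or +1")
    target = baseline_bpm * (1 + direction * max_change)
    return [baseline_bpm + (target - baseline_bpm) * step / n_steps
            for step in range(1, n_steps + 1)]


# Hypothetical participant with a baseline heart rate of 72 bpm.
faster = feedback_rates(72.0, +1)
slower = feedback_rates(72.0, -1)
print(faster[-1], slower[-1])
```

      Because the target is defined as a proportion of each participant's own baseline, two participants with different resting heart rates receive different absolute rates but the same relative change.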

      (7) I did not follow why we need the TOST and how to interpret its results.

      We thank the reviewer for raising this important point. In classical null hypothesis significance testing (NHST), a non-significant p-value (e.g., p > .05) only indicates that we failed to find a statistically significant difference, not that there is no difference. It therefore does not allow us to conclude that two conditions are equivalent – only that we cannot confidently say they are different. In our case, to support the claim that exteroceptive feedback does not induce perceptual or physiological changes (unlike interoceptive feedback), we needed a method to test for the absence of a meaningful effect, not just the absence of a statistically detectable one.

      The TOST (Two One-Sided Tests) procedure reverses the logic of NHST by testing whether the observed effect falls within a predefined equivalence interval. The bounds of this interval are set by the smallest effect size of interest (SESOI), here the smallest effect that is in principle measurable with our design parameters (e.g., type of test, number of participants). This approach is necessary when the goal is not to detect a difference, but rather to demonstrate that an observed effect is so small that it can be considered negligible, or at least smaller than we could in principle expect to observe in the given experiment. We used the TOST procedure in Experiment 2 to test for statistical equivalence between the effects of faster and slower exteroceptive feedback on pain ratings and heart rate.

      We hope that the clearer explanation now provided in the Data Analysis section of Experiment 2 (lines 5589-563) fully addresses the reviewer’s concern.
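
      For readers unfamiliar with the procedure, its logic can be sketched in a few lines. This is a simplified illustration for paired differences with symmetric bounds expressed in standardized units (d_z), not the analysis script used for the manuscript. The helper `t_cdf` integrates the Student t density numerically so that the example needs only the standard library; the function names and the example data are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_cdf(t, df, steps=2000):
    """Student t CDF via Simpson integration of the density from 0 to |t|.

    Intended for moderate df; math.gamma overflows for very large df.
    """
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    area = pdf(0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    cdf = 0.5 + area if t >= 0 else 0.5 - area
    return min(1.0, max(0.0, cdf))


def tost_paired(diffs, sesoi_dz, alpha=0.05):
    """Two one-sided tests on paired differences with bounds of +/- sesoi_dz."""
    n = len(diffs)
    dz = mean(diffs) / stdev(diffs)
    t_lower = (dz + sesoi_dz) * sqrt(n)   # H0: effect <= -SESOI
    t_upper = (dz - sesoi_dz) * sqrt(n)   # H0: effect >= +SESOI
    p_lower = 1 - t_cdf(t_lower, n - 1)
    p_upper = t_cdf(t_upper, n - 1)
    return {"dz": dz, "p_lower": p_lower, "p_upper": p_upper,
            "equivalent": max(p_lower, p_upper) < alpha}


# Hypothetical faster-minus-slower differences for 40 participants.
diffs = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.1, -0.15, 0.05] * 4
print(tost_paired(diffs, sesoi_dz=0.5))
```

      Equivalence is declared only when both one-sided tests are significant, that is, when the larger of the two p values falls below alpha. This is why a TOST result is usually reported with two p values and a 90% confidence interval.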

      (8) Lines 492-3: authors say TOST significant, while p value = 0.065

      We thank the reviewer for spotting this inconsistency. The discrepancy was due to a typographical error in the initial manuscript. During the revision of the paper, we rechecked and fully recomputed all TOST analyses, and the results have now been corrected throughout the manuscript to accurately reflect the statistical outcomes. In particular, for the comparison of heart rate between faster and slower exteroceptive feedback in Experiment 2, the corrected TOST analysis now shows a significant equivalence, with the observed effect size being d = -0.19 (90% CI [-0.36, -0.03]) and both one-sided tests yielding p = .025 and p < .001. These updated results are reported in the revised Results section.

      Reviewer #2 (Recommendations For The Authors):

      I would suggest the authors revise their definition of pain in the introduction, since it is not always a protective experience. The new IASP definition specifically takes this into consideration.

      We thank the reviewer for this suggestion. We have updated the definition of pain in the Introduction (lines 2-4) to align with the most recent IASP definition (2020), which characterizes pain as “an unpleasant sensory and emotional experience associated with, or resembling that associated with, actual or potential tissue damage” (lines 51-53).

      The work on exteroceptive cues does not necessarily neglect the role of interoceptive sources of information, although it is true that it has been comparatively less studied. I suggest rephrasing this sentence to reflect this.

      We thank the reviewer for pointing out this important nuance. We agree that studies employing exteroceptive cues to modulate pain perception do not necessarily neglect the role of interoceptive sources, even though these are not always the primary focus of investigation. Our intention was not to imply a strict dichotomy, but rather to highlight that interoceptive mechanisms have been comparatively under-investigated. We have revised the sentence in the Introduction accordingly to better reflect this perspective (Introduction, lines 110-112, “Although interoceptive processes may have contributed to the observed effects, these studies did not specifically target interoceptive sources of information within the inferential process.”).

      The last paragraph of the introduction (lines 158-164) contains generalizations beyond what can be supported by the data and the results, about the generation of predictive processes and the origins of these predictions. The statements regarding the understanding of pain-related pathologies in terms of chronic aberrant predictions in the context of this study are also unwarranted.

      We have deleted this paragraph now.

      I could not find the study registration (at least in clinicaltrials.gov). This is curious considering that the hypothesis and the experimental design seem in principle well thought out, and a study pre-registration improves the credibility of the research (Nosek et al., 2018). I also find the choice for the smallest effect of interest (SESOI) odd. Besides the unnecessary variable transformations (more on that later), there is no justification for why that particular SESOI was chosen, or why it changes between experiments (Dienes, 2021; King, 2011), which makes the choice look arbitrary. The SESOI is a fundamental component of a priori power analysis (Lakens, 2022), and without rationale and preregistration, it is impossible to tell whether this is a case of SPARKing or not (Sasaki & Yamada, 2023).

      We acknowledge that the study was not preregistered. Although our hypotheses and design were developed a priori and informed by established theoretical frameworks, the lack of formal preregistration is a limitation.

      The SESOI values for Experiments 1 and 2 were derived from sensitivity analyses based on the fixed design parameters of our study (type of test, number of participants, alpha level), not from any post-hoc interpretation of observed results. They therefore cannot be a case of SPARKing. Following current recommendations (Anderson, Kelley & Maxwell, 2017; Albers & Lakens, 2018; Lakens, 2022), we avoided basing power estimates on published effect sizes, as no such values exist for novel paradigms and published estimates are typically inflated by publication and other biases. Instead, sensitivity analyses (using G*Power, v 3.1) allow us to calculate, prospectively, the smallest effect each design could detect with 90 % power, given the actual sample size, test type, and α level. Because more participants were excluded in Experiment 2, its smallest detectable effect is slightly larger (d = 0.62) than that of Experiment 1 (d = 0.57). Please note that both studies therefore remain well-powered to capture effects of the magnitude typically reported in previous research using feedback manipulations to explore interoceptive illusions (e.g., Iodice et al., 2019, d ≈ 0.7).

      We have added this clarification to the Participants section of Experiment 1 (Lines 208-217).
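
      The logic of such a sensitivity analysis can be sketched in a few lines of Python. This is a simplified illustration, not a reproduction of the G*Power computation: it assumes a two-sided paired (or one-sample) test and uses a normal approximation, which yields slightly smaller values than the noncentral t distribution that G*Power uses. The function name `min_detectable_d` and the sample sizes in the usage lines are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n, alpha=0.05, power=0.90):
    """Smallest standardized effect detectable with the given power.

    Normal approximation for a two-sided paired/one-sample test:
    d = (z_(1 - alpha/2) + z_power) / sqrt(n)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) / sqrt(n)


# Hypothetical sample sizes: fewer participants -> larger detectable effect
d_larger_sample = min_detectable_d(36)
d_smaller_sample = min_detectable_d(30)
```

      Because the sample size enters only through the square root of n, excluding a few participants raises the smallest detectable effect only modestly, which is the pattern described above for the two experiments.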

      Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-Size Planning for More Accurate Statistical Power: A Method Adjusting Sample Effect Sizes for Publication Bias and Uncertainty. Psychological Science, 28(11), 1547-1562.

      Lakens, D. (2022). Sample size justification. Collabra: psychology, 8(1), 33267.

      Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.

      In the Apparatus subsection, it is stated that the intensity of the electrical stimuli was fixed at 2 ms. I believe the authors refer to the duration of the stimulus, not its intensity.

      You are right, thank you for pointing that out. The text should refer to the duration of the electrical stimulus, not its intensity. We have corrected this wording in the revised manuscript to avoid confusion.

      It would be interesting to report (in graphical form) the stimulation intensities corresponding to the calibration procedure for the five different pain levels identified for all subjects.

      That's a good suggestion. We have included a supplementary figure showing the stimulation intensities corresponding to the five individually calibrated pain levels across all participants (Supplementary Figure 11).

      It is questionable that researchers state that "pain and unpleasantness should be rated independently" but then the first level of the Likert scale for unpleasantness is "1=no pain". This is particularly relevant since stimulation (and specifically electrical stimulation) can be unpleasant but non-painful at the same time. Since the experiments were already performed, the researchers should at least explain this choice.

      Thank you for raising this point. You are right that the label “no pain” in the pain unpleasantness scale was not ideal, and we now acknowledge this in the text (lines 886-890). Please note that unpleasantness was always the second rating that participants gave (after pain intensity), and the strongest results come from the first rating (pain intensity).

      Discussion.

      I did not find in the manuscript the rationale for varying the frequency of the heart rate by 25% (instead of any other arbitrary quantity).

      We thank the Reviewer for this observation, which prompted us to clarify the rationale behind our choice of a ±25% manipulation of heart rate feedback. False feedback paradigms have historically relied on a variety of approaches to modulate perceived cardiac signals. Some studies have adopted non-individualised values, using fixed frequencies (e.g., 60 or 110 bpm) to evoke states of calm or arousal, independently of participants’ actual physiology (Valins, 1966; Shahidi & Baluch, 1991; Crucian et al., 2000; Tajadura-Jiménez et al., 2008). Others have used the participant’s real-time heart rate as a basis, introducing accelerations or decelerations without applying a specific percentage transformation (e.g., Iodice et al., 2019). More recently, a growing body of work has employed percentage-based alterations of the instantaneous heart rate, offering a controlled and participant-specific manipulation. These include studies using −20% (Azevedo et al., 2017), ±30% (Dey et al., 2018), and even ±50% (Gray et al., 2007).

      These different methodologies - non-individualised, absolute, or proportionally scaled - have all been shown to effectively modulate subjective and physiological responses. They suggest that the impact of false feedback does not depend on a single fixed method, but rather on the plausibility and salience of the manipulation within the context of the task. We chose to apply a ±25% variation because it falls well within the most commonly used range and strikes a balance between producing a detectable effect and maintaining the illusion of physiological realism. The magnitude is conceptually justified as being large enough to shape interoceptive and emotional experience (as shown by Azevedo and Dey), yet small enough to avoid implausible or disruptive alterations, such as those approaching ±50%. We have now clarified this rationale in the revised Procedure paragraph of Experiment 1 (lines 306-318).
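
      To make the ±25% manipulation concrete, the following Python sketch converts a measured heart rate into the rate and inter-beat interval of the feedback for each condition. It is an illustrative assumption about how such scaling could be computed, not the stimulus-presentation code used in the experiments. The names `FEEDBACK_SCALING`, `feedback_bpm`, and `inter_beat_interval_ms` are hypothetical.

```python
# Scaling factors for the three feedback conditions (±25% around the measured rate)
FEEDBACK_SCALING = {"slower": 0.75, "congruent": 1.00, "faster": 1.25}


def feedback_bpm(measured_bpm, condition):
    """Feedback rate in beats per minute for a given condition."""
    return measured_bpm * FEEDBACK_SCALING[condition]


def inter_beat_interval_ms(bpm):
    """Time between successive feedback sounds, in milliseconds."""
    return 60_000 / bpm


# Example: a participant whose measured heart rate is 60 bpm
rates = {c: feedback_bpm(60, c) for c in FEEDBACK_SCALING}
```

      For a measured rate of 60 bpm, the scaled feedback spans 45 to 75 bpm, which stays within a physiologically plausible resting range, whereas a ±50% manipulation would span 30 to 90 bpm.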

      Azevedo, R. T., Bennett, N., Bilicki, A., Hooper, J., Markopoulou, F., & Tsakiris, M. (2017). The calming effect of a new wearable device during the anticipation of public speech. Scientific reports, 7(1), 2285.

      Crucian, G. P., Hughes, J. D., Barrett, A. M., Williamson, D. J. G., Bauer, R. M., Bowers, D., & Heilman, K. M. (2000). Emotional and physiological responses to false feedback. Cortex, 36(5), 623-647.

      Dey, A., Chen, H., Billinghurst, M., & Lindeman, R. W. (2018, October). Effects of manipulating physiological feedback in immersive virtual environments. In Proceedings of the 2018 Annual Symposium on Computer-Human Interaction in Play (pp. 101-111).

      Gray, M. A., Harrison, N. A., Wiens, S., & Critchley, H. D. (2007). Modulation of emotional appraisal by false physiological feedback during fMRI. PLoS one, 2(6), e546.

      Shahidi, S., & Baluch, B. (1991). False heart-rate feedback, social anxiety and self-attribution of embarrassment. Psychological reports, 69(3), 1024-1026.

      Tajadura-Jiménez, A., Väljamäe, A., & Västfjäll, D. (2008). Self-representation in mediated environments: the experience of emotions modulated by auditory-vibrotactile heartbeat. CyberPsychology & Behavior, 11(1), 33-38.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      The researchers state that pain ratings collected in the feedback phase were normalized to the no-feedback phase to control for inter-individual variability in pain perception, as established by previous research. They cite three studies involving smell and taste, of which the last two contain the same normalization presented in this study. However, unlike these studies, the outcomes here require no normalization whatsoever, because there should be no (or very little) inter-individual variability in pain intensity ratings. Indeed, pain intensity ratings in this study are anchored to 30, 50, and 70 / 100 as a condition of the experimental design. The researchers go to extreme lengths to ensure this is the case, by adjusting stimulation intensities until at least 75% of stimulation intensities are correctly matched to their pain ratings counterpart in the pre-experiment procedure. In other words, inter-individual variability in this study is in stimulation intensities, and not pain intensity ratings. Even if it could be argued that pain unpleasantness and heart rate still need to account for inter-individual variability, the best way to do this is by using the baseline (no-feedback) measures as covariates in a mixed linear model. Another advantage of this approach is that all the effects can be described in terms of the original scales and are readily understandable, and post hoc tests between levels can be corrected for multiple comparisons. On the contrary, the familywise error rate for the comparisons between conditions in the current analysis is larger than 5% (since there is a "main" paired t-test and additional "simple" tests).

      We disagree that there is little to no variability in the no feedback phase. Participants were tested in their ability to distinguish intensities in an initial pre-experiment calibration phase. In the no feedback phase, participants rated the pain stimuli in the full experimental context.

      In the pre-experiment calibration phase, participants were tested only once in their ability to match five electrical‐stimulation levels to the 0-100 NPS scale, before any feedback manipulation started. During this pre-experiment calibration we required that each level was classified correctly on ≥ 75 % of the four repetitions; “correct” meant falling within ± 5 NPS units of the target anchor (e.g., a response of 25–35 was accepted for the 30/100 anchor). This procedure served one purpose only: to make sure that every participant entered the main experiment with three unambiguously distinguishable stimulation levels (30 / 50 / 70). We integrated this point in the revised manuscript lines 263-270.
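
      The calibration criterion described above can be expressed as a short check. The Python sketch below assumes, as stated, four repetitions per level, a tolerance of ± 5 NPS units around the anchor, and a threshold of 75 % correct. The function name `level_passes` and the example ratings are hypothetical.

```python
def level_passes(ratings, anchor, tolerance=5, min_proportion=0.75):
    """Return True if enough ratings fall within anchor ± tolerance."""
    hits = sum(1 for r in ratings if abs(r - anchor) <= tolerance)
    return hits / len(ratings) >= min_proportion


# Made-up ratings for the 30/100 anchor: three of four fall within 25-35
passes = level_passes([28, 33, 40, 31], anchor=30)
# Only two of four fall within 25-35, so the level is not yet calibrated
fails = level_passes([28, 40, 41, 31], anchor=30)
```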

      Once the real task began, the context changed: shocks are unpredictable, attention is drawn to the heartbeat, and participants must judge both intensity and unpleasantness. In this full experimental setting the no-feedback block indeed shows considerable variability, even for the pain intensity ratings. Participants' mean rating on the NPS scale was 46.4, with a standard deviation of 11.9; participants thus vary quite strongly in their mean ratings (range 14.5 to 70). Moreover, while all participants show a positive correlation between actual intensities and their ratings (i.e., they rate the higher intensities as more intense than the lower ones), they vary in how much of the scale they use: the difference between the highest and lowest reported intensities ranges from 8 for the participant with the smallest difference to 91 for the participant with the largest.

      Thus, while we simplified the analysis to remove the difference scoring relative to the congruent trials and now use these congruent trials as an additional condition in the analysis, we retained the normalisation procedure to account for the between-participant variability that does in fact exist, and to ensure consistency with prior research (Bartolo et al., 2013; Cecchini et al., 2020; Riello et al., 2019) and our a priori analysis plan.

      However, to ensure we fully address your point here (and the other reviewers’ points about potential additional factors affecting the effects, like trial number and stimulus intensity), we also report an additional linear mixed-effects model analysis without normalization. It includes every feedback level as condition (No-Feedback, Congruent, Slower, Faster), plus additional predictors for actual stimulus intensity and trial rank within the experiment (as suggested by the other reviewers). This confirms that all relevant results remain intact once baseline and congruent trials are explicitly included in the model.

      In brief, cross‐experiment analyses demonstrated that the Faster vs Slower contrast was markedly larger when the feedback was interoceptive than when it was exteroceptive. This held for heart-rate deceleration (b = 0.94 bpm, p = .005), for increases in unpleasantness (b = -0.16 Likert units, p = .015), and in pain-intensity ratings (b = -3.27 NPS points, p = .037).

      These findings were then further confirmed by within-experiment analyses. Within the interoceptive experiment, the mixed model on raw scores replicated every original effect: heart rate was lower after Faster than Slower feedback (estimate = –0.69 bpm, p = .005); unpleasantness was higher after Faster than Slower feedback (estimate = 0.19, p < .001); pain intensity rose after Faster versus Slower feedback (estimate = -4.285, p < .001). In the exteroceptive experiment, however, none of these Faster–Slower contrasts reached significance for heart rate (all ps > .33), unpleasantness (all ps > .43) or intensity (all ps > .10). Because these effects remain significant even with No-Feedback and Congruent trials explicitly included in the model and vanish under exteroceptive control, the supplementary, non-normalised analyses confirm that faster vs. slower interoceptive feedback uniquely lowers anticipatory heart rate while amplifying both intensity and unpleasantness of pain, independent of data transformation or reference conditions. Please see Supplementary analyses for further details.

      Bartolo, M., Serrao, M., Gamgebeli, Z., Alpaidze, M., Perrotta, A., Padua, L., Pierelli, F., Nappi, G., & Sandrini, G. (2013). Modulation of the human nociceptive flexion reflex by pleasant and unpleasant odors. PAIN®, 154(10), 2054-2059.

      Cecchini, M. P., Riello, M., Sandri, A., Zanini, A., Fiorio, M., & Tinazzi, M. (2020). Smell and taste dissociations in the modulation of tonic pain perception induced by a capsaicin cream application. European Journal of Pain, 24(10), 1946-1955.

      Riello, M., Cecchini, M. P., Zanini, A., Di Chiappari, M., Tinazzi, M., & Fiorio, M. (2019). Perception of phasic pain is modulated by smell and taste. European Journal of Pain, 23(10), 1790-1800.

      I could initially not find a rationale for bringing upfront the comparison between faster vs. slower HR acoustic feedback when in principle the intuitive comparisons would be faster vs. congruent and slower vs. congruent feedback. This is even more relevant considering that in the proposed main comparison, the congruent feedback does not play a role: since Δ outcomes are calculated as (faster - congruent) and (slower - congruent), a paired t-test between Δ faster and Δ slower outcomes equals (faster - congruent) - (slower - congruent) = (faster - slower). I later realized that the statistical comparison (paired t-test) of pain intensity ratings of faster vs. slower acoustic feedback is significant in experiment 1 but not in experiment 2, which in principle would support the argument that interoceptive, but not exteroceptive, feedback modulates pain perception. However, the "simple" t-tests show that faster feedback modulates pain perception in both experiments, although the effect is larger in experiment 1 (interoceptive feedback) compared to experiment 2 (exteroceptive feedback).

      The comparison between faster and slower feedback is indeed crucial, and we regret not having made this clearer in the first version of the manuscript. As noted in our response to your point in the public review, this comparison is both the most powerful statistically and the most appropriate theoretically, as it controls for any influence of salience or surprise when heart rates deviate (in either direction) from what is expected. It therefore provides a clean measure of how much an accelerated heart rate affects pain perception and physiological response, relative to an equal change in the opposite direction. However, as noted above, in the new version of the manuscript we have removed the analysis via difference scores and directly compared all three relevant conditions (faster, congruent, slower), first via an ANOVA and then with follow-up planned t-tests.

      Please refer to our previous response for further details (i.e., Furthermore, the researchers propose the comparison of faster vs. slower delta HR acoustic feedback throughout the manuscript when the natural comparison is the incongruent vs. the congruent feedback [..]).

      The design of experiment two involves the selection of knocking wood sounds to act as exteroceptive acoustic feedback. Since the purpose is to test whether sound affects pain intensity ratings, unpleasantness, and heart rate, it would have made sense to choose sounds that would be more likely to elicit such changes, e.g. Taffou et al. (2021), Chen & Wang (2022), Zhou et al. (2022), Tajadura-Jiménez et al. (2010). Whereas I acknowledge that there is a difference in effect sizes between experiment 1 and experiment 2 for the faster acoustic feedback, I am not fully convinced that this difference is due to the nature of the feedback (interoceptive vs. exteroceptive), since a similar difference could arguably be obtained by exteroceptive sound with looming or rough qualities. Since the experiment was already carried out and this hypothesis cannot be tested, I suggest that the researchers moderate the inferences made in the Discussion regarding these results.

      Please refer to our previous response for a detailed answer to this point in the Public Review (i.e., This could be influenced by the fact that the faster HR exteroceptive cue in experiment 2 also shows a significant modulatory effect [..]). As we describe there, we see little grounds to suspect such a non-specific influence of acoustic parameters, as it is specifically the sensitivity to the change in heart rate (faster vs slower) that is affected by our between-experiment manipulation, not the overall response to the different exteroceptive or interoceptive sounds. Moreover, the specific change induced by the faster interoceptive feedback - a heart rate deceleration - is not consistent with a change in arousal or alertness (which would have predicted an increase in heart rate with increasing arousal). See also Discussion-Accounting for general unspecific contributions.

      Additionally, the fact that no significant effects were found for unpleasantness ratings or heart rate (absence of evidence) should not be taken as proof that faster exteroceptive feedback does not induce an effect on these outcomes (evidence of absence). In this case, it could be that there is actually no effect on these variables, or that the experiment was not sufficiently powered to detect those effects. This would depend on the SESOIs for these variables, which as stated before, was not properly justified.

      We very much agree that the absence of significant effects should not be interpreted as definitive evidence of absence. Indeed, we were careful not to overinterpret the null findings for heart rate and unpleasantness ratings, and we conducted additional analyses to clarify their interpretation. First, the TOST analysis shows that any effects in Experiment 2 are (significantly) smaller than the smallest effect size that can possibly be detected in our experiment, given the experimental parameters (number of participants, type of test, alpha level). Second, and more importantly, we ran between-experiment comparisons (see Results Experiment 2, and Supplementary materials, Cross-experiment analysis between-subjects model) of the crucial difference in the changes induced by faster and slower feedback. These showed that the differences were larger with interoceptive (Experiment 1) than exteroceptive cues (Experiment 2). Thus, even if the exteroceptive cues in Experiment 2 induce an effect too small to be detected in principle, that effect is smaller than the one induced by interoceptive cues in Experiment 1.

      To ensure we fully address this point, we have now simplified our main analysis (main manuscript) and replicated it with a different analysis (Supplementary material). We also motivate more clearly why the comparison between faster and slower feedback is crucial (Methods Experiment 1), and we make clearer that the difference between these conditions is larger in Experiment 1 than in Experiment 2 (Results Experiment 2). Moreover, we went through the manuscript and ensured that our wording does not over-interpret the absence of effects in Experiment 2 as the absence of a difference.

      The section "Additional comparison analysis between experiments" encompasses in a way all possible comparisons between levels of the different factors in both experiments. My original suggestion regarding the use of a mixed linear model with covariates is still valid for this case. This analysis also brings into question another aspect of the experimental design: what is the rationale for dividing the study into two experiments, considering that variability and confounding factors would have been much better controlled in a single experimental session that includes all conditions?

      We thank the reviewer for their comment. We would like to note, first, that the between-experiment analyses did not encompass all possible comparisons between levels, as they included only faster and slower feedback for the within-experiment comparison. Instead, they focus on the specific interaction between faster and slower feedback on the one hand, and interoceptive vs exteroceptive cues on the other. This interaction essentially compares, for each dependent measure (HR, pain unpleasantness, pain intensity), the difference between faster and slower feedback in Experiment 1 with the same difference in Experiment 2 (and would produce identical p values to a between-experiment t-test). The significant interactions therefore indicate larger effects of interoceptive cues than exteroceptive ones for each of the measures. To make this clearer, we have now replaced the analysis with between-experiment t-tests of the difference between faster and slower feedback for each measure (Results Experiment 2), producing identical results. Moreover, as suggested, we also now report linear mixed model analyses (see Supplementary Materials), which provide a comprehensive comparison across experiments.
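
      The between-experiment t-test on difference scores can be sketched as follows. This Python illustration assumes a Student's t-test with pooled variance on each participant's faster-minus-slower difference. It returns the t statistic and degrees of freedom only, since a p value would require a t distribution that is not available in the standard library. The function name `between_experiment_t` and the example numbers are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def between_experiment_t(diffs_exp1, diffs_exp2):
    """Student's t (pooled variance) comparing faster-minus-slower
    difference scores between two independent groups of participants."""
    n1, n2 = len(diffs_exp1), len(diffs_exp2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(diffs_exp1)
                  + (n2 - 1) * variance(diffs_exp2)) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(diffs_exp1) - mean(diffs_exp2)) / se
    return t, df


# Made-up difference scores for two small groups (illustration only)
t_stat, dof = between_experiment_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
```

      Each participant contributes a single difference score, so the comparison reduces to a two-group test of the kind described above as equivalent to the interaction term.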

      Regarding the experimental design, we appreciate the reviewer’s suggestion regarding a within-subject crossover design. While such an approach indeed offers greater statistical power by reducing interindividual variability (Charness, Gneezy, & Kuhn, 2012), we intentionally chose a between-subjects design due to theoretical and methodological considerations specific to deceptive feedback paradigms. First, carryover effects are a major concern in deception studies. Participants exposed to one type of feedback could develop suspicion or adaptive strategies that would alter their responses in subsequent conditions (Martin & Sayette, 1993). Expectancy effects could thus contaminate results in a crossover design, particularly when feedback manipulation becomes apparent. In line with this idea, past studies on false cardiac feedback (e.g., Valins, 1966; Pennebaker & Lightner, 1980) often employed between-subjects or blocked designs to maintain the ecological validity of the illusion.

      Charness, G., Gneezy, U., & Kuhn, M. A. (2012). Experimental methods: Between-subject and within-subject design. Journal of economic behavior & organization, 81(1), 1-8.

      Martin, C. S., & Sayette, M. A. (1993). Experimental design in alcohol administration research: limitations and alternatives in the manipulation of dosage-set. Journal of studies on alcohol, 54(6), 750-761.

      Pennebaker, J. W., & Lightner, J. M. (1980). Competition of internal and external information in an exercise setting. Journal of personality and social psychology, 39(1), 165.

      Valins, S. (1966). Cognitive effects of false heart-rate feedback. Journal of personality and social psychology, 4(4), 400.

      References

      Chen ZS, Wang J. Pain, from perception to action: A computational perspective. iScience. 2022 Dec 1;26(1):105707. doi: 10.1016/j.isci.2022.105707.

      Dienes Z. Obtaining Evidence for No Effect. Collabra: Psychology 2021 Jan 4; 7 (1): 28202. doi: 10.1525/collabra.28202

      King MT. A point of minimal important difference (MID): a critique of terminology and methods. Expert Rev Pharmacoecon Outcomes Res. 2011 Apr;11(2):171-84. doi: 10.1586/erp.11.9.

      Lakens D. Sample Size Justification. Collabra: Psychology 2022 Jan 5; 8 (1): 33267. doi: 10.1525/collabra.33267

      Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. The preregistration revolution. Proc Natl Acad Sci U S A. 2018 Mar 13;115(11):2600-2606. doi: 10.1073/pnas.1708274114.

      Sasaki K, Yamada Y. SPARKing: Sample-size planning after the results are known. Front Hum Neurosci. 2023 Feb 22;17:912338. doi: 10.3389/fnhum.2023.912338.

      Taffou M, Suied C, Viaud-Delmon I. Auditory roughness elicits defense reactions. Sci Rep. 2021 Jan 13;11(1):956. doi: 10.1038/s41598-020-79767-0.

      Tajadura-Jiménez A, Väljamäe A, Asutay E, Västfjäll D. Embodied auditory perception: The emotional impact of approaching and receding sound sources. Emotion. 2010;10(2):216-229. https://doi.org/10.1037/a0018422

      Zhou W, Ye C, Wang H, Mao Y, Zhang W, Liu A, Yang CL, Li T, Hayashi L, Zhao W, Chen L, Liu Y, Tao W, Zhang Z. Sound induces analgesia through corticothalamic circuits. Science. 2022 Jul 8;377(6602):198-204. doi: 10.1126/science.abn4663.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript would benefit from some spelling- and grammar checking.

      Done

      Discussion:

      The discussion section is rather lengthy and would benefit from some re-structuring, editing, and sub-section headers.

      In response, we have restructured and edited the Discussion section to improve clarity and flow.

      I personally had a difficult time understanding how the data relates to the rubber hand illusion (l.623-630). I would recommend revising or deleting this section.

      We thank the reviewer for this valuable feedback. We have revised the paragraph and made the parallel clearer (lines 731-739).

      Other areas are a bit short and might benefit from some elaboration, such as clinical implications. Since they were mentioned in the abstract, I had expected a bit more thorough discussion here (l. 718).

      Thank you for this suggestion. We have expanded the discussion to more thoroughly address the clinical implications of our interoceptive pain illusion (See Limitations and Future Directions paragraph).

      Further, clarification is needed for the following:

      I would like some more details on participant instructions; in particular, the potential difference in instruction between Exp. 1 and 2, if any. In Exp. 1, it says: (l. 280) "Crucially, they were also informed that over the 60 seconds preceding the administration of the shock, they were exposed to acoustic feedback, which was equivalent to their ongoing heart rate". Was there a similar instruction for Exp. 2? If yes, it would suggest a more specific effect of cardiac auditory feedback; if no, the ramifications of this difference in instructions should be more thoroughly discussed.

      Thank you for this suggestion. We have clarified this point in the Procedure of Experiment 2 (lines 548-550).

    1. Author response:

      We thank the editors and all reviewers for the detailed evaluation of the work and the overall positive remarks, as well as the constructive feedback to improve our manuscript. Based on the integrated comments of the reviewers and advice of the reviewing editor, we will suitably address all comments raised by the reviewers, and we outline our revision plan below:

      Interpretation of findings

      ● We will carefully reframe our interpretation of the data regarding the role of the pallium in the coupled saccade-tail turning events, and clearly state that we have not established a causal role, which requires additional perturbation experiments.

      ● We will also acknowledge the confounding interpretation that the pallial activities recorded may also represent or include arousal state signals.

      Streamlining the presentation

      ● In the introduction, we will better contextualize our study with additional discussions on (i) the advantageous use of zebrafish to study chemosensation, factoring in differences in the spread of chemical cues in water vs. air, and (ii) the disruption of eye-body coordination and underlying neural circuits.

      ● We will streamline the presentation of data in Fig. 1 by keeping the overall responses of the larvae to each chemical across concentrations in the main figure, while moving suitable additional details to a supplementary figure.

      ● Similarly, for each of the subsequent main figures, wherever suitable we will select an illustrative, core set of panels to retain in the main figure, and move other more detailed plots to supplementary figures.

      ● We will incorporate additional references and discussions of the past literature, including relating our findings to (i) chemosensation/multisensory integration in Drosophila, (ii) thermosensation-driven and navigational behavior in larval zebrafish, and (iii) fleeing or escape behavior in zebrafish and other species.

      ● We will clarify our animal subject inclusion criteria, that all larval subjects with sufficiently high-quality, stable imaging were included (i.e., we only excluded larvae because of insufficient quality of imaging, but not other factors).

      ● For applicable plots, we will add suitable additional details to the plots or legends (e.g., clarification of measures, specifying numbers of cells).

      Data analysis and statistics

      We will perform additional data analysis, making comparisons with statistics performed at the fish subject level, and include confidence intervals wherever applicable.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary:

      The manuscript submitted by Langenbacher et al., entitled "Rtf1-dependent transcriptional pausing regulates cardiogenesis", describes very interesting and highly impactful observations about the function of Rtf1 in cardiac development. Over the last few years, the Chen lab has published novel insights into the genes involved in cardiac morphogenesis. Here, they used the mouse model, the zebrafish model, cellular assays, single cell transcription, chemical inhibition, and pathway analysis to provide a comprehensive view of Rtf1 in RNAPII (Pol2) transcription pausing during cardiac development. They also conducted knockdown-rescue experiments to dissect the functions of Rtf1 domains.

      Strengths:

      The most interesting discovery is the connection between Rtf1 and CDK9 in regulating Pol2 pausing as an essential step in normal heart development. The design and execution of these experiments also demonstrate a thorough approach to revealing a previously underappreciated role of Pol2 transcription pausing in cardiac development. This study also highlights the potential amelioration of related cardiac deficiencies using small molecule inhibitors against cyclin dependent kinases, many of which are already clinically approved, while many other specific inhibitors are at various preclinical stages of development for the treatment of other human diseases. Thus, this work is impactful and highly significant. 

      We thank the reviewer for appreciating our work.

      Reviewer #2 (Public Review): 

      Summary: 

      Langenbacher et al. examine the requirement of Rtf1, a component of the PAF1C, which regulates transcriptional pausing in cardiac development. The authors first confirm their previous morphant study with newly generated rtf1 mutant alleles, which recapitulate the defects in cardiac progenitor and differentiation gene expression observed previously in morphants. They then examine the conservation of Rtf1 in mouse embryos and embryonic stem cell-derived cardiomyocytes. Conditional loss of Rtf1 in mesodermal lineages and depletion in murine ESCs demonstrates a failure to turn on cardiac progenitor and differentiation marker genes, supporting conservation of Rtf1 in promoting cardiac development. The authors subsequently employ bulk RNA-seq on flow-sorted hand2:GFP+ cells and multiomic single-cell RNA-seq on whole Rtf1-depleted embryos at the 10-12 somite stage. These experiments corroborate that genes associated with cardiac and muscle development are lost. Furthermore, the differentiation trajectories suggest that the expression of genes associated with cardiac maturation is not initiated. Structure-function analysis supports that the Plus3 domain is necessary for its function in promoting cardiac progenitor formation. ChIP-seq for RNA Pol II on 10-12 somite stage embryos suggests that Rtf1 is required for proper promoter pausing. This defect can be partially rescued in rtf1 mutants through use of a pharmacological inhibitor of Cdk9, which inhibits elongation.

      Strengths: 

      Many aspects of the data are strong, which support the basic conclusions of the authors that Rtf1 is required for transcriptional pausing and has a conserved requirement in vertebrate cardiac development. Areas of strength include the genetic data supporting the conserved requirement for Rtf1 in promoting cardiac development, the complementary bulk and single-cell RNA-sequencing approaches providing some insight into the gene expression changes of the cardiac progenitors, the structure-function analysis supporting the requirement of the Plus3 domain, and the pharmacological epistasis combined with the RNA Pol II ChIP-seq, supporting the mechanism implicating Cdk9 in the Rtf1 dependent mechanism of RNA Pol II pausing. 

      We thank the reviewer for the summary and for recognizing many strengths of our work. 

      Weaknesses: 

      While most of the basic conclusions are supported by the data, there are a number of analyses that are confusing as to why they chose to perform the experiments the way they did and some places where the data presently do not support the interpretations. One of the conclusions is that the phenotype affects the maturation of the cardiomyocytes and they are arresting in an immature state. However, this seems to be mostly derived from picking a few candidates from the single cell data in Fig. 6. If that were the case, wouldn't the expectation be to observe relatively normal expression of earlier marker genes required for specification, such as Nkx2.5 and Gata5/6? The in situ expression analysis from fish and mice (Fig. 2 and Fig. 3) and bulk RNA-seq (Fig. 5) seems to suggest that there are pretty early specification and differentiation defects. While some genes associated with cardiac development are not changed, many of these are not specific to cardiomyocyte progenitors and expressed broadly throughout the ALPM. Similarly, it is not clear why a consistent set of cardiac progenitor genes (for instance mef2ca, nkx2.5, and tbx20) was not analyzed for all the experiments, in particular with the single cell analysis.

      A major conclusion of our study is that Rtf1 deficiency impairs myocardial lineage differentiation from mesoderm, as suggested by the reviewer. Thus, the main goal of this study is to understand how Rtf1 drives cardiac differentiation from the LPM, rather than the maturation of cardiomyocytes.  Multiple lines of evidence support this conclusion:

      (a) In situ hybridization showed that Rtf1 mutant embryos do not have nkx2.5+ cardiac progenitor cells and subsequently fail to produce cardiomyocytes (Figs. 2, 3).

      (b) RT-PCR analysis showed that knockdown of Rtf1 in mouse embryonic stem cells causes a dramatic reduction of cardiac gene expression and production of significantly fewer beating patches (Fig.4).

      (c) Bulk RNA sequencing revealed significant downregulation of cardiac lineage genes, including nkx2.5 (Fig. 5).

      (d) Single cell RNA sequencing clearly showed that lateral plate mesoderm (LPM) cells are significantly more abundant in Rtf1 morphants, whereas cardiac progenitors are less abundant (Fig. 6 and Fig.6 Supplement 1-5).

      When feasible, we used cardiac lineage restricted markers in our assays. Nkx2.5 and tbx5a are not highlighted in the single cell analysis because their expression in our sc-seq dataset was too low to examine in the clustering/trajectory analysis.  In this revised manuscript, we provide violin plots showing the low expression levels of these genes in single cells from Rtf1 deficient embryos (Figure 6 Supplement 5).

      The point of the multiomic analysis is confusing. RNA- and ATAC-seq were apparently done at the same time. Yet, the focus of the analysis that is presented is on a small part of the RNA-seq data. This data set could have been more thoroughly analyzed, particularly in light of how chromatin changes may be associated with the transcriptional pausing. This seems to be a lost opportunity. Additionally, how the single cell data is covered in Supplemental Fig. 2 and 3 is confusing. There is no indication of what the different clusters are in the Figure or the legend.

      In this study, we performed single cell multiome analysis and used both scRNAseq and scATACseq datasets to generate reliable clustering.  The scRNAseq analysis reveals how Rtf1 deficiency impacts cardiac differentiation from mesoderm, which inspired us to investigate the underlying mechanism and led to the discovery of defects in Rtf1-dependent transcriptional pause release.

      We agree with the reviewer that deep examination of Rtf1-dependent chromatin changes would provide additional insights into how Rtf1 influences early development and careful examination of the scATACseq dataset is certainly a good future direction.  

      In this revised manuscript, we have revised Fig.6 Supplement 1 to include the predicted cell types and provide an additional excel file showing the annotation of all 39 clusters (Supplementary Table 2). 

      While the effect of Rtf1 loss on cardiomyocyte markers is certainly dramatic, it is not clear how well the mutant fish have been analyzed and how specific the effect is to this population. It is interpreted that the effects on cardiomyocytes are not due to "transfating" of other cell fates, yet supplemental Fig. 4 shows numerous effects on potentially adjacent cell populations. Minimally, additional data needs to be provided showing the live fish at these stages and marker analysis to support these statements. In some images, it is not clear the embryos are the same stage (one can see pigmentation in the eyes of controls that is not in the mutants/morphants), causing some concern about developmental delay in the mutants.

      Single cell RNA sequencing showed an increased abundance of LPM cells and a reduced abundance of cardiac progenitors in Rtf1 morphants (Fig. 6 and Fig.6 Supplement 1-5). The reclustering of anterior lateral plate mesoderm (ALPM) cells and their derivatives further showed that cells representing undifferentiated ALPM were increased whereas cells representing all three ALPM derivatives were reduced. These findings indicate a defect in ALPM differentiation. 

      The reviewer questioned whether we examined stage-matched embryos. In our assay, Rtf1 mutant embryos were collected from crosses of Rtf1 heterozygotes. Each clutch from these crosses consists of ¼ embryos showing rtf1 mutant phenotypes and ¾ embryos showing wild type phenotypes which were used as control. Mutants and their wild type siblings were fixed or analyzed at the same time.

      The reviewer questioned the specificity of the Rtf1 deficient cardiac phenotype and pointed out that Rtf1 mutant embryos do not have pigment cells around the eye. Rtf1 is a ubiquitously expressed transcriptional regulator. Previous studies in zebrafish have shown that Rtf1 deficiency significantly impacts embryonic development. Rtf1 deficiency causes severe defects in cardiac lineage and neural crest cell development; consequently, Rtf1 deficient embryos do not have cardiomyocytes and pigmentation (Langenbacher et al., 2011, Akanuma et al., 2007, and Jurynec et al., 2019). We now provide an image showing a 2-day-old Rtf1 mutant embryo and its wild type sibling to illustrate the cardiac, neural crest, and somitogenesis defects caused by loss of Rtf1 activity (Fig. 2 Supplement 1).

      With respect to the transcriptional pausing defects in the Rtf1 deficient embryos, it is not clear from the data how this effect relates to the expression of the cardiac markers. This could have been directly analyzed with some additional sequencing, such as PRO-seq, which would provide a direct analysis of transcriptional elongation.

      We showed that Rtf1 deficiency results in a nearly genome-wide decrease in promoter-proximal pausing and downregulation of cardiac markers. Attenuating transcriptional pause release could restore cardiomyocyte formation in Rtf1 deficient embryos. In this revised manuscript, we provide additional RNAseq data showing that the expression levels of critical cardiac development genes such as nkx2.5, tbx5a, tbx20, mef2ca, mef2cb, ttn.2, and ryr2b are significantly rescued. We agree with the reviewer that further analyses using the PRO-seq approach could provide additional insights, but it is beyond the scope of this manuscript.

      Some additional minor issues include the rationale that sequence conservation suggests an important requirement of a gene (line 137), for which there are many examples where this isn't the case; referencing figure panels out of order (in Figs. 4, 7, and 8) as described in the text; and using the morphants for some experiments, such as the rescue, that could have been done in a blinded manner with the mutants.

      We have clarified the rationale in this revised manuscript and made an effort to reference figures in order.

      The reviewer commented that rescue experiments “could have been done in a blinded manner with the mutants”. This was indeed how the flavopiridol rescue and cdk9 knockdown experiments were carried out. Embryos from crosses of Rtf1 heterozygotes were collected, fixed after treatment and subjected to in situ hybridization. Embryos were then scored for cardiac phenotype and genotyped (Fig.8 d-g). Morpholino knockdown was used in genomic experiments because our characterization of rtf1 morphants showed that they faithfully recapitulate the rtf1 mutant phenotype during the timeframe of interest (Fig. 2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      This reviewer has a few suggestions below, aimed at improving the clarity and impact of the current study. Once these items are addressed, the manuscript should be of interest to the eLife reader.

      Item 1. Strengthening the interaction between Rtf1 and CDK9 on Pol2 pausing.

      The authors have convincingly shown that the chemical inhibition of CDK9 by flavopiridol can partially rescue the expression of cardiac genes in the zebrafish model. Although flavopiridol is FDA approved and has been a classical inhibitor for the dissection of CDK9 function, it also inhibits related CDKs (flavopiridol (alvocidib) competes with ATP to inhibit CDKs including CDK1, CDK2, CDK4, CDK6, and CDK9, with IC50 values in the 20-100 nM range). Therefore, this study could be more impactful if the authors can provide evidence on which of these CDKs may be most relevant during Rtf1-dependent cardiogenesis. Whether the observed cardiac defect indicates a preferential role for CDK9, or whether other CDKs may also be able to provide partial rescue, may be clarified using additional, more selective small molecules (e.g., BAY1251152, LDC000067 are commercially available).

      The reviewer raised a reasonable concern about the specificity of flavopiridol, and we thank the reviewer for the insightful suggestion. To address this question, we used an orthogonal test: morpholino knockdown that directly targeted CDK9. We observed the same level of rescue, supporting a critical role of transcription pausing in cardiogenesis.

      Item 2. Differences between CRISPR lines and morphants 

      Much of the work presented used Rtf1 morphants while the authors have already generated 2 CRISPR lines. What is the difference between morphants and mutants? The authors should comment on the similarities and/or differences between using morphants or mutants in their study and whether the same Rtf1-CDK9 connection also occurs in the CRISPR lines.

      The morphology of our mutants (rtf1<sup>LA2678</sup> and rtf1<sup>LA2679</sup>) resembles the morphants and the previously reported ENU-induced rtf1<sup>KT641</sup> allele. Extensive in situ hybridization analysis showed that the morphants faithfully recapitulate the mutant phenotypes (Fig.2). We have performed rescue experiments (flavopiridol and CDK9 morpholino) using Rtf1 mutant embryos and found that inhibiting Cdk9 restores cardiomyocyte formation (Fig.8). 

      Item 3. Discuss the therapeutic relevance of study 

      The authors have already generated a mouse model of Rtf1 Mesp1-Cre knockout where cardiac muscle development is severely derailed (Fig 3B). Thus, a demonstration of a conserved role for CDK9 inhibitor in rescuing cardiogenesis using mouse cells or the mouse model will provide important information on a conserved pathway function relevant to mammalian heart development. In the Discussion, how this underlying mechanistic role may be useful in the treatment of congenital heart disease should be provided.  

      Thank you for the insight. We have incorporated your comments in the discussion. 

      Item 4. Insights into the role of CDK9-Rtf1 in response to stress versus in cardiogenesis. 

      In the Discussion, the authors commented on the role of additional stress-related stimuli such as heat shock and inflammation that have been linked to CDK9 activity. However, the current ms provides the first, endogenous role of Pol2 pausing in a critical developmental step during normal cardiogenesis. The authors should emphasize the novelty and significance of their work by providing a paragraph on the state of knowledge on the molecular mechanisms governing cardiogenesis, then placing their discovery within this framework. This minor addition will also clarify the significance of this work to the broad readership of eLife. 

      Thank you for the suggestion. We have incorporated your comments and elaborated on the novelty and significance of our work in the discussion.

      Reviewer #2 (Recommendations For The Authors): 

      (1) It is difficult to assess what the overt defects are in the embryos at any stages. Images of live embryos were not included in the supplement. Do these have a small, malformed heart tube later or are the embryos just deteriorating due to broad defects?

      The Rtf1 deficient embryos do not produce nkx2.5+ cardiac progenitors. Consequently, we never observed a heart tube or detected cells expressing cardiomyocyte marker genes such as myl7. This finding is consistent with previous reports using rtf1 morphants and rtf1<sup>KT641</sup>, an ENU-induced point mutation allele (Langenbacher et al., 2011 and Akanuma et al., 2007). In this revised manuscript, we provide a live image of 2-day-old wild type and rtf1<sup>LA2679/LA2679</sup> embryos (Fig. 2 Supplement 1). After two days, rtf1 mutant embryos undergo broad cell death.

      (2) In Fig. 2, although the in situs are convincing, there is not a quantitative assessment of expression changes for these genes. This could have been done for the bulk or single cell RNA-seq experiments, but was not, and these genes were not included in the heat maps. A quantitative assessment of these genes would benefit the study.

      The top 40 most significantly differentially expressed genes are displayed in the heatmap presented in Fig.5d. The complete differential gene expression analysis results for our hand2 FACS-based comparison of rtf1 morphants and controls are presented in Supplementary Data File 1. In this revised manuscript, we provide a new supplemental figure with violin plots showing the expression levels of genes of interest in our single cell sequencing dataset (Fig.6 Supplement 5).

      (3) It does not appear that any statistical tests were used for the comparisons in Fig. 2.

      We now provide the statistical data in the legend and Fig.2 b, d, f, h and i.

      (4) It's not clear the magnifications and orientations of the embryos in Fig. 3b are the same. 

      Embryos shown in Fig.3b are at the same magnification. However, because Rtf1 mutant embryos display severe morphological defects, the orientation of mutant embryos was adjusted to examine the cardiac tissue.

      (5) The n's for analysis of MLC2v in WT Rtf1 CKO embryos in Fig. 3b are only 1. At least a few more embryos should be analyzed to confirm that the phenotype is consistent. 

      We have revised the figure and present the number of embryos analyzed and statistics in Fig.3c. 

      (6) A number of figure panels are referred to out of order in the text. Fig. 4E-G are before Fig. 4C, D, Fig. 7C before 7B, Fig. 8D-I before 8A, B. In general, it is easier for the reader if the figure panels are presented in the order they are referred to in the text.

      Revised as suggested.

      (7) While additional genes can be included, it is not clear why the same sets of genes are not examined in the bulk or single-cell RNA-seq as with the in situs or expression was analyzed in embryos. I suggest including the genes like nkx2.5, tbx20, myl7, in all the sequencing analysis. 

      We used the same set of genes in all analyses when possible. However, the low expression of genes such as nkx2.5 and myl7 in our sc-seq dataset precludes them from the clustering/trajectory analysis. In this revised manuscript, we present violin plots showing their expression in wild type and rtf1 morphants (Fig. 6 Supplement 5).

      (8) If a multiomic approach was used, why wasn't its analysis incorporated more into the manuscript? In general, a clearer presentation and deeper analysis of the single cell data would benefit the study. The integration of the RNA and ATAC would benefit the analysis.

      As addressed in our response to the reviewer’s public review, both datasets were used in clustering. Examining changes in chromatin accessibility is certainly interesting, but beyond the scope of this study. 

      (9) Many of the markers analyzed are not cardiac specific or it is not clear they are expressed in cardiac progenitors at the stage of the analysis. Hand2 has broader expression. Additional confirmation of some of the genes through in situ would help the interpretations. 

      Markers used for the in situ hybridization analysis (myl7, mef2ca, nkx2.5, tbx5a, and tbx20) are known for their critical role in heart development. For sc-seq trajectory analyses, most displayed genes (sema3e, bmp6, ttn.2, mef2cb, tnnt2a, ryr2b, and myh7bb) were identified based on their differential expression along the LPM-cardiac progenitor pseudotime trajectory. Rather than selecting genes based on their cardiac specificity, our goal was to examine the progressive gene expression changes associated with cardiac progenitor formation and compare gene expression of wild type and rtf1 deficient embryos.

      (10) Additional labels of the cell clusters are needed for Supplemental Figs. 2 and 3. 

      The cluster IDs were presented on Supplementary Figures 2 and 3. In this revised version, we added predicted cell types to the UMAP (revised Fig.6 Supplement 1) and provided an excel file with this information (revised Supplementary Table 2). 

      (11) On lines 101-102, the interpretation from the previous data is that differentiation of the LPM requires Rtf1. However, later from the single cell data the interpretation based on the markers is that Rtf1 loss affects maturation. However, it is not clear this interpretation is correct or what changed from the single cell data. If that were the case, one would expect to see maintenance of more early marks and subsequent loss of maturation markers, which does not appear to be the case from the presented data.

      Our data suggests that cardiac progenitor formation is not accomplished by simultaneously switching on all cardiac marker genes. Our pseudotime trajectory analysis highlights tnnt2a, ryr2b, and myh7bb as genes that increase in expression in a lagged manner compared to mef2cb (Fig. 6). Thus, the abnormal activation of mef2cb without subsequent upregulation of tnnt2a, ryr2b, and myh7bb in rtf1 morphants suggests a requirement for rtf1 in the progressive gene expression changes required for proper cardiac progenitor differentiation. Our single cell experiment focuses on the process of cardiac progenitor differentiation and does not provide insights into cardiomyocyte maturation. We have edited the text to clarify these interpretations. 

      (12) The interpretation that there is not "transfating" is not supported by the shown data. Analysis of markers in other tissues, again with in situ, to show spatially would benefit the study. 

      As stated in our response to the reviewer’s public review, we observed a dramatic increase of ALPM cells, but a decrease of ALPM derivatives including the cardiac lineage. We did not observe the expansion of one ALPM-derived subpopulation at the expense of the others. These observations suggest a defect in ALPM differentiation and argue against the notion that the region of the ALPM that would normally give rise to cardiac progenitors is instead differentiating into another cell type.

      (13) The rationale that sequence conservation means a gene is important (lines 137-139) is not really true. There are a lot of examples of highly conserved genes whose mutants don't have defects.

      We have revised the text to avoid confusion. 

      (14) The data showing that the 8 bp mutations do not affect the RNA transcript is not shown or at least indicated in Fig. 7. It would seem that this experiment could have been done in the mutant embryos, in which case the experiment would have been semi-blinded as the genotyping would occur after imaging.

      The modified Rtf1 wt RNA (Rtf1 wt* in revised Fig. 7) robustly rescued nkx2.5 expression in rtf1 deficient embryos, demonstrating that the 8 bp modifications do not negatively impact the activity of the injected RNA. As stated previously, morpholino knockdown was used in some experiments because our characterization of rtf1 morphants showed that they faithfully recapitulate the rtf1 mutant phenotype during the timeframe of interest.

      (15) Using a technique like PRO-seq at the same stage as the ChIP-seq would complement the ChIP-seq and allow a more detailed analysis of the transcriptional pausing on specific genes observed in WT and mutant embryos. 

      As stated in our response to the reviewer’s public review, we appreciate the suggestion but PRO-seq is beyond the scope of this study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This useful study reports that the exogenous expression of the microRNA miR-195 can partially compensate in early B cell development for the loss of EBF1, one of the key transcription factors in B cells. While this finding will be of interest to those studying lymphocyte development, the evidence, particularly with regard to the molecular mechanisms that underpin the effect of miR-195, is currently incomplete. 

      Public Reviews: 

      Reviewer #1 (Public review):

      Summary: 

      Here, the authors are proposing a role for miR-195, a microRNA that has been shown to bind and enhance the degradation of mRNA targets in the regulation of cell processes, and has a novel role in allowing the emergence of CD19+ cells in cells in which Ebf1, a critical B-cell transcription factor, has been genetically removed.

      Strengths: 

      That over-expression of miR-195 can allow the emergence of CD19+ cells missing Ebf1 is somewhat novel.

      Their data does perhaps support to a degree the emergence of a transcriptional network that may bypass the absence of Ebf1, including the FOXO1 transcription factor, but this data is not strong or definitive. 

      Weaknesses: 

      It is unclear whether this observation is in fact physiological. When the authors analyse a knockout model of miR-195, there is not much of a change in the B-cell phenotype. Their findings may therefore be an artefact of an overexpression system. 

      The authors have provided insufficient data to allow a thorough appraisal of the stepwise molecular changes that could account for their observed phenotype. 

      Reviewer #2 (Public review): 

      Summary: 

      The authors investigate miRNA miR-195 in the context of B-cell development. They demonstrate that ectopic expression of miR-195 in hematopoietic progenitor cells can, to a considerable extent, override the consequences of deletion of Ebf1, a central B-lineage defining transcription factor, in vitro and upon short-term transplantation into immunodeficient mice in vivo. In addition, the authors demonstrate that the reverse experiment, genetic deletion of miR-195, has virtually no effect on B-cell development. Mechanistically, the authors identify Foxo1 phosphorylation as one pathway partially contributing to the rescue effect of miR-195. An additional analysis of epigenetics by ATACseq adds potential additional factors that might also contribute to the effect of ectopic expression of miR-195.

      Strengths: 

      The authors employ a robust assay system, Ebf1-KO HPC, to test for B-lineage promoting factors. The manuscript overall takes on an interesting perspective rarely employed for the analysis of miRNA by overexpressing the miRNA of interest. Ideally, this approach may reveal, if not the physiological function of this miRNA, the role of distinct pathways in developmental processes. 

      Weaknesses: 

      At the same time, this approach constitutes a major weakness: It does not reveal information on the physiological role of miR-195. In fact, the authors themselves demonstrate in their KO approach, that miR-195 has virtually no role in B-cell development, as has been demonstrated already in 2020 by Hutter and colleagues. While the authors cite this paper, unfortunately, they do so in a different context, hence omitting that their findings are not original. 

      Conceptually, the authors stress that a predominant function of miRNA (in contrast to transcription factors, as the authors suggest) lies in fine-tuning. However, there appears to be a misconception. Misregulation of fine-tuning of gene expression may result in substantial biological effects, especially in developmental processes. The authors want to highlight that miR-195 is somewhat of an exception in that regard, but this is clearly not the case. In addition to miR-150, as referenced by the authors, also the miR-17-92 or miR-221/222 families play a significant role in B-cell development, their absence resulting in stage-specific developmental blocks, and other miRNAs, such as miR-155, miR-142, miR-181, and miR-223 are critical regulators of leukocyte development and function. Thus, while in many instances a single miRNA moderately affects gene expression at the level of an individual target, quite frequently targets converge in common pathways, hence controlling critical biological processes. 

      The paper has some methodological weaknesses as well: For the most part, it lacks thorough statistical analysis, and only representative FACS plots are provided. Many bar graphs are based on heavy normalization making the T-tests employed inapplicable. No details are provided regarding the statistical analysis of microarrays. Generation of the miR-195-KO mice is insufficiently described and no validation of deletion is provided. Important controls are missing as well, the most important one being a direct rescue of Ebf1-KO cells by re-expression of Ebf1. This control is critical to quantify the extent of override of Ebf1-deficiency elicited by miR-195 and should essentially be included in all experiments. A quantitative comparison is essential to support the authors' main conclusion highlighted in the title of the manuscript. As the manuscript currently stands, only negative controls are provided, which, given the profound role of Ebf1, are insufficient, because many experiments, such as assessment of V(D)J recombination, IgM surface expression, or class-switch recombination, are completely negative in controls. In addition, the authors should also perform long-term reconstitution experiments. While it is somewhat surprising that the authors obtained splenic IgM+ B cells after just 10 days, these experiments would be certainly much more informative after longer periods of time. Using "classical" mixed bone marrow chimeras using a combination of B-cell defective (such as mb1/mb1) bone marrow and reconstituted Ebf1-KO progenitors would permit much more refined analyses. 

      With regard to mechanism, the authors show that the Foxo1 phosphorylation pathway accounts for the rescue of CD19 expression, but not for other factors, as mentioned in the discussion. The authors then resort to epigenetics analysis, but their rationale remains somewhat vague. It remains unclear how miR-195 is linked to epigenetic changes. 

      Reviewer #3 (Public review): 

      Summary: 

      In this study, Miyatake et al. present the interesting finding that ectopic expression of miR-195 in EBF1-deficient hematopoietic progenitor cells can partially rescue their developmental block and allow B cells to progress to the B220+ CD19+ stage. Notably, this is accompanied by an upregulation of B-cell-specific genes and, correspondingly, a downregulation of T, myeloid, and NK lineage-related genes, suggesting that miR-195 expression is at least in part equivalent to EBF1 activity in orchestrating the complex gene regulatory network underlying B cell development. Strengthening this point, ATAC sequencing of miR-195-expressing EBF1-deficient B220+CD19+ cells and a comparison of these data to public datasets of EBF1-deficient and -proficient cells suggest that miR-195 indirectly regulates gene expression and chromatin accessibility of some, but not all regions regulated by EBF1. 

      Mechanistically, the authors identify a subset of potential target genes of miR-195 involved in MAPK and PI3K signaling. Dampening of these pathways has previously been demonstrated to activate FOXO1, a key transcription factor for early B cells downstream of EBF1. Accordingly, the authors hypothesize that miR-195 exerts its function through FOXO1. Supporting this claim, also exogenous FOXO1 expression is able to promote the development of EBF1-deficient cells to the B220+CD19+ stage and thus recapitulates the miR-195 phenotype. 

      Strengths: 

      The strength of the presented study is the detailed assessment of the altered chromatin accessibility in response to ectopic miR-195 expression. This provides insight into how miR-195 impacts the gene regulatory network that governs B-cell development and allows the formation of mechanistic hypotheses. 

      Weaknesses: 

      The key weakness of this study is that its findings are based on the artificial and ectopic expression of a miRNA out of its normal context, which in my opinion strongly limits the biological relevance of the presented work. 

      While the authors performed qPCRs for miR-195 on different B cell populations and show that its relative expression peaks in early B cells, it remains unclear whether the absolute miR-195 expression is sufficiently high to have any meaningful biological activity. In fact, other miRNA expression data from immune cells (e.g. DOI 10.1182/blood-2010-10-316034 and DOI 10.1016/j.immuni.2010.05.009) suggest that miR-195 is only weakly, if at all, expressed in the hematopoietic system. 

      The authors support their finding by a CRISPR-derived miR-195 knockout mouse model which displays mild, but significant differences in the hematopoietic stem cell compartment and in B cell development. However, they fail to acknowledge and discuss a lymphocyte-specific miR-195 knockout mouse that does not show any B cell defects in the bone marrow or spleen and thus contradicts the authors' findings (DOI 10.1111/febs.15493). Of note, B-1 B cells in particular have been shown to be elevated upon loss of miR-15-16-1 and/or miR-15b-16-2, which contradicts the data presented here for loss of the family member miR-195. 

      A second weakness is that some claims by the authors appear overstated or at least not fully backed up by the presented data. In particular, the findings that miR-195-expressing cells can undergo VDJ recombination, express the pre-BCR/BCR, and class switch need to be strengthened. It would be beneficial to include additional controls in these experiments, e.g. a RAG-deficient mouse as a reference/negative control for the ddPCR and the surface IgM staining, and cells deficient in class switching for the IgG1 flow cytometric staining. 

      Moreover, the manuscript would be strengthened by a more thorough investigation of the hypothesis that miR-195 promotes the stabilization and activity of FOXO1, e.g. by comparing the authors' ATACseq data to the FOXO1 signature. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      Miyatake et al., present a manuscript that explores the role of miR-195 in B cell development. 

      Their data suggests a role for this microRNA: 

      Using an Ebf1-knockout fetal liver model of B-cell differentiation, the authors show that a small population of CD19-expressing cells, with some evidence of V(D)J recombination and capable of class switching, can be derived by transduction of miR-195. 

      In the emergent CD19+ Ebf1-/- cells, the authors provide some evidence that Mapk and Akt3 may be miR-195 targets whose downregulation allows the FOXO1 transcription factor pathway to contribute to the emergence of CD19+ cells following miR-195 transduction. 

      Perhaps less compelling data are provided with regard to a role for miR-195 in normal B-cell development through analysis of a miR-195 knockout model. 

      While there are some interesting preliminary data presented for a role for miR-195 in the context of Ebf1-/- cells, there are some questions I think the authors could consider. 

      Comments: 

      (1-1) It is difficult to ascertain the potential role of miR-195 transduction in allowing the emergence of CD19+ cells from the data provided. miR-195 has been generally shown to destabilize mRNA transcripts by 3' UTR binding that targets mRNA transcripts for degradation. The effect of transduction of miR-195 would therefore be expected to be related to the degradation of factors opposing aspects of B-lineage specification or maintenance. I would be particularly interested in transcriptional or epigenetic regulators that may be modified in this way, at an mRNA as well as protein level.

      We appreciate the reviewerʼs thoughtful comments and agree that miRNAs often exert their effects through the degradation or translational repression of mRNAs encoding regulatory factors. In our study, we attempted to address this point by combining predictive analysis (using TargetScan and starBase) with luciferase reporter assays and qPCR to validate several potential targets of miR-195, including Mapk3 and Akt3. We acknowledge that this is not a comprehensive mechanistic analysis. We agree that a broader and systematic identification of direct targets of miR-195, particularly those involved in transcriptional and epigenetic regulation, would further clarify the mechanisms involved. However, due to limitations in resources and time, we are currently unable to perform global proteomic or ChIP-based validations. Nevertheless, our ATAC-seq and microarray data indicate that miR-195 overexpression leads to increased accessibility and expression of several key B-lineage transcription factors (Pax5, Runx1, Irf8), suggesting that miR-195 indirectly activates transcriptional programs relevant to B cell commitment. We have now clarified this limitation in the revised Discussion section (lines 505‒524), and we emphasize that our current findings represent the potential of miR-195 rather than its physiological role. We hope that this clarification addresses the concern.

      (1-2) While I acknowledge the authors have undertaken TargetScan and starBase analysis to try and predict miR-195 interactions, they do not provide a comprehensive list of putative targets that can be referenced against their cDNA data. Though they postulate Mapk3 and Akt3 as putative miR-195 targets and assay these in luciferase reporter systems (Figure 4), these were not clearly differentially regulated in the microarray data they provided (Figure 1E) as being downregulated on miR-195 transduction in Ebf1-/- cells.

      We thank the reviewer for pointing out the need for a more comprehensive list of predicted miR-195 targets. In response, we have now included a supplementary table 4 (human) and 5 (mouse) listing all putative miR-195 targets predicted by TargetScan and starBase. As noted, Mapk3 expression was indeed downregulated upon miR-195 transduction, consistent with our luciferase reporter and qPCR results. For Akt3, we observed variability in the microarray data depending on the probe used, resulting in inconsistent expression levels. We acknowledge this and have added a clarification in the revised manuscript (lines 335‒339), noting that the regulation of Akt3 by miR-195 is potentially probe-dependent and may require further validation. We hope this clarification resolves the concern.

      (1-3) The authors should provide a more comprehensive analysis of transcriptional changes induced by miR-195 Ebf1-/- specifically in the preproB cell stage of development in Ebf1-/- and miR-195 Ebf1-/- cells. The differentially expressed gene list should be provided as a supplemental file. The gene expression data should be provided for the different B-cell differentiation stages, eg. Ebf1-/- preproB cells, and Ebf1-/- miR-195 preproB cells, CD19+ cells and more differentiated subsets induced by miR-195 transduction.

      We appreciate the reviewerʼs suggestion to provide a more comprehensive transcriptomic analysis at different B-cell differentiation stages. Unfortunately, due to the limited availability of cells and technical constraints, we were unable to perform RNA-seq on miR-195 transduced Ebf1<sup>−/−</sup> pre-pro-B or CD19+ cells. However, to address this point, we referenced publicly available RNA-seq data (GEO accession: GSE92434), which includes transcriptomic profiles of Ebf1<sup>−/−</sup> pro-B cells and wild-type controls. By comparing our microarray data from miR-195 transduced Ebf1<sup>−/−</sup> cells with this dataset, we found partial restoration of expression for several key B-lineage genes, such as Pax5, Runx1, and Irf8, which are normally downregulated in the absence of EBF1. This comparison supports the notion that miR-195 partially reactivates the transcriptional network essential for B cell development. We have added this interpretation to the Discussion section (lines 528‒533).

      (1-4) More replicates (at least 3 of each genotype) are required for their Western Blots for FOXO1 and pFOXO1 (Fig 4C, D). Western blots should also be provided for other known B-lineage transcriptional regulators such as PAX5 and ERG.

      We thank the reviewer for these valuable suggestions. In response, we have now quantified and added the relative band intensities of FOXO1 and pFOXO1 from three independent experiments in the revised Figure 4C, and we include statistical analysis to support the reproducibility of these results. Additionally, as requested, we performed western blotting for PAX5 and ERG using the same samples. The results showed no significant change in these protein levels between miR-195-transduced and control Ebf1<sup>−/−</sup> cells, consistent with the modest upregulation observed in our microarray data. We have included the PAX5 and ERG western blot images in Supplementary Figure S3 and have revised the text in the Results section (lines 351‒35)

      (1-5) The authors have not shown FOXO1 binding to B-lineage genes in their Ebf1-/- miR-195 CD19+ cells by ChIP-seq or other methods such as CUT&Tag or CUT&RUN. Demonstrating direct binding to the cis-regulatory regions of "upregulated" genes in these cells would be needed to show definitively that this TF is critical for the emergence of the CD19+ cell phenotype.

      We appreciate the reviewerʼs suggestion regarding the use of ChIP-seq or related methods to demonstrate direct FOXO1 binding to cis-regulatory regions of B-lineage genes in Ebf1<sup>−/−</sup> miR-195 CD19⁺ cells. We agree that such data would provide definitive evidence of FOXO1's direct involvement in promoting the B cell-like transcriptional program. However, due to current technical limitations, including the scarcity of CD19⁺ cells derived from Ebf1<sup>−/−</sup> miR-195 transduction and the requirement for large cell numbers in ChIP-seq or CUT&RUN protocols, we were unable to perform these assays in this study. Nevertheless, our current data provide multiple lines of indirect evidence supporting the involvement of FOXO1:

      miR-195 transduction leads to reduced phosphorylation and increased accumulation of FOXO1 protein (Fig. 4C).

      Overexpression of FOXO1 in Ebf1<sup>−/−</sup> HPCs partially recapitulates the miR-195 phenotype (Fig. 4D).

      ATAC-seq data show increased chromatin accessibility at known FOXO1 target gene loci (e.g., Pax5, Runx1, Irf8) in miR-195-induced CD19⁺ cells, many of which overlap with FOXO1 motifs (Fig. 5).

      These observations collectively suggest that FOXO1 activity is functionally important for the emergence of CD19⁺ cells, even though direct binding has not been confirmed. We have added this limitation to the Discussion (lines 531‒537), and we note that future studies using FOXO1 CUT&RUN in this system would be valuable to further define the underlying mechanism.

      (1-6) The authors have not shown significant upregulation of expression of other critical B-cell regulatory transcription factors in their Ebf1-/- miR-195 CD19+ cells that could account for the emergence of these cells such as Pax5 or Erg. The legend in Figure 1E suggests for example the change in expression of Pax5 is modest if anything at best as no LogFC or western blot data is presented. 

      We thank the reviewer for raising this point. In our microarray analysis (Figure 1D, original Figure 1E), we observed that both Pax5 and Erg mRNA levels were upregulated in Ebf1<sup>−/−</sup> cells upon miR-195 transduction. Specifically, Pax5 showed an increase of approximately log₂FC 1.2, and Erg was also consistently elevated across biological replicates. These changes, although modest, were statistically significant and consistent with the upregulation of other B-lineage-associated transcription factors, such as Runx1 and Irf8. We agree that the magnitude of Pax5 upregulation is not as high as typically seen during full B cell commitment, and therefore may not have been immediately apparent in Figure 1D (original Figure 1E). To clarify this point, we have now revised the text in the Results section (lines 170‒174) to highlight the observed changes in Pax5 and Erg expression. We believe that the upregulation of these transcription factors, together with increased FOXO1 activity and changes in chromatin accessibility (Figure 5), contributes to the partial reactivation of the B cell gene regulatory network in the absence of EBF1.
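
      For readers converting between the log-scale value quoted above and a linear fold change, the arithmetic can be sketched as follows. This is only an illustration: the helper names are hypothetical and are not taken from the authors' analysis pipeline.

```python
import math

def log2fc_to_fold(log2fc):
    """Convert a log2 fold change into a linear fold change."""
    return 2.0 ** log2fc

def fold_to_log2fc(treated, control):
    """Compute log2(treated / control) from two positive expression values."""
    if treated <= 0 or control <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(treated / control)

# A log2FC of about 1.2 (the value quoted for Pax5) is roughly a 2.3-fold increase.
pax5_fold = log2fc_to_fold(1.2)
```

      On this scale, a log₂FC of 1 corresponds to a doubling of expression.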

      (1-7) Which V(D)J transcripts have been produced? A more detailed analysis other than ddPCR is required to help understand the emergence of this population that can presumably proceed through the preBCR and BCR checkpoints.

      We appreciate the reviewer's interest in understanding the nature of the V(D)J rearrangements in Ebf1<sup>−/−</sup> miR-195 CD19⁺ cells. As noted, our current data rely on droplet digital PCR (ddPCR), which was used to detect rearranged VH-JH segments in the bone marrow of engrafted mice. While this approach does not allow for detailed mapping of specific V, D, or J gene usage, it provides a sensitive and quantitative measure of V(D)J recombination activity. The detection of rearranged VH-JH fragments in miR-195-transduced Ebf1<sup>−/−</sup> cells suggests that at least partial recombination of the immunoglobulin heavy chain locus is occurring, an essential checkpoint for progression past the pro-B cell stage. Given the lack of such rearrangements in control-transduced Ebf1<sup>−/−</sup> cells, we interpret this as evidence that miR-195 enables cells to initiate the recombination process. We acknowledge the limitations of ddPCR and agree that a more detailed analysis using VDJ-seq or single-cell RNA-seq would be valuable in determining the diversity and completeness of the V(D)J transcripts produced. This is a direction we intend to pursue in future work. We have added this limitation to the Discussion section (lines 538‒543).

      (1-8) The authors reveal that the Foxo1-transduced Ebf1-/- cells (Fig. 4D) do not persist in vitro and are not detected in transplant assays (line 256), and therefore do not represent truly "rescued" B cells, suggesting that the CD19+ cells derived from miR-195-transduced Ebf1-/- progenitors have more B-cell potential. Further characterisation of this cell population is therefore warranted. For instance, can these cells be induced to undergo myeloid differentiation in myeloid cytokine conditions? What other B-lineage transcriptional regulators are expressed in this cell population that could account for VDJ recombination and expression of a B-lineage transcriptional program (see comments 1, 3, and 5), allowing transition through the preBCR and BCR checkpoints as well as class switching?

      We thank the reviewer for this insightful comment. We agree that the persistence and lineage potential of the CD19⁺ cells emerging from Ebf1<sup>−/−</sup> miR-195-transduced progenitors deserve further characterization. Although we were unable to perform additional lineage re-direction assays, our current data provide several lines of evidence suggesting that these cells are stably committed toward the B-lineage:

      Gene expression profiling revealed upregulation of multiple B cell transcriptional regulators, including Pax5, Runx1, and Irf8.

      ATAC-seq analysis showed increased chromatin accessibility at B cell‒specific loci and enrichment of motifs bound by key B-lineage factors such as FOXO1 and E2A.

      The cells express surface IgM and undergo class switch recombination to IgG1 upon stimulation, indicating successful transition through the pre-BCR and BCR checkpoints and acquisition of mature B cell functions.

      Importantly, no upregulation of myeloid- or T-lineage genes was detected in the microarray analysis, arguing against multipotency at this stage. We acknowledge that functional tests for lineage plasticity under altered cytokine conditions would provide important insights and plan to address this question in future studies. This limitation has now been noted in the revised Discussion (lines 544‒550).

      (1-9) In the original Ebf1-/- miR-195 CD19+ experiments, a wild-type control should be provided for each experiment. 

      We appreciate the reviewer's suggestion to include wild-type controls in all experiments. While we did not include wild-type samples side-by-side in every assay, we carefully designed our experiments to include biologically appropriate and informative comparisons. For example, in the bone marrow transplantation experiments (Figure 2), Ebf1<sup>−/−</sup> cells transduced with empty vector served as negative controls, clearly lacking CD19 expression, V(D)J recombination, IgM surface expression, and class switch capability. This allowed us to specifically assess the gain-of-function effects of miR-195 in the EBF1-deficient background. In several analyses, such as the ATAC-seq and microarray comparisons, we did incorporate or refer to existing wild-type datasets (e.g., GSE92434), providing context for the extent of recovery toward a WT-like profile. We agree, however, that including parallel WT controls across all experimental platforms would enhance interpretability.

      (1-10) For ATACseq data, a comparison between Ebf1-/- preproB cells and Ebf1-/- miR-195 CD19+ cells should be undertaken.

      We thank the reviewer for this important point. As suggested, we have performed a direct comparison of chromatin accessibility between Ebf1<sup>−/−</sup> pre-pro-B‒like cells (CD19<sup>−</sup>, control transduction) and Ebf1<sup>−/−</sup> miR-195‒transduced CD19⁺ cells. This comparison is shown in green in Figure 5B and represents the ATAC-seq peaks differentially accessible between these two populations.

      (1-11) I cannot agree with the authors with some of their statements such as Line 242 - "therefore miR-195 considered to have similar function with EBF1 to some extent" - how can this be the case when miR-195 is a miRNA and EBF1 is a transcription factor with pioneering transcriptional activity? Surely the effects of miR-195 must be secondary.

      We thank the reviewer for pointing out the inappropriateness of comparing miR-195 to EBF1 in terms of functional similarity. We agree that miR-195, as a microRNA, operates through post-transcriptional regulation and does not possess the pioneering transcriptional activity characteristic of EBF1. To avoid confusion or overstatement, we have removed the sentence in line 242 ("therefore miR-195 is considered to have similar function with EBF1 to some extent").

      (1-12) It is unclear whether this observation is in fact physiological. When the authors analyse a knockout model of miR-195, there is not much of a change in the B-cell phenotype. Their findings may therefore be an artefact of an overexpression system. The authors should comment on this observation in their discussion.  

      We thank the reviewer for this important observation. We agree that the mild phenotype observed in our miR-195 knockout mice suggests that miR-195 is not essential for B cell development under steady-state physiological conditions. Accordingly, we do not claim a physiological requirement for miR-195. Rather, our study demonstrates that miR-195 possesses the potential to activate a B-lineage program in the absence of EBF1 when ectopically expressed. This functional potential, rather than its endogenous necessity, is the main focus of our work. We have now clarified this distinction in the revised Discussion section (lines 551‒560), and we emphasize that our findings highlight an alternative regulatory pathway that can be artificially engaged under specific conditions.

      (1-13) I recommend the authors check spelling and grammar throughout their manuscript.

      We thank the reviewer for the suggestion. In response, we have carefully reviewed the manuscript for spelling, grammar, and clarity. Minor corrections have been made throughout the text to improve readability and ensure consistency. We hope that the revised version addresses any language-related concerns. In addition, the manuscript has been reviewed by a professional editing service to improve the language quality.

      (1-14) In general, I recommend more comprehensive primary data be presented in the manuscript or supplementary files to add value to their submission.

      We thank the reviewer for this helpful suggestion. In response, we have revised the manuscript and supplementary materials to include additional primary data wherever possible. The bar graphs have been updated to include individual data points to show variability and replicate information. Uncropped western blot images are now provided in Supplementary Figure S2. We hope these additions provide greater transparency and value to the manuscript. 

      Reviewer #2 (Recommendations for the authors): 

      I have a number of suggestions with regard to inclusion of details and controls: 

      (2-1) The authors need to provide more details on in vitro differentiation, especially culture times. 

      Thank you for your comment. The culture conditions for in vitro differentiation of Ebf1<sup>−/−</sup> hematopoietic progenitor cells are described in the Methods section (lines 648‒649) under “Culture of lineage-negative (Lin‒) cells from the fetal liver.” As stated, cells were cultured for more than 7 days under the specified conditions.

      (2-2) In Figure 1E, the authors need to provide information on statistics (FDR or similar). 

      We thank the reviewer for the suggestion. In Figure 1D (original Figure 1E, the microarray analysis), only two biological replicates were available for each condition (n = 2 per group). Due to this limited sample size, we did not perform statistical testing, as the power would be insufficient to produce reliable p-values or adjusted FDRs. Instead, we focused on genes with consistent and biologically meaningful changes in expression, and presented representative examples based on fold change values.

      (2-3) For in vivo experiments (Figure 2) the authors should comment on their use of two different recipient mouse strains despite very low n numbers. As described above, classical mixed BM chimeras would be much more informative. In these experiments, the authors should also show the formation of other lymphoid lineages. This would answer the question of whether miR-195 redirects cells to the B lineage. Most importantly, absolute numbers need to be provided, especially in conjunction with Ebf1 rescue as described above. 

      We thank the reviewer for the thoughtful and detailed suggestions regarding our in vivo experiments. Regarding the use of different recipient mouse strains, our initial intention was to perform the transplantations in BRG mice; however, due to facility restrictions and animal husbandry considerations, we had to switch to NOG mice. All in vivo experiments were performed with n = 3 per group, in accordance with ethical guidelines and efforts to minimize animal use while still ensuring reproducibility. With respect to the suggestion of mixed bone marrow chimeras, we agree that this approach can provide valuable information on lineage competitiveness. However, in our system, miR-195 confers only a very limited B cell developmental potential in Ebf1<sup>−/−</sup> progenitors. In such a setting, the inclusion of wild-type competitor cells would overwhelmingly dominate the B cell compartment, likely masking any measurable effect of miR-195. Therefore, we opted to assess the gain-of-function potential of miR-195 in a noncompetitive setting. Regarding the assessment of other lymphoid lineages, we focused our analysis on the emergence of B-lineage cells, as the frequency of CD19⁺ cells induced by miR-195 is quite low. Given this low efficiency, we consider it unlikely that miR-195 significantly alters the development of non-B lineages, and thus did not observe substantial lineage diversion effects. Our aim was not to demonstrate lineage redirection, but rather to show that miR-195 can confer partial B cell potential in the absence of EBF1.

      Finally, we acknowledge the importance of presenting absolute cell numbers. However, so few cells were recovered from the mice that we could not obtain reliable counts; we have described this in the manuscript (lines 498-501).

      (2-4) The statistics in Figure 3 are inadequate. No S.D. is provided for WT. How then was normalization performed? Student's T-test cannot be applied to ratios. 

      We thank the reviewer for highlighting the need for more appropriate statistical analysis. Due to considerable inter-batch variability in absolute measurements, we normalized the KO values to their paired WT counterparts from the same experimental batch. Specifically, for each replicate, we calculated the KO/WT ratio to control for batch-specific variation. We then applied a one-sample t-test (against a null hypothesis of ratio = 1) to determine statistical significance. We have now revised the figure to show individual ratio values for each replicate and updated the legend and Methods to clearly explain the statistical approach. We hope this addresses the concern and improves the clarity and rigor of the analysis.
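
      The normalization and test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the paired values are hypothetical, the function name is invented for this example, and only the t statistic and degrees of freedom are computed (a p-value would come from a t distribution, for instance via `scipy.stats.ttest_1samp`).

```python
import math
import statistics

def one_sample_t(values, mu0=1.0):
    """Return (t statistic, degrees of freedom) for H0: mean(values) == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    if sd == 0:
        raise ValueError("zero variance: t statistic is undefined")
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical (KO, WT) measurements, one pair per experimental batch.
# These numbers are made up for illustration and are not data from the study.
pairs = [(8.0, 10.0), (14.0, 20.0), (4.5, 5.0)]

# Normalize each KO value to its paired WT value from the same batch.
ratios = [ko / wt for ko, wt in pairs]

# Test whether the mean KO/WT ratio differs from 1 (no difference).
t_stat, df = one_sample_t(ratios)
```

      With only three replicates the test has two degrees of freedom, so the ratio has to deviate from 1 very consistently across batches to reach significance.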

      (2-5) In Figure 4A, the authors should comment on the strong repression of the Akt3UTR. 

      We appreciate the reviewer's observation regarding the strong repression observed with the Akt3 3'UTR construct. Indeed, we also noted that luciferase activity was markedly reduced in the presence of the Akt3 3'UTR, even in cells transduced with a control vector. We hypothesize that the Akt3 3'UTR contains strong post-transcriptional regulatory elements, such as AU-rich elements or binding sites for endogenous miRNAs or RNA-binding proteins, which may suppress mRNA stability or translation independent of miR-195. Alternatively, the secondary structure or length of the UTR may inherently reduce luciferase expression. We have added this limitation to the Discussion section (lines 561‒569).

      (2-6) The Western blot in Figure 4C is of insufficient quality. The authors need to provide unspliced versions of the bands including markers. 

      We thank the reviewer for this important comment. In response, we have included the unprocessed, full-length Western blot images corresponding to Figure 4C as Fig. S2. This provides a transparent view of the original data and addresses the concern about image cropping.

      (2-7) The ATACseq experiment in Figure 5 is difficult to comprehend. A simpler design including Ebf1 rescue controls would clearly improve this part. 

      We thank the reviewer for this valuable feedback. We agree that the original presentation of the ATAC-seq data may have been difficult to interpret. To address this, we have included a clear interpretation of the overlapping regions in the revised figure legend (lines 1018-1022). We hope this improves the clarity of the data and facilitates understanding of the chromatin changes mediated by EBF1 and miR-195.

      (2-8) The miR-195 KO mouse lacks validation (RT-PCR, genomic PCR) as well as a clear description of the deleted region and whether miR-497 is affected. In addition, the genetic background and number of backcrosses for the removal of potential off-target effects need to be mentioned. 

      We thank the reviewer for this important comment. The miR-195 knockout mouse was generated via CRISPR/Cas9, and Sanger sequencing confirmed a 628 bp deletion on chromosome 11 (GRCm38/mm10 chr11:70,234,425‒70,235,103). This deletion includes the entire miR-497 locus and part of the miR-195 precursor sequence. Although we do not show PCR gel images, the deletion was validated by sequencing, and the results are now clearly described in the revised Methods section (lines 607‒619). All transgenic mice in this study were backcrossed to the C57BL/6 background for at least eight generations.

      (2-9) The manuscript requires extensive editing for language. 

      We appreciate the reviewerʼs comment. The manuscript has now been revised and professionally edited for language by a native English-speaking editor. We believe clarity and readability have been significantly improved.

      Reviewer #3 (Recommendations for the authors): 

      (3-1) What is the expression level of miR-195 after viral overexpression? In Figure 4B, the authors show a 2.5-fold increase, but this appears very low for the experimental system (expression through the MDH1 retroviral construct) and the observed repressive effects (e.g. Figure 4A and B). 

      We thank the reviewer for this insightful comment. We agree that the apparent ~2.5-fold increase in miR-195 levels (Figure 4B) may seem modest in the context of retroviral overexpression and the associated functional effects. However, due to the high sequence similarity within the miR-15/16/195/497 family, it is technically challenging to measure mature miR-195 levels with complete specificity. The baseline signal observed in control samples likely reflects cross-reactivity with endogenous miRNAs such as miR-497 or miR-16, which share similar seed sequences. Therefore, the reported fold-change may underestimate the true level of ectopic miR-195 expression. Despite this, we observed robust repression of validated targets (e.g., Mapk3, Akt3) in both qPCR and luciferase assays, indicating that functionally effective levels of miR-195 were achieved. We have now clarified this limitation and interpretation in the revised Results section (lines 332‒335).

      (3-2) In alignment with the transparency of the data, I would encourage the authors to display the individual data points for all bar graphs. 

      We thank the reviewer for this helpful suggestion. In the revised manuscript, we have updated bar graphs to include individual data points to increase transparency and allow better visualization of data variability. In the ddPCR experiments, we provided the raw data in Fig. S1 for full transparency. For Fig. 1A, we re-examined miR-195 expression profiles using the deposited data that the reviewer suggested, but miR-195 expression was much lower than we expected. We also performed scRNA-seq on hematopoietic lineage cells from 8-week-old C57BL/6 mice, but could not reproduce the miR-195 expression profiles. Therefore, we concluded that the original result was an artifact caused by the miR-195 probe used for qPCR, and deleted Fig. 1A.

      (3-3) The references appear to be compromised. For example, the authors state that "The Ebf1−/+ mouse was originally generated by R. Grosschedl (39)" (line 297), but this is not the respective paper. Likewise, the knockout mouse was generated "based on the CRISPR/Cas9 system established by C. Gurumurthy (40)" (line 299), but he/she is not involved in the referenced study. 

      We thank the reviewer for pointing out the discrepancies in the reference citations. Upon revising the Methods section to integrate it with the main text, the reference numbering became misaligned. We have corrected the references in the revised manuscript, and we thank the reviewer for bringing this to our attention.

      (3-4) Given that the miRNA TaqMan assays the authors used here have difficulty discriminating closely related miRNAs such as miR-16 (highly expressed in the hematopoietic system) and miR-195, I would suggest that the authors test their qPCR in an appropriate setup, e.g. in their knockout mouse model. In this context, did the authors use another small RNA as a reference for the qPCR analysis? In the methods, only GAPDH is mentioned, but in my opinion, another RNA that uses the same stem-loop-based cDNA synthesis protocol would be better suited.

      We thank the reviewer for this valuable and technically insightful comment.

      As correctly pointed out, TaqMan-based qPCR assays for miRNAs such as miR-195 can show cross-reactivity with closely related family members, particularly miR-16, which is abundantly expressed in hematopoietic cells. Indeed, due to this limitation, we do not treat the qPCR results shown in the original Figures 1A and 4B as definitive quantification of miR-195 expression. Rather, these data are used to provide a suggestion and a rough estimate of overexpression efficiency, while our core functional analyses rely on phenotypic and molecular outcomes such as target gene repression and lineage emergence. With this in mind, although we acknowledge that a small RNA reference based on the same stem-loop cDNA synthesis would offer a more compatible normalization in principle, the inherent variability and lack of absolute specificity in such assays also limit their interpretive value. Therefore, we used GAPDH as a normalization control for consistency with other qPCR analyses in the manuscript. We have now clarified this rationale and limitation in the revised Methods section (lines 712‒716), and we thank the reviewer again for highlighting this important technical consideration.

      (3-5) The Western blot data used to support the hypothesis that FOXO1 phosphorylation is reduced upon overexpression of miR-195 are not convincing. The authors should not crop everything but the band. 

      We thank the reviewer for the helpful comment. In response, we have now provided the full-length, uncropped Western blot images corresponding to Figure 4C, including both total FOXO1 and phospho-FOXO1 blots. These images are included in Fig. S2.

    1. Author response:

      The following is the authors’ response to the original reviews

      Comment from the editors at eLife:

      You could consider further strengthening the manuscript with the incorporation of new relevant public datasets for network modeling, but that is entirely your choice.

      We thank the editors and reviewers for their thoughtful and positive feedback on our article. We are particularly appreciative of the eLife assessment describing our work as valuable with a convincing methodology.

      As suggested, we have expanded our neuron class analysis by incorporating transcriptomic data from young adult animals (Kaletsky et al., 2016 Nature; Ghaddar et al., 2023 Science Advances; St Ange et al., 2024 Cell Genomics) to complement our existing analysis of larval stage 4 (L4) animals.

      In addition, we have updated Table S1 to include the outcross status of all strains used in this study, providing clearer information on the genotypes tested. We have also corrected the typographical errors noted by the reviewers. Please note that page and line numbers below refer to the MS Word Document with tracked changes set to ‘simple markup’.

      We greatly appreciate the reviewers’ input and hope these revisions further enhance the value and clarity of our study.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Rahmani et al. utilize the TurboID method to characterize global proteome changes in the worm's nervous system induced by a salt-based associative learning paradigm. Altogether, they uncover 706 proteins tagged by the TurboID method in worms that underwent the memory-inducing protocol. Next, the authors conduct a gene enrichment analysis that implicates specific molecular pathways in salt-associative learning, such as MAP kinase and cAMP-mediated pathways, as well as specific neuronal classes including pharyngeal neurons, and specific sensory neurons, interneurons, and motor neurons. The authors then screen a representative group of hits from the proteome analysis. They find that mutants of candidate genes from the MAP kinase pathway, namely dlk-1 and uev-3, do not affect performance in the learning paradigm. Instead, multiple acetylcholine signaling mutants, as well as a protein-kinase-A mutant, significantly affected performance in the associative memory assay (e.g., acc-1, acc-3, lgc-46, and kin-2). Finally, the authors demonstrate that protein-kinase-A mutants, as well as acetylcholine signaling mutants, do not exhibit a phenotype in a related but distinct conditioning paradigm (aversive salt conditioning), suggesting their effect is specific to appetitive salt conditioning.

      Overall, the authors addressed the concerns raised in the previous review round, including the statistics of the chemotaxis experiments and the systems-level analysis of the neuron class expression patterns of their hits. I also appreciate the further attempt to equalize the sample size of the chemotaxis experiments and the transparent reporting of the sample size and statistics in the figure captions and Table S9. The new results from the pan-neuronal overexpression of the kin-2 gain-of-function allele also contribute to the manuscript. Together, these make the paper more compelling. The additional tested hits provide a comprehensive analysis of the main molecular pathways that could have affected learning. However, the revised manuscript includes more information and analysis, raising additional concerns.

      Major comments:

      As reviewer 4 noted, and as also shown to be relevant for C30G12.6 presented in Figure 6, the backcrossing of the mutants is important, as background mutations may lead to the observed effects. Could the authors add to Table 1, sheet 1, the outcrossing status of the tested mutants?

      We appreciate this important point. A column has now been added to Table S1 to indicate the outcross status of all strains used in this study. Additionally, we have updated the table legend on page 77 to clarify how to interpret the information provided in this column.

      It is important to validate that the results of the positive hits (where learning was affected), such as acc-1, acc-3, and lgc-46, do not stem from background mutations.

      While we agree that confirming the absence of background mutations is important, we have taken alternative steps to address this concern:

      - The outcross status of each strain is now clearly indicated in Table S1.

      - Observed phenotypes were consistent across multiple biological replicates over extended periods (months, sometimes years), reducing the likelihood that results stem from background mutations.

      We believe these measures provide confidence in the validity of our findings.

      The fold change in the number of hits for different neurons in the CENGEN-based rank analysis requires a statistical test (discussed on pages 17-19 and summarized in Table S7). Similar to the other gene enrichment analyses presented in the manuscript, the new rank analysis also requires a statistical test. Since the authors extensively elaborate on the results from this analysis, I think a statistical analysis is especially important for its interpretation. For example, if considering the IL1 neurons, which ranked highest, and assuming random groups of genes, each having the same size as those of the ranked neurons (209 genes in total for IL1 in Table S7), how common would it be to get the calculated fold change of 1.38 or higher? Such bootstrapping analysis is common for enrichment analysis. Perhaps the authors could consult with an institutional expert (Dr. Pawel Skuza, Flinders University) for the statistical aspects of this analysis.

      We appreciate the suggestion and agree that statistical testing can be valuable for enrichment analyses. However, implementing additional tests such as bootstrapping is beyond the scope of this study. Our aim was to provide a descriptive overview rather than inferential statistics. To ensure transparency and interpretability, we have:

      - Clearly reported fold changes and rankings in Table S7.

      - Discussed the limitations of this approach in the manuscript text (page 18, lines 17–20).

      - Clearly outlined the methods used to perform this analysis (pages 53–54).

      We believe this descriptive analysis provides sufficient context for interpreting these results.
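
      For readers who want to see what the resampling test raised by the reviewer could look like, a minimal sketch follows. It is not the analysis performed in the manuscript. It assumes, purely for illustration, that the fold change is the observed number of hits expressed in a neuron class divided by the number expected by chance, and that random gene sets of the same size as the hit list are drawn from a background gene universe. The function name, argument names, and toy gene identifiers are hypothetical.

```python
import random


def enrichment_p_value(universe, neuron_genes, hits, n_draws=10_000, seed=0):
    """Empirical p-value for the overlap between a hit list and a neuron class.

    Assumptions (not taken from the manuscript):
      - `universe` is the background set of genes that could have been hits;
      - `neuron_genes` is the set of genes expressed in one neuron class;
      - fold change = observed overlap / overlap expected by chance.
    """
    rng = random.Random(seed)
    universe = sorted(set(universe))   # fixed order so draws are reproducible
    neuron_genes = set(neuron_genes)
    hits = set(hits)

    observed = len(hits & neuron_genes)
    expected = len(hits) * len(neuron_genes) / len(universe)
    fold_change = observed / expected

    # Draw random gene sets of the same size as the hit list and count how
    # often their overlap with the neuron class is at least the observed one.
    at_least_as_extreme = 0
    for _ in range(n_draws):
        sample = rng.sample(universe, len(hits))
        overlap = sum(1 for gene in sample if gene in neuron_genes)
        if overlap >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the empirical p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_draws + 1)
    return fold_change, p_value


if __name__ == "__main__":
    # Toy data: 1000 background genes, 100 of them expressed in the neuron class.
    genes = [f"g{i}" for i in range(1000)]
    neuron = genes[:100]
    enriched_hits = genes[:50]         # every hit is expressed in the neuron class
    fc, p = enrichment_p_value(genes, neuron, enriched_hits, n_draws=1000)
    print(f"fold change = {fc:.2f}, empirical p = {p:.4f}")
```

      With many neuron classes tested in parallel, the resulting p-values would also need a multiple-comparison correction, which this sketch does not include.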

      The learning phenotypes from Figure S8, concerning acc-1, acc-3, and lgc-46 mutants, are summarized in a scheme in Figure 4; however, the chemotaxis results are found in the supplemental Figure S8. Perhaps I missed the reasoning, but for transparency, I think the relevant Figure S8 results should be shown together with their summary scheme in Figure 4.

      Thank you for this suggestion to improve clarity. We have now moved the panels corresponding to cholinergic signalling components from Figure S8 into Figure 4 on page 21, so that the summary scheme and underlying data are presented together. The figure legends and main text have been updated accordingly to reflect the correct figure numbers.

      Reviewer #2 (Public review):

      Summary:

      In this study by Rahmani and colleagues, the authors sought to define the "learning proteome" for a gustatory associative learning paradigm in C. elegans. Using a cytoplasmic TurboID expressed under the control of a pan-neuronal promoter, the authors labeled proteins during the training portion of the paradigm, followed by proteomics analysis. This approach revealed hundreds of proteins potentially involved in learning, which the authors describe using gene ontology and pathway analysis. The authors performed functional characterization of over two dozen of these genes for their requirement in learning using the same paradigm. They also compared the requirement for these genes across various learning paradigms and found that most hits they characterized appear to be specifically required for the training paradigm used for generating the "learning proteome".

      Strengths:

      The authors have thoughtfully and transparently designed and reported the results of their study. Controls are carefully thought-out, and hits are ranked as strong and weak. By combining their proteomics with behavioral analysis, the authors also highlight the biological significance of their proteomics findings, and support that even weak hits are meaningful.

      The authors display a high degree of statistical rigor, incorporating normality tests into their behavioral data which is beyond the field standard.

      The authors include pathway analysis that generates interesting hypotheses about processes involved in learning and memory.

      The authors generally provide thoughtful interpretations for all of their results, both positive and negative, as well as any unexpected outcomes.

      Weaknesses:

      - The authors use the CeNGEN single-cell transcriptomic atlas to predict where the proteins in the "learning proteome" are likely to be expressed, and use these data to identify neurons that are likely significant to learning and to build a hypothetical circuit. This is an excellent idea; however, the CeNGEN dataset only contains transcriptomic data from juvenile L4 animals, while the authors performed their proteome experiments in Day 1 Adult animals. It is well documented that the C. elegans nervous system transcriptome is significantly different between these two stages (Kaletsky et al., 2016, St. Ange et al., 2024), so the authors might be missing important expression data, resulting in inaccurate or incomplete networks. The adult neuronal single-cell atlas data (https://cestaan.princeton.edu/) would be better suited to incorporate into neuronal expression analysis.

      Thank you for highlighting this important point. We have now incorporated transcriptomic data from young adult animals to complement the L4-based CeNGEN dataset. Specifically, we integrated data from CeSTAAN (https://cestaan.princeton.edu/, including St. Ange et al., 2024) and WormSeq (https://wormseq.org/, including Ghaddar et al., 2023), as outlined below. Importantly, CeSTAAN and WormSeq provide data for 79 and 104 neuron classes, respectively (compared to 128 from CeNGEN); for this reason, the main analysis focuses on CeNGEN due to its broader coverage, with additional datasets noted in brackets for completeness. This is stated on page 18, lines 15–17 to ensure transparency regarding our rationale.

      The main text has been updated to describe these datasets and their integration into our analysis (pages 18–20), and further details on how these resources were used have been added to the Experimental Procedures (pages 53–54).

      We also incorporated data from Kaletsky et al. (2016) and St. Ange et al. (2024) into our neuron identity checks for all assigned and unassigned hits (page 16, lines 8–19). This analysis shows that the nervous system is highly represented in our proteome data: 75–87% of assigned hits and 75–83% of all hits correspond to neuron-enriched genes identified by St. Ange et al. and Kaletsky et al.

      In addition, we used several transcriptomic databases to confirm that learning regulators identified in this study through TurboID and validation experiments are expressed in the same neuron classes as suggested by CeNGEN (page 36).

      - The authors offer many interpretations for why mutants in "learning proteome" hits have no detectable phenotype, which is commendable. They are, however, overlooking another important interpretation: it is possible that these changes to the proteome are important for memory, which is dependent upon translation and protein level changes, and is molecularly distinct from learning. It is well established in the field that mutating or knocking down memory regulators in other paradigms will often have no detectable effect on learning. Incorporating this interpretation into the discussion and highlighting it as an area for future exploration would strengthen the manuscript.

      Thank you for this suggestion. We have incorporated this interpretation into the Results section (page 31, lines 17–23), specifying the potential role of these proteomic changes in memory encoding and retention, which are molecularly distinct from learning.

      - A minor weakness: in the discussion, the authors state that Lakhina et al., 2015 used RNA-seq to assess memory transcriptome changes. This study used microarray analysis.

      This has been corrected on page 38, line 5.

      Significance:

      The approach used in this study is interesting and has the potential to further our knowledge about the molecular mechanisms of associative behaviors. There have been multiple transcriptomic studies in the worm looking at gene expression changes in the context of behavioral training. This study complements and extends those studies by examining how the proteome changes in a different training paradigm. The approach could be employed for multiple different training paradigms, presenting a new technical advance for the field. This paper would be of interest to the broader field of behavioral and molecular neuroscience. Though it uses an invertebrate system, many findings in the worm regarding learning and memory translate to higher organisms, making this paper of interest and significant to the broader field of behavioral neuroscience.

      Reviewer #4 (Public review):

      Summary:

      In this manuscript, the authors used a learning paradigm in C. elegans: when worms are fed on a saltless plate, their chemotaxis to salt is greatly reduced. To identify learning-related proteins, the authors employed nervous system-specific proteome analysis to compare whole proteins in neurons between high-salt-fed animals and saltless-fed animals. The authors identified "learning-specific proteins", which are observed only after saltless feeding. They categorized these proteins by GO analyses, pathway analyses and expression site analyses, and went on to test mutants in selected genes identified by the proteome analysis. They find several mutants that are defective or hyper-proficient for learning, including acc-1/3 and lgc-46 acetylcholine receptors, F46H5.3 putative arginine kinase, and kin-2, a cAMP pathway gene. These mutants were not previously reported to have abnormalities in the learning paradigm.

      Concerns:

      Upon revision, authors addressed all concerns of this reviewer, and the results are now presented in a way that facilitates objective evaluation. Authors' conclusions are supported by the results presented, and the strength of the proteomics approach is persuasively demonstrated.

      Thank you, we appreciate this positive feedback.

      Significance:

      (1) Total neural proteome analysis has not been conducted before for learning-induced changes, though transcriptome analysis has been performed for odor learning (Lakhina et al., http://dx.doi.org/10.1016/j.neuron.2014.12.029). This warrants the novelty of this manuscript, because for some genes, protein levels may change even though mRNA levels remain the same. Although in a few reports TurboID has been used in C. elegans, this is the first report of a systematic analysis of tissue-specific differential proteomics.

      (2) The authors found five mutants that have abnormalities in salt learning. These genes have not previously been described as having this abnormality, providing novel knowledge to readers, especially those who work on C. elegans behavioural plasticity. In particular, the involvement of acetylcholine neurotransmission has not been addressed before. Although transgenic rescue experiments have not been performed except for kin-2, and the site of action (neurons involved) has not been tested in this manuscript, the work opens an avenue to further determine the way in which acetylcholine receptors, the cAMP pathway, etc. influence the learning process.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors stated in their response to reviewers that "referring to a phenotype as both a trend and non-significant may confuse readers, which was originally stated in the manuscript in two locations," and that such sentences were removed. Unfortunately, in the new text (page 28, lines 18-19), the authors write: "uev-3 mutants showed a lower average CI after training compared with wild-type, but this did not reach statistical significance." As stated before, I find such sentences confusing and not interpretable. If the changes are not significant, then the lower average CI is not informative.

      Thank you for pointing this out. This has been corrected to improve clarity – we say instead that “trained phenotypes between wild-type and uev-3 mutants were not statistically significant” (page 29, lines 21–22).

      In response to reviewers' comments, the authors added more information about the biotinylation efficiency of the experiment, which is also described in the text:

      Page 8, line 27: "we found that biotin exposure increased the signal 1.3-fold for non-Tg and 1.7-fold for TurboID C. elegans."

      Page 10, line 4: "Quantification of the signal within entire lanes showed a 1.1-fold increase in the 'TurboID, control' lane compared with the 'non-Tg, control' lane, and a 1.9-fold increase in the 'TurboID, trained' lane compared with the 'non-Tg, trained' lane."

      Is it common in this field not to show the actual raw quantified numbers? I was expecting either a bar graph or instead that the measured values would appear in the text alongside the fold-change information.

      Table S2 (and its table legend on page 77) have been edited to include raw area values.

      Figure 5: Typo? - "pan neuronal expression of ..." The allele number is written as 139, but I believe it should be 179, as in the rest of the paper.

      The typo has been corrected on page 25.

      The results describing the absence of a learning phenotype in backcrossed C30G12.6 are presented in the main figure. If the authors believe this is an important result, I understand keeping it in the main figure; however, I find this uncommon.

      Thank you for your comment. We consider the absence of a learning phenotype in backcrossed C30G12.6 to be an important control for interpreting the original findings, which is why we have retained it in the main figure.

      Reviewer #4 (Recommendations for the authors):

      I noted a few typos.

      (1) In Fig 5B, the transgene is depicted kin-2(ce139) but it is probably kin-2(ce179).

      The typo has been corrected on page 25.

      (2) In text, R97C and ce179 are used interchangeably, but in fact there is no description that they are identical.

      We now state the following in the manuscript: “We tested worms with the ce179 mutant allele in kin-2, in which a conserved residue in the inhibitory domain (which normally functions to keep PKA turned off in the absence of cAMP) is mutated to cause an R92C amino acid change – this results in increased PKA activity (Schade et al., 2005).” (page 25, lines 1–3).

      (3) p31 line 7, Figure S7 -> Fig S9 C-E

      We apologise for this typographical error. This figure number is meant to correspond to salt associative learning assay data (Fig. S8), not salt aversive learning (Fig. S9). Since the data from Fig. S8 was moved to Fig. 4, the figure citation has been changed from Fig. S7 (which was incorrect) to Fig. 4 (page 32, line 17).

      (4) p45 line 11, Fig S9 -> Fig S6

      The typo has been corrected (page 47, line 12).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Bisht et al. investigate the role of PPE2, a Mycobacterium tuberculosis (Mtb) secreted virulence factor, in adipose tissue physiology during tuberculosis (TB) infection. Previous work by this group established the significance of PPE proteins in Mtb virulence and their role in modulating the innate immune response. Here, the authors present compelling evidence that PPE2 regulates host cell adipogenesis and lipolysis, thereby establishing a link to the development of insulin resistance during TB infection. These fundamental findings demonstrate, for the first time, that a bacterial virulence factor is directly involved in the profound body fat loss, or "wasting," which is a long-established clinical symptom of active TB.

      Key Strengths:

      The confidence in the major findings of this study is significantly strengthened by the authors' comprehensive approach. They judiciously employ multiple experimental systems, including:

      (1) Purified PPE2 protein.

      (2) A non-pathogenic Mycobacterium strain engineered to express PPE2.

      (3) A pathogenic clinical Mtb strain (CDC1551) utilizing a targeted PPE2 deletion mutant.

      (4) While the presence of Mtb in adipose tissues in human and animal models is well-documented, this study is groundbreaking in demonstrating that an Mtb virulence-associated factor actively modulates host fatty acid metabolism within the adipose tissue.

      We thank the reviewer for appreciating that this work demonstrates, for the first time, that an Mtb virulence factor is directly linked to TB-associated wasting.

      Weakness:

      Although the manuscript provides solid evidence associating the presence of PPE2 with transcriptional changes in host fatty acid machinery within the adipose tissue, the underlying mechanistic details remain elusive. A focused, deep mechanistic follow-up study will be essential to fully appreciate the complex biological implications of the findings reported here.

      We agree with the reviewer that a focused, in-depth mechanistic follow-up study is necessary to further elucidate the complex biological implications of PPE2 actions. However, we believe that we have uncovered at least one of the possible mechanisms by which PPE2 increases lipolysis and circulating free fatty acids during infection, namely by targeting the cAMP-PKA-HSL pathway (Figure 7). In future studies we will aim to dissect the mechanisms by which PPE2 triggers hyperglycaemia and insulin resistance.

      Reviewer #2 (Public review):

      Summary:

      In the manuscript entitled "The PPE2 protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis", the authors identify PPE2, a secretory protein of Mycobacterium tuberculosis, as a modulator of adipose function. They show that PPE2 treatment in mice causes fat loss, immune cell infiltration into adipose, reduced gene expression of PPAR-γ, C/EBP-α, and adiponectin, and glucose intolerance. Overall, the authors link PPE2 with adipose tissue perturbation and insulin resistance following infection with M. tuberculosis.

      Strengths:

      While it is known that M. tuberculosis persists in adipose, the mycobacterial factors contributing to adipose dysfunction are unknown. The study uses multiple mechanisms, including recombinant purified protein, non-pathogenic mycobacterium expressing PPE2, and a clinical strain of M. tuberculosis depleted of PPE2, to show that PPE2 may play an important role in causing fat loss, lipolysis, and insulin resistance following infection. The authors show that PPE2, through unknown mechanisms, decreases gene expression of proteins involved in adipogenesis. Although the mechanisms are unclear, this study advances the field as it is the first to identify a secreted factor (PPE2) from M. tuberculosis to play a role in disrupting adipose tissue.

      We thank the reviewer for his appreciation of our findings presented in the manuscript.

      Weaknesses:

      (1) There is a lack of completeness amongst the figures that greatly diminishes the claims and impact of the manuscript. For example, in Figures 2 and 5, the authors measure adipocyte area in H&E-stained adipose tissue to show adipose hypertrophy. However, this was not completed in Figures 3 and 4 despite the authors claiming that treatment with rPPE2 induces adipose hypertrophy. It is unclear why the adipocyte area was not measured in these figures, and having this included would support the authors' claim and strengthen the manuscript. The same is true for immune cell infiltration, where the authors say there is increased immune cell infiltration following PPE2 treatment. This is based on H&E staining, but the data supporting this is limited. Although the authors measure CD3+ T cell infiltration in adipose tissue from mice infected with the clinical strain where PPE2 was depleted, staining was performed in only this experiment. Completing these experiments by showing data to support that PPE2 induces immune cell infiltration would greatly strengthen the manuscript.

      As suggested by the reviewer, in the revised manuscript we will attempt to analyse adipocyte area in both Figures 3 and 4. In the original manuscript, immune cell infiltration analyses (H&E staining and CD3+ staining) were restricted to the M. tuberculosis mouse infection model, which best reflects human tuberculosis pathology. Immune cell infiltration studies will also be carried out in the other experiments involving infection with M. smegmatis expressing PPE2.

      (2) The authors state that a Student's t-test was performed to calculate the significance between two samples. However, there is no discussion of what statistical method was used when there were more than 2 groups, which occurs throughout the manuscript, such as in Figure 5, where 4 groups are analyzed. Having the appropriate statistical analysis is important for the impact of the manuscript.

      We agree with the reviewer that we omitted ANOVA from the statistical analyses. We will include one-way ANOVA where more than two groups are present and state the statistical methods in the figure legends as well as in the text of the revised manuscript.
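
      As an illustration of the one-way ANOVA mentioned above, the following sketch computes the F statistic and its degrees of freedom for an arbitrary number of groups using only the Python standard library. It is a generic textbook calculation, not the authors' analysis pipeline; the function name and the toy group values are hypothetical, and converting F into a p-value would additionally require the F distribution from a statistics package.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA.

    `groups` is a list of lists, one list of measurements per group.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Toy values for three hypothetical groups.
    toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
    f_stat, df_b, df_w = one_way_anova_f(toy_groups)
    print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

      A significant F statistic only indicates that at least one group differs; pairwise comparisons would still need a post hoc test with multiple-comparison correction.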

      Reviewer #3 (Public review):

      Summary:

      In this manuscript titled "The PPE protein of Mycobacterium tuberculosis is responsible for the development of hyperglycemia and insulin resistance during tuberculosis", Bisht et al. describe that PPE2 protein from Mtb is a key modulator of adipose tissue physiology that contributes to the development of insulin resistance. The authors have used 3T3-L1 preadipocyte cell lines, an M. smegmatis overexpression strain, a mouse model, and genetically modified Mtb deletion strains to demonstrate that PPE promotes persistence in adipose tissue and regulates glucose homeostasis. Using qPCR and RNA-seq experiments, the authors demonstrate that PPE2 regulates the expression of key genes involved in adipogenesis.

      Strengths:

      Using purified protein, the authors show that PPE2 regulates adipose tissue physiology, and this effect was neutralised in the presence of anti-PPE2. The expression of several adipogenic markers was also reduced in 3T3-L1 adipocytes treated with rPPE2 and in mice infected with M. smegmatis strains overexpressing PPE2. Using a mouse model of infection, the authors show that PPE2 contributes to enhanced mycobacterial survival within fat tissues. The authors also show infiltration of immune cells in the fat tissues of mice infected with wild-type and ppe2-complemented strains compared to the ppe2 KO strain. In order to gain a better mechanistic understanding of how PPE2 regulates adipogenesis, the authors employed an RNA-seq approach and identified 191 genes that were significantly differentially expressed in the fat tissues of mice infected with wild-type and ppe2 KO Mtb strains. The differentially expressed genes included transcripts encoding proteins involved in chemokine/cytokine signalling and the ER stress response. The expression of a few of these markers was also validated by qPCR and western blot analysis. Finally, the authors also show that PPE2 promotes lipolysis by reducing phosphodiesterase levels and activating PKA-HSL signalling. The experimental design is overall reasonable, and the methods used are reliable. Overall, the current study did provide some new information on the contribution of PPE2 in regulating adipose tissue physiology.

      We thank the reviewer for encouraging comments about the manuscript.

      Weaknesses:

      (1) The authors have used several methodologies to show that PPE2 regulates adipose tissue physiology and glucose homeostasis. But the exact mechanism is still not clear.

      We have clearly demonstrated that PPE2 inhibits PPAR-γ and C/EBP-α expression to block adipogenic differentiation. Further, we demonstrated a possible mechanism by which PPE2 triggers lipolysis via activation of ER stress and the cAMP/PKA/HSL pathway, which is responsible for increasing free fatty acids in circulation (Figure 7). This is confirmed by our observation that mice infected with PPE2KO (ppe2 knock-out) Mtb had lower NEFA than those infected with wild-type Mtb (Figure 7F). Crucially, we showed that this mechanism is clinically relevant, since NEFA levels in the sera of TB patients were higher than in healthy controls (Figure 7G), confirming the presence of dyslipidemia in TB patients, which is an established risk factor for insulin resistance (Karpe et al., 2011; Bhattacharya et al., 2007). As increased free fatty acids have been linked to the development of insulin resistance in several studies, this mechanism links PPE2 with the regulation of glucose homeostasis.

      (2) Mtb encodes several PE/PPE proteins. The authors have used PPE2 for their study. Will secretory PPE2 homologs also regulate similar cellular processes?

      It is known that Mtb encodes several PE/PPE family proteins, and some of these have been implicated in host–pathogen interactions (Mukhopadhyay and Balaji, 2011; Dahiya et al., 2025). However, so far only PPE2 has been shown to be present in the circulation (Bisht et al., 2023), which is the main reason we chose it for this study. Whether PPE2 homologues are present in the circulation is not known so far.

      (3) How do the authors rule out that the differences observed in the fat tissues of mice infected with wild-type and mutant strains are associated with reduced bacterial burdens? Is it possible to include another Mtb attenuated strain as a control in mice experiments for a few critical experiments?

      We agree with the reviewer that differences in bacterial burden can influence host tissue responses. Precisely for this reason, we did not rely on one infection model alone. We used a multi-pronged approach to decouple the effects of PPE2 from the effects of bacterial load:

      (1) In vitro model using recombinantly purified PPE2 protein (rPPE2) (Figure 1): In cultured 3T3-L1 adipocytes, purified rPPE2 protein directly inhibited adipogenesis by downregulating important factors such as PPAR-γ, C/EBP-α and fatty acid synthase (which play a critical role in triglyceride metabolism), demonstrating a direct effect of PPE2 in the complete absence of infection.

      (2) Recombinant protein injection (Figure 3): By injecting recombinantly purified PPE2 protein (rPPE2) into mice, we observed similar metabolic perturbations (fat loss, impaired glucose tolerance) in the complete absence of any bacteria, demonstrating that PPE2 can drive these phenotypes independent of bacterial burden. Furthermore, the rescue from PPE2 action observed in rPPE2-immunized mice strongly confirms the specific role of PPE2 in establishing hyperglycaemia and insulin resistance (Figure 4).

      While the Mtb aerosol model can be questioned for bacterial load effects, it provides crucial in vivo validation that PPE2 function is relevant in the context of mycobacterial infection.

      References

      Bhattacharya S, Dey D, Roy SS. Molecular mechanism of insulin resistance. J Biosci. 2007 Mar;32(2):405-13. doi: 10.1007/s12038-007-0038-8. PMID: 17435330.

      Bisht MK, Pal R, Dahiya P, Naz S, Sanyal P, Nandicoori VK, Ghosh S, Mukhopadhyay S. The PPE2 protein of Mycobacterium tuberculosis is secreted during infection and facilitates mycobacterial survival inside the host. Tuberculosis (Edinb). 2023 Dec;143:102421. doi: 10.1016/j.tube.2023.102421. Epub 2023 Oct 12. PMID: 37879126.

      Dahiya P, Bisht MK, Mukhopadhyay S. Role of PE family of proteins in mycobacterial virulence: Potential on anti-TB vaccine and drug design. Int Rev Immunol. 2025; 44(4):213-228. doi: 10.1080/08830185.2025.2455161. Epub 2025 Jan 31. PMID: 39889764.

      Karpe F, Dickmann JR, Frayn KN. Fatty acids, obesity, and insulin resistance: time for a reevaluation. Diabetes. 2011 Oct;60(10):2441-9. doi: 10.2337/db11-0425. PMID: 21948998; PMCID: PMC3178283.

      Mukhopadhyay S, Balaji KN. The PE and PPE proteins of Mycobacterium tuberculosis. Tuberculosis (Edinb). 2011 Sep;91(5):441-7. doi: 10.1016/j.tube.2011.04.004. Epub 2011 May 6. PMID: 21527209.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Syed et al. investigate the circuit underpinnings for leg grooming in the fruit fly. They identify two populations of local interneurons in the right front leg neuromere of the ventral nerve cord, i.e. 62 13A neurons and 64 13B neurons. Hierarchical clustering analysis identifies 10 morphological classes for each population. Connectome analysis reveals their circuit interactions: these GABAergic interneurons provide synaptic inhibition either between the two subpopulations, i.e. 13B onto 13A, or among each other, i.e. 13As onto other 13As, and/or onto leg motoneurons, i.e. 13As and 13Bs onto leg motoneurons. Interestingly, 13A interneurons fall into two categories, with one providing inhibition onto a broad group of motoneurons, being called "generalists", while others project to only a few motoneurons, being called "specialists". Optogenetic activation and silencing of both subsets strongly affects leg grooming. Likewise, activating or silencing subpopulations, i.e. 3 to 6 elements of the 13A and 13B groups, has marked effects on leg grooming, including frequency and joint positions, and can even interrupt leg grooming. The authors present a computational model with the four circuit motifs found, i.e. feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. This model can reproduce relevant aspects of the grooming behavior.

      Strengths:

      The authors succeeded in providing evidence for neural circuits interacting by means of synaptic inhibition to play an important role in the generation of a fast rhythmic insect motor behavior, i.e. grooming. Two populations of local interneurons in the fruit fly VNC comprise four inhibitory circuit motifs of neural action and interaction: feed-forward inhibition, disinhibition, reciprocal inhibition and redundant inhibition. Connectome analysis identifies the similarities and differences between individual members of the two interneuron populations. Modulating the activity of small subsets of these interneuron populations markedly affects generation of the motor behavior thereby exemplifying their important role for generating grooming. The authors carefully discuss strengths and limitations of their approaches and place their findings into the broader context of motor control.

      We thank the reviewer for their thoughtful and constructive evaluation of our work.

      Weaknesses:

      Effects of modulating activity in the interneuron populations by means of optogenetics were studied in the so-called closed-loop condition. This does not allow direct and secondary effects of the experimental modification in neural activity to be differentiated, as feedforward and feedback effects cannot be disentangled. To do so, open-loop experiments, e.g. in deafferented conditions, would be important. Given that many members of the two populations of interneurons show not one but two or more circuit motifs, it remains to be disentangled which role each individual circuit motif plays in the generation of the motor behavior in intact animals.

      Our optogenetic experiments show a role for 13A/B neurons in grooming leg movements in an intact sensorimotor system, but we cannot yet differentiate between central and reafferent contributions. Activation of 13As or 13Bs disinhibits motor neurons and that is sufficient to induce walking/grooming. Therefore, we can show a role for the disinhibition motif.

      Proprioceptive feedback from leg movements could certainly affect the function of these reciprocal inhibition circuits. Given the synapses we observe between leg proprioceptors and 13A neurons, we think this is likely.

      Our previous work (Ravbar et al 2021) showed that grooming rhythms in dusted flies persist when sensory feedback is reduced, indicating that central control is possible. In those experiments, we used dust to stimulate grooming and optogenetic manipulation to broadly silence sensory feedback. We cannot do the same here because we do not yet have reagents to separately activate sparse subsets of inhibitory neurons while silencing specific proprioceptive neurons. More importantly, globally silencing proprioceptors would produce pleiotropic effects and severely impair baseline coordination, making it difficult to distinguish whether observed changes reflect disrupted rhythm generation or secondary consequences of impaired sensory input. Therefore, the reviewer is correct – we do not know whether the effects we observe are feedforward (central), feedback sensory, or both. We have included this in the revised results and discussion section to describe these possibilities and the limits of our current findings.

      Additionally, we have used a computational model to test the role of each motif separately and we show that in the results.  

      Comments on revisions:

      The careful revision of the manuscript improved the clarity of presentation substantially.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Syed et al. presents a detailed investigation of inhibitory interneurons, specifically from the 13A and 13B hemilineages, which contribute to the generation of rhythmic leg movements underlying grooming behavior in Drosophila. After performing a detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits, the authors build on this anatomical framework by performing optogenetic perturbation experiments to functionally test predictions derived from the connectome. Finally, they integrate these findings into a computational model that links anatomical connectivity with behavior, offering a systems-level view of how inhibitory circuits may contribute to grooming pattern generation.

      Strengths:

      (1) Performing an extensive and detailed connectomic analysis, which offers novel insights into the organization of premotor inhibitory circuits.

      (2) Making sense of the largely uncharacterized 13A/13B nerve cord circuitry by combining connectomics and optogenetics is very impressive and will lay the foundation for future experiments in this field.

      (3) Testing the predictions from experiments using a simplified and elegant model.

      Thank you for the positive assessment of our work.

      Weaknesses:

      (1) In Figure 4-figure supplement 1, the inclusion of walking assays in dusted flies is problematic, as these flies are already strongly biased toward grooming behavior and rarely walk. To assess how 13A neuron activation influences walking, such experiments should be conducted in undusted flies under baseline locomotor conditions.

      We agree that there are better ways to assay potential contributions of 13A/13B neurons to walking. We intended to focus on how normal activity in these inhibitory neurons affects coordination during grooming, and we included walking because we observed it in our optogenetic experiments and because it also involves rhythmic leg movements. The walking data is reported in a supplementary figure because we think this merits further study with assays designed to quantify walking specifically. We will make these goals clearer in the revised manuscript and we are happy to share our reagents with other research groups more equipped to analyze walking differences.

      (2) Regarding Fig 5: The 70 ms on/off stimulation with a slow opsin seems problematic. CsChrimson off-kinetics are slow and unlikely to cause actual activity changes in the desired neurons with the temporal precision the authors are suggesting they get. Regardless, it is amazing the authors get the behavior! It would still be important for the authors to mention the optogenetics caveat, and potentially supplement the data with stimulation at different frequencies, or using faster opsins like ChrimsonR.

      We were also intrigued by the behavioral consequences of activating these inhibitory neurons with CsChrimson. We appreciate the reviewer’s point that CsChrimson’s slow off-kinetics limit precise temporal control. To address this, we repeated our frequency analysis using a range of pulse durations (10/10, 50/50, 70/70, 110/110, and 120/120 ms on/off) and compared the mean frequency of proximal joint extension/flexion cycles across conditions. We found no significant difference in frequency (LMMs, p > 0.05), suggesting that the observed grooming rhythm is not dictated by pulse period but instead reflects an intrinsic property of the premotor circuit once activated. We now include these results in ‘Figure 5-figure supplement 1’ and clarify in the text that we interpret pulsed activation as triggering, rather than precisely pacing, the endogenous grooming rhythm. We continue to note in the manuscript that CsChrimson’s slow off-kinetics may limit temporal precision. We will try ChrimsonR in future experiments.
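
      To make the range of tested pulse protocols concrete, the on/off durations listed above can be converted into stimulation frequencies. The short sketch below only performs this arithmetic (frequency = 1000 / (on + off) in ms); the function and variable names are illustrative and are not taken from the authors' analysis code.

```python
# Pulse protocols quoted in the response, as (on_ms, off_ms) pairs.
PROTOCOLS_MS = [(10, 10), (50, 50), (70, 70), (110, 110), (120, 120)]


def pulse_frequency_hz(on_ms: float, off_ms: float) -> float:
    """Stimulation frequency (Hz) of a repeating on/off light-pulse cycle."""
    period_ms = on_ms + off_ms  # one full cycle = light on + light off
    return 1000.0 / period_ms


def stimulation_frequencies(protocols):
    """Map each 'on/off' label to its stimulation frequency, rounded to 2 decimals."""
    return {
        f"{on}/{off}": round(pulse_frequency_hz(on, off), 2)
        for on, off in protocols
    }


if __name__ == "__main__":
    for label, hz in stimulation_frequencies(PROTOCOLS_MS).items():
        print(f"{label} ms on/off -> {hz} Hz")
```

      The tested protocols therefore span stimulation frequencies from about 4 Hz to 50 Hz, which is the range over which the response reports no significant change in movement frequency.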

      Overall, I think the strengths outweigh the weaknesses, and I consider this a timely and comprehensive addition to the field.

      Reviewer #3 (Public review):

      Summary:

      The authors set out to determine how GABAergic inhibitory premotor circuits contribute to the rhythmic alternation of leg flexion and extension during Drosophila grooming. To do this, they first mapped the ~120 13A and 13B hemilineage inhibitory neurons in the prothoracic segment of the VNC and clustered them by morphology and synaptic partners. They then tested the contribution of these cells to flexion and extension using optogenetic activation and inhibition and kinematic analyses of limb joints. Finally, they produced a computational model representing an abstract version of the circuit to determine how the connectivity identified in EM might relate to functional output. The study makes important contributions to the literature.

      The authors have identified an interesting question and use a strong set of complementary tools to address it:

      They analysed serial‐section TEM data to obtain reconstructions of every 13A and 13B neuron in the prothoracic segment. They manually proofread over 60 13A neurons and 64 13B neurons, then used automated synapse detection to build detailed connectivity maps and cluster neurons into functional motifs.

      They used optogenetic tools with a range of genetic driver lines in freely behaving flies to test the contribution of subsets of 13A and 13B neurons.

      They used a connectome-constrained computational model to determine how the mapped connectivity relates to the rhythmic output of the behavior.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I still have the following specific suggestions and questions, which need the attention of the authors:

      P5, 2nd para, li 1: shouldn't "(Figures 1E and 1E')" be (Figures 1G and 1H)?

      P7, last para, li 3: shouldn't "(Figures 2C and 2D)" be (Figures 2A and 2B)?

      P19, para 2, last 2li: "...we observe that optogenetic activation......triggers grooming movements." I could not find the place in the text or a figure, where this was reported or shown. Please specify.

      P19, last para: "... shows that 13A neurons can generate rhythmic movements....." Given that the experiments were conducted in closed-loop, i.e. including the loop through the leg and its movements, the following formulation appears more justified: "....shows that 13A neurons significantly contribute to the generation of rhythmic movements,....."

      P28, para 1, li 3 from bottom: "...themselves, rather than solely between antagonistic motor neurons." While the authors are correct that in the stick insect and locust alternating inhibitory synaptic drive to flexor and extensor motoneurons has been shown to underlie alternating activity of these two antagonistic motoneuron pools, the previous studies have not shown or claimed that these synaptic inputs arise from direct interactions between these motoneuron pools. Based on this, this text should be moved to the part "feed-forward inhibition" on page 27.

      P28: "redundant inhibition": this motif has been shown to be instrumental in the locust flight CPG, e.g. Robertson & Pearson, 1985, Fig. 16.

      P28: "reciprocal inhibition": The reviewer agrees with the authors that this motif has been shown for the mouse spinal cord, but it has also been shown for other CPGs in vertebrates and invertebrates, e.g. Clione, leech, Xenopus - see the initial comment "(3) Intro and Discussion".

      Thank you, we have incorporated the suggested corrections and clarifications into the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      I'm satisfied with the revised version

      Reviewer #3 (Recommendations for the authors):

      The authors have made a substantial effort to address my original points. They corrected the title, expanded Discussion and Methods sections, reran statistical tests using mixed models, added modelling clarifications and constraints, and fixed or removed confusing figure panels. Those changes have improved clarity and reduced some of the claims that I thought were exaggerated.

      That said, some of my concerns remain only partially addressed, which could be fixed with relatively small tweaks. The authors should:

      (1) Explicitly separate empirical findings from modelling inferences throughout the manuscript, including the Abstract, Results and Discussion (i.e., label claims of "intrinsic rhythmogenesis" as model-based inferences, not direct experimental demonstrations)

      (2) Provide supplemental information on modelling to quantify the role of the black-box input (e.g., quantitative coordination/phase/frequency metrics for full model vs constant-input vs no black box), show pre- vs post-fine-tuning weight changes and the exact tuning constraints/optimization details (I could not find these details)

      (3) To ensure results are reproducible, provide a supplemental table mapping each split line to EM-identified neuron(s) with NBLAST/morphological scores for each match;

      (4) Fully document the statistical models (exact LMM/GLMM formulas, software/packages, etc);

      (5) Deposit model code, trained weights and analysis scripts in a public repository.

      We have updated the GitHub repository with the full statistical analysis documentation and model code, including trained weights and scripts.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1:

      (1) Given the amount of work that has been put into developing this community tool, it would be worth thinking about how it could serve other multiplex-immunofluorescence methods (such as immunoSABER, 4i, etc.). One option would be adding an extra tab in which the particular method that uses those reagents is mentioned. This would also help as IBEX itself and related methods evolve in the future.

      We agree and currently support six other methods beyond the original “IBEX2D Manual”, with the most generic being “Multiplexed 2D Imaging”: a standard, single-cycle (non-iterative) imaging method applied to thin, 2D (5-30 micron) tissue sections. We plan to evolve to include multiplex IF methods such as Immuno-SABER, 4i, Cell DIVE, etc. The current structure of the reagent resources table can support other immunofluorescence methods without modifications. The table contains information for IBEX and related methods. The particular method for which a reagent validation was evaluated is specified in the column titled “Method”. Descriptions of supported methods are given in the reagent glossary.

      (2) It has a rather minimal description of the software. In particular, there is software that has not been developed for IBEX specifically but that could be used for IBEX datasets (ASHLAR, WSIReg, VALIS, WARPY, and QuPath, etc). It would be nice if there was mention of those.

      ASHLAR, WSIReg, VALIS, and Warpy have been added to the Knowledge-Base. These software components are specifically relevant for iterative imaging protocols which require image alignment. With respect to QuPath, Fiji, Napari and other general microscopy image analysis frameworks, these are not listed. Such frameworks provide a wide range of operations relevant for many microscopy image analysis tasks and are likely already familiar to researchers who are interested in the information contained in the Knowledge-Base.

      (3) There is a concern about how the negative data information will be added, as no publication or peer-review process can back it up. Perhaps the particular conditions of the experiment should be very well described to allow future users to assess the validity.

      We agree with this observation and have added the following language to the contribute page:

      “When reporting information that has not appeared in a peer-reviewed publication, both negative and positive results, include more details with respect to experimental conditions and provide sample images as part of the supporting material files. In all cases, peer reviewed or not, we encourage providing additional details in the supporting material that you deem important and are not part of the csv file structure. These include, but are not limited to, lot numbers, versioned protocols used in the work, and any other information which will facilitate validation reproducibility.”

      (4) The proposed scheme where a reagent can be validated or recommended against by up to 4 different labs should be good. It may be good to make sure that researchers who validate belong to different labs and are not only different ORCIDs that belong to the same group. The same applies when making a case for recommendations against a reagent.

      We generally support this recommendation. Based on our experience, even members within the same laboratory encounter challenges when attempting to validate reagents contributed by current or former colleagues. Additionally, research labs often experience significant personnel turnover, with minimal overlap over a five year span.

      To address these concerns, we have updated the instructions on the contribute page as follows: “We only accept up to 5 ORCID additions in the Agree or Disagree columns. This means that the original contributor’s work was replicated by up to 4 individuals or refuted by up to 5 people. Priority is given to contributions from individuals in laboratories distinct from the original source.”
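
      As an illustration of the rule quoted above, a contributor could check a row before submission with a few lines of Python. This is a minimal sketch under stated assumptions, not the Knowledge-Base's own validation code: the semicolon separator inside a cell and the function names are hypothetical, and the ORCIDs in the usage note are placeholders.

```python
MAX_ORCIDS = 5  # limit stated in the contribution instructions


def parse_orcids(cell: str, sep: str = ";") -> list[str]:
    """Split one Agree/Disagree cell into ORCID strings.

    The separator is an assumption made for this sketch; the actual
    csv convention should be taken from the contributor instructions.
    """
    return [part.strip() for part in cell.split(sep) if part.strip()]


def within_limit(cell: str, limit: int = MAX_ORCIDS) -> bool:
    """Return True if the cell lists at most `limit` distinct ORCIDs."""
    return len(set(parse_orcids(cell))) <= limit
```

      For example, `within_limit("0000-0000-0000-0001; 0000-0000-0000-0002")` passes, whereas a cell listing six distinct placeholder ORCIDs would be flagged.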

      (5) It is very interesting to keep track of the protocol versions used. Perhaps users should be able to validate independent versions and it will be important to know how information is kept.

      Thank you for your suggestion. We encourage members of the community to cite the latest version of the Knowledge-Base in the “Citing the Knowledge-Base” section.

      (6) The final point I would make is that the need to form a GitHub repository may deter some people from submitting data. For sporadic contributions, authors could think that users could either reach out to main developers and/or provide a submission form that can help less experienced users of command-line and GitHub programming, but still promote the contribution from the community.

      We have given this significant thought and now support a secondary path for contributing that does not require familiarity with git or GitHub. This path involves downloading a zip file, modifying the contents of the csv files and providing supporting material text files and images. Once the work is completed, the contributor contacts the Knowledge-Base maintainers and we complete the submission together, with the maintainers dealing with the usage of git and GitHub. This information has been added to the notes which are listed at the top of the Contribute page. We have recently completed the first contribution that followed this new workflow.

      We still encourage researchers to familiarize themselves with git and the GitHub repository hosting service. These tools have been shown to be useful for collaborative and reproducible laboratory research.

      Reviewer #2:

      (1) The potential impact of IBEX KB is very clear. However, the paper would benefit by also discussing more on KB maintenance and outreach, and how higher participation could be incentivized.

      We have added the following details to the discussion:

      The KB is actively maintained by its chairs, who meet bi-weekly to ensure its continued development and maintenance. In addition to these regular meetings, we engage with both current and prospective community members to gather feedback, encourage contributions, and expand the collective knowledge supporting the KB. To broaden outreach and foster sustained engagement, the IBEX community will collaborate with synergistic initiatives such as the HuBMAP Affinity Reagents Working Group, the European Society for Spatial Biology (ESSB), and the Global Alliance for Spatial Technologies (GESTALT).

      As a further incentive for participation, we intend to launch an annual “Reagent Validation Week”, a community driven event inspired by software hackathons. During this dedicated week, researchers would focus on validating or reproducing validation for selected reagents and contribute their findings to the KB. We have also discussed hosting an “Around the World” symposium, featuring presentations from both junior and senior scientists across the community, to showcase diverse perspectives and foster global collaboration.

      (2) Use of resources like GitHub may limit engagement from non-coding members of the scientific community. Will there be alternative options like a user-friendly web interface to contribute more easily?

      We agree with this observation and have addressed it. Please see detailed response to point 6 from Reviewer 1.

      Reviewer #3:

      (1) IBEX is a specific immunofluorescence method. However, the utility of the Knowledge base is not limited to the specific IBEX method. Therefore, I suggest removing the unnecessary branding of the term IBEX from the KB and citing potentially other similar cyclic immunofluorescence methods in the manuscript (e.g. CycIF Lin et al 2018). This would also emphasize the wider impact and applicability of the KB to the wider imaging community.

      For now, we have decided to keep the original reference to the IBEX method in the resource name and re-brand it in the next development phase. In that phase we intend to solicit reagent validations for methods unrelated to IBEX. We have added the reference to the CycIF publication. The manuscript text now reads: “We are optimistic that future versions will include extension of the IBEX method to other tissues and species and we intend to solicit contributions of reagent validations for other multiplexed imaging techniques such as CycIF Lin et al. (2015). At that point in time we expect to re-brand the KB as the IBEX++ Knowledge-Base...”

      (2) I believe reporting negative results with reagents is highly valuable. However, the way antibodies are reported must include more details. To ensure data quality, every report should be linked to a specific protocol + images (or doc with the standard document variations), and sample information. This should be a mandatory requirement.

      We agree that this information is desirable, but we do not agree that it should be mandatory. In the contribution instructions we now explicitly list lot numbers and versioned protocols as examples of details that we encourage contributors to include in their supporting material files. We believe that requiring this information for a contribution sets the bar too high and will deter many from contributing information that can benefit others.

      (3) While cross-validation among researchers is beneficial, even if five individuals fail to reproduce results with a given antibody, their findings may be influenced by technique-specific factors. It is also important to consider whether these researchers come from the same group, institution, or geographical region, as this could impact reproducibility. Additionally, entries that have not been reproduced at least five times using the same protocol should still be considered valuable information. To address this, an “insufficient validation data” flag could be implemented, ensuring that incomplete but useful findings remain accessible.

      The contribution instructions now state that “Priority is given to contributions from individuals in laboratories distinct from the original source”.

      While our goal is to support reproducing reagent validations, we do not expect these types of contributions to be the rule, as the only incentive we can provide to encourage this behavior is co-authorship on the authoritative dataset. As a result, it is likely that many of the validations will have a single endorser, the original contributor. These results are valuable information and we do not think they should be singled out (insufficient validation label). We leave it up to the users of the KB to decide whether they trust recommendations with multiple endorsers or if endorsement by a single highly trusted contributor is sufficient for them. In all cases, issues with contributions can be raised and discussed on the KB discussion forum.

      The rationale for limiting the number of reproduction studies to five was that this is a minimal, yet sufficiently large, number that provides confidence in the results. Placing an upper limit ensures that researchers do not provide reproduction results for widely used and well established reagents just because these results are readily available to them.

      (4) This system could flag reagents with inconsistent reports, highlight potential technique-specific issues, and suggest alternative reagents with stronger validation records. Furthermore, a validation confidence ranking could be introduced, taking into account the number of independent confirmations, protocol consistency, and reproducibility data. These measures would help refine the reporting process while maintaining transparency and scientific rigor.

      We agree that the functionality described here is desirable, but this is not part of the KB. At its core the KB is a dataset and we do not envision developing dedicated tools to perform these tasks. Instead, we foresee using the KB as context for interacting with AI agents. Providing the KB as context to an AI, one can currently use it to answer domain-specific questions and perform related tasks such as designing imaging panels (under subject matter expert supervision). This was added to the sample use cases in the manuscript, with a transcript from an interaction with an AI model using the website as context provided as supplemental material.

      (5) Regarding image formats for results reporting, while JPG files are convenient due to their small size, TIFF files offer significant advantages, such as preserving metadata and maintaining the integrity of real data values. Proper signal adjustments may not always be applied by researchers, making TIFF crucial for accurate data analysis. In this regard, I suggest making it possible to include a link to the original TIFF data.

      The goal of the supporting material image is similar to that of an image used in a manuscript and it should not be used for data analysis purposes. This is the reason we chose the JPG format. Sharing these images is not intended to be a substitute for publicly sharing the original images and their associated metadata. This is now noted in the contributing instructions.

      (6) Homepage:

      Include a brief summary of the knowledge base’s purpose and tabs to provide clarity for new users. The current homepage is a bit misleading for newcomers.

      The homepage has been modified to include information about the Knowledge-Base, contents and how to use it including as context for interaction with AI agents.

      (7) Reagent Resources Section: Enable users to search for a target name directly, rather than filtering through dropdown options.

      The dropdown menu explicitly shows all available targets and also allows for direct search of target name. To use it for direct search, once the dropdown is selected start typing the name of the target and the focus will jump to it. Thus, if looking for ”Zrf1” there is no need to scroll through all targets in the dropdown. This also facilitates easy clearing of a filter, select the dropdown and start typing the word ”clear”, then press enter when it is highlighted. This information has been added to the page.

      Provide an option to download the dataset as a CSV file. This feature will be highly valued by non-computational researchers.

      Links to download the reagent resources csv file and the whole Knowledge-Base have been added.
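      As a rough illustration of what the CSV download enables, the sketch below filters rows by target name, mirroring the dropdown search described above. The column names (`Target`, `Clone`, `Recommend`) and the sample rows are assumptions made up for this example; the actual file's header is documented in the contributor instructions and may differ.

```python
import csv
import io


def rows_for_target(csv_text, target, column="Target"):
    """Return all rows whose target column matches `target` (case-insensitive).

    `column` is a hypothetical header name; adjust it to the real CSV header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = target.strip().lower()
    return [row for row in reader if row[column].strip().lower() == wanted]


# Made-up sample standing in for the downloaded reagent resources file.
sample = """Target,Clone,Recommend
Zrf1,ABC1,0000-0000-0000-0001
CD45,XYZ9,0000-0000-0000-0002
"""

matches = rows_for_target(sample, "zrf1")
```

      With a real download, the same function could be applied to the file contents read from disk instead of the inline sample.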

      Add the same column documentation here as in the contributor instructions. For example, you need to make clear the distinctions between ”Recommend,” ”Agree,” and ”Disagree” ratings, as they may be misleading to those who have not visited the rules to contribute.

      A link to the column documentation in the contributor instructions has been added here. Information on the website is displayed in one location and linked as needed. Duplicated display of information creates uncertainty for users and results in more complex instructions when referring to the information.

      Include additional details in the dataset, such as lot numbers, or the date of the contribution, that could be relevant in different settings.

      Please see response to point 2.

      (8) Data & Software Section:

      Add filtering options in the table based on organism and tissue availability

      This data is not encoded in the available information in an independent manner so we do not directly enable filtering. It is usually included in the ”Details” free form text. This text is duplicated from the original dataset descriptions. One can still search this page using the browser's search functionality to achieve behavior similar to filtering. While the ”Details” text may not be visible due to the usage of the accordion user interface, it is still searchable and will automatically expand when the search text is found under the collapsed accordion button.

      (9) Contributor Section:

      Incorporate figures from the manuscript to make it more visual and improve understanding of rules and standards.

      Figure 4 from the manuscript was added to this page.

      I believe reporting negative results with reagents is highly valuable. However, to ensure data quality, every report should be linked to a specific protocol and sample information. This should be a mandatory requirement. To streamline the process, warnings for certain reagents could be implemented, but a reagent should not be outright labeled as ineffective without proper validation.

      Please see response to point 2.

      Cross-validation among researchers is beneficial, but even if five individuals fail to reproduce results with a given antibody, it may still be due to technique-specific factors, particularly for non-routine antibodies.

      We agree with this observation and have modified the contribution instructions accordingly:

      When a contribution would overturn previously reported results, that is, when the number of ORCIDs in the Disagree column becomes greater than the number in the Agree column, we will open the contribution for public discussion on the Knowledge-Base forum before accepting it.

      The intent is to increase the community’s confidence in the results, particularly when dealing with non-routine antibodies. This allows the original contributor and other members of the community to engage with the researchers who were unable to replicate a specific validation, possibly helping them to replicate the original results by adding missing details to the KB, or explicitly identifying and documenting issues with the original work.
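      The overturn rule quoted above can be sketched as a simple comparison of endorser counts. This is only an illustration: it assumes each Agree or Disagree cell holds ORCIDs separated by semicolons, which is a guess about the storage format, and the function names are hypothetical.

```python
def count_orcids(cell, sep=";"):
    """Count distinct, non-empty ORCID entries in one table cell.

    The separator is an assumption about how the column is encoded.
    """
    return len({part.strip() for part in cell.split(sep) if part.strip()})


def needs_public_discussion(agree_cell, disagree_cell):
    """True when Disagree endorsers would outnumber Agree endorsers."""
    return count_orcids(disagree_cell) > count_orcids(agree_cell)


# Placeholder ORCIDs, not real contributors.
flag = needs_public_discussion(
    "0000-0000-0000-0001",
    "0000-0000-0000-0002; 0000-0000-0000-0003",
)
```

      Counting distinct entries rather than raw entries is a design choice made here so that a duplicated ORCID cannot tip the balance on its own.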

      Regarding image formats, JPG files are convenient due to their small size, but TIFF offers significant advantages, such as preserving metadata and maintaining the integrity of real data values. Proper signal adjustments may not always be applied by researchers, making TIFF crucial for accurate data analysis.

      Please see response to point 5.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      The authors assess the impact of E-cigarette smoke exposure on mouse lungs using single cell RNA sequencing. Air was used as control and several flavors (fruit, menthol, tobacco) were tested. Differentially expressed genes (DEGs) were identified for each group and compared against the air control. Changes in gene expression in either myeloid or lymphoid cells were identified for each flavor and the results varied by sex. The scRNAseq dataset will be of interest to the lung immunity and e-cig research communities and some of the observed effects could be important. Unfortunately, the revision did not address the reviewers' main concerns about low replicate numbers and lack of validations. The study remains preliminary, and no solid conclusions could be drawn about the effects of E-cig exposure as a whole or any flavor-specific phenotypes.

      Strengths:

      The study is the first to use scRNAseq to systematically analyze the impact of e-cigarettes on the lung. The dataset will be of broad interest.

      Weaknesses:

      scRNAseq studies may have low replicate numbers due to the high cost of studies but at least 2 or 3 biological replicates for each experimental group is required to ensure rigor of the interpretation. This study had only N=1 per sex per group and some sex-dependent effects were observed. This could have been remedied by validating key observations from the study using traditional methods such as flow cytometry and qPCR, but the limited number of validation experiments did not support the conclusions of the scRNA seq analysis. An important control group (PG:VG) had extremely low cell numbers and was basically not useful. Statistical analysis is lacking in almost all figures. Overall, this is a preliminary study with some potentially interesting observations, but no solid conclusions can be made from the data presented.

      The only new validation experiment is the immunofluorescent staining of neutrophils in Figure 4. The images are very low resolution and low quality and it is not clear which cells are neutrophils. S100A8 (calprotectin) is highly abundant in neutrophils but not strictly neutrophil-specific. It's hard to distinguish positive cells from autofluorescence in both Ly6g and S100a8 channels. No statistical analysis in the quantification.

      We thank the reviewer for identifying the strengths of this study and pointing out the gaps in knowledge. Overall, our purpose to present this data is to provide the scRNA seq results as a resource to a wider community. We have used techniques like flow cytometry, multianalyte cytokine array and immunofluorescence to validate some of the results. We agree with the reviewer that we were unable to rightly point out the significance of our findings with the immunofluorescent stain in the previous edit. We have revised the manuscript and included the quantification for both Ly6G+ and S100A8+ cells in e-cig aerosol exposed and control lung tissues. Briefly, we identified a marked decrease in the staining for S100A8 (marker for neutrophil activation) in tobacco-flavored e-cig exposed mouse lungs as compared to controls. Upon considering the corroborating evidence from scRNA seq and flow cytometry with regards to increased neutrophil percentages in experimental group and lowered staining for active neutrophils using immunofluorescence, we speculate that exposure to e-cig (tobacco) aerosols may alter the neutrophil dynamics within the lungs. Also, co-immunofluorescence identified a more prominent co-localization of the two markers in control samples as compared to the treatment group which points towards some changes in the innate immune milieu within the lungs upon exposures. Future work is required to validate these speculations.

      We have now discussed all the above-mentioned points in the Discussion section of the revised manuscript and toned down our conclusions regarding sex-dependent changes from scRNA seq data.

      It is unclear what the meaning of Fig. 3A and B is, since these numbers only reflect the number of cells captured in the scRNAseq experiment and are not biologically meaningful. Flow cytometry quantification is presented as cell counts, but the percentage of cells from the CD45+ gate should be shown. No statistical analysis is shown, and flow cytometry results do not support the conclusions of scRNAseq data.

      We thank the reviewer for this question. However, we would like to highlight that scRNA seq and flow cytometry may show similar trends but cannot be identical, as one relies on cell surface markers (protein) for identification of cell types, while the other depends on transcriptomic signatures to identify the cell types. In our data, for the myeloid cells (alveolar macrophages and neutrophils), the scRNA and flow cytometry data match in trend. However, the trends do not match with respect to the lymphoid cells being studied (CD4 and CD8 T cells). Possible explanations for this finding include high gene dropout rates in scRNA seq, different analytical resolution for the two techniques and pooling of samples in our single cell workflow. We realize these shortcomings in our analyses and mention them clearly in the discussion as a limitation of our work. It is important to note also that cell frequencies identified in scRNA seq just provide wide and indistinct indications which need to be further validated, which we tried to accomplish in our work to some degree. Our flow-based results clearly highlight the sex-specific variations in the immune cell percentages (something we could not have anticipated earlier). In future studies, we will include more replicates to tease out sex-based variations upon acute and chronic exposure to e-cig aerosols.

      We have now replotted the graphs in Fig 3A and B and plotted the flow quantification as the percentage of total CD45+ cells. The gating strategy for the flow plots is also included as Figure S6 in the revised manuscript.

      Reviewer #2 (Public review):

      This study provides some interesting observations on how different flavour e-cigarettes can affect lung immunology; however, there are numerous flaws, including a low replicate number and a lack of effective validation methods, meaning findings may not be repeated. This is a revised article but several weaknesses remain related to the analysis and interpretation of the data.

      Strengths:

      The strength of the study is the successful scRNA-seq experiment which gives some preliminary data that can be used to create new hypotheses in this area.

      Weaknesses:

      Although some text weaknesses have been addressed since resubmission, other specific weaknesses remain: The major weakness is the n-number and analysis methods. Two biological n per group is not acceptable to base any solid conclusions. Any validatory data was too little (only cell % data) and not always supporting the findings (e.g. figure 3D does not match 3B/4A). Other examples include:

      There aren't enough cells to justify analysis - only 300-1500 myeloid cells per group with not many of these being neutrophils or the apparent 'Ly6G- neutrophils'.

      We thank the reviewer for the comment, but we disagree with the reviewer in terms of the justification of analyses. All the flavored e-cig aerosol groups were compared with air controls to deduce the outcomes in the current study. We already acknowledge low sample quality for PGVG group and have only included the comparisons with PGVG upon reviewer’s request which is open to interpretation by the reader.

      By that measure, each treatment group (except PGVG group) has over 1000 cells with 24777 genes being analyzed for each cell type, which by the standards of single cell is sufficient. We understand that this strategy should not be used for detection of rare cell populations, which was neither the purpose of this manuscript nor was attempted. We conduct comparisons of broader cell types and mention more samples need to be added in the Discussion section of the revised manuscript.

      As for the Ly6G neutrophil category, we don’t only base our results on scRNA analyses but also perform co-immunofluorescence and multi-analyte analyses and use evidence from previous literature to back our outcome. To avoid over-stating our results we have revamped the whole manuscript and ensured to tone down our results with relation to the presence of Ly6G- neutrophils. We do understand that more work is required in the future, but our work clearly shows the shift in neutrophil dynamics upon exposure which should be reported, in our opinion.

      The dynamic range of RNA measurement using scRNAseq is known to be limited - how do we know whether genes are not expressed or just didn't hit detection? This links into the Ly6G negative neutrophil comments, but in general the lack of gene expression in this kind of data should be viewed with caution, especially with a low n number and few cells. The data in the entire paper is not strong enough to base any solid conclusion - it is not just the RNA-sequencing data.

      We acknowledge this to be a valid point and have revamped the manuscript and toned down our conclusions. However, such limitations exist with any scRNA seq dataset and so must be interpreted accordingly by the readers. We do understand that due to the low cell counts and the limitations with scRNA seq we should not perform DESeq2 analyses for Ly6G+ versus Ly6G- neutrophil categories, which was never attempted in the first place. However, our results with co-immunofluorescence, multianalyte assay and scRNA expression analyses in myeloid cluster do point towards a shift in neutrophil activation which needs to be further investigated. Furthermore, Ly6G deficiency has been linked to immature neutrophils in many previous studies and is not an unlikely outcome that needs to be treated with immense skepticism.

      We wish to make this dataset available as a resource to influence future research. We are aware of its limitations and have been transparent with regards to our experimental design, capture strategy, the quality of obtained results, and possible caveats to make it open for discussion by the readers.

      There is no data supporting the presence of Ly6G negative neutrophils. In the flow cytometry only Ly6G+ cells are shown with no evidence of Ly6G negative neutrophils (assuming equal CD11b expression). There is no new data to support this claim since resubmission and the New figures 4C and D actually show there are no Ly6G negative cells - the cells that the authors deem Ly6G negative are actually positive - but the red overlay of S100A8 is so strong it blocks out the green signal - looking to the Ly6G single stains (green only) you can see that the reported S100A8+Ly6G- cells all have Ly6G (with different staining intensities).

      We thank the reviewer for this query and do understand the skepticism. We have now quantified the data to provide more clarity for interpretation. As we were using paraffin embedded tissues, some autofluorescence is expected which could explain some of reviewer’s concerns. However we expect that the inclusion of better quality images and quantification must address some of the concerns raised by the reviewer.

      Eosinophils are heavily involved in lung macrophage biology, but are missing from the analysis - it is highly likely the RNA-sequence picked out eosinophils as Ly6G- neutrophils rather than 'digestion issues' the authors claim

      We thank the reviewer for raising a valid concern. However, the Ly6G- cluster cannot be eosinophils in our case. Literature suggests SiglecF as an important biomarker of eosinophils which was absent in the Ly6G- cluster in our scRNA seq analyses as shown in File S18 and Figure 6B of the revised manuscript. We have now provided a detailed explanation (Lines 476-488; 503-506) of the observed results pertaining to eosinophil population in the revised manuscript to further address some of the concerns raised by this reviewer.

      After author comments, it appears the schematic in Figure 1A is misleading and there are not n=2/group/sex but actually only n=1/group/sex (as shown in Figure 6A). Meaning the n number is even lower than the previous assumption.

      We concur with the reviewer’s valid concern and so are willing to provide this data as a resource for a wider audience to assist future work. Pooling of samples has been practiced by many groups previously to save resources and expense. We did it for the very same reason. It may not be the preferred approach, but it still has its merit considering the vast amount of cell-specific data generated using this strategy. To avoid overstating our results we have ensured to maintain transparency in our reporting and acknowledge all the limitations of this study.

      We do not believe that the strength of scRNA seq lies in drawing conclusive results, but in teasing out possible targets and directions that need to be validated with more work. In that respect, our study does identify the target cell types and biological processes which could be of importance for future studies.

      Reviewer #3 (Public review):

      This work aims to establish cell-type specific changes in gene expression upon exposure to different flavors of commercial e-cigarette aerosols compared to control or vehicle. Kaur et al. conclude that immune cells are most affected, with the greatest dysregulation found in myeloid cells exposed to tobacco-flavored e-cigs and lymphoid cells exposed to fruit-flavored e-cigs. The up- and down-regulated genes are heavily associated with innate immune response. The authors suggest that a Ly6G-deficient subset of neutrophils is found to be increased in abundance for the treatment groups, while gene expression remains consistent, which could indicate impaired function. Increased expression of CD4+ and CD8+ T cells along with their associated markers for proliferation and cytotoxicity is thought to be a result of activation following this decline in neutrophil-mediated immune response.

      Strengths:

      Single cell sequencing data can be very valuable in identifying potential health risks and clinical pathologies of lung conditions associated with e-cigarettes considering they are still relatively new.

      Not many studies have been performed on cell-type specific differential gene expression following exposure to e-cig aerosols.

      The assays performed address several factors of e-cig exposure such as metal concentration in the liquid and condensate, coil composition, cotinine/nicotine levels in serum and the product itself, cell types affected, which genes are up- or down-regulated and what pathways they control.

      Considerations were made to ensure clinical relevance such as selecting mice whose ages corresponded with human adolescents so that data collected was relevant.

      Weaknesses:

      The exposure period of 1 hour a day for 5 days is not representative of chronic use and this time point may be too short to see a full response in all cell types. The experimental design is not well-supported based on the literature available for similar mouse models. Clinical relevance of this short exposure remains unclear.

      We thank the reviewer for this query. However, we would like to emphasize that chronic exposure was never the intention of this study. We wished to design a study for acute nose-only exposure owing to which the study duration was left shorter. Shorter durations limit the stress and discomfort to the animal. The in vivo study using nose-only exposure regimen is still developing with multiple exposure regimens being used by different groups. To our knowledge there is no gold standard of e-cig aerosol exposure which is widely accepted other than the CORESTA recommendations, which we followed. Also, we show in our study how the daily exposure to leached metals varies in a flavor-dependent manner, thus validating that the exposure regime does need more attention in terms of equal dosing, particle distribution and composition, something we have started doing in our future studies. We have included all the explanations in the revised manuscript (Lines 82-85, 425-435, 648-654).

      Several claims lack supporting evidence or use data that is not statistically significant. In particular, there were no statistical analyses to compare results across sex, so conclusions stating there is a sex bias for things like Ly6G+ neutrophil percentage by condition are observational.

      We agree with reviewer’s comment and have taken this into consideration. We have now revamped the whole manuscript and toned down most of the sex-based conclusions stated in this work. Having said that, it is important to note that most of the work relying solely on scRNA seq, as is the case for this study, is observational in nature and needs to be assessed bearing this in mind.

      Overall, the paper and its discussion are relatively surface-level and do not delve into the significance of the findings or how they fit into the bigger picture of the field. It is not clear whether this paper is intended to be used as a resource for other researchers or as an original research article.

      We have now reworked the Discussion and tried to incorporate more in-depth discussion of the results, providing our insights regarding the observations, discrepancies and the possible explanations. We have also made it clear that this paper is intended to be used as a resource by other researchers (Lines 577-579).

      The manuscript has some validation of findings but not very comprehensive.

      We have now revamped the manuscript. We have included quantification for immunofluorescence data with better representation of the GO analyses. We have worked on the Results and Discussion sections to make this a useful resource for the scientific community.

      This paper provides a strong foundation for follow-up experiments that take a closer look at the effects of e-cig exposure on innate immunity. There is still room to elaborate on the differential gene expression within and between various cell types.

      We thank the reviewer for pointing out the strength of this paper. We refrained from elaborating on the differential gene expression within and between various cell types because of the low sample number and sequencing depth for this study. However, the raw data will be provided with the final publication, and should be freely accessible to the public to re-analyze the data set as they deem fit.

      Comments on revisions:

      The reviewers have addressed major concerns with better validation of data and improved organization of the paper. However, we still have some concerns and suggestions pertaining to the statistical analyses and justifications for experimental design.

      We appreciate the nuance of this experimental design, and the reviewers have adequately commented on why they chose nose-only exposure over whole body exposure. However, the justification for the duration of the exposure, and the clinical relevance of a short exposure, have not been addressed in the revised manuscript.

      We thank the editor for this query. We have now addressed this query briefly in Lines 82-85, 425-435, 648-654 of the revised manuscript. We would like to add, however, that we intended to design a study for acute nose-only exposure for this project. Shorter durations limit the stress and discomfort to the animal, owing to which a duration of 1 hour per day was chosen. The in vivo study using nose-only exposure regimen is still developing with multiple exposure regimens being used by different groups. Ours is one such study in that direction just intended to identify cell-specific changes upon exposure. Considering our results in Figure 1B showing variations in the level of metals leached in each flavor per day, the appropriate exposure regimen to design a controlled, reproducible experiment needs to be discussed. There could be room for improvement in our strategy, but this was the best regimen that we found to be appropriate per the literature and our prior knowledge in the field.

      The presentation of cell counts should be represented by a percentage/proportion rather than a raw number of cells. Without normalization to the total number of cells, comparisons cannot be made across groups/conditions. This comment applies to several figures.

      We thank the editor for this comment and have now made the requested change in the revised manuscript.
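      The normalization requested above amounts to dividing each cell type's count by the total for that group. A minimal sketch follows; the function name is hypothetical and the counts are invented purely to show the arithmetic, they are not values from the study.

```python
def to_percentages(counts):
    """Convert raw per-cell-type counts for one group into percentages.

    `counts` maps a cell-type label to the number of cells captured.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("group contains no cells; percentages are undefined")
    return {cell_type: 100.0 * n / total for cell_type, n in counts.items()}


# Invented counts for illustration only.
example_group = {"neutrophils": 50, "alveolar macrophages": 150, "other": 800}
percentages = to_percentages(example_group)
```

      Expressed this way, two groups with very different numbers of captured cells can be compared on the same scale, which raw counts do not allow.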

      We appreciate that the authors have taken the reviewers' advice to validate their findings. However, we have concerns regarding the immunofluorescent staining shown in Figure 4. If the red channel is showing a pan-neutrophil marker (S100A8) and the green channel is showing only a subset of neutrophils (LY6G+), then the green channel should have far less signal than the red channel. This expected pattern is not what is shown in the figure, with the Ly6G marker apparently showing more expression than S100A8. Additionally, the FACS data states that only 4-5% of cells are neutrophils, but the red channel co-localizes with far more than 4-5% of the DAPI stain, meaning this population is overrepresented, potentially due to background fluorescence (noise). In addition, some of the shapes in the staining pattern do not look like true neutrophils, although it is difficult to tell because there remains a lot of background staining. The authors need to verify that their S100A8 and Ly6G antibodies work and are specific to the populations they intend to target. It is possible that only the brightest spots are truly S100A8+ or Ly6G+.

      We thank the editor for this comment and acknowledge that we may have made broad generalizations in our interpretation of our data previously. We have now revisited the data and quantified the two fluorescence signals for better interpretation of our results. We have also reassessed our conclusions from this data and reworded the manuscript accordingly. Briefly, we believe that Ly6G deficiency could be an indication of the presence of immature neutrophils in the lungs. This is a common process of neutrophil maturation. An active neutrophil population has Ly6G and should also express S100A8 indicating a normal neutrophilic response against stressors. However, our results, despite some autofluorescence which is common with lung tissues, show a marked decline in the S100A8+ cells in the lung of tobacco-flavored e-cig aerosol exposed mice as compared to air controls. We also do not see prominent co-localization of the two markers in exposed group thus proving a shift in neutrophil dynamics which requires further investigation. We would also like to mention here that S100A8 is predominantly expressed in neutrophils, but is also expressed by monocytes and macrophages, so that could explain the over-representation of these cells in our immunofluorescence results. We have now included this in the Discussion section (Lines 489-538) of the revised manuscript.

      Paraffin sections do not always yield the best immunostaining results and the images themselves are low magnification and low resolution.

      We agree with the editor that paraffin sections may not yield the best results. We have worked on the final figure to improve the quality of the displayed results and zoomed in on some parts of the merged image to show the differences in the co-localization patterns for the two markers in our treated and control groups for easier interpretation.

      Please change the scale bars to white so they are more visible in each channel.

      The merged image in Figure 6C now has a white scale bar.

      We appreciate that this is a preliminary test used as a resource for the community, but there is interesting biology regarding immune cells that warrants DEG analysis by the authors. This computational analysis can be easily added with no additional experiments required.

      We thank the editor for this comment and agree that interesting biology regarding immune cells could be explored upon performing the DEG analyses on individual immune populations. However, due to the small sample size, low sequencing depth and pooling of same sex animals in each treatment group, we refrained from performing those analyses, fearing over-representation of our results. We will be providing the link to the raw data with this publication, which will be freely accessible to the public on the NIH GEO resource to allow further analyses on this dataset at the judgement of the investigator who utilizes it as a resource.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (Minor) The pathway analyses in Fig. 6-8 have different fonts than what's used in all other figures.

      We have now made the requested change in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We would like to proceed with this paper as a Version of Record but we will correct the mistake that we made in the Key resources table. As the reviewer noted we had added the wrong guide RNA sequence here. We are super thankful to the reviewer and apologize for the mistake.


      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This important study identifies a new key factor in orchestrating the process of glial wrapping of axons in Drosophila wandering larvae. The evidence supporting the claims of the authors is convincing and the EM studies are of outstanding quality.

      We are thankful for this kind and very positive judgment.

      However, the quantification of the wrapping index, the role of Htl/Uif/Notch signaling in differentiation vs growth/wrapping, and the mechanism of how Uif "stabilizes" a specific membrane domain capable of interacting with specific axons might require further clarification or discussion.

      This is now addressed.

      Reviewer #1 (Public review):

      Summary:

      A central function of glial cells is the ensheathment of axons. Wrapping of larger-diameter axons involves myelin-forming glial classes (such as oligodendrocytes), whereas smaller axons are covered by non-myelin-forming glial processes (such as olfactory ensheathing glia). While we have some insights into the underlying molecular mechanisms orchestrating myelination, our understanding of the signaling pathways at work in non-myelinating glia remains limited. As non-myelinating glial ensheathment of axons is highly conserved in both vertebrates and invertebrates, the nervous system of Drosophila melanogaster, and in particular the larval peripheral nerves, have emerged as a powerful model to elucidate the regulation of axon ensheathment by a class of glia called wrapping glia. Using this model, this study seeks to specifically address the question, as to which molecular mechanisms contribute to the regulation of the extent of glial ensheathment focusing on the interaction of wrapping glia with axons. 

      Strengths and Weaknesses:

      For this purpose, the study combines state-of-the-art genetic approaches with high-resolution imaging, including classic electron microscopy. The genetic methods involve RNAi-mediated knockdown, acute Crispr-Cas9 knock-outs, and genetic epistasis approaches to manipulate gene function with the help of cell-type specific drivers. The successful use of acute Crispr-Cas9 mediated knockout tools (which required the generation of new genetic reagents for this study) will be of general interest to the Drosophila community. 

      The authors set out to identify new molecular determinants mediating the extent of axon wrapping in the peripheral nerves of third-instar wandering Drosophila larvae. They could show that over-expressing a constitutive-active version of the Fibroblast growth factor receptor Heartless (Htl) causes an increase in wrapping glial branching, leading to the formation of swellings in nerves close to the cell body (named bulges). To identify new determinants involved in axon wrapping acting downstream of Htl, the authors next conducted an impressive large-scale genetic interaction screen (which has become rare, but remains a very powerful approach), and identified Uninflatable (Uif) in this way. Uif is a large single-pass transmembrane protein that contains a whole series of extracellular domains, including Epidermal growth factor-like domains. Linking this protein to glial branch formation is novel, as it has so far been mostly studied in the context of tracheal maturation and growth. Intriguingly, a knock-down or knock-out of uif reduces branch complexity and also suppresses htl over-expression defects. Importantly, uif over-expression causes the formation of excessive membrane stacks. Together these observations are in line with the notion that htl may act upstream of uif.

      Further epistasis experiments using this model implicated also the Notch signaling pathway as a crucial regulator of glial wrapping: reduction in Notch signaling reduces wrapping, whereas over-activation of the pathway increases axonal wrapping (but does not cause the formation of bulges). Importantly, defects caused by the over-expression of uif can be suppressed by activated Notch signaling. Knock-down experiments in neurons suggest further that neither Delta nor Serrate act as neuronal ligands to activate Notch signaling in wrapping glia, whereas knock-down of Contactin, a GPI anchored Immunoglobulin domain-containing protein led to reduced axon wrapping by glia, and thus could act as an activating ligand in this context. 

      Based on these results the authors put forward a model proposing that Uif normally suppresses Notch signaling, and that activation of Notch by Contactin leads to suppression of Htl, to trigger the ensheathment of axons. While these are intriguing propositions, future experiments would need to conclusively address whether and how Uif could "stabilize" a specific membrane domain capable of interacting with specific axons.

      We absolutely agree with the reviewer that it would be fantastic to understand whether and how Uif could stabilize specific membrane domains that are capable of interacting with axons. To address this we need to be able to label such membrane domains and unfortunately we still cannot do so. We analyzed the distribution of PIP2/PIP3 but failed to detect any differences. Thus we still lack wrapping glial membrane markers that are able to label specific compartments.

      Moreover, to obtain evidence for Uif suppression by Notch to inhibit "precocious" axon wrapping and for a "gradual increase" of Notch signaling that silences uif and htl, (1) reporters for N and Htl signaling in larvae, (2) monitoring of different stages at a time point when branch extension begins, and (3) a reagent enabling to visualize Uif expression could be important next tools/approaches. Considering the qualitatively different phenotypes of reduced branching, compared to excessive membrane stacks close to cell bodies, it would perhaps be worthwhile to explore more deeply how membrane formation in wrapping glia is orchestrated at the subcellular level by Uif.

      In the revised version of the manuscript we have now included the use of Notch and RTK-signaling reporters.

      (1) reporters for N and Htl signaling in larvae,

      We had already employed the classic reporter generated by the Bray lab: Gbe-Su(H)-lacZ. This unfortunately failed to detect any activity in larval wrapping glia nuclei but was able to detect Notch activity in the adult wrapping glia (Figure S5C,F).

      As requested, we analyzed an RTK signaling reporter. The activity of sty-lacZ, which we had previously characterized in the lab (Sieglitz et al., 2013), increases by 22% when Notch is silenced. Given the normal distribution of the data points, this shows a trend that, however, does not reach statistical significance. We have not included this in the paper but would be happy to do so if requested.

      Author response image 1.

       

      (2) monitoring of different stages at a time point when branch extension begins,

      The reviewer raises an important question; however, it is extremely difficult to tackle experimentally. It would require a detailed electron microscopic analysis of early larval stages, which cannot be done in a reasonable amount of time. We have, however, added information on wrapping glia growth that summarizes recently published work from the lab (Kautzmann et al., 2025).

      (3) a reagent enabling to visualize Uif expression could be important next tools/approaches.

      The final comment of the reviewer also addresses an extremely relevant and important issue. We employed antibodies generated by the lab of R. Ward, but they did not allow detection of the protein in larval nerves. We also attempted to generate anti-Uif peptide antibodies but these antibodies unfortunately do not work in tissue. We are still trying to generate suitable reagents but for the current revision cannot offer any solution.

      Lastly, we agree with the reviewer that it would be worthwhile to explore how Uif controls membrane formation at the subcellular level. This, however, is a completely new project and will require the identification of the binding partners of Uif in wrapping glia to start working on a link between Uif and membrane extension. The reduced branching phenotype might well be a direct consequence of excessive membrane formation, as it likely blocks resources needed for efficient growth of glial processes.

      Finally, in light of the importance of correct ensheathment of axons by glia for neuronal function, this study will be of general interest to the glial biology community. 

      We are very grateful for this very positive comment.

      Reviewer #2 (Public review): 

      The FGF receptor Heartless has previously been implicated in Drosophila peripheral glial growth and axonal wrapping. Here, the authors perform a large-scale screen of over 2600 RNAi lines to find factors that control the downstream signaling in this process. They identify a transmembrane protein Uninflatable to be necessary for the formation of plasma membrane domains. They further find that a Uif regulatory target, Notch, is necessary for glial wrapping. Interestingly, additional evidence suggests Notch itself regulates uif and htl, suggesting a feedback system. Together, they propose that Uif functions as a "switch" to regulate the balance between glial growth and wrapping of axons.

      Little is known about how glial cell properties are coordinated with axons, and the identification of Uif is a promising link to shed light on this orchestration. The manuscript is well-written, and the experiments are generally well-controlled. The EM studies in particular are of outstanding quality and really help to mechanistically dissect the consequences of Uif and Notch signaling in the regulation of glial processes. Together, this valuable study provides convincing evidence of a new player coordinating the interactions controlling the glial wrapping of axons.

      Reviewer #1 (Recommendations for the authors): 

      (1) To be reproducible and understandable, it would be important to provide detailed information about crosses and genotypes, as reagents are currently listed individually and genotypes are provided in rather simplified versions. 

      We have added the requested information to the text.

      (2) Neurons are inherently resistant to RNAi-mediated knockdown and it thus may be necessary to introduce the over-expression of UAS-dcr2 when assessing neuronal requirements and to specifically exclude Delta or Serrate as ligands. 

      We agree with the reviewer and have repeated the knockdown experiments using UAS-dcr2, obtaining the same results. To use an RNAi-independent approach, we also employed sgRNA expression in the presence of Cas9. The neuron-specific gene knockout also showed no glial wrapping phenotype. These results are now added to the manuscript.

      (3) Throughout the manuscript, the authors use the terms "growth" and "differentiation" referring to the extent of branch formation versus axon wrapping. However glial differentiation and growth could have different meanings (for instance, growth could implicate changes in cell size or numbers, while differentiation could refer to a change from an immature precursor-like state to a mature cell identity). It may thus be useful to replace these general terms with more specific ones. 

      This is a very good point. When we use the term “growth”, we refer only to glial cell growth, that is, the increase in cell mass. Proliferation is excluded, and this is now explicitly stated in the manuscript. The term “differentiation” is indeed difficult, and we have therefore replaced it with either a direct description of the morphology or with “axon wrapping”.

      (4) Page 4. "remake" fibers should be Remak fibers. 

      We have corrected this typo.

      (5) Page 5. "Heartless controls glial growth but does promote axonal wrapping", this sentence is not clear in its message because of the "but".

      We have corrected this sentence.

      (6) Generally, many gene names are used as abbreviations without introductions (e.g. Sos, Rl, Msk on page 7). These would require an introduction.

      All genetic elements are now introduced.

      (7) Page 8. When Cas9 is expressed ubiquitously ... It would be helpful to add how this is done (nsyb-Gal4, nrv2-Gal4, or another Gal4 driver are used to express UAS-Cas9, as the listed Gal4 drivers seem to be specific to neurons or glia?).

      This is now added. We used the following genotype for ubiquitous knockout using the four different uif-specific sgRNAs (UAS-uif<sup>sgRNA X</sup>): [w; UAS-Cas9/ Df(2L)ED438; da-Gal4 /UAS-uif<sup>sgRNA X</sup>]. We used the following genotype for a knockout in wrapping glia: [+/+; UAS-Cas9/+; nrv2-Gal4,UAS-CD8::mCherry/UAS-uif<sup>sgRNA X</sup>].

      We had previously shown that nrv2-Gal4 is a wrapping glia specific driver in the larval PNS (Kottmeier et al., 2020).

      Moreover, the authors mention that "This indicates that a putatively secreted version of Uif is not functional". This conclusion would need to be explained in detail.

      First, because it requires quite some detective work to understand the panels in Figure 1 on which this statement is based; second, since the acutely induced double-stranded breaks in the DNA and subsequent repair may cause variable defects, it may indeed be not certain what changes have been induced in each cell; and third considering that there is a putative cleavage site, would it not be expected that the protein is not functional, when it is not cleaved, and there is no secreted extracellular part (unless the cleavage site is not required). The latter could probably only be addressed by rescue experiments with UAS transgenes with identified changes.

      We agree with the reviewer. The rescue experiments are unfortunately difficult, since even expression of a full-length uif construct does not fully rescue the uif mutant phenotype (Loubéry et al., 2014). We have therefore explained the conclusion drawn from the different sgRNA knockout experiments more clearly and have removed the statement that secreted Uif forms are non-functional.

      In the Star Method reagent table, it is not clear, why all 8 oligonucleotides are for "uif cleavage just before transmembrane domain" despite targeting different locations. 

      We are very sorry for this mistake and have now corrected it. Thank you very much for spotting this.

      (8) Page 13. However, we expressed activated Notch,... the word "when" seems to be missing, and it would be helpful to specify how this was done (over-expression of N[ICD]).

      We have now corrected it accordingly.

      (9) To strengthen the point similarity of phenotypes caused by Htl pathway over-activation and Uif over-expression, it would be helpful to also show an EM electron micrograph of the former.

      We have now added an extensive description of the phenotype caused by activated Heartless. This is shown as new Figure 2.

      (10) Figure 4C, the larval nerve seems to be younger, as many extracellular spaces between axons are detected.

      This perception is a misunderstanding, and we are sorry for not explaining this better. The third instar larvae are all age-matched. The particular specimen in Figure 4C shows some fixation artifacts that result in the loss of material. Importantly, however, membranes are not affected. Similar loss of material is also seen in Figure 6C. For further examples, please see a study on nerve anatomy (Kautzmann et al., 2025).

      (11) The model could be presented as a figure panel in the manuscript. To connect the recommendation section with the above public review, a step forward could be to adjust the model and the wording in the Result section and to move some of the less explored points and thoughts to the discussion.

      We are thankful for this advice and have moved an updated model figure to the end of the main text (now Figure 7).

      Reviewer #2 (Recommendations for the authors):

      (1) Screen and the interest in Uif: Out of the ~62 genes that came out of the RNAi screen, why did the authors prioritize and focus on Uif? What were the other genes that came out of the screen, and did any of those impinge on Notch signaling? 

      We have now described the results of the screen more thoroughly. We selected Uif because it was the only transmembrane/adhesion protein identified. Given the finding that Uif decorates apical membrane domains in epithelial cells, we hoped to identify a protein specific for a similar membrane domain in wrapping glia.

      Notch and its downstream transcription factors were not included in the initial screen and were analyzed only once we had seen the contribution of Notch. Interestingly, there is a single hit in our screen linked to Notch signaling: Gp150. However, we tested additional dsRNA-expressing lines and were not able to reproduce the phenotype. This information is added to the discussion.

      The authors performed a large-scale screen of 2600 RNAi lines, it seems more details about what came out of the screen and why the focus on Uif would benefit the manuscript. 

      See above comment.

      Relatedly, there should be a discussion of the limitations of the screen, and that it was really a screen looking to modify a gain-of-function phenotype from the activated Htl allele; it seems a screen of this design may lead to artifacts that may not reflect endogenous signaling.

      We have now added to the introduction a short paragraph on suppressor screens that employ gain-of-function alleles.

      “In Drosophila, such suppressor screens have been used successfully many times (Macagno et al., 2014; Rebay et al., 2000; Therrien et al., 2000). Possibly, such screens also uncover genes that are not directly linked to the signaling pathway under study but this can be tested in further experiments. Our screen led to the unexpected identification of the large transmembrane protein Uninflatable, which in epithelial cells localizes to the apical plasma membrane. Loss of uninflatable suppresses the phenotype caused by activated RTK signaling. In addition, we find that uif knockdown and uif knockout larvae show impaired glial growth while an excess of Uninflatable leads to the formation of ectopic wrapping membrane processes that, however, fail to interact with axons. uninflatable is also known to inhibit Notch.”

      (2) In general this study relies on RNAi knockdown, and is generally well controlled in using multiple RNAi lines giving the same phenotype, and also controlled for by tissue-specific gene knockout. However, there is little in the way of antibody staining to directly confirm the target of interest is lost/reduced, which would obviously strengthen the study. 

      Lacking the tools or ability to assess RNAi efficiency (qPCR, antibody staining), some conclusions need to be tempered. For example, in the experiments in Figure S6 regarding canonical Notch signaling, the authors do not find a phenotype by Delta or Serrate knockdown, but there are no experiments that show Delta or Serrate are lost. Thus, if the authors cannot directly test for RNAi efficiency, these conclusions should be tempered throughout the manuscript. 

      We agree with the reviewer. We now provide information on the use of Dicer in our RNAi experiments and have conducted new sgRNA/Cas9 experiments. In addition, we have tempered our wording and now state that Dl and/or Ser remain possible ligands.

      (3) More description is needed regarding how the authors are measuring and calculating the "wrapping index". In principle, the approach seems sound. However, are there cases where axons are "partially" wrapped of various magnitudes, and how are these cases treated in the analysis? Are there additional controls of previously characterized mutants to illustrate the dynamic range of the wrapping index in various conditions?

      This is now explained.

      Further, can the authors quantify the phenotypes in the axonal "bulges" in Figures 1, 3, and 5?

      This is a difficult question. Although we can easily quantify the number of bulges, we cannot quantify the severity of the phenotype, as this would require EM analysis. Sectioning nerves at a specific distance from the ventral nerve cord already requires very careful adjustments. Sectioning at the level of a bulge is considerably more difficult, and it is not possible to obtain the number of sections needed to quantify the bulge phenotype.

      The fact is that all wrapping glial cells develop swellings (bulges) at the position of the nucleus. As there are in general three wrapping glial cells per segmental nerve, the number of bulges is three.

      (4) It seems difficult to clearly untangle the functions of Htl/Uif/Notch in differentiation itself vs subsequent steps in growth/wrapping. For example, if the differentiation steps are not properly coordinated, couldn't this give rise to some observed differences in growth or wrapping at later stages? I'm not sure of any obvious experiments to pursue here, but at least a brief discussion of these issues in the manuscript would be of use.

      We now address this more carefully in the Discussion, where we try to discriminate between a function of the three genes in differentiation alone and a function in a stepwise mode of growth and differentiation.

      When comparing the different loss-of-function phenotypes, they all appear the same, which would argue that all three genes act in a common process.

      However, when we look at gain-of-function phenotypes, Htl and Uif behave differently from Notch. This would favor two distinct processes.

      We have now added activity markers for RTK signaling to directly show that Notch silences RTK activity. Unfortunately we were not able to do a similar reciprocal experiment.

      Minor:

      (1) The Introduction is too long, and would benefit from revisions to make it shorter and more concise.

      We have shortened the introduction and hopefully made it more concise.

      (2) A schematic illustrating the model the authors propose about Htl, Uif, and Notch in glial differentiation, growth, and wrapping would benefit the clarity of this work. 

      We had previously added the graphical abstract below, which we have now updated and included as a figure in the main text.

      References

      Kautzmann, S., Rey, S., Krebs, A., and Klämbt, C. (2025). Cholinergic and glutamatergic axons differentially require glial support in the Drosophila PNS. Glia. 10.1002/glia.70011.

      Kottmeier, R., Bittern, J., Schoofs, A., Scheiwe, F., Matzat, T., Pankratz, M., and Klämbt, C. (2020). Wrapping glia regulates neuronal signaling speed and precision in the peripheral nervous system of Drosophila. Nature communications 11, 4491-4417. 10.1038/s41467-020-18291-1.

      Loubéry, S., Seum, C., Moraleda, A., Daeden, A., Fürthauer, M., and González-Gaitán, M. (2014). Uninflatable and Notch control the targeting of Sara endosomes during asymmetric division. Current biology : CB 24, 2142-2148. 10.1016/j.cub.2014.07.054.

      Macagno, J.P., Diaz Vera, J., Yu, Y., MacPherson, I., Sandilands, E., Palmer, R., Norman, J.C., Frame, M., and Vidal, M. (2014). FAK acts as a suppressor of RTK-MAP kinase signalling in Drosophila melanogaster epithelia and human cancer cells. PLoS Genet 10, e1004262. 10.1371/journal.pgen.1004262.

      Rebay, I., Chen, F., Hsiao, F., Kolodziej, P.A., Kuang, B.H., Laverty, T., Suh, C., Voas, M., Williams, A., and Rubin, G.M. (2000). A genetic screen for novel components of the Ras/Mitogen-activated protein kinase signaling pathway that interact with the yan gene of Drosophila identifies split ends, a new RNA recognition motif-containing protein. Genetics 154, 695-712. 10.1093/genetics/154.2.695.

      Sieglitz, F., Matzat, T., Yuva-Adyemir, Y., Neuert, H., Altenhein, B., and Klämbt, C. (2013). Antagonistic Feedback Loops Involving Rau and Sprouty in the Drosophila Eye Control Neuronal and Glial Differentiation. Science signaling 6, ra96. 10.1126/scisignal.2004651.

      Therrien, M., Morrison, D.K., Wong, A.M., and Rubin, G.M. (2000). A genetic screen for modifiers of a kinase suppressor of Ras-dependent rough eye phenotype in Drosophila. Genetics 156, 1231-1242.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary 

      In this manuscript, Weir et al. investigate why the 13-lined ground squirrel (13LGS) retina is unusually rich in cone photoreceptors, the cells responsible for color and daylight vision. Most mammals, including humans, have rod-dominant retinas, making the 13LGS retina both an intriguing evolutionary divergence and a valuable model for uncovering novel mechanisms of cone generation. The developmental programs underlying this adaptation were previously unknown. 

      Using an integrated approach that combines single-cell RNA sequencing (scRNAseq), scATACseq, and histology, the authors generate a comprehensive atlas of retinal neurogenesis in 13LGS. Notably, comparative analyses with mouse datasets reveal that in 13LGS, cones can arise from late-stage neurogenic progenitors, a striking contrast to mouse and primate retinas, where late progenitors typically generate rods and other late-born cell types but not cones. They further identify a shift in the timing (heterochrony) of expression of several transcription factors.

      Further, the authors show that these factors act through species-specific regulatory elements. And overall, functional experiments support a role for several of these candidates in cone production. 

      Strengths 

      This study stands out for its rigorous and multi-layered methodology. The combination of transcriptomic, epigenomic, and histological data yields a detailed and coherent view of cone development in 13LGS. Cross-species comparisons are thoughtfully executed, lending strong evolutionary context to the findings. The conclusions are, in general, well supported by the evidence, and the datasets generated represent a substantial resource for the field. The work will be of high value to both evolutionary neurobiology and regenerative medicine, particularly in the design of strategies to replace lost cone photoreceptors in human disease. 

      Weaknesses 

      (1) Overall, the conclusions are strongly supported by the data, but the paper would benefit from additional clarifications. In particular, some of the conclusions could be toned down slightly to reflect that the observed changes in candidate gene function, such as those for Zic3 by itself, are modest and may represent part of a more complex regulatory network.  

      We have revised the text to qualify these conclusions as suggested.

      “Zic3 promotes cone-specific gene expression and is necessary for generating the full complement of cone photoreceptors”

      “Pou2f1 overexpression upregulated an overlapping but distinct, and larger, set of cone-specific genes relative to Zic3, while also downregulating many of the same rod-specific genes, often to a greater extent (Fig. 3C).”

      “This resulted in a statistically significant ~20% reduction in the density of cone photoreceptors in the mutant retina (Fig. 3E,F), while the relative numbers of rods and horizontal cells remained unaffected (Fig. S4A-D).”

      “Our analysis suggests that gene regulatory networks controlling cone specification are highly redundant, with transcription factors acting in complex, redundant, and potentially synergistic combinations. This is further supported by our findings on the synergistic effects of combined overexpression of Zic3 and Pou2f1 increasing both the number of differentially expressed genes and their level of change in expression relative to the modest changes seen with overexpression of either gene alone (Fig. 3) and the relatively mild or undetectable phenotypes observed following loss of function of Zic3 and Mef2c (Fig. 3, Fig. S6), as well as other cone-promoting factors such as Onecut1 and Pou2f1[18,19].”

      (2) Additional explanations about the cell composition of the 13LGS retina are needed. The ratios between cone and rod are clearly detailed, but do those lead to changes in other cell types? 

      The 13LGS retina, like most cone-dominant retinas, shows relatively lower numbers of rod and cone photoreceptors (~20%) than do nocturnal species such as mice (~80%). The difference is made up by increased numbers of inner retinal neurons and Müller glia. While rigorous histological quantification of the abundance of inner retinal cell types has not yet been performed for 13LGS, we can estimate these values using our snATAC-Seq data. These numbers are provided in Table ST1 and are now discussed in the text.

      (3) Could the lack of a clear trajectory for rod differentiation be just an effect of low cell numbers for this population? 

      This is indeed likely to be the case. This is now stated explicitly in the text.

      “However, no clear trajectory for rod differentiation was detected, likely due to the very low number of rod cells detected prior to P17 (Fig. 2A).”

      (4) The immunohistochemistry and RNA hybridization experiments shown in Figure S2 would benefit from supporting controls to strengthen their interpretability. While it has to be recognized that performing immunostainings on non-conventional species is not a simple task, negative controls are necessary to establish the baseline background levels, especially in cases where there seems to be labeling around the cells. The text indicates that these experiments are both immunostainings and ISH, but the figure legend only says "immunohistochemistry". Clarifying these points would improve readers' confidence in the data. 

      The figure legend has been corrected, and negative controls for P24 have been added. The figure legend has been modified as follows:

      “Fluorescent in situ hybridization showing co-expression of (A) Pou2f1 and Otx2 or (B) Zic3, Rxrg, and Otx2 in P1, P5, P10, and P24 retinas. Insets show higher power images of highlighted areas. (C) Zic3, Rxrg, and Otx2 fluorescent in situ hybridization from P24 with matched (C’) negative controls. (D) Pou2f1 and Otx2 fluorescent in situ hybridization from P24 with matched (D’) negative controls. (E) Quantification of the fraction of Otx2-positive cells in the outer neuroblastic layer (P1, P5) and ONL (P10, P24) that also express Zic3. (F) Immunohistochemical analysis of Mef2c and Otx2 expression in P1, P5, P10, and P24 retinas. (G) Mef2c and Otx2 immunohistochemistry from P24 with matched (G’) negative controls. Negative controls for fluorescent in situ hybridization omit the probe and for immunohistochemistry omit primary antibodies. Scale bars, 10 µm (S2A-F), 50 µm (S2G) and 5 µm (inset). Cell counts in E were analyzed using one-way ANOVA analysis with Sidak multiple comparisons test and 95% confidence interval. ** = p <0.01, **** = p <0.0001, and ns = non-significant. N=3 independent experiments.”

      (5) Figure S3: The text claims that overexpression of Zic3 alone is sufficient to induce the cone-like photoreceptor precursor cells as well as horizontal cell-like precursors, but this is not clear in Figure S3A nor in any other figure. Similarly, the effects of Pou2f1 overexpression are different in Figure S3A and Figure S3B. In Figure S3B, the effects described (increased presence of cone-like and horizontal-like precursors) are very clear, whereas it is not in Figure S3A. How are these experiments different?

      These UMAP data represent two independent experiments. Total numbers and relative fractions of each cell type are now included in Table ST5.

      In these experiments, cone-like precursors were identified by both cell type clustering and differential gene expression. Cells from all conditions were found in the cone-like precursor cluster. However, cells electroporated with a plasmid expressing GFP alone only showed GFP as a differentially expressed gene, identifying them most likely as GFP+ rods. In contrast, Zic3 overexpression resulted in increased expression of cone-specific genes and decreased expression of rod-specific genes in both cone-like precursors and rods relative to controls electroporated with GFP alone. Cell type proportions across independent overexpression single-cell experiments could be influenced by a number of factors, including electroporation efficiency and ex vivo growth conditions.

      (6) The analyses of Zic3 conditional mutants (Figure S4) reveal an increase in many cone, rod, and pan-photoreceptor genes with only a reduction in some cone genes. Thus, the overall conclusion that Zic3 is essential for cones while repressing rod genes doesn't seem to match this particular dataset. 

      We observe that loss of function of Zic3 in developing retinal progenitors leads to a reduction in the total number of cones (Fig. 4E,F). In Fig. S4, we investigate how gene expression is altered in both the remaining cones and in other retinal cell types. We only observed significant changes in mutant cones and Müller glia relative to controls. We observe a mixed phenotype in cones, with a subset of cone-specific genes downregulated (notably including Thrb) and a subset of others upregulated (including Opn1sw). We also find that genes expressed both in rods and cones, as well as rod-specific genes, are downregulated in cKO cones. Since rods are fragile cells that are located immediately adjacent to cones, some level of contamination of rod-specific genes is inevitable in single-cell analysis of dissociated cones (c.f. PMID: 31128945, 34788628), and this reduced level of rod contamination could result from altered adhesion between mutant rods and cones. In mutant Müller glia, in contrast, we see a broad decrease in expression of Müller glia-specific genes, which likely reflects the indirect effects of Zic3 loss of function in retinal progenitors, and an upregulation of both broadly photoreceptor-specific genes and a subset of rod-specific genes, which may also result from altered adhesion between Müller glia and rods.

      This is consistent with the conclusions in the text, although we have both modified the text and included heatmaps showing downregulation of rod-specific genes in mutant cones, to clarify this finding.

      “In addition, we observe a broad decrease in expression of genes expressed at high levels in both cones and rods (Rpgrip1, Drd4) and rod-specific genes (Rho, Cnga1, Pde6b) in mutant cones (Fig. S4F). Since rods are fragile cells that are located immediately adjacent to cones, some level of contamination of rod-specific genes is inevitable in single-cell analysis of dissociated cones (c.f. PMID: 31128945, 34788628), and this reduced level of rod contamination could result from altered adhesion between mutant rods and cones. In contrast, increased expression of rod-specific genes (Rho, Nrl, Pde6g, Gngt1) and pan-photoreceptor genes (Crx, Stx3, Rcvrn) was observed in Müller glia (Fig. S4G), which may likewise result from altered adhesion between Müller glia and rods. Finally, several Müller glia-specific genes were downregulated, including Clu, Aqp4, and Notch pathway components such as Hes1 and Id3, with the exception of Hopx, which was upregulated (Fig. S4G). This likely reflects the indirect effects of Zic3 loss of function in retinal progenitors. These findings indicate that Zic3 is essential for the proper expression of photoreceptor genes in cones while also playing a role in regulating expression of Müller glia-specific genes.”

      (7) Throughout the text, the authors used the term "evolved". To substantiate this claim, it would be important to include sequence analyses or to rephrase to a more neutral term that does not imply evolutionary inference. 

      We have modified the text as requested to replace “evolved” and “evolutionarily conserved” where possible, with examples of revised text listed below:  

      “These results demonstrate that modifications to gene regulatory networks underlie the development of cone-dominant retina,...”

      “Our results demonstrate that heterochronic expansion of the expression of transcription factors that promote cone development is a key event in the development of the cone-dominant 13LGS retina.”

      “Conserved patterns of motif accessibility, identified using ChromVAR and the TRANSFAC2018 database (Fig. S1F, Table ST1)...”

      “However, most of these elements mapped to sequences that were not shared between 13LGS and mouse, with intergenic enhancers exhibiting particularly low levels of conservation (Fig. 5B).”

      “We conclude that the development of the cone-dominant retina in 13LGS is driven by novel cis-regulatory elements…”

      “Based on our bioinformatic analysis, the cone-dominant 13LGS retina follows this paradigm, in which species-specific enhancer elements…”

      “Dot plots showing the enrichment of binding sites for Otx2 and Neurod1, TFs which are broadly expressed in both neurogenic RPC and photoreceptor precursors, and which are enriched in conserved cis-regulatory elements in both species. (D) Bar plots showing the number of conserved and species-specific enhancers per TSS in four cone-promoting genes between 13LGS and mouse.”

      Reviewer #2 (Public review): 

      Summary: 

      This paper aims to elucidate the gene regulatory network governing the development of cone photoreceptors, the light-sensing neurons responsible for high acuity and color vision in humans. The authors provide a comprehensive analysis through stage-matched comparisons of gene expression and chromatin accessibility using scRNA-seq and scATAC-seq from the cone-dominant 13-lined ground squirrel (13LGS) retina and the rod-dominant mouse retina. The abundance of cones in the 13LGS retina arises from a dominant trajectory from late retinal progenitor cells (RPCs) to photoreceptor precursors and then to cones, whereas only a small proportion of rods are generated from these precursors.

      Strengths: 

      The paper presents intriguing insights into the gene regulatory network involved in 13LGS cone development. In particular, the authors highlight the expression of cone-promoting transcription factors such as Onecut2, Pou2f1, and Zic3 in late-stage neurogenic progenitors, which may be driven by 13LGS-specific cis-regulatory elements. The authors also characterize candidate cone-promoting genes Zic3 and Mef2C, which have been previously understudied. Overall, I found that the across-species analysis presented by this study is a useful resource for the field. 

      Weaknesses: 

      The functional analysis on Zic3 and Mef2C in mice does not convincingly establish that these factors are sufficient or necessary to promote cone photoreceptor specification. Several analyses lack clarity or consistency, and figure labeling and interpretation need improvement. 

      We have modified the text and figures to more clearly describe the observed roles of Zic3 and Mef2c in cone photoreceptor development as detailed in our responses to reviewer recommendations.

      Reviewer #3 (Public review): 

      Summary: 

      The authors perform deep transcriptomic and epigenetic comparisons between mouse and 13-lined ground squirrel (13LGS) to identify mechanisms that drive rod vs cone-rich retina development. Through cross-species analysis, the authors find extended cone generation in 13LGS, gene expression within progenitor/photoreceptor precursor cells consistent with a lengthened cone window, and differential regulatory element usage. Two of the transcription factors, Mef2c and Zic3, were subsequently validated using OE and KO mouse lines to verify the role of these genes in regulating competence to generate cone photoreceptors.

      Strengths: 

      Overall, this is an impactful manuscript with broad implications toward our understanding of retinal development, cell fate specification, and TF network dynamics across evolution and with the potential to influence our future ability to treat vision loss in human patients. The generation of this rich new dataset profiling the transcriptome and epigenome of the 13LGS is a tremendous addition to the field that assuredly will be useful for numerous other investigations and questions of a variety of interests. In this manuscript, the authors use this dataset and compare it to data they previously generated for mouse retinal development to identify 2 new regulators of cone generation and shed insights into their regulation and their integration into the network of regulatory elements within the 13LGS compared to mouse. 

      Weaknesses: 

      (1) The authors chose to omit several cell classes from analyses and visualizations that would have added to their interpretations. In particular, I worry that the omission of 13LGS rods, early RPCs, and early NG from Figures 2C, D, and F is notable and would have added to the understanding of gene expression dynamics. In other words, (a) are these genes of interest unique to late RPCs or maintained from early RPCs, and (b) are rod networks suppressed compared to the mouse? 

      We were unable to include 13LGS rods in our analysis due to the extremely low number of cells detected prior to P17. Relative expression levels of cone-promoting transcription factors in 13LGS early RPCs and early NG cells are shown in Fig. 2H. Particularly when compared to mice, we also observe elevated expression of cone-promoting genes in early-stage RPC and/or early NG cells. These include Zic3, Onecut2, Mef2c, and Pou2f1, as well as transcription factors that promote the differentiation of post-mitotic cone precursors, such as Thrb and Rxrg. In contrast, genes that promote specification and differentiation of both rods and cones, such as Otx2 and Crx, show similar or even slightly higher expression in mice. Genes such as Casz1, which act in late NG cells to promote rod specification, are indeed downregulated in 13LGS late NG cells relative to mice. We have modified the text to clarify these points, as shown below:

      “To further characterize species-specific patterns of gene expression and regulation during postnatal photoreceptor development, we analyzed differential gene expression, chromatin accessibility, and motif enrichment across late-stage primary and neurogenic progenitors, immature photoreceptor precursors, rods, and cones. Due to their very low number before time point P17, we were unable to include 13LGS rods in the analysis.”

      “In contrast, two broad patterns of differential expression of cone-promoting transcription factors were observed between mouse and 13LGS.”

      “First, transcription factors identified in this network that are known to be required for committed cone precursor differentiation, including Thrb, Rxrg, and Sall3 [25,26,45], consistently showed stronger expression in late-stage RPCs and early-stage primary and/or neurogenic RPCs of 13LGS compared to mice.”

      “Second, transcription factors in the network known to promote cone specification in early-stage mouse RPCs, such as Onecut2 and Pou2f1, exhibited enriched expression in early and late-stage primary and/or neurogenic RPCs of 13LGS, implying a heterochronic expansion of cone-promoting factors into later developmental stages.”

      “In contrast, genes such as Casz1, which act in late neurogenic RPCs to promote rod specification, are downregulated in 13LGS late neurogenic RPCs relative to mice.”

      (2) The authors claim that the majority of cones are generated by late RPCs and that this is driven primarily by the enriched enhancer network around cone-promoting genes. With the temporal scRNA/ATACseq data at their disposal, the authors should compare early vs late born cones and RPCs to determine whether the same enhancers and genes are hyperactivated in early RPCs as well as in the 13LGS. This analysis will answer the important question of whether the enhancers activated/evolved to promote all cones, or are only and specifically activated within late RPCs to drive cone genesis at the expense of rods. 

      This is an excellent question.  We have addressed this question by analyzing both expression of the cone-promoting genes identified in C2 and C3 in Figure 2C and accessibility of their associated enhancer sequences, which are shown in Figure 6B, in early and late-stage RPCs and cone precursors.  The results are shown in Author response image 1 below. We observe that cone-promoting genes consistently show higher expression in both late-stage RPCs and cones.  We do not observe any clear differences in the accessibility of the associated enhancer regions, as determined by snATAC-Seq.  However, since we have not performed CUT&RUN analysis in embryonic retina for H3K27Ac or any other marker of active enhancer elements, we cannot determine whether the total number of active enhancers differs between early and late-stage RPCs. We suspect, however, this is likely to be the case, given the differences in the expression levels of these genes.

      Author response image 1.

      Relative expression levels of cone-promoting genes and accessibility of enhancer elements associated with these genes in early- and late-stage RPCs and cone precursors.

      (3) The authors repeatedly use the term 'evolved' to describe the increased number of local enhancer elements of genes that increase in expression in 13LGS late RPCs and cones. Evolution can act at multiple levels on the genome and its regulation. The authors should consider analysis of sequence level changes between mouse, 13LGS, and other species to test whether the enhancer sequences claimed to be novel in the 13LGS are, in fact, newly evolved sequence/binding sites or if the binding sites are present in mouse but only used in late RPCs of the 13LGS. 

      Novel enhancer sequences here are defined as having divergent sequences rather than simply divergent activity. This point has been clarified in the text, with the following changes made:

      “However, most of these elements mapped to sequences that were not shared between 13LGS and mouse, with intergenic enhancers exhibiting particularly low levels of conservation (Fig. 5B).”

      “...demonstrated far greater motif enrichment in active regulatory elements in 13LGS than in mice, though few of these elements mapped to sequences that were shared between 13LGS and mouse (Fig. 5C,D, Table ST10).”

      (4) The authors state that 'Enhancer elements in 13LGS are predicted to be directly targeted by a considerably greater number of transcription factors than in mice'. This statement can easily be misread to suggest that all enhancers display this, when in fact, this is only the cone-promoting enhancers of late 13LGS RPCs. In a way, this is not surprising since these genes are largely less expressed in mouse vs 13LGS late RPCs, as shown in Figure 2. The manuscript is written to suggest this mechanism of enhancer number is specific to cone production in the 13LGS; it would help prove this point if the authors asked the opposite question and showed that mouse late RPCs do not have similar increased predicted binding of TFs near rod-promoting genes in C7-8.

      The Reviewer’s point is well taken, and we agree that this mechanism is unlikely to be specific to cone photoreceptors, since we are simply looking at genes that show higher expression in late-stage neurogenic RPCs in 13LGS. We have changed the relevant text to now state:

      “Enhancer elements associated with cone-specific genes in 13LGS are predicted to be directly targeted by a considerably greater number of transcription factors in late-stage neurogenic RPCs than in mice, as might be expected, given the higher expression levels of these genes.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors): 

      (1) Minor: Clusters C1-C8 (Figure 2) are labeled as "C1-8" in the text but "G1-8" in the figure. 

      This has been done.

      (2) Minor: Showing other neurogenic factors (Olig2, Ascl1, Otx2) and late-stage specific factors (Lhx2, Sox8, Nfia/b) could be shown in Figure 2 to better support the text. 

      This has been done. These motifs are consistent in both species, but Figure 2F shows differential motifs. The reference to Figure 2F has been altered to include Table ST4, while Neurod1 motifs are shown in Fig. 2F.

      Reviewer #2 (Recommendations for the authors): 

      (1) Figure 2 

      2A-B: The exclusion of early-stage data from the species-integrated analysis is puzzling, as it could reveal significant differences between early-stage neurogenic progenitors in mice and late-stage progenitors in 13LGS that both give rise to cones. This analysis would also shed light on how cone-promoting transcription factors are suppressed in mouse early-stage progenitors, limiting the window for cone genesis.

      2C: The figure labels G1-8, while C1-8 are referenced in the text. 

      2F: Neurog2, Olig2, Ascl1, and Neurod1 are mentioned in the text but not labeled in the figure. 

      2A-B: There are indeed substantial differences between early-stage RPCs in 13LGS and late-stage RPCs in mice that are broadly linked to control of temporal patterning, which are mentioned in the text. For instance, early-stage RPCs in both animals express higher levels of Nr2f1/2, Meis1/2, and Foxp1/4, while late-stage RPCs express higher levels of Nfia/b/x, indicating that the core distinction between early- and late-stage RPCs is maintained. What most clearly differs in 13LGS is the sustained expression of a subset of cone-promoting transcription factors in late-stage RPCs that are normally restricted to early-stage RPCs in mice. However, as mentioned in response to Reviewer #3’s first point, we do observe some evidence for increased expression of cone-promoting transcription factors in early-stage RPCs and NG cells of 13LGS relative to mice, although this is much less dramatic than observed at later stages. We have modified the text to directly mention this point. G1-8 has been corrected to C1-8 in the figure, a reference to Table ST4 has been added in discussion of neurogenic bHLH factors, and Fig. 2F has been modified to label Neurod1.

      “First, transcription factors identified in this network that are known to be required for committed cone precursor differentiation, including Thrb, Rxrg, and Sall3 [25,26,45], consistently showed stronger expression in late-stage RPCs and early-stage primary and/or neurogenic RPCs of 13LGS compared to mice.”

      “Second, transcription factors in the network known to promote cone specification in early-stage mouse RPCs, such as Onecut2 and Pou2f1, exhibited enriched expression in early and late-stage primary and/or neurogenic RPCs of 13LGS, implying a heterochronic expansion of cone-promoting factors into later developmental stages.”

      (2) Figure 3 

      In 3F, the cone density in the WT retina is approximately 0.25 cones per micron, while in the Zic3 cKO retina, it is about 0.2 cones per micron. However, the WT control in Figure S6C also shows about 0.2 cones per micron, raising questions about whether there is a genuine decrease in cone number or if it results from quantification variability. Additionally, the proportion of cone cells in the Zic3 cKO scRNA-seq data shown in Figure S4E appears comparable to the WT control, which is inconsistent with the conclusion that Zic3 cKO leads to reduced cone production. Therefore, I found that the conclusion that Zic3 is necessary for cone development is not supported by the data.

      The cone density counts in the two mutant lines and accompanying littermate controls were collected by blinded counting by two different observers (R.A. for the Zic3 cKO and N.P. for the Mef2c cKO). We believe that the ~20% difference in the observed cone density in the two control samples likely represents investigator-dependent differences. These can exceed 20% between even highly skilled observers when quantifying dissociated cells (PMID: 35198419) and are likely to be even higher for immunohistochemistry samples.  Since both controls were done in parallel with littermate mutant samples, we therefore stand by our interpretation of these results.

      (3) Figures 4 and 5

      These figures are duplicates. In Figure 4, Mef2C overexpression in postnatal progenitors leads to increased numbers of neurogenic RPCs, suggesting it may promote cell proliferation rather than inhibit rod cell fate or promote cone cell fate. Electroporation of plasmids into P0 retina typically does not label cone cells, as cones are born prenatally in mice. Given the widespread GFP signal in Figure 4D, the authors should consider that the high background of GFP signal may have misled the quantification of the result.

      The figure duplication has been corrected. We respectfully disagree with the Reviewer’s statement that ex vivo electroporation performed at P0, as is the case here, does not label cones. We routinely observe small numbers of electroporated cones when performing this analysis. Cones at this age are located on the scleral face of the retina and are therefore in direct contact with the buffer solution containing the plasmid in question (c.f. PMID: 20729845, 31128945, 34788628, 40654906). Furthermore, the level of GFP expression that is used to gate electroporated cells for isolation using FACS is typically considerably less than that used to identify a GFP-positive cell using standard immunohistochemical techniques, making it difficult to directly compare the efficiency of cone electroporation between these approaches. We agree, however, that Mef2c overexpression seems to broadly delay the differentiation of rod photoreceptors, and have modified the text to include discussion of this point.

      “Although a few GFP-positive electroporated cells co-expressing the cone-specific marker Gnat2 were detected in control (likely due to the electroporation of cone precursors, which we have previously observed in P0 retinal explants (Clark et al., 2019; Leavey et al., 2025; Lyu et al., 2021; Onishi et al., 2010)), there was a significant increase in double-positive cells in the test condition, matching the novel cone-like precursor population found in the scRNA-Seq (Fig. 4E).”

      “Indeed, overexpression of Mef2c increased the number of both neurogenic RPCs and immature photoreceptor precursors, suggesting that rod differentiation was broadly delayed.”

      (4) Figure S2 

      The figure legend lacks information about panels A and B. It is unclear which panels represent immunohistochemistry and which represent RNA hybridization chain reaction. Overall, the staining results are difficult to interpret, as it appears that all examined RNAs/proteins are positively stained across the sections with varying background levels. Specificity is hard to assess. For instance, in Figure S2B, the background intensity of Zic3 staining varies inconsistently from P1 to P24. The number of Zic3 mRNA dots seems to peak at P5 and decrease at P10, which contradicts the scRNA-seq results showing peak expression in mature cones.

      The figure legend has been corrected. Negative controls are now included for both in situ hybridization (Fig. S2C’) and immunostaining (Fig. S2G) at P24, along with paired experimental data.  We have quantified the total fraction of Otx2+ cells that also contain Zic3 foci, and find that coexpression peaks at P5 and P10.  This is now included as Fig. S2E.

      The number of Zic3 foci is in fact higher at P5 than P10, with XX foci/Otx2+ cell at P5 vs. YY foci/Otx2+ cell at P10.

      “Fluorescent in situ hybridization showing co-expression of (A) Pou2f1 and Otx2 or (B) Zic3, Rxrg, and Otx2 in P1, P5, P10, and P24 retinas. Insets show higher power images of highlighted areas. (C) Zic3, Rxrg, and Otx2 fluorescent in situ hybridization from P24 with matched (C’) negative controls. (D) Pou2f1 and Otx2 fluorescent in situ hybridization from P24 with matched (D’) negative controls. (E) Quantification of the fraction of Otx2-positive cells in the outer neuroblastic layer (P1, P5) and ONL (P10, P24) that also express Zic3. (F) Immunohistochemical analysis of Mef2c and Otx2 expression in P1, P5, P10, and P24 retinas. (G) Mef2c and Otx2 immunohistochemistry from P24 with matched (G’) negative controls. Negative controls for fluorescent in situ hybridization omit the probe and for immunohistochemistry omit primary antibodies. Scale bars, 10 µm (S2A-F), 50 µm (S2G) and 5 µm (inset). Cell counts in E were analyzed using one-way ANOVA analysis with Sidak multiple comparisons test and 95% confidence interval. ** = p < 0.01, **** = p < 0.0001, and ns = non-significant. N=3 independent experiments.”

      (5) Figure S3

      In S3A and S3B, the UMAPs of the empty vector-treated groups are distinctly different. The same goes for Zic3+Pou2F1 UMAPs.

      In S3A, Zic3 overexpression alone does not appear to have any impact on cell fate. It is not evident that Zic3, even in combination with Pou2F1, has any significant impact on cone or other cell type production, as the proportions of the cones and cone precursors seem similar across different groups.

      In S3B, Zic3+Pou2F1 seems to increase HC-like precursors without increasing cone-like precursors or cones.

      Moreover, the cone-like precursors described do not seem to contribute to cone generation, as there is no increase in cones in the adult mouse retina; rather, these cells resemble rod-cone mosaic cells with expression of both rod- and cone-specific genes.

      As the Reviewer states, we observe some differences in the proportion of cell types in both control and experimental conditions between the two experiments. Notably, relatively more photoreceptors and correspondingly fewer progenitors, bipolar, and amacrine cells are observed in the samples shown in Fig. S3A relative to Fig. S3B.  However, these represent two independent experiments. Cell type proportions seen across independent ex vivo electroporation experiments such as these can be affected by a number of variables, including precise developmental age of the samples, electroporation efficiency, cell dissociation conditions, and ex vivo growth conditions.  Some differences are inevitable, which is why paired negative controls must always be done for results to be interpretable.

      In both experiments, we observe that overexpression of Zic3, Pou2f1, and most notably Zic3 and Pou2f1 in combination leads to an increase in the relative fraction of cone-like precursors. In the experiment shown in Fig. S3B, we also observe that Zic3 alone, Onecut1 alone, and Zic3 and Pou2f1 in combination promote generation of horizontal-like cells. All treatments likewise induce expression of different subsets of cone-enriched genes in the cone-like precursors, while also suppressing rod-specific genes in these same cells.

      Total numbers and relative fractions of each cell type are now included in Table ST5.

      (6) Figure S4

      The proportion of cone cells in the Zic3 cKO scRNA-seq data shown in Figure S4E appears comparable to the WT control, contradicting the conclusion that Zic3 cKO leads to reduced cone production. 

      Total numbers and relative fractions of each cell type are now included in Table ST6.

      (7) Figure S5

      In Figure S5A, Mef2C overexpression does not decrease expression of the rod gene Nrl. 

      This is correct, and is mentioned in the text.

      “No obvious reduction in the relative number of Nrl-positive cells was observed (Fig. S5A).”

      Reviewer #3 (Recommendations for the authors): 

      (1) The authors make several broad and definitive statements that have the potential to confuse readers. In the first sections of Results: 'retinal ganglion cells and amacrine cells were generated predominantly by early stage progenitors' but later say 'late-stage RPCs in 13LGS retina are competent to generate cone photoreceptors but not other early born cell types.' In the discussion, the authors themselves point out limitations of analyses without birthdating. These definitive statements should be qualified/amended. 

      Both single-cell RNA and ATAC-Seq analysis can be used to accurately profile cells that have recently exited mitosis and committed to a specific cell fate. When applied to data obtained from a developmental timecourse such as is the case here, this can in turn serve as a reasonable proxy for generating birthdating data. Nonetheless, we have modified the text to state that BrdU/EdU labeling is indeed the gold standard for drawing conclusions about cell birthdates, and should be used to confirm these findings in future studies.

      “The expected temporal patterns of neurogenesis were observed in both species: retinal ganglion cells and amacrine cells were generated predominantly in the early stage, whereas bipolar cells and Müller glia were produced in the late stage.”

      “Though BrdU/EdU labeling would be required to unambiguously demonstrate species-specific differences in birthdating, our findings strongly indicate that 13LGS exhibit a selective expansion of the temporal window of cone generation, extending into late stages of neurogenesis.”

      This sentence does not make a definitive statement about 13LGS RPC competence, and we have left it unaltered. 

      “These findings suggest that late-stage RPCs in 13LGS retina are competent to generate cone photoreceptors but not other early-born cell types…”

      (2) Figure 2C clusters are referred to as C1-8 in the text but G1-8 in the figure. This is confusing and should be fixed. 

      This has been corrected.

      (3) The authors refer to many genes that show differential expression in Figure 2F, but virtually none of these are labelled in the heatmap, making it hard to follow the narrative. 

      Figure 2F represents transcription factor binding motifs that are differentially active between mouse and 13LGS, not gene expression. We have modified the figure to include names of all differentially active motifs discussed in the text, and otherwise refer the reader to Table ST4, which includes a list of all differentially expressed genes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      We appreciate the reviewer for the critical review of the manuscript and the valuable comments. We have carefully considered the reviewer’s comments and have revised our manuscript accordingly.

      The reviewer’s comments in this letter are in Bold and Italics.

      Summary:

      This study identified three independent components of glucose dynamics ("value," "variability," and "autocorrelation") and reported important findings indicating that they play an important role in predicting coronary plaque vulnerability. Although the generalizability of the results needs further investigation due to the limited sample size and validation cohort limitations, this study makes several notable contributions: validation of autocorrelation as a new clinical indicator, theoretical support through mathematical modeling, and development of a web application for practical implementation. These contributions are likely to attract broad interest from researchers in both diabetology and cardiology and may suggest the potential for a new approach to glucose monitoring that goes beyond conventional glycemic control indicators in clinical practice.

      Strengths:

      The most notable strength of this study is the identification of three independent elements in glycemic dynamics: value, variability, and autocorrelation. In particular, the metric of autocorrelation, which has not been captured by conventional glycemic control indices, may bring a new perspective for understanding glycemic dynamics. In terms of methodological aspects, the study uses an analytical approach combining various statistical methods such as factor analysis, LASSO, and PLS regression, and enhances the reliability of results through theoretical validation using mathematical models and validation in other cohorts. In addition, the practical aspect of the research results, such as the development of a Web application, is also an important contribution to clinical implementation.

      We appreciate reviewer #1 for the positive assessment and for the valuable and constructive comments on our manuscript.

      Weaknesses:

      The most significant weakness of this study is the relatively small sample size of 53 study subjects. This sample size limitation leads to a lack of statistical power, especially in subgroup analyses, and to limitations in the assessment of rare events. 

      We appreciate the reviewer’s concern regarding the sample size. We acknowledge that a larger sample size would increase statistical power, especially for subgroup analyses and the assessment of rare events.

      We would like to clarify several points regarding the statistical power and validation of our findings. Our sample size determination followed established methodological frameworks, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations (a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4) indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section. Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients. 
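
As a rough cross-check of the figure quoted above, the minimum sample size for detecting a correlation can be approximated with the Fisher z-transformation. The sketch below assumes a two-sided test and this standard approximation; the response does not state which method or software was actually used, and the function name `min_n_for_correlation` is purely illustrative.

```python
import math
from statistics import NormalDist


def min_n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a correlation r.

    Assumption: two-sided test, Fisher z approximation
        n = ((z_{1-alpha/2} + z_{power}) / atanh(r))**2 + 3
    This is a common textbook formula, not necessarily the authors' method.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for type I error
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    fisher_z = math.atanh(r)                       # Fisher z-transform of the expected r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


# Inputs taken from the response: alpha = 0.05, power = 0.8, expected r = 0.4
print(min_n_for_correlation(0.4))  # prints 47
```

Under these assumptions the approximation reproduces the stated minimum of 47 participants, which the cohort of 53 exceeds.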

      Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7. (n=57), Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS biology 16.7 (2018): e2005143. (n=57), and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18. (n=32).

      Furthermore, the primary objective of our study was not to assess rare events, but rather to demonstrate that glucose dynamics can be decomposed into three main factors - mean, variance and autocorrelation - whereas traditional measures have primarily captured mean and variance without adequately reflecting autocorrelation. We believe that our current sample size effectively addresses this objective. 

      Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components.

      However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      To address the sample size considerations, we have added the following sentences in the Discussion section (lines 409-414): 

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      We appreciate the reviewer’s feedback and believe that these clarifications improve the manuscript.

      In terms of validation, several challenges exist, including geographical and ethnic biases in the validation cohorts, lack of long-term follow-up data, and insufficient validation across different clinical settings. In terms of data representativeness, limiting factors include the inclusion of only subjects with well-controlled serum cholesterol and blood pressure and the use of only short-term measurement data.

      We appreciate the reviewer’s comment regarding the challenges associated with validation. In terms of geographic and ethnic diversity, our study includes validation datasets from diverse populations, including 64 Japanese, 53 American and 100 Chinese individuals. These datasets include a wide range of metabolic states, from healthy individuals to those with diabetes, ensuring validation across different clinical conditions. In addition, we recognize the limited availability of publicly available datasets with sufficient sample sizes for factor decomposition that include both healthy individuals and those with type 2 diabetes (Zhao, Qinpei, et al. “Chinese diabetes datasets for data-driven machine learning.” Scientific Data 10.1 (2023): 35.). The main publicly available datasets with relevant clinical characteristics have already been analyzed in this study using unbiased approaches.

      However, we fully agree with the reviewer that expanding the geographic and ethnic scope, including long-term follow-up data, and validation in different clinical settings would further strengthen the robustness and generalizability of our findings. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      Regarding the validation considerations, we have added the following sentences to the Discussion section (lines 409-414, 354-361): 

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.

      In terms of elucidation of physical mechanisms, the study is not sufficient to elucidate the mechanisms linking autocorrelation and clinical outcomes or to verify them at the cellular or molecular level.

      We appreciate the reviewer’s point regarding the need for further elucidation of the physical mechanisms linking glucose autocorrelation to clinical outcomes. We fully agree with the reviewer that the detailed molecular and cellular mechanisms underlying this relationship are not yet fully understood, as noted in our Discussion section.

      However, we would like to emphasize the theoretical basis that supports the clinical relevance of autocorrelation. Our results show that glucose profiles with identical mean and variability can exhibit different autocorrelation patterns, highlighting that conventional measures such as mean or variance alone may not fully capture inter-individual metabolic differences. Incorporating autocorrelation analysis provides a more comprehensive characterization of metabolic states. Consequently, incorporating autocorrelation measures alongside traditional diabetes diagnostic criteria - such as fasting glucose, HbA1c and PG120, which primarily reflect only the “mean” component - can improve predictive accuracy for various clinical outcomes. While further research at the cellular and molecular level is needed to fully validate these findings, it is important to note that the primary goal of this study was to analyze the characteristics of glucose dynamics and gain new insights into metabolism, rather than to perform molecular biology experiments.
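
      The point that autocorrelation can differ while mean and variance stay the same can be illustrated with a minimal sketch. The trace below is synthetic (a sine wave around 110 mg/dL, chosen arbitrarily), and shuffling is used only as a simple device to hold mean and variance fixed; it is not the simulation used in the manuscript.

```python
import math
import random
from statistics import fmean, pvariance


def lag1_autocorr(x):
    """Lag-1 autocorrelation coefficient of a sequence."""
    m = fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


# Smooth synthetic trace: 288 readings (one day at 5-minute spacing).
# Values are illustrative, not patient data.
smooth = [110 + 25 * math.sin(2 * math.pi * t / 48) for t in range(288)]

# Shuffling keeps every value, so mean and variance are unchanged,
# but the temporal ordering (and hence the autocorrelation) is destroyed.
rng = random.Random(0)
shuffled = smooth[:]
rng.shuffle(shuffled)

print(fmean(smooth), fmean(shuffled))          # same mean
print(pvariance(smooth), pvariance(shuffled))  # same variance
print(lag1_autocorr(smooth), lag1_autocorr(shuffled))  # very different
```

      Any index built only from the distribution of glucose values cannot distinguish these two traces, whereas an autocorrelation-based index can.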

      Furthermore, our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study.

      Rather than a limitation, we view these currently unexplored associations as an opportunity for further research. The identification of autocorrelation as a key glycemic feature introduces a new dimension to metabolic regulation that could serve as the basis for future investigations exploring the molecular mechanisms underlying these patterns.

      While we agree that further research at the cellular and molecular level is needed to fully validate these findings, we believe that our study provides a theoretical framework supporting the clinical utility of autocorrelation analysis in glucose monitoring. This framework could serve as the basis for future investigations into the molecular mechanisms underlying these autocorrelation patterns, which adds to the broad interest of this study. Regarding the physical mechanisms linking autocorrelation and clinical outcomes, we have added the following sentences to the Discussion section (lines 331-339, 341-352): 

      This study also provided evidence that autocorrelation can vary independently from the mean and variance components using simulated data. In addition, simulated glucose dynamics indicated that even individuals with high AC_Var did not necessarily have high maximum and minimum blood glucose levels. This study also indicated that these three components qualitatively corresponded to the four distinct glucose patterns observed after glucose administration, which were identified in a previous study (Hulman et al., 2018). Thus, the inclusion of autocorrelation in addition to mean and variance may improve the characterization of inter-individual differences in glucose regulation and improve the predictive accuracy of various clinical outcomes.

      Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2025), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.

      Reviewer #2 (Public review):

      We thank the reviewer for the critical review of the manuscript and the valuable comments. We have carefully considered the reviewer’s comments and have revised our manuscript accordingly. The reviewer’s comments in this letter are in bold and italics.

      Sugimoto et al. explore the relationship between glucose dynamics - specifically value, variability, and autocorrelation - and coronary plaque vulnerability in patients with varying glucose tolerance levels. The study identifies three independent predictive factors for %NC and emphasizes the use of continuous glucose monitoring (CGM)-derived indices for coronary artery disease (CAD) risk assessment. By employing robust statistical methods and validating findings across datasets from Japan, America, and China, the authors highlight the limitations of conventional markers while proposing CGM as a novel approach for risk prediction. The study has the potential to reshape CAD risk assessment by emphasizing CGM-derived indices, aligning well with personalized medicine trends.

      Strengths:

      (1) The introduction of autocorrelation as a predictive factor for plaque vulnerability adds a novel dimension to glucose dynamic analysis.

      (2) Inclusion of datasets from diverse regions enhances generalizability.

      (3) The use of a well-characterized cohort with controlled cholesterol and blood pressure levels strengthens the findings.

      (4) The focus on CGM-derived indices aligns with personalized medicine trends, showcasing the potential for CAD risk stratification.

      We thank Reviewer #2 for the positive assessment and for the valuable and constructive comments on our manuscript.

      Weaknesses:

      (1) The link between autocorrelation and plaque vulnerability remains speculative without a proposed biological explanation. 

      We appreciate the reviewer’s point about the need for a clearer biological explanation linking glucose autocorrelation to plaque vulnerability. We fully agree with the reviewer that the detailed biological mechanisms underlying this relationship are not yet fully understood, as noted in our Discussion section.

      However, we would like to emphasize the theoretical basis that supports the clinical relevance of autocorrelation. Our results show that glucose profiles with identical mean and variability can exhibit different autocorrelation patterns, highlighting that conventional measures such as mean or variance alone may not fully capture inter-individual metabolic differences. Incorporating autocorrelation analysis provides a more comprehensive characterization of metabolic states. Consequently, incorporating autocorrelation measures alongside traditional diabetes diagnostic criteria - such as fasting glucose, HbA1c and PG120, which primarily reflect only the “mean” component - can improve predictive accuracy for various clinical outcomes.

      Furthermore, our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study. 

      Rather than a limitation, we view these currently unexplored associations as an opportunity for further research. The identification of autocorrelation as a key glycemic feature introduces a new dimension to metabolic regulation that could serve as the basis for future investigations exploring the molecular mechanisms underlying these patterns.

      While we agree that further research at the cellular and molecular level is needed to fully validate these findings, we believe that our study provides a theoretical framework supporting the clinical utility of autocorrelation analysis in glucose monitoring. This framework could serve as the basis for future investigations into the molecular mechanisms underlying these autocorrelation patterns, which adds to the broad interest of this study. Regarding the physical mechanisms linking autocorrelation and clinical outcomes, we have added the following sentences to the Discussion section (lines 331-339, 341-352): 

      This study also provided evidence that autocorrelation can vary independently from the mean and variance components using simulated data. In addition, simulated glucose dynamics indicated that even individuals with high AC_Var did not necessarily have high maximum and minimum blood glucose levels. This study also indicated that these three components qualitatively corresponded to the four distinct glucose patterns observed after glucose administration, which were identified in a previous study (Hulman et al., 2018). Thus, the inclusion of autocorrelation in addition to mean and variance may improve the characterization of inter-individual differences in glucose regulation and improve the predictive accuracy of various clinical outcomes.

      Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2025), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.

      (2) The relatively small sample size (n=270) limits statistical power, especially when stratified by glucose tolerance levels. 

      We appreciate the reviewer’s concern regarding sample size and its potential impact on statistical power, especially when stratified by glucose tolerance levels. We fully agree that a larger sample size would increase statistical power, especially for subgroup analyses.

      We would like to clarify several points regarding the statistical power and validation of our findings. Our sample size determination followed established methodological frameworks, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations (a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4) indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section. Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients. 

      Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7. (n=57), Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS biology 16.7 (2018): e2005143. (n=57), and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18. (n=32).

      Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components.

      However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      To address the sample size considerations, we have added the following sentences in the Discussion section (lines 409-414): 

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      (3) Strict participant selection criteria may reduce applicability to broader populations. 

      We appreciate the reviewer’s comment regarding the potential impact of strict participant selection criteria on the broader applicability of our findings. We acknowledge that extending validation to more diverse populations would improve the generalizability of our findings.

      Our study includes validation cohorts from diverse populations, including 64 Japanese, 53 American and 100 Chinese individuals. These cohorts include a wide range of metabolic states, from healthy individuals to those with diabetes, ensuring validation across different clinical conditions. However, we acknowledge that further validation in additional populations and clinical settings would strengthen our conclusions. To address this, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      We have added the following text to the Discussion section to address these considerations (lines 409-414, 354-361):

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.

      (4) CGM-derived indices like AC_Var and ADRR may be too complex for routine clinical use without simplified models or guidelines. 

      We appreciate the reviewer’s concern about the complexity of CGM-derived indices such as AC_Var and ADRR for routine clinical use. We acknowledge that for these indices to be of practical use, they must be both interpretable and easily accessible to healthcare providers. 

      To address this concern, we have developed an easy-to-use web application that automatically calculates these measures, including AC_Var, mean glucose levels, and glucose variability (https://cgmregressionapp2.streamlit.app/). This tool eliminates the need for manual calculations, making these indices more practical for clinical implementation.

      Regarding interpretability, we acknowledge that establishing specific clinical guidelines would enhance the practical utility of these measures. For example, defining a cut-off value for AC_Var above which the risk of diabetes complications increases significantly would provide clearer clinical guidance. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cutoffs. This approach intentionally avoids problematic statistical practices like p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials to establishing clinical guidelines. Establishing clinical guidelines typically requires the accumulation of multiple studies over many years. Recognizing this reality, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.

      To address this limitation, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which proposed clinically relevant cutoffs and reference ranges for AC_Var and other CGM-derived indices. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, by integrating automated calculation tools with clear clinical thresholds, we expect to make these measures more accessible for clinical use.

      We have added the following text to the Discussion section to address these considerations (lines 415-419):

      While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, defining clinically relevant thresholds and reference ranges requires further validation in larger cohorts.

      (5) The study does not compare CGM-derived indices to existing advanced CAD risk models, limiting the ability to assess their true predictive superiority. 

      We appreciate the reviewer’s comment regarding the comparison of CGM-derived indices with existing CAD risk models. Given that our study population consisted of individuals with well-controlled total cholesterol and blood pressure levels, a direct comparison with the Framingham Risk Score for Hard Coronary Heart Disease (Wilson, Peter WF, et al. “Prediction of coronary heart disease using risk factor categories.” Circulation 97.18 (1998): 1837-1847.) may introduce inherent bias, as these factors are key components of the score.

      Nevertheless, to further assess the predictive value of the CGM-derived indices, we performed additional analyses using linear regression to predict %NC. Using the Framingham Risk Score, we obtained an R² of 0.04 and an Akaike Information Criterion (AIC) of 330. In contrast, our proposed model incorporating the three glycemic parameters - CGM_Mean, CGM_Std, and AC_Var - achieved a significantly improved R² of 0.36 and a lower AIC of 321, indicating superior predictive accuracy. 
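
      For readers less familiar with how such model comparisons are scored, the sketch below computes R² and AIC from observed and fitted values. It assumes a Gaussian linear model with AIC = 2k - 2 ln L, where k counts the regression coefficients plus the error variance; statistical packages differ in these conventions, and the response does not state which one produced the reported values. The data and the helper names `r_squared` and `gaussian_aic` are made up for illustration.

```python
import math


def r_squared(y, fitted):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def gaussian_aic(y, fitted, n_params):
    """AIC = 2k - 2 ln L for a Gaussian linear model, with the error
    variance set to its maximum-likelihood estimate RSS / n."""
    n = len(y)
    rss = sum((a - b) ** 2 for a, b in zip(y, fitted))
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * n_params - 2 * log_lik


# Toy outcome and two sets of fitted values (not study data).
y = [10, 12, 15, 19, 24]
fit_a = [10.5, 12.5, 15.0, 18.5, 23.5]  # a model that tracks the outcome
fit_b = [16.0] * 5                      # intercept-only model (predicts the mean)

# Model A: intercept + slope + variance = 3 parameters; model B: 2 parameters.
print(r_squared(y, fit_a), gaussian_aic(y, fit_a, n_params=3))
print(r_squared(y, fit_b), gaussian_aic(y, fit_b, n_params=2))
```

      A higher R² together with a lower AIC, as in the comparison reported above, indicates a better fit even after penalizing the additional parameters.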

      We have added the following text to the Results section (lines 115-122):

      The regression model including CGM_Mean, CGM_Std and AC_Var to predict %NC achieved an R² of 0.36 and an Akaike Information Criterion (AIC) of 321. Each of these indices showed statistically significant independent positive correlations with %NC (Fig. 1A). In contrast, the model using conventional glycemic markers (FBG, HbA1c, and PG120) yielded an R² of only 0.05 and an AIC of 340 (Fig. 1B). Similarly, the model using the Framingham Risk Score for Hard Coronary Heart Disease (Wilson et al., 1998) showed limited predictive value, with an R² of 0.04 and an AIC of 330 (Fig. 1C).

      (6) Varying CGM sampling intervals (5-minute vs. 15-minute) were not thoroughly analyzed for impact on results. 

      We appreciate the reviewer’s comment regarding the potential impact of different CGM sampling intervals on our results. To assess the robustness of our findings across different sampling frequencies, we performed a downsampling analysis by converting our 5-minute interval data to 15-minute intervals. The AC_Var value calculated from 15-minute intervals was significantly correlated with that calculated from 5-minute intervals (R = 0.99, 95% CI: 0.97-1.00). Furthermore, the regression model using CGM_Mean, CGM_Std, and AC_Var from 15-minute intervals to predict %NC achieved an R² of 0.36 and an AIC of 321, identical to the model using 5-minute intervals. These findings indicate that our results are robust to variations in CGM sampling frequency. 
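
      The downsampling step can be sketched as follows. This is an illustration under stated assumptions, not the analysis code: the trace is synthetic, `ac_var` is a hypothetical helper defined here as the variance of autocorrelation coefficients over a set of lags (the exact definition and lag range of AC_Var are not given in this response), and the two resolutions are compared at matched time lags of 15 to 150 minutes.

```python
import math
from statistics import fmean, pvariance


def autocorr(x, lag):
    """Autocorrelation coefficient of x at the given lag (in samples)."""
    m = fmean(x)
    den = sum((v - m) ** 2 for v in x)
    num = sum((x[i] - m) * (x[i + lag] - m) for i in range(len(x) - lag))
    return num / den


def ac_var(x, lags):
    """Variance of the autocorrelation coefficients at the given lags
    (an assumed, illustrative definition)."""
    return pvariance([autocorr(x, k) for k in lags])


# Synthetic one-day trace at 5-minute resolution (288 readings).
five_min = [110 + 20 * math.sin(2 * math.pi * t / 72)
            + 8 * math.sin(2 * math.pi * t / 36) for t in range(288)]

# Downsample to 15-minute resolution by keeping every third reading.
fifteen_min = five_min[::3]

# Matched time lags: 3 samples at 5 min equal 1 sample at 15 min.
fine = ac_var(five_min, [3 * k for k in range(1, 11)])
coarse = ac_var(fifteen_min, list(range(1, 11)))
print(fine, coarse)
```

      For a smooth trace such as this one, the two values are close, which is the behaviour the reported robustness check relies on.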

      We have added this analysis to the Results section (lines 122-125):

      The AC_Var computed from 15-minute CGM sampling was nearly identical to that computed from 5-minute sampling (R = 0.99, 95% CI: 0.97-1.00) (Fig. S1A), and the regression using the 15‑min features yielded almost the same performance (R² = 0.36; AIC = 321; Fig. S1B).

      Reviewer #3 (Public review):

      We thank the reviewer for the critical review of the manuscript and the valuable comments. We have carefully considered the reviewer’s comments and have revised our manuscript accordingly. The reviewer’s comments in this letter are in bold and italics.

      Summary:

      This is a retrospective analysis of 53 individuals over 26 features (12 clinical phenotypes, 12 CGM features, and 2 autocorrelation features) to examine which features were most informative in predicting percent necrotic core (%NC) as a parameter for coronary plaque vulnerability. Multiple regression analysis demonstrated a better ability to predict %NC from 3 selected CGM-derived features than 3 selected clinical phenotypes. LASSO regularization and partial least squares (PLS) with VIP scores were used to identify 4 CGM features that most contribute to the precision of %NC. Using factor analysis they identify 3 components that have CGM-related features: value (relating to the value of blood glucose), variability (relating to glucose variability), and autocorrelation (composed of the two autocorrelation features). These three groupings appeared in the 3 validation cohorts and when performing hierarchical clustering. To demonstrate how these three features change, a simulation was created to allow the user to examine these features under different conditions.

      We thank Reviewer #3 for the valuable and constructive comments on our manuscript.

      The goal of this study was to identify CGM features that relate to %NC. Through multiple feature selection methods, they arrive at 3 components: value, variability, and autocorrelation. While the feature list is highly correlated, the authors take steps to ensure feature selection is robust. There is a lack of clarity of what each component (value, variability, and autocorrelation) includes as while similar CGM indices fall within each component, there appear to be some indices that appear as relevant to value in one dataset and to variability in the validation. 

      We appreciate the reviewer’s comment regarding the classification of CGM-derived measures into the three components: value, variability, and autocorrelation. As the reviewer correctly points out, some measures may load differently between the value and variability components in different datasets. However, we believe that this variability reflects the inherent mathematical properties of these measures rather than a limitation of our study.

      For example, the HBGI clusters differently across datasets due to its dependence on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S3A). Conversely, in populations with a wider range of mean glucose levels, HBGI correlates more strongly with mean glucose levels (Fig. 3A). This context-dependent behaviour is expected given the mathematical properties of these measures and does not indicate an inconsistency in our classification approach.
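
      To make this threshold behaviour concrete, the following is a minimal sketch rather than the code used in our analysis. It assumes the commonly cited Kovatchev risk transformation for glucose in mg/dL, under which readings below roughly 112.5 mg/dL contribute nothing to the HBGI. The function names and the two traces are hypothetical.

```python
import math

def high_bg_risk(glucose_mg_dl: float) -> float:
    """Risk contribution of one reading (assumed Kovatchev transformation, mg/dL).

    The transformed value f is positive only above roughly 112.5 mg/dL,
    so lower readings contribute zero to the high blood glucose index.
    """
    f = 1.509 * (math.log(glucose_mg_dl) ** 1.084 - 5.381)
    return 10.0 * f * f if f > 0 else 0.0

def hbgi(readings) -> float:
    """Average high-glucose risk over all readings in the trace."""
    return sum(high_bg_risk(g) for g in readings) / len(readings)

# Two hypothetical traces with the same mean (105 mg/dL) but different spread.
stable = [100, 105, 110, 105, 100, 110]
variable = [70, 140, 80, 130, 75, 135]

print(hbgi(stable), hbgi(variable))
```

      Both traces have the same mean, yet only the variable trace crosses the threshold, so in such a population the HBGI tracks variability rather than the mean.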

      Importantly, our main findings remain robust: CGM-derived measures systematically fall into three components: value, variability, and autocorrelation. Traditional CGM-derived measures primarily reflect either value or variability, and this categorization is consistently observed across datasets. While specific indices such as HBGI may shift classification depending on population characteristics, the overall structure of CGM data remains stable.

      To address these considerations, we have added the following text to the Discussion section (lines 388-396):

      Some indices, such as HBGI, showed variation in classification across datasets, with some populations showing higher factor loadings in the “mean” component and others in the “variance” component. This variation occurs because HBGI calculations depend on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S5A). Conversely, in populations with a wider range of mean glucose levels, the HBGI correlates more strongly with mean glucose levels (Fig. 3A). Despite these differences, our validation analyses confirm that CGM-derived indices consistently cluster into three components: mean, variance, and autocorrelation.

      We are sceptical about statements of significance without documentation of p-values. 

      We appreciate the reviewer’s concern regarding statistical significance and the documentation of p values.

      First, given the multiple comparisons in our study, we used q values rather than p values, as shown in Figure 1D. Q values provide a more rigorous statistical framework for controlling the false discovery rate in multiple testing scenarios, thereby reducing the likelihood of false positives.
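
      For readers unfamiliar with q values, the sketch below shows one common way to obtain them, the Benjamini-Hochberg adjustment. This is an assumption made for illustration only: the text above does not specify the procedure, and the input p values are hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p values (q values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the
    # adjusted values never decrease as the raw p values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical p values from four tests.
example_p = [0.01, 0.04, 0.03, 0.20]
example_q = bh_q_values(example_p)
print(example_q)
```

      Each q value is at least as large as its p value, which is why the adjustment reduces false positives when many comparisons are made.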

      Second, our statistical reporting follows established guidelines, including those of the New England Journal of Medicine (Harrington, David, et al. “New guidelines for statistical reporting in the journal.” New England Journal of Medicine 381.3 (2019): 285-286.), which recommend that “reporting of exploratory end points should be limited to point estimates of effects with 95% confidence intervals” and advise authors to “replace p values with estimates of effects or association and 95% confidence intervals”. According to these guidelines, p values should not be reported in this type of study. We determined significance based on whether these 95% confidence intervals excluded zero, a method for determining whether an association is significantly different from zero (Tan, Sze Huey, and Say Beng Tan. “The correct interpretation of confidence intervals.” Proceedings of Singapore Healthcare 19.3 (2010): 276-278.).

      For the sake of transparency, we provide p values for readers who may be interested, although we emphasize that they should not be the basis for interpretation, as discussed in the referenced guidelines. Specifically, in Figure 1A-B, the p values for CGM_Mean, CGM_Std, and AC_Var were 0.02, 0.02, and <0.01, respectively, while those for FBG, HbA1c, and PG120 were 0.83, 0.91, and 0.25, respectively. In Figure 3C, the p values for factors 1–5 were 0.03, 0.03, 0.03, 0.24, and 0.87, respectively, and in Figure S8C, the p values for factors 1–3 were <0.01, <0.01, and 0.20, respectively.

      We appreciate the opportunity to clarify our statistical methodology and are happy to provide additional details if needed.

      While hesitations remain, the ability of these authors to find groupings of these many CGM metrics in relation to %NC is of interest. The believability of the associations is impeded by an obtuse presentation of the results with core data (i.e. correlation plots between CGM metrics and %NC) buried in the supplement while main figures contain plots of numerical estimates from models which would be more usefully presented in supplementary tables. 

      We appreciate the reviewer’s comment regarding the presentation of our results and recognize the importance of ensuring clarity and accessibility of the core data. 

      The central finding of our study is twofold: first, that the numerous CGM-derived measures can be systematically classified into three distinct components (mean, variance, and autocorrelation), and second, that each of these components is independently associated with %NC. This insight cannot be derived simply from examining scatter plots of individual correlations, which are provided in the Supplementary Figures. Instead, it emerges from our statistical analyses in the main figures, including multiple regression models that reveal the independent contributions of these components to %NC.

      We acknowledge the reviewer’s concern regarding the accessibility of key data. To improve clarity, we have moved several scatter plots from the Supplementary Figures to the main figures (Fig. 1D-J) to allow readers to more directly visualize the relationships between CGM-derived measures and %NC. We believe this revision improved the transparency and readability of our results while maintaining the rigor of our analytical approach.

      Given the small sample size in the primary analysis, there is a lot of modeling done with parameters estimated where simpler measures would serve and be more convincing as they require less data manipulation. A major example of this is that the pairwise correlation/covariance between CGM_mean, CGM_std, and AC_var is not shown and would be much more compelling in the claim that these are independent factors.

      We appreciate the reviewer’s feedback on our statistical analysis and data presentation. The correlations between CGM_Mean, CGM_Std, and AC_Var were documented in Figure S1B. However, to improve accessibility and clarity, we have moved these correlation analyses to the main figures (Fig. 1F). 

      Regarding our modeling approach, we chose LASSO and PLS methods because they are well-established techniques that are particularly suited to scenarios with many input variables and a relatively small sample size. These methods have been used in the literature as robust approaches for variable selection under such conditions (Tibshirani R. 1996. Regression shrinkage and selection via the lasso. J R Stat Soc 58:267–288; Wold S, Sjöström M, Eriksson L. 2001. PLS-regression: a basic tool of chemometrics. Chemometrics Intellig Lab Syst 58:109–130; Pei X, Qi D, Liu J, Si H, Huang S, Zou S, Lu D, Li Z. 2023. Screening marker genes of type 2 diabetes mellitus in mouse lacrimal gland by LASSO regression. Sci Rep 13:6862; Wang C, Kong H, Guan Y, Yang J, Gu J, Yang S, Xu G. 2005. Plasma phospholipid metabolic profiling and biomarkers of type 2 diabetes mellitus based on high-performance liquid chromatography/electrospray mass spectrometry and multivariate statistical analysis. Anal Chem 77:4108–4116.).

      Lack of methodological detail is another challenge. For example, the time period of CGM metrics or CGM placement in the primary study in relation to the IVUS-derived measurements of coronary plaques is unclear. Are they temporally distant or proximal/ concurrent with the PCI? 

      We appreciate the reviewer’s important question regarding the temporal relationship between CGM measurements and IVUS-derived plaque assessments. As described in our previous work (Otowa‐Suematsu, Natsu, et al. “Comparison of the relationship between multiple parameters of glycemic variability and coronary plaque vulnerability assessed by virtual histology–intravascular ultrasound.” Journal of Diabetes Investigation 9.3 (2018): 610-615.), all individuals underwent continuous glucose monitoring for at least three consecutive days within the seven-day period prior to the PCI procedure. To improve clarity for readers, we have added the following text to the Methods section (lines 440-441):

      All individuals underwent CGM for at least three consecutive days within the seven-day period prior to the PCI procedure.

      A patient undergoing PCI for coronary intervention would be expected to have physiological and iatrogenic glycemic disturbances that do not reflect their baseline state. This is not considered or discussed. 

      We appreciate the reviewer’s concern regarding potential glycemic disturbances associated with PCI. As described in our previous work (Otowa‐Suematsu, Natsu, et al. “Comparison of the relationship between multiple parameters of glycemic variability and coronary plaque vulnerability assessed by virtual histology–intravascular ultrasound.” Journal of Diabetes Investigation 9.3 (2018): 610-615.), all CGM measurements were performed before the PCI procedure. This temporal separation ensures that the glycemic patterns analyzed in our study reflect the baseline metabolic state of the patients, rather than any physiological or iatrogenic effects of PCI. To avoid any misunderstanding, we have clarified this temporal relationship in the revised manuscript (lines 440-441):

      All individuals underwent CGM for at least three consecutive days within the seven-day period prior to the PCI procedure.

      The attempts at validation in external cohorts, Japanese, American, and Chinese are very poorly detailed. We could only find even an attempt to examine cardiovascular parameters in the Chinese data set but the outcome variables are unspecified with regard to what macrovascular events are included, their temporal relation to the CGM metrics, etc. Notably macrovascular event diagnoses are very different from the coronary plaque necrosis quantification. This could be a source of strength in the findings if carefully investigated and detailed but due to the lack of detail seems like an apples-to-oranges comparison. 

      We appreciate the reviewer’s comment regarding the validation cohorts and the need for greater clarity, particularly in the Chinese dataset. We acknowledge that our initial description lacked sufficient methodological detail, and we have expanded the Methods section to provide a more comprehensive explanation.

      For the Chinese dataset, the data collection protocol was previously documented (Zhao, Qinpei, et al. “Chinese diabetes datasets for data-driven machine learning.” Scientific Data 10.1 (2023): 35.). Briefly, trained research staff used standardized questionnaires to collect demographic and clinical information, including diabetes diagnosis, treatment history, comorbidities, and medication use. Physical examinations included anthropometric measurements, and body mass index was calculated using standard protocols. CGM was performed using the FreeStyle Libre H device (Abbott Diabetes Care, UK), which records interstitial glucose levels at 15-minute intervals for up to 14 days. Laboratory measurements, including metabolic panels, lipid profiles, and renal function tests, were obtained within six months of CGM placement. While previous studies have linked necrotic core to macrovascular events (Xie, Yong, et al. “Clinical outcome of nonculprit plaque ruptures in patients with acute coronary syndrome in the PROSPECT study.” JACC: Cardiovascular Imaging 7.4 (2014): 397-405.), we acknowledge the limitations of the cardiovascular outcomes in the Chinese data set. These outcomes were extracted from medical records rather than standardized diagnostic procedures or imaging studies. To address these concerns, we have added the following text to the Methods section (lines 496-504):

      The data collection protocol for the Chinese dataset was previously documented (Zhao et al., 2023). Briefly, trained research staff used standardized questionnaires to collect demographic and clinical information, including diabetes diagnosis, treatment history, comorbidities, and medication use. CGM records interstitial glucose levels at 15-minute intervals for up to 14 days. Laboratory measurements, including metabolic panels, lipid profiles, and renal function tests, were obtained within six months of CGM placement. While previous studies have linked necrotic core to macrovascular events, we acknowledge the limitations of the cardiovascular outcomes in the Chinese data set. These outcomes were extracted from medical records rather than from standardized diagnostic procedures or imaging studies.

      Finally, the simulations at the end are not relevant to the main claims of the paper and we would recommend removing them for the coherence of this manuscript. 

      We appreciate the reviewer’s feedback regarding the relevance of the simulation component of our manuscript. The primary contribution of our study goes beyond demonstrating correlations between CGM-derived measures and %NC; it highlights three fundamental components of glycemic patterns (mean, variability, and autocorrelation) and their independent relationships with coronary plaque characteristics. The simulations are included to illustrate how glycemic patterns with identical means and variability can have different autocorrelation structures. Because temporal autocorrelation can be conceptually difficult to interpret, these visualizations were intended to provide intuitive examples for the readers.
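
      The idea behind these simulations can be sketched in a few lines. The example below is an illustrative assumption, not the simulation code from the manuscript: it builds a hypothetical smooth glucose trace and a randomly reordered copy of it. The two traces contain exactly the same values, so their mean and standard deviation are identical, but their lag-1 autocorrelation differs sharply.

```python
import math
import random
import statistics

def lag_autocorr(series, lag=1):
    """Sample autocorrelation of a series at the given lag."""
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(len(series) - lag))
    return num / denom

# Hypothetical smooth trace: 24 h at 5-minute resolution (288 readings, mg/dL).
smooth = [110 + 25 * math.sin(2 * math.pi * t / 96) for t in range(288)]

# Same readings in random order: mean and spread are unchanged by construction.
shuffled = smooth[:]
random.Random(0).shuffle(shuffled)

for name, trace in (("smooth", smooth), ("shuffled", shuffled)):
    print(name,
          round(statistics.fmean(trace), 2),
          round(statistics.pstdev(trace), 2),
          round(lag_autocorr(trace), 3))
```

      Any index that depends only on the distribution of glucose values cannot distinguish these two traces, which is why autocorrelation carries information beyond the mean and variability.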

      However, we agree with the reviewer’s concern about the coherence of the manuscript. In response, we have streamlined the simulation section by removing simulations that do not directly support our primary conclusions (old version of the manuscript, lines 239-246, 502-526), while retaining only those that enhance understanding of the three glycemic components. Regarding reviewer 2’s minor comment #4, we acknowledge that autocorrelation can be challenging to understand intuitively. To address this, we kept Fig. 4A with a brief description.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Summary:

      The study by Sugimoto et al. investigates the association between components of glucose dynamics (value, variability, and autocorrelation) and coronary plaque vulnerability (%NC) in patients with varying glucose tolerance levels. The research identifies three key factors that independently predict %NC and highlights the potential of continuous glucose monitoring (CGM)-derived indices in risk assessment for coronary artery disease (CAD). Using robust statistical methods and validation across diverse populations, the study emphasizes the limitations of conventional diagnostic markers and suggests a novel, CGM-based approach for improved predictive performance. While the study demonstrates significant novelty and potential impact, several issues must be addressed by the authors.

      Major Comments:

      (1) The study demonstrates originality by introducing autocorrelation as a novel predictive factor in glucose dynamics, a perspective rarely explored in prior research. While the innovation is commendable, the biological mechanisms linking autocorrelation to plaque vulnerability remain speculative. Providing a hypothesis or potential pathways would enhance the scientific impact and practical relevance of this finding.

      We appreciate the reviewer’s point about the need for a clearer biological explanation linking glucose autocorrelation to plaque vulnerability. Our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study. We have added the following sentences to the Discussion section (lines 341-352):

      Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2025), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.

      (2) The inclusion of datasets from Japan, America, and China adds a valuable cross-cultural dimension to the study, showcasing its potential applicability across diverse populations. Despite the multi-regional validation, the sample size (n=270) is relatively small, especially when stratified by glucose tolerance categories. This limits the statistical power and applicability to diverse populations. A larger, multi-center cohort would strengthen conclusions.

      We appreciate the reviewer’s concern regarding sample size and its potential impact on statistical power, especially when stratified by glucose tolerance levels. We fully agree that a larger sample size would increase statistical power, especially for subgroup analyses.

      We would like to clarify several points regarding the statistical power and validation of our findings. Our study adheres to established methodological frameworks for sample size determination, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4 indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section.
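
      The required sample size quoted above can be reproduced with a short calculation. The sketch below assumes the standard Fisher z approximation for a two-sided test of a correlation coefficient; the formula is not named in the text above, so this is an illustrative assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def required_n_for_correlation(r, alpha=0.05, power=0.80):
    """Minimum sample size to detect correlation r (two-sided test).

    Uses the Fisher z approximation:
        n = ((z_{1-alpha/2} + z_{power}) / atanh(r))^2 + 3
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    fisher_z = math.atanh(r)
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

print(required_n_for_correlation(0.4))
```

      With a type I error of 0.05, a power of 0.8, and an expected correlation of 0.4, this approximation gives 47 participants, consistent with the threshold stated above.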

      Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7. (n=57), Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS biology 16.7 (2018): e2005143. (n=57), and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18. (n=32). Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients.

      Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components. Furthermore, the primary objective of our study was not to assess rare events, but rather to demonstrate that glucose dynamics can be decomposed into three main factors - mean, variance and autocorrelation - whereas traditional measures have primarily captured mean and variance without adequately reflecting autocorrelation. We believe that our current sample size effectively addresses this objective. 

      However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      To address the sample size considerations, we have added the following sentences to the Discussion section (lines 409-414):

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      (3) The study focuses on a well-characterized cohort with controlled cholesterol and blood pressure levels, reducing confounding variables. However, this stringent selection might exclude individuals with significant variability in these parameters, potentially limiting the study's applicability to broader, real-world populations. The authors should discuss how this may affect generalizability and potential bias in the results.

      We appreciate the reviewer’s comment regarding the potential impact of strict participant selection criteria on the broader applicability of our findings. We acknowledge that extending validation to more diverse populations would improve the generalizability of our findings.

      Our validation strategy included multiple cohorts from different regions, specifically 64 Japanese, 53 American and 100 Chinese individuals. These cohorts represent a clinically diverse population, including both healthy individuals and those with diabetes, allowing for validation across a broad spectrum of metabolic conditions. However, we recognize that further validation in additional populations and clinical settings would strengthen our conclusions. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      We have added the following text to the Discussion section to address these considerations (lines 409-414, 354-361):

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed.

      Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.

      (4) The study effectively highlights the potential of CGM-derived indices as a tool for CAD risk assessment, a concept that aligns with contemporary advancements in personalized medicine. Despite its potential, the complexity of CGM-derived indices like AC_Var and ADRR may hinder their routine clinical adoption. Providing simplified models or actionable guidelines would facilitate their integration into everyday practice.

      We appreciate the reviewer’s concern about the complexity of CGM-derived indices such as AC_Var and ADRR for routine clinical use. We recognize that for these indices to be of practical use, they must be both interpretable and easily accessible to healthcare providers.

      To address this, we have developed an easy-to-use web application that automatically calculates these measures, including AC_Var, mean glucose levels, and glucose variability. By eliminating the need for manual calculations, this tool streamlines the process and makes these indices more practical for clinical use.

      Regarding interpretability, we acknowledge that establishing specific clinical guidelines would enhance the practical utility of these measures. For example, defining a cut-off value for AC_Var above which the risk of diabetes complications increases significantly would provide clearer clinical guidance. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cutoffs. This approach intentionally avoids problematic statistical practices like p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials to establishing clinical guidelines. Establishing clinical guidelines typically requires the accumulation of multiple studies over many years. Recognizing this reality, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.

      To address this limitation, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which proposed clinically relevant cutoffs and reference ranges for AC_Var and other CGM-derived indices. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, by integrating automated calculation tools with clear clinical thresholds, we expect to make these measures more accessible for clinical use.

      We have added the following text to the Discussion section to address these considerations (lines 415-419):

      While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, defining clinically relevant thresholds and reference ranges requires further validation in larger cohorts.

      (5) The exclusion of TIR from the main analysis is noted, but its relevance in diabetes management warrants further exploration. Integrating TIR as an outcome measure could provide additional clinical insights.

      We appreciate the reviewer’s comment regarding the potential role of time in range (TIR) as an outcome measure in our study. Because TIR is primarily influenced by the mean and variance of glucose levels, it does not fully capture the distinct role of glucose autocorrelation, which was the focus of our investigation.
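
      The dependence of TIR on mean and variance can be illustrated with a simplified model. The sketch below assumes, purely for illustration, that glucose readings are normally distributed and that the target range is 70–180 mg/dL; real glucose distributions are typically skewed, and the function name is hypothetical.

```python
from statistics import NormalDist

def tir_gaussian(mean, sd, low=70.0, high=180.0):
    """Fraction of readings in [low, high] mg/dL under an assumed normal model."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)

# Hypothetical profiles: TIR falls when either the mean or the spread rises.
for mean, sd in ((120, 20), (120, 50), (170, 20)):
    print(mean, sd, round(tir_gaussian(mean, sd), 3))
```

      Under this model TIR is fully determined by the mean and standard deviation, and the temporal ordering of the readings does not enter the calculation at all.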

      To clarify this point, we have expanded the Discussion section as follows (lines 380-388):

      Although time in range (TIR) was not included in the main analyses due to the relatively small number of T2DM patients and the predominance of participants with TIR >70%, our results demonstrate that CGM-derived indices outperformed conventional markers such as FBG, HbA1c, and PG120 in predicting %NC. Furthermore, multiple regression analysis between factor scores and TIR revealed that only factor 1 (mean) and factor 2 (variance) were significantly associated with TIR (Fig. S8C, D). This finding confirms the presence of three distinct components in glucose dynamics and highlights the added value of examining AC_Var as an independent glycemic feature beyond conventional CGM-derived measures.

      (6) While the study reflects a commitment to understanding CAD risks in a global context by including datasets from Japan, America, and China, the authors should provide demographic details (e.g., age, gender, socioeconomic status) and discuss how these factors might influence glucose dynamics and coronary plaque vulnerability.

      We appreciate the reviewer’s comment regarding the potential influence of demographic factors on glucose dynamics and coronary plaque vulnerability. We examined these relationships and found that age and sex had minimal effects on glucose dynamics characteristics, as shown in Figure S8A and S8B. These findings suggest that our primary conclusions regarding glucose dynamics and coronary risk remain robust across demographic groups within our data set.

      To address the reviewer’s suggestion, we have added the following discussion (lines 361-368):

      In our analysis of demographic factors, we found that age and gender had minimal influence on glucose dynamics characteristics (Fig. S8A, B), suggesting that our findings regarding the relationship between glucose dynamics and coronary risk are robust across different demographic groups within our dataset. Future studies involving larger and more diverse populations would be valuable to comprehensively elucidate the potential influence of age, gender, and other demographic factors on glucose dynamics characteristics and their relationship to cardiovascular risk.

      (7) While the article shows CGM-derived indices outperform traditional markers (e.g., HbA1c, FBG, PG120), it does not compare these indices against existing advanced risk models (e.g., Framingham Risk Score for CAD). A direct comparison would strengthen the claim of superiority.

      We appreciate the reviewer’s comment regarding the comparison of CGM-derived indices with existing CAD risk models. Given that our study population consisted of individuals with well-controlled total cholesterol and blood pressure levels, a direct comparison with the Framingham Risk Score for Hard Coronary Heart Disease (Wilson, Peter WF, et al. “Prediction of coronary heart disease using risk factor categories.” Circulation 97.18 (1998): 1837-1847.) may introduce inherent bias, as these factors are key components of the score.

      Nevertheless, to further assess the predictive value of the CGM-derived indices, we performed additional analyses using linear regression to predict %NC. Using the Framingham Risk Score, we obtained an R² of 0.04 and an Akaike Information Criterion (AIC) of 330. In contrast, our proposed model incorporating the three glycemic parameters (CGM_Mean, CGM_Std, and AC_Var) achieved a substantially higher R² of 0.36 and a lower AIC of 321, indicating superior predictive accuracy. We have updated the Results section as follows (lines 115-122):

      The regression model including CGM_Mean, CGM_Std and AC_Var to predict %NC achieved an R<sup>2</sup> of 0.36 and an Akaike Information Criterion (AIC) of 321. Each of these indices showed statistically significant independent positive correlations with %NC (Fig. 1A). In contrast, the model using conventional glycemic markers (FBG, HbA1c, and PG120) yielded an R² of only 0.05 and an AIC of 340 (Fig. 1B). Similarly, the model using the Framingham Risk Score for Hard Coronary Heart Disease (Wilson et al., 1998) showed limited predictive value, with an R² of 0.04 and an AIC of 330 (Fig. 1C).
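
      The comparison above rests on two fit statistics, R² and AIC. As a minimal sketch of how both can be computed for a least-squares fit, the following uses a single predictor and toy numbers that are purely hypothetical (they are not study data). The AIC form used here, n·ln(RSS/n) + 2k, is one common Gaussian convention that omits an additive constant, so absolute values need not match those reported by any particular statistics package; only differences between models fitted to the same data are meaningful.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r2_and_aic(y, y_hat, n_params):
    """R^2 and Gaussian AIC (up to an additive constant) for fitted values."""
    n = len(y)
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    tss = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
    r2 = 1.0 - rss / tss
    aic = n * math.log(rss / n) + 2 * n_params
    return r2, aic

# Hypothetical toy data, not taken from the study.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]

slope, intercept = fit_simple_ols(x, y)
fitted = [intercept + slope * xi for xi in x]
r2_model, aic_model = r2_and_aic(y, fitted, n_params=2)

# Intercept-only baseline: predicts the mean for every observation.
baseline = [sum(y) / len(y)] * len(y)
r2_base, aic_base = r2_and_aic(y, baseline, n_params=1)
```

      A model that explains more variance without adding many parameters shows both a higher R² and a lower AIC than the baseline, which is the pattern described for the CGM-derived model relative to the alternatives.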

      (8) The study mentions varying CGM sampling intervals across datasets (5-minute vs. 15-minute). Authors should employ sensitivity analysis to assess the impact of these differences on the results. This would help clarify whether higher-resolution data significantly improves predictive performance.

      We appreciate the reviewer’s comment regarding the potential impact of different CGM sampling intervals on our results. To assess the robustness of our findings across different sampling frequencies, we performed a downsampling analysis by converting our 5-minute interval data to 15-minute intervals. The AC_Var value calculated from 15-minute intervals was significantly correlated with that calculated from 5-minute intervals (R = 0.99, 95% CI: 0.97-1.00). Consequently, the main findings remained consistent across both sampling frequencies, indicating that our results are robust to variations in temporal resolution. We have added this analysis to the Results section (lines 122-126):

      The AC_Var computed from 15-minute CGM sampling was nearly identical to that computed from 5-minute sampling (R = 0.99, 95% CI: 0.97-1.00) (Fig. S1A), and the regression using the 15‑min features yielded almost the same performance (R<sup>2</sup>  = 0.36; AIC = 321; Fig. S1B).
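
      The downsampling check can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: it assumes that AC_Var is the variance of the sample autocorrelation function over a fixed range of lags, that downsampling keeps every third reading, and that the lag range is chosen to span the same time window at both resolutions. The lag counts and the synthetic trace are hypothetical.

```python
import math

def autocorr(series, lag):
    """Sample autocorrelation of `series` at a positive integer lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom

def ac_var(series, max_lag):
    """Variance of the autocorrelation function over lags 1..max_lag (assumed definition)."""
    acf = [autocorr(series, k) for k in range(1, max_lag + 1)]
    m = sum(acf) / len(acf)
    return sum((a - m) ** 2 for a in acf) / len(acf)

def downsample(series, step):
    """Keep every `step`-th reading; step=3 turns 5-min data into 15-min data."""
    return series[::step]

# Hypothetical 24 h trace at 5-min resolution (288 readings):
# a slow daily wave plus faster meal-like oscillations.
five_min = [
    100 + 15 * math.sin(2 * math.pi * t / 288) + 10 * math.sin(2 * math.pi * t / 72)
    for t in range(288)
]
fifteen_min = downsample(five_min, 3)

# Lags chosen so both cover the same 150-minute window: 30 x 5 min vs 10 x 15 min.
ac_var_5 = ac_var(five_min, max_lag=30)
ac_var_15 = ac_var(fifteen_min, max_lag=10)
```

      Repeating this per participant and correlating the two sets of AC_Var values would give a coefficient of the kind reported above.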

      (9) The identification of actionable components in glucose dynamics lays the groundwork for clinical stratification. The authors could explore the use of CGM-derived indices to develop a simple framework for stratifying risk into certain categories (e.g., low, moderate, high). This could improve clinical relevance and utility for healthcare providers.

      We appreciate the reviewer’s suggestion regarding the potential for CGM-derived indices to support clinical stratification. We completely agree with the idea that establishing risk categories (e.g., low, moderate, high) based on specific thresholds would enhance the clinical utility of these measures. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cutoffs. This approach intentionally avoids problematic statistical practices like p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials to establishing clinical thresholds. Establishing clinical thresholds typically requires the accumulation of multiple studies over many years. Recognizing this reality, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.

      To address this limitation, we conducted a large follow-up study of over 8,000 individuals in the next study (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which proposed clinically relevant cutoffs and reference ranges for AC_Var and other CGM-derived indices. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper. However, we expect to make these measures more actionable in clinical use by integrating automated calculation tools with clear clinical thresholds.

      We have added the following text to the Discussion section to address these considerations (lines 415-419):

      While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, defining clinically relevant thresholds and reference ranges requires further validation in larger cohorts.

      (10) While the study acknowledges several limitations, authors should also consider explicitly addressing the potential impact of inter-individual variability in glucose metabolism (e.g., age-related changes, hormonal influences) on the findings.

      We appreciate the reviewer’s comment regarding the potential impact of interindividual variability in glucose metabolism, including age-related changes and hormonal influences, on our results. In our analysis, we found that age had minimal effects on glucose dynamics characteristics, as shown in Figure S8A. In addition, CGM-derived measures such as ADRR and AC_Var significantly contributed to the prediction of %NC independent of insulin secretion (I.I.) and insulin sensitivity (Composite index) (Fig. 2). These results suggest that our primary conclusions regarding glucose dynamics and coronary risk remain robust despite individual differences in glucose metabolism.

      To address the reviewer’s suggestion, we have added the following discussion (lines 186-188, 361-368):

      Conventional indices, including FBG, HbA1c, PG120, I.I., Composite index, and Oral DI, did not contribute significantly to the prediction compared to these CGM-derived indices.

      In our analysis of demographic factors, we found that age and gender had minimal influence on glucose dynamics characteristics (Fig. S8A, B), suggesting that our findings regarding the relationship between glucose dynamics and coronary risk are robust across different demographic groups within our dataset. Future studies involving larger and more diverse populations would be valuable to comprehensively elucidate the potential influence of age, gender, and other demographic factors on glucose dynamics characteristics and their relationship to cardiovascular risk.

      (11) It's unclear whether the identified components (value, variability, and autocorrelation) could serve as proxies for underlying physiological mechanisms, such as beta-cell dysfunction or insulin resistance. Please clarify.

      We appreciate the reviewer’s comment regarding the physiological underpinnings of the glucose components we identified. The mean, variance, and autocorrelation components we identified likely reflect specific underlying physiological mechanisms related to glucose regulation. In our previous research (Sugimoto, Hikaru, et al. “Improved detection of decreased glucose handling capacities via continuous glucose monitoring-derived indices.” Communications Medicine 5.1 (2025): 103.), we explored the relationship between glucose dynamics characteristics and glucose control capabilities using clamp tests and mathematical modelling. These investigations revealed that autocorrelation specifically shows a significant correlation with the disposition index (the product of insulin sensitivity and insulin secretion) and insulin clearance parameters.

      Furthermore, our current study demonstrates that CGM-derived measures such as ADRR and AC_Var significantly contributed to the prediction of %NC independent of established metabolic parameters including insulin secretion (I.I.) and insulin sensitivity (Composite index), as shown in Figure 2. These results suggest that the components we identified capture distinct physiological aspects of glucose metabolism beyond traditional measures of beta-cell function and insulin sensitivity. Further research is needed to fully characterize these relationships, but our results imply that these characteristics of glucose dynamics offer supplementary insight into the underlying beta-cell dysregulation that contributes to coronary plaque vulnerability.

      To address the reviewer’s suggestion, we have added the following discussion to the Results section (lines 186-188):

      Conventional indices, including FBG, HbA1c, PG120, I.I., Composite index, and Oral DI, did not contribute significantly to the prediction compared to these CGM-derived indices.

      Minor Comments:

      (1) The use of LASSO and PLS regression is appropriate, but the rationale for choosing these methods over others (e.g., Ridge regression) should be explained in greater detail.

      We appreciate the reviewer’s comment and have added the following discussion to the Methods section (lines 578-585):

      LASSO regression was chosen for its ability to perform feature selection by identifying the most relevant predictors. Unlike Ridge regression, which simply shrinks coefficients toward zero without reaching exactly zero, LASSO produces sparse models, which is consistent with our goal of identifying the most critical features of glucose dynamics associated with coronary plaque vulnerability. In addition, we implemented PLS regression as a complementary approach due to its effectiveness in dealing with multicollinearity, which was particularly relevant given the high correlation among several CGM-derived measures.
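
      The contrast between the two penalties is easiest to see in the textbook special case of an orthonormal design, where both estimators have closed forms in terms of the ordinary least-squares coefficients: LASSO applies soft-thresholding, whereas Ridge rescales every coefficient by a constant factor. The sketch below uses that special case only; the coefficient values and penalty strength are hypothetical, and it is not the fitting procedure used in the study.

```python
def lasso_orthonormal(ols_coefs, lam):
    """Soft-thresholding: the LASSO solution when predictors are orthonormal."""
    out = []
    for b in ols_coefs:
        magnitude = max(abs(b) - lam, 0.0)
        if magnitude == 0.0:
            out.append(0.0)          # coefficient is removed from the model
        elif b > 0:
            out.append(magnitude)
        else:
            out.append(-magnitude)
    return out

def ridge_orthonormal(ols_coefs, lam):
    """Proportional shrinkage: the Ridge solution when predictors are orthonormal."""
    return [b / (1.0 + lam) for b in ols_coefs]

# Hypothetical least-squares coefficients for four predictors.
ols = [2.0, -0.3, 0.1, -1.5]

lasso = lasso_orthonormal(ols, lam=0.5)
ridge = ridge_orthonormal(ols, lam=0.5)

n_zero_lasso = sum(1 for b in lasso if b == 0.0)
n_zero_ridge = sum(1 for b in ridge if b == 0.0)
```

      With these inputs the two weak coefficients are set exactly to zero by the LASSO rule, while the Ridge rule leaves all four non-zero, which is the sparsity property the paragraph above relies on.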

      (2) While figures are well-designed, adding annotations to highlight key findings (e.g., significant contributors in factor analysis) would improve clarity.

      We appreciate the reviewer’s suggestion to improve the clarity of our figures. In the factor analysis, we decided not to include annotations because indicators such as ADRR and J-index can be associated with multiple factors, which could lead to misleading or confusing interpretations. However, in response to the suggestion, we have added annotations to the PLS analysis, specifically highlighting items with VIP values greater than 1 (Fig. 2D, S2D) to emphasize key contributors.

      (3) The term "value" as a component of glucose dynamics could be clarified. For instance, does it strictly refer to mean glucose levels, or does it encompass other measures?

      We appreciate the reviewer’s question regarding the term “value” in the context of glucose dynamics. Factor 1 was predominantly influenced by CGM_Mean, with a factor loading of 0.99, indicating that it primarily represents mean glucose levels. Given this strong correlation, we have renamed Factor 1 to “Mean” (Fig. 3A) to more accurately reflect its role in glucose dynamics.

      (4) The concept of autocorrelation may be unfamiliar to some readers. A brief, intuitive explanation with a concrete example of how it manifests in glucose dynamics would enhance understanding.

      We appreciate the reviewer’s suggestion. Autocorrelation refers to the relationship between a variable and its past values over time. In the context of glucose dynamics, it reflects how current glucose levels are influenced by past levels, capturing patterns such as sustained hyperglycemia or recurrent fluctuations. For example, if an individual experiences sustained high glucose levels after a meal, the strong correlation between successive glucose readings indicates high autocorrelation. We have included this explanation in the revised manuscript (lines 519-524) to improve clarity for readers unfamiliar with the concept. Additionally, Figure 4A shows an example of glucose dynamics with different autocorrelation.
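
      A small numerical illustration may help here. The two hypothetical series below contain exactly the same readings, so their mean and variance are identical, but one stays elevated after a rise and the other flips at every reading. The lag-1 autocorrelation, computed with the usual sample formula, separates them clearly. The values are invented for illustration and the lag-1 statistic is a simplification, not the AC_Var index itself.

```python
def lag1_autocorr(series):
    """Sample autocorrelation between each reading and the next one."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    return num / denom

# Hypothetical readings in mg/dL: same values, different ordering.
sustained = [100, 100, 100, 100, 160, 160, 160, 160]    # stays high after a rise
alternating = [100, 160, 100, 160, 100, 160, 100, 160]  # flips at every reading

r_sustained = lag1_autocorr(sustained)      # 0.625
r_alternating = lag1_autocorr(alternating)  # -0.875
```

      The sustained pattern yields a positive value because consecutive readings tend to sit on the same side of the mean, whereas the alternating pattern yields a negative value even though conventional mean and variability summaries cannot distinguish the two.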

      (5) Ensure consistent use of terms like "glucose dynamics," "CGM-derived indices," and "plaque vulnerability." For instance, sometimes indices are referred to as "components," which might confuse readers unfamiliar with the field.

      We appreciate the reviewer’s comment about ensuring consistency in terminology. To avoid confusion, we have reviewed and standardized the use of terms such as “CGM-derived indices,” and “plaque vulnerability” throughout the manuscript. Additionally, while many of our measures are strictly CGM-derived indices, several “components” in our analysis include fasting blood glucose (FBG) and glucose waveforms during the OGTT. For these measures, we retained the descriptors “glucose dynamics” and “components” rather than relabelling them as CGM-derived indices.

      (6) Provide a more detailed overview of the supplementary materials in the main text, highlighting their relevance to the key findings.

      We appreciate the reviewer’s suggestion. We revised the manuscript by integrating the supplementary text into the main text (lines 129-160), which provides a clearer overview of the supplementary materials. Consequently, the Supplementary Information section now only contains supplementary figures, while their relevance and key details are described in the main text. 

      Reviewer #3 (Recommendations for the authors):

      Other Concerns:

      (1) The text states the significance of tests, however, no p-values are listed: Lines 118-119: Significance is cited between CGM indices and %NC, however, neither the text nor supplementary text have p-values. Need p-values for Figure 3C, Figure S10. When running the https://cgm-basedregression.streamlit.app/ multiple regression analysis, a p-value should be given as well. Do the VIP scores (Line 142) change with the inclusion of SBP, DBP, TG, LDL, and HDL? Do the other datasets have the same well-controlled serum cholesterol and BP levels?

      We appreciate the reviewer’s concern regarding statistical significance and the documentation of p values.

      First, given the multiple comparisons in our study, we used q values rather than p values, as shown in Figure 1D. Q values provide a more rigorous statistical framework for controlling the false discovery rate in multiple testing scenarios, thereby reducing the likelihood of false positives.
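
      For readers unfamiliar with q values, one widely used way to obtain them is the Benjamini-Hochberg adjustment, sketched below. Whether this exact procedure was the one applied for Figure 1D is an assumption of the sketch, and the p values are hypothetical.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that each q value is the
    # minimum of p * m / rank over that rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical p values from four tests.
p = [0.01, 0.04, 0.03, 0.20]
q = bh_q_values(p)
```

      Each q value is at least as large as the corresponding p value, so a threshold applied to q is more conservative than the same threshold applied to unadjusted p values.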

      Second, our statistical reporting follows established guidelines, including those of the New England Journal of Medicine (Harrington, David, et al. “New guidelines for statistical reporting in the journal.” New England Journal of Medicine 381.3 (2019): 285-286.), which recommend that “reporting of exploratory end points should be limited to point estimates of effects with 95% confidence intervals” and advise authors to “replace p values with estimates of effects or association and 95% confidence intervals”. According to these guidelines, p values should not be reported in this type of study. We determined significance based on whether these 95% confidence intervals excluded zero - a statistical method for determining whether an association is significantly different from zero (Tan, Sze Huey, and Say Beng Tan. “The correct interpretation of confidence intervals.” Proceedings of Singapore Healthcare 19.3 (2010): 276-278.).

      For the sake of transparency, we provide p values for readers who may be interested, although we emphasize that they should not be the basis for interpretation, as discussed in the referenced guidelines. Specifically, in Figure 1A-B, the p values for CGM_Mean, CGM_Std, and AC_Var were 0.02, 0.02, and <0.01, respectively, while those for FBG, HbA1c, and PG120 were 0.83, 0.91, and 0.25, respectively. In Figure 3C, the p values for factors 1–5 were 0.03, 0.03, 0.03, 0.24, and 0.87, respectively, and in Figure S8C, the p values for factors 1–3 were <0.01, <0.01, and 0.20, respectively. We appreciate the opportunity to clarify our statistical methodology and are happy to provide additional details if needed.

      We confirmed that the results of the variable importance in projection (VIP) analysis remained stable after including additional covariates, such as systolic blood pressure (SBP), diastolic blood pressure (DBP), triglycerides (TG), low-density lipoprotein cholesterol (LDL-C), and high-density lipoprotein cholesterol (HDL-C). The VIP values for ADRR, MAGE, AC_Var, and LI consistently exceeded one even after these adjustments, suggesting that the primary findings are robust in the presence of these clinical variables. We have added the following sentences in the Results and Methods section (lines 188-191, 491-494):

      Even when SBP, DBP, TG, LDL-C, and HDL-C were included as additional input variables, the results remained consistent, and the VIP scores for ADRR, AC_Var, MAGE, and LI remained greater than 1 (Fig. S2D).

      Of note, as the original reports document, the validation datasets did not specify explicit cutoffs for blood pressure or cholesterol. Consequently, they included participants with suboptimal control of these parameters.

      (2) Negative factor loadings have not been addressed and consistency in components: Figure 3, Figure S7. All the main features for value in Figure 3A are positive. However, MVALUE in S7B is very negative for value whereas the other features highlighted for value are positive. What is driving this difference? Please explain if the direction is important. Line 480 states that variables with factor loadings >= 0.30 were used for interpretation, but it appears in the text (Line 156, Figure 3) that oral DI was used for value, even though it had a -0.61 loading. Figure 3, Figure S7. HBGI falls within two separate components (value and variability). There is not a consistent component grouping. Removal of MAG (Line 185) and only MAG does not seem scientific. Did the removal of other features also result in similar or different Cronbach's ⍺? It is unclear what Figure S8B is plotting. What does each point mean?

      We appreciate the reviewer’s comment regarding the classification of CGM-derived measures into the three components: value, variability, and autocorrelation. As the reviewer correctly points out, some measures may load differently between the value and variability components in different datasets. However, we believe that this variability reflects the inherent mathematical properties of these measures rather than a limitation of our study.

      For example, the HBGI clusters differently across datasets due to its dependence on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S3A). Conversely, in populations with a wider range of mean glucose levels, HBGI correlates more strongly with mean glucose levels (Fig. 3A). This context-dependent behaviour is expected given the mathematical properties of these measures and does not indicate an inconsistency in our classification approach.
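
      This threshold dependence can be illustrated with the commonly cited risk-transformation formula for the high blood glucose index, in which only readings above roughly 112.5 mg/dL contribute. The sketch assumes that formulation (with its published constants for mg/dL) matches the HBGI used here, and the two series are hypothetical.

```python
import math

def hbgi(readings_mg_dl):
    """High blood glucose index using the standard symmetrizing transform (mg/dL)."""
    total = 0.0
    for bg in readings_mg_dl:
        f = 1.509 * (math.log(bg) ** 1.084 - 5.381)
        if f > 0:                      # only readings above about 112.5 mg/dL add risk
            total += 10.0 * f * f
    return total / len(readings_mg_dl)

# Hypothetical series with the same mean (100 mg/dL) but different variability.
steady = [100, 100, 100, 100, 100, 100]
variable = [70, 130, 70, 130, 70, 130]

hbgi_steady = hbgi(steady)
hbgi_variable = hbgi(variable)
```

      When the mean sits below the threshold, HBGI is zero for the steady series and positive only for the variable one, so in such a population the index tracks variability rather than mean glucose.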

      Importantly, our main findings remain robust: CGM-derived measures systematically fall into three components (value, variability, and autocorrelation). Traditional CGM-derived measures primarily reflect either value or variability, and this categorization is consistently observed across datasets. While specific indices such as HBGI may shift classification depending on population characteristics, the overall structure of CGM data remains stable.

      With respect to negative factor loadings, we agree that they may appear confusing at first. However, in the context of exploratory factor analysis, the magnitude, or absolute value, of the loading is most critical for interpretation, rather than its sign. Following established practice, we considered variables with absolute loadings of at least 0.30 to be meaningful contributors to a given component. Accordingly, although the oral DI had a negative loading of –0.61, its absolute magnitude exceeded the threshold of 0.30, so it was considered in our interpretation of the “value” component. Regarding the reviewer’s observation that MVALUE in Figure S7B shows a strongly negative loading while other indices in the same component show positive loadings, we believe this reflects the relative orientation of the factor solution rather than a substantive difference in interpretation. In factor analysis, the direction of factor loadings is arbitrary: multiplying all the loadings for a given factor by –1 would not change the factor’s statistical identity. Therefore, the important factor is not whether a variable loads positively or negatively but rather the strength of its association with the latent component (i.e., the absolute value of the loading).
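
      The sign indeterminacy can be checked numerically: the covariance implied by the common factors, ΛΛᵀ, is unchanged when every loading on one factor is multiplied by –1. The loading matrix below is hypothetical and serves only to demonstrate this property together with the absolute-value threshold of 0.30.

```python
def implied_common_covariance(loadings):
    """Return L @ L^T for a loading matrix given as a list of rows."""
    n = len(loadings)
    return [
        [sum(a * b for a, b in zip(loadings[i], loadings[j])) for j in range(n)]
        for i in range(n)
    ]

def flip_factor(loadings, factor_index):
    """Multiply every loading on one factor by -1."""
    return [
        [-v if k == factor_index else v for k, v in enumerate(row)]
        for row in loadings
    ]

# Hypothetical loadings: rows are indices, columns are factors.
loadings = [
    [0.99, 0.05],
    [0.80, 0.30],
    [-0.61, 0.10],
]
flipped = flip_factor(loadings, 0)

original_cov = implied_common_covariance(loadings)
flipped_cov = implied_common_covariance(flipped)

# Interpretation uses the absolute loading, so the sign flip changes nothing here either.
salient_original = [abs(row[0]) >= 0.30 for row in loadings]
salient_flipped = [abs(row[0]) >= 0.30 for row in flipped]
```

      Both the implied covariance and the set of indices that pass the threshold are identical before and after the flip, which is why the sign of a loading carries no interpretive weight on its own.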

      The rationale for removing MAG was based on statistical and methodological considerations. As is common practice in reliability analyses, we examined whether Cronbach’s α would improve if we excluded items with low factor loadings or weak item–total correlations. In the present study, we recalculated Cronbach’s α after removing the MAG item because it had a low loading. Its exclusion did not substantially affect the theoretical interpretation of the factor, which we conceptualize as “secretion” (without CGM). MAG’s removal alone is scientifically justified because it was the only item whose exclusion improved Cronbach's α while preserving interpretability. In contrast, removing other items would have undermined the conceptual clarity of the factor or would not have meaningfully improved α. Furthermore, the MAG item has a high factor 2 loading.
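
      The "α if item deleted" check described above can be sketched as follows, using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The scores are hypothetical, and the fourth item is deliberately constructed to be unrelated to the other three so that removing it raises α.

```python
def sample_variance(values):
    """Unbiased sample variance (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def cronbach_alpha(items):
    """items: list of item columns, each holding scores for the same participants."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(sample_variance(col) for col in items)
    return k / (k - 1) * (1.0 - item_var_sum / sample_variance(totals))

def alpha_if_deleted(items):
    """Cronbach's alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical scores for five participants on four items.
items = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.1, 2.9, 4.2, 4.8],
    [0.9, 1.8, 3.2, 3.9, 5.1],
    [3.0, 1.0, 4.0, 2.0, 3.0],  # weakly related item
]

alpha_all = cronbach_alpha(items)
alpha_without = alpha_if_deleted(items)
best_to_remove = alpha_without.index(max(alpha_without))
```

      In this toy example only removal of the weakly related item improves α, while removing any of the coherent items lowers it, mirroring the reasoning given for excluding a single item.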

      Each point in Figure S8B (old version) corresponds to an individual participant.

      To address these considerations, we have added the following text to the Discussion, Methods, (lines 388-396, 600-601) and Figure S6B (current version) legend:

      Some indices, such as HBGI, showed variation in classification across datasets, with some populations showing higher factor loadings in the “mean” component and others in the “variance” component. This variation occurs because HBGI calculations depend on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S5A). Conversely, in populations with a wider range of mean glucose levels, the HBGI correlates more strongly with mean glucose levels (Fig. 3A). Despite these differences, our validation analyses confirm that CGM-derived indices consistently cluster into three components: mean, variance, and autocorrelation.

      Variables with absolute factor loadings of ≥ 0.30 were used in interpretation.

      Box plots comparing factors 1 (Mean), 2 (Variance), and 3 (Autocorrelation) between individuals without (-) and with (+) diabetic macrovascular complications. Each point corresponds to an individual. The boxes represent the interquartile range, with the median shown as a horizontal line. Mann–Whitney U tests were used to assess differences between groups, with P values < 0.05 considered statistically significant.

      Minor Concerns:

      (1) NGT is not defined.

      We thank the reviewer for pointing out that the term “NGT” was not clearly defined in the original manuscript. We have added the following text to the Methods section (lines 447-451):

      T2DM was defined as HbA1c ≥ 6.5%, fasting plasma glucose (FPG) ≥ 126 mg/dL or 2‑h plasma glucose during a 75‑g OGTT (PG120) ≥ 200 mg/dL. IGT was defined as HbA1c 6.0– 6.4%, FPG 110–125 mg/dL or PG120 140–199 mg/dL. NGT was defined as values below all prediabetes thresholds (HbA1c < 6.0%, FPG < 110 mg/dL and PG120 < 140 mg/dL).
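
      These definitions translate directly into a decision rule. The sketch below applies the criteria in order (T2DM first, then NGT, otherwise IGT); the ordering, and the treatment of any value falling between the listed ranges as IGT, are assumptions of this sketch rather than statements from the manuscript. The function name is hypothetical.

```python
def classify_glucose_tolerance(hba1c, fpg, pg120):
    """Classify by HbA1c (%), fasting plasma glucose and 2-h OGTT glucose (mg/dL)."""
    # Any single criterion at or above the diabetes threshold is sufficient.
    if hba1c >= 6.5 or fpg >= 126 or pg120 >= 200:
        return "T2DM"
    # NGT requires all three measures to be below the prediabetes thresholds.
    if hba1c < 6.0 and fpg < 110 and pg120 < 140:
        return "NGT"
    # Everything in between is treated as IGT (assumption of this sketch).
    return "IGT"

# Hypothetical participants.
example_ngt = classify_glucose_tolerance(hba1c=5.4, fpg=95, pg120=120)
example_igt = classify_glucose_tolerance(hba1c=6.2, fpg=100, pg120=130)
example_t2dm = classify_glucose_tolerance(hba1c=5.8, fpg=130, pg120=150)
```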

      (2) Is it necessary to list the cumulative percentage (Line 173), it could be clearer to list the percentage explained by each factor instead.

      We appreciate the reviewer’s suggestion to list the percentage explained by each factor rather than the cumulative percentage for improved clarity. According to the reviewer’s suggestion, we have revised the results to show the individual contribution of each factor (39%, 21%, 10%, 5%, 5%) rather than the cumulative percentages (39%, 60%, 70%, 75%, 80%) that were previously listed (lines 220-221).

      (3) Figure S10. How were the coefficients generated for Figure S10? No methods are given.

      We conducted a multiple linear regression analysis in which time in range (TIR) was the dependent variable and the factor scores corresponding to the first three latent components (factor 1 representing the mean, factor 2 representing the variance, and factor 3 representing the autocorrelation) were the independent variables. We have added the following text to the figure legend (Fig. S8C) to provide a more detailed description of how the coefficients were generated:

      Comparison of predicted Time in range (TIR) versus measured TIR using multiple regression analysis between TIR and factor scores in Figure 3. In this analysis, TIR was the dependent variable, and the factor scores corresponding to the first three latent components (factor 1 representing the mean, factor 2 representing the variance, and factor 3 representing the autocorrelation) were the independent variables. Each point corresponds to the values for a single individual.

      (4) In https://cgm-basedregression.streamlit.app/, more explanation should be given about the output of the multiple regression. Regression is spelled incorrectly on the app.

      We thank the reviewer for pointing out the need for a clearer explanation of the multiple regression analysis presented in the online tool (https://cgmregressionapp2.streamlit.app/). We have added a description of the regression and corrected the typographical error in the spelling of “regression” within the app.

      (5) The last section of results (starting at line 225) appears to be unrelated to the goal of predicting %NC.

      We appreciate the reviewer’s feedback regarding the relevance of the simulation component of our manuscript. The primary contribution of our study goes beyond demonstrating correlations between CGM-derived measures and %NC; it highlights three fundamental components of glycemic patterns (mean, variance, and autocorrelation) and their independent relationships with coronary plaque characteristics. The simulations are included to illustrate how glycemic patterns with identical means and variability can have different autocorrelation structures. As reviewer 2 pointed out in minor comment #4, temporal autocorrelation can be difficult to interpret, so these visualizations were intended to provide intuitive examples for readers.

      However, we agree with the reviewer’s concern about the coherence of the manuscript. In response, we have streamlined the simulation section by removing technical simulations that do not directly support our primary conclusions (old version of the manuscript, lines 239-246, 502-526), while retaining only those that enhance understanding of the three glycemic components (Fig. 4A).

      (6) Figure S2. The R2 should be reported.

      We thank the reviewer for suggesting that we report R² in Figure S2. In the revised version, we have added the correlation coefficients and their 95% confidence intervals to Figure 1E.

      (7) Multiple panels have a correlation line drawn with a slope of 1 which does not reflect the data or r^2 listed. this should be fixed.

      We appreciate the reviewer’s concern that several panels included regression lines with a fixed slope of one that did not reflect the associated R² values. We have corrected Figures 1A–C and 3C to display regression lines representing the estimated slopes derived from the regression analyses.

    1. Author response:

      We thank the reviewers for their insights and suggestions. We appreciate that the reviewers were engaged by both the observations and their interpretation, and consider their interest in further analysis and clarified discussion to be the best possible compliment to this work.

      As noted by the reviewers, the working hypothesis of a nuclear organization role for ZNF-236 is just one model. Clarifying this model and potential alternatives will certainly add to the manuscript and this will be a key part of the revision.  Beyond this, several suggested analyses should explore extant models, while providing context for considering alternatives.  We look forward to carrying out such analyses as feasible and will report them in the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Areas of improvement and suggestions:

      (1) "These results suggest the SP targets interneurons in the brain that feed into higher processing centers from different entry points likely representing different sensory input" and "All together, these data suggest that the abdominal ganglion harbors several distinct type of neurons involved in directing PMRs"

      The characterization of the post-mating circuitry has been largely described by the group of Barry Dickson and other labs. I suggest ruling out a potential effect of mSP in any of the well-known post-mating neuronal circuitry, i.e: SPSN, SAG, pC1, vpoDN or OviDNs neurons. A combination of available split-Gal4 should be sufficient to prove this.

      We agree that this information is important to distinguish neurons which are direct SP targets from neurons which are involved in directing reproductive behaviors. We have now tested drivers for these neurons and added these data in Fig 3 (SAG neurons) and as Suppl Figs S4 (SPSN and genital tract neuron drivers SPR3 and SPR21), Suppl Fig S6 (overlap in single cell expression atlas), Suppl Fig S7 (overlap of SPSN split drivers with SPR8, fru11/12 and dsx split drivers in the brain inducing PMRs) and Suppl Fig S9 (pC1, OviDNs, OviENs, OviINs and vpoDN).  

      The newly added data are in full support of our conclusion that SP targets central nervous system neurons, which we termed SP Response Inducing Neurons (SPRINz). In particular, we find lines that express in genital tract neurons, but do not induce an SP response (Supp Figs S4, S7 and S10) or do not express in genital tract neurons and induce an SP response (Fig 2 and Supp Fig S2).

      We have analysed the expression of SPSN in the brain and VNC and find expression in few neurons (Suppl Fig S4). This result is consistent with expression of the genes driving SPSN expression in the single cell expression atlas indicating overlap of expression in very few neurons (Suppl Fig S6). We have already shown that FD6 (VT003280) which is part of the SPSN splitGal4 driver, expresses in the brain and VNC and can induce PMRs from SP expression (Fig 4).

      We have taken this further to test another SPSN driver (VT058873) in combination with SPR8, fru11/12 and dsx and find PMRs induced by mSP expression (Suppl Fig S7). Moreover, if we restrict expression of mSP to the brain with otdflp we can induce PMRs from mSP expression and obtain the same response by activating these brain neurons (Suppl Fig S7). We note that the VT058873 ∩ fru11/12 intersection in combination with otdflp stopmSP or stopTrpA1 in the head, did not result in PMRs. Here, PMR inducing neurons likely reside in the VNC, but currently no tools are available to test this further.

      We further tested pC1, OviDNs, OviENs, OviINs and vpoDN for induction of PMRs from expression of mSP. We are pleased to see that OviEN-SS2s, OviIN-SS1 and vpoDN splitGal4 drivers can reduce receptivity, but not induce oviposition (Suppl Fig S8). We predicted such drivers based on previously published data (Haussmann et al. 2013), which we have now validated.

      (2) Authors must show how specific is their "head" (elav/otd-flp) and "trunk" (elav/tsh) expression of mSP by showing images of the same constructs driving GFP.

      The expression pattern of tshGAL, which expresses in the trunk, has already been published (Soller et al., 2006). We have added images of “head” expression for tshGAL in Suppl Fig 1 and adjusted our statement to say that it is predominantly expressed in the VNC.

      (3) VT3280 is termed as a SAG driver. However, VT3280 is a SPSN specific driver (Feng et al., 2014; Jang et al., 2017; Scheunemann et al., 2019; Laturney et al., 2023). The authors should clarify this.

      Following the reviewer's suggestion, we have clarified the specificity of VT003280 and now state that this is FD6.

      (4) Intersectional approaches must rule out the influence of SP on sex-peptide sensing neurons (SPSN) in the ovary by combining their constructs with SPSN-Gal80 construct. In line with this, most of their lines targets the SAG circuit (4I, J and K). Again, here they need to rule out the involvement of SPSN in their receptivity/egg laying phenotypes. Especially because "In the female genital tract, these split-Gal4 combinations show expression in genital tract neurons with innervations running along oviduct and uterine walls (Figures S3A-S3E)".

      We agree with this reviewer that we need to restrict expression at higher resolution, ideally to a single cell type. However, this is a major task that we will continue in follow-up studies.

      In principle, use of GAL80 is a valid approach to restrict expression if levels of GAL80 are higher than those of GAL4, because GAL80 binds GAL4 to inhibit its activity. Hence, if levels of GAL80 are lower, results could be difficult to interpret.

      (5) The authors separate head (brain) from trunk (VNC) responses, but they don't narrow down the neural circuits involved on each response. A detailed characterization of the involved circuits especially in the case of the VNC is needed to (a) show that the intersectional approach is indeed labelling distinct subtypes and (b) how these distinct neurons influence oviposition.

      Again, we agree with this reviewer that we need to restrict expression at higher resolution, ideally to a single cell type. However, this is a major task that we will continue in follow-up studies.

      Reviewer #2 (Public Review):

      Strength:

      The intersectional approach is appropriate and state-of-the art. The analysis is a very comprehensive tour-de-force and experiments are carefully performed to a high standard. The authors also produced a useful new transgenic line (UAS-FRTstopFRT mSP). The finding that neurons in the brain (head) mediate the SP effect on receptivity, while neurons in the abdomen and thorax (ventral nerve cord or peripheral neurons) mediate the SP effect on oviposition, is a significant step forward in the endavour to identify the underlying neuronal networks and hence a mechanistic understanding of SP action. Though this result is not entirely unexpected, it is novel as it was not shown before.

      We thank reviewer 2 for recognizing the advance of our work.

      Weakness:

      Though the analysis identifies a small set of neurons underlying SP responses, it does not go the last step to individually identify at least a few of them. The last paragraph in the discussion rightfully speculates about the neurochemical identity of some of the intersection neurons (e.g. dopaminergic P1 neurons, NPF neurons). At least these suggested identities could have been confirmed by straight-forward immunostainings agains NPF or TH, for which antisera are available. Moreover, specific GAL4 lines for NPF or P1 or at least TH neurons are available which could be used to express mSP to test whether SP activation of those neurons is sufficient to trigger the SP effect.

      We appreciate this reviewer's recognition of our previous work showing that receptivity and oviposition are separable. As pointed out, we have now gone one step further and, in a tour de force approach, identified subsets of neurons in the brain and VNC.

      We agree with this reviewer that we need to restrict expression at higher resolution, ideally to a single cell type. As pointed out by this reviewer, determining the neurochemical identity is an excellent suggestion and will help to further restrict expression to just one type of neuron. However, this is a major task that we will continue in follow-up studies.

      Reviewer #3 (Public Review):

      Strengths:

      Besides the main results described in the summary above, the authors discovered the following:

      (1) Reduction of receptivity and induction of egg-laying are separable by restricting the expression of membrane-tethered SP (mSP): head-specific expression of mSP induces reduction of receptivity only, whereas trunk-specific expression of mSP induces oviposition only. Also, they identified a GAL4 line (SPR12) that induced egg laying but did not reduce receptivity.

      (2) Expression of mSP in the genital tract sensory neurons does not induce PMR. The authors identified three GAL4 drivers (SPR3, SPR 21, and fru9), which robustly expressed mSP in genital tract sensory neurons but did not induce PMRs. Also, SPR12 does not express in genital tract neurons but induces egg laying by expressing mSP.

      We thank reviewer 3 for recognizing these two important points regarding the SP response, which point to a revised model for how the underlying circuitry induces the post-mating response. To further substantiate these findings, we have now added a splitGal4 nSyb ∩ ppk line, which expresses in genital tract neurons but does not induce PMRs from mSP expression.

      Weaknesses:

      (1) Intersectional expression involving ppk-GAL4-DBD was negative in all GAL4AD lines (Supp. Fig.S5). As the authors mentioned,   neurons may not intersect with SPR, fru, dsx, and FD6 neurons in inducing PMRs by mSP. However, since there was no PMR induction and no GAL4 expression at all in any combination with GAL4-AD lines used in this study, I would like to have a positive control, where intersectional expression of mSP in ppk-GAL4-DBD and other GAL4-AD lines (e.g., ppk-GAL4-AD) would induce PMR.

      We have added a positive control for ppk expression by combining the ppk-DBD line with an nSyb-AD line, which expresses in all neurons (Supp Fig S8). This experiment confirms our previous observations that ppk splitGal4 in combination with other drivers does not induce an SP response despite driving expression in genital tract neurons. We have expanded the discussion section to point out that we have identified additional cells in the brain expressing ppkGAL4, but expression of split-GAL4 ppk is absent in these cells. Part of this work has previously been published (Nallasivan et al. 2021). Accordingly, we amended the text to say when expression was achieved with ppkGAL or ppk splitGAL4.

      (2) The results of SPR RNAi knock-down experiments are inconclusive (Figure 5). SPR RNAi cancelled the PMR in dsx ∩ fru11/12 and partially in SPR8 ∩ fru 11/12 neurons. SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive; it is unclear whether SPR mediates the phenotype in SPR8 ∩ fru 11/12 and dsx ∩ SPR8 neurons.

      We agree with this reviewer that the interpretation of the SPR RNAi results is complicated by the fact that SP has additional receptors (Haussmann et al 2013). The results are conclusive for all three intersections when expressing UAS mSP in SPR RNAi with respect to oviposition, i.e. egg laying is not induced in the absence of SPR. For receptivity, the results are conclusive for dsx ∩ fru11/12 and partially for SPR8 ∩ fru 11/12.

      Potentially, SPR RNAi knock-down does not reduce SPR levels sufficiently to fully abolish the effect on receptivity in some intersection patterns, likely also because splitGal4 expression is less efficient.

      Why SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive is unclear, but we anticipate that resolving this unexpected result will require restricting expression at higher resolution, ideally to a single cell type. However, this is a major task that we will continue in follow-up studies.

      SPR RNAi knock-down experiments may also help clarify whether mSP worked autocrine or juxtacrine to induce PMR. mSP may produce juxtacrine signaling, which is cell non-autonomous.

      Whether membrane-tethered SP induces the response in an autocrine manner is an important aspect in the interpretation of the results from mSP expression.

      Removing SPR by SPR RNAi while expressing mSP in the same neurons did not induce egg laying for all three intersections and did not reduce receptivity for dsx ∩ fru11/12 and for SPR8 ∩ fru 11/12. Accordingly, we can conclude that for these neurons the response is induced in an autocrine manner.

      We have added this aspect to the discussion section.