  1. Jul 2025
    1. eLife Assessment

      The authors attempt to identify which patients with benign lesions will progress to cancer using a liquid biomarker. Although the study is valuable, the evidence provided for the liquid biopsy EV miRNA signature developed based on radiomics features remains incomplete. Key details are still missing, and additional validation experiments would better support the conclusions of the study.

    2. Reviewer #1 (Public review):

      Summary:

      The study aimed to develop a liquid biopsy EV miRNA signature associated with radiomics features for early diagnosis of pancreatic cancer. The flawed study design and inadequate description of the clinical characteristics of the enrolled samples make the findings unconvincing.

      Strengths:

      The concept of developing an EV miRNA signature associated with disease-relevant radiomics features is a strength.

      Weaknesses:

      There are many weaknesses in this manuscript, including associations drawn between data derived from unmatched sample sets, selection of low-abundance miRNAs for the signature with inadequate rationale, incomplete description of experimental methods, and confusing statements in the text.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates a low abundance microRNA signature in extracellular vesicles to subtype pancreatic cancer and for early diagnosis. In this revision, there remain several major and minor issues.

      Strengths:

      The authors did a comprehensive job with numerous analyses of moderately sized cohorts to describe the clinical and translational significance of their miRNA signature.

      Weaknesses:

      The weaknesses of the study largely revolve around a lack of clarity about the methodology used and the validation of their findings.

      (1) The WGCNA analysis was critical to identify the EV miRNAs associated with imaging features, but the "cut-off criteria" for MM and GS have no clear justification. How were these cut-offs determined? How sensitive were the results to these cut-offs?

      (2) The authors now clarify that patients for the sub-study on differentiating early stage from benign pancreatic lesions were matched by age and that the benign pancreatic lesions were predominantly IPMNs. This scientific design is flawed. The CT features extracted likely differentiate solid from cystic pancreatic lesions, and the miRNA signature is doing the same. The authors need to incorporate the following benign controls into their imaging analysis and their EV miRNA analysis: pancreatitis and normal pancreata.

      (3) For the radiomics features, the authors should include an additional external validation set to better support the ability to use these features reproducibly, especially given that the segmentation was manual and reliant on specific people.

      (4) The DF selection process still lacks cited references as originally requested in the first review.

      (5) In Figure 2, more quantitative details are needed in the manuscript. The authors failed to incorporate this and only responded in their rebuttal. Add details to the manuscript as originally requested.

      (6) It is still not clear what Figure 4A is illustrating with regard to model performance. The authors need to state in the manuscript very clearly what they are showing in the figure and what the modules represent.

      (7) Figure 5 and the descriptions for the public serum miRNA datasets need more details. Were these pancreatic cancers all adenocarcinomas? What were the stages, age range, sex distribution, and comorbid conditions of the cases? Were the controls all IPMNs or were there other conditions in the controls?

      (8) The subtype results in figures 6 and 7 are not convincing. An association on univariate analysis is not sufficient. The explanation that clinical data is not available to do a multivariable analysis indicates that the authors do not have the ability to claim that they have identified unique subtypes that have clinical relevance. A thorough evaluation of the prognostic significance and the associated molecular features of these tumors is needed.

      Summary:

      Key details and validation experiments are still needed to better support the conclusions of the study.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Shi et al. utilized multiple imaging datasets and one set of samples for analyzing serum EV-miRNAs & EV-RNAs to develop an EV miRNA signature associated with disease-relevant radiomics features for early diagnosis of pancreatic cancer. CT imaging features in two datasets (UMMD & JHC and WUH) were derived from pancreatic benign disease patients vs pancreatic cancer cases, while circulating EV miRNAs were profiled from samples obtained from a different center (DUH). The EV RNA signature from external public datasets (GSE106817, GSE109319, GSE113486, GSE112264) was analyzed for differences in healthy controls vs pancreatic cancer cases. The miRNAs were also analyzed in the TCGA tissue miRNA data from normal adjacent tissue vs pancreatic cancer.

      Strengths:

      The concept of developing EV miRNA signatures associated with disease relevant radiomics features is a strength.

      Weaknesses:

      While the overall concept of developing an EV miRNA signature associated with radiomics features is interesting, the findings reported are not convincing for the reasons outlined below:

      (1) Discrepant datasets for analyzing radiomic features with EV-miRNAs: It is not justified how CT images (UMMD & JHC and WUH) and EV-miRNAs (DUH) from different subjects and centers/cohorts, shown in Figures 1 & 2, were analyzed for association. It is stated that the samples were matched according to age, but no information is provided on the stages of pancreatic cancer or the kinds of benign lesions analyzed in each instance.

      Thank you to the reviewer for the valuable comments. We acknowledge that the radiomics data and EV-miRNA data were derived from different patient cohorts. The primary aim of this study was to explore the integration of data from different omics sources in an exploratory manner to identify potential shared biological features.

      We have revised the Methods section accordingly. Regarding the imaging data, we mainly performed batch effect correction on CT images from different centers to eliminate variability. As you correctly pointed out, the EV-miRNA data and CT images from DUH were matched by age. Since all the patients we included had early-stage pancreatic cancer, and the benign pancreatic lesions were predominantly IPMN, we did not specifically highlight this aspect. However, we have now clarified this approach in the data collection section. Thank you for your attention.

      (2) The study is focused on low-abundance miRNAs with no adequate explanation of the selection criteria for the miRNAs analyzed.

      We used the median absolute deviation (MAD) to filter low-abundance miRNAs in the manuscript. Because we introduce this concept for the first time in this context, we acknowledge that there is still considerable room for refinement and improvement.
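
      As an illustration only, a MAD-based filter might be sketched as follows. The miRNA names, the expression values, the cut-off of 0.5, and the direction of the filter (keeping features with a MAD at or above the cut-off) are all hypothetical. The response does not state the threshold or the exact rule that was applied.

```python
from statistics import median


def median_absolute_deviation(values):
    """Return the median of absolute deviations from the median."""
    center = median(values)
    return median(abs(v - center) for v in values)


def filter_by_mad(expression, min_mad):
    """Keep miRNAs whose MAD across samples is at least min_mad.

    `expression` maps a miRNA name to its per-sample expression values.
    The direction of the filter (keep high-MAD features) is an assumption.
    """
    return {
        name: values
        for name, values in expression.items()
        if median_absolute_deviation(values) >= min_mad
    }


# Hypothetical log-scale expression values for three made-up miRNAs.
expression = {
    "miR-A": [1.0, 1.1, 0.9, 1.0, 1.2],
    "miR-B": [0.5, 2.5, 1.0, 3.5, 0.2],
    "miR-C": [4.0, 4.0, 4.0, 4.0, 4.0],
}

kept = filter_by_mad(expression, min_mad=0.5)
print(sorted(kept))
```

      The MAD is used here instead of the standard deviation because it is less sensitive to a few extreme samples.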

      (3) While EV-miRNAs were profiled or sequenced (not well described in the Methods section) with two different EV isolation methods, the authors used four public datasets of serum circulating miRNAs to validate the findings. It would be better to show the expression of the three miRNAs in the additional dataset(s) of EV-miRNAs and compare the expressions of the three EV-miRNAs in pancreatic cancer with healthy and benign disease controls.

      Thank you for your suggestion. We have attempted to identify available EV-miRNA datasets; however, due to current limitations in data access, we opted to use serum samples for validation. In our follow-up studies, we are already in the process of collecting relevant EV samples for further validation.

      (4) It is not clear how the 12 EV-miRNAs in Figure 4C were identified.

      These 12 EV-miRNAs were identified through WGCNA analysis and are associated with the high-risk group.

      (5) Box plots in Figures 4D-F and G-I of three miRNAs in serum and tissue should show all quantitative data points.

      We have completed the revisions. Kindly review them at your convenience.

      (6) What is the GBM model in Figure 5?

      Thank you to the reviewer for raising this question. The "GBM model" referred to in Figure 5 is a classification model built using the Gradient Boosting Machine (GBM) algorithm, designed to predict the diagnostic status of pancreatic cancer by integrating EV-miRNA expression and radiomics features. We implemented the model using the `GradientBoostingClassifier` from the scikit-learn library (version 1.2.2), and optimized the model’s hyperparameters—including learning rate, maximum depth, and number of trees—within a five-fold cross-validation framework. The training process and performance evaluation of the model, including the ROC curve and AUC values, are presented in Figure 5.
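
      A minimal sketch of the kind of workflow described above, tuning a `GradientBoostingClassifier` over learning rate, maximum depth, and number of trees with five-fold cross-validation. The synthetic data, the grid values, the train/test split, and the use of `GridSearchCV` are assumptions made for illustration. The response does not specify how the search was run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a feature matrix (e.g. miRNA + radiomics columns).
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid over the three hyperparameters named in the response.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
    "n_estimators": [50, 100],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=5,  # five-fold cross-validation
    scoring="roc_auc",
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on held-out data.
test_scores = search.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, test_scores)
print(search.best_params_, round(test_auc, 3))
```

      Keeping a held-out set outside the cross-validation loop gives an AUC estimate that is not biased by the hyperparameter search.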

      (7) What are the AUCs of individual EV-miRNAs integrated as a panel of three EV-miRNAs?

      Thank you for your comment. Our GBM model integrates the panel of these three EV-miRNAs.

      (8) The authors could have compared the performance of CA19-9 with that of the three EV-miRNAs.

      Since our main focus is on the panel of three EV-miRNAs, we did not present the AUC for each individual miRNA separately. However, we have included the performance of CA19-9 in our dataset as a reference. The predictive AUC for CA19-9 is 0.843 (95% CI, 0.762–0.924).
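
      For readers unfamiliar with how an AUC and its confidence interval can be obtained for a single marker, the sketch below computes the AUC as a pairwise ranking probability and attaches a percentile bootstrap interval. The marker values are made up and are not study data. The bootstrap is an assumption, since the response does not say how the reported interval was derived.

```python
import random


def auc(pos_scores, neg_scores):
    """AUC as the probability that a case scores above a control (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def bootstrap_ci(pos_scores, neg_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling each group separately."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        pos = rng.choices(pos_scores, k=len(pos_scores))
        neg = rng.choices(neg_scores, k=len(neg_scores))
        stats.append(auc(pos, neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up marker values for illustration only (not study data).
cases = [120.0, 85.0, 40.0, 300.0, 64.0, 37.0, 150.0, 22.0]
controls = [18.0, 25.0, 37.0, 12.0, 30.0, 9.0, 41.0, 15.0]

point = auc(cases, controls)
low, high = bootstrap_ci(cases, controls)
print(round(point, 3), round(low, 3), round(high, 3))
```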

      (9) How was the diagnostic performance of the three EV-miRNAs in the two molecular subtypes identified in Figures 6 & 7? Do the C1 & C2 clusters correlate with the classical/basal subtypes, staging, and imaging features?

      Thank you to the reviewer for raising this important question. In fact, our EV panel is primarily designed to distinguish between normal and tumor samples, whereas both C1 and C2 represent tumor subtypes, and thus the panel is not applicable for diagnostic purposes in this context. Additionally, our subtypes are novel and do not align with the conventional classical and basal-like gene expression profiles. Furthermore, the C1 subtype is more frequently observed in stage III tumors (Figure 6J) and is associated with distinct imaging features such as higher texture heterogeneity and lower CT density.

      Reviewer #2 (Public review):

      Summary:

      This study investigates a low abundance microRNA signature in extracellular vesicles to subtype pancreatic cancer and for early diagnosis. There are several major questions that need to be addressed. Numerous minor issues are also present.

      Strengths:

      The authors did a comprehensive job with numerous analyses of moderately sized cohorts to describe the clinical and translational significance of their miRNA signature.

      Weaknesses:

      There are multiple weaknesses of this study that should be addressed:

      (1) The description of the datasets in the Materials and Methods lacks details. What were the benign lesions from the various hospital datasets? What were the healthy controls from the public datasets? No pancreatic lesions? No pancreatic cancer? Any cancer history or other comorbid conditions? Please define these better.

      We sincerely thank the reviewer for the detailed and important suggestions regarding sample definition. Indeed, the source of the datasets and the definition of control groups are critical for ensuring the rigor and interpretability of the study. In response to this comment, we have added clarifications in the revised "Materials and Methods" section.

      First, for the benign lesion group derived from various clinical centers (DUH, UMMD, WUH, etc.), we have carefully reviewed the pathological and clinical records and defined these samples as histologically confirmed non-malignant pancreatic lesions, primarily IPMN. All patients in the benign lesion group had no diagnosis of pancreatic cancer at the time of sample collection, and for cohorts with available follow-up data, no evidence of malignant progression was observed within at least six months.

      Second, the healthy control group from public databases was derived from healthy individuals.

      Finally, to eliminate potential confounding factors, we excluded any samples with a history of other malignancies (e.g., breast cancer, colorectal cancer, etc.) from all datasets with available clinical information, to ensure the specificity of the EV-miRNA expression analysis.

      (2) It is unclear how many of the controls and cases had both imaging for radiomics and blood for biomarkers.

      Due to limitations in resource availability, our study does not include samples with both CT imaging and serological data from the same individuals. Instead, we integrated blood samples and CT imaging data collected from different clinical centers.

      (3) The authors should define the imaging methods and protocols used in more detail. For the CT scans, what slice thickness? Was a pancreatic protocol used? What phase of contrast is used (arterial, portal venous, non-contrast)? Any normalization or pre-processing?

      Thank you to the reviewer for the professional suggestions regarding the imaging section. We have added detailed technical information on CT imaging in the revised Materials and Methods section. All CT images were acquired using a 64-slice multidetector spiral CT scanner, with a standard slice thickness of 1.0–1.5 mm and a reconstruction interval of 1 mm. All pancreatic cancer patients underwent a standard pancreatic protocol triphasic contrast-enhanced CT examination, which included non-contrast, arterial phase (approximately 25–30 seconds), and portal venous phase (approximately 65–70 seconds) imaging.

      For the radiomics analysis, images from the portal venous phase were selected, as this phase provides consistent clarity in delineating tumor boundaries and surrounding vasculature. To ensure data consistency, all imaging data underwent preprocessing, including resampling, intensity normalization of grayscale values (standardized using z-score normalization to a mean of 0 and a standard deviation of 1), and N4 bias field correction to address potential low-frequency signal inhomogeneities.
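
      The z-score step mentioned above maps each intensity x to (x - mean) / standard deviation. A minimal sketch follows, assuming the statistics are taken over one image and that the population standard deviation is used. The voxel values are hypothetical, and the response does not state either choice.

```python
from statistics import mean, pstdev


def zscore_normalize(intensities):
    """Rescale intensities to mean 0 and standard deviation 1."""
    mu = mean(intensities)
    sigma = pstdev(intensities)
    if sigma == 0:
        raise ValueError("constant image: standard deviation is zero")
    return [(v - mu) / sigma for v in intensities]


# Hypothetical voxel intensities (Hounsfield-like units) from one image.
voxels = [35.0, 42.0, 58.0, 61.0, 49.0, 55.0]
normalized = zscore_normalize(voxels)
print([round(v, 3) for v in normalized])
```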

      (4) Who performed the segmentation of the lesions? An experienced pancreatic radiologist? A student? How did the investigators ensure that the definition of the lesions was performed correctly? Radiomics features are often sensitive to the segmentation definitions.

      All lesion segmentations were performed on portal venous phase contrast-enhanced CT images. Manual delineation was conducted using 3D Slicer (version 4.11) by two radiologists with extensive experience in pancreatic tumor diagnosis. A consensus was reached between the two radiologists on the ROI definition criteria prior to analysis.

      To further assess the robustness of radiomic features to segmentation boundary variations, we selected a subset of representative cases and created “expanded/shrunk ROIs” by adding or subtracting a 2-pixel margin at the lesion boundary. Feature extraction was then repeated, and the coefficient of variation (CV) for the main features included in the model was found to be below 10%, indicating that the model is stable with respect to minor boundary fluctuations.
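
      The stability check described above can be sketched as a per-feature coefficient of variation across the shrunk, original, and expanded ROIs. The feature names and values below are hypothetical, and the use of the sample standard deviation is an assumption. Only the 10% threshold comes from the response.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the absolute mean."""
    mu = mean(values)
    if mu == 0:
        raise ValueError("mean is zero: coefficient of variation is undefined")
    return 100.0 * stdev(values) / abs(mu)


# Hypothetical values of each feature for the shrunk, original, and expanded ROI.
feature_values = {
    "feature_1": [0.82, 0.85, 0.88],
    "feature_2": [12.1, 12.4, 12.2],
    "feature_3": [3.0, 4.1, 5.3],
}

cv_by_feature = {
    name: coefficient_of_variation(vals) for name, vals in feature_values.items()
}
# Features with a CV below 10% are treated as stable to boundary changes.
stable = [name for name, cv in cv_by_feature.items() if cv < 10.0]
print(stable)
```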

      (5) Figure 1 is full of vague images that do not convey the study design well. Numbers from each of the datasets, a summary of what data was used for training and for validation, definitions of all of the abbreviations, references to the Roman numerals embedded within the figure, and better labeling of the various embedded graphs are needed. It is not clear whether the graphs are real results or just artwork to convey a concept. I suspect that they are just artwork, but this remains unclear.

      We thank the reviewer for the detailed feedback on Figure 1. We would like to clarify that Figure 1 is a conceptual schematic intended to visually illustrate the overall design of the study, the relationships among different data modules, and the logical sequence of the analytical strategy. It is not meant to present actual results or quantitative details.

      Regarding the reviewer’s concerns about sample sizes, the division between training and validation cohorts, explanations of specific abbreviations, and the precise meaning of each panel, we have provided comprehensive and detailed clarifications in Figure 2.

      (6) The DF selection process lacks important details. Please reference your methods with the Boruta and Lasso models. Please explain what machine learning algorithms were used. There is a reference in the "Feature selection.." section of "the model formula listed below" but I do not see a model formula below this paragraph.

      We thank the reviewer for the thoughtful and detailed comments on the feature selection strategy. We first applied the Boruta algorithm (based on random forests, implemented using the Boruta R package) to the original feature set—which included both radiomics and EV-miRNA features—to identify variables that consistently demonstrated importance across multiple rounds of random resampling.

      Subsequently, we used LASSO regression with five-fold cross-validation to further reduce the dimensionality of the Boruta-selected features and to construct the final feature set used for modeling. The formula for the model is as follows: each regression coefficient is multiplied by the corresponding feature expression level, and the resulting products are summed to generate the Risk Score.
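
      Written out, the verbal description above corresponds to a weighted sum of this form. The symbols are introduced here for illustration: n is the number of features retained by LASSO, \beta_i is the regression coefficient of feature i, and x_i is its expression level. Whether an intercept term is included is not stated in the response.

```latex
\[
\text{Risk Score} = \sum_{i=1}^{n} \beta_i \, x_i
\]
```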

      (7) In Figure 2, more quantitative details are needed. How are patients dichotomized into non-obese and obese? What does alcohol/smoking mean? Is it simply no to both versus one or the other as yes? These two risk factors should be separated and pack years of smoking should be reported. The details of alcohol use should also be provided. Is it an alcohol abuse history? Any alcohol use, including social drinking? Similarly, "diabetes" needs to be better explained. Type I, type II, type 3c? P values should be shown to demonstrate any statistically significant differences in the proportions of the patients from one dataset to another.

      Our definition of obesity was based on the standard BMI threshold (30 kg/m²). A history of smoking or alcohol consumption was defined as continuous use for more than one year. Specific details regarding smoking and alcohol use were recorded at baseline under the category of “smoking/alcohol history”; unfortunately, we did not collect follow-up data on these variables. As for diabetes, only type II diabetes was documented. Statistically significant p-values have been added. Thank you.

      (8) In the section "Different expression radiomic features between pancreatic benign lesions and aggressive tumors", there is a reference to "MUJH" for the first time. What is this? There is also the first reference to "aggressive tumors" in the section. Do the authors just mean the cases? Otherwise there is no clear definition of "aggressive" (vs. indolent) pancreatic cancer. This terminology of tumor "aggressiveness" either needs to be removed or better defined.

      We have corrected the abbreviation (MUJH); it should in fact be JHC. Additionally, regarding the term "aggressive," we have reviewed the literature and used it to convey the highly malignant nature of pancreatic cancer.

      (9) Figure 3 needs to have the specific radiomic features defined and how these features were calculated. Labeling them as just f1, f2, etc is not sufficient for another group to replicate the results independently.

      We have presented these features in Supplementary Table 1. Kindly refer to it for details.

      (10) It is not clear what Figure 4A illustrates as regards model performance. What do the different colors represent, and what are the models used here? This is very confusing.

      This represents the correlation between WGCNA modules and miRNAs. Different module colors indicate distinct miRNA clusters—for example, the green module contains 12 miRNAs grouped together. The colors themselves do not carry any intrinsic meaning.

      (11) Figure 5 shows results for many more model runs than the described 10, please explain what you are trying to convey with each row. What are "Test A" and "Test B"? There is no description in the manuscript of what these represent. In the figure caption, there is a reference to "our center data" which is not clear. Be more specific about what that data is.

      We have indicated Tests A, B, and C using arrows in Figure 5. Please check.

      (12) Figure 6 describes the subtypes identified in this study, but the authors do not show a multi-variable cox proportional hazards model to show that this subtype classification independently predicts DFS and OS when incorporating confounding variables. This is essential to show the subtypes are clinically relevant. In particular, the authors need to account for the stage of the patients, and receipt of chemotherapy, surgery, and radiation. If surgery was done, we need to know whether they had R1 or R0 resection. The details about the years in which patients were included is also important.

      We sincerely thank the reviewer for this critical comment. We fully agree that incorporating a multivariate Cox proportional hazards model to control for potential confounding factors would provide a more robust validation of the independent prognostic value of our proposed subtypes for DFS and OS.

      However, as the clinical data used in this study were retrospectively collected and access to certain variables is currently restricted, we were only able to obtain limited clinical information. At this stage, we are unable to systematically include key variables such as tumor staging, adjuvant chemoradiotherapy regimens, and resection margin status (R0 vs. R1), which prevents us from performing a rigorous multivariate Cox analysis.

      Similarly, regarding the postoperative resection status, after reviewing the original surgical reports and pathology records, we regret to confirm that margin status (R0 vs. R1) is missing in a substantial portion of cases, making it unsuitable for reliable statistical analysis.

      We fully acknowledge this as a limitation of the current study and have explicitly addressed it in the Discussion section. To address this gap, we are currently designing a more comprehensive prospective cohort study, which will allow us to validate the clinical independence and utility of the proposed subtypes in future research.

      (13) How do these subtypes compare to other published subtypes?

      We sincerely thank the reviewer for raising this important point. Clusters 1 and 2 represent a novel molecular classification proposed for the first time in this study, driven by EV-miRNA profiles. This classification approach is conceptually independent from traditional transcriptome-based subtyping systems, such as the classical/basal-like subtypes, as well as other existing classification schemes. Comparisons with previously reported subtypes and validation of clinical relevance will require further investigation in future studies.

      Reviewer #3 (Public review):

      Summary:

      The authors appear to be attempting to identify which patients with benign lesions will progress to cancer using a liquid biomarker. They used radiomics and EV miRNAs in order to assess this.

      Strengths:

      It is a strength that there are multiple test datasets. Data is batch-corrected. A relatively large number of patients is included. Only 3 miRNAs are needed to obtain their sensitivity and specificity scores.

      Weaknesses:

      This manuscript is not clearly written, making interpretation of the quality and rigor of the data very difficult. There is no indication from the methods that the patients in their cohorts who are pancreatic cancer patients (from the CT images) had prior benign lesions, limiting the power of their analysis. The data regarding the cluster subtypes is very confusing. There is no discussion or comparison if these two clusters are just representing classical and basal subtypes (which have been well described).

      We regret that we do not have the relevant patient records. Regarding the relationship between Cluster 1/Cluster 2 and classical subtypes: We are very grateful for the reviewer’s insightful question. We would like to clarify that Clusters 1 and 2, as shown in Figures 6 and 7, are derived from a novel EV-miRNA–driven molecular classification proposed for the first time in this study. This classification system is constructed independently of the traditional transcriptome-based classical/basal-like subtypes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      There are errors in reference citations and several typos, misspellings, and grammatical errors throughout the manuscript.

      We have made the necessary revisions.

      Reviewer #2 (Recommendations for the authors):

      (1) Were the radiomic features associated with the subtypes and prognostic in the subset of patients who had CT scans?

      Unfortunately, there are no corresponding CT imaging results available for these cases, as the genes were identified based on predicted miRNA targets and were not derived from patients who had undergone CT scans.

      (2) There is a whole body of literature on prognostic imaging-based subtypes of pancreatic cancer that needs to be cited.

      Thank you for your suggestion. We have cited the relevant references accordingly in the manuscript.

      (3) Similarly, the authors should be more comprehensive about prognostic and early detection markers for miRNAs for pancreatic cancer. Early detection markers really should be described separately from prognostic markers. The authors did not do a PROBE phase 3 study, so early detection is not really relevant. Please see https://edrn.nci.nih.gov/about-edrn/five-phase-approach-and-prospective-specimen-collection-retrospective-blinded-evaluation-study-design/

      The primary objective of our study is early detection. We acknowledge the absence of third-phase validation results, which we will address in the limitations section. Additionally, the subtype classification represents our secondary objective.

      (4) If they want to couch this as a PROBE phase 2 study, then they should review the PROBE guidelines and ensure they are meeting standards. Many of the comments above regarding methodologies, definitions, and patient cohort descriptions would address this concern.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (5) The entire manuscript needs to have a review for the use of the English language. There are numerous typos and grammatical errors that make this manuscript difficult to follow and hard to interpret.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (6) In the section on "Definition and identification of low abundance EV-derived miRNA transcripts", provide a reference for the "edger" function.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (7) In the Abstract: The purpose section only mentions early diagnosis as the goal of this study. It seems subtyping is also a major goal, but it is not mentioned.

      The primary objective of our study is early detection. The subtype classification is our secondary objective, so we did not add it to the purpose section.

      (8) The experimental design fails to describe any of the 8 datasets that were used. How many patients? What were the ethnic and racial backgrounds, which is one of the key aspects of this study and mentioned in the title? What range of stages? When were the images and the blood collected in relation to diagnosis? Over what time frame were the patients included? What patients were excluded, if any? These details are important to understand the materials used, along with the methods to design the signatures and models.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (9) Again, the purpose section of the abstract does not align with the rest of the study, including the description of the experimental design. The last sentence of the experimental design section mentions predicting drug sensitivity and survival, which is unrelated to the aim of early diagnosis.

      We have revised the Methods section accordingly. Please kindly review the updated version.

      (10) The results section lacks key details to indicate the impact of the work. Vague descriptions of the findings are not sufficient. The performance of the biomarkers to differentiate benign from malignant lesions, hazard ratios, survival times, and p values should be reported for key results.

      Our aim was to develop an integrated panel for diagnostic purposes; therefore, we provided the AUC to evaluate its performance. However, since this is a diagnostic model, we did not include hazard ratios or survival time data.

      (11) What are "tow" molecular subtypes of pancreatic cancer? Did you mean "two"? What system was used to subtype the pancreatic cancers? Is it a new subtyping system or a previously published method to subtype the disease?

      Yes, we meant "two." The subtyping used a previously published method, which we have described in the Methods section.

      Reviewer #3 (Recommendations for the authors):

      The writing of this manuscript needs extensive re-wording and clarification to increase the readability and interpretability of the data presented. The authors could include a dataset of pancreatic cancer patient imaging data where the status of prior benign lesions was detected (as opposed to patients with benign lesions that do not develop pancreatic cancer). The authors could also address if their clusters 1 and 2 are representing (or are correlated with) the classical and basal subtypes that have been well described for pancreatic cancer.

      Thank you to the reviewer for the constructive comments. We sincerely appreciate your careful review, particularly regarding language clarity, data interpretability, and subtype correlation. To enhance the readability and scientific precision of the manuscript, we have conducted a thorough revision and language polishing throughout the text, improving logical structure, terminology consistency, and clarity in result descriptions. We have especially reinforced the Methods and Discussion sections to better explain key analytical steps and data interpretation.

      We fully understand the reviewer’s suggestion to include information on “the presence of benign lesions prior to pancreatic cancer diagnosis.” However, due to the retrospective nature of our study, the current imaging and EV-miRNA datasets do not contain systematically collected follow-up annotations of this type. Therefore, it is not feasible to incorporate such data into the present manuscript.

      That said, we fully recognize the importance of this direction. In future studies, we plan to evaluate longitudinal samples to investigate the dynamic changes in EV-miRNAs and imaging features during the progression from premalignant to malignant states, aiming to clarify their potential value for early cancer warning.

      Regarding the relationship between Cluster 1/Cluster 2 and classical subtypes: We are very grateful for the reviewer’s insightful question. We would like to clarify that Clusters 1 and 2, as shown in Figures 6 and 7, are derived from a novel EV-miRNA–driven molecular classification proposed for the first time in this study. This classification system is constructed independently of the traditional transcriptome-based classical/basal-like subtypes.

      Although we attempted a cross-comparison with existing TCGA subtypes, differences in data origin, analysis modality (EV-miRNA vs. tissue transcriptome), and limitations in sample matching prevent us from establishing a direct correspondence. In the revised Discussion, we have emphasized that these two classification approaches are complementary rather than equivalent, reflecting different dimensions of tumor heterogeneity. Further integrative multi-omics studies will be needed to validate their biological significance and clinical utility.

    1. eLife Assessment

      This study on the loss of DEGS1 in the developing larval brain convincingly shows the accumulation of dihydroceramide in the CNS which induces severe alterations in the morphology of glial subtypes as well as a reduction in glial number. The localization of DEGS1/ifc primarily to the ER is also compelling and interesting, and the loss of DEGS1/ifc clearly drives ER expansion and reduces the levels of TGs. This is an important contribution to the role of lipid metabolism in neural development and disease.

    2. Reviewer #1 (Public review):

      Summary:

      Zhu et al. investigate the cellular defects in glia as a result of loss in DEGS1/ifc encoding the dihydroceramide desaturase. Using the strength of Drosophila and its vast genetic toolkit, they find that DEGS1/ifc is mainly expressed in glia and its loss leads to profound neurodegeneration. This supports a role for DEGS1 in the developing larval brain as it safeguards proper CNS development. Loss of DEGS1/ifc leads to dihydroceramide accumulation in the CNS and induces alteration in the morphology of glial subtypes and a reduction in glial number. Cortex and ensheathing glia appeared swollen and accumulated internal membranes. Astrocyte-glia on the other hand displayed small cell bodies, reduced membrane extension and disrupted organization in the dorsal ventral nerve cord. They also found that DEGS1/ifc localizes primarily to the ER. Interestingly, the authors observed that loss of DEGS1/ifc drives ER expansion and reduced TGs and lipid droplet numbers. They observed no effect on PC and PE and a slight increase in PS.

      The conclusions of this paper are well supported by the data.

      Strengths:

      This is an interesting study that provides new insight into the role of ceramide metabolism in neurodegeneration.

      The strength of the paper is the generation of LOF lines, the insertion of transgenes and the use of the UAS-GAL4/GAL80 system to assess the cell-autonomous effect of DEGS1/ifc loss in neurons and different glial subtypes during CNS development.

      The imaging, immunofluorescence staining and EM of the larval brain and the use of the optic lobe and the nerve cord as a readout are very robust and nicely done.

      Drosophila is a difficult model to perform core biochemistry and lipidomics, but the authors used the whole larvae and CNS to uncover global changes in mRNA levels related to lipogenesis and the unfolded protein responses, as well as specific lipid alterations upon DEGS1/ifc loss.

      Weaknesses:

      No major weaknesses identified.

      Minor point: The authors performed lipidomics and RTqPCR on whole larvae and larval CNS, which does not inform on any cell type-specific effects. Performing single-cell RNAseq on larval brains to tease apart the cell-type specific effect of DEGS1/ifc loss would be interesting to explore in the future, but is beyond the scope of the current study.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Zhu et al. describes phenotypes associated with the loss of the gene ifc using a Drosophila model. The authors suggest their findings are relevant to understanding the molecular underpinnings of a neurodegenerative disorder, HLD-18, which is caused by mutations in the human ortholog of ifc, DEGS1.

      The work begins with the authors describing the role for ifc during fly larval brain development, demonstrating its function in regulating developmental timing, brain size, and ventral nerve cord elongation. Further mechanistic examination revealed that loss of ifc leads to depleted cellular ceramide levels as well as dihydroceramide accumulation, eventually causing defects in ER morphology and function. Importantly, the authors showed that ifc is predominantly expressed in glia and is critical for maintaining appropriate glial cell numbers and morphology. Many of the key phenotypes caused by the loss of fly ifc can be rescued by overexpression of human DEGS1 in glia, demonstrating the conserved nature of these proteins as well as the pathways they regulate. Interestingly, the authors discovered a loss of lipid droplet formation in ifc mutant larvae within the cortex glia, which presumably drives the deficits in glial wrapping around axons and subsequent neurodegeneration, potentially shedding light on mechanisms of HLD-18 and related disorders.

      Strengths:

      Overall, the manuscript is thorough in its analysis of ifc function and mechanism. The data images are high quality, the experiments are well controlled, and the writing is clear. There are, however, some concerns that need to be addressed prior to publication.

      Weaknesses:

      The authors adequately addressed the previously indicated weaknesses, and no new weaknesses have been identified.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, the authors report three novel ifc alleles: ifc[js1], ifc[js2], and ifc[js3]. ifc[js1] and ifc[js2] encode missense mutations, V276D and G257S, respectively. ifc[js3] encodes a nonsense mutation, W162*. These alleles exhibit multiple phenotypes, including delayed progression to the late-third larval instar stage, reduced brain size, elongation of the ventral nerve cord, axonal swelling, and lethality during late larval or early pupal stages.

      Further characterization of these alleles by the authors reveals that ifc is predominantly expressed in glia and localizes to the endoplasmic reticulum (ER). The expression of the ifc gene governs glial morphology and survival. Expression of fly ifc cDNA or human DEGS1 cDNA specifically in glia, but not neurons, rescues the CNS phenotypes of ifc mutants, indicating a crucial role for ifc in glial cells and its evolutionary conservation. Loss of ifc results in ER expansion and loss of lipid droplets in cortex glia. Additionally, loss of ifc leads to ceramide depletion and accumulation of dihydroceramide. Moreover, it increases the saturation levels of triacylglycerols and membrane phospholipids. Finally, the reduction of dihydroceramide synthesis suppresses the CNS phenotypes associated with ifc mutations, indicating the key role of dihydroceramide in causing ifc LOF defects.

      Strengths:

      This manuscript unveils several intriguing and novel phenotypes of ifc loss-of-function in glia. The experiments are meticulously planned and executed, with the data strongly supporting their conclusions.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: Zhu et al. investigate the cellular defects in glia as a result of loss in DEGS1/ifc encoding the dihydroceramide desaturase. Using the strength of Drosophila and its vast genetic toolkit, they find that DEGS1/ifc is mainly expressed in glia and its loss leads to profound neurodegeneration. This supports a role for DEGS1 in the developing larval brain as it safeguards proper CNS development. Loss of DEGS1/ifc leads to dihydroceramide accumulation in the CNS and induces alteration in the morphology of glial subtypes and a reduction in glial number. Cortex and ensheathing glia appeared swollen and accumulated internal membranes. Astrocyte-glia on the other hand displayed small cell bodies, reduced membrane extension and disrupted organization in the dorsal ventral nerve cord. They also found that DEGS1/ifc localizes primarily to the ER. Interestingly, the authors observed that loss of DEGS1/ifc drives ER expansion and reduced TGs and lipid droplet numbers. They observed no effect on PC and PE and a slight increase in PS.

      The conclusions of this paper are well supported by the data. The study could be further strengthened by a few additional controls and/or analyses.

      Strengths:

      This is an interesting study that provides new insight into the role of ceramide metabolism in neurodegeneration.

      The strength of the paper is the generation of LOF lines, the insertion of transgenes and the use of the UAS-GAL4/GAL80 system to assess the cell-autonomous effect of DEGS1/ifc loss in neurons and different glial subtypes during CNS development.

      The imaging, immunofluorescence staining and EM of the larval brain and the use of the optic lobe and the nerve cord as a readout are very robust and nicely done.

      Drosophila is a difficult model to perform core biochemistry and lipidomics but the authors used the whole larvae and CNS to uncover global changes in mRNA levels related to lipogenesis and the unfolded protein responses as well as specific lipid alterations upon DEGS1/ifc loss.

      Weaknesses:

      (1) The authors performed lipidomics and RTqPCR on whole larvae and larval CNS from which it is impossible to define the cell type-specific effects. Ideally, this could be further supported by performing single cell RNAseq on larval brains to tease apart the cell-type specific effect of DEGS1/ifc loss.

      We agree that using scRNAseq or pairing FACS-sorting of individual glial subtypes with bulk RNAseq would help tease apart the cell-type specific effects of DEGS1/ifc loss on glial cells. At this time, however, this approach extends beyond the scope of the current paper and means of the lab. 

      (2) It's clear from the data that the accumulation of dihydroceramide in the ER triggers ER expansion but it remains unclear how or why this happens. Additionally, the authors assume, because of the reduction in LD numbers, that the source of fatty acids is the LDs. But there is no data testing this directly.

      As CERT, the protein that transports ceramide from the ER to the Golgi, is far more efficient at transporting ceramide than dihydroceramide, we speculate that dihydroceramide accumulates in the ER due to inefficient transport from the ER to the Golgi by CERT. We state this model more explicitly in the results under the subheading “Reduction of dihydroceramide synthesis suppresses the ifc CNS phenotype”.

      We agree with the point on lipid droplets. We observe a correlation, not a causation, between reduction of lipid droplets and a large expansion of ER membrane. We have tried to clarify the text in the last paragraph of the discussion to make this point more clearly. See also response to reviewer 2 point 3.

      (3) The authors performed a beautiful EMS screen identifying several LOF alleles in ifc. However, the authors decided to only use KO/ifcJS3. The paper could be strengthened if the authors could replicate some of the key findings in additional fly lines.

      We agree. We replicated the observed cortex glia swelling, ER expansion in cortex glia, and observed increase in neuronal cell death markers in late-third instar larvae mutant for either the ifcjs1 or ifcjs2 allele. These data are now provided as Supplementary Figure 7.

      (4) The authors use M{3xP3-RFP.attP}ZH-51D transgene as a general glial marker. However, it would be advised to show the % overlap between the glial marker and the RFP since a lot of cells are green positive but not per se RFP positive and vice versa.

      We visually reexamined the expression of the 3xP3 RFP transgene relative to FABP labeling for cortex glia, Ebony for astrocyte-like glia, and the Myr-GFP transgene driven by glial-subtype specific GAL4 driver lines for perineurial, subperineurial, and ensheathing glia. We note that RFP localizes to the nucleus cytoplasm while FABP and Ebony localize to the cytoplasm and Myr-GFP to the cell membrane. Thus, an observed lack of overlap of expression between RFP and the other markers can arise due to differential localization of the two markers in the same cells (see, for example, Fig. S2D, where Myr-GFP expression in the nuclear envelope encircles that of RFP in the nucleus). Through visual inspection of five larval-brain complexes for each glial subtype marker, we found that essentially all cortex, SPG, and ensheathing glia expressed RFP. Similarly, nearly all astrocyte-like glia also expressed RFP, but they expressed RFP at significantly lower levels than that observed for cortex, SPG, or ensheathing glia. This analysis also confirmed that most perineurial glia do not express RFP. The 3xP3 M{3xP3-RFP.attP}ZH-51D transgene then labels most glia in the Drosophila CNS. We have added text to Supplementary Figure 2 noting the above observations as to which glial cells express RFP.

      (5) The authors indicate that other 3xP3 RFP and GFP transgenes at other genomic locations also label most glia in the CNS. Do they have a preferential overlap with the different glial subtypes?

      We assessed three different types of 3xP3 RFP and GFP transgenes: M{3xP3-RFP.attP} transgenes (n=4), Mi{GFP[E.3xP3]=ET1} transgenes (n=3), and Tl{GFP[3xP3.cLa]=CRIMIC.TG4} transgenes (n>6). All labeled cortex glia, but different lines exhibited differential labeling of astrocyte and ensheathing glia. These data are now included as Supplementary Figure 3.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Zhu et al. describes phenotypes associated with the loss of the gene ifc using a Drosophila model. The authors suggest their findings are relevant to understanding the molecular underpinnings of a neurodegenerative disorder, HLD-18, which is caused by mutations in the human ortholog of ifc, DEGS1.

      The work begins with the authors describing the role for ifc during fly larval brain development, demonstrating its function in regulating developmental timing, brain size, and ventral nerve cord elongation. Further mechanistic examination revealed that loss of ifc leads to depleted cellular ceramide levels as well as dihydroceramide accumulation, eventually causing defects in ER morphology and function. Importantly, the authors showed that ifc is predominantly expressed in glia and is critical for maintaining appropriate glial cell numbers and morphology. Many of the key phenotypes caused by the loss of fly ifc can be rescued by overexpression of human DEGS1 in glia, demonstrating the conserved nature of these proteins as well as the pathways they regulate. Interestingly, the authors discovered that the loss of lipid droplet formation in ifc mutant larvae within the cortex glia, presumably driving the deficits in glial wrapping around axons and subsequent neurodegeneration, potentially shedding light on mechanisms of HLD-18 and related disorders.

      Strengths:

      Overall, the manuscript is thorough in its analysis of ifc function and mechanism. The data images are high quality, the experiments are well controlled, and the writing is clear.

      Weaknesses:

      (1) The authors clearly demonstrated a reduction in number of glia in the larval brains of ifc mutant flies. What remains unclear is whether ifc loss leads to glial apoptosis or a failure for glia to proliferate during development. The authors should distinguish between these two hypotheses using apoptotic markers and cell proliferation markers in glia.

      To address this point, we used phospho-histone H3 to assess mitotic index in the thoracic CNS of wild-type versus ifc mutant late third instar larvae and found a mild, but significant reduction in mitotic index in ifc mutant relative to wild-type nerve cords. We also assessed the ability of glial-specific expression of the potent anti-apoptotic gene p35 to rescue the observed loss of cortex glia phenotype in the thoracic region of the CNS of otherwise ifc mutant larvae and observed a clear increase in cortex glia in the presence versus the absence of glial-specific p35 expression (p < 3 × 10⁻⁴). These data are now provided as Supplementary Figure S8 in the paper and referred to on page 8.

      (2) It is surprising that human DEGS1 expression in glia rescues the noted phenotypes despite the different preference for sphingoid backbone between flies and mammals. Though human DEGS1 rescued the glial phenotypes described, can animal lethality be rescued by glial expression of human DEGS1? Are there longer-term effects of loss of ifc that cannot be compensated by the overexpression of human DEGS1 in glia (age-dependent neurodegeneration, etc.)?

      We note explicitly that while glial expression of human DEGS1 does provide rescuing activity, it only partially rescues the ifc mutant CNS phenotype in contrast to glial expression of Drosophila ifc, which fully rescues this phenotype. Thus, the relative activity of human DEGS1 is far below that of Drosophila ifc when assayed in flies. To quantify the functional difference between the two transgenes, we assessed the ability of glial expression of fly ifc or of human DEGS1 to rescue the lethality of otherwise ifc mutant larvae: Glial expression of ifc was sufficient to rescue the adult viability of 57.9% of ifc mutant flies based on expected Mendelian ratios (n=2452), whereas glial expression of DEGS1 was sufficient to rescue just 3.9% of ifc mutant flies (n=1303), uncovering a ~15-fold difference in the ability of the two transgenes to rescue the lethality of otherwise ifc mutant flies. In the absence of either transgene, no ifc mutant larvae reached adulthood (n=1030). These data are now provided in the text on page 9 of the revised manuscript. 
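
      As a quick check on the arithmetic behind the "~15-fold" figure, the ratio of the two reported rescue rates can be computed directly. This is only an illustrative sketch: the variable and function names are hypothetical, and the two percentages are taken from the response above rather than from any raw data.

```python
# Rescue rates reported in the response above, as a percentage of the
# expected Mendelian ratio. Names are illustrative, not from the paper.
ifc_rescue_pct = 57.9    # glial expression of fly ifc (n=2452)
degs1_rescue_pct = 3.9   # glial expression of human DEGS1 (n=1303)


def fold_difference(numerator: float, denominator: float) -> float:
    """Return the simple ratio numerator / denominator."""
    return numerator / denominator


fold = fold_difference(ifc_rescue_pct, degs1_rescue_pct)
print(f"fold difference in rescue: {fold:.1f}")
```

      The ratio rounds to 15, consistent with the stated fold difference.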

      (3) The mechanistic link between the loss of ifc and lipid droplet defects is missing. How do defects in ceramide metabolism alter triglyceride utilization and storage? While the authors argue that the loss of lipid droplets in larval glia will lead to defects in neuronal ensheathment, a discussion of how this is linked to ceramides needs to be added.

      We have revised the text to address this point. We speculate that the apparent increased demand for membrane phospholipid synthesis may drive the depletion of lipid droplets, providing a link to ifc function and ceramides. Below we provide the rewritten last paragraph; the underlined section is the new text.  

      “The expansion of ER membranes coupled with loss of lipid droplets in ifc mutant larvae suggests that the apparent demand for increased membrane phospholipid synthesis may drive lipid droplet depletion, as lipid droplet catabolism can release free fatty acids to serve as substrates for lipid synthesis. At some point, the depletion of lipid droplets, and perhaps free fatty acids as well, would be expected to exhaust the ability of cortex glia to produce additional membrane phospholipids required for fully enwrapping neuronal cell bodies. Under wild-type conditions, many lipid droplets are present in cortex glia during the rapid phase of neurogenesis that occurs in larvae. During this phase, lipid droplets likely support the ability of cortex glia to generate large quantities of membrane lipids to drive membrane growth needed to ensheathe newly born neurons. Supporting this idea, lipid droplets disappear in the adult Drosophila CNS when neurogenesis is complete and cortex glia remodeling stops. We speculate that lipid droplet loss in ifc mutant larvae contributes to the inability of cortex glia to enwrap neuronal cell bodies. Prior work on lipid droplets in flies has focused on stress-induced lipid droplets generated in glia and their protective or deleterious roles in the nervous system. Work in mice and humans has found that more lipid droplets are often associated with the pathogenesis of neurodegenerative diseases, but our work correlates lipid droplet loss with CNS defects. In the future, it will be important to determine how lipid droplets impact nervous system development and disease.”

      (4) On page 10, the authors use the words "strong" and "weak" to describe where ifc is expressed. Since the use of T2A-GAL4 alleles in examining gene expression is unable to delineate the amount of gene expression from a locus, the terms "broad" and "sparse" labeling (or similar terms) should be used instead.

      The ifc T2A-GAL4 insert in the ifc locus reports on the transcription of the gene. We agree that the GAL4 system will not reflect differences in gene expression levels when those levels are not dramatically different. However, when the expression levels differ dramatically, as in our case, the GAL4 system can reflect this difference in the expression of a reporter gene. We reworded this section to suggest that ifc is transcribed at higher levels in glia as compared to neurons. We can’t use sparse or broad, as ifc is expressed in all, or at least in most, glia and neurons. The new text is as follows: “Using this approach, we observed strong nRFP expression in all glial cells (Figures 4D and S10A) and modest nRFP expression in all neurons (Figures 4E and S10B), suggesting ifc is transcribed at higher levels in glial cells than neurons in the larval CNS.”

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors report three novel ifc alleles: ifc[js1], ifc[js2], and ifc[js3]. ifc[js1] and ifc[js2] encode missense mutations, V276D and G257S, respectively. ifc[js3] encodes a nonsense mutation, W162*. These alleles exhibit multiple phenotypes, including delayed progression to the late-third larval instar stage, reduced brain size, elongation of the ventral nerve cord, axonal swelling, and lethality during late larval or early pupal stages.

      Further characterization of these alleles by the authors reveals that ifc is predominantly expressed in glia and localizes to the endoplasmic reticulum (ER). The expression of the ifc gene governs glial morphology and survival. Expression of fly ifc cDNA or human DEGS1 cDNA specifically in glia, but not neurons, rescues the CNS phenotypes of ifc mutants, indicating a crucial role for ifc in glial cells and its evolutionary conservation. Loss of ifc results in ER expansion and loss of lipid droplets in cortex glia. Additionally, loss of ifc leads to ceramide depletion and accumulation of dihydroceramide. Moreover, it increases the saturation levels of triacylglycerols and membrane phospholipids. Finally, the reduction of dihydroceramide synthesis suppresses the CNS phenotypes associated with ifc mutations, indicating the key role of dihydroceramide in causing ifc LOF defects.

      Strengths:

      This manuscript unveils several intriguing and novel phenotypes of ifc loss-of-function in glia. The experiments are meticulously planned and executed, with the data strongly supporting their conclusions.

      Weaknesses:

      I didn't find any obvious weakness.

      Reviewer #1 (Recommendations For The Authors):

      Additional minor comments below:

      (1) The authors state that TGs are the building blocks of membrane phospholipids. This is not exactly true. The breakdown of TGs can result in free FAs which can be used for membrane phospholipid synthesis. Membrane phospholipids can also be generated from free FAs that were never in TGs.

      To address this point, we have reworked a number of sentences in the text. On page 12 we reworded two small sections to the following: 

      “In the CNS, lipid droplets form primarily in cortex glia[29] and are thought to contribute to membrane lipid synthesis through their catabolism into free fatty acids versus acting as an energy source in the brain.[41] Consistent with the possibility that increased membrane lipid synthesis drives lipid droplet reduction, RNA-seq assays of dissected nerve cords revealed that loss of ifc drove transcriptional upregulation of genes that promote membrane lipid biogenesis”

      As TG breakdown results in free fatty acids that can be used for membrane phospholipid synthesis, we asked if changes in TG levels and saturation were reflected in the levels or saturation of the membrane phospholipids phosphatidylcholine (PC), phosphatidylethanolamine (PE), and phosphatidylserine (PS).

      (2) Figure 5J what does the dotted line indicate? Please specify in the figure legend or remove it.

      We have added the following text in the figure legend: Dotted line indicates a log2 fold change of 0.5 in the treatment group compared to the control group.
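
      For readers more used to linear fold changes, a log2 fold change converts by exponentiation with base 2. The short sketch below illustrates the conversion for the stated threshold of 0.5; the variable names are hypothetical and nothing here is taken from the authors' analysis code.

```python
import math

# Threshold stated in the revised figure legend (log2 scale).
log2_fc_threshold = 0.5

# A log2 fold change x corresponds to a linear fold change of 2**x.
linear_fc = 2 ** log2_fc_threshold

print(f"log2 FC of {log2_fc_threshold} = {linear_fc:.2f}-fold (linear scale)")
```

      On a linear scale, the dotted line therefore marks roughly a 1.41-fold change (the square root of 2) relative to control.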

      (3) The text for your graphs is hard to read. Please make the font larger.

      We have increased font size to enhance the readability of the figures.

      (4) The authors mentioned that driving ifc expression in neurons rescues the phenotypes (ref 17). While the glial-specific role presented in this study is robust, I think some readers would appreciate some discussion of this study in light of the data presented here.

      We have added the below text on page 10 to address this point.

      “Results of our gene rescue experiments conflict with a prior study on ifc in which expression of ifc in neurons was found to rescue the ifc phenotype. In this context, we note that elav-GAL4 drives UAS-linked transgene expression not just in neurons, but also in glia at appreciable levels, and thus needs to be paired with repo-GAL80 to restrict GAL4-mediated gene expression to neurons. Thus, “off-target” expression in glial cells may account for the discrepant results. It is, however, more difficult to reconcile how neuronal or glial expression of ifc would rescue the observed lethality of the ifc-KO chromosome given the presence of additional lethal mutations in the 21E2 region of the second chromosome.”

      (5) While the analysis of fatty acid saturation is experimentally well done, I'm not really sure what the significance of this data is.

      We included this information as a reference for future analysis of additional genes in the ceramide biogenesis pathway, as we expect that alteration of the levels and saturation levels of PE, PC, and PS in cell membranes may underlie key changes in the biophysical properties of glial cell membranes and their ability to enwrap or infiltrate their targets. Thus, we expect the significance of these data to grow as more work is done on additional members of the ceramide pathway in the nervous system in flies and other systems.  

      Reviewer #2 (Recommendations For The Authors):

      (1) There is a typo at the top of page 11: "internal membranes and fail enwrap neurons" is missing the word "to" before "enwrap"

      The typo was fixed.

      (2)  PMID: 36718090 should be included in the discussion of SPT and ORMDL complex in human disease.

      The reference was added.

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, the authors report three novel ifc alleles: ifc[js1], ifc[js2], and ifc[js3]. ifc[js1] and ifc[js2] encode missense mutations, V276D and G257S, respectively. ifc[js3] encodes a nonsense mutation, W162*. These alleles exhibit multiple phenotypes, including delayed progression to the late-third larval instar stage, reduced brain size, elongation of the ventral nerve cord, axonal swelling, and lethality during late larval or early pupal stages.

      Further characterization of these alleles by the authors reveals that ifc is predominantly expressed in glia and localizes to the endoplasmic reticulum (ER). The expression of the ifc gene governs glial morphology and survival. Expression of fly ifc cDNA or human DEGS1 cDNA specifically in glia, but not neurons, rescues the CNS phenotypes of ifc mutants, indicating a crucial role for ifc in glial cells and its evolutionary conservation. Loss of ifc results in ER expansion and loss of lipid droplets in cortex glia. Additionally, loss of ifc leads to ceramide depletion and accumulation of dihydroceramide. Moreover, it increases the saturation levels of triacylglycerols and membrane phospholipids. Finally, the reduction of dihydroceramide synthesis suppresses the CNS phenotypes associated with ifc mutations, indicating the key role of dihydroceramide in causing ifc LOF defects.

      In summary, this manuscript unveils several intriguing and novel phenotypes of ifc loss-of-function in glia. The experiments are meticulously planned and executed, with the data strongly supporting their conclusions. I have no additional comments and fully support the publication of this manuscript in eLife.

      The authors also note that they added one paragraph to the discussion that addresses the possibility that the increased detection of cell death markers could arise due to the inability of glial cells to remove cellular debris. The text of this paragraph is provided below:

      We note that cortex glia are the major phagocytic cell of the CNS and phagocytose neurons targeted for apoptosis as part of the normal developmental process.[23-26] Thus, while we favor the model that ifc triggers neuronal cell death due to glial dysfunction, it is also possible that increased detection of dying neurons arises due at least in part to a decreased ability of cortex glia to clear dying neurons from the CNS. At present, the large number of neurons that undergo developmentally programmed cell death combined with the significant disruption to brain and ventral nerve cord morphology caused by loss of ifc function render this question difficult to address. Additional evidence does, however, support the idea that loss of ifc function drives excess neuronal cell death: Clonal analysis in the fly eye reveals that loss of ifc drives photoreceptor neuron degeneration[17], indicating that loss of ifc function drives neuronal cell death; cortex-glia specific depletion of CPES, which acts downstream of ifc, disrupts neuronal function and induces photosensitive epilepsy in flies[59], indicating that genes in the ceramide pathway can act nonautonomously in glia to regulate neuronal function; recent genetic studies reveal that other glial cells can compensate for impaired cortex glial cell function by phagocytosing dying neurons[62], and we observe that the cell membranes of subperineurial glia enwrap dying neurons in ifc mutant larvae (Fig. S14), consistent with similar compensation occurring in this background; and in humans, loss of function mutations in DEGS1 cause neurodegeneration.[7-9] Clearly, future work is required to address this question for ifc/DEGS1 and perhaps other members of the ceramide biogenesis pathway.

    1. eLife Assessment

      This study is important as it highlighted how IL-4 regulates the reactive state of a specific microglial population by increasing the proportion of CD11c+ microglial cells and ultimately suppressing neuropathic pain. The study employs a combination of behavioral assays, pharmacogenetic manipulation of microglial populations, and characterization of microglial markers to address these questions. It provided convincing evidence for the proposed mechanism of IL-4-mediated microglial regulation in neuropathic pain.

    2. Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate how IL-4 modulates the reactive state of microglia in the context of neuropathic pain. Specifically, they sought to determine whether IL-4 drives an increase in CD11c+ microglial cells, a population associated with anti-inflammatory responses, and whether this change is linked to the suppression of neuropathic pain. The study employs a combination of behavioral assays, pharmacogenetic manipulation of microglial populations, and characterization of microglial markers to address these questions.

      Strengths:

      The methodological approach in this study is robust, providing convincing evidence for the proposed mechanism of IL-4-mediated microglial regulation in neuropathic pain. The experimental design is well thought out, utilizing two distinct neuropathic pain models (SpNT and SNI), each yielding different outcomes. The SpNT model demonstrates spontaneous pain remission and an increase in the CD11c+ microglial population, which correlates with pain suppression. In contrast, the SNI model, which does not show spontaneous pain remission, lacks a significant increase in CD11c+ microglia, underscoring the specificity of the observed phenomenon. This design effectively highlights the role of the CD11c+ microglial population in pain modulation. The use of behavioral tests provides a clear functional assessment of IL-4 manipulation, and pharmacogenetic tools allow for precise control of microglial populations, minimizing off-target effects. Notably, the manipulation targets the CD11c promoter, which presumably reduces the risk of non-specific ablation of other microglial populations, strengthening the experimental precision. Moreover, the thorough characterization of microglial markers adds depth to the analysis, ensuring that the changes in microglial populations are accurately linked to the behavioral outcomes.

      Weaknesses:

      One potential limitation of the study is that the mechanistic details of how IL-4 induces the observed shift in microglial populations are not fully explored. While the study demonstrates a correlation between IL-4 and CD11c+ microglial cells, a deeper investigation into the specific signaling pathways and molecular processes driving this population shift would greatly strengthen the conclusions. Additionally, the paper does not clearly integrate the findings into the broader context of microglial reactive state regulation in neuropathic pain.

      Comments on revisions:

      In the revised manuscript, the authors have successfully addressed my previous concerns as well as those of the other reviewers. I do not have further concerns about this study.

    3. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):  

      Summary: 

      Kohno et al. examined whether the anti-inflammatory cytokine IL-4 attenuates neuropathic pain by promoting the emergence of antinociceptive microglia in the dorsal horn of the spinal cord. In two models of neuropathic pain following peripheral nerve injury, intrathecal administration of IL-4 once a day for 3 days from day 14 to day 17 after injury attenuates hypersensitivity to mechanical stimuli in the hind paw ipsilateral to nerve injury. Such an antinociceptive effect correlates with a higher number of CD11c+ microglia in the dorsal horn of the spinal cord, which is the termination area for primary afferent fibres injured in the periphery. Interestingly, CD11c+ microglia emerge spontaneously in the dorsal horn in concomitance with the resolution of pain in the spinal nerve model of pain, but not in the spared nerve injury model where pain does not resolve, confirming that this cluster of microglia is involved in pain resolution.

      Based on existing evidence that the receptor for IL-4, namely IL-4R, is expressed by microglia, the authors suggest that IL-4R mediates IL-4 effect in microglia including up-regulation of Igf1 mRNA. They have previously reported that IGF-1 can attenuate pain neuron activity in the spinal cord. 

      Strengths:

      This study includes cutting-edge techniques such as flow cytometry analysis of microglia and transgenic mouse models. 

      Weaknesses:

      The conclusion of this paper is supported by data, but the interpretation of some data requires clarification.  

      We appreciate the reviewer's careful reading of our paper.  According to the reviewer's comments, we have performed new immunohistochemical experiments and added some discussion in the revised manuscript (please see the point-by-point responses below).

      Reviewer #2 (Public review):

      Summary:

      The authors aimed to investigate how IL-4 modulates the reactive state of microglia in the context of neuropathic pain. Specifically, they sought to determine whether IL-4 drives an increase in CD11c+ microglial cells, a population associated with anti-inflammatory responses and whether this change is linked to the suppression of neuropathic pain. The study employs a combination of behavioral assays, pharmacogenetic manipulation of microglial populations, and characterization of microglial markers to address these questions. 

      Strengths: 

      The methodological approach in this study is robust, providing convincing evidence for the proposed mechanism of IL-4-mediated microglial regulation in neuropathic pain. The experimental design is well thought out, utilizing two distinct neuropathic pain models (SpNT and SNI), each yielding different outcomes. The SpNT model demonstrates spontaneous pain remission and an increase in the CD11c+ microglial population, which correlates with pain suppression. In contrast, the SNI model, which does not show spontaneous pain remission, lacks a significant increase in CD11c+ microglia, underscoring the specificity of the observed phenomenon. This design effectively highlights the role of the CD11c+ microglial population in pain modulation. The use of behavioral tests provides a clear functional assessment of IL-4 manipulation, and pharmacogenetic tools allow for precise control of microglial populations, minimizing off-target effects. Notably, the manipulation targets the CD11c promoter, which presumably reduces the risk of non-specific ablation of other microglial populations, strengthening the experimental precision. Moreover, the thorough characterization of microglial markers adds depth to the analysis, ensuring that the changes in microglial populations are accurately linked to the behavioral outcomes. 

      Weaknesses: 

      One potential limitation of the study is that the mechanistic details of how IL-4 induces the observed shift in microglial populations are not fully explored. While the study demonstrates a correlation between IL-4 and CD11c+ microglial cells, a deeper investigation into the specific signaling pathways and molecular processes driving this population shift would greatly strengthen the conclusions. Additionally, the paper does not clearly integrate the findings into the broader context of microglial reactive state regulation in neuropathic pain.  

      We thank the reviewer for these insightful comments on our paper.  As the reviewer suggested, further investigation of the specific signaling pathways and molecular processes by which IL-4 induces a transition of spinal microglia to the CD11c+ state would strengthen our conclusion and also provide important clues to discovering new therapeutic targets.  In revising the manuscript, we have included this in the Discussion section (line 264–267), and we hope that future studies will clarify these points.  As for the additional comment, we have added a brief summary of existing research on microglial function in neuropathic pain at the beginning of the Discussion section (line 188–196).

      Reviewer #1 (Recommendations for the authors):

      The conclusions of this paper are supported by data, but the interpretation of some data requires clarification. 

      (1) In Figure 1D and Figure 7 C, CD11c+ microglia numbers are higher in contralateral dorsal horns after IL-4 administration despite IL-4 having no effect on pain thresholds. The authors should discuss these findings.  

      As the reviewer pointed out, IL-4 increased the number of CD11c+ microglia in the contralateral spinal dorsal horn (SDH) but did not affect pain thresholds in the contralateral hindpaw.  The data seem to be related to the selective effect of CD11c+ microglia and their factors (especially IGF1) on nerve injury-induced pain hypersensitivity.  In fact, depletion of CD11c+ spinal microglia and intrathecal administration of IGF1 do not elevate the pain threshold of the contralateral hindpaw (Science 376: 86–90, 2022).  We have added the above statement in the Discussion section (line 208–213).

      (2)  Do monocytes infiltrate the dorsal horn and DRG after intrathecal injections?

      To address this reviewer's comment, we performed new immunohistochemical experiments to analyze monocytes in the SDH using an antibody for CD169 (a marker for bone marrow-derived monocytes/macrophages, but not for resident microglia) (J Clin Invest 122: 3063–3087, 2012; Cell Rep 3: 605–614, 2016) and found no CD169+ monocytes in the SDH parenchyma after SpNT.  Consistent with these data, we have previously demonstrated that few bone marrow-derived monocytes/macrophages are recruited to the SDH after SpNT (Sci Rep 6: 23701, 2016).  Similarly, no CD169+ monocytes in the SDH parenchyma were observed in SpNT mice treated intrathecally with PBS or IL-4 (Figure 1—figure supplement 1A).

      In the DRG, CD169 is constitutively expressed in macrophages.  Thus, in accordance with a recent report showing that monocytes infiltrating the DRG are positive for chemokine (C-C motif) receptor 2 (CCR2) (J Exp Med 221: e20230675, 2024), we analyzed CCR2+ cells and found that CCR2+ IBA1dim monocytes were observed in the capsule and parenchyma of the DRG of naive mice (Figure 1—figure supplement 1B).  After SpNT, CCR2+ IBA1dim monocytes in the DRG parenchyma increased.  Intrathecal treatment of IL-4 increased CCR2+ IBA1dim cells in the DRG capsule.  However, the involvement of these monocytes in the DRG in IL-4-induced alleviation of neuropathic pain is unclear and warrants further investigation.  In revising the manuscript, we have included additional data (Figure 1—figure supplement 1) and corresponding text in the Results (line 112–114) and Discussion section (line 218–222).

      (3) In Figure 4, depletion of CD11c+ cells in dorsal root ganglia (DRG) ameliorates neuropathic thresholds but does not alter the anti-nociceptive effect of IL-4 injected intrathecal. It appears that CD11c+ macrophages in DRG have an opposite role to CD11c+ microglia in the spinal cord. Please discuss this result. 

      We apologize for the confusion.  The aim of the experiments in Figure 4 was to examine the contribution of CD11c+ cells in the DRG to the pain-alleviating effect of intrathecal IL-4.  For this aim, we depleted CD11c+ cells in the DRG (but not in the SDH) by intraperitoneal injection of diphtheria toxin (DTX) immediately after the behavioral measurements performed on day 17 (Fig. 4A, B).  On day 18, the paw withdrawal threshold of DTX-treated mice was similar to that of PBS-treated mice, indicating that the depletion of CD11c+ cells in the DRG does not affect the pain-alleviating effect of IL-4.  These data are in stark contrast to those obtained from mice with depletion of CD11c+ cells in the SDH by intrathecal DTX (the depletion canceled the effect of IL-4) (Figure 2A).  Thus, it is conceivable that CD11c+ cells in the DRG are not involved in the IL-4-induced alleviating effect on neuropathic pain.  Because the confusion might be related to the statement in this paragraph of the initial version, we have modified our statements to make this point clearer (line 133–139).

      Reviewer #2 (Recommendations for the authors):

      A discussion addressing how these results fit into existing research on microglial function in pain would enhance the study's impact.

      A brief summary of existing research on microglial function in neuropathic pain has been included at the beginning of the Discussion section (line 188–196).

      It would be helpful for the authors to elaborate on the implications of their findings within the larger landscape of immune regulation in neuropathic pain.

      Our present findings showed an ability of IL-4, known as a T-cell-derived factor, to increase CD11c+ microglia and to control neuropathic pain.  Furthermore, recent studies have also indicated that immune cells such as CD8+ T cells infiltrating into the spinal cord (Neuron 113: 896-911.e9, 2025), and regulatory T cells (eLife 10: e69056, 2021; Science 388: 96–104, 2025) and MRC1+ macrophages in the spinal meninges (Neuron 109: 1274–1282, 2021) have important roles in regulating microglial states and neuropathic pain.  Thus, these findings provide new insights into the mechanisms of the neuro-immune interactions that regulate neuropathic pain.  In revising the manuscript, we have added the above statement in the Discussion section (line 254–260).

      Furthermore, a discussion on how these findings could inform the development of targeted therapies that modulate microglial populations in a controlled, disease-specific manner would be valuable. Exploring how these insights could lead to novel treatment strategies for neuropathic pain could provide important future directions for the research and broader clinical applications.

      We appreciate the reviewer's valuable suggestion.  Our current data, demonstrating that IL-4 increases CD11c+ microglia without affecting the total number of microglia, could open a new avenue for developing strategies to modulate microglial subpopulations through molecular targeting, which may lead to new analgesics.  However, given IL-4's association with allergic responses, targeting microglia-selective molecules involved in shifting microglia toward the CD11c+ state (such as intracellular signaling molecules downstream of IL-4 receptors) may offer a more selective and safer therapeutic approach.  Moreover, since CD11c+ microglia have been implicated in other CNS diseases [e.g., Alzheimer disease (Cell 169: 1276–1290, 2017), amyotrophic lateral sclerosis (Nat Neurosci 25: 26–38, 2022), and multiple sclerosis (Front Cell Neurosci 12: 523, 2019)], further investigations into the mechanisms driving CD11c+ microglia induction could provide insights into novel therapeutic strategies not only for neuropathic pain but also for other CNS diseases.  In revising the manuscript, we have added the above statement in the Discussion section (line 260–271).

    1. eLife Assessment

      This study provides valuable findings regarding potential correlates of protection against the African swine fever virus. The evidence supporting the claims is solid, although analysis using a higher number of animals and other virus strains will be required to further evaluate the relevance of the immune parameters associated with protection. The work will be of broad interest to veterinary immunologists, and particularly those working on African swine fever.

    2. Reviewer #1 (Public review):

      The study by Lotonin et al. investigates correlates of protection against African swine fever virus (ASFV) infection. The study is based on comprehensive work, including the measurement of immune parameters using complementary methodologies. An important aspect of the work is the temporal analysis of the immune events, allowing for the capture of the dynamics of the immune responses induced after infection. Also, the work compares responses induced in farm and SPF pigs, showing that the latter have an enhanced capacity to induce protective immunity. Overall, the results obtained are interesting and relevant for the field. The findings described in the study further validate work from previous studies (critical role of virus-specific T cell responses) and provide new evidence on the importance of a balanced innate immune response during the immunization process. This information increases our knowledge of basic ASF immunology, one of the important gaps in ASF research that needs to be addressed for a more rational design of effective vaccines. Further studies will be required to corroborate that the results obtained based on the immunization of pigs by a not completely attenuated virus strain are also valid in other models, such as immunization using live attenuated vaccines.

      While overall the conclusions of the work are well supported by the results, I consider that the following issues should be addressed to improve the interpretation of the results:

      (1) An important issue in the study is the characterization of the infection outcome observed upon Estonia 2014 inoculation. Infected pigs show a long period of viremia, which is not linked to clinical signs. Indeed, animals are recovered by 20 days post-infection (dpi), but virus levels in blood remain high until 141 dpi. This is uncommon for ASF acute infections and rather indicates a potential induction of a chronic infection. Have the authors analysed this possibility deeply? Are there lesions indicative of chronic ASF in infected pigs at 17 dpi (when they have sacrificed some animals) or, more importantly, at later time points? Does the virus persist in some tissues at late time points, once clinical signs are not observed? Has all this been tested in previous studies?

      (2) Virus loads post-Estonia infection differ significantly between whole blood and serum (Figure 1C), while they are very similar in the same samples post-challenge. Have the authors validated these results using methods to quantify infectious particles, such as Hemadsorption or Immunoperoxidase assays? This is important, since it would determine the duration of virus replication post-Estonia inoculation, which is a very relevant parameter of the model.

      (3) Related to the previous points, do the authors expect the induction of immunosuppressive mechanisms during such prolonged virus persistence, as described in humans and mouse models? Have the authors analysed the presence of immunosuppressive mechanisms during the virus persistence phase (IL10, myeloid-derived suppressor cells)? Have the authors used T cell exhaustion markers to immunophenotype ASFV Estonia-induced T cells?

      (4) A broader analysis of inflammatory mediators during the persistence phase would also be very informative. Is the presence of high VLs at late time points linked to a systemic inflammatory response? For instance, levels of IFNa are still higher at 11 dpi than at baseline, but they are not analysed at later time points.

      (5) The authors observed a correlation between IL1b in serum before challenge and protection. The authors also nicely discuss the potential role of this cytokine in promoting memory CD4 T cell functionality, as demonstrated in mice previously. However, the cells producing IL1b before ASFV challenge are not identified. Might it be linked to virus persistence in some organs? This important issue should be discussed in the manuscript.

      (6) The lack of non-immunized controls during the challenge makes the interpretation of the results difficult. Has this challenge dose been previously tested in pigs of this age to demonstrate its 100% lethality? Can the low percentage of protected farm pigs be due to a modulation of memory T and B cell development by the persistence of the virus, or might it be related to the duration of the immunity, which in this model is tested at a very late time point? Related to this, how has the challenge day been selected? Have the authors analysed ASFV Estonia-induced immune responses over time to select it?

      (7) Also, non-immunized controls at 0 dpc would help in the interpretation of the results from Figure 2C. Do the authors consider that the pig's age might influence the immune status (cytokine levels) at the time of challenge and thus the infection outcome?

      (8) Besides anti-CD2v antibodies, anti-C-type lectin antibodies can also inhibit hemadsorption (DOI: 10.1099/jgv.0.000024). Please correct the corresponding text in the results and discussion sections related to humoral responses as correlates of protection. Also, a more extended discussion on the controversial role of neutralizing antibodies (which have not been analysed in this study), or other functional mechanisms such as ADCC against ASFV would improve the discussion.

    3. Reviewer #2 (Public review):

      Summary:

      In the current study, the authors attempt to identify correlates of protection for improved outcomes following re-challenge with ASFV. An advantage is the study design, which compares the responses to a vaccine-like mild challenge and during a virulent challenge months later. It is a fairly thorough description of the immune status of animals in terms of T cell responses, antibody responses, cytokines, and transcriptional responses, and the methods appear largely standard. The comparison between SPF and farm animals is interesting and probably useful for the field in that it suggests that SPF conditions might not fully recapitulate immune protection in the real world. I thought some of the conclusions were over-stated, and there are several locations where the data could be presented more clearly.

      Strengths:

      The study is fairly comprehensive in the depth of immune read-outs interrogated. The potential pathways are systematically explored. Comparison of farm animals and SPF animals gives insights into how baseline immune function can differ based on hygiene, which would also likely inform interpretation of vaccination studies going forward.

      Weaknesses:

      Some of the conclusions are over-interpreted and should be more robustly shown or toned down. There are also some issues with data presentation that need to be resolved and data that aren't provided that should be, like flow cytometry plots.

    4. Author Response:

      Reviewer #1 (Public review):

      The study by Lotonin et al. investigates correlates of protection against African swine fever virus (ASFV) infection. The study is based on comprehensive work, including the measurement of immune parameters using complementary methodologies. An important aspect of the work is the temporal analysis of the immune events, allowing for the capture of the dynamics of the immune responses induced after infection. Also, the work compares responses induced in farm and SPF pigs, showing that the latter have an enhanced capacity to induce protective immunity. Overall, the results obtained are interesting and relevant for the field. The findings described in the study further validate work from previous studies (critical role of virus-specific T cell responses) and provide new evidence on the importance of a balanced innate immune response during the immunization process. This information increases our knowledge of basic ASF immunology, one of the important gaps in ASF research that needs to be addressed for a more rational design of effective vaccines. Further studies will be required to corroborate that the results obtained based on the immunization of pigs by a not completely attenuated virus strain are also valid in other models, such as immunization using live attenuated vaccines.

      While overall the conclusions of the work are well supported by the results, I consider that the following issues should be addressed to improve the interpretation of the results:

      We thank Reviewer #1 for their thoughtful and constructive feedback, which will significantly contribute to improving the clarity and quality of our manuscript. Below, we respond to each of the reviewer’s comments and outline the revisions we plan to incorporate.

      (1) An important issue in the study is the characterization of the infection outcome observed upon Estonia 2014 inoculation. Infected pigs show a long period of viremia, which is not linked to clinical signs. Indeed, animals are recovered by 20 days post-infection (dpi), but virus levels in blood remain high until 141 dpi. This is uncommon for ASF acute infections and rather indicates a potential induction of a chronic infection. Have the authors analysed this possibility deeply? Are there lesions indicative of chronic ASF in infected pigs at 17 dpi (when they have sacrificed some animals) or, more importantly, at later time points? Does the virus persist in some tissues at late time points, once clinical signs are not observed? Has all this been tested in previous studies?

      Tissue samples were tested for viral loads only at 17 dpi during the immunization phase, and long-term persistence of the virus in tissues has not been assessed in our previous studies. At 17 dpi, lesions were most prominently observed in the lymph nodes of both farm and SPF pigs. In a previous study using the Estonia 2014 strain (doi: 10.1371/journal.ppat.1010522), organs were analyzed at 28 dpi, and no pathological signs were detected. This finding calls into question the likelihood of chronic infection being induced by this strain.

      (2) Virus loads post-Estonia infection differ significantly between whole blood and serum (Figure 1C), while they are very similar in the same samples post-challenge. Have the authors validated these results using methods to quantify infectious particles, such as Hemadsorption or Immunoperoxidase assays? This is important, since it would determine the duration of virus replication post-Estonia inoculation, which is a very relevant parameter of the model.

      We did not perform virus titration but instead used qPCR as a sensitive and standardized method to assess viral genome loads. Although qPCR does not distinguish between infectious and non-infectious virus, it provides a reliable proxy for relative viral replication and clearance dynamics in this model. Unfortunately, no sample material remains from this experiment, but we agree that subsequent studies employing infectious virus quantification would be valuable for further refining our understanding of viral persistence and replication following Estonia 2014 infection.

      (3) Related to the previous points, do the authors expect the induction of immunosuppressive mechanisms during such prolonged virus persistence, as described in humans and mouse models? Have the authors analysed the presence of immunosuppressive mechanisms during the virus persistence phase (IL10, myeloid-derived suppressor cells)? Have the authors used T cell exhaustion markers to immunophenotype ASFV Estonia-induced T cells?

      We agree with the reviewer that the lack of long-term protection can be linked to immunosuppressive mechanisms, as demonstrated for genotype I strains (doi: 10.1128/JVI.00350-20). The proposed markers were not analyzed in this study but represent important targets for future investigation. We will address this point in the discussion.

      (4) A broader analysis of inflammatory mediators during the persistence phase would also be very informative. Is the presence of high VLs at late time points linked to a systemic inflammatory response? For instance, levels of IFNa are still higher at 11 dpi than at baseline, but they are not analysed at later time points.

      While IFN-α levels remain elevated at 11 dpi, this response is typically transient in ASFV infection and likely not linked to persistent viremia. We agree that analyzing additional inflammatory markers at later time points would be valuable, and future studies should be designed to further understand viral persistence.

      (5) The authors observed a correlation between IL1b in serum before challenge and protection. The authors also nicely discuss the potential role of this cytokine in promoting memory CD4 T cell functionality, as demonstrated in mice previously. However, the cells producing IL1b before ASFV challenge are not identified. Might it be linked to virus persistence in some organs? This important issue should be discussed in the manuscript.

      We agree that identifying the cellular source of IL-1β prior to challenge is important, and this should be addressed in subsequent studies. We will include a discussion on the potential link between elevated IL-1β levels and virus persistence in certain organs.

      (6) The lack of non-immunized controls during the challenge makes the interpretation of the results difficult. Has this challenge dose been previously tested in pigs of this age to demonstrate its 100% lethality? Can the low percentage of protected farm pigs be due to a modulation of memory T and B cell development by the persistence of the virus, or might it be related to the duration of the immunity, which in this model is tested at a very late time point? Related to this, how has the challenge day been selected? Have the authors analysed ASFV Estonia-induced immune responses over time to select it?

      In our previous study, intramuscular infection with ~3–6 × 10² TCID₅₀/mL led to 100% lethality (doi: 10.1371/journal.ppat.1010522), which is notably lower than the dose used in the present study, although the route here was oronasal. The modulation of memory responses could be more thoroughly assessed in future studies using exhaustion markers. The challenge time point was selected based on the clearance of the virus from blood and serum. We agree that the lack of protection in some animals is puzzling and warrants further investigation, particularly to assess the role of immune duration, potential T cell exhaustion caused by viral persistence, or other immunological factors that may influence protection. Based on our experience, vaccine virus persistence alone does not sufficiently explain the lack-of-protection phenomenon. We will incorporate these important aspects into the revised discussion.

      (7) Also, non-immunized controls at 0 dpc would help in the interpretation of the results from Figure 2C. Do the authors consider that the pig's age might influence the immune status (cytokine levels) at the time of challenge and thus the infection outcome?

      We support the view that including non-immunized controls at 0 dpc would strengthen the interpretation of cytokine dynamics and will consider this in future experimental designs. Regarding age, while all animals were within a similar age range at the time of challenge, we acknowledge that age-related differences in immune status could influence baseline cytokine levels and infection outcomes, and this is an important factor to consider.

      (8) Besides anti-CD2v antibodies, anti-C-type lectin antibodies can also inhibit hemadsorption (DOI: 10.1099/jgv.0.000024). Please correct the corresponding text in the results and discussion sections related to humoral responses as correlates of protection. Also, a more extended discussion on the controversial role of neutralizing antibodies (which have not been analysed in this study), or other functional mechanisms such as ADCC against ASFV would improve the discussion.

      The relevant text in the Results and Discussion sections will be revised accordingly, and the discussion will be extended to more thoroughly address the roles of antibodies.

      Reviewer #2 (Public review):

      Summary:

      In the current study, the authors attempt to identify correlates of protection for improved outcomes following re-challenge with ASFV. An advantage is the study design, which compares the responses to a vaccine-like mild challenge and during a virulent challenge months later. It is a fairly thorough description of the immune status of animals in terms of T cell responses, antibody responses, cytokines, and transcriptional responses, and the methods appear largely standard. The comparison between SPF and farm animals is interesting and probably useful for the field in that it suggests that SPF conditions might not fully recapitulate immune protection in the real world. I thought some of the conclusions were over-stated, and there are several locations where the data could be presented more clearly.

      Strengths:

      The study is fairly comprehensive in the depth of immune read-outs interrogated. The potential pathways are systematically explored. Comparison of farm animals and SPF animals gives insights into how baseline immune function can differ based on hygiene, which would also likely inform interpretation of vaccination studies going forward.

      Weaknesses:

      Some of the conclusions are over-interpreted and should be more robustly shown or toned down. There are also some issues with data presentation that need to be resolved and data that aren't provided that should be, like flow cytometry plots.

      We appreciate the feedback from Reviewer #2 and acknowledge the concerns raised regarding data presentation. In the revised manuscript, we will clarify our conclusions where needed and ensure that interpretations are better aligned with the data shown.

    1. eLife Assessment

      This study presents a potentially fundamental analysis of a fossil feather from a 125-million-year-old enantiornithine bird. Using sophisticated 3D microscopic and numerical methods, the authors conclude that the feather was iridescent and brightly colored, possibly indicating that this was a male bird that used its crest in sexual displays. At present, the strength of evidence supporting the conclusions is considered incomplete based on methodological shortcomings and questions about taphonomy.

    2. Reviewer #1 (Public review):

      Summary:

      Li et al. describe a novel form of melanosome-based iridescence in the crest of an Early Cretaceous enantiornithine avialan bird from the Jehol Group.

      This is an interesting manuscript that describes never-before-seen melanosome structures and explores fossilised feathers through new methods. This paper creates an opening for new work to explore coloration in extinct birds.

      Strengths:

      A novel set of methods applied to the study of fossil melanosomes.

      Comments on revised version:

      The authors provided a response to the previous 9 issues, for which additional response is provided here:

      (1) I respectfully disagree with the authors' justification regarding the crest. They show one specimen of Confuciusornis with short feathers (which appears to be a unique feature of this species, possibly related to the fact it is beaked) but what about the more primitive Eoconfuciusornis, a referred specimen of which superficially has an enormous "crest" (Zheng et al 2017), as does Changchengornis (Ji et al 1999). Regardless, it would make more sense to compare this new specimen to other enantiornithines. Although limited by the preservation of body feathers, which is not all that common, the following published enantiornithines also exhibit a "crest": bohaiornithid indet. (Peteya et al 2017); Brevirostruavis (Li et al 2021); Dapingfangornis (Li et al 2006); Eoenantiornis (Zhou et al 2005); Grabauornis (Dalsatt et al 2014); Junornis (Liu et al 2017); Longirostravis (Hou et al 2004); Monoenantiornis (Hu & O'Connor 2016); Neobohaiornis (Shen et al 2024); Orienantiornis (Liu et al 2019); Parabohaiornis (Wang 2023); Parapengornis (Hu et al 2015); Paraprotopteryx (Zheng et al 2007); and every specimen of Protopteryx. In fact, every single published enantiornithine that preserves any feathering on the head has the feathers preserved perpendicular to the bone (in fact, the body feathers on all parts of the body are splayed at a right angle to the bone due to compression), as shown in the Confuciusornis specimen image provided by the authors. Since it is highly improbable they all had crests, the authors have no justification for the interpretation that this new specimen was crested. This does not mean that the feathers were not iridescent or take away from the novel methods these authors have used to explore preserved feathers.

      (2) Yes, this is possible, but see above for the very strong argument against interpretation of these feathers as forming a crest.

      (3) This just further makes the point that the isolated feather is not likely from the head. Since the neck feathers are missing, it is more likely that it is these feathers that have been disarticulated (and sampled) from the neck region rather than from the very complete looking head feathers; this has significant implications with regards to the bird's colour pattern.

      (4) Thank you for acknowledging taphonomy.

      (5) An interesting hypothesis and one I look forward to seeing explored in the future.

      (6) Since the compression is in a single direction, in fact it is not reasonable to assume that distortion would be random. One might predict similar distortion, as with the feathers (spread out from the bone at a 90˚ angle) and bone (crushed), which are all compressed in a single direction. However, I agree that such a consistent discovery suggests it is not an artifact of preservation, and only further studies will elucidate this.

      (7) I still fail to detect this hexagonal pattern - could machine learning be used to quantify this pattern? The random arrangement of white arrows does little to clarify the authors' interpretations.

      (8) Great to see additional sampling.

      (9) Thank you for the explanation.

    3. Reviewer #3 (Public review):

      Summary:

      The paper presents an in-depth analysis of the original colour of a fossil feather from the crest of a 125-million-year-old enantiornithine bird. From its shape and location, it would be predicted that such a feather might well have shown some striking colour and pattern. The authors apply sophisticated microscopic and numerical methods to determine that the feather was iridescent and brightly coloured, and possibly indicates this was a male bird that used its crest in sexual displays.

      Strengths:

      The 3D micro-thin-sectioning techniques and the numerical analyses of light transmission are novel and state of the art. The example chosen is a good one, as a crest feather likely to have carried complex and vivid colours as a warning or for use in sexual display. The authors correctly warn that without such 3D study feather colours might be given simply as black from regular 2D analysis, and the alignment evidence for iridescence could be missed.

      Weaknesses: Trivial

    1. eLife Assessment

      This important study explores the regulation of collective cell migration and tissue patterning in the zebrafish posterior lateral line primordium by SoxB1 transcription factors. The authors provide evidence that SoxB1 genes interact with Wnt and Fgf signaling pathways to control neuromast deposition and spacing, a process central to sensory organ development. The work offers mechanistic insight into the self-organization of migrating tissues and adds to the understanding of how transcriptional networks integrate with signaling pathways during morphogenesis. However, the strength of the evidence supporting several key conclusions is incomplete due to insufficient validation of mutant and knockdown tools, lack of quantitative analysis, and unclear experimental design details; additional quantification and more rigorous verification of gene knockdown or loss-of-function tools are needed to support the proposed model.

    2. Reviewer #1 (Public review):

      Summary:

      Palardy and colleagues examine how transcription factors of the SoxB1 family alter patterning within the zebrafish posterior lateral line primordium and subsequent formation of neuromast organs along the body of the developing fish. They describe how expression of soxb genes changes when Wnt and Fgf signaling pathways are altered, and in addition, how outputs of these signalling pathways change when soxb gene expression is disrupted. Together, experiments suggest a model where the expression of SoxB genes counteracts Wnt signaling. Support comes from the combined inhibition of both pathways, partially restoring the pattern of neuromast deposition. Together, the work reveals an additional layer of control over Wnt and Fgf signals that together ensure proper posterior lateral line development.

      Strengths:

      The authors provide a clear analysis of changes in RNA expression after systematic manipulation of gene expression and signaling pathways to construct a plausible model of how Sox factors regulate primordium patterning.

      Weaknesses:

      There is little attempt to capture the variation of expression patterns with each manipulation. Photomicrographs are examples, with little quantification.

      While the combined loss of soxb functions shows more severe phenotypes, it is not exactly clear what underlies the apparent redundancy. It would be helpful if the soxb gene family member expression was reported after loss of each. Expression of sox1a is shown in sox2 mutants in Figure 4, but other combinations are not reported. This additional analysis would clarify whether there are alterations in expression that influence apparent redundancy.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript seeks to determine the molecular basis of tissue patterning in the collectively migrating cells of the zebrafish posterior lateral line primordium. In particular, the authors examine the cross-regulation of canonical Wnt signaling, Fgf signaling, and the SoxB1 family members Sox1a, Sox2, and Sox3 in the migrating primordium. Using a combination of mutant lines, morpholino (MO) knockdown, pharmacological inhibition, and dominant-negative inhibition, the authors propose a model in which Sox2 and Sox3 in the trailing region of the primordium restrict Wnt signaling to the leading region, facilitating the formation of rosettes and the deposition of the first formed neuromast downstream of Fgf pathway activity. In contrast, sox1a is expressed in the leading region of the primordium, and the sox1ay590 -/- mutant shows little phenotype on its own. Together, the authors propose a multistep signaling loop that regulates tissue patterning during lateral line collective cell migration.

      Strengths:

      The zebrafish posterior lateral line primordium is a well-established model for the study of collective cell migration that is useful for genetic manipulation and live imaging. The manuscript seeks to understand the complex reciprocal regulation of signaling pathways that regulate tissue patterning of collectively migrating cells.

      Weaknesses:

      (1) The primary tools used in this study are inadequate to support the authors' conclusions.

      A. The authors state that the phenotype of the sox2y589 homozygous mutant line described in this manuscript changed across generations, but do not specify which generation is used for any given experiment. The sox2y589 mutant line is not properly verified in this manuscript, which could be done by examining anti-Sox2 antibody labeling, Western blot analysis, or complementation to the existing sox2x50 line described in Gou et al., 2018a and Gou et al., 2018b. There are also published sox1a mutant lines (Lekk et al., 2019).

      B. The authors acknowledge that the sox2 MO1 used in this manuscript also alters sox3 function, but do not redo the experiments with a specific sox2 MO. In addition, the authors show that the anti-Sox2 and anti-Sox3 antibody labeling is reduced but not absent in sox2 MO1 and sox3 MO-injected embryos, but do not show antibody labeling of the sox2 MO and sox3 MO-double injected embryos to determine if there is an additional knockdown.

      C. The authors examine RNA in situ hybridization patterns of sox2 and sox3 following various manipulations, but do not use anti-Sox2 and anti-Sox3 antibody labeling, which would provide more quantifiable information about changes in patterning.

      (2) The manuscript lacks important experimental details and appropriate quantification of results.

      A. It is unclear for most of the experiments described in this manuscript how many individual embryos were examined for each experiment and how robust the results are for each condition. Only Figure 3 includes information about the numbers for each experiment, and in all cases, the experimental manipulations are not fully penetrant, and there is no statistical analysis.

      B. It is not clear at what stage most of the RNA in situ hybridizations were performed.

      C. The manuscript lacks quantification of many of the experiments, making it difficult to conclude their significance.

    4. Reviewer #3 (Public review):

      Summary:

      This study aims to understand the molecular underpinnings of the complex process of periodic deposition of the neuromast organs of the embryonic posterior lateral line (PLL) sensory system in zebrafish. It was previously established that Fgf signaling in the trailing zone of the migrating PLL primordium is key to protoneuromast establishment, while Wnt signaling in the leading zone must be downregulated to allow new Fgf signaling-dependent protoneuromasts to form. Here, the authors evaluate the role of three SoxB transcription factors (Sox1a, Sox2, and Sox3) in this complex process, generating two novel CRISPR mutants as part of their study. They interrogate the interplay of the SoxB genes with the Fgf and Wnt signaling pathways during PLL primordium migration, using a combination of genetics, knockdown, and imaging approaches, including live time-lapse studies. They report a key role for the SoxB genes in regulating the pace of protoneuromast maturation as the primordium migrates, thus ensuring appropriate deposition and spacing of the neuromast organs.

      Strengths:

      Strengths of the study are the careful quantitative analysis, based on imaging approaches, of the impact of mutation or knockdown of SoxB genes, coupled with the use of heat shock inducible dominant negative strategies to address how SoxB genes interact with Wnt and Fgf signaling. Functional analyses convincingly uncover a SoxB regulatory network that serves to limit Wnt activity, as directly read out with a live Wnt reporter. The finding that Wnt inhibition (achieved using pharmacological reagents) rescues the SoxB deficiency phenotype provides compelling evidence of the centrality of the Wnt pathway in mediating SoxB function. Use of atoh1 markers to track the stages of development of the neuromasts provides an effective approach to following their maturation, and allows the authors to explore how SoxB/Wnt interplay ultimately translates into the establishment of functional neuromasts. Finally, loss of Sox2 function, together with loss of either Sox1a or Sox3, blocks maturation of the neuromasts, clearly establishing redundancy between these SoxB family genes.

      The concepts introduced and explored in this study - of complex gene networks that work within a dynamic cellular environment to enable self-organization and ultimately stabilization of cell fate choices - provide a useful conceptual framework for future studies. This study is therefore of relevance to understanding the morphogenesis of self-organizing tissues more broadly.

      Weaknesses:

      A minor weakness is the use of SoxB morpholino (MO) knockdown reagents, which are interspersed with mutant analyses. Although the stable mutants are available, they would be challenging to couple with the reporter transgenes used for some of the experiments, providing a reasonable rationale for the use of MO reagents (although the authors don't overtly provide this rationale). Moreover, reduced penetrance of the Sox2 mutants over multiple generations is noted, but no detailed explanation for this finding is offered.

      Given that the expression patterns of Sox1a and Sox3 are not merely different but are largely reciprocal, the mechanistic basis of their very similar double mutant phenotypes with Sox2 remains opaque. Related to this, the authors discuss that Sox1a/Sox2 double knockdown produces a more severe phenotype than Sox2/Sox3 double knockdown, yet this difference is not obviously reflected in the data, some of which is not shown.

    1. eLife Assessment

      This study provides an important method to model the statistical biases of hypermutations during the affinity maturation of antibodies. The authors show convincingly that their model outperforms previous methods with fewer parameters; this is made possible by the use of machine learning to expand the context dependence of the mutation bias. They also show that models learned from nonsynonymous mutations and from out-of-frame sequences are different, prompting new questions about germinal center function. Strengths of the study include an open-access tool for using the model, a careful curation of existing datasets, and a rigorous benchmark; it is also shown that current machine-learning methods are limited by the availability of data, which explains the only modest gain in model performance afforded by modern machine learning.

    2. Reviewer #1 (Public review):

      Summary:

      This paper introduces a new class of machine learning models for capturing how likely a specific nucleotide in a rearranged IG gene is to undergo somatic hypermutation. These models modestly outperform existing state-of-the-art efforts, despite having fewer free parameters. A surprising finding is that models trained on all mutations from non-functional rearrangements give divergent results from those trained on only silent mutations from functional rearrangements.

      Strengths:

      * The new model structure is quite clever and will provide a powerful way to explore larger models.
      * Careful attention is paid to curating and processing large existing data sets.
      * The authors are to be commended for their efforts to communicate with the developers of previous models and use the strongest possible versions of those in their current evaluation.

      Weaknesses:

      * No significant weaknesses noted

    3. Reviewer #2 (Public review):

      This work offers an insightful contribution for researchers in computational biology, immunology, and machine learning. By employing a 3-mer embedding and CNN architecture, the authors demonstrate that it is possible to extend sequence context without exponentially increasing the model's complexity. Key findings include:

      • Efficiency and Performance: Thrifty CNNs outperform traditional 5-mer models and match the performance of significantly larger models like DeepSHM.
      • Neutral Mutation Data: A distinction is made between using synonymous mutations and out-of-frame sequences for model training, with evidence suggesting these methods capture different aspects of SHM, or different biases in the type of data.
      • Open Source Contributions: The release of a Python package and pretrained models adds practical value for the community.

      However, readers should be aware of the limitations. The improvements over existing models are modest, and the work is constrained by the availability of high-quality out-of-frame sequence data. The study also highlights that more complex modeling techniques, like transformers, did not enhance predictive performance, which underscores the role of data availability in such studies.

    4. Reviewer #3 (Public review):

      Summary:

      Modeling and estimating sequence context biases during B cell somatic hypermutation is important for accurately modeling B cell evolution to better understand responses to infection and vaccination. Sung et al. introduce new statistical models that capture a wider sequence context of somatic hypermutation with a comparatively small number of additional parameters. They demonstrate their model's performance with rigorous testing across multiple subjects and datasets. Prior work has captured the mutation biases of fixed 3-, 5-, and 7-mers, but each of these expansions has significantly more parameters. The authors developed a machine-learning-based approach to learn these biases using wider contexts with comparatively few parameters.
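      The parameter argument in this summary can be illustrated with simple arithmetic. The sketch below is not the authors' code: it counts the contexts a fixed k-mer model must parameterise (at least one rate per context, so 4^k) and compares that with a rough count for a hypothetical model that embeds 3-mers and applies one 1-D convolution. The layer layout, embedding dimension, filter count, and kernel size are illustrative assumptions, not the values in the paper's Table 1.

```python
def kmer_context_count(k: int) -> int:
    """Number of distinct nucleotide contexts in a fixed k-mer model (4 bases)."""
    return 4 ** k


def receptive_field(kernel_size: int, kmer: int = 3) -> int:
    """Nucleotides seen by one convolution output when each input position is a k-mer."""
    return kernel_size + kmer - 1


def embedded_cnn_param_count(embedding_dim: int, n_filters: int, kernel_size: int) -> int:
    """Rough parameter count for a hypothetical 3-mer-embedding CNN.

    Assumed layout (illustration only): embedding table for the 64 possible
    3-mers, one 1-D convolution with biases, and a linear readout to one rate.
    """
    embedding = kmer_context_count(3) * embedding_dim
    conv = n_filters * embedding_dim * kernel_size + n_filters
    readout = n_filters + 1
    return embedding + conv + readout


if __name__ == "__main__":
    for k in (3, 5, 7):
        print(f"{k}-mer contexts: {kmer_context_count(k)}")
    # Hypothetical hyperparameters chosen only to show the scaling.
    width = receptive_field(kernel_size=9)
    print(f"context width seen by the CNN: {width} nt")
    print(f"CNN parameters: {embedded_cnn_param_count(7, 16, 9)}")
    print(f"contexts a fixed {width}-mer model would need: {kmer_context_count(width)}")
```

      Under these assumed settings, the convolutional model grows linearly with kernel size, whereas a fixed k-mer table grows fourfold with every added nucleotide of context.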

      Strengths:

      Well motivated and defined problem. Clever solution to expand nucleotide context. Complete separation of training and test data by using different subjects for training vs testing. Release of open-source tools and scripts for reproducibility.

      The authors have addressed my prior comments.

    5. Author Response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review):

      Summary:

      This paper introduces a new class of machine learning models for capturing how likely a specific nucleotide in a rearranged IG gene is to undergo somatic hypermutation. These models modestly outperform existing state-of-the-art efforts, despite having fewer free parameters. A surprising finding is that models trained on all mutations from non-functional rearrangements give divergent results from those trained on only silent mutations from functional rearrangements.

      Strengths:

      (1) The new model structure is quite clever and will provide a powerful way to explore larger models.

      (2) Careful attention is paid to curating and processing large existing data sets.

      (3) The authors are to be commended for their efforts to communicate with the developers of previous models and use the strongest possible versions of those in their current evaluation.

      Thank you very much for your comments. We especially appreciate the last comment, as we have indeed tried hard to do so.

      Weaknesses:

      (1) 10x/single cell data has a fairly different error profile compared to bulk data. A synonymous model should be built from the same briney dataset as the base model to validate the difference between the two types of training data.

      Thank you for pointing this out.

      We have repeated the same analysis with synonymous mutations derived from the bulk-sequenced tang dataset for Figure 4 and the supplementary figure. The conclusion remains the same. We used tang because only the out-of-frame sequences were available to us for the briney dataset, as we were using preprocessing from the Spisak paper.
      The fact that both the 10x and the tang data give the same results bolsters our claim.

      (2) The decision to test only kernels of 7, 9, and 11 is not described. The selection/optimization of embedding size is not explained. The filters listed in Table 1 are not defined.

      We have added the following to the Models subsection to further explain these decisions:

      “The hyperparameters for the models (Table 1) were selected with a run of Optuna (Akiba et al., 2019) early in the project and then fixed. Further optimization was not pursued because of the limited performance differences between the existing models.”

      Reviewer #2 (Public Review):

      Summary:

      This work offers an insightful contribution for researchers in computational biology, immunology, and machine learning. By employing a 3-mer embedding and CNN architecture, the authors demonstrate that it is possible to extend sequence context without exponentially increasing the model's complexity.

      Key findings:

      (1) Efficiency and Performance: Thrifty CNNs outperform traditional 5-mer models and match the performance of significantly larger models like DeepSHM.

      (2) Neutral Mutation Data: A distinction is made between using synonymous mutations and out-of-frame sequences for model training, with evidence suggesting these methods capture different aspects of SHM or different biases.

      (3) Open Source Contributions: The release of a Python package and pre-trained models adds practical value for the community.

      Thank you for your positive comments. We believe that we have been clear about the modest improvements (e.g., the abstract says “slight improvement”), and we discuss the data limitations extensively. If there are ways we can do this more effectively, we are happy to hear them.

      Reviewer #3 (Public Review):

      Summary:

      Sung et al. introduce new statistical models that capture a wider sequence context of somatic hypermutation with a comparatively small number of additional parameters. They demonstrate their model’s performance with rigorous testing across multiple subjects and datasets.

      Strengths:

      Well-motivated and defined problem. Clever solution to expand nucleotide context. Complete separation of training and test data by using different subjects for training vs testing. Release of open-source tools and scripts for reproducibility.

      Thank you for your positive comments.

      Weaknesses:

      This study could be improved with better descriptions of dataset sequencing technology, sequencing depth, etc.

      We have added columns to Table 3 that report sequencing technology and depth for each dataset.

      Reviewer #1 (Recommendations for the Authors):

      (1) There seems to be a contradiction between Tables 2 and 3 as to whether the Tang et al. dataset was used to train models or only to test them.

      Thank you for catching this. The "purpose" column in Table 3 was for the main analysis, while Table 2 is describing only models trained to compare with DeepSHM. Explaining this seems more work than it's worth, so we simply removed that column from Table 2. The dataset purposes are clear from the text.

      (2) In Figure 4, I assume the two rows correspond to the Briney and Tang datasets, as in Figure 2, but this is not explicitly described.

      Yes, you are correct. We added an explanation in the caption of Figure 4.

      (3) Figure 2, supplement 1 should include a table like Table 1 that describes these additional models.

      We have added an explanation in the caption to Table 1 that "Medium" and "Large" refer to specific hyperparameter choices. The caption to Figure 2, supplement 1 now describes the corresponding hyperparameter choices for "Small" thrifty models.

      (4) On line 378 "Therefore in either case" seems extraneous.

      Indeed. We have dropped those words.

      (5) In the last paragraph of the Discussion, only the attempt to curate the Ford dataset is described. I am not sure if you intended to discuss the Rodriguez dataset here or not.

      Thank you for pointing this out. We have updated the Materials and Methods section to include our attempts to recover data from Rodriguez et al., 2023.

      (6) Have you looked to see if Soto et al. (Nature 2019) provides usable data for your purposes?

      Thank you for making us aware of this dataset!

      We assessed it but found that the recovery of usable out-of-frame sequences was too low to be useful for our analysis. We now describe this evaluation in the paper.

      (7) Cui et al. note a high similarity between S5F and S5NF (r=0.93). Does that constrain the possible explanations for the divergence you see?

      This is an excellent point.

      We don't believe the correlation observed in Cui and our results are incompatible. Our point is not that the two sources of neutral data are completely different but that they differ enough to limit generalization. Also, the Spearman correlation in Cui is 0.86, which aligns with our observed drop in R-precision.
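      For readers unfamiliar with the two statistics mentioned in this reply, the following minimal sketch shows how they are commonly computed. It assumes the standard information-retrieval definition of R-precision (the fraction of the top-R scored sites that are truly mutated, where R is the number of mutated sites); the exact definition used in the manuscript may differ. The toy inputs are invented for illustration, and the rank function assumes no tied values.

```python
from statistics import mean


def r_precision(scores, mutated):
    """Fraction of the R highest-scoring sites that are in `mutated`, with R = len(mutated)."""
    r = len(mutated)
    if r == 0:
        raise ValueError("R-precision is undefined when no site is mutated")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(1 for i in ranked[:r] if i in mutated) / r


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def ranks(values):
    """Ranks starting at 1 (simplification: assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Toy per-site mutability scores and the indices of observed mutations.
    scores = [0.9, 0.1, 0.8, 0.3, 0.2]
    mutated = {0, 3}
    print(r_precision(scores, mutated))
    # A monotone but non-linear relation: the two correlations differ.
    x, y = [1, 2, 3, 4], [1, 4, 9, 16]
    print(pearson(x, y), spearman(x, y))
```

      Because Spearman depends only on rank order while Pearson depends on the values themselves, the two can differ on the same pair of models, as with the 0.93 and 0.86 figures quoted above.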

      (8) Are you able to test the effects of branch length or background SHM on the model?

      We're unsure what is meant by “background SHM.”
      We did try joint optimization of branch length and model parameters, but it did not improve performance. Differences in clone size thresholds do exist between datasets, but Figure 3 suggests that tang is better sequence data.

      (9) Would the model be expected to scale up to a kernel of, say, 50? Would that help yield biological insight?

      We did not test such large models because larger kernels did not improve performance.

      While your suggestion is intriguing, distinguishing biological effects from overfitting would be difficult. We explore biological insights more directly in our recent mechanistic model paper (Fisher et al., 2025), which is now cited in a new paragraph on biological conclusions.

      Reviewer #2 (Recommendations for the Authors):

      (1) Consider applying a stricter filtration approach to the Briney dataset to make it more comparable to the Tang dataset.

      Thank you. We agree that differences in datasets are interesting, though model rankings remain consistent. We now include supplementary figures comparing synonymous and out-of-frame models from the tang dataset.

      (2) You omit mutations between the unmutated germline and the MRCA of each tree. Why?

      The inferred germline may be incorrect due to germline variation or CDR3 indels, which could introduce spurious mutations. Following Spisak et al. (2020), we exclude this branch.
      Yes, singletons are discarded: ~28k in tang and ~1.1M in jaffe.

      (3) Could a unified model trained on both data types offer further insights?

      We agree and present such an analysis in Figure 4.

      (4) Tree inference biases from parent-child distances may impact the results.

      While this is an important issue, all models are trained on the same trees, so we expect any noise or bias to be consistent. Different datasets help confirm the robustness of our findings.

      (5) Simulations would strengthen validation.

      We focused on real datasets, which we view as a strength. While simulations could help, designing a meaningful simulation model would be nontrivial. We have clarified this point in the manuscript.

      Reviewer #3 (Recommendations for the Authors):

      There are typos in lines 109, 110, 301, 307, and 418.

      Thank you, we have corrected them.

    1. eLife Assessment

      This study presents a valuable finding on the delivery of a nuclear envelope protein to lysosomes and the impact of C-terminal tagging on its traffic. The authors provide solid evidence for the potential artifacts introduced by large terminal tags, particularly in the context of membrane protein localization and stability.

    2. Reviewer #1 (Public review):

      Summary:

      The authors revisit the specific domains/signals required for redirection of an inner nuclear membrane protein, emerin, to the secretory pathway. They find that epitope tagging influences protein fate, serving as a cautionary tale for how different visualisation methods are used. Multiple tags and lines of evidence are used, providing solid evidence for the altered fate of different constructs.

      Strengths:

      This is a thorough dissection of domains and properties that confer INM retention vs secretion to the PM/lysosome, and will serve the community well as a caution regarding placement of tags and how this influences protein fate.

      Weaknesses:

      The specific biogenesis pathway for C-terminally tagged emerin might confound some interpretations. Appending the large GFP to the C-terminus may direct the fusion protein to a different ER insertion pathway than that used by the endogenous protein. How this might influence the fate of the tagged protein remains to be determined. In some ways this is beyond the scope of the current study, but should serve as a warning to epitope-tagging approaches.

    3. Reviewer #2 (Public review):

      In this manuscript, Mella et al. investigate the effect of GFP tagging on the localization and stability of the nuclear-localized tail-anchored (TA) protein Emerin. A previous study from this group demonstrated that C-terminally GFP-tagged Emerin traffics to the plasma membrane and is eventually targeted to lysosomes for degradation. It has been suggested that the C-terminal tagging of TA proteins may shift their insertion from the post-translational TRC/GET pathway to the co-translational SRP-mediated pathway. Consistent with this, the authors confirm that C-terminal GFP tagging causes Emerin to mislocalize to the plasma membrane and subsequently to lysosomes.

      In this study, they investigate the mechanism underlying this misrouting. By manipulating the cytosolic domain and the hydrophobicity of the transmembrane domain (TMD), the authors show that an ER retention sequence and increased TMD hydrophobicity contribute to Emerin's trafficking through the secretory pathway.

      This reviewer had previously raised the concern that the potential role of the GFP tag within the ER lumen in promoting secretory trafficking was not addressed. In the revised manuscript, the authors respond to this concern by examining the co-localization of Emerin-GFP with the ER exit site marker Sec31A. Their data show that the presence of the C-terminal GFP tag increases Emerin's propensity to engage ER exit sites, supporting the conclusion that GFP tagging promotes its entry into the secretory pathway.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors revisit the specific domains/signals required for the redirection of an inner nuclear membrane protein, emerin, to the secretory pathway. They find that epitope tagging influences protein fate, serving as a cautionary tale for how different visualisation methods are used. Multiple tags and lines of evidence are used, providing solid evidence for the altered fate of different constructs.

      Strengths:

      This is a thorough dissection of domains and properties that confer INM retention vs secretion to the PM/lysosome, and will serve the community well as a caution regarding the placement of tags and how this influences protein fate.

      Weaknesses:

      Biogenesis pathways are not explored experimentally: it would be interesting to know if the lysosomal pool arrives there via the secretory pathway (eg by engineering a glycosylation site into the lumenal domain) or by autophagy, where failed insertion products may accumulate in the cytoplasm and be degraded directly from cytoplasmic inclusions.

      This manuscript is a Research Advance that follows previous work that we published in eLife on this topic (Buchwalter et al., eLife 2019; PMID 31599721). In that prior publication, we showed that emerin-GFP arrives at the lysosome by secretion and exposure at the PM, followed by internalization. While we state these previous findings in this manuscript, we did not explicitly restate here how we came to that conclusion. In the 2019 study, we (i) engineered in a glycosylation site, which demonstrated that emerin-GFP receives complex, Endo H-resistant N-glycans, indicating passage through the Golgi; (ii) performed cell surface labeling, which confirmed that emerin accesses the PM; (iii) interfered with the early secretory pathway using brefeldin A; and (iv) interfered with lysosomal function using bafilomycin A1. Further, we ruled out autophagy as a major contributor to emerin trafficking by treating cells with the PI3K inhibitor KU55933, which had no effect on emerin’s lysosomal delivery.

      It would be helpful if the topology of constructs could be directly demonstrated by pulse-labelling and protease protection. It's possible that there are mixed pools of both topologies that might complicate interpretation.

      We demonstrate that emerin’s TMD inserts in a tail-anchored orientation (C terminus in ER lumen) by appending a GFP tag to either the N or C terminus, followed by anti-GFP antibody labeling of unpermeabilized cells (Fig. 1G). This shows the preferred topology of emerin’s wild type TMD.

      As the reviewer points out, it is possible that our manipulations of the TMD sequence (Fig. 2D-E) alter its preferred topology of membrane insertion. We addressed this question by performing anti-GFP and anti-emerin antibody labeling of the less hydrophobic TMD mutant (EMD-TMDm-GFP) after selective permeabilization of the plasma membrane (Figure 2 supplement, panel F). If emerin biogenesis is normal, the GFP tag should face the ER lumen while the emerin antibody epitope should be cytosolic. If the fidelity of emerin’s membrane insertion is impaired, the GFP tag could be exposed to the cytosol (flipped orientation), which would be detected by anti-GFP labeling upon plasma membrane permeabilization. We find that the C-terminal GFP tag is completely inaccessible to antibody when the PM is selectively permeabilized with digitonin, but is readily detected when all intracellular membranes are permeabilized with Triton-X-100. These data confirm that mutating emerin’s TMD does not disrupt the protein’s membrane topology.

      Reviewer #2 (Public review):

      In this manuscript, Mella et al. investigate the effect of GFP tagging on the localization and stability of the nuclear-localized tail-anchored (TA) protein Emerin. A previous study from this group showed that C-terminally GFP-tagged Emerin protein traffics to the plasma membrane and reaches lysosomes for degradation. It is suggested that the C-terminal tagging of tail-anchored proteins shifts their insertion from the post-translational TRC/GET pathway to the co-translational SRP-mediated pathway. The authors of this paper found that C-terminal GFP tagging causes Emerin to localize to the plasma membrane and eventually reach lysosomes. They investigated the mechanism by which Emerin-GFP moves to the secretory pathway. By manipulating the cytosolic domain and the hydrophobicity of the transmembrane domain (TMD), the authors identify that an ER retention sequence and strong TMD hydrophobicity contribute to Emerin trafficking to the secretory pathway. Overall, the data are solid, and the knowledge will be useful to the field. However, the authors do not fully answer the question of why C-terminally GFP-tagged Emerin moves to the secretory pathway. Importantly, the authors did not consider the possible roles of GFP in the ER lumen influencing Emerin trafficking to the secretory pathway.

      Reviewer #2 (Recommendations for the authors):

      Major concerns:

      (1) The authors suggest that an ER retention sequence and high hydrophobicity of Emerin TMD contribute to its trafficking to the secretory pathway. However, these two features are also present in WT Emerin, which correctly localizes to the inner nuclear membrane. Additionally, the authors show that the ER retention sequence is normally obscured by the LEM domain. The key difference between WT Emerin and Emerin-GFP is the presence of GFP in the ER lumen. The authors missed investigating the role of GFP in the ER lumen in influencing Emerin trafficking to the secretory pathway. It is likely that COPII carrier vesicles capture GFP protein in the lumen as part of the bulk flow mechanism for transport to the Golgi compartment. The authors could easily test this by appending a KDEL sequence to the C-terminus of GFP; this should now redirect the protein to the nucleus.

      We agree with the reviewer’s point that the presence of lumenal GFP somehow promotes secretion of emerin from the ER, likely at the stage of enhancing its packaging into COPII vesicles. We struggle to think about how to interpret the KDEL tagging experiment that the reviewer proposes, as the KDEL receptor predominantly recycles soluble proteins from the Golgi to the ER, while emerin is a membrane protein; and we have shown that emerin already contains a putative COPI-interacting RRR recycling motif in its cytosolic domain.

      Nevertheless, we agree with the reviewer that it is worthwhile to test the possibility that addition of GFP to emerin’s C-terminus promotes capture by COPII vesicles. We have evaluated this question by performing temperature block experiments to cause cargo accumulation within stalled COPII-coated ER exit sites, then comparing the propensity of various untagged and tagged emerin variants to enrich in ER exit sites as judged by colocalization with the COPII subunit Sec31a. These data now appear in Figure 4 supplement 1. These experiments indicate that emerin-GFP samples ER exit sites significantly more than does untagged emerin. Further, the ER exit site enrichment of emerin-GFP is dampened by shortening emerin’s TMD. We do not see further enrichment of any emerin variant in ER exit sites when COPII vesicle budding is stalled by low temperature incubation, implying that emerin lacks any positive sorting signals that direct its selective enrichment in COPII vesicles. Altogether, these data indicate that both emerin’s long and hydrophobic TMD and the addition of a lumenal GFP tag increase emerin’s propensity to sample ER exit sites and undergo non-selective, “bulk flow” ER export.

      (2) The authors nicely demonstrate that the hydrophobicity of Emerin TMD plays a role in its secretory trafficking. I wonder if this feature may be beneficial for cells to degrade newly synthesized Emerin via the lysosomal pathway during mitosis, as the nuclear envelope breakdown may prevent the correct localization of newly synthesized Emerin. The authors could test Emerin localization during mitosis. Such findings could add to the physiological significance of their findings. At the minimum, they should discuss this possibility.

      We thank the reviewer for this insightful suggestion. It is attractive to speculate that secretory trafficking might enable lysosomal degradation of emerin during mitosis, when its lamin anchor has been depolymerized. However, we think it is unlikely that mitotic trafficking contributes significantly to the turnover flux of untagged emerin; if it did, we would expect to see higher steady state levels and/or slowed turnover of emerin mutants that cannot traffic to the lysosome. We did not observe this outcome. Instead, mutations that enhance (RA) or impair (TMDm) emerin trafficking had no effect on the untagged protein’s steady-state levels (Fig. 4G).

      Minor concerns:

      (1) On page 7, the authors note that "FLAG-RA construct was not poorly expressed relative to WT, in contrast with RA-GFP (Figures S3C, 2I)." The expression levels of these proteins cannot be compared across two different blots.

      We apologize for this confusion; we were implying two distinct comparisons to internal controls present on each blot. We have adjusted the text to read “FLAG-RA construct was not poorly expressed relative to FLAG-WT (Fig. S3C) in contrast to RA-GFP compared to WT-GFP (Fig. 2I).”

      (2) In the first paragraph of the discussion, the authors suggest that aromatic amino acids facilitate trafficking to lysosomes. However, they only replaced aromatic amino acids with alanine residues. If they want to make this claim, they should test other amino acids, particularly hydrophobic amino acids such as leucine.

      The reviewer may be inferring more import from our statement than we intended. We focused on these aromatic residues within the TMD because they contribute strongly to its overall hydrophobicity. Experimentally, we determined that nonconservative alanine substitutions of these aromatic residues inhibited trafficking. We do not state and do not intend to imply that the aromatic character of these residues specifically influences trafficking propensity, and we agree with the reviewer that to test such a question would require additional substitutions with non-aromatic hydrophobic amino acids.

      We realize that our phrasing may have been misleading by opening with discussion of the aromatic amino acids; in the revised discussion paragraph, we instead lead with discussion of TMD hydrophobicity, and then state how the specific substitutions we made affect trafficking.

      Reviewing Editor comments:

      While reviewer 1 did not provide any recommendations to the authors, I agree with this reviewer that the authors should validate the topology of their tagged proteins (at least for the one used to draw key conclusions). Given that Emerin is a tail-anchored protein, having a big GFP tag at the C-terminus could mess up ER insertion, causing the protein to take a wrong topology or even be mislocalized in the cytosol, particularly under overexpression conditions. In either case, it can be subject to quality control-dependent clearance via either autophagy, ERphagy, or ER-to-lysosome trafficking. I think that the authors should try a few straightforward experiments such as brefeldin A treatment or dominant negative Sar1 expression to test whether blocking conventional ER-to-Golgi trafficking affects lysosomal delivery of Emerin. I also think that the authors should discuss their findings in the context of the RESET pathway reported previously (PMID: 25083867). The ER stress-dependent trafficking of tagged Emerin to the PM and lysosomes appears to follow a similar trafficking pattern as RESET, although the authors did not demonstrate that Emerin traffics to lysosomes via the PM. In this regard, they should tone down their conclusion and discuss their findings in the context of the RESET pathway, which could serve as a model for their substrate.

      We agree that validating the topology of TMD mutants is important, and now include these experiments in the revised manuscript (please see our response to Reviewer 1 above).

      Please see our response to Reviewer 1’s public review; we previously determined that emerin-GFP undergoes ER-to-Golgi trafficking (see our 2019 study).

      We recognize the major parallels between our findings and the RESET pathway. In our 2019 study, we found that similarly to other RESET cargoes, emerin-GFP travels through the secretory pathway, is exposed at the PM, and is then internalized and delivered to lysosomes. We discussed these strong parallels to RESET in our 2019 study. In this revised manuscript, we now also point out the parallels between emerin trafficking and RESET and cite the 2014 study by Satpute-Krishnan and colleagues (PMID 25083867).

    1. eLife Assessment

      This study shows, for the first time, the structure and snapshots of the dynamics of the full-length soluble Angiotensin-I converting enzyme dimer. The combination of structural and computational analyses provides compelling evidence that reveals the conformational dynamics of the complex and key regions mediating the conformational change. This fundamental work illustrates how conformational heterogeneity can be used to gain insights into protein function.

    2. Reviewer #1 (Public review):

      Summary:

      The authors report four cryoEM structures (2.99 to 3.65 Å resolution) of the 180 kDa, full-length, glycosylated, soluble Angiotensin-I converting enzyme (sACE) dimer, with two homologous catalytic domains at the N- and C-terminal ends (ACE-N and ACE-C). ACE is a protease capable of effectively degrading Aβ. The four structures are C2 pseudo-symmetric homodimers and provide insight into sACE dimerization. These structures were obtained using discrete classification in cryoSPARC and show different combinations of open, intermediate, and closed states of the catalytic domains, resulting in varying degrees of solvent accessibility to the active sites.

      To deepen the understanding of the gradient of heterogeneity (from closed to open states) observed with discrete classification, the authors performed all-atom MD simulations and continuous conformational analysis of cryo-EM data using cryoSPARC 3DVA, cryoDRGN, and RECOVAR. cryoDRGN and cryoSPARC 3DVA revealed coordinated open-closed transitions across four catalytic domains, whereas RECOVAR revealed independent motion of two ACE-N domains, also observed with cryoSPARC focused classification. The authors suggest that the discrepancy in the results of the different methods for continuous conformational analysis in cryo-EM could result from different approaches used for dimensionality reduction and trajectory generation in these methods.

      Strengths:

      This is an important study that shows, for the first time, the structure and the snapshots of the dynamics of the full-length sACE dimer. Moreover, the study highlights the importance of combining insights from different cryo-EM methods that address questions difficult or impossible to tackle experimentally, while lacking ground truth for validation.

      Weaknesses:

      The open, closed, and intermediate states of ACE-N and ACE-C in the four cryo-EM structures from discrete classification were designated quantitatively (based on measured atomic distances on the models fitted into cryo-EM maps). Unfortunately, atomic models were not fitted into cryo-EM maps obtained with cryoSPARC 3DVA, cryoDRGN, and RECOVAR, and the open/closed states in these cases were designated based on a qualitative analysis.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript presents a valuable contribution to the field of ACE structural biology and dynamics by providing the first complete full-length dimeric ACE structure in four distinct states. The study integrates cryo-EM and molecular dynamics simulations to offer important insights into ACE dynamics. The depth of analysis is commendable, and the combination of structural and computational approaches enhances our understanding of the protein's conformational landscape.

    4. Reviewer #3 (Public review):

      Summary:

      Mancl et al. report four Cryo-EM structures of glycosylated and soluble Angiotensin-I converting enzyme (sACE) dimer. This moves forward the structural understanding of ACE, as previous analysis yielded partially denatured or individual ACE domains. By performing a heterogeneity analysis, the authors identify three structural conformations (open, intermediate open, and closed) that define the openness of the catalytic chamber and structural features governing the dimerization interface. They show that the dimer interface of soluble ACE consists of an N-terminal glycan and protein-protein interaction regions, as well as C-terminal protein-protein interactions. Further heterogeneity mining and all-atom molecular dynamic simulations show structural rearrangements that lead to the opening and closing of the catalytic pocket, which could explain how ACE binds its substrate. These studies could contribute to future drug design targeting the active site or dimerization interface of ACE.

      Strengths:

      The authors make significant efforts to address ACE denaturation on cryo-EM grids, testing various buffers and grid preparation techniques. These strategies successfully reduce denaturation and greatly enhance the quality of the structural analysis. The integration of cryoDRGN, 3DVA, RECOVAR, and all-atom simulations for heterogeneity analysis proves to be a powerful approach, further strengthening the overall experimental methodology.

      Weaknesses:

      No weaknesses noted. The revised manuscript adequately addresses the points I suggested in the review of the first submission.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The authors report four cryoEM structures (2.99 to 3.65 Å resolution) of the 180 kDa, full-length, glycosylated, soluble Angiotensin-I converting enzyme (sACE) dimer, with two homologous catalytic domains at the N- and C-terminal ends (ACE-N and ACE-C). ACE is a protease capable of effectively degrading Aβ. The four structures are C2 pseudo-symmetric homodimers and provide insight into sACE dimerization. These structures were obtained using discrete classification in cryoSPARC and show different combinations of open, intermediate, and closed states of the catalytic domains, resulting in varying degrees of solvent accessibility to the active sites. 

      To deepen the understanding of the gradient of heterogeneity (from closed to open states) observed with discrete classification, the authors performed all-atom MD simulations and continuous conformational analysis of cryo-EM data using cryoSPARC 3DVA, cryoDRGN, and RECOVAR. cryoDRGN and cryoSPARC 3DVA revealed coordinated open-closed transitions across four catalytic domains, whereas RECOVAR revealed independent motion of two ACE-N domains, also observed with cryoSPARC-focused classification. The authors suggest that the discrepancy in the results of the different methods for continuous conformational analysis in cryo-EM could result from different approaches used for dimensionality reduction and trajectory generation in these methods. 

      Strengths: 

      This is an important study that shows, for the first time, the structure and the snapshots of the dynamics of the full-length sACE dimer. Moreover, the study highlights the importance of combining insights from different cryo-EM methods that address questions difficult or impossible to tackle experimentally while lacking ground truth for validation. 

      Weaknesses: 

      The open, closed, and intermediate states of ACE-N and ACE-C in the four cryo-EM structures from discrete classification were designated quantitatively (based on measured atomic distances on the models fitted into cryo-EM maps, Figure 2D). Unfortunately, atomic models were not fitted into cryo-EM maps obtained with cryoSPARC 3DVA, cryoDRGN, and RECOVAR, and the open/closed states in these cases were designated based on qualitative analysis. As the authors clearly pointed out, there are many other methods for continuous conformational heterogeneity analysis in cryo-EM. Among these methods, some allow analyzing particle images in terms of atomic models, like MDSPACE (Vuillemot et al., J. Mol. Biol. 2023, 435:167951), which result in one atomic model per particle image and can help in analyzing cooperativity of domain motions through measuring atomic distances or angular differences between different domains (Valimehr et al., Int. J. Mol. Sci. 2024, 25: 3371). This could be discussed in the article. 

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript presents a valuable contribution to the field of ACE structural biology and dynamics by providing the first complete full-length dimeric ACE structure in four distinct states. The study integrates cryo-EM and molecular dynamics simulations to offer important insights into ACE dynamics. The depth of analysis is commendable, and the combination of structural and computational approaches enhances our understanding of the protein's conformational landscape. However, the strength of evidence supporting the conclusions needs refinement, particularly in defining key terms, improving structural validation, and ensuring consistency in data analysis. Addressing these points through major revisions will significantly improve the clarity, rigor, and accessibility of the study to a broader audience, allowing it to make a stronger impact in the field. 

      Strengths: 

      The integration of cryo-EM and MD simulations provides valuable insights into ACE dynamics, showcasing the authors' commitment to exploring complex aspects of protein structure and function. This is a commendable effort, and the depth of analysis is appreciated. 

      Weaknesses: 

      Several aspects of the manuscript require further refinement to improve clarity and scientific rigor as detailed in my recommendations for the authors. 

      Reviewer #3 (Public review): 

      Summary: 

      Mancl et al. report four Cryo-EM structures of glycosylated and soluble Angiotensin-I converting enzyme (sACE) dimer. This moves forward the structural understanding of ACE, as previous analysis yielded partially denatured or individual ACE domains. By performing a heterogeneity analysis, the authors identify three structural conformations (open, intermediate open, and closed) that define the openness of the catalytic chamber and structural features governing the dimerization interface. They show that the dimer interface of soluble ACE consists of an N-terminal glycan and protein-protein interaction region, as well as C-terminal protein-protein interactions. Further heterogeneity mining and all-atom molecular dynamic simulations show structural rearrangements that lead to the opening and closing of the catalytic pocket, which could explain how ACE binds its substrate. These studies could contribute to future drug design targeting the active site or dimerization interface of ACE. 

      Strengths: 

      The authors make significant efforts to address ACE denaturation on cryo-EM grids, testing various buffers and grid preparation techniques. These strategies successfully reduce denaturation and greatly enhance the quality of the structural analysis. The integration of cryoDRGN, 3DVA, RECOVAR, and all-atom simulations for heterogeneity analysis proves to be a powerful approach, further strengthening the overall experimental methodology. 

      Weaknesses: 

      In general, the findings are supported by experimental data, but some experimental details and approaches could be improved. For example, CryoDRGN analysis is limited to the top 5 PCA components for ease of comparison with cryoSPARC 3DVA, but wouldn't an expansion to more components with CryoDRGN potentially identify further conformational states? The authors also say that they performed heterogeneity analysis on both datasets but only show data for one. The results for the first dataset should be shown and can be included in supplementary figures. In addition, the authors mention that they were not successful in performing cryoSPARC 3DFlex analysis, but they do not show their data or describe the conditions they used in the methods section. These data should be added and clearly described in the experimental section.

      Some cryo-EM data processing details are missing. Please add local resolution maps, box sizes, and Euler angle distributions and reference the initial PDB model used for model building. 

      Reviewer #1 (Recommendations for the authors):

      Major point:

      The authors could discuss the use of continuous conformational heterogeneity analysis methods that analyze particle images in terms of atomic models, based on MD simulations, like MDSPACE (Vuillemot et al., J. Mol. Biol. 2023, 435:167951). MDSPACE can be used on a dataset preprocessed with cryoSPARC or Relion by discrete classification to reduce compositional heterogeneity and obtain initial particle poses. It results in one atomic model per particle image and can help in analyzing the cooperativity of domain motions by measuring atomic distances or angular differences between different domains (Valimehr et al., Int. J. Mol. Sci. 2024, 25: 3371). 

      We agree that MDSPACE is a promising and useful tool for analysis, and are excited to implement such a method. Prior to manuscript submission, we had discussions with the primary author, Slavica Jonic, about how we might employ her software in our analysis. Unfortunately, we were unable to overcome significant computational issues, notably MDSPACE’s lack of GPU functionality, which prevented us from employing MDSPACE in a reasonable manner for our dataset. We hope to employ MDSPACE in future work, once the computational issues have been addressed. In the meantime, we have added a substantial section on MDSPACE to the discussion, as we feel it is an exciting approach that deserves more visibility. The added text reads as follows:

      line 565-574

      Similarly, MDSPACE holds tremendous promise as a method for investigating conformational dynamics from cryo-EM data (61). MDSPACE integrates cryo-EM particle data with short MD simulations to fit atomic models into each particle image through an iterative process which extracts dynamic information. However, the lack of GPU-enabled processing for MDSPACE requires either a dedicated computational setup that diverges from most other cryo-EM software, or access to a CPU-based supercomputer, which severely limits the accessibility of such software. Despite these challenges, both 3DFlex and MDSPACE use promising approaches to study protein conformational dynamics. We look forward to exploring effective methods to incorporate these strategies into our future research.

      Minor points: 

      (1) Lines 348-350: "The discrepancy in population size between these clusters is likely due to bias in the initial particle poses, rather than a subunit-specific preference for the open state." Which bias? The cluster size is related to conformations, not to poses. 

      We hope to emphasize that the assignment of particles to either the OC or CO cluster is likely due to the particle orientation within the complete dimer refinement, and the discrepancy in size between OC and CO clusters does not necessarily indicate a domain specific preference for one state or another, which would carry allosteric implications. This remains a possibility, but we hope to avoid over-interpretation of our results with the statement above.

      The statement was altered to now read:

      Line 418-423

      “The discrepancy in population size between these clusters is likely due to bias in the initial particle orientation, rather than a subunit-specific preference for the open state. As the O/C state and the C/O state are 180 degree rotations of each other, particle assignment to either cluster is likely influenced by the initial particle orientation of the complete dimer, and we currently lack the data to discern any allosteric implication to the orientation assignment.”

      (2) Line 519: "Micrographs with a max CTF value worse than 4Å were removed from the dataset,..." (also, lines 822-823 in supplementary material).

      Do you want to say that micrographs with a resolution worse than 4 Å were removed?

      “Max CTF value” was replaced with “CTF fit resolution” to properly match the parameter used in cryoSPARC.

      (3) Figure 2C: The black lines are barely visible. Can you make them thicker and in red color? 

      The figure has been amended.

      (4) Figure 2D: The values for Chain A and Chain B in the second row (ACE-C) of sACE-3.05 columns are 17.9 (I) (Chain A) and 13.9 (C) (Chain B). Shouldn't they be reversed (13.9 (C) (Chain A) and 17.9 (I) (Chain B))? 

      The values are now correct. sACE-3.65 chains were flipped in the table, and the updated color scheme should make it easier to map the values from the table to their corresponding structure.

      Reviewer #2 (Recommendations for the authors): 

      The manuscript presents the first complete full-length dimeric ACE structure. The integration of cryo-EM and MD simulations provides valuable insights into ACE dynamics, showcasing the authors' commitment to exploring complex aspects of protein structure and function. This is a commendable effort, and the depth of analysis is appreciated. However, several aspects of the manuscript require further refinement to improve clarity and scientific rigor. In the view of this reviewer, a major revision is necessary. Please see the detailed comments below: 

      (1) Definition of "Conformational Heterogeneity": The term "conformational heterogeneity" should be clearly defined when citing references 27-29.

      References 27 and 29 use MD simulations, which reveal "conformational flexibility" rather than "conformational heterogeneity" as observed in cryo-EM data. A more precise distinction should be made.

      We have changed the term “conformational heterogeneity” to the broader “conformational dynamics”.

      (2) Figure Adjustments for Clarity:

      Figure 1B: A scale bar is needed for accurate representation.

      A 100 Angstrom scale bar was added to figure 1B.

      Figure 2A, B: Using a Cα trace representation would improve clarity and make structural differences more apparent. 

      We found that a Cα trace representation makes the figure too confusing: individual structural elements become impossible to distinguish, and everything turns into a jumble of lines.

      Additionally, a Cα displacement vs. residue index plot (with Figure 1A placed along the x-axis) should be included alongside Figures 2A and B to provide quantitative insight into structural variations. 

      This analysis has been combined with several other suggestions and now comprises a new figure 4.
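      For readers unfamiliar with the quantity the reviewer requests, a per-residue Cα displacement can be sketched as below. This is only an illustration, not the authors' analysis pipeline: the function name `ca_displacement` is hypothetical, and the sketch assumes the two structures have already been superposed and that their Cα coordinates are supplied in matching residue order.

```python
import numpy as np

def ca_displacement(coords_a, coords_b):
    """Per-residue C-alpha displacement between two pre-aligned structures.

    coords_a, coords_b: arrays of shape (n_residues, 3), same residue order.
    Assumption: superposition was done beforehand; no alignment happens here.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (n_residues, 3)")
    # Euclidean distance between matching C-alpha atoms, one value per residue
    return np.linalg.norm(a - b, axis=1)
```

      Plotting the returned vector against residue index gives the kind of displacement profile the reviewer describes.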

      (3) Structural Resolution and Validation:

      Euler angle distribution and 3D-FSC analysis should be provided to help the audience assess how these factors influence the resolution of each structure.

      Local resolution analysis in Relion should be included to determine if there are dynamic differences among the four structures.

      To enhance structural interpretation, the manuscript would benefit from showcasing examples of bulky side-chain densities (e.g., Trp, Phe, Tyr) for each of the four structures.

      Information is included in Figure S3 and S5.

      (4) Glycan Modeling Considerations:

      Since the resolution of cryo-EM does not allow for precise glycan composition determination, additional experimental validation (e.g., Glyco-MS) would strengthen the modeling. If experimental support is unavailable, appropriate references should be cited to justify the modeled glycans.

      Minimal glycan modeling was performed with the goal of demonstrating that the protein is glycosylated. We have highlighted in the manuscript that we chose 12 N-linked glycosylation sites showing extra density, an indication that a glycan should be present, and modeled them with complex glycans.

      (5) Advanced Cryo-EM and MD Analyses: 3DFlex Analysis:

      It is recommended that the authors explore 3DFlex to better capture conformational variability. CryoSPARC's community support can assist in proper implementation.

      We have incorporated our 3DFlex analysis in our discussion as follows:

      Line 553-565

      Surprisingly, we did not observe such motion using cryoSPARC 3DFlex, a neural network-based method analyzing our cryo-EM data of sACE (54). Central to the working of cryoSPARC 3DFlex is the generation of a tetrahedral mesh used to calculate deformations within the particle population. Proper generation of the mesh is critical for obtaining useful results and must often be determined empirically. Despite several attempts, we were unable to obtain results from 3DFlex comparable to what we observed with our other methods. Even using the results from our 3DVA as prior input to 3DFlex, the largest conformational change we observed was a slight wiggling at the bottom of the D3a subdomain (Movie S12). The authors of 3DFlex note that 3DFlex struggles to model intricate motions, and the implementation of custom tetrahedral meshes currently requires a non-cyclical fusion strategy between mesh segments. Given these limitations, and the complexity of sACE conformational dynamics, it appears that sACE, as a system, is not well-suited to analysis via 3DFlex in its current implementation.

      (6) Movie Consistency: The MD simulation movies should use the same color coding as the first four movies for consistency. Similarly, the 3DVar analysis map should be color-coded to enhance interpretability.

      MD simulation movies are re-colored.

      (7) MD Simulations - Data Extraction and Validation: The manuscript includes several long-timescale MD simulations, but further analysis is needed to extract meaningful dynamic information. Suggested analyses include:

      a. RMSF (Root Mean Square Fluctuation) Analysis: Calculate RMSF from MD trajectories and compare it with local resolution variations in cryo-EM maps.

      RMSF values were included in the new figure 4 along with structural depictions colored by RMSF value to localize variation to the structure.

      b. Assess whether regions exhibiting lower dynamics correspond to higher resolution in cryo-EM. 

      Information is added to Figure 4, Figure S3, S5, S6.

      c. Compare RMSF between simulations with and without glycans to identify potential effects. 

      This has been done in Figure 4.

      d. Clustering Analysis: Use the four solved structures as reference states to cluster MD simulation trajectories. Determine if the population states observed in MD simulations align with cryo-EM findings. 

      This has been done in supplementary figure S10.

      e. Principal Component Analysis (PCA): Perform PCA on MD trajectories and compare with dynamics inferred from cryo-EM analyses (3DVar, cryoDRGN, and RECOVAR) to ensure consistency. 

      This has been done in supplementary figure S11.

      f. Correction of RMSF Analysis or the y-axis label in Figure S9: The RMSF values cannot be negative by definition. The authors should carefully review the code used for this calculation or explicitly define the metric being measured. 

      The Y-axis label has been corrected to clarify that the plot depicts the change in RMSF values when comparing the glycosylated and non-glycosylated MD simulations.
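
      The distinction raised in point (f) can be illustrated with a minimal sketch: RMSF is a root of averaged squared deviations and so cannot be negative, whereas a difference between two RMSF profiles can. The code below is not the authors' analysis pipeline. The function names, the synthetic trajectories, and the assumption that frames are already superposed on a common reference are all illustrative.

```python
import numpy as np


def rmsf(traj):
    """Per-atom RMSF for coordinates of shape (n_frames, n_atoms, 3).

    Assumption: frames have already been superposed on a common reference,
    so only internal fluctuations remain.
    """
    traj = np.asarray(traj, dtype=float)
    mean_pos = traj.mean(axis=0)                    # average position, (n_atoms, 3)
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)   # squared deviation per frame and atom
    return np.sqrt(sq_dev.mean(axis=0))             # (n_atoms,), always >= 0


def delta_rmsf(traj_a, traj_b):
    """Signed per-atom difference RMSF(a) - RMSF(b); may be negative."""
    return rmsf(traj_a) - rmsf(traj_b)


# Synthetic stand-ins for two simulations (hypothetical data, not from the study):
# the "glycosylated" trajectory is given smaller fluctuations on purpose.
rng = np.random.default_rng(0)
base = rng.normal(size=(1, 5, 3))
traj_apo = base + rng.normal(scale=1.0, size=(200, 5, 3))
traj_glyco = base + rng.normal(scale=0.5, size=(200, 5, 3))

change = delta_rmsf(traj_glyco, traj_apo)  # negative where fluctuations decrease
```

      Plotting `change` against residue index gives a signed profile, which is why an axis labeled as a change in RMSF can legitimately dip below zero while an axis labeled RMSF cannot.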

      (8) Discussion on Coordinated Motion and Allostery: The discussion of coordinated motion and allosteric regulation between sACE-N domains should be explicitly connected to experimental evidence mentioned in the introduction: "Enzyme kinetics analysis suggests negative cooperativity between two catalytic domains (31-33). However, ACE also exhibits positive synergy toward Ab cleavage and allostery to enhance the activity of its binding partner, the bradykinin receptor (11, 34)."

      (9) The authors should elaborate on how their new insights provide a mechanistic explanation for these experimental observations. 

      (10) Connection to Therapeutic Implications: The discussion section should more explicitly connect the structural findings to potential therapeutic applications, which would significantly enhance the impact of the study.

      These three points (8-10) were addressed in a significant overhaul to the discussion section.

      In summary, this study makes a valuable contribution to the field of ACE structural biology and dynamics. The combination of cryo-EM and MD simulations is particularly powerful, and with major revisions, this manuscript has the potential to make a strong impact. Addressing the points outlined above will significantly improve clarity, strengthen the scientific claims, and enhance the manuscript's accessibility to a broader audience. I appreciate the authors' rigorous approach to this complex topic and encourage them to refine their work to fully highlight the significance of their findings. 

      Reviewer #3 (Recommendations for the authors): 

      (1) The authors incorrectly refer to their ACE construct as full-length throughout the manuscript. Given that they are purifying the soluble region (aa 1-1231), saying full-length ACE is not the correct nomenclature. I suggest removing full-length and using soluble ACE (sACE) throughout the text. 

      We utilize the term full-length to highlight the fact that our structures contain both the N and C domains for both subunits in the dimer, in contrast to the previously published ACE cryo-EM structure. We have clarified in the text that we refer to the full-length soluble region of ACE (sACE), and sACE is used to specifically refer to our construct throughout the text, except when referring to ACE in a more generalized biological context in the introduction and discussion.

      (2) The authors could show differences between the different structural states by measuring and displaying the alpha carbon distances. For example, in Figures 2A, B, 3A, and 4B and C. 

      Alpha carbon displacements for each residue have been added to the new figure 4.

      (3) Most figures, with a few exceptions (Figures 2 and S11), are of low quality. Perhaps they are not saved in the same format. In addition, the color schemes used throughout the figures and movies are not consistent. For example, in Figure 1 D2 domains are in green, while they appear yellow in Figure 2 and later. Please double-check all coloring schemes and keep them consistent throughout the manuscript. In addition, it would be good to keep the labeling of the domains in the subsequent figures, as it is difficult to remember which domain is which throughout the manuscript. 

      We are unsure how to address the low-quality issue; our files and the online versions appear to be of suitably high quality. We will work with editorial staff to ensure all files are of suitable quality. The color scheme has been revised throughout the manuscript to ensure consistency and better differentiate between domains and chains.

      (4) Figure 1. Indicate exactly where in panel A ACE-N ends and ACE-C starts. Also, the pink and magenta, as well as aqua vs. light blue, are hard to distinguish. 

      We have updated the coloring scheme.

      (5) Figure 2. In the figure legend, the use of brackets for defining closed, intermediate, and open states is confusing, given that the panels are also described with brackets, and some letters match between them. Using a hyphen or bolding the abbreviations could help. Also, define chains A and B, make the black lines that I assume indicate distances in C bold or thicker as they are very hard to see in the figure, and add to the legend what those lines mean. 

      The abbreviations have been changed from parentheses to quotes, and suggestions have been implemented.

      (6) Figure 4 is confusing as shown. Since the authors mention the general range of motion in sACE-N first in the text, wouldn't it make more sense to show panel B first and then panel A? Also, can you point and label the "tip connecting the two long helices of the D1a subdomain" in the figure? It is not clear to me where this region is in B. In addition, add a description of the arrows in B and C to the figure legend. 

      Most changes incorporated. The order should make more sense now in light of other changes.

      (7) Figure 5. Can the authors add a description to the legend as to what the arrows indicate and their thickness? 

      Done

      (8) Add a scale bar to the micrograph images in the supplementary figures. 

      Figure S2 and S4 need the scale bar.

      (9) Provide a more comprehensive description of buffers used in the DF analysis, as this information could be useful to others. 

      We have included the data in Table S1.

      (10) Line 51: Reference format not consistent with other references: (Wu et al., 2023).

      Fixed

      (11) Line 66: Define "ADAM". 

      The definition has been added.

      (12) Line 90: The authors say: "Recent open state structures of sACE-N, sACE monomer, and a sACE-N dimer, along with molecular dynamics (MD) simulations of sACE-C, have begun to reveal the conformational heterogeneity, though it remains under-studied (27-29)." Can the authors clarify what "it" refers to? The full-length ACE, sACE, or its specific domains?

      The sentence now reads: Recent open state structures of sACE-N, sACE monomer, and a sACE-N dimer, along with molecular dynamics (MD) simulations of sACE-C, have begun to reveal ACE conformational dynamics, though they remain under-studied (29-31).

      (13) Line 204: "The comparison of our dimeric sACE cryoEM structures of reveals the conformational dynamics of sACE catalytic domains." The second "of" should be removed. 

      Fixed

      (14) Line 268: "From room mean square fluctuation (RMSF) analysis..." "room" should be replaced with "root."

      Fixed

    1. eLife Assessment

      Arecchi et al. demonstrate that polarized second-harmonic generation microscopy can be used to probe the ON/OFF states of myosin in both permeabilized and intact muscle, making this key measurement accessible to a greater number of labs. This has the potential to help with the study of disease-causing mutations and our understanding of drug function. The methodology is well defined, and the results are important; however, whilst this is overall a convincing study, there are some limitations to the interpretation of the data.

    2. Reviewer #1 (Public review):

      Summary:

      This study utilizes polarized second-harmonic generation (pSHG) microscopy to investigate myosin conformation in the relaxed state, distinguishing between the disordered, actin-accessible ON state and the ordered, energy-conserving OFF state. By pharmacologically modulating the ON/OFF equilibrium with a myosin activator (2-deoxyATP) and inhibitor (Mavacamten), the authors demonstrate that pSHG can sensitively quantify the ON/OFF ratio in both skeletal and cardiac muscle. Validation with X-ray diffraction supports the accuracy of the method. Applying this approach to a hypertrophic cardiomyopathy model, the study shows that R403Q/MYH7-mutated minipigs exhibit an increased ON state fraction relative to controls. This difference is eliminated under saturating concentrations of myosin modulators, indicating that the ON/OFF balance can be pharmacologically shifted to its extremes. Additionally, ATPase assays reveal elevated resting ATPase activity in R403Q samples, which persists even when the ON state is saturated, suggesting that increased energy consumption in this mutation is driven by both a shift toward the ON state and inherently higher myosin ATPase activity.

      Strengths:

      This is a well-written and well-conducted study that clearly reveals the power of SHG microscopy. The study clearly establishes the great utility of SHG to study thick filament regulation.

      Weaknesses:

      (1) Several studies have shown that the ON state of the thick filament is sensitive to both temperature and filament lattice spacing, with a common recommendation to conduct skinned fiber experiments at temperatures above 27 °C and in the presence of dextran to better preserve physiological conditions. The authors should clarify the experimental temperature used in their skinned fiber studies, indicate whether dextran was included, and discuss whether adherence to these recommended conditions would have impacted their results.

      (2) On page 13, the authors report the proportion of disordered heads as approximately 30% in wild-type and 65% in R403Q fibers. They should clarify whether these values represent the percentage of total myosin heads, or rather the percentage of heads that are responsive to Mavacamten and dATP.

      (3) In Figure 5, regarding ATPase measurements, the content of contractile material per unit volume of muscle preparation will influence the results. Did the authors account for this variable, and if not, how might it have affected the conclusions?

      (4) For readers primarily interested in assessing the ON/OFF state of thick filaments, could the authors list the specific advantages of polarized second harmonic generation (pSHG) microscopy compared to X-ray diffraction?

      (5) Given that many data points were derived from the same fiber or myocyte, how did the authors address the risk of type I errors due to non-independence of measurements? Was a nested or hierarchical statistical approach used?
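
      The non-independence concern in point (5) can be made concrete with a small sketch. One conservative option, shown here under stated assumptions, is to collapse repeated measurements to a single mean per fiber (or per animal) before any group comparison, so that the sample size reflects independent units rather than individual data points. This is not the authors' method, and a mixed-effects model would be the fuller hierarchical alternative. The identifiers and values below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_unit_means(records):
    """Collapse (unit_id, value) pairs to one mean value per unit.

    A unit is whatever is treated as independent (fiber, myocyte, or animal).
    """
    grouped = defaultdict(list)
    for unit_id, value in records:
        grouped[unit_id].append(value)
    return {unit_id: mean(values) for unit_id, values in grouped.items()}


# Hypothetical repeated measurements: five data points from only two fibers.
measurements = [
    ("fiber_1", 1.0),
    ("fiber_1", 2.0),
    ("fiber_1", 3.0),
    ("fiber_2", 4.0),
    ("fiber_2", 6.0),
]

fiber_means = per_unit_means(measurements)
n_independent = len(fiber_means)  # 2 units, not 5 data points
```

      Any subsequent test would then be run on `fiber_means` with `n_independent` as the sample size, which trades some statistical power for protection against inflated type I error.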

    3. Reviewer #2 (Public review):

      Summary:

      In striated muscle, myosin motors can dynamically switch between an energy-conserving OFF state and an activated ON state. This switching is important for meeting the body's needs under different physiological conditions, and previous studies have shown that disease-causing mutations associated with cardiomyopathies can affect the population of these states, leading to aberrant contractility. Studying these structural states in muscle has previously only been possible via X-ray diffraction, which requires access to a beam line. Here, Arecchi et al. demonstrate that polarized second-harmonic generation microscopy (pSHG), a technique that is more accessible, can be used to probe the ON/OFF states of myosin in both permeabilized and intact muscle.

      Strengths:

      (1) There is an outstanding need in the field to better understand the regulation of the ON/OFF states of myosin. Currently, this is studied using X-ray diffraction, meaning that it is accessible to only a few labs. The authors demonstrate that pSHG can be used to probe the ON/OFF states of myosin both in intact and permeabilized muscle. This is a significant advance, since it makes it possible to study these states in a standard research laboratory.

      (2) The authors demonstrate that this approach can be employed in both skeletal and cardiac muscle. Importantly, it works with both porcine and mouse cardiac muscle, which are two of the most important animal models for preclinical studies.

      (3) The authors manipulate the ON/OFF equilibrium using both drugs and a genetic model of hypertrophic cardiomyopathy that has been shown to modulate the ON/OFF equilibrium. Their results generally agree with previous studies conducted using X-ray diffraction as well as biochemical measurements of myosin autoinhibition.

      Weaknesses:

      (1) While the application of pSHG to the ON/OFF equilibrium is an important advance, there are limited new biological insights since the perturbations used here have been extensively characterized in previous studies.

      (2) SHG has previously been applied to study the nucleotide-dependent orientation of myosin motors in the sarcomere (PMID: 20385845). The authors have previously interpreted the value of gamma as being a readout of lever arm position, but here, it is interpreted as a measure of ON/OFF equilibrium. When this technique is applied to intact muscle, it is not clear how to deconvolve the contributions of lever arm angle from the ON/OFF population (especially where there is a mix of states that give rise to the gamma value). This is an important limitation that is not discussed in the manuscript.

      (3) The R403Q mutation has previously been shown to cause an increase in ATP usage. Here, the authors measure an elevated basal ATPase rate under relaxing conditions, and they interpret this as showing increased myosin ATPase activity intrinsic to the motors; however, care should be used in interpreting these results. Work from the Spudich lab has shown that the R403Q mutation can appear as increasing motor function in some assays but depressing motor function in others (see PMID: 32284968, 26601291). Moreover, the actin-activated ATPase rate is an order of magnitude higher than the basal ATPase rate, and thus, small changes in the basal ATPase rate are unlikely to be important for physiology.

      (4) The authors interpret some of their data based on the assumption that the high concentrations of drugs cause the myosin to either adopt 100% OFF or ON states. This assumption is not validated, limiting the ability to interpret the fraction of myosins in the ON/OFF states.

      (5) The ATPase measurements are innovative but hard to interpret. dATP and ATP do not have identical ATPase kinetics, meaning that it is hard to deconvolve whether the elevated ATPase rate with dATP is due to changes in the ON/OFF population and/or intrinsic ATPase activity. Similarly, mavacamten reduces the rate of phosphate release from myosin, and this effect is not strictly coupled to the formation of the OFF state (e.g., see PMID: 40118457). As such, it is difficult to deconvolve drug-based changes in the inherent ATPase kinetics of the myosin from changes in the OFF-state population.

    4. Reviewer #3 (Public review):

      Summary:

      This is a very interesting paper extending the use of SHG to the study of relaxed muscle and its use to assess the order-disorder (and on/off) states of myosin heads in the thick filament. The work convincingly shows that SHG and the parameter gamma provide a reliable measure of the state of the myosin heads in a range of different relaxed muscle fibres, both intact and skinned, and in myofibrils. In mini pig cardiac fibres, the use of dATP and mavacamten increased or decreased the number of heads in the disordered state, respectively. On the assumption that these treatments push myosins fully into the disordered or ordered state, then this allows the fraction of ordered heads to be assessed under a wide variety of conditions. It is unfortunate that dATP treatment was not used (as mavacamten was) on rabbit psoas and mouse samples to further test this hypothesis.

      The results with the myosin mutant R403Q support the idea that this mutation reduces the fraction of myosin heads in the ordered state and that mavacamten can recover the WT situation.

      The results from SHG were compared with parallel studies using X-rays to validate the conclusions. Independent fibre ATPase data further support the conclusions.

      The work is solid and provides a novel approach to assessing the activity state of muscle thick filaments. The authors point out some of the potential uses of this approach in the future, including time-resolved SHG measurements. Indeed, jumps in mavacamten or dATP concentration with time-resolved SHG could measure the rates of entry and exit from the ordered, off state of the filament. Such a measurement is urgently needed in the field.

      Strengths:

      (1) The SHG signal is convincingly shown to assess the fraction of ordered/disordered myosin heads in the thick filament of a variety of muscle fibres.

      (2) The results are similar for rabbit psoas, mouse, and minipig cardiac fibres. Skinning the fibres and production of myofibrils do not change the SHG signal.

      (3) Use of myosin R403Q mutant in mini pig confirms a loss of ordered myosin heads, and the ordered heads can be recovered by mavacamten.

      (4) Parallel X-ray scattering and ATPase data support the conclusions.

      (5) Assuming that dATP and mavacamten generate 100% disordered vs ordered myosin heads respectively, then the percentage of ordered heads can be calculated for a variety of conditions.
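
      The calculation implied by point (5) can be sketched as a two-point calibration. The sketch below assumes that gamma varies linearly with the fraction of ordered heads between a fully ordered reference (saturating mavacamten) and a fully disordered reference (saturating dATP). Both the linearity and the completeness of the two reference states are assumptions rather than results established in the text, and the function name and numbers are hypothetical placeholders in arbitrary units.

```python
def ordered_fraction(gamma, gamma_ordered, gamma_disordered):
    """Estimate the fraction of ordered (OFF) heads from a measured gamma.

    Assumptions (not established by the source):
    - gamma_ordered corresponds to 100% ordered heads (saturating mavacamten)
    - gamma_disordered corresponds to 0% ordered heads (saturating dATP)
    - gamma changes linearly with the ordered fraction in between
    The formula works whichever reference value is the larger one.
    """
    if gamma_ordered == gamma_disordered:
        raise ValueError("reference gamma values must differ")
    fraction = (gamma_disordered - gamma) / (gamma_disordered - gamma_ordered)
    # Clamp so that measurement noise outside the references stays within [0, 1].
    return min(1.0, max(0.0, fraction))


# Placeholder values in arbitrary units, not measurements from the study.
halfway = ordered_fraction(1.5, gamma_ordered=1.0, gamma_disordered=2.0)
percent_ordered = 100.0 * halfway
```

      If either drug leaves a residual population in the opposite state, the two references no longer span 0% to 100% and every value derived this way shifts accordingly, which is the limitation the reviewers raise.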

      Weaknesses:

      (1) Issues like the effect of fibre disarray and lattice spacing on the SHG signal are not well defined.

      (2) The now well-defined heterogeneity of thick filament structure is not acknowledged.

      (3) dATP was only used on minipig cardiac fibres. The effect of dATP on rabbit psoas and mouse cardiac fibres would be a useful comparison and would help validate the calculation of % ordered heads.

    1. eLife Assessment

      This important study demonstrates that yeast populations can rapidly evolve freeze-thaw tolerance by converging on a trehalose-rich, quiescence-like state, illuminating a general physiological route to extreme-stress adaptation. The evidence is solid, combining rigorous experimental-evolution design with multi-scale phenotyping, biophysical measurements, whole-genome sequencing, and quantitative modeling that together support the mechanistic conclusions. Questions remain about the novelty relative to prior growth/stress tolerance links, the precise genetic versus non-genetic drivers of trehalose up-regulation, and the breadth of independently evolved lines. These are areas for clarification, but they do not substantially weaken the overall contribution.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript presents findings on the adaptation mechanisms of Saccharomyces cerevisiae under extreme stress conditions. The authors try to generalize this to adaptation to stress tolerance. A major finding is that S. cerevisiae evolves a quiescence-like state with high trehalose to adapt to freeze-thaw tolerance independent of their genetic background. The manuscript is comprehensive, and each of the conclusions is well supported by careful experiments.

      Strengths:

      This is excellent interdisciplinary work.

      Weaknesses:

      I have questions regarding the overall novelty of the proposal, which I would like the authors to explain.

      (1) Earlier papers have shown that loss of ribosomal proteins, which slows growth, leads to better stress tolerance in S. cerevisiae. Given this, isn't it expected that any adaptation that slows down growth would, overall, increase stress tolerance? Even for other systems, it has been shown that slowing down growth (by spore formation in yeast or bacteria, or dauer formation in C. elegans) is an effective strategy to combat stress and hence is a likely route to adaptation. The authors stress this as one of the primary findings. I would like the authors to explain their position, detailing how their findings are unexpected in the context of the literature.

      (2) Convergent evolution of traits: I find the results unsurprising. When selecting for a trait, if there is a major mode to adapt to that stress, most of the strains would adapt to that mode, independent of the route. In my view, finding out this major route was the objective of many of the previous reports on adaptive evolution. The surprising part in the previous papers (on adaptive evolution of bacteria or yeast) was the resampling of genes that acquired mutations in multiple replicates of an evolution experiment, providing a handle to understand the major genetic route or the molecular mechanism that guides the adaptation (for example, in this case it would be what guides the over-accumulation of trehalose). I fail to understand why the authors find the results surprising, and I would be happy to understand that from the authors. I may have missed something important.

      (3) Adaptive evolution would work on phenotype, as all of selective evolution is supposed to. So, given that one of the phenotypes well-known in literature to allow freeze tolerance is trehalose accumulation, I think it is not surprising that this trait is selected. For me, this is not a case of "non-genetic" adaptation as the authors point out: it is likely because perturbation of many genes can individually result in the same outcome - upregulation of trehalose accumulation. Thereby, although the adaptation is genetic, it is not homogeneous across the evolving lines - the end result is. Do the authors check that the trait is actually a non-genetic adaptation, i.e., if they regrow the cells for a few generations without the stress, the cells fall back to being similarly only partially fit to freeze-thaw cycles? Additionally, the inability to identify a network that is conserved in the sequencing does not mean that there is no regulatory pathway. A large number of cryptic pathways may exist to alter cellular metabolic states.

      This is a point in continuation of point #2, and I would like to understand what I have missed.

      (4) To propose the convergent nature, it would be important to check for independently evolved lines and most probably more than 2 lines. It is not clear from their results section if they have multiple lines that have evolved independently.

      (5) For the genomic studies, it is not clear if the authors sequenced a pool or a single colony from the evolved strains. This is an important point, since an average sequence will miss out on many mutations and only focus on the mutations inherited from a common ancestral cell. It is also not clear from the section.

    3. Reviewer #2 (Public review):

      Summary:

      The authors used experimental evolution, repeatedly subjecting Saccharomyces cerevisiae populations to rapid liquid-nitrogen freeze-thaw cycles while tracking survival, cellular biophysics, metabolite levels, and whole-genome sequence changes. Within 25 cycles, viability rose from ~2% to ~70% in all independent lines, demonstrating rapid and highly convergent adaptation despite distinct starting genotypes. Evolved cells accumulated about threefold more intracellular trehalose, adopted a quiescence-like phenotype (smaller, denser, non-budding cells), showed cytoplasmic stiffening and reduced membrane damage, and re-entered growth with shorter lag, traits that together protected them from ice-induced injury. Whole-genome sequencing indicated that multiple genetic routes can yield the same mechano-chemical survival strategy. A population model in which trehalose controls quiescence entry, growth rate, lag, and freeze-thaw survival reproduced the empirical dynamics, implicating physiological state transitions rather than specific mutations as the primary adaptive driver. The study therefore concludes that extreme-stress tolerance can evolve quickly through a convergent, trehalose-rich quiescence-like state that reinforces membrane integrity and cytoplasmic structure.

      Strengths:

      The strengths of the paper are the experimental design, data presentation and interpretation, and that it is well-written.

      Weaknesses:

      (1) While the phenotyping is thorough, a few more growth curves would be quite revealing to determine the extent of cross-stress protection. For example, comparing growth rates under YPD vs. YPEG (EtOH/glycerol), and measuring growth at 37 °C or in the presence of 0.8 M KCl.

      (2) Is GEMS integrated prior to evolution? Are the evolved cells transformable?

      (3) From the table, it looks like strains either have mutations in Ras1/2 or Vac8. Given the known requirements of Ras/PKA signaling for the G1/S checkpoint (to make sure there are enough nutrients for S phase), this seems like a pathway worth mentioning and referencing. Regarding Vac8, its emerging roles in NVJ and autophagy suggest another nutrient checkpoint, perhaps through TORC1. The common theme is rewired metabolism, which is probably influencing the carbon shuttling to trehalose synthesis.

    1. eLife Assessment

      This study reports the important development and characterization of next-generation analogs of the molecule AA263, which was previously identified for its ability to promote adaptive ER proteostasis remodeling. The evidence supporting the conclusions is convincing, with rigorous assays used to benchmark the changes in potency and efficacy of the AA263 analogs as well as AA263 targets. The ability of AA263 analogs to restore the loss of function associated with disease-associated proteins prone to misfolding will be of interest to pharmacologists, chemical biologists, and cell biologists, as well as those working on protein misfolding disorders.

    2. Reviewer #1 (Public review):

      Summary:

      This study builds off prior work that focused on the molecule AA147 and its role as an activator of the ATF6 arm of the unfolded protein response. In prior manuscripts, AA147 was shown to enter the ER, covalently modify a subset of protein disulfide isomerases (PDIs), and improve ER quality control for the disease-associated mutants of AAT and GABAA. Unsuccessful attempts to improve the potency of AA147 have led the authors to characterize a second hit from the screen in this study: the phenylhydrazone compound AA263. The focus of this study on enhancing the biological activity of the AA147 molecule is compelling, and overcomes a hurdle of the prior AA147 drug that proved difficult to modify. The study successfully identifies PDIs as a shared cellular target of AA263 and its analogs. The authors infer, based on the similar target hits previously characterized for AA147, that PDI modification accounts for a mechanism of action for AA263.

      Strengths:

      The authors are able to establish that, like AA147, AA263 covalently targets ER PDIs. The work establishes the ability to modify the AA263 molecule to create analogs with more potency and efficacy for ATF6 activation. The "next generation" analogs are able to enhance the levels of functional AAT and GABAA receptors in cellular models expressing the Z-variant of AAT or an epilepsy-associated variant of the GABAA receptor, outlining the therapeutic potential for this molecule and laying the foundation for future organism-based studies.

      Weaknesses:

      Arguably, the work does not fully support the statement provided in the abstract that the study "reveals a molecular mechanism for the activation of ATF6". The identification of targets of AA263 and its analogs is clear. However, it is a presumption that the overlap in PDIs as targets of both AA263 and AA147 means that AA263 works through the PDIs. While a likely mechanism, this conclusion would be bolstered by establishing that knockdown of the PDIs lessens drug impact with respect to ATF6 activation. Alternatively, it has previously been suggested that the cell-type dependent activity of AA263 may be traced to the presence of cell-type specific P450s that allow for the metabolic activation of AA263 or cell-type specific PDIs (Plate et al. 2016; Paxman et al. 2018). If the PDI target profile is distinct in different cell types, and these target differences correlate with ATF6-induced activity by AA263, that would also bolster the authors' conclusion.

    3. Reviewer #2 (Public review):

      Modulating the UPR by pharmacological targeting of its sensors (or regulators) provides mostly uncharted opportunities in diseases associated with protein misfolding in the secretory pathway. Spearheaded by the Kelly and Wiseman labs, ATF6 modulators were developed in previous years that act on ER PDIs as regulators of ATF6. However, hurdles in their medicinal chemistry have hampered further development. In this study, the authors provide evidence that the small molecule AA263 also targets and covalently modifies ER PDIs, with the effect of activating ATF6. Importantly, AA263 turned out to be amenable to chemical optimization while maintaining its desired activity. Building on this, the authors show that AA263 derivatives can improve the aggregation, trafficking, and function of two disease-associated mutants of secretory pathway proteins. Together, this study provides compelling evidence for AA263 (and its derivatives) being interesting modulators of ER proteostasis. Mechanistic details of its mode of action will need more attention in future studies that can now build on this.

      In detail, the authors provide strong evidence that AA263 covalently binds to ER PDIs, which will inhibit the protein disulfide isomerase activity. ER PDIs regulate ATF6, and thus their finding provides a mechanistic interpretation of AA263 activating the UPR. It should be noted, however, that AA263 shows broad protein labeling (Figure 1G), which may suggest additional targets, beyond the ones defined as MS hits in this study. Also, a further direct analysis of the IRE1 and PERK pathways (activated or not by AA263) would have been a benefit, as e.g., PDIA1, a target of AA263, directly regulates IRE1 (Yu et al., EMBOJ, 2020), and other PDIs also act on PERK and IRE1. The authors interpret modest activation of IRE1/PERK target genes (Figure 2C) as an effect on target gene overlap, indeed the most likely explanation based on their selective analyses on IRE1 (ERdj4) and PERK (CHOP) downstream genes, but direct activation due to the targeting of their PDI regulators is also a possible explanation. Further key findings of this paper are the observed improvement of AAT behavior and GABAA trafficking and function. Further strength to the mechanistic conclusion that ATF6 activation causes this could be obtained by using ATF6 inhibitors/knockouts in the presence of AA263 (as the target PDIs may directly modulate the behavior of AAT and/or GABAA). Along the same line, it also warrants further investigation why the different compounds, even if all were used at concentrations above their EC50, had different rescuing capacities on the clients.

      Together, the study now provides a strong basis for such in-depth mechanistic analyses.

    4. Reviewer #3 (Public review):

      Summary:

      This study aims to develop and characterize phenylhydrazone-based small molecules that selectively activate the ATF6 arm of the unfolded protein response by covalently modifying a subset of ER-resident PDIs. The authors identify AA263 as a lead scaffold and optimize its structure to generate analogs with improved potency and ATF6 selectivity, notably AA263-20. These compounds are shown to restore proteostasis and functional expression of disease-associated misfolded proteins in cellular models involving both secretory (AAT-Z) and membrane (GABAA receptor) proteins. The findings provide valuable chemical tools for modulating ER proteostasis and may serve as promising leads for therapeutic development targeting protein misfolding diseases.

      Strengths:

      (1) The study presents a well-defined chemical biology framework integrating proteomics, transcriptomics, and disease-relevant functional assays.

      (2) Identification and optimization of a new electrophilic scaffold (AA263) that selectively activates ATF6 represents a valuable advance in UPR-targeted pharmacology.

      (3) SAR studies are comprehensive and logically drive the development of more potent and selective analogs such as AA263-20.

      (4) Functional rescue is demonstrated in two mechanistically distinct disease models of protein misfolding (one involving a secretory protein and the other a membrane protein), underscoring the translational relevance of the approach.

      Weaknesses:

      (1) ATF6 activation is primarily inferred from reporter assays and transcriptional profiling; however, direct evidence of ATF6 cleavage is lacking.

      (2) While the mechanism involving PDI modification and ATF6 activation is plausible, it remains incompletely characterized.

      (3) No in vivo data are provided, leaving the pharmacological feasibility and bioavailability of these compounds in physiological systems unaddressed.

    1. eLife Assessment

      This article presents valuable findings on how the timing of cooling affects the timing of autumn bud set in European beech saplings. The study leverages extensive experimental data and provides an interesting conceptual framework of the various ways in which warming can affect bud set timing. The support for the findings is incomplete, though extra justifications of the experimental settings, clarifications of the interpretation of the results, and alternative statistical analyses can make the conclusions more robust.

    2. Reviewer #1 (Public review):

      Summary:

      This study provided key experimental evidence for the "Solstice-as-Phenology-Switch Hypothesis" through two temperature manipulation experiments.

      Strengths:

      The research is data-rich, particularly in exploring the effects of pre- and post-solstice cooling, as well as daytime versus nighttime cooling, on bud set timing, showcasing significant innovation. The article is well-written, logically clear, and is likely to attract a wide readership.

      Weaknesses:

      However, there are several issues that need to be addressed.

      (1) In Experiment 1, significant differences were observed in the impact of cooling in July versus August. July cooling induced a delay in bud set dates that was 3.5 times greater in late-leafing trees compared to early-leafing ones, while August cooling induced comparable advances in bud set timing in both early- and late-leafing trees. The study did not explain why the timing (July vs. August) resulted in different mechanisms. Can a link be established between phenology and photosynthetic product accumulation? Additionally, can the study differentiate between the direct warming effect and the developmental effect, and quantify their relative contributions?

      (2) The two experimental setups differed in photoperiod: one used a 13-hour photoperiod at approximately 4,300 lux, while the other used an ambient day length of 16 hours with a light intensity of around 6,900 lux. What criteria were used to select these conditions, and do they accurately represent real-world scenarios? Furthermore, as shown in Figure S1, significant differences in soil moisture content existed between treatments - could this have influenced the conclusions?

      (3) The authors investigated how changes in air temperature around the summer solstice affected primary growth cessation, but the summer solstice also marks an important transition in photoperiod. How can the influence of photoperiod be distinguished from the temperature effect in this context?

      (4) The study utilized potted trees in a controlled environment, which limits the generalization of the results to natural forests. Wild trees are subject to additional variables, such as competition and precipitation. Moreover, climate differences between years (2022 vs. 2023) were not controlled. As such, the conclusions may be overgeneralized to "all temperate tree species", as the experiment only involved potted European beech seedlings. The discussion would benefit from addressing species-specific differences.

    3. Reviewer #2 (Public review):

      In 'Developmental constraints mediate the summer solstice reversal of climate effects on European beech bud set', Rebindaine and co-authors report on two experiments on Fagus sylvatica where they manipulated temperatures of saplings between day and night and at different times of year. I enjoyed reading this paper and found it well written. I think the experiments are interesting, but I found the exact methods somewhat extreme compared to how the authors present them. Further, given that much of the experiment happened outside, I am not sure how much we can generalize from one year for each experiment, especially when conducted on one population of one species. I next expand briefly on these concerns and a few others.

      Concerns:

      (1) As I read the Results, I was surprised the authors did not give more information on the methods here. For example, they refer to the 'effect of July cooling' but never say what the cooling was. Once I read the methods, I feared they were burying this as the methods feel quite extreme given the framing of the paper. The paper is framed as explaining observational results of natural systems, but the treatments are not natural for any system in Europe that I have worked in. For example, a low of 2 °C at night and 7 °C during the day through the end of May and then 7/13 °C in July is extreme. I think these methods need to be clearly laid out for the reader so they can judge what to make of the experiment before they see the results.

      (2) I also think the control is confounded with the growth chamber experience in Experiment 1. That is, the control plants never experience any time in a chamber, but all the treatments include significant time in a chamber. The authors mention how detrimental chamber time can be to saplings (indeed, they mention an aphid problem in experiment 2), so I think they need to be more upfront about this. The study is still very valuable, but again, we may need to be more cautious in how much we infer from the results.

      (3) I suggest the authors add a figure to explain their experiments, as they are very hard to follow. Perhaps this could be added to Figure 1?

      (4) Given how much the authors extrapolate to carbon and forests, I would have liked to see some metrics related to carbon assimilation, versus just information on timing.

      (5) Fagus sylvatica is an extremely important tree to European forests, but it also has outlier responses to photoperiod and other cues (and leafs out very late), so using just this species to then state 'our results likely are generalisable across temperate tree species' seems questionable at best.

      (6) Another concern relates to measuring the end of season (EOS). It is well known that different parts of plants shut down at different times, and each metric of end of season - budset, end of radial expansion, leaf coloring, etc - relates to different things. Thus, I was surprised that the authors ignore all this complexity and seem to equate leaf coloring with budset (which can happen MONTHS before leaf coloring often) and with other metrics. The paper needs a much better connection to the physiology of end of season and a better explanation for the focus on budset. Relatedly, I was surprised that the authors cite almost none of the literature on budset, which generally suggests it is heavily controlled by photoperiod and population-level differences in photoperiod cues, meaning results may be different with a different population of plants.

      (7) I didn't fully see how the authors' results support the Solstice as Switch hypothesis, since what timing mattered seemed to depend on the timing of treatment and was not clearly related to the solstice. Could it be that these results suggest the Solstice as Switch hypothesis is actually not well supported (e.g., line 135) and instead suggest that the pattern of climate in the summer months affects end-of-season timing?

    4. Author Response:

      We would like to thank the reviewers and editors for your consideration of our manuscript, your kind comments about the value of our study, and for providing constructive feedback. We intend to submit a revised version of the manuscript and address the concerns and recommendations. This will include improvements to the statistical analyses, text content, and text format. 

      Specifically, we will:

      1. Revise the text to better explain the experimental methods, interpretation of results and how our findings are situated in the literature. Although we still believe that there is sufficient evidence to suggest that temperate tree species other than Fagus sylvatica may show similar patterns, we understand the reviewers' concerns regarding these statements and will revise them.

      2. Add a supplemental analysis of leaf chlorophyll content data to use leaf discolouration as an alternative marker of the end of the growing season. On this we would like to make two important points. Firstly, we agree with the reviewers that bud set often occurs before leaf discolouration. In experiment 1, bud set occurred on average on day-of-year (DOY) 262, onset of leaf senescence (last day when leaf chlorophyll content fell below 90% of its measured maximum) occurred on average at the same time – DOY 261, and mid-senescence (50% leaf discolouration) occurred on DOY 320. We do not agree that this excludes the combined discussion of bud set and leaf senescence timing. Whilst environmental drivers can affect parts of plants differently, often responses from different end-of-season indicators (e.g. bud set and leaf discolouration) are similar, even if only directionally. Secondly, shifts in bud set timing will remain the key focus of the manuscript as we believe it has greater physiological relevance to plant development, whereas leaf discolouration may simply follow bud set as a symptom of the completion of growth (reduced sink activity).

      3. Address points raised about potential additional drivers of our observed phenological shifts. For example, photoperiod effects and the Solstice-as-Phenology-Switch hypothesis are not mutually exclusive; the annual progression of photoperiod is fundamental to how we suggest the switch is regulated (please see L66-68 in the original manuscript). The reviewers also comment on the significant differences in soil water content between the treatment groups in Fig. S1. However, all pots were watered sufficiently to avoid water deficit and all efforts were made to minimise differences in water availability. A provisional analysis shows only one treatment pair (6 - Late_July_Extreme vs. 7 - Early_August_Moderate) had significantly different soil water content, a pair whose differences are not discussed.

    1. eLife Assessment

      This landmark study describes the structure of the human RAD51 filament with a recombination intermediate called the displacement loop (D-loop). Using cryogenic structural, biochemical, and single-molecule analyses, the authors provide compelling evidence on how the RAD51 filament promotes strand exchange between single-stranded and double-stranded DNAs. The findings are highly relevant to the fields of homologous recombination, DNA repair, and genome stability.

    2. Reviewer #1 (Public review):

      Summary:

      The paper describes the cryoEM structure of RAD51 filament on the recombination intermediate. In the RAD51 filament, the insertion of a DNA-binding loop called the L2 loop stabilizes the separation of the complementary strand for the base-pairing with an incoming ssDNA and the non-complementary strand, which is captured by the second DNA-binding channel called the site II. The molecular structure of the RAD51 filament with a recombination intermediate provides a new insight into the mechanism of homology search and strand exchange between ssDNA and dsDNA.

      Strong points:

      This is the first human RAD51 filament structure with a recombination intermediate called the D-loop. The work has been done with great care, and the results shown in the paper are compelling based on cryo-EM and biochemical analyses. The paper is really nice and important for researchers in the field of homologous recombination, which gives a new view on the molecular mechanism of RAD51-mediated homology search and strand exchange.

      Comments on revisions:

      The authors nicely address most of the previous points.

    3. Reviewer #2 (Public review):

      Homologous recombination is essential for DNA double-strand break repair, with RAD51-catalyzed strand exchange at its core. This study presents a 2.64 Å resolution cryogenic electron microscopy structure of the RAD51 D-loop complex, achieved through reconstitution of a RAD51 mini-filament. The structure uncovers how specific RAD51 residues drive strand exchange, offering atomic-level insight into the mechanics of eukaryotic HR and DNA repair.

      Comments on revisions:

      Authors acknowledged:

      "We acknowledge that there exists an extensive body of literature that has investigated the polarity of strand exchange by RecA and RAD51 under a variety of experimental conditions, and we have added a brief comment to the text to reflect this, as well as some of the key citations. Undoubtedly, and as we also mention in our reply to the public reviews, further experimental work will be needed for a full reconciliation of the available evidence."

      In the revised manuscript, this is reflected in the statement:

      "Our mechanistic interpretation of static D-loop structures awaits full reconciliation with earlier efforts to determine strand-exchange polarity for RecA and RAD51 measured under a variety of experimental conditions."

      Among the four cited studies, my understanding (as a person who has never studied this subject of polarity) is as follows:

      • References 50 (EMBO J. 1997), 51 (Cell. 1995), and 52 (Nature. 2008) suggest that the strand exchange by human RAD51 occurs with a polarity opposite to that of RecA, that is, in the 5′→3′ direction relative to the complementary strand, or 3′→5′ relative to the initiating single-stranded DNA (isDNA).

      • In contrast, reference 49 (PNAS 1998) proposed that 5′→3′ polarity (relative to isDNA) is conserved across RecA, human RAD51, and yeast RAD51.

      Given the substantial structural analysis provided in the current manuscript, it would strengthen the work to include a concise description of these earlier biochemical findings, rather than citing them without context. This would benefit readers who are not familiar with the longstanding studies in the field and allow for a more informed interpretation of how the structural observations may reconcile or contrast with previous work.

    4. Reviewer #3 (Public review):

      Summary:

      Built on their previous pioneer expertise in studying RAD51 biology, in this paper, the authors aim to capture and investigate the structural mechanism of human RAD51 filament bound with a displacement loop (D-loop), which occurs during the dynamic synaptic state of the homologous recombination (HR) strand-exchange step. As the structures of both pre- and post-synaptic RAD51 filaments were previously determined, a complex structure of RAD51 filament during strand exchange is one of the key missing pieces of information for a complete understanding of how RAD51 functions in the HR pathway. This paper aims to determine the high-resolution cryo-EM structure of RAD51 filament bound with the D-loop. Combined with mutagenesis analysis and biophysical assays, the authors aim to investigate the D-loop DNA structure, RAD51-mediated strand separation and polarity, and a working model of RAD51 during HR strand invasion in comparison with RecA.

      Strengths:

      (1) The structural work and associated biophysical assays in this paper are solid, elegantly designed and interpreted.  These results provide novel insights into RAD51's function in HR.

      (2) The DNA substrate used was well designed, taking into consideration the nucleotide number requirement of RAD51 for stable capture of donor DNA. This DNA substrate choice lays the foundation for successfully determining the structure of the RAD51 filament on D-loop DNA using single-particle cryo-EM.

      (3) The authors utilised their previous expertise in capping DNA ends using monomeric streptavidin and combined their careful data collection and processing to determine the cryo-EM structure of full-length human RAD51 bound at the D-loop in high resolution. This interesting structure forms the core part of this work and allows detailed mapping of DNA-DNA and DNA-protein interaction among RAD51, invading strands, and donor DNA arms (Figures 1, 2, 3, 4). The geometric analysis of D-loop DNA bound with RAD51 and EM density for homologous DNA pairing is also impressive (Figure S5). The previously disordered RAD51's L2-loop is now ordered and traceable in the density map and functions as a physical spacer when bound with D-loop DNA. Interestingly, the authors identified that the side chain position of F279 in the L2_loop of RAD51_H differs from other F279 residues in L2-loops of E, F and G protomers. This asymmetric binding of L2 loops and RAD51_NTD binding with donor DNA arms forms the basis of the proposed working model about the polarity of csDNA during RAD51-mediated strand exchange.

      (4) This work also includes mutagenesis analysis and biophysical experiments, especially EMSA, single-molecule fluorescence imaging using an optical tweezer, and DNA strand exchange assay, which are all suitable methods to study the key residues of RAD51 for strand exchange and D-loop formation (Figure 5).

      Weaknesses:

      (1) The proposed model for the 3'-5' polarity of RAD51-mediated strand invasion is based on the structural observations in the cryo-EM structure. This study lacks follow-up biochemical/biophysical experiments to validate the proposed model compared to RecA or developing methods to capture structures of any intermediate states with different polarity models.

      (2) The functional impact of key mutants designed based on structure has not been tested in cells to evaluate how these mutants impact the HR pathway.

      The significance of the work for the DNA repair field and beyond:

      Homologous recombination (HR) is a key pathway for repairing DNA double-strand breaks and involves multiple steps. RAD51 forms nucleoprotein filaments first with 3' overhang single-strand DNA (ssDNA), followed by a search and exchange with a homologous strand. This function serves as the basis of an accurate template-based DNA repair during HR. This research addressed a long-standing challenge of capturing RAD51 bound with the dynamic synaptic DNA and provided the first structural insight into how RAD51 performs this function. The significance of this work extends beyond the discovery biology for the DNA repair field, into its medical relevance. RAD51 is a potential drug target for inhibiting DNA repair in cancer cells to overcome drug resistance. This work offers a structural understanding of RAD51's function with the D-loop and provides new strategies for targeting RAD51 to improve cancer therapies.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):  

      Summary: 

      The paper describes the cryoEM structure of RAD51 filament on the recombination intermediate. In the RAD51 filament, the insertion of a DNA-binding loop called the L2 loop stabilizes the separation of the complementary strand for the base-pairing with an incoming ssDNA and the non-complementary strand, which is captured by the second DNA-binding channel called the site II. The molecular structure of the RAD51 filament with a recombination intermediate provides a new insight into the mechanism of homology search and strand exchange between ssDNA and dsDNA. 

      Strengths: 

      This is the first human RAD51 filament structure with a recombination intermediate called the D-loop. The work has been done with great care, and the results shown in the paper are compelling based on cryo-EM and biochemical analyses. The paper is really nice and important for researchers in the field of homologous recombination, which gives a new view on the molecular mechanism of RAD51-mediated homology search and strand exchange. 

      Weaknesses: 

      The authors need more careful text writing. Without page and line numbers, it is hard to give comments. 

      We would like to thank the reviewer for their kind words of appreciation of our work.

      Reviewer #2 (Public review):  

      Summary: 

      Homologous recombination (HR) is a critical pathway for repairing double-strand DNA breaks and ensuring genomic stability. At the core of HR is the RAD51-mediated strand-exchange process, in which the RAD51-ssDNA filament binds to homologous double-stranded DNA (dsDNA) to form a characteristic D-loop structure. While decades of biochemical, genetic, and single-molecule studies have elucidated many aspects of this mechanism, the atomic-level details of the strand-exchange process remained unresolved due to a lack of atomic-resolution structure of RAD51 D-loop complex. 

      In this study, the authors achieved this by reconstituting a RAD51 mini-filament, allowing them to solve the RAD51 D-loop complex at 2.64 Å resolution using a single particle approach. The atomic resolution structure reveals how specific residues of RAD51 facilitate the strand exchange reaction. Ultimately, this work provides unprecedented structural insight into the eukaryotic HR process and deepens the understanding of RAD51 function at the atomic level, advancing the broader knowledge of DNA repair mechanisms. 

      Strengths: 

      The authors overcame the challenge of RAD51's helical symmetry by designing a minifilament system suitable for single-particle cryo-EM, enabling them to resolve the RAD51 D-loop structure at 2.64 Å without imposed symmetry. This high resolution revealed precise roles of key residues, including F279 in Loop 2, which facilitates strand separation, and basic residues on site II that capture the displaced strand. Their findings were supported by mutagenesis, strand exchange assays, and single-molecule analysis, providing strong validation of the structural insights. 

      Weaknesses: 

      Despite the detailed structural data, some structure-based mutagenesis data interpretation lacks clarity. Additionally, the proposed 3′-to-5′ polarity of strand exchange relies on assumptions from static structural features, such as stronger binding of the 5′-arm, which are not directly supported by other experiments. This makes the directional model compelling but contradicts several well-established biochemical studies that support a 5'-to-3' polarity relative to the complementary strand (e.g., Cell 1995, PMID: 7634335; JBC 1996, PMID: 8910403; Nature 2008, PMID: 18256600).

      Overall: 

      The 2.6 Å resolution cryoEM structure of the RAD51 D-loop complex provides remarkably detailed insights into the residues involved in D-loop formation. The high-quality cryoEM density enables precise placement of each nucleotide, which is essential for interpreting the molecular interactions between RAD51 and DNA. Particularly, the structural analysis highlights specific roles for key domains, such as the N-terminal domain (NTD), in engaging the donor DNA duplex. 

      This structural interpretation is further substantiated by single-molecule fluorescence experiments using the KK39,40AA NTD mutant. The data clearly show a significant reduction in D-loop formation by the mutant compared to wild-type, supporting the proposed functional role of the NTD observed in the cryoEM model. 

      However, the strand exchange activity interpretation presented in Figure 5B could benefit from a more rigorous experimental design. The current assay measures an increase in fluorescence intensity, which depends heavily on the formation of RAD51-ssDNA filaments. As shown in Figure S6A, several mutants exhibit reduced ability to form such filaments, which could confound the interpretation of strand exchange efficiency. To address this, the assay should either: (1) normalize for equivalent levels of RAD51-ssDNA filaments across samples, or (2) compare the initial rates of fluorescence increase (i.e., the slope of the reaction curve), rather than endpoint fluorescence, to better isolate the strand exchange activity itself. 
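
      As a rough illustration of option (2) above, the following minimal sketch contrasts an initial-rate estimate with an endpoint readout. The function names, the 30 s fitting window, and the two exponential traces are hypothetical and purely synthetic; they are not taken from the manuscript's assay or data.

```python
import math

def initial_rate(times, signal, window):
    """Least-squares slope of signal vs. time, using only points with time <= window."""
    pts = [(t, y) for t, y in zip(times, signal) if t <= window]
    if len(pts) < 2:
        raise ValueError("need at least two points inside the initial window")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    if sxx == 0:
        raise ValueError("time points in the window must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    return sxy / sxx

def endpoint(signal):
    """Final fluorescence value of a trace."""
    return signal[-1]

# Synthetic traces (assumed shapes): both approach the same plateau,
# but trace_b rises three times more slowly than trace_a.
times = [10.0 * i for i in range(61)]  # 0 to 600 s in 10 s steps
trace_a = [100.0 * (1 - math.exp(-t / 100.0)) for t in times]
trace_b = [100.0 * (1 - math.exp(-t / 300.0)) for t in times]

rate_a = initial_rate(times, trace_a, window=30.0)
rate_b = initial_rate(times, trace_b, window=30.0)

print("initial-rate ratio:", rate_a / rate_b)
print("endpoint ratio:", endpoint(trace_a) / endpoint(trace_b))
```

      In this toy case the endpoint values differ only modestly while the initial slopes differ severalfold, which is the kind of distinction the reviewer's suggestion is meant to expose. The appropriate fitting window for real traces would have to be chosen from the data.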

      We agree with the reviewer that the reduced filament-forming ability of some of the RAD51 mutants complicates a straightforward interpretation of their strand-exchange assay. Interestingly, the RAD51 mutants that appear most impaired are the esDNA-capture mutants that do not contact the ssDNA in the structure of the pre-synaptic filament. However, the RAD51 NTD mutants, that display the most severe defect in strand-exchange, have a near-WT filament forming ability.

      Based on the structural features of the D-loop, the authors propose that strand pairing and exchange initiate at the 3'-end of the complementary strand in the donor DNA and proceed with a 3'-to-5' polarity. This conclusion, drawn from static structural observations, contrasts with several well-established biochemical studies that support a 5'-to-3' polarity relative to the complementary strand (e.g., Cell 1995, PMID: 7634335; JBC 1996, PMID: 8910403; Nature 2008, PMID: 18256600). While the structural model is compelling and methodologically robust, this discrepancy underscores the need for further experiments. 

      We would like to thank the reviewer for highlighting the importance of our findings to our understanding of the mechanism of homologous recombination.

      The reviewer correctly points out that the polarity of strand exchange by RecA and RAD51 is an extensively researched topic that has been characterised in several authoritative studies. In our paper, we simply describe the mechanistic insights obtained from the structural D-loop models of RAD51 (our work) and RecA (Yang et al, PMID: 33057191). The structures illustrate a very similar mechanism of D-loop formation that proceeds with opposite polarity of strand exchange for RAD51 and RecA. Comparison of the D-loop structures for RecA and RAD51 provides an attractive explanation for the opposite polarity, as caused by the different positions of their dsDNA-binding domains in the filament structure.

      We agree with the reviewer that further investigation will be needed for an adequate rationalisation of the available evidence. We will mention the relevant literature in the revised version of the manuscript.

      Reviewer #3 (Public review):  

      Summary: 

      Built on their previous pioneer expertise in studying RAD51 biology, in this paper, the authors aim to capture and investigate the structural mechanism of human RAD51 filament bound with a displacement loop (D-loop), which occurs during the dynamic synaptic state of the homologous recombination (HR) strand-exchange step. As the structures of both pre- and post-synaptic RAD51 filaments were previously determined, a complex structure of RAD51 filaments during strand exchange is one of the key missing pieces of information for a complete understanding of how RAD51 functions in the HR pathway. This paper aims to determine the high-resolution cryo-EM structure of RAD51 filament bound with the D-loop. Combined with mutagenesis analysis and biophysical assays, the authors aim to investigate the D-loop DNA structure, RAD51-mediated strand separation and polarity, and a working model of RAD51 during HR strand invasion in comparison with RecA. 

      Strengths: 

      (1) The structural work and associated biophysical assays in this paper are solid, elegantly designed, and interpreted.  These results provide novel insights into RAD51's function in HR. 

      (2) The DNA substrate used was well designed, taking into consideration the nucleotide number requirement of RAD51 for stable capture of donor DNA. This DNA substrate choice lays the foundation for successfully determining the structure of the RAD51 filament on D-loop DNA using single-particle cryo-EM. 

      (3) The authors utilised their previous expertise in capping DNA ends using monomeric streptavidin and combined their careful data collection and processing to determine the cryo-EM structure of full-length human RAD51 bound at the D-loop in high resolution. This interesting structure forms the core part of this work and allows detailed mapping of DNA-DNA and DNA-protein interaction among RAD51, invading strands, and donor DNA arms (Figures 1, 2, 3, 4). The geometric analysis of D-loop DNA bound with RAD51 and EM density for homologous DNA pairing is also impressive (Figure S5). The previously disordered RAD51's L2-loop is now ordered and traceable in the density map and functions as a physical spacer when bound with D-loop DNA. Interestingly, the authors identified that the side chain position of F279 in the L2_loop of RAD51_H differs from other F279 residues in L2-loops of E, F, and G protomers. This asymmetric binding of L2 loops and RAD51_NTD binding with donor DNA arms forms the basis of the proposed working model about the polarity of csDNA during RAD51-mediated strand exchange. 

      (4) This work also includes mutagenesis analysis and biophysical experiments, especially EMSA, single-molecule fluorescence imaging using an optical tweezer, and DNA strand exchange assay, which are all suitable methods to study the key residues of RAD51 for strand exchange and D-loop formation (Figure 5).

      Weaknesses: 

      (1) The proposed model for the 3'-5' polarity of RAD51-mediated strand invasion is based on the structural observations in the cryo-EM structure. This study lacks follow-up biochemical/biophysical experiments to validate the proposed model in comparison with RecA, or the development of methods to capture structures of any intermediate states with different polarity models.

      (2) The functional impact of key mutants designed based on structure has not been tested in cells to evaluate how these mutants impact the HR pathway. 

      The significance of the work for the DNA repair field and beyond: 

      Homologous recombination (HR) is a key pathway for repairing DNA double-strand breaks and involves multiple steps. RAD51 forms nucleoprotein filaments first with 3' overhang single-strand DNA (ssDNA), followed by a search and exchange with a homologous strand. This function serves as the basis of an accurate template-based DNA repair during HR. This research addressed a long-standing challenge of capturing RAD51 bound with the dynamic synaptic DNA and provided the first structural insight into how RAD51 performs this function. The significance of this work extends beyond the discovery of biology for the DNA repair field, into its medical relevance. RAD51 is a potential drug target for inhibiting DNA repair in cancer cells to overcome drug resistance. This work offers a structural understanding of RAD51's function with the D-loop and provides new strategies for targeting RAD51 to improve cancer therapies. 

      We thank the reviewer for their positive comments on the significance of our work. Concerning the proposed polarity of strand exchange based on our structural finding, please see our reply to the previous reviewer; we agree with the reviewer that further experimentation will be needed to reach a settled view on this.

      Testing the functional effects of the RAD51 mutants on HR in cells was not an aim of the current work but we agree that it would be a very interesting experiment, which would likely provide further important insights into the mechanism of strand exchange at the core of the HR reaction.

      Reviewer #1 (Recommendations for the authors):

      Major points:

      (1) Structural analysis showed a critical role of F279 in the L2 loop. However, the biochemical study showed that the F279A substitution did not cause a strong defect in the in vitro strand exchange, as shown in Figure 5B. Moreover, a previous study (Matsuo et al., FEBS J, 2006; ref 43) showed that human RAD51-F279A is proficient in in vitro strand exchange. These results suggest that human RAD51 F279 is not critical for strand exchange. The authors need to discuss the role of F279 or the L2 loop in RAD51-mediated reactions further in the Discussion.

      In the strand-exchange assay of Figure 5B, the F279A mutant shows the mildest phenotype, in agreement with the findings of Matsuo et al. Accordingly, in the text we describe the F279A mutant as having a “modest impact” on strand-exchange.

      We have now added a brief comment to the relevant text, pointing out that the results of the strand exchange assay for F279A are in agreement with the previous findings by Matsuo et al., and adding the reference.

      (2) In some parts, the authors cited the newest references rather than the paper describing the original findings. For RAD51 paralogs, why are these three (refs 21,22, 23) selected here? For FIGNL1, why is only one (ref 24) chosen?

      The cited publications were chosen to acquaint the reader with the latest structural and mechanistic advances about the function of some of the most important and well-studied recombination mediator proteins. For completeness, we have now added a further reference for FIGNL1 - Ito, Masaru et al, Nat Comm, 2023 – in the Introduction, to provide the reader with an additional pointer to our current knowledge about the mechanism of FIGNL1 in Homologous Recombination.

      Minor points:

      (1) Page 3, line 1 in the second paragraph, the reaction of "HR": HR should be homology search and strand exchange. HR is used incorrectly throughout the text, please check them. Remove "strand-exchange" from ATPases in line 2.

      We believe that HR is used correctly in this context, as we refer to the biochemical reactions of HR, which includes the search for homology and strand exchange.

      We have removed “strand-exchange” from ATPases in line 2, as requested by the reviewer.

      (2) Supplementary Figure 1B, C, "EMSA" experiment: Please indicate the experimental conditions in the legend: how ssDNA and dsDNA were mixed with RAD51. In (B), this is not an actual EMSA result, but rather a native gel analysis of reaction products with the D-loop. In (C), was the binding of RAD51 to the pre-formed D-loop examined? Which is correct here? Moreover, why do the authors need streptavidin in this experiment? Please explain why this is necessary for the EMSA assay. Please show in the schematic drawing where the Cy3 and Cy5 labels are on the DNAs.

      The conditions for the experiments of Supplementary figure 1B, C are reported in the Methods section.

      Panel B shows the mobility shifts of the ssDNA and dsDNA sequences in panel A, so it is appropriate to describe it as an EMSA.

      We did not examine the binding of RAD51 to a pre-formed D-loop.

      We used streptavidin in the experiment of Supplementary Figure 1C to show that streptavidin binding did not interfere with D-loop reconstitution.

      The position of the Cy3, Cy5 labels in the DNAs is reported in Table S1.

      (3) Figure S4B, page 6, line 6 from the top, 5'-arm and 3'-arm: please add them to the figure. And also, please explain what 5'-arm and 3'-arm are here in the text, as shown in lines 3-5 in the second paragraph of the same page.

      We thank the reviewer for spotting this slight incongruity. We have removed the reference to 5’- and 3’-arms of the donor DNA in the initial description of the D-loop (first paragraph of the “D-loop structure” section, 6 lines from the top), as the nomenclature for the arms of the donor DNA is introduced more appropriately in the following paragraph. Thus, there is no need to re-label Figure S4B; we note that the 5’- and 3’-labels are added to the arms of the donor DNA in Figure S4D.

      (4) Page 7, line 4, and Figure 2E, "C24": C24 should be C26 here (Figure 2D shows that position 24 in esDNA is "T").

      We thank the reviewer for spotting this typo, which is now corrected in the revised version of Figure 2 and in the text.

      (5) Page 8, line 1, K284: It would be nice to show "K284" in Figure 3F.

      We have added the side chain of K284 to Figure 3F, as suggested by the reviewer.

      (6) Page 8, second paragraph, line 3 from the bottom, "5'-arm" should be "3'-arm" for the binding of RAD51A NTD to ds DNA (Figure 4D).

      We thank the reviewer for spotting this typo, which is now corrected in the revised version of the text.

      Reviewer #2 (Recommendations for the authors):

      I understand that the strand exchange polarity of RAD51 should be opposite to that of RecA. But in the RecA manuscript (Nature 2020), it states (in the extended figure 1) "Because the mini-filament consists of fused RecA protomers, it does not reflect the effects a preferential polarity of RecA polymerization might have on the directionality of strand exchange. Also, our strand exchange reactions do not include the single-stranded DNA binding protein SSB that is involved in strand exchange in vivo and may sequester released DNA strands."

      We are aware that the findings by Yang et al, 2020 were obtained with a multi-protomeric RecA chimera and that their construct might not therefore recapitulate a potential effect of RecA polymerisation on the directionality of strand-exchange. 

      Comparison of the RecA and RAD51 D-loop structures shows that RecA and RAD51 adopt the same asymmetric mechanism of D-loop formation, which begins at one arm of the donor DNA and proceeds with donor unwinding and strand invasion until the second arm is captured, completing D-loop formation. However, the cryoEM structures provide compelling evidence that, after engagement with the donor DNA, RecA and RAD51 proceed to unwind the donor with opposite polarity; the structures provide a clear rationale for this, because of the different position of their dsDNA-binding domains relative to the ATPase domain.

      We acknowledge that there exists an extensive body of literature that has investigated the polarity of strand exchange by RecA and RAD51 under a variety of experimental conditions, and we have added a brief comment to the text to reflect this, as well as some of the key citations. Undoubtedly, and as we also mention in our reply to the public reviews, further experimental work will be needed for a full reconciliation of the available evidence.

      Reviewer #3 (Recommendations for the authors):

      (1) I have a minor comment regarding the DNA shown in the structural figures in this work. The authors have used different colours to differentiate between isDNA, esDNA, and csDNA for easier interpretation. However, these colour codes are inconsistent across Figures 1, 2, 3, and 5. This inconsistency makes it difficult to interpret which strand is which, particularly for readers unfamiliar with D-loops and strand invasion. A consistent colour scheme for the DNA strands would enhance the quality of the structural figures.

      We appreciate the reviewer’s comment about the colour scheme of the strands in the D-loop. We chose a unique colour scheme for each figure, to help the reader focus on the particular structural features that we wanted to highlight in the figure. So for instance, in figure 1D we chose to highlight the relationship (complementary vs identical) of the donor DNA strands with the invading strand; in figure 2, the emphasis is on distinguishing the homologously paired dsDNA (pink) from the exchanged strand (magenta), as a consequence of L2 loop binding; etc.

      (2) I have another comment regarding the rationale behind naming the RAD51 protomers (A to H) within the structure, which could confuse general readers if not clearly explained. In this paper, the RAD51 protomer is RAD51_A when closest to the 3' end of the isDNA. I assume the authors chose this order because HR generates a 3' ssDNA overhang before strand invasion. It would be beneficial for the introduction and results sections to mention this property of the 3' ssDNA overhang and the reasoning behind this naming strategy. This explanation will help readers understand how it differs from other naming orders used in RecA/RAD51 with ssDNA, where protomer A is closer to the 5' ssDNA.

      We thank the reviewer for their insightful comment. We chose to name as chain A the RAD51 protomer nearest to the 3’-end of the isDNA to be consistent with the naming scheme that we use for all our published RAD51 filament structures.

      (3) I have highlighted some text within this paper that has contradicting parts for authors to clarify and correct:

      "Overall, the structural features of the RAD51 D-loop provide a strong indication that strand pairing and exchange begins at the 3'-end of the complementary strand in the donor DNA and progresses with 3'-to-5' polarity (Fig. 5F)"

      "The observed 5'-to-3' polarity of strand-exchange by RAD51 is opposite to the 3'-to-5' polarity of bacterial RecA (Fig. S8), that was determined based on cryoEM structures of RecA D-loops".

      We thank the reviewer for alerting us to this inconsistency that has now been corrected in the revised manuscript.

      (4) Figure S8 last model: NTD should be CTD in the title; Figure 2B: resolution scale bar needs Å unit.

      We thank the reviewer for spotting this typo that has now been corrected in the revised version of figure S8.

      We couldn’t find a missing resolution scale bar in Figure 2B; however, we have added a missing resolution bar with Å unit to Fig. S3B.

    1. eLife Assessment

      This paper examines selection on induced epigenetic variation ("Lamarckian evolution") in response to herbivory in Arabidopsis thaliana. The authors find weak evidence for such adaptation, which contrasts with a recently published study that reported extensive heritable variation induced by the environment. The authors convincingly demonstrate that the findings of the previous study were confounded by mix-ups of genetically distinct material, so that standing genetic variation was mistaken for acquired (epigenetic) variation. Given the controversy surrounding the influence of heritable epigenetic variation on phenotypic variation and adaptation, this study is an important, clarifying contribution; it serves as a timely reminder that sequence-based verification of genetic material should be prioritized when either genetic identity or divergence is of importance to the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors extended a previous study of selective response to herbivory in Arabidopsis, in order to look specifically for selection on induced epigenetic variation ("Lamarckian evolution"). They found no evidence. In addition, they re-examined results from a previously published study arguing that environmentally induced epigenetic variation was common, and found that these findings were almost certainly artifactual.

      Strengths:

      The paper is very clearly written, there is no hype, and the methods used are state-of-the-art.

      Weaknesses:

      The result is negative, so the best you can do is put an upper bound on any effects.

      Significance:

      Claims about epigenetic inheritance and Lamarckian evolution continue to be made based on very shaky evidence. Convincing negative results are therefore important. In addition, the study presents results that, to this reviewer, suggest that the 2024 paper by Lin et al. [26] should probably be retracted.

    3. Reviewer #2 (Public review):

      In this paper, the authors examine the extent to which epigenetic variation acquired during a selection treatment (as opposed to standing epigenetic variation) can contribute to adaptation in Arabidopsis. They find weak evidence for such adaptation and few differences in DNA methylation between experimental groups, which contrasts with another recent study (reference 26) that reported extensive heritable variation in response to the environment. The authors convincingly demonstrate that the conclusions of the previous study were caused by experimental error, so that standing genetic variation was mistaken for acquired (epigenetic) variation. Given the controversy surrounding the possible role of epigenetic variation in mediating phenotypic variation and adaptation, this is an important, clarifying contribution.

      [Editors' note: We thank the authors for responding to the reviewers' comments.]

    4. Author Response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public Review):

      Summary:

      The authors extended a previous study of selective response to herbivory in Arabidopsis, in order to look specifically for selection on induced epigenetic variation ("Lamarckian evolution"). They found no evidence. In addition, they re-examined results from a previously published study arguing that environmentally induced epigenetic variation was common, and found that these findings were almost certainly artifactual.

      Strengths:

      The paper is very clearly written, there is no hype, and the methods used are state-of-the-art.

      Weaknesses:

      The result is negative, so the best you can do is put an upper bound on any effects.

      Significance:

      Claims about epigenetic inheritance and Lamarckian evolution continue to be made based on very shaky evidence. Convincing negative results are therefore important. In addition, the study presents results that, to this reviewer, suggest that the 2024 paper by Lin et al. [26] should probably be retracted.

      Reviewer #2 (Public Review):

      In this paper, the authors examine the extent to which epigenetic variation acquired during a selection treatment (as opposed to standing epigenetic variation) can contribute to adaptation in Arabidopsis. They find weak evidence for such adaptation and few differences in DNA methylation between experimental groups, which contrasts with another recent study (reference 26) that reported extensive heritable variation in response to the environment. The authors convincingly demonstrate that the conclusions of the previous study were caused by experimental error, so that standing genetic variation was mistaken for acquired (epigenetic) variation. Given the controversy surrounding the possible role of epigenetic variation in mediating phenotypic variation and adaptation, this is an important, clarifying contribution.

      I have a few specific comments about the analysis of DNA methylation:

      (1) The authors group their methylation analysis by sequence context (CG, CHG, CHH). I feel this is insufficient, because CG methylation can appear in two distinct forms: gene body methylation (gbM), which is CG-only methylation within genes, and transposable element (TE) and TE-like methylation (teM), which typically involves all sequence contexts and generally affects TEs, but can also be found within genes. GbM and teM have distinct epigenetic dynamics, and it is hard to know how methylation patterns are changing during the experiment if gbM and teM are mixed. This can also have downstream consequences (see point below).

      We thank Reviewer 2 for this suggestion. We usually separate the three contexts because they are set by different enzymes and not because of the general process or specific function. It would indeed be informative to group DMCs into gbM and teM, but as there are many regions with overlaps between genes and transposons, this also adds some complexity. Given that there were very few DMCs, we wanted to keep it simple. Therefore, we wrote that 87.3% of the DMCs were close to or within genes and that 98.1% were close to and within genes or transposons. Together with the clear overrepresentation of the CG context, this indicates that most of the DMCs were related to gbM. We updated the paragraph and specifically referred to gbM to make this point clearer.

      (2) For GO analysis, the authors use all annotated genes as a control. However, most of the methylation differences they observe are likely gbM, and gbM genes are not representative of all genes. The authors' results might therefore be explained purely as a consequence of analyzing gbM genes, and not an enrichment of methylation changes in any particular GO group.

      We are grateful to Reviewer #2 for this suggestion. We updated the GO analysis and defined the background as genes with cytosines that we tested for differences in methylation and which also exhibited overall at least 10% methylation (i.e., one cytosine per gene was sufficient). This resulted in a decrease of the background gene set from 34'615 to 18'315 genes. We still detect enrichment of terms related to epigenetic regulation, transport and growth processes. We have updated the corresponding paragraph accordingly.
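
      To illustrate why the choice of background matters here, the following is a minimal sketch of a one-sided hypergeometric enrichment test, not the analysis pipeline used in the study. Only the two background sizes (34,615 and 18,315 genes) come from the response above; the GO term size, the number of genes with DMCs, and the overlap are hypothetical values chosen for illustration, and the sketch assumes for simplicity that all genes of the term remain in the restricted background.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    background: number of genes in the background set
    annotated:  background genes carrying the GO term
    selected:   genes of interest (e.g. genes with DMCs)
    overlap:    genes of interest that carry the GO term
    """
    upper = min(annotated, selected)
    total = comb(background, selected)
    tail = sum(
        comb(annotated, k) * comb(background - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: a GO term with 400 genes, 200 genes with DMCs,
# 12 of which carry the term.
TERM_GENES, DMC_GENES, OVERLAP = 400, 200, 12

p_all_genes = hypergeom_enrichment_p(34615, TERM_GENES, DMC_GENES, OVERLAP)
p_methylated = hypergeom_enrichment_p(18315, TERM_GENES, DMC_GENES, OVERLAP)

print(f"background = all annotated genes:   p = {p_all_genes:.3g}")
print(f"background = tested, methylated set: p = {p_methylated:.3g}")
```

      With identical counts, the smaller background raises the expected overlap and therefore gives a larger (more conservative) p-value, which is why enrichment that persists against the restricted background is the more informative result.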

      Reviewer #1 (Recommendations for The Authors):

      This paper is very clearly written and could be published as-is. The writing could be improved in a few places, for example:

      "We realized that in this recent study (26), potential errors may have confounded treatments with genetic variation. This is because in that study, Lin and colleagues kept lineages 1-to-1 throughout the experiment by single-seed descent."

      “This” in the second sentence seems to refer to the confounding, not your realization thereof.

      I am sure there are more: just give the manuscript a good read-through.

      We thank the Reviewer for pointing out that some sentences may not be clear. We have edited the manuscript and focused on avoiding misleading or unclear wording.

      Reviewer #2 (Recommendations for The Authors):

      (1) The authors should distinguish gbM from teM and repeat the GO term analysis with an appropriate set of control genes.

      See our response to the public reviews above.

      (2) The authors' experimental design should allow them to directly assess whether the rates of epigenetic change are affected by the selective environment. This would require comparison of methylation patterns of individual plants prior to treatment with their progeny (the progeny is what the authors have currently analyzed). This would entail gathering new data, and I don't feel that this analysis is essential, but given the question the authors are addressing (the extent to which a selective environment can induce heritable epigenetic variation), it seems important to test whether the rates of epigenetic change are at all affected by the selection treatment.

      While this is a very valuable recommendation, we cannot currently address it because the person who gathered the data now works at a different university. However, we will keep this in mind for future projects.

      Again, we would like to thank the reviewers for the constructive suggestions that help us to improve the manuscript.

    1. eLife Assessment

      This useful study presents a real-time transcriptomics analysis, with the aim of providing rapid access to sequenced data to reduce the costs associated with Oxford Nanopore long-read technology. The revised manuscript demonstrates the utilities with four sets of experiments with convincing evidence.

    2. Reviewer #2 (Public review):

      Summary:

      Transcriptomics technologies play crucial roles in biological research. Technologies based on second-generation sequencing, such as Illumina RNA-seq, encounter significant challenges due to the short reads, particularly in isoform analysis. In contrast, third-generation sequencing technologies overcome the limitation by providing long reads, but they are much more expensive. The authors present a useful real-time strategy to minimize the cost of RNA sequencing with Oxford Nanopore Technologies (ONT). The revised manuscript demonstrates the utilities with four sets of experiments with convincing evidence: (1) comparison between two cell lines; (2) comparison of RNA preparation procedures; (3) comparison between heat-shock and control conditions; (4) comparison of genetically modified yeast strains. The strategy will probably guide biologists to conduct transcriptomics studies with ONT in a fast and cost-effective way, benefiting both fundamental research and clinical applications.

      Strengths:

      The authors have recently developed a computational tool called NanopoReaTA to perform real-time analysis when cDNA/RNA samples are sequencing with ONT (Wierczeiko et al., 2023). The advantage of real-time analysis is that sequencing can be terminated once sufficient data has been collected to save cost. In this study, the authors demonstrate how to perform comprehensive quality control during sequencing. Their results indicate that the real-time strategy is effective across different species and RNA preparation methods. The revised manuscript addresses most of the major and minor limitations identified in the previous version, including: (1) explicitly detailing the methodology for isoform analysis and presenting the corresponding results; (2) increasing sample sizes and providing a clear explanation of related considerations; (3) clarifying the issue of sequential analysis; and (4) incorporating a new heat-shock experiment that better reflects real-world biological research.

      Weaknesses:

      A key advantage of RNA sequencing using ONT is its ability to facilitate isoform analysis. The primary strength of real-time analysis lies in its potential to reduce costs for researchers while enabling significant biological discoveries related to isoforms. Although the authors explicitly describe their approach to isoform analysis and introduce a new experiment in the revised manuscript, the study still lacks a concrete example that clearly demonstrates the substantial impact of their tool and strategy. While such an example may be beyond the intended scope of the current work, its absence limits the assessment of the significance of the findings, because the evaluation of a methodological approach ultimately depends on the additional scientific value it provides in research. It is possible that the full potential of this tool will be demonstrated in future studies by the authors or other researchers.

      Furthermore, while the tool integrates a set of state-of-the-art methods, it does not introduce any novel methods. Consequently, the strength of evidence can be raised to "convincing".

    3. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this study, the authors developed three case studies:

      (1) transcriptome profiling of two human cell cultures (HEK293 and HeLa)

      (2) identification of experimentally enriched transcripts in cell culture (RiboMinus and RiboPlus treatments)

      (3) identification of experimentally manipulated genes in yeast strains (gene knockouts or strains transformed with plasmids containing the deleted gene for overexpression).

      Sequencing was performed using Oxford Nanopore Technologies (ONT), the only technology that allows for real-time analysis. The real-time transcriptomic analysis was performed using NanopoReaTA, a recent toolbox for comparative transcriptional analyses of Nanopore-seq data, developed by the group (Wierczeiko and Pastore et al. 2023). The authors aimed to show the use of their tool on data generated by ONT, demonstrating its versatility and the possibility of cost reduction, since ONT sequencing can be stopped at any time once enough data have been collected.

      Strengths: 

      Given that Oxford Nanopore Technologies offers real-time sequencing, it is extremely useful to develop tools that allow real-time data analysis in parallel with data generation. The authors demonstrated that this strategy is possible for both human cell lines and yeasts in the case studies presented. It is a useful strategy for the scientific community, and it has the potential to be integrated into clinical applications for rapid and cost-effective quality checks in specific experiments such as overexpression of genes.

      Weaknesses:

      In relation to the RNA-Seq analyses, for a proper statistical analysis, a greater number of replicates should have been performed. The experiments were conducted with a minimal number of replicates (2 replicates for case study 1 and 2 and 3 replicates for case study 3).

      We have addressed this issue by performing two new sets of experiments: a similar HEK293 vs HeLa comparison with 10 replicates per condition, and a heat-shocked vs non-heat-shocked comparison with 6 replicates per condition. For the HEK293 vs HeLa comparison, we kept the analysis with 2 replicates per condition to demonstrate the effect of a limited number of replicates, simulating an early-stage evaluation of the experimental approach to obtain valuable quality control metrics. Nevertheless, we show that relevant and reproducible data can be obtained even with a lower number of replicates (2 per condition), compared to a higher number (10 per condition), across both PromethION and MinION sequencing platforms.

      Regarding the experimental part, some problems were observed in the conversion to double-stranded and loading for Nanopore-Seq, which were detailed in Supplementary Material 2. This fact is probably reflected in the results where a reduction in the overall sequencing throughput and detected gene number for HEK293 compared to HeLa were observed (data presented in Supplementary Figure 2). It is necessary to use similar quantities of RNA/cDNA since the sequencing occurs in real-time. The authors should have standardized the experimental conditions to proceed with the sequencing and perform the analyses.

      We completely agree with the reviewer. In the 10-replicate HEK vs HeLa experiment, we collected similar data to what was presented in Supplementary Material 2. We chose to include this information to highlight the experimental variability that can arise during Nanopore-seq library preparation, particularly with cDNA synthesis. This type of information is not often highlighted in Nanopore-based studies, yet it is crucial to be aware of such differences. Despite these variations, we identified a consistent set of DEGs across comparisons of low versus high replicate numbers. Importantly, NanopoReaTA successfully provided real-time monitoring (e.g. detected number of genes per replicate/condition), which allows for informed decision-making regarding the next steps in sequencing-based experiments.

      Reviewer #2 (Public Review):

      Transcriptomics technologies play important roles in biological studies. Technologies based on second-generation sequencing, such as mRNA-seq, face some serious obstacles, including isoform analysis, due to short read length. Third-generation sequencing technologies perfectly solve these problems by having long reads, but they are much more expensive. The authors presented a useful real-time strategy to minimize the cost of sequencing with Oxford Nanopore Technologies (ONT). The authors performed three sets of experiments to illustrate the utility of the real-time strategy. However, due to the problems in experimental design and analysis, their aims are not completely achieved. If the authors can significantly improve the experiments and analysis, the strategy they proposed will guide biologists to conduct transcriptomics studies with ONT in a fast and cost-effective way and help studies in both basic research and clinical applications.

      Strengths:

      The authors have recently developed a computational tool called NanopoReaTA to perform real-time analysis when cDNA/RNA samples are sequenced with ONT (Wierczeiko et al., 2023). The advantage of real-time analysis is that the sequencing can be stopped once enough data is collected to save cost. Here, they described three sets of experiments: a comparison between two human cell lines, a comparison among RNA preparation procedures, and a comparison between genetically modified yeasts. Their results show that the real-time strategy works for different species and different RNA preparation methods.

      Weaknesses:

      However, especially considering that the computational tool NanopoReaTA is their previous work, the authors should present more helpful guidelines to perform real-time ONT analysis and more advanced analysis methods. There are four major weaknesses:

      (1) For all three sets of experiments, the authors focused on sample clustering and gene-level differential expression analysis (DEA), and did only limited analysis at the isoform level, with none shown in any main-text figure. Sample clustering and gene-level DEA can be easily and well done using mRNA-seq at a much cheaper cost. Even for initial data quality checking, mRNA-seq can first be done on Illumina MiSeq/NextSeq, which is quick, before deep sequencing on HiSeq/NovaSeq. The real power of third-generation RNA sequencing is the isoform analysis due to the long read length. At least for now, PacBio Iso-seq is very expensive and one cannot analyze the data in real-time. Thus, the authors should focus on the real-time isoform analysis of ONT to show the advantages.

      We are aware that isoform analysis is one of the powers of real-time monitoring of long-read data, especially with Nanopore-seq. That is why we have included pipelines such as DRIMSeq and DEXSeq, which can provide valuable information about differential transcript usage (i.e. isoforms). However, interpreting the results in a biologically meaningful context, particularly regarding the role of specific isoforms, remains challenging. This is especially relevant as our main goal is to demonstrate NanopoReaTA's utility as a real-time transcriptomic tool that offers valuable quality control and meaningful insights. Nevertheless, in the heat-shock experiments, we have identified one isoform that was differentially expressed and included it in the main figure. We hope that, with the right experimental setup, users can use the incorporated tools for meaningful isoform identification analyses.

      (2) The sample sizes are too small in all three sets of experiments: only two for sets 1 and 2, and three for set 3. For DEA, three is the minimal number for proper statistics. But a sample size of three always leads to very poor power. Nowadays, a proper transcriptomics study usually has a larger sample size. Besides the power issue, biological samples always contain many outliers for many reasons. It is crucial to show whether the real-time analysis also works for larger sample sizes, such as 10, i.e., 20 samples in total. Will the performance still hold as the sample number increases? What is the maximum sample number for an ONT run? If the samples need to be split into multiple runs, how will the real-time analysis be adjusted? These questions are quite useful for researchers who plan to use ONT.

      We thank the reviewer for their suggestion. We performed the suggested experiment in the HEK293 vs HeLa comparison, taking 10 replicates per condition, and acquired the data during sequencing. As you can see in the results (Figure 2), the performance held very well, from the first hour up until the 24-hour mark. In theory, the maximum number of barcodes that can be integrated in a sequencing run can be used for the pairwise comparison. We are using the 24-barcode kit (provided by ONT), so we can include up to 12 replicates per condition. We are aware that there is a 96-barcode kit that could be used as well. However, it is important to note that the more samples are integrated in the sequencing run, the fewer reads will be generated per sample. Therefore, it is important to plan the number of replicates used per sequencing run properly.
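      The trade-off between replicate number and per-sample depth described above can be sketched with simple arithmetic. This is a minimal illustration only: the function name `plan_run` and the run yield of 10 million reads are hypothetical, and reads are assumed to be spread evenly across barcodes, which real runs rarely achieve.

```python
def plan_run(n_barcodes, expected_total_reads, n_conditions=2):
    """Return (replicates per condition, approximate reads per sample).

    Assumes one barcode per sample and an even split of reads across
    barcodes; both are simplifying assumptions for illustration.
    """
    replicates = n_barcodes // n_conditions
    n_samples = replicates * n_conditions
    reads_per_sample = expected_total_reads // n_samples
    return replicates, reads_per_sample


# Hypothetical run yield of 10 million reads, pairwise comparison.
for kit_size in (24, 96):
    replicates, reads = plan_run(kit_size, 10_000_000)
    print(f"{kit_size} barcodes: {replicates} replicates/condition, "
          f"~{reads} reads/sample")
```

      Under these assumptions, moving from 24 to 96 barcodes quadruples the replicate number while cutting the expected depth per sample to a quarter.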

      (3) According to the manuscript, real-time analysis checks the sequencing data at a few time points. In statistics this is usually called sequential analysis or interim analysis, and it is commonly performed in clinical trials to save cost. Care must be taken while performing these analyses, as repeated checks on the data can inflate the type I error rate. Thus, the authors should develop a sequential analysis procedure for real-time RNA sequencing.

      We would like to respond to this comment by addressing two points: 1) Quality control: During the analysis we offer two main statistics, which enable scientists to assess the experimental development. For each iteration the change in relative gene counts per sample is computed to assess the convergence towards 0. Moreover, for each iteration the number of detected genes per sample is computed to assess whether the number of detected reads is saturated. These metrics allow the user to independently assess whether samples within the experimental development reach a stable state, to reveal a meaningful timepoint of data evaluation. 

      2) Sequential analysis: One solution to lower the type I error rate during sequential analysis is to use the Pocock boundary, which systematically lowers the p-value threshold depending on the number of interim analyses. NanopoReaTA offers a custom choice of the p-value threshold during the analysis. This allows researchers to set their parameters as needed.
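      To make the concern about repeated looks concrete, the following is a minimal simulation sketch, not part of NanopoReaTA. It assumes a single two-sided z-test on accumulating null data with equally spaced looks, which is far simpler than a genome-wide differential expression analysis. The nominal thresholds are the standard tabulated Pocock values for an overall two-sided alpha of 0.05; the function name and simulation settings are hypothetical.

```python
import random
from statistics import NormalDist

# Tabulated Pocock nominal two-sided p-value thresholds for an overall
# alpha of 0.05 with K equally spaced looks.
POCOCK_NOMINAL_P = {1: 0.05, 2: 0.0294, 3: 0.0221, 4: 0.0182, 5: 0.0158}


def false_positive_rate(n_looks, p_threshold, n_sim=20000, seed=0):
    """Estimate P(at least one 'significant' look) when the null is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - p_threshold / 2)
    hits = 0
    for _ in range(n_sim):
        running_sum = 0.0
        for look in range(1, n_looks + 1):
            # Each look adds one equally sized batch of null data.
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / look ** 0.5  # standard normal under the null
            if abs(z) > z_crit:
                hits += 1
                break  # stop at the first rejection, as in a real run
    return hits / n_sim


naive = false_positive_rate(5, 0.05)
pocock = false_positive_rate(5, POCOCK_NOMINAL_P[5])
print(f"5 looks, p < 0.05 at each look:   {naive:.3f}")
print(f"5 looks, p < 0.0158 at each look: {pocock:.3f}")
```

      In this toy setting, checking five times at an unadjusted threshold rejects a true null far more often than 5% of the time, whereas the Pocock threshold keeps the overall rate near the nominal level. In practice the threshold would be chosen from the number of planned checkpoints before sequencing starts.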

      (4) The experimental set 1 (comparison between two completely different human cell lines) and experimental set 2 (comparison among RNA preparation procedures) are not quite biologically meaningful. If it is possible, it is better for the authors to perform an experiment more similar to a real situation for biological discovery. Then the manuscript can attract more researchers to follow its guidelines.

      We took the suggestion of reviewer 2 (from the recommendations for authors) to perform a heat-shock experiment comparing heat-shocked and non-heat-shocked cells from the same cell line (HEK293). We sequenced the samples (6 replicates per condition) and, one hour post-sequencing initiation, had already identified three DEGs (HSPA1A, DNAJB1, and HSP90AA1) known to be upregulated in heat shock conditions (Yonezawa and Bono 2023, Sanchez-Briñas et al. 2023). Therefore, we illustrate how NanopoReaTA can capture biologically relevant insights in real time.

      Reviewer #1 (Recommendations for The Authors):

      (1) The comparison between two different human cell lines doesn't have much biological relevance. It would be more interesting and useful to evaluate the genes and transcripts expressed from the same cell in different conditions.

      As mentioned previously, we conducted a heat-shock experiment comparing heat-shocked and non-heat-shocked cells within the same cell line (HEK293). We observed reliable results within one hour of initiating the sequencing.

      (2) Increase the number of replicates to give greater confidence in the results.

      We have addressed the replicate issue by performing two new sets of experiments: HEK293 vs HeLa with 10 replicates per condition, and heat-shocked vs non-heat-shocked with 6 replicates per condition. In both cases, we obtained reliable and reproducible results (even when comparing with a lower replicate number).

      (3) One of the advantages of performing Nanopore sequencing is the possibility of sequencing RNA molecules directly. It would be interesting to test the real-time analysis strategy in parallel using direct RNA sequencing if it is possible.

      That is a great point. In theory, it would be possible to perform real-time differential gene expression analysis on direct RNA data (since the pipeline for such analysis is already integrated in NanopoReaTA); however, the limiting factor is the lack of multiplexing. To perform real-time transcriptomic analysis with direct RNA-seq data, one would need to sequence at least 4 flow cells (MinION or PromethION), each containing one sample (2 flow cells per condition to perform pairwise transcriptomic analyses). Although such an analysis is possible, it would not be cost-effective, as it would significantly increase the costs for the amount of data gathered. We are aware that ONT is planning to release a multiplexing option for direct RNA-seq in the future. We have integrated the option of direct RNA-seq analyses so that, once such an option is available, users will be able to perform real-time transcriptomic analysis with dRNA-seq data.

      Some minor weaknesses are below:

      (4) With respect to the text as a whole, the authors should be more careful with standardization, such as mL/ml and uL/ul, Ribominus/RiboMinus.

      We have standardized the nomenclature to µL, mL and Ribominus (due to trademark).  

      (5) Set up paragraphs on page 9 and throughout the text when necessary.

      We have set the suggested paragraphs on page 9 and throughout the text.

      (6) Please, check the word form in the sentence: "To isolate the RNA form the RiboMinus™ supernatant.."

      The word has been corrected.

      (7) In order to make clear to the reader at the outset, I suggest including in the methodology how many biological replicates were performed for each cell type studied (cell lines and yeast strains).

      For the cell lines, we have now included the number of replicates used. We have also included this for the yeast setups.

      (8) Please, check the Supplementary Tables as the word VERDADEIRO has not been translated (TRUE) in Supplementary Table 1.

      This issue appears to be influenced by the language settings configured on the viewer's computer.

      (9) On page 17, I suggest including the absorbance used to measure RNA concentration in HEK293 and HeLa cell lines. Also, I suggest including how the quality of the RNA extracted from the cell cultures and yeast strains was determined. Was the ratio 260/280 and 260/230 calculated? Given that the material was extracted with Trizol, which has phenol and chloroform in its composition, it would be important to evaluate the quality of the RNA, especially by calculating the 260/230 ratio.

      We have included a statement regarding the concentrations and quality of RNA in the “RNA isolation” section within the material and methods.

      (10) On page 18, the topic of Selective purification of ribosomal-depleted (RiboMinus) and ribosomal-enriched (RiboPlus) transcripts needs to be better detailed, especially in the last two sentences. For example: "The pooled bead samples (containing the rRNA) were further processed with Trizol RNA isolation to complete the purification." This sentence should be detailed to make it clear that this procedure is what you call ribosomal-enriched (RiboPlus).

      Qualitative analysis of the material was performed after rRNA depletion and enrichment.

      We have made these sentences clearer.

      (9) On the topic of Direct cDNA-native barcoding Nanopore library preparation and sequencing, in the following sentences: "Concentration determination (1 μl) and adapter ligation using 5 μL NA, 10 μL NEBNext Quick Ligation Reaction Buffer (5X), and 5 μL Quick T4 DNA Ligase (NEB, cat # E6056) were performed. Pooled library purification with 0.7X AMPure XP Beads resulted in a final elution volume of 33 μl EB. Concentration of the pooled barcoded library was determined using Qubit (1 μl)."

      Two concentration determinations were performed, before and after adapter ligation. I suggest writing one sentence for concentration determination and another for adapter ligation.

      We applied the reviewer’s suggestion. 

      (11) In the section Experimental Design in Results, the first sentences are part of the methodology and are described in materials and methods. I suggest removing it from the results and rewriting the text. Results of the RNA extraction methodology and library preparation were shown in supplementary material. Thus, the authors could mention that the results were presented in supplementary material.

      We have revised this section to remove the details of RNA extraction and library preparation, focusing instead on the pipeline and experimental setups. The methodology is outlined in Figure 1, as well as in the materials and methods and the supplementary figures for each experimental setup.

      Reviewer #2 (Recommendations For The Authors):

      For major weakness 4 described in the Public Review, the authors could try experiments like:

      (1) comparison between females and males of tissues or primary cells; or

      (2) comparison between cell lines before and after heat shock.

      They are easy to perform and much more similar to real experimental designs for discovery, and the authors may actually have some new findings because usually people do not do much investigation on the isoform level using mRNA-seq.

      We thank the reviewer for their suggestions. We performed the heat-shock experiment comparing heat-shocked and non-heat-shocked cells from the same cell line (HEK293). We sequenced the samples (6 replicates per condition) and, one hour post-sequencing initiation, had already identified three DEGs (HSPA1A, DNAJB1, and HSP90AA1) reported to be upregulated in heat shock conditions (Yonezawa and Bono 2023, Sanchez-Briñas et al. 2023). We have identified one differentially expressed isoform and included it in the main figure.

      There are two minor weaknesses:

      (1) Many figure numbers in the main text are wrong, including:

      Page 4, "similarity plot and principal component analysis (PCA) (Figure 1B, 1C)";

      Page 7, "same intervals as mentioned earlier (Figure 1A)", and "Next, we inspected the PCA and dissimilarity plots (Figure 2B";

      Page 10, "process (Supplementary Figure 19A) until the 24-hour PSI mark point (Figure 9B", and "NEW1 was the sole differentially expressed gene (Figure 9D)".

      The authors should be more careful about this. It is very confusing for readers.

      We have addressed these points in the text. 

      (2) The texts in the figures are too small to recognize, especially in Figures 4 and 5. The reason is that there are too many sub-figures in one figure. Is that really necessary to put more than 20 sub-figures in one? The authors should better summarize their results. For example, remove sub-figures with little information; do not show figures with the same styles again and again in the main text and just summarize them instead.

      We thank the reviewer for the suggestion. We have updated the figure to focus on the most relevant comparisons (new1Δ-pEV vs. WT-pEV and rkr1Δ-pEV vs. WT-pEV), providing a clearer and more realistic comparison between mutant and wild-type conditions in the main figure. Additionally, a summary and all related comparisons are included in Supplementary Documents S4 and S5. We believe these supplementary figures are essential to demonstrate NanopoReaTA's capabilities as a quality control tool, effectively detecting expected transcriptomic alterations in real-time.

    1. eLife Assessment

      This useful study uses brain stimulation and electroencephalography to study speech-gesture integration. It investigates the role of frontotemporal regions in integrating linguistic and extra-linguistic information during communication, focusing on the inferior frontal gyrus and posterior middle temporal gyrus. Reliance on activation patterns of tightly-coupled brain regions over short timescales leads to incomplete support for the study's conclusions due to conceptual and methodological limitations.

    2. Reviewer #1 (Public review):

      Summary:

      The authors quantified information in gesture and speech, and investigated the neural processing of speech and gestures in pMTG and LIFG, depending on their informational content, in 8 different time-windows, and using three different methods (EEG, HD-tDCS and TMS). They found that there is a time-sensitive and staged progression of neural engagement that is correlated with the informational content of the signal (speech/gesture).

      Strengths:

      A strength of the paper is that the authors attempted to combine three different methods to investigate speech-gesture processing.

      Comments on revisions:

      I thank the authors for their careful responses to my comments. However, I remain unconvinced by their argumentation regarding the specificity of their spatial targeting and the time-windows that they used.

      I do not believe the authors have adequately demonstrated the spatial and temporal specificity required to disentangle the contributions of the IFG and pMTG during the gesture-speech integration process. While the authors have made a sincere effort to address the concerns raised by the reviewers, and have done so with a lot of new analyses, I remain doubtful that the current methodological approach is sufficient to draw conclusions about the causal roles of the IFG and pMTG in gesture-speech integration.

    3. Reviewer #2 (Public review):

      Summary

      The study is an innovative and fundamental study that clarified important aspects of brain processes for integration of information from speech and iconic gesture (i.e., gesture that depicts action, movement, and shape), based on tDCS, TMS and EEG experiments. They evaluated their speech and gesture stimuli in information-theoretic ways and calculated how informative speech is (i.e., entropy), how informative gesture is, and how much shared information speech and gesture encode. The tDCS and TMS studies found that the left IFG and pMTG, the two areas that were activated in fMRI studies on speech-gesture integration in the previous literature, are causally implicated in speech-gesture integration. The sizes of the tDCS and TMS effects are correlated with entropy of the stimuli or mutual information, which indicates that the effects stem from the modulation of information decoding/integration processes. The EEG study showed that various ERP (event-related potential, e.g., N1-P2, N400, LPC) effects that have been observed in speech-gesture integration experiments in the previous literature are modulated by the entropy of speech/gesture and mutual information. This makes it clear that these effects are related to information decoding processes. The authors propose a model of how the speech-gesture integration process unfolds in time, and how IFG and pMTG interact with each other in that process.

      Strengths:

      The key strength of this study is that the authors used information-theoretic measures of their stimuli (i.e., entropy and mutual information between speech and gesture) in all of their analyses. This made it clear that the neuro-modulation (tDCS, TMS) affected information decoding/integration and that the ERP effects reflect information decoding/integration. This study used tDCS and TMS methods to demonstrate that left IFG and pMTG are causally involved in speech-gesture integration. The sizes of the tDCS and TMS effects are correlated with information-theoretic measures of the stimuli, which indicates that the effects indeed stem from disruption/facilitation of the information decoding/integration process (rather than generic excitation/inhibition). The authors' results also showed correlations between information-theoretic measures of the stimuli and various ERP effects. This indicates that these ERP effects reflect the information decoding/integration process.

      Weaknesses:

      The "mutual information" cannot capture all types of interplay between the meaning of speech and gesture. The mutual information is calculated based on what information can be decoded from speech alone and what information can be decoded from gesture alone. However, when speech and gesture are combined, a novel meaning can emerge, which cannot be decoded from a single modality alone. For example, a person produces a gesture of writing something with a pen while saying "He paid". The speech-gesture combination can be interpreted as "paying by signing a cheque". It is highly unlikely that this meaning is decoded when people hear speech only or see gestures only. The current study cannot address how such speech-gesture integration occurs in the brain, and what ERP effects may reflect such a process. Future studies can classify different types of speech-gesture integration and investigate the neural processes that underlie each type. Another important topic for future studies is to investigate how the neural processes of speech-gesture integration change when the relative timing between the speech stimulus and the gesture stimulus changes.
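      As a rough illustration of the quantities discussed here, entropy and mutual information can be computed from discrete probability tables. The sketch below assumes toy joint distributions over the meanings decoded from speech alone and from gesture alone; the function names and numbers are hypothetical and do not reproduce the authors' pipeline.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)


def mutual_information(joint):
    """Mutual information in bits, where joint[i][j] = P(X = i, Y = j)."""
    p_x = [sum(row) for row in joint]        # marginal over rows
    p_y = [sum(col) for col in zip(*joint)]  # marginal over columns
    p_xy = [p for row in joint for p in row]
    return entropy(p_x) + entropy(p_y) - entropy(p_xy)


# Toy case 1: speech-decoded and gesture-decoded meanings are unrelated.
independent = [[0.25, 0.25],
               [0.25, 0.25]]
# Toy case 2: both modalities always point to the same meaning.
redundant = [[0.5, 0.0],
             [0.0, 0.5]]

print(mutual_information(independent))
print(mutual_information(redundant))
```

      Because both inputs are distributions over meanings decoded from each modality in isolation, a measure of this form quantifies overlap between the unimodal decodings. It has no term for a meaning that arises only from the combination, which is the limitation raised above.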

      Comments on the previous round of revisions: The authors addressed my concerns well.

    1. eLife Assessment

      This study uses all-optical electrophysiology methods to provide a valuable insight into the organization of cortical networks and their ability to balance the activity of groups of neurons with similar functional tuning. The all-optical approach used in this study is impressive and the claim that the effects of optical stimulation correspond to a specific homeostatic mechanism is solid. The work will be of interest to neurobiologists and to developers of optical approaches for interrogating brain function.

    2. Reviewer #1 (Public review):

      Summary:

      Kang et al. provide the first experimental insights from holographic stimulation of auditory cortex. Using stimulation of functionally-defined ensembles, they test whether overactivation of a specific subpopulation biases simultaneous and subsequent sensory-evoked network activations.

      Strengths:

      The investigators use a novel technique to investigate the sensory response properties in functionally defined cell assemblies in auditory cortex. These data provide the first evidence of how acutely perturbing specific frequency-tuned neurons impacts the tuning across a broader population. Their revised manuscript appropriately tempers any claims about specific plasticity mechanisms involved.

      Weaknesses:

      Although the single cell analyses in this manuscript are comprehensive, questions about how holographic stimulation impacts population coding are left to future manuscripts, or perhaps re-analyses of this unique dataset.

    3. Reviewer #2 (Public review):

      The goal of HiJee Kang et al. in this study is to explore the interaction between assemblies of neurons with similar pure-tone selectivity in mouse auditory cortex. Using holographic optogenetic stimulation in a small subset of target cells selective for a given pure tone (PTsel), while optically monitoring calcium activity in surrounding non-target cells, they discovered a subtle rebalancing process: co-tuned neurons that are not optogenetically stimulated tend to reduce their activity. The cortical network reacts as if an increased response to PTsel in some tuned assemblies is immediately offset by a reduction in activity in the rest of the PTsel-tuned assemblies, leaving the overall response to PTsel unchanged. The authors show that this rebalancing process affects only the responses of neurons to PTsel, not to other pure tones. They also show that assemblies of neurons that are not selective for PTsel don't participate in the rebalancing process. They conclude that assemblies of neurons with similar pure-tone selectivity must interact in some way to organize this rebalancing process, and they suggest that mechanisms based on homeostatic signaling may play a role.

      The authors have successfully controlled for potential artefacts resulting from their optogenetic stimulation. This study is therefore pioneering in the field of the auditory cortex (AC), as it is the first to use single-cell optogenetic stimulation to explore the functional organization of AC circuits in vivo. The conclusions of this paper are very interesting. They raise new questions about the mechanisms that could underlie such a rebalancing process.

      (1) This study uses an all-optical approach to excite a restricted group of neurons chosen for their functional characteristics (their frequency tuning), and simultaneously record from the entire network observable in the FOV. As stated by the authors, this approach is applied for the first time to the auditory cortex, which is a tour de force. However, such approach is complex and requires precise controls to be convincing. The authors provide important controls to demonstrate the precise ability of their optogenetic methods. In particular, holographic patterns used to excite 5 cells simultaneously may be associated with out-of-focus laser hot spots. Cells located outside of the FOV could be activated, therefore engaging other cells than the targeted ones in the stimulation. This would be problematic in this study as their tuning may be unrelated to the tuning of the targeted cells. To control for such effect, the authors have decoupled the imaging and the excitation planes, and checked for the absence of out-of-focus unwanted excitation (Suppl Fig1).

      (2) In the auditory cortex, assemblies of cells with similar pure-tone selectivity are linked together not only by their ability to respond to the same sound, but also by other factors. This study clearly shows that such assemblies are structured in a way that maintains a stable global response through a rebalancing process. If a group of cells within an assembly increases its response, the rest of the assembly must be inhibited to maintain the total response.

      One surprising result is the clear boundary between assemblies: a rebalancing process occurring in one assembly does not affect the response in another assembly comprising cells tuned to a different frequency. However, this is slightly challenged by the data shown in Figure 3.

      Figure 3B-left, for example, shows that, compared to controls, non-target 16 kHz-preferring neurons only decrease their response to a 16 kHz pure tone when the cells targeted by the opto stimulation also prefer 16 kHz, but not when the targeted cells prefer 54 kHz. However, the inverse is not entirely true. Again compared to controls, Figure 3B (right) shows that non-target 54 kHz-preferring neurons decrease their response to a 54 kHz pure tone when the targeted cells also prefer 54 kHz; however, they also tend to be inhibited when the targeted cells prefer 16 kHz.

      The authors suggest this may be due to the partial activation of 54 kHz-preferring cells by 16 kHz tones and propose examining the response of highly selective neurons. The results are shown in Figure 3F. It would have been more logical to show the same results as in Figure 3B, but with the left part restricted to highly 16 kHz-selective cells and the right part to highly 54 kHz-selective cells. However, the authors chose to pool all responses to 16 kHz and 54 kHz tones in every triplet of conditions (control, opto stimulation on 16 kHz-preferring cells and opto stimulation on 54 kHz-preferring cells), which blurs the result of the analysis.

    1. eLife Assessment

      In this manuscript, Lim and collaborators present an important system for developing self-amplifying RNA with convincing evidence that it does not provoke a strong host inflammatory response in cultured cells. This approach could be further strengthened going forward by testing these self-amplifying RNAs in an in vivo system.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have developed self-amplifying RNAs (saRNAs) encoding additional genes to suppress dsRNA-related inflammatory responses and cytokine release. Their results demonstrate that saRNA constructs encoding anti-inflammatory genes effectively reduce cytotoxicity and cytokine production, enhancing the potential of saRNAs. This work is significant for advancing saRNA therapeutics by mitigating unintended immune activation.

      Strengths:

      This study successfully demonstrates the concept of enhancing saRNA applications by encoding immune-suppressive genes. A key challenge for saRNA-based therapeutics, particularly for non-vaccine applications, is the innate immune response triggered by dsRNA recognition. By leveraging viral protein properties to suppress immunity, the authors provide a novel strategy to overcome this limitation. The study presents a well-designed approach with potential implications for improving saRNA stability and minimizing inflammatory side effects.

      Comments on revisions:

      All comments have been thoroughly addressed, and the manuscript has been significantly improved.

    3. Reviewer #3 (Public review):

      Summary:

      Context: this is the second review of a manuscript that has already undergone some revisions.

      The manuscript explores ways to make self-amplifying RNA (saRNA) more silent through the inclusion of genes to inhibit the innate immune response. The readouts are predominantly expression and cell viability. They take a layered approach, adding multiple genes, as well as altering the capping of the anti-immune genes.

      Strengths:

      As described by the other reviewers, the authors take a stepwise approach to demonstrate that they can lead to sustained expression of the transgene.

      Weaknesses:

      The following weaknesses need some consideration:

      (1) The data show sustained expression, but do not directly show amplification. The amount of RFP is constantly decreasing over the time course. There is some evidence for the srIκBα-Smad7-SOCS1 construct. But measuring the RNA itself would be beneficial.

      (2) The end construct is very large: it has 12 genes. This may have manufacturing considerations, affecting the translatability.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors have developed self-amplifying RNAs (saRNAs) encoding additional genes to suppress dsRNA-related inflammatory responses and cytokine release. Their results demonstrate that saRNA constructs encoding anti-inflammatory genes effectively reduce cytotoxicity and cytokine production, enhancing the potential of saRNAs. This work is significant for advancing saRNA therapeutics by mitigating unintended immune activation.

      Strengths:

      This study successfully demonstrates the concept of enhancing saRNA applications by encoding immune-suppressive genes. A key challenge for saRNA-based therapeutics, particularly for non-vaccine applications, is the innate immune response triggered by dsRNA recognition. By leveraging viral protein properties to suppress immunity, the authors provide a novel strategy to overcome this limitation. The study presents a well-designed approach with potential implications for improving saRNA stability and minimizing inflammatory side effects.

      We thank Reviewer #1 for their thorough review and for recognizing both the significance of our work and the potential of our strategy to expand saRNA applications beyond vaccines.

      Weaknesses:

      (1) Impact on Cellular Translation:

      The authors demonstrate that modified saRNAs with additional components enhance transgene expression by inhibiting dsRNA-sensing pathways. However, it is unclear whether these modifications influence global cellular translation beyond the expression of GFP and mScarlet-3 (which are encoded by the saRNA itself). Conducting a polysome profiling analysis or a puromycin labeling assay would clarify whether the modified saRNAs alter overall translation efficiency. This additional data would strengthen the conclusions regarding the specificity of dsRNA-sensing inhibition.

      We thank the Reviewer for this insightful suggestion. We performed a puromycin labeling assay to assess global translation rates (Figure 3—figure supplement 1c). This experiment revealed that the E3 construct significantly reduces global protein synthesis, despite driving high levels of saRNA-encoded transgene expression (Figure 1d, e). In contrast, the E3-NSs-L* construct mitigated this reduction in global translation while maintaining moderate transgene expression. These findings support our hypothesis that E3 enhances transgene output in part by activating RNase L, which degrades host mRNAs and thereby reduces ribosomal competition. We appreciate the Reviewer’s recommendation of this experiment, which has strengthened the manuscript.

      (2) Stability and Replication Efficiency of Long saRNA Constructs:

      The saRNA constructs used in this study exceed 16 kb, making them more fragile and challenging to handle. Assessing their mRNA integrity and quality would be crucial to ensure their robustness.

      Furthermore, the replicative capacity of the designed saRNAs should be confirmed. Since Figure 4 shows lower inflammatory cytokine production when encoding srIκBα and srIκBα-Smad7-SOCS1, it is important to determine whether this effect is due to reduced immune activation or impaired replication. Providing data on replication efficiency and expression levels of the encoded anti-inflammatory proteins would help rule out the possibility that reduced cytokine production is a consequence of lower replication.

      We thank the Reviewer for these valuable suggestions.

      To assess the integrity of the saRNA constructs, we performed denaturing gel electrophoresis (Supplemental Figure 6c). The native saRNA, E3, and E3-NSs-L* constructs each migrated as a single band. The moxBFP, srIκBα, and srIκBα-Smad7-SOCS1 constructs showed both a full-length transcript and a lower-abundance truncated band (Supplemental Figure 6d), suggestive of a cryptic terminator sequence introduced in a region common to these three constructs.

      To evaluate replicative capacity, we performed qPCR targeting EGFP, which is encoded by all constructs. This analysis revealed that the srIκBα-Smad7-SOCS1 construct exhibited lower replication efficiency than both native saRNA and E3. Several factors may contribute to this difference, including the longer transcript length, reduced molar input when equal mass was used for transfection, prevention of host mRNA degradation due to RNase L inhibition, or the presence of truncated transcripts.

      Given these confounding variables, we revised our approach to analyzing cytokine production. Rather than comparing all six constructs together, we split the analysis into two parts: (1) the effects of dsRNA-sensing pathway inhibition (Figure 4a), and (2) the effects of inflammatory signalling inhibition (Figure 4c). For the latter, we compared srIκBα and srIκBα-Smad7-SOCS1 to moxBFP, as these three constructs are more comparable in size, share the same truncated transcript, and all encode L* to inhibit RNase L. This strategy minimizes the likelihood that differences in the cytokine responses are due to variation in replication efficiency.

      (3) Comparative Data with Native saRNA:

      Including native saRNA controls in Figures 5-7 would allow for a clearer assessment of the impact of additional genes on cytokine production. This comparison would help distinguish the effect of the encoded suppressor proteins from other potential factors.

      We thank the Reviewer for this helpful suggestion. We have added the native saRNA condition to Figure 5 as a visual reference. However, due to the presence of truncated transcripts in the constructs designed to inhibit inflammatory signalling pathways, the actual amount of full-length saRNA delivered in these conditions is likely lower than expected, despite using equal total RNA mass for transfection. This complicates direct comparisons with constructs targeting dsRNA-sensing pathways, which do not show transcript truncation. For this reason, native saRNA was included only as a visual reference and was not used in statistical comparisons with the inflammatory signalling inhibitor constructs.

      (4) In vivo Validation and Safety Considerations:

      Have the authors considered evaluating the in vivo potential of these saRNA constructs? Conducting animal studies would provide stronger evidence for their therapeutic applicability. If in vivo experiments have not been performed, discussing potential challenges - such as saRNA persistence, biodistribution, and possible secondary effects - would be valuable.

      (5) Immune Response to Viral Proteins:

      Since the inhibitors of dsRNA-sensing proteins (E3, NSs, and L*) are viral proteins, they would be expected to induce an immune response. Analyzing these effects in vivo would add insight into the applicability of this approach.

      We appreciate the Reviewer’s points regarding in vivo validation and safety considerations. While in vivo studies are beyond the scope of the present investigation, we agree that evaluating therapeutic potential, biodistribution, persistence, and secondary effects will be essential for future translation. We have now included a brief discussion of these considerations at the end of the revised discussion. In ongoing work, we are planning follow-up studies incorporating in vivo imaging and functional assessments of saRNA-driven cargo delivery in preclinical models of inflammatory joint pain.

      Regarding the immune response to viral proteins, we agree that this is an important consideration and have now included a clearer discussion of this limitation in the revised manuscript. Specifically, we highlight that encoding multiple viral inhibitors (E3, NSs, and L*), in combination with the VEEV replicase, may increase the likelihood of adaptive immune recognition via MHC class I presentation. This could lead to cytotoxic T cell–mediated clearance of saRNA-transfected cells, thereby limiting therapeutic durability. We emphasize that addressing both intrinsic cytotoxicity and immune-mediated clearance will be essential for advancing the clinical potential of this platform.

      (6) Streamlining the Discussion Section:

      The discussion is quite lengthy. To improve readability, some content - such as the rationale for gene selection - could be moved to the Results section. Additionally, the descriptions of Figure 3 should be consolidated into a single section under a broader heading for improved coherence.

      Thank you for these helpful suggestions. We have streamlined the Discussion to improve readability and have moved the rationale for gene selection to the results section, as recommended. In addition, we have consolidated the Figure 3 descriptions to improve coherence and to simplify the presentation.

      Reviewer #2 (Public review):

      Summary:

      Lim et al. have developed a self-amplifying RNA (saRNA) design that incorporates immunomodulatory viral proteins, and show that the novel design results in enhanced protein expression in vitro in mouse primary fibroblast-like synoviocytes. They test constructs including saRNA with the vaccinia virus E3 protein and another with E3, Toscana virus NS protein and Theiler's virus L protein (E3 + NS + L), and another with srIκBα-Smad7-SOCS1. They have also tested whether ML336, an antiviral, enables control of transgene expression.

      Strengths:

      The experiments are generally well-designed and offer mechanistic insight into the RNA-sensing pathways that confer enhanced saRNA expression. The experiments are carried out over a long timescale, which shows the enhanced effect of the saRNA E3 design compared to the control. Furthermore, the inhibitors are shown to maintain the cell number, and reduce basal activation factor-α levels.

      We thank Reviewer #2 for their thoughtful and detailed assessment of our manuscript, and for recognizing the mechanistic insights provided by our study. We also appreciate their positive comments on the experimental design, the extended timescale, and the observed effects on transgene expression, cell viability, and basal fibroblast activation factor-α levels.

      Weaknesses:

      One limitation of this manuscript is that the RNA is not well characterized; some of the constructs are quite long and the RNA integrity has not been analyzed. Furthermore, for constructs with multiple proteins, it's imperative to confirm the expression of each protein to confirm that any therapeutic effect is from the effector protein (e.g. E3, NS, L). The ML336 was only tested at one concentration; it is standard in the field to do a dose-response curve. These experiments were all done in vitro in mouse cells, thus limiting the conclusion we can make about mechanisms in a human system.

      Thank you for your detailed feedback. We have added new experiments and clarified limitations in the revised manuscript to address these concerns:

      RNA integrity: We performed denaturing gel electrophoresis on the in vitro transcribed saRNA constructs (Supplemental Figure 7c). Constructs targeting dsRNA-sensing pathways migrated as a single band, while those targeting inflammatory signalling pathways showed both a full-length product and a common, lower-abundance truncated transcript. This suggests that the actual amount of full-length RNA delivered for the constructs inhibiting inflammatory signalling was overestimated. To account for this, we avoided direct comparisons between the two types of constructs and instead focused on comparisons within each type to ensure more meaningful interpretation.

      Confirmation of protein expression: While we acknowledge that direct measurement of each protein would provide additional insight, we believe the functional assays presented offer strong evidence that the encoded proteins are expressed and exert their intended biological effects. Additionally, IRES functionality was confirmed visually using fluorescent protein reporters, supporting the successful expression of downstream genes.

      ML336 concentration–response: We have now performed a concentration–response analysis for ML336 (Figure 8a and b), which demonstrates its ability to modulate transgene expression in a concentration-dependent manner.

      Use of human cells: We agree that testing these constructs in human cells is essential for future translational applications and are actively exploring opportunities to evaluate them in patient-derived FLS. However, previous studies have shown that Theiler’s virus L* does not inhibit human RNase L (Sorgeloos et al., PLoS Pathog 2013). As a result, it is highly likely that the E3-NSs-L* construct will not function as intended in human systems. Addressing this limitation will be a priority in our future work, where we aim to develop constructs incorporating inhibitors specific to human RNase L to ensure efficacy in human cells.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Figure 2c is not indicated.

      Thank you for pointing out this error. It has now been corrected in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) The Graphical Abstract is a bit confusing; suggest modifying it to represent the study and findings more accurately.

      We have revised the graphical abstract to improve clarity and better reflect the study’s design and main findings. Thank you for the suggestion.

      (2) The impact of this paper would be greatly improved if these experiments were repeated, at least partially, in human cells. The rationale for mouse cells in vitro is unclear.

      The rationale for developing constructs targeting mouse cells is based on our intention to utilize these constructs in mouse models of inflammatory joint pain in future studies.

      We recognize that incorporating data from human cells would significantly enhance the translational relevance of our work, and we are actively pursuing collaborations to test these constructs in patient-derived FLS. However, a key component of our saRNA constructs—Theiler’s virus L*—has been shown to inhibit mouse, but not human, RNase L (Sorgeloos et al., PLoS Pathog 2013). Consequently, the E3-NSs-L* polyprotein may not function as intended in human cells. To address this limitation, future work will focus on developing constructs that incorporate inhibitors specific to human RNase L, thereby facilitating more effective translation of our findings to human systems.

      (3) The ML336 was only tested at one concentration and works mildly well, but would be more impactful if tested in a dose-response curve.

      We have now performed a concentration–response analysis for ML336 (Figure 8a and b), which demonstrates its concentration-dependent effects on transgene expression and saRNA elimination. Thank you for the suggestion.

      (4) Overall, there is not a cohesive narrative to the story, instead it comes off as we tried these three different approaches, and they worked in different contexts.

      We have revised the graphical abstract, results, and discussion to improve the cohesiveness of the manuscript’s narrative and to better integrate the mechanistic rationale linking the different approaches. We appreciate the feedback.

      (5) The title is not supported by the data; the saRNA is still somewhat cytotoxic, immunostimulatory and the antiviral minimally controls transgene expression; suggest making this reflect the data.

      We have revised the title to better reflect the scope of the data and the mechanistic focus of the study. The updated title emphasizes the pathways targeted and the outcomes demonstrated, while avoiding overstatement. Thank you for this helpful recommendation.

    1. eLife Assessment

      This important work introduces a splitGFP-based labeling tool with an analysis pipeline for the synaptic scaffold protein bruchpilot, with tests in the adult Drosophila mushroom bodies, a learning center in the Drosophila brain. The evidence supporting the conclusions is solid. However, additional controls, validation of synapse-specificity, validation of activity-dependence, details on image processing, and additional functional experiments are needed to strengthen the study.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Wu et al. uses endogenous bruchpilot expression in a cell-type-specific manner to assess synaptic heterogeneity in adult Drosophila melanogaster mushroom body output neurons. The authors performed genomic on locus tagging of the presynaptic scaffold protein bruchpilot (BRP) with one part of splitGFP (GFP11) using the CRISPR/Cas9 methodology and co-expressed the other part of splitGFP (GFP1-10) using the GAL4/UAS system. Upon expression of both parts of splitGFP, fluorescent GFP is assembled at the N-terminus of BRP, exactly where BRP is endogenously expressed in active zones. For manageable analysis, a high-throughput pipeline was developed. This analysis evaluated parameters like location of BRP clusters, volume of clusters, and cluster intensity as a direct measure of the relative amount of BRP expression levels on site, using publicly available 3D analysis tools that are integrated in Fiji. Analysis was conducted for different mushroom body cell types in different mushroom body lobes using various specific GAL4 drivers. To test this new method of synapse assessment, Wu et al. performed an associative learning experiment in which an odor was paired with an aversive stimulus and found that, in a specific time frame after conditioning, the new analysis solidly revealed changes in BRP levels at specific synapses that are associated with aversive learning.

      Strengths:

      Expression of splitGFP bound to BRP enables intensity analysis of BRP expression levels as exactly one GFP molecule is expressed per BRP. This is a great tool for synapse assessment. This tool can be widely used for any synapse as long as driver lines are available to co-express the other part of splitGFP in a cell-type-specific manner. As neuropils and thus the BRP label can be extremely dense, the analysis pipeline developed here is very useful and important. The authors have chosen an exceptionally dense neuropil - the mushroom bodies - for their analysis and convincingly show that BRP assessment can be achieved with such densely packed active zones. The result that BRP levels change upon associative learning in an experiment with odor presentation paired with punishment is likewise convincing, and strongly suggests that the tool and pipeline developed here can be used in an in vivo context.

      Weaknesses:

      Although BRP is an important scaffold protein and its expression levels were associated with function and plasticity, I am still somewhat reluctant to accept that synapse structure profiling can be inferred from only assessing BRP expression levels and BRP cluster volume. Also, is it guaranteed that synaptic plasticity is not impaired by the large GFP fluorophore? Could the GFP11 construct that is tagged to BRP in all BRP-expressing cells, independent of GAL4, possibly hamper neuronal function? Is it certain that only active zones are labeled? I do see that plastic changes are made visible in this study after an associative learning experiment with BRP intensity and cluster volume as read-out, but I would be reassured by direct measurement of synaptic plasticity with splitGFP directly connected to BRP, maybe at a different synapse that is more accessible.

    3. Reviewer #2 (Public review):

      Summary:

      The authors developed a cell-type specific fluorescence-tagging approach using a CRISPR/Cas9 induced split-GFP reconstitution system to visualize endogenous Bruchpilot (BRP) clusters as presynaptic active zones (AZ) in specific cell types of the mushroom body (MB) in the adult Drosophila brain. This AZ profiling approach was implemented in a high-throughput quantification process, allowing for the comparison of synapse profiles within single cells, cell types, MB compartments, and between different individuals. The aim is to analyse in more detail neuronal connectivity and circuits in this centre of associative learning. These are notoriously difficult to investigate due to the density of cells and structures within a cell. The authors detect and characterize cell-type-specific differences in BRP-dependent profiling of presynapses in different compartments of the MB, while intracellular AZ distribution was found to be stereotyped. Next to the descriptive part characterizing various AZ profiles in the MB, the authors apply an associative learning assay and detect consequent AZ re-organisation.

      Strengths:

      The strength of this study lies in the outstanding resolution of synapse profiling in the extremely dense compartments of the MB. This detailed analysis will be the entry point for many future analyses of synapse diversity in connection with functional specificity to uncover the molecular mechanisms underlying learning and memory formation and neuronal network logics. Therefore, this approach is of high importance for the scientific community and a valuable tool to investigate and correlate AZ architecture and synapse function in the CNS.

      Weaknesses:

      The results and conclusions presented in this study are, in many aspects, well-supported by the data presented. To further support the key findings of the manuscript, additional controls, comments, and possibly broader functional analysis would be helpful. In particular:

      (1) All experiments in the study are based on split-GFP lines (BRP:GFP11 and UAS-GFP1-10). The Materials and Methods section does not contain any cloning strategy (gRNA, primer, PCR/sequencing validation, exact position of tag insertion, etc.) and only refers to a bioRxiv publication. It might be helpful to add a Materials and Methods section (at least for the BRP:GFP11 line). Additionally, as this is an on-locus insertion in the BRP ORF, this line needs a general validation, including controls (Western Blot and correlative antibody staining against BRP) showing that overall BRP expression is not compromised by the GFP insertion and that the protein localizes as BRP does in wild type flies, that flies are viable and have no defects in locomotion or in learning and memory formation, and that MB morphology is not affected compared to wild type animals.

      (2) Several aspects of image acquisition and high-throughput quantification data analysis would benefit from a more detailed clarification.

      a) For BRP cluster segmentation, the Materials and Methods state that intensity threshold and noise tolerance were "set" - this setting has a large effect on the quantification, so the values should be specified and the setting criteria named and justified (if set manually, how and why; if set automatically, to what). Additionally, if Python was used for "Nearest Neighbor" analysis, the code should be made available within this manuscript; otherwise, it is difficult to judge the quality of this quantification step.

      b) To better evaluate the quality of both the imaging analysis and image presentation, it would be important to state whether the presented and analysed images are deconvolved and, if so, at least one proof of principle example comparing the original and deconvolved file should be shown and quantified to show the impact of deconvolution on the output quality, as this is central to this study.

      (3) The major part of this study focuses on the description and comparison of the divergent synapse parameters across cell-types in MB compartments, which is highly relevant and interesting. Yet it would be very interesting to connect this new method with functional aspects of the heterogeneous synapses. This is done in Figure 7 with an associative learning approach, which is, in part, not trivial to follow for the reader and would profit from a more comprehensive analysis.

      a) It would be important for the understanding and validation of the learning induced changes, if not (only) a ratio (of AZ density/local intensity) would be presented, but both values on their own, especially to allow a comparison to the quoted, previous AZ remodelling analysis quantifying BRP intensities (ref. 17, 18). It should be elucidated in more detail why only the ratio was presented here.

      b) The reason why a single instead of a dual odour conditioning was performed could be clarified and discussed (would that have the same effects?).

      c) Additionally, "controls" for the unpaired values - that is, flies receiving neither shock nor odour - would help to evaluate the unpaired control values in the different MB compartments.

      d) The temporal resolution of the effect is very interesting (Figure 7D), and more time points, especially between 90 and 270 min, might yield interesting results.

      e) Additionally, it would be very interesting and rewarding to have at least one additional assay relating structure and function, e.g. on a molecular level by a correlative analysis of BRP and synaptic vesicles (by staining or co-expression of SV-protein markers) or calcium activity imaging, or on a functional level by additional learning assays.

    4. Reviewer #3 (Public review):

      Summary:

      The authors develop a tool for marking presynaptic active zones in Drosophila brains, dependent on the GAL4 construct used to express a fragment of GFP, which will incorporate with a genome-engineered partial GFP attached to the active zone protein bruchpilot - signal will be specific to the GAL4-expressing neuronal compartment. They then use various GAL4s to examine innervation onto the mushroom bodies to dissect compartment-specific differences in the size and intensity of active zones. After a description of these differences, they induce learning in flies with classic odour/electric shock pairing and observe changes after conditioning that are specific to the paired conditioning/learning paradigm.

      Strengths:

      The imaging and analysis appear strong. The tool is novel and exciting.

      Weaknesses:

      I feel that the tool could do with a little more characterisation. It is assumed that the puncta observed are AZs with no further definition or characterisation.

    1. eLife Assessment

      This study identifies astrocyte-intrinsic mechanisms by which the LRRK2 G2019S, a mutation linked to familial Parkinson's disease, disrupts synaptic integrity in the anterior cingulate cortex. The findings are convincing, as they rely on a comprehensive set of in vivo and in vitro genetic, biochemical, proteomic, and electrophysiological approaches. They are important because of their translational value, being validated in both mouse models and post-mortem human samples.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors aim to uncover how the Parkinson's disease-linked LRRK2 G2019S mutation affects synaptic integrity through astrocyte-intrinsic mechanisms. Specifically, they investigate whether LRRK2-driven ERM hyperphosphorylation disrupts astrocyte morphology and excitatory synapse maintenance, with a focus on regional specificity within the cortex.

      Strengths:

      (1) Novelty and significance: The work provides important insights into non-neuronal contributions to Parkinson's disease (PD) pathology by highlighting a previously underappreciated role of astrocytic ERM signaling in synapse maintenance. This astrocyte-specific mechanism might help explain early cognitive dysfunctions in PD.

      (2) Mechanistic depth: The authors present a detailed molecular pathway where the LRRK2 G2019S mutation increases ERM phosphorylation, disrupting Ezrin-Atg7 interactions critical for astrocyte morphology.

      (3) Robust methodology: The study uses a powerful combination of tools, including AAV-mediated gene delivery, BioID-based interactome mapping, PALE labeling, and patch-clamp electrophysiology to link molecular, morphological, and functional changes.

      (4) Physiological relevance: Parallel findings in both mouse models and human post-mortem brains suggest conservation of the observed phenotypes and strengthen the relevance to PD pathogenesis.

      Weaknesses:

      (1) Causal directionality: While ERM hyperphosphorylation is clearly shown to correlate with morphological and synaptic changes, the specific causal hierarchy - especially between Ezrin-Atg7 interaction loss and synapse alteration - is inferred but not definitively proven. For example, a rescue experiment directly restoring Atg7 function alongside Ezrin manipulation could strengthen this point.

      (2) Brain region specificity: Although regional differences between ACC and MOp are well documented, the underlying cause of this differential vulnerability remains speculative. Examining astrocyte heterogeneity within cortical layers or via transcriptomic/proteomic profiling could clarify these regional effects.

      (3) Autophagy function: While Atg7 knockdown leads to clear morphological changes, autophagic flux (e.g., LC3-II turnover or p62 accumulation) is not directly assessed. This would strengthen the mechanistic link to autophagy disruption.

      (4) GFAP-based astrogliosis interpretation: The conclusion that no astrogliosis occurs in LRRK2 G2019S mice is based solely on GFAP staining. However, GFAP-negative reactive states have been reported. Including additional markers would help validate this interpretation.

      (5) Impact on neuronal populations: The authors conclude that changes in inhibitory synapse density in the MOp are not rescued by astrocytic Ezrin manipulation and suggest developmental effects on interneurons. However, this is speculative without neuronal cell-type-specific data. Including interneuron density or synaptic connectivity analysis would make this claim more robust.

      (6) Despite these limitations, the authors substantially achieve their stated aims. Their results provide strong support for a model in which astrocytic ERM signaling downstream of LRRK2 contributes to region-specific synaptic changes, particularly in the anterior cingulate cortex. While certain mechanistic links - such as the role of Ezrin-Atg7 interaction in synaptic maintenance - would benefit from further functional validation, the study offers a well-supported framework for understanding astrocyte-intrinsic contributions to synaptic dysfunction in Parkinson's disease.

      This work is likely to contribute meaningfully to ongoing research in neurodegeneration, glial biology, and synaptic regulation. The methodological approaches - especially the combination of in vivo models with proteomics and electrophysiology - will be of interest to others studying astrocyte function and neuron-glia interactions. More broadly, the study highlights the importance of astrocyte heterogeneity and regional specialization in shaping neural circuit vulnerability, providing a valuable foundation for future investigations.

    3. Reviewer #2 (Public review):

      Summary:

      This is an important study that examines the relationship between a Parkinson's-associated mutation in LRRK2 kinase and increased ERM phosphorylation in astrocytes, altered excitatory and inhibitory synapse density and function, and a reduction in astrocyte size. The scope is impressively large and includes human and mouse samples, and employs immunolabeling, whole cell patch clamp recording techniques, molecular manipulation in vivo, and BioID. Experiments have appropriate controls, and the outcomes are mostly convincing. The chief weakness is that the study emphasizes scope over depth, such that it falls short of a unifying model of LRRK2-ERM interactions and leaves many outcomes difficult to interpret.

      The main idea is that the G2019S Parkinson's mutation in LRRK2 increases its kinase activity and that this either directly or indirectly increases ERM phosphorylation. This excessive ERM phosphorylation is expected to occur within perisynaptic astrocytic processes, reduce astrocyte complexity, and reduce excitatory synapse density and function in ACC. Overexpression of a dominant negative ezrin (phospho-dead) in astrocytes restores their morphology and excitatory synapse density in ACC. This pathway is well supported if taken on its own. But several datapoints presented do not fit this model. The reasoning driving selectivity to ACC and not M1 is not discussed or pursued (is it relevant that pERM levels appear lower in M1 at P21? Do astrocytes in S1 from G2019S mice also show reduced territories?); the differential effects on excitatory versus inhibitory synapses do not fit the model (or is this effect also expected to lie downstream of astrocytes?). Importantly, the effects of ezrin manipulation in wildtype samples (see below) are not integrated into the model, perhaps because the data run counter to expectation.

      Specific Concerns and Questions:

      (1) Effects in wildtype mice are not fully incorporated into the model. Overexpressing (OE) WT ezrin appears to reduce pERM levels by about half (Figure 1i vs 4B). OE-phospho-dead ezrin also appears to reduce pERM integrated density compared to control levels (same figures). This is not discussed (see also item 2). OE phospho-dead ezrin decreases synapse density and maybe function compared to OE WT ezrin in wildtype mice (4C, 4F), but it is not clear whether or not these data differ from unmanipulated wildtype sections/slices (Figures 2 and 3) because the data are normalized. These synaptic findings in wildtype should also be joined to the morphology findings in wildtype astrocytes, where OE-phospho-dead ezrin reduces astrocyte territory similar to LRRK2-G2019S. The shared morphological outcome is discussed as a potential defect in ERM phospho/dephospho balance, but it was hard to see if this could be similarly related to changes in synapse density.

      (2) Labeling for pERMs shown in wildtype mouse and control human is not convincing, but is convincing in the G2019S samples (e.g., Figure 1/S1, Figure 2) (although concentration in perisynaptic astrocytes is not clear). The data presented seem to better support the idea that the mutation confers a pathological gain of ERM phosphorylation (rather than hyperphosphorylation). If the faint labeling in wildtype and control samples is genuine, one would anticipate that pERM labeling would be different in shControl vs. shLrrk2 astrocytes.

      (3) Given the data presented, it would seem that overexpressing the BirA2 ezrin construct, like wildtype ezrin, could impact astrocyte biology. If overexpressing a wildtype ezrin reduces pERM levels, then perhaps the BirA2 construct expression already favors a closed conformation. This is not so much a critique of the approach as a request for clarification and to include, if possible, whether there are reasons to believe or data to support that the BirA2 construct adopts both open and closed conformations.

    4. Reviewer #3 (Public review):

      Summary:

      Wang et al. reported a new role of LRRK2-GS mutant in astrocyte morphology and synapse maintenance and a potential mechanism that acts through phosphorylation of ERM, which binds to ATG7. In both human LRRK2-GS patients and LRRK2-GS KI mouse brain cortex, they found increased ERM phosphorylation levels. LRRK2-GS alters excitatory and inhibitory synapse densities and functions in the cortex, which can be restored by p-ERM-dead mutant. They further demonstrated that LRRK2 regulates astrocyte morphological complexity in vivo through ERM phosphorylation. Proteomic and biochemistry approaches found that ATG7 interacts with Ezrin, which is inhibited by Ezrin phosphorylation. This provides a potential mechanism by which LRRK2-GS impairs the astrocyte morphology.

      Strengths:

      (1) Data in human PD patients (Figure 1B, C) is impressive, showing a clear increase of p-ERM in LRRK2-GS samples.

      (2) Both LRRK2-GS and siLRRK2 show similar phenotypes, supporting the idea that both gain of function (GOF) and loss of function (LOF) decrease astrocyte complexity and size.

      (3) Using p-ERM-dead and mimic mutants is elegant. The data is striking that the p-ERM-dead mutant can restore LRRK2-GS-induced excitatory synapse density in the ACC and astrocyte territory volume and complexity, while the p-ERM-mimic mutant can restore the siLRRK2 phenotype.

      (4) ATG7 binding to Ezrin provides a potential mechanism. It is compelling that siATG7 shows a similar decrease in astrocyte territory volume and complexity, and siATG7 in LRRK2-GS does not enhance the astrocyte phenotype.

      Weaknesses:

      (1) The authors claim that p-ERM colocalizes with astrocyte marker ALDH1L1, e.g., Figure 1E, F, G, H, J, K. It is hard to tell from the representative images. Given that this is critical for this paper, it would be appreciated if the authors could improve the images and show clear colocalization. The same concern for Figures S1, 2, 3. Validation of the p-ERM antibody is critical. Figure S4, using λ-PPase to eliminate the phosphorylation signal in general, is very helpful. Additional validation of the p-ERM antibody specific to ERM would be appreciated.

      (2) Does the total ERM level change (increase) in LRRK2-GS samples? The increased p-ERM levels could be because the total ERM level increases. Then, the follow-up question is whether the total ERM level matters to the astrocyte phenotypes seen in the paper.

      (3) WT mice carry WT-LRRK2, which also has kinase activity to phosphorylate ERM. So, what are the effects of overexpression of the p-ERM mutants (dead or mimic) on the excitatory and inhibitory synapse densities and functions in WT mouse samples? In Figure 4, statistics should be done comparing WT+Ezrin O/E vs WT+phospho-dead Ezrin O/E. From what is shown in the graphs, it looks like phospho-dead Ezrin worsens the phenotype in WT mice, which is the opposite of its effect in the GS mice. How can this be explained? The same question applies to the graphs in Figure 5.

      (4) Rab10 is not a robust substrate for the LRRK2-G2019S mutant, and p-Rab10 is very difficult to detect in mouse brains. The specificity of the pRab10 immunostaining signal in Fig. S8 is not certain.

      (5) Would ATG7, Ezrin, and LRRK2 form a complex?

    1. eLife Assessment

      In this manuscript, Park et al. developed a multiplexed CRISPR construct to genetically ablate the GABA transporter GAT3 in the mouse visual cortex, with effects on population-level neuronal activity. This work is important, as it sheds light on how GAT3 controls the processing of visual information. The findings are compelling, leveraging state-of-the-art CRISPR/Cas9 gene editing, in vivo two-photon laser scanning microscopy, and advanced statistical modeling.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have investigated the role of GAT3 in the visual system. First, they have developed a CRISPR/Cas9-based approach to locally knock out this transporter in the visual cortex. They then demonstrated electrophysiologically that this manipulation increases inhibitory synaptic input into layer 2/3 pyramidal cells. They further examined the functional consequences by imaging neuronal activity in the visual cortex in vivo. They found that the absence of GAT3 leads to reduced spontaneous neuronal activity and attenuated neuronal responses and reliability to visual stimuli, but without an effect on orientation selectivity. Further analysis of this data suggests that Gat3 removal leads to less coordinated activity between individual neurons and in population activity patterns, thereby impairing information encoding. Overall, this is an elegant and technically advanced study that demonstrates a new and important role of GAT3 in controlling the processing of visual information.

      Strengths:

      (1) Development of a new approach for a local knockout (GAT3).

      (2) Important and novel insights into visual system function and its dependence on GAT3.

      (3) Plausible cellular mechanism.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

    3. Reviewer #2 (Public review):

      Summary:

      Park et al. have made a tool for spatiotemporally restricted knockout of the astrocytic GABA transporter GAT3, leveraging CRISPR/Cas9 and viral transduction in adult mice, and evaluated the effects of GAT3 on neural encoding of visual stimulation.

      Strengths:

      This concise manuscript leverages state-of-the-art CRISPR/Cas9 gene-editing technology for knocking out astrocytic genes. This has only to a small degree been performed previously in astrocytes, and it represents an important development in the field. Moreover, the authors utilize in vivo two-photon imaging of neural responses to visual stimuli as a readout of neural activity, in addition to validating their data with ex vivo electrophysiology. Lastly, they use advanced statistical modeling to analyze the impact of GAT3 knockout. Overall, the study comes across as rigorous and convincing.

      Weaknesses:

      Adding the following experiments would potentially have strengthened the conclusions and helped with interpreting the findings:

      (1) Neural activity is quite profoundly influenced by GAT3 knockout. Corroborating these relatively large changes to neural activity with in vivo electrophysiology of some sort as an additional readout would have strengthened the conclusions.

      (2) Given the quite large effects on neural coding in visual cortex assessed by jRGECO imaging, it would have been interesting if the mouse groups could have been subjected to behavioral testing, assessing the visual system.

    1. eLife Assessment

      This study offers important insights into the development of infants' responses to music based on the exploration of EEG neural auditory responses and video-based movement analysis. The convincing results revealed that evoked responses emerge between 3 and 12 months of age, but data analysis requires further refinement to fully complement the findings related to movement in response to music. This study will be of significant interest to developmental psychologists and neuroscientists, as well as researchers interested in music processing and in the translation of perception into action.

    2. Reviewer #1 (Public review):

      Summary:

      This study aims to investigate the development of infants' responses to music by examining neural activity via EEG and spontaneous body kinematics using video-based analysis. The authors also explore the role of musical pitch in eliciting neural and motor responses, comparing infants at 3, 6, and 12 months of age.

      Strengths:

      A key strength of the study lies in its analysis of body kinematics and modeling of stimulus-motor coupling, demonstrating how the amplitude envelope of music predicts infant movement, and how higher musical pitch may enhance auditory-motor synchronization.

      Weaknesses:

      The neural data analysis is currently limited to auditory evoked potentials aligned with beat timing. A more comprehensive approach is needed to robustly support the proposed developmental trajectory of neural responses to music.

    3. Reviewer #2 (Public review):

      Summary:

      Infants' auditory brain responses reveal processing of music (clearly different from shuffled music patterns) from the age of 3 months; however, they do not show a related increase in spontaneous movement activity to music until the age of 12 months.

      Strengths:

      This is a nice paper, well designed, with sophisticated analyses and presenting clear results that make a lot of sense to this reviewer. The additions of EEG recordings in response to music presentations at 3 different infant ages are interesting, and the manipulation of the music stimuli into shuffled, high, and low pitch to capture differences in brain response and spontaneous movements is good. I really enjoyed reading this work and the well-written manuscript.

      Weaknesses:

      I only have two comments. The first is a change to the title. Maybe the title should refer to the first "postnatal" year, rather than the first year of life. There are controversies about when life really starts; it could be in the womb, so using postnatal to refer to the period after birth resolves that debate.

      The other comment relates to the 10 Principal Movements (PMs) identified. I was wondering about the rationale for identifying these different PMs and to what extent many PMs entered in the analyses may hinder more general pattern differences. Infants' spontaneous movements are very variable and poorly differentiated in early development. Maybe, instead of starting with 10 distinct PMs, a first analysis could be run using the combined Quantity of Movements (QoM) without PM distinctions to capture an overall motor response to music. Maybe only 2 PMs could be entered in the analysis, for the arms and for the legs, regardless of the patterns generated. Maybe the authors have done such an analysis already, but describing an overall motor response, before going into specific patterns of motor activation, could be useful to describe the level of motor response. Again, infants provide extremely variable patterns of response, and such variability may potentially hinder an overall effect if the QoM were treated as a cumulated measure rather than one with differentiated patterns.

    4. Reviewer #3 (Public review):

      Summary:

      This study provides a detailed investigation of neural auditory responses and spontaneous movements in infants listening to music. Analyses of EEG data (event-related potentials and steady-state responses) first highlighted that infants at 3, 6, and 12 months of age and adults showed enhanced auditory responses to music compared with shuffled music. 6-month-olds also exhibited an enhanced P1 response to high-pitch vs low-pitch stimuli, whereas the other groups did not. In addition, whole-body spontaneous movements of infants were decomposed into 10 principal components. Kinematic analyses revealed that the quantity of movement was higher in response to music than to shuffled music only at 12 months of age. Although Granger causality analysis suggested that infants' movement was related to the music intensity changes, particularly in the high-pitch condition, infants did not exhibit phase-locked movement responses to musical events, and the low movement periodicity was not coordinated with music.

      Strengths:

      This study investigates an important topic on the development of music perception and translation to action and dance. It targets a crucial developmental period that is difficult to explore. It evaluates two modalities by measuring neural auditory responses and kinematics, while cross-modal development is rarely evaluated. Overall, the study fills a clear gap in the literature.

      Besides, the study uses state-of-the-art analyses. All steps are clearly detailed. The manuscript is very clear, well-written, and pleasant to read. Figures are well-designed and informative.

      Weaknesses:

      (1) Differences in neural responses to high-pitch vs low-pitch stimuli between 6-month-olds and other infants are difficult to interpret.

      (2) Making some links between the neural and movement responses that are described in this manuscript could be expected, given the study goal. Although kinematic analyses suggested that movement responses are not phase-locked to the music stimuli, analyses of Granger causality between motion velocity and neural responses could be relevant.

      (3) The study considers groups of infants at different ages, but infants within each group might be at different stages of motor development. Was this assessed behaviorally? Would it be possible to explore or take into account this possible inter-individual variability?

    1. eLife Assessment

      This paper undertakes an important investigation to determine whether movement slowing in microgravity is due to a strategic conservative approach or rather due to an underestimation of the mass of the arm. While the experimental dataset is unique and the coupled experimental and computational analyses comprehensive, the authors present incomplete results to support the claim that movement slowing is due to mass underestimation. Further analysis is needed to rule out alternative explanations.

    2. Reviewer #1 (Public review):

      Summary:

      This article investigates the origin of movement slowdown in weightlessness by testing two possible hypotheses: the first is based on a strategic and conservative slowdown, presented as a scaling of the motion kinematics without altering its profile, while the second is based on the hypothesis of a misestimation of effective mass by the brain due to an alteration of gravity-dependent sensory inputs, which alters the kinematics following a controller parameterization error.

      Strengths:

      The article convincingly demonstrates that trajectories are affected in 0g conditions, as in previous work. It is interesting, and the results appear robust. However, I have two major reservations about the current version of the manuscript that prevent me from endorsing the conclusion in its current form.

      Weaknesses:

      (1) First, the hypothesis of a strategic and conservative slowdown implicitly assumes a similar cost function, which cannot be guaranteed, tested, or verified. For example, previous work has suggested that changing the ratio between the state and control weight matrices produced an alteration in movement kinematics similar to that presented here, without changing the estimated mass parameter (Crevecoeur et al., 2010, J Neurophysiol, 104 (3), 1301-1313). Thus, the hypothesis of conservative slowing cannot be rejected. Such a strategy could vary with effective mass (thus showing a statistical effect), but the possibility that the data reflect a combination of both mechanisms (strategic slowing and mass misestimation) remains open.

      (2) The main strength of the article is the presence of directional effects expected under the hypothesis of mass estimation error. However, the article lacks a clear demonstration of such an effect: indeed, although there appears to be a significant effect of direction, I was not sure that this effect matched the model's predictions. A directional effect is not sufficient because the model makes clear quantitative predictions about how this effect should vary across directions. In the absence of a quantitative match between the model and the data, the authors' claims regarding the role of misestimating the effective mass remain unsupported.

      In general, both the hypotheses of slowing motion (out of caution) and misestimating mass have been put forward in the past, and the added value of this article lies in demonstrating that the effect depended on direction. However, (1) a conservative strategy with a different cost function can also explain the data, and (2) the quantitative match between the directional effect and the model's predictions has not been established.

      Specific points:

      (1) I noted a lack of presentation of raw kinematic traces, which would be necessary to convince me that the directional effect was related to effective mass as stated.

      (2) The presentation and justification of the model require substantial improvement; the reason for its placement in the supplementary material is unclear, as there is space to present the modelling work in detail in the main text. Regarding the model, some choices require justification: for example, why did the authors ignore the nonlinear Coriolis and centripetal terms?

      (3) The increase in the proportion of trials with subcomponents is interesting, but the explanatory power of this observation is limited, as the initial percentage was already quite high (from 60-70% during the initial study to 70-85% in flight). This suggests that the potential effect of effective mass only explains a small increase in a trend already present in the initial study. A more critical assessment of this result is warranted.

    3. Reviewer #2 (Public review):

      This study explores the underlying causes of the generalized movement slowness observed in astronauts in weightlessness compared to their performance on Earth. The authors argue that this movement slowness stems from an underestimation of mass rather than a deliberate reduction in speed for enhanced stability and safety.

      Overall, this is a fascinating and well-written work. The kinematic analysis is thorough and comprehensive. The design of the study is solid, the collected dataset is rare, and the model tends to add confidence to the proposed conclusions. That being said, I have several comments that could be addressed to consolidate interpretations and improve clarity.

      Main comments:

      (1) Mass underestimation

      a) While this interpretation is supported by data and analyses, it is not clear whether this gives a complete picture of the underlying phenomena. The two hypotheses (i.e., mass underestimation vs deliberate speed reduction) can only be distinguished in terms of velocity/acceleration patterns, which should display specific changes during the flight with a mass underestimation. The experimental data generally shows the expected changes but for the 45° condition, no changes are observed during flight compared to the pre- and post-phases (Figure 4). In Figure 5E, only a change in the primary submovement peak velocity is observed for 45°, but this finding relies on a more involved decomposition procedure. It suggests that there is something specific about 45° (beyond its low effective mass). In such planar movements, 45° often corresponds to a movement which is close to single-joint, whereas 90° and 135° involve multi-joint movements. If so, the increased proportion of submovements in 90° and 135° could indicate that participants had more difficulties in coordinating multi-joint movements during flight. Besides inertia, Coriolis and centripetal effects may be non-negligible in such fast planar reaching (Hollerbach & Flash, Biol Cyber, 1982) and, interestingly, they would also be affected by a mass underestimation (thus, this is not necessarily incompatible with the authors' view; yet predicting the effects of a mass underestimation on Coriolis/centripetal torques would require a two-link arm model). Overall, I found the discrepancy between the 45° direction and the other directions under-exploited in the current version of the article. In sum, could the corrective submovements be due to a misestimation of Coriolis/centripetal torques in the multi-joint dynamics (caused specifically -or not- by a mass underestimation)?

      b) Additionally, since the taikonauts are tested after 2 or 3 weeks in flight, one could also assume that neuromuscular deconditioning explains (at least in part) the general decrease in movement speed. Can the authors explain how to rule out this alternative interpretation? For instance, weaker muscles could account for slower movements within a classical time-effort trade-off (as more neural effort would be needed to generate a similar amount of muscle force, thereby suggesting a purposive slowing down of movement). Therefore, could the observed results (slowing down + more submovements) be explained by some neuromuscular deconditioning combined with a difficulty in coordinating multi-joint movements in weightlessness (due to a misestimation of Coriolis/centripetal torques)?

      (2) Modelling

      a) The model description should be improved as it is currently a mix of discrete time and continuous time formulations. Moreover, an infinite-horizon cost function is used, but I thought the authors used a finite-horizon formulation with the prefixed duration provided by the movement utility maximization framework of Shadmehr et al. (Curr Biol, 2016). Furthermore, was the mass underestimation reflected both in the utility model and the optimal control model? If so, did the authors really compute the feedback control gain with the underestimated mass but simulate the system with the real mass? This is important because the mass appears both in the utility framework and in the LQ framework. Given the current interpretations, the feedforward command is assumed to be erroneous, and the feedback command would allow for motor corrections. Therefore, it could be clarified whether the feedback command also misestimates the mass or not, which may affect its efficiency. For instance, if both feedforward and feedback motor commands are based on wrong internal models (e.g., due to the mass underestimation), one may wonder how the astronauts would execute accurate goal-directed movements.

      b) The model seems to be deterministic in its current form (no motor and sensory noise). Since the framework developed by Todorov (2005) is used, sensorimotor noise could have been readily considered. One could also assume that motor and sensory noise increase in microgravity, and the model could inform on how microgravity affects the number of submovements or endpoint variance due to sensorimotor noise changes, for instance.

      c) Finally, how does the model distinguish the feedforward and feedback components of the motor command that are discussed in the paper, given that the model only yields a feedback control law? Does 'feedforward' refer to the motor plan here (i.e., the prefixed duration and arguably the precomputed feedback gain)?
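      The question raised in point (2a), whether gains computed with an underestimated mass were applied to a plant with the real mass, can be made concrete with a toy simulation. The sketch below is not the authors' utility or LQ model: it assumes a 1-D point mass driven by a proportional-derivative feedback law whose gains are tuned to be critically damped for the assumed mass. The function name, the bandwidth `omega`, the 10 cm target, and the 0.7 mass ratio are all illustrative assumptions.

```python
def simulate_reach(true_mass, assumed_mass, omega=15.0, target=0.1,
                   dt=1e-4, duration=1.5):
    """Simulate a 1-D reach with gains tuned for `assumed_mass`.

    The controller is a PD law that would be critically damped if the
    plant mass equalled `assumed_mass`; the plant actually uses `true_mass`.
    Returns (peak_speed, time_of_peak_speed, final_position).
    """
    # Gains computed from the (possibly wrong) internal estimate of mass.
    stiffness = assumed_mass * omega ** 2
    damping = 2.0 * assumed_mass * omega

    pos, vel = 0.0, 0.0
    peak_speed, peak_time = 0.0, 0.0
    for step in range(int(duration / dt)):
        force = stiffness * (target - pos) - damping * vel
        # The real dynamics divide by the true mass, not the assumed one.
        vel += (force / true_mass) * dt   # semi-implicit Euler update
        pos += vel * dt
        if vel > peak_speed:
            peak_speed, peak_time = vel, (step + 1) * dt
    return peak_speed, peak_time, pos


if __name__ == "__main__":
    for ratio in (1.0, 0.7):
        peak, t_peak, final = simulate_reach(true_mass=1.0, assumed_mass=ratio)
        print(f"assumed/true mass = {ratio}: peak speed {peak:.3f} m/s "
              f"at {t_peak * 1000:.0f} ms, final position {final:.4f} m")
```

      In this simplified setting, underestimating the mass lowers and delays the velocity peak while the hand still converges on the target, because the feedback term keeps acting on the position error. Whether the same holds in the authors' model depends on how the mass estimate enters the planned duration and the feedback gains, which is the clarification requested above.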

      (3) Brevity of movements and speed-accuracy trade-off

      The tested movements are much faster (average duration approx. 350 ms) than similar self-paced movements that have been studied in other works (e.g., Wang et al., J Neurophysiology, 2016; Berret et al., PLOS Comp Biol, 2021, where movements can last about 900-1000 ms). This is consistent with the instructions to reach quickly and accurately, in line with a speed-accuracy trade-off. Was this instruction given to highlight the inertial effects related to the arm's anisotropy? One may, however, wonder if the same results would hold for slower self-paced movements (are they also slower than on Earth?). Moreover, a few other important questions might need to be addressed for completeness: how was it ensured that the astronauts remembered this instruction during the flight? (Could the control group have moved faster because they better remembered the instruction?) Did the taikonauts perform the experiment on their own during the flight, or did one taikonaut assume the role of the experimenter?

      (4) No learning effect

      This is a surprising effect, as mentioned by the authors. Other studies conducted in microgravity have indeed revealed an optimal adaptation of motor patterns in a few dozen trials (e.g., Gaveau et al., eLife, 2016). Perhaps the difference is again related to single-joint versus multi-joint movements. This should be better discussed given the impact of this claim. Typically, why would a "sensory bias of bodily property" persist in microgravity and be a "fundamental constraint of the sensorimotor system"?

    4. Reviewer #3 (Public review):

      Summary:

      The authors describe an interesting study of arm movements carried out in weightlessness after a prolonged exposure to the so-called microgravity conditions of orbital spaceflight. Subjects performed radial point-to-point motions of the fingertip on a touch pad. The authors note a reduction in movement speed in weightlessness, which they hypothesize could be due to either an overall strategy of lowering movement speed to better accommodate the instability of the body in weightlessness or an underestimation of body mass. They conclude for the latter, mainly based on two effects. One, slowing in weightlessness is greater for movement directions with higher effective mass at the end effector of the arm. Two, they present evidence for an increased number of corrective submovements in weightlessness. They contend that this provides conclusive evidence to accept the hypothesis of an underestimation of body mass.

      Strengths:

      In my opinion, the study provides a valuable contribution, the theoretical aspects are well presented through simulations, the statistical analyses are meticulous, the applicable literature is comprehensively considered and cited, and the manuscript is well written.

      Weaknesses:

      Nevertheless, I am of the opinion that the interpretation of the observations leaves room for other possible explanations of the observed phenomenon, thus weakening the strength of the arguments.

      First, I would like to point out an apparent (at least to me) divergence between the predictions and the observed data. Figures 1 and S1 show that the difference between predicted values for the 3 movement directions is almost linear, with predictions for 90° midway between predictions for 45° and 135°. The effective mass at 90° appears to be much closer to that of 45° than to that of 135° (Figure S1A). But the data shown in Figure 2 and Figure 3 indicate that movements at 90° and 135° are grouped together in terms of reaction time, movement duration, and peak acceleration, while both differ significantly from those values for movements at 45°.

      Furthermore, in Figure 4, the change in peak acceleration time and relative time to peak acceleration between 1g and 0g appears to be greater for 90° than for 135°, which appears to me to be at least superficially in contradiction with the predictions from Figure S1. If the effective mass is the key parameter, wouldn't one expect as much difference between 90° and 135° as between 90° and 45°? It is true that peak speed (Figure 3B) and peak speed time (Figure 4B) appear to follow the ordering according to effective mass, but is there a mathematical explanation as to why the ordering is respected for velocity but not acceleration? These inconsistencies weaken the authors' conclusions and should be addressed.

      Then, to strengthen the conclusions, I feel that the following points would need to be addressed:

      (1) The authors model the movement control through equations that derive the input control variable in terms of the force acting on the hand and treat the arm as a second-order low-pass filter (Equation 13). Underestimation of the mass in the computation of a feedforward command would lead to a lower-than-expected displacement to that command. But it is not clear if and how the authors account for a potential modification of the time constants of the 2nd order system. The CNS does not effectuate movements with pure torque generators. Muscles have elastic properties that depend on their tonic excitation level, reflex feedback, and other parameters. Indeed, Fisk et al.* showed variations of movement characteristics consistent with lower muscle tone, lower bandwidth, and lower damping ratio in 0g compared to 1g. Could the variations in the response to the initial feedforward command be explained by a misrepresentation of the limbs' damping and natural frequency, leading to greater uncertainty about the consequences of the initial command? This would still be an argument for unadapted feedforward control of the movement, leading to the need for more corrective movements. But it would not necessarily reflect an underestimation of body mass.

      *Fisk, J., Lackner, J. R., & DiZio, P. (1993). Gravitoinertial force level influences arm movement control. Journal of Neurophysiology, 69(2), 504-511.

      (2) The movements were measured by having the subjects slide their finger on the surface of a touch screen. In weightlessness, the implications of this contact are expected to be quite different than those on the ground. In weightlessness, the taikonauts would need to actively press downward to maintain contact with the screen, while on Earth, gravity will do the work. The tangential forces that resist movement due to friction might therefore be different in 0g. This could be particularly relevant given that the effect of friction would interact with the limb in a direction-dependent fashion, given the anisotropy of the equivalent mass at the fingertip evoked by the authors. Is there some way to discount or control for these potential effects?

      (3) The carefully crafted modelling of the limb neglects, nevertheless, the potential instability of the base of the arm. While the taikonauts were able to use their left arm to stabilize their bodies, it is not clear to what extent active stabilization with the contralateral limb can reproduce the stability of the human body seated in a chair in Earth gravity. Unintended motion of the shoulder could account for a smaller-than-expected displacement of the hand in response to the initial feedforward command and/or greater propensity for errors (with a greater need for corrective submovements) in 0g. The direction of movement with respect to the anchoring point could lead to the dependence of the observed effects on movement direction. Could this be tested in some way, e.g., by testing subjects on the ground while standing on an unstable base of support or sitting on a swing, with the same requirement to stabilize the torso using the contralateral arm?

      The arguments for an underestimation of body mass would be strengthened if the authors could address these points in some way.

    1. eLife Assessment

      The authors proposed two hypotheses: first, that methamphetamine induces neuroinflammation, and second, that it alters neuronal stem cell differentiation. These are valuable hypotheses, and the authors provided in vivo observations of the methamphetamine response in mice. However, concerns remain regarding the interpretation of the data, and the current evidence is incomplete, requiring substantial experimental validation.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript focuses on single-cell RNA sequencing (scRNA-seq) analysis following chronic methamphetamine (METH) treatment in mice. The authors propose two hypotheses:

      (1) METH induces neuroinflammation involving T and NKT cells, and (2) METH alters neuronal stem cell differentiation.

      Strengths:

      The authors provide a substantial dataset with numerous replicates, offering valuable resources to the research community.

      Weaknesses:

      Concerns persist regarding the interpretation of data and the validation of experiments. First, the presence of T cells, NKT cells, and neutrophils in both the control and METH-treated hippocampi suggests that blood contamination rather than immune cell infiltration is the cause. Since the authors claim that METH disrupts the blood-brain barrier, increasing the infiltration of these immune cells, identifying the source of these immune cells is critical.

      Secondly, the pseudotime analysis, which suggests altered neural stem cell (NSC) differentiation, is not conclusively supported by the current data and requires further validation.

      Overall, the authors provided comprehensive in vivo data on the impact of methamphetamine on the hippocampus; however, further in vivo and in vitro experimental validation of the key findings is needed.

    3. Reviewer #2 (Public review):

      Summary:

      Chronic methamphetamine (METH) abuse leads to significant structural and functional deficits in the cortical and hippocampal regions in humans. However, the specific mechanisms underlying chronic METH-induced neurotoxicity in the hippocampus and its contribution to cognitive deficits remain poorly understood. The authors aim to address this knowledge gap using a single-cell transcriptomic atlas of the hippocampus under chronic METH exposure in mice. They present analyses of differential gene expression, cell-cell communication, pseudotemporal trajectories, and transcription factor regulation to characterize the cellular-level impact of METH abuse. However, the overall quality of the manuscript is currently very poor due to a lack of basic quality control, overly descriptive content, and unclear conclusions.

      Strengths:

      The major strength of this study is that it may represent the first report on the impact of METH on the hippocampus in mice. However, the authors should clarify whether similar studies have been previously conducted, as this point remains uncertain.

      Weaknesses:

      Despite this potential novelty, the study has numerous weaknesses. Notably, single-cell RNA sequencing was unable to capture an adequate number of neuronal populations. Neurons accounted for only approximately 0.6% of the total nuclei, representing a significant underrepresentation compared to their actual physiological proportion. Given that the behavioral effects of METH are likely mediated by neuronal dysfunction, readers would reasonably expect to see transcriptional changes in neurons. The authors should explain why they were unable to capture a sufficient number of neurons and justify how this incomplete dataset can still provide meaningful scientific insights for researchers studying METH-induced hippocampal damage and behavioral alterations.

      Another significant weakness of this study is the lack of a cohesive hypothesis or overarching conclusion regarding how METH impacts neural populations. The authors provide a largely descriptive account of transcriptional alterations across various cell types, but the manuscript lacks clear, biologically meaningful conclusions. This descriptive approach makes it difficult for readers to identify the key findings or take-home messages. To improve clarity and impact, the authors should focus on developing and presenting a few plausible hypotheses or mechanistic scenarios regarding METH-induced neurotoxicity, grounded in their scRNA-seq data. Including schematic figures to illustrate these hypotheses would also help readers better understand and interpret the study.

      The final major weakness of this study is its poor readability. It appears that the authors did not adequately proofread the manuscript, as there are numerous typographical errors (e.g., line 333: trisulting; line 756: essencial), unsupported scientific claims lacking citations (e.g., lines 485, 503, 749-753), and grammatically incorrect sentences (e.g., lines 470-472, 540-543, 749-753). In addition, many paragraphs are unorganized and overly descriptive, which further hinders clarity. Some figures are also problematic - too small in size and overcrowded with text in fonts that are difficult to read. It is recommended that the authors carry out quality control. There are too many typographical and grammatical errors to list individually; the authors should carefully review and revise the entire manuscript to address all of these issues.

      Overall, this study could have offered some incremental new insights into neurotoxicity following chronic METH exposure, despite the poor capture of neuronal populations. However, the current manuscript feels more like a data dump than a thoughtfully constructed scientific narrative. I encourage the authors to extract and highlight meaningful biological insights from their dataset and clearly articulate these in the conclusion, ideally supported by an additional schematic figure. Furthermore, I strongly urge the authors to substantially improve the basic quality of the manuscript through careful proofreading and by seeking feedback from colleagues or other readers.

    4. Reviewer #3 (Public review):

      Summary:

      This study aimed to elucidate the intricate mechanisms underlying cognitive decline induced by chronic METH abuse, focusing on the hippocampus at a single-cell resolution. The authors established a robust mouse model of chronic METH exposure. They observed significant impairments in working memory, spatial cognition, learning, and cognitive memory through Y-maze and novel object recognition tests. To gain deeper insights into the cellular and molecular changes, they utilized single-cell RNA sequencing to profile hippocampal cells. They performed extensive bioinformatics analyses, including cell clustering, differential gene expression, cellular communication, pseudotemporal trajectory, and transcription factor regulation.

      Strengths:

      (1) The authors performed a comprehensive suite of bioinformatics analyses, including differential gene expression, cellular cross-talk, pseudotime trajectory, and SCENIC analysis, which enable a multifaceted exploration of METH-induced changes at both the cellular and molecular levels.

      (2) The study demonstrates an awareness of the potential influence of circadian rhythms, dedicating a specific section in the discussion to the disruption of circadian rhythms, which has rarely been mentioned in previous studies on METH. They highlight the frequent occurrence of circadian regulation in their analysis across several cell types.

      (3) The pseudotime analysis provides valuable insights into hindered neurogenesis, showing a shift in NSC differentiation toward astrocytes rather than neuroblasts in METH-treated mice. The detailed analysis of BBB components (endothelial cells, mural cells, SMCs) and their heterogeneous responses to METH is also a significant contribution.

      Weaknesses:

      (1) While the bioinformatics analyses are extensive, the study is primarily descriptive at the molecular level. The absence of experimental validation, such as targeted mRNA/protein quantification and gene knockdown/overexpression to confirm the causal relationship between these identified genes and METH-induced cognitive deficits, is a notable limitation.

      (2) While the discussion extensively covers the functional implications of specific molecular pathways and cell types, it would greatly benefit from a comparison of these findings with existing RNA sequencing data from other METH models in hippocampal tissue.

      (3) The conclusion that "prolonged METH use may progressively impair cognitive function" may not be uniformly supported by the behavioral data: Figures 1C and F (discrimination and preference indexes) show a further decline in the METH group at the 4-week test compared to the 2-week test. In contrast, Figures 1E and H present a contradictory pattern.

    1. eLife Assessment

      This valuable study investigates the neural basis of bidirectional communication between the cortex and hippocampus during learning. The evidence supporting the identification of specific circuits and functional cell types involved is convincing. However, certain aspects of the behavioral analysis and statistical interpretation remain incomplete. Overall, the work will be of interest to neuroscientists studying learning and memory.

    2. Reviewer #1 (Public review):

      Summary:

      This work by Hall et al. provides a novel and important finding about communication between the anterior cingulate cortex (ACC) and the CA1 region of the dorsal hippocampus: ACC activity can clearly predict CA1 activity, and this prediction is modulated by learning/experience. Furthermore, they have some evidence that the modulation differs by whether the CA1 neurons were in the deep versus superficial sub-layer of CA1. The evidence is suggestive of new and exciting findings, but some gaps and weaknesses remain to be addressed before I believe all of the authors' claims can be supported. The figures also need to be slightly better organized, and the discussion is missing a major dimension in my opinion. Overall, this is a strong submission, but with some gaps to fill.

      Strengths:

      (1) This is a well-written manuscript - the introduction was especially clear, well-cited, and motivating.

      (2) The sub-layer specific communication between ACC and CA1 represents the discovery of a novel and functionally impactful piece of neurobiology.

      (3) Optogenetics was an important verification of ACC-CA1 communication, as was the analysis of neurons by waveform type.

      Weaknesses:

      (1) Figure 2: Why are the data separated into two groups from the outset? If all data are combined, is there a general drop in prediction gain from pre to post?

      (2) 2b and 2c are important since they are complementary means to show the same thing, and it is important that they cross-validate each other, especially since the non-significant task active neuron difference in 2b appears to be nearly as strong as the significant difference to its left. A more holistic analysis can be done to compare these dimensions.

      (3) Sup vs deep neuron definition: Did the authors have any means to validate this anatomical separation using histology or otherwise? I don't believe they described anything like that, and instead use physiology to infer anatomical location. I understand anatomy-based methods may be practically impossible with tetrodes, but this limitation should at least be mentioned, and it should be explained that without something like silicon probes or histological validation, anatomy had to be inferred from physiology.

      (4) Superficial vs deep differences in firing rate ratio based on PG: there are many fewer CA1deep neurons, but in 4c, the trends appear to be the same pre-training, with top PG lower than others. It seems the lack of difference in CA1deep in 4c may be due to the much lower power/n. This should be discussed or addressed.

      (5) In Figure 5, the term "firing rate ratio" is used, and it sounds the same as in previous figures, but this is a different ratio (based on modulation by opto stim, not task).

      (6) I would like to learn more about these v-type neurons. I understand we do not yet know about their molecular or morphologic correlate, but more analysis can be done with the current data.

      (7) I would like more discussion of ACC-CA1 connectivity.

      (8) Some elements may be missing from the discussion, relating baseline functioning versus post-learning function.

    3. Reviewer #2 (Public review):

      Summary:

      This study uncovers an inhibitory pathway from the anterior cingulate cortex (ACC) to pyramidal cells in the superficial sublayer of hippocampal area CA1 (CA1sup). As ACC neuron spiking tends to precede hippocampal ripples, this presents the intriguing possibility that ACC inputs are selectively inhibiting particular CA1sup neurons, which could play a role in the reactivation of task-related ensembles known to take place during hippocampal ripples. Indeed, through a generalized linear model (GLM) analysis, the authors demonstrate that the ACC activity within the 200ms immediately preceding the ripple is predictive of the ripple content.

      Strengths:

      The biggest strength of the work is the optogenetic manipulation experiments, which convincingly demonstrate that stimulation of ACC pyramidal neurons activates an interneuron population with symmetric spike waveforms, and inhibits parvalbumin interneurons and pyramidal cells in CA1sup but not CA1deep sublayer.

      An additional strength is the GLM analysis, which consistently shows that ACC activity preceding the ripple is predictive of hippocampal activity during the ripple considerably more than in shuffled data for all cells and periods tested.

      Weaknesses:

      The major weakness of this work is that the link with learning and memory is not very well supported.

      The only evidence of rebalancing and reorganization appears to be a single statistical test (the test in Figure 1f, p=0.013) demonstrating a decrease of the GLM prediction gain from pre-task sleep to post-task sleep; the same test is repeated for subsets of the data in the rest of the figures. As the idea of rebalancing and reorganization is central to the paper as currently written, exploring it through another measure, independent of the GLM prediction gain, should be expected. The notion that this pathway is suppressed in sleep following learning can be supported by demonstrating a decrease in any of the following measures: ACC spike-triggered average CA1sup responses, cross-covariances (Wierzynski et al. 2009) between ACC and CA1sup cells in post-task sleep, or ripple-triggered cross-correlations (Sirota et al. 2009).
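
      As an illustration of the kind of GLM-independent measure meant here, a spike-time cross-correlogram between one ACC unit and one CA1sup unit could be computed along the following lines. This is a minimal sketch, not the method of the cited papers: the function name, the window and bin sizes, and the spike times are all hypothetical.

```python
def cross_correlogram(ref_spikes, target_spikes, window=50, bin_size=10):
    """Count target spikes at each lag relative to every reference spike.

    Times, window and bin_size share one unit (milliseconds here).
    Returns a list of counts; bin i covers lags
    [-window + i * bin_size, -window + (i + 1) * bin_size).
    """
    n_bins = int(round(2 * window / bin_size))
    counts = [0] * n_bins
    for t_ref in ref_spikes:
        for t in target_spikes:
            lag = t - t_ref
            if -window <= lag < window:
                # Clamp the index to guard against floating-point edge cases.
                idx = min(int((lag + window) / bin_size), n_bins - 1)
                counts[idx] += 1
    return counts


# Hypothetical spike times (ms): the CA1 unit fires 5 ms after each ACC spike.
acc_spikes = [100, 200, 300]
ca1_spikes = [105, 205, 305]
ccg = cross_correlogram(acc_spikes, ca1_spikes)
```

      Comparing such correlograms between pre-task and post-task sleep, against a shuffled baseline, would give a second view of the change that does not depend on the GLM prediction gain.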

      The differences between task-active and task-inactive neurons are not convincing. The separation between task-active and task-inactive neurons is to divide a distribution that is far from bimodal into what appears to be two arbitrary groups. Similarly, the authors divide cells relative to their prediction gain ("Top PG" and "Bottom PG" in Figure 2c), which fails to select for the population of significantly predicted cells (relative to the shuffle). Within CA1sup cells, after learning, there is a significant decrease in the prediction gain for "task-inactive" cells but not "task-active" cells, but it is important to keep in mind that the "task-active" group contains only 24 neurons, and there was no difference between the two groups of cells ("task-active" vs "task-inactive") when directly compared.

      Finally, it is not clear whether the identity of the pathway-responsive CA1sup neurons is fixed or whether it may change with learning. A deeper analysis into the cell pair cross-correlations or the weights of the GLM analysis may reveal whether there is a reorganization of CA1sup responses (some cells that were inhibited are no longer inhibited, and vice versa) or a dampening (the same CA1sup cells are inhibited in both cases, but the inhibition is less pronounced in post-task sleep). The possibility of a rigid circuit dampened immediately following fear conditioning is not discussed by the authors.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Hall and colleagues investigate how the coupling of activity from ACC to CA1 is altered by fear learning, showing that during sleep immediately before learning, there is evidence for increased coupling of ACC activity with neurons that will subsequently be inhibited during the learning process. They go on to show that this effect seems to be mediated most by a subpopulation of neurons in the superficial layer of CA1. This fits with previous reports suggesting that these superficial neurons are key for the flexible updating of memory. The authors then go on to show that artificial activation of ACC using optogenetics results in varied effects in CA1, including a subtle decrease in activity of superficial neurons that lasts longer than the stimulus itself. Finally, the authors present some preliminary data suggesting that different interneurons may be recruited by this optogenetic stimulation in different ways and at different times.

      Overall, this is an interesting paper, but much of the analysis is very preliminary, and much of the crucial data about the learning effects and alterations to cell firing are not presented clearly and fully. This is further confounded by a rather opaque description of the results and analysis in the text. Overall, there is something very interesting here, but there needs to be a substantial series of extra analyses to clearly say what this is. In many cases, more robust analysis may render the results underpowered, which could dramatically change the conclusions of the paper.

      Strengths:

      The authors performed difficult, dual-location recordings across a multi-day learning paradigm, which seems like it could be a really nice dataset. They delve into the circuit basis of an interesting finding regarding ACC to CA1 connectivity and how this changes before and after fear conditioning. They provide data to suggest this connectivity may be through specific and distinct subcircuits in CA1.

      Weaknesses:

      (1) There is essentially no information in the text or figures about what the actual learning was, how it was done, how individual animals performed, and how any of these metrics related to learning. Looking at the methods, the authors did a number of things never mentioned anywhere in the text or figures, including novel arena exposure, contextual reexposure in extinction after learning, etc. It seems that this is a very rich dataset that has not been presented at all. I would recommend at the very least:
      a) Plot all of the behavioural training data, and how each mouse compares with the others - did the mice learn? At this stage, we don't know!
      b) Explain in the text in detail exactly what was done and why, and what this tells us about the neuronal activity.
      c) If there is variance in learning and/or conditioning, does this relate to features in the analysis, such as the GLM result?

      (2) Along similar lines, a key metric for most of the paper is that neurons most coupled with ACC are more likely to be inhibited during training. However, there is nothing anywhere in the paper showing these data. How do neurons in general respond to contextual shocks? The methods describe this as the average firing rate during training, normalised to pre-sleep activity. This metric seems a bit coarse and may obscure really important task-relevant dynamics. Are the neurons active at specific times, are they tuned to relevant parts of the task, and do any of these features of the cell activity also relate to the coupling with ACC? Similarly, how did the authors mitigate the influence of electrical artefacts caused by the foot shock in their recordings? Again, there is a huge amount of data here that is not being described, and likely holds very valuable information about what is actually happening. The paper would really benefit from the inclusion of these data in an accessible form, such as heatmaps of spiking, how these patterns change over time, and around e.g., foot shock, etc. Also key is how these features are altered by the variability of learning across subjects.

      (3) A number of the effects are presented by comparing a statistically significant effect to a non-statistically significant effect (e.g. in Figure 2b, Figure 2d, Figure 4 b,c, and others). This isn't really valid - the key test that the two groups are different is either with a direct test of the difference or an interaction term in an e.g., ANOVA test. In some places, I am not sure the same conclusions will be drawn from the data with these tests.

      (4) To what extent is defining superficial and deep CA1 neurons solely by ripple waveform an accepted method? Of the two papers referenced for this approach, one is a 2-photon calcium imaging paper that does not do electrical recordings (as far as I am aware), and the second uses this as a descriptor after defining the positions of units on an array. It would be good to clarify how accepted this is, and also how robust this is. At the very least, the supplement should include some kind of metric or walkthrough showing how this was done, how well each cell was classified and with what confidence, or how distinct and separate the two populations were (or was it just a smudge).

      (5) In the optogenetic experiment in Figure 5, the effect on the CA1 sup neurons seems to be driven by changes in a small subpopulation of this group, with no change in the others. Related to point 2, is there anything else in the data that can pull out what these cells are? More detailed analysis of the firing of these neurons might pull out something really interesting.

      (6) Related to this - a number of comparisons simply pool neurons across mice and analyse them as if independent. This has been done a lot in the past, but it would be better if an approach that included the interdependence of neurons recorded from the same mouse at the same time were used (such as a hierarchical model). While this is complex, a simpler approach would just be to plot the summary data also per mouse. For example, in Figure 5, how do the neurons inhibited by ACC activation spread across the different mice? Is the level of inhibition related to how well the mice learned the CS-US association?

      (7) Figure 6 is interesting, but very preliminary. None of the effects are quantified, and one of the cell types is not identified. I think some proper analysis needs to be done, again across mice, to be able to draw conclusions from these data.

      (8) Finally, in general, I felt that the way the paper was written was very hard to follow, often relying on very processed levels of analysis that were hard to relate back to the raw traces and their biological meaning. In general taking more words to really simply and fully explain each analysis, and taking the words and figures to walk through how each analysis was done and what it tells us about the neuronal data/biology would be really beneficial, especially to someone who is not an extracellular electrophysiologist or immersed in the immediate field.
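
      The direct test asked for in point (3) could take the form of a permutation test on the pre-to-post change, comparing the two groups head-on rather than comparing their separate p-values. The following is a minimal sketch, assuming one change score (post minus pre) per neuron; the function name and the example values are hypothetical, and neurons are treated as independent here, which point (6) cautions against.

```python
import random


def change_difference_test(change_a, change_b, n_perm=2000, seed=0):
    """Permutation test for a group difference in mean pre-to-post change.

    Returns (observed difference in means, two-sided p-value).
    """
    rng = random.Random(seed)
    n_a = len(change_a)
    observed = sum(change_a) / n_a - sum(change_b) / len(change_b)
    pooled = list(change_a) + list(change_b)
    extreme = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b)
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # The add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical change in prediction gain for two groups of neurons.
group_a = [-0.30, -0.25, -0.35, -0.28, -0.32, -0.27, -0.31, -0.29]
group_b = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01, 0.04, -0.03]
observed, p_value = change_difference_test(group_a, group_b)
```

      Only if this between-group test is itself significant does the contrast "significant in one group, not in the other" support a claim that the groups differ.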

      In summary, while this manuscript explores an intriguing hypothesis about pre-learning circuit dynamics, it is currently held back by insufficient clarity in behavioural analysis, data presentation, and statistical quantification. Addressing these core issues would greatly improve interpretability and confidence in the findings.
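
      For point (6), the simpler per-mouse summary could be sketched as below: neurons are first averaged within each animal, so that each mouse contributes one value to any group-level comparison. The mouse labels and values are hypothetical, and this is only a stand-in for a full hierarchical model.

```python
from collections import defaultdict


def per_mouse_means(records):
    """Average a per-neuron measure within each mouse.

    records: iterable of (mouse_id, value) pairs, one pair per neuron.
    Returns {mouse_id: mean value across that mouse's neurons}.
    """
    by_mouse = defaultdict(list)
    for mouse_id, value in records:
        by_mouse[mouse_id].append(value)
    return {mouse: sum(vals) / len(vals) for mouse, vals in by_mouse.items()}


# Hypothetical firing rate ratios for neurons recorded from three mice.
records = [
    ("m1", 0.8), ("m1", 0.6),
    ("m2", 1.0),
    ("m3", 0.5), ("m3", 0.7), ("m3", 0.9),
]
summary = per_mouse_means(records)
```

      Plotting these per-mouse values next to the pooled data would show whether an effect is carried by one or two animals.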

    5. Author response:

      We would like to thank the reviewers and the editorial team for all their thoughtful and constructive feedback. The reviewers provided many helpful comments which we will work to incorporate in our resubmission as we believe they will significantly enhance the quality of our manuscript.

      An overarching critique shared among reviewers was regarding limitations in our datasets. Namely, lower N-values for certain groups make some conclusions less reliable. We acknowledge this limitation and will add more experiments to address this concern. Additionally, attention was drawn to our reliance on using the generalized linear model (GLM) for making claims about rebalancing and learning-related changes. To address this, we will work to include additional analyses such as ACC spike-triggered average CA1sup responses, cross-covariances between ACC and CA1sup cells in post-task sleep, and ripple-triggered cross-correlations, among others as per reviewer recommendations. We will also provide a deeper analysis of the weights of CA1 neurons in our GLM analysis and their specific features during learning. Accordingly, we will provide a clearer description of our learning paradigm including performance data for each animal and how performance relates to our analyses. Overall, we will include more analyses of our datasets across various task events such as recall, to make more efficient use of the full repertoire of our recordings.

      Concerns were also raised regarding some aspects of our statistical analyses. During revision, we will ensure we select the most appropriate statistical measure for each of our tests. Our paper implements the use of tetrode recordings to assess sublayer identification. This approach comes with limitations, and in our resubmission, we will provide a more detailed explanation of those limitations along with a more thorough description of our measures to mitigate them.

      Lastly, in our follow-up submission we will work to improve the written clarity of findings. Specifically, we will simplify and better explain our findings and provide clearer justification for our interpretations and choice of analyses.

    1. eLife Assessment

      This revised paper provides valuable findings that altruistic tendency during moral decision-making is gain/loss context-dependent and oxytocin can restore the absence of altruistic choices in the loss domain. The methods and analyses are solid, yet the study could still benefit from better overall framing and more clarity and precision in the definition of key constructs, as pointed out by reviewers. If these concerns are addressed, this study would be of interest to social scientists and neuroscientists who work on moral decision-making and oxytocin.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang et al. addressed the question of whether hyperaltruistic preference is modulated by decision context and tested how oxytocin (OXT) may modulate this process. Using an adapted version of a previously well-established moral decision-making task, healthy human participants chose between an option that gained more money (or lost less, depending on the context) while delivering more painful shocks to either themselves or another person (the recipient), and an alternative that always gained less (or lost more) but delivered less pain. Through a series of regression analyses, the authors reported that hyperaltruistic preference was found only in the gain context and not in the loss context; however, OXT reestablished the hyperaltruistic preference in the loss context, to a level similar to that in the gain context.

      Strengths:

      This is a solid study that directly adapted a previously well-established task and the analytical pipeline to assess hyperaltruistic preference in separate decision contexts. Context-dependent decisions have gained more and more attention in literature in recent years, hence this study is timely. It also links individual traits (via questionnaires) with task performance, to test potential individual differences. The OXT study is done with great methodological rigor, including pre-registration. Both studies have proper power analysis to determine the sample size.

      Weaknesses:

      Despite the strengths, multiple analytical decisions have to be explained, justified, or clarified. Also, there is scope to enhance the clarity and coherence of the writing - as it stands, readers will have to go back and forth to search for information. Last, it would be helpful to add line numbers in the manuscript during the revision, as this will help all reviewers to locate the parts we are talking about.

      Introduction:
      (1) The introduction is somewhat unmotivated, with key terms/concepts left unexplained until relatively late in the manuscript. One of the main focuses in this work is "hyperaltruistic", but how is this defined? It seems that the authors take the meaning of "willing to pay more to reduce other's pain than their own pain", but is this what the task is measuring? Did participants ever need to PAY something to reduce the other's pain? Note that some previous studies indeed allow participants to pay something to reduce others' pain. And what makes it "HYPER-altruistic" rather than simply "altruistic"? Plus, in the intro, the authors mentioned that the "boundary conditions" remain unexplored, but this idea is never touched again. What do boundary conditions mean here in this task? How do the results/data help with finding out the boundary conditions? Can this be discussed within wider literature in the Discussion section? Last, what motivated the authors to examine decision context? It comes somewhat out of the blue that the opening paragraph states that "We set out to [...] decision context", but why? Are there other important factors? Why is decision context more important than those others?

      Experimental design:
      (2) The experiment per se is largely solid, as it followed a previously well-established protocol. But I am curious about how the participants were instructed. Did the experimenter ever mention the word "help" or "harm" to the participants? It would be helpful to include the exact instructions in the SI.

      (3) Relatedly, the experimental details were not quite comprehensive in the main text. Indeed, Methods come after the main text, but to be able to guide readers to understand what was going on, it would be very helpful if the authors could include some necessary experimental details at the beginning of the Results section.

      Statistical analysis:
      (3) One of the main analyses uses the harm aversion model (Eq1) and the results section keeps referring to one of its key parameters (i.e., k). However, it is difficult to understand the text without going to the Methods section below. Hence it would be very helpful to repeat the equation also in the main text. The same applies to the delta_m and delta_s terms - it will be very helpful to give a clear meaning of them, as nearly all analyses rely on knowing what they mean.

      (4) There is one additional parameter gamma (choice consistency) in the model. Did the authors also examine the task-related difference of gamma? This might be important as some studies have shown that the other-oriented choice consistency may differ in different prosocial contexts.

      (5) I am not fully convinced by the authors' inclusion of two types of models: the harm aversion model and logistic regression models. Indeed, the models look similar, and the authors have acknowledged that. But I wonder if there is a way to combine them? For example:
      Choice ~ delta_V * context * recipient (*Oxt_v._placebo)
      The calculation of delta_V follows Equation 1.
      Or the conceptual question is: if the authors were interested in the specific and independent contributions of delta_m and delta_s to behavior, as in their logistic model, why did they examine the harm aversion model first, where a parameter k controls the trade-off? One way to find out is to properly run the different models and compare them formally. In the end, it would be beneficial to only focus on the "winning" model to draw inferences.

      (6) The interpretation of the main OXT results needs to be more cautious. According to the operationalization, "hyperaltruistic" is the reduction of pain of others (higher % of choosing the less painful option) relative to the self. But relative to the placebo (as baseline), OXT did not increase the % of choosing the less painful option for others; rather, it decreased the % of choosing the less painful option for themselves. In other words, the degree of reducing others' pain is the same under OXT and placebo, but the degree of benefiting self-interest is reduced under OXT. I think this needs to be unpacked, and some of the wording needs to be changed. I am not very familiar with the OXT literature, but I believe it is very important to differentiate whether OXT is acting on self-oriented actions vs other-oriented actions. Relatedly, for results such as that in Fig. 5A, it would be helpful to not only look at the difference, but also the actual magnitude of the sensitivity to the shocks, for self and others, under OXT and placebo.
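
      For readers without the Methods at hand, the harm aversion model discussed in points (3) to (5) can be sketched as follows. This assumes that the manuscript's Equation 1 takes the form used in Crockett et al. (2014), where delta_m and delta_s are the money and shock differences between the more and less painful options, k is the harm aversion weight, and gamma is choice consistency. The function names and parameter values are hypothetical and are not taken from the manuscript.

```python
import math


def delta_v(delta_m, delta_s, k):
    """Subjective value difference between the more and less painful option.

    Assumed form: delta_V = (1 - k) * delta_m - k * delta_s, with 0 <= k <= 1.
    """
    return (1.0 - k) * delta_m - k * delta_s


def p_choose_more_painful(delta_m, delta_s, k, gamma):
    """Logistic choice rule; gamma scales how deterministic choices are."""
    return 1.0 / (1.0 + math.exp(-gamma * delta_v(delta_m, delta_s, k)))


# Hypothetical parameters: a larger k for the other than for the self is
# what the task treats as hyperaltruistic preference.
p_self = p_choose_more_painful(delta_m=10, delta_s=10, k=0.3, gamma=0.5)
p_other = p_choose_more_painful(delta_m=10, delta_s=10, k=0.5, gamma=0.5)
```

      Under these assumed values the more painful option is chosen less often when the shocks go to the other person, which is the pattern that k summarizes in a single number, whereas the logistic regression models estimate separate weights for delta_m and delta_s.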

      Comments on revisions:

      I did not change my original public review, as I think it can still be helpful for the field to see the reasoning and argument.

      For the revision, the authors have done a thorough job of addressing my previous comments and questions.

      The only remaining request is a clear definition of hyperaltruism. As it stands, hyperaltruism refers to "people's willingness to pay more to reduce other's pain than their own pain", i.e., the "hyper" part is defined with respect to "self". But shouldn't hyperaltruism be defined in contrast to "normal" altruism?

      It is fine that it follows a previously published work (Crockett et al., 2014), but it would still be necessary to explain/define the construct being tested in a standalone fashion rather than requiring readers to go back to the original work.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors reported two studies where they investigated the context effect of hyperaltruistic tendency in moral decision-making. They replicated the hyperaltruistic moral preference in the gain domain, where participants inflicted electric shocks on themselves or another person in exchange for monetary profits for themselves. In the loss domain, this hyperaltruistic tendency was abolished. Interestingly, oxytocin administration reinstated the hyperaltruistic tendency in the loss domain. The authors also examined the correlation between individual differences in utilitarian psychology and the context effect of hyperaltruistic tendency.

      Strengths:

      (1) The research question - the boundary condition of hyperaltruistic tendency in moral decision-making and its neural basis - is theoretically important.
      (2) Manipulating the brain via pharmacological means offers causal understanding of the neurobiological basis of the psychological phenomenon in question.
      (3) Individual difference analysis reveals interesting moderators of the behavioral tendency.

      Weaknesses:

      (1) The theoretical hypothesis needs to be better justified. There are studies addressing the neurobiological mechanism of hyperaltruistic tendency, which the authors unfortunately skipped entirely.
      (2) There are some important inconsistencies between the preregistration and the actual data collection/analysis, which the authors did not justify.
      (3) Some of the exploratory analysis seems underpowered (e.g., large multiple regression models with only about 40 participants).
      (4) Inaccurate conceptualization of utilitarian psychology and the questionnaire used to measure it.

      Comments on revisions:

      The authors have addressed the weaknesses in the second round of revision.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors aimed to index individual variation in decision-making when decisions pit the interests of the self (gains in money, potential for electric shock) against the interests of an unknown stranger in another room (potential for unknown shock). In addition, the authors conducted an additional study in which male participants were either administered intranasal oxytocin or placebo before completing the task to identify the role of oxytocin in moderating task responses. Participants' choice data was analyzed using a harm aversion model in which choices were driven by the subjective value difference between the less and more painful options.

      Strengths:

      Overall, I think this is a well-conducted, interesting, and novel set of research studies exploring decision-making that balances outcomes for the self versus a stranger, and the potential role of the hormone oxytocin (OT) in shaping these decisions. The pain component of the paradigm is well designed, as is the decision-making task, and overall the analyses were well suited to evaluating and interpreting the data. Advantages of the task design include the absence of deception, e.g., the use of a real study partner and real stakes, as a trial from the task was selected at random after the study and the choice the participant made was actually executed.

      Weaknesses:

      The primary weakness of the paper concerns its framing. Although it purports to be measuring "hyper-altruism," which is the same term used in prior similar (although not identical) designs, I do not believe the task constitutes altruism, but rather the decision to engage, or not engage, in instrumental aggression.

      I continue to believe that when, in the "other" trials, the only outcome possible for the study partner is pain, and the only outcome possible for the participant is monetary gain, these trials measure decisions about instrumental aggression. That is the exact definition of instrumental aggression: causing others harm for personal gain. Altruism is not equivalent to refraining from engaging in instrumental aggression, although some similar mechanisms may support both. True altruism would be to accept shocks to the self for the other's benefit (e.g., money). The interpretation of this task as assessing instrumental aggression is supported by the fact that only the Instrumental Harm subscale of the OUS was associated with outcomes in the task, but not the Impartial Benevolence subscale. By contrast, the IB subscale is the one more consistently associated with altruism (e.g., Kahane et al., 2018; Amormino et al., 2022). I believe it is important for scientific accuracy for the paper, including the title, to be rewritten to reflect what it is testing.

      Although I recognize similar tasks have been previously characterized as "hyper-altruism" I do not believe that is sufficient justification for continuing to promulgate this descriptor without any caveats. I hope the authors will engage more seriously with the idea that this is what the task is measuring.

      Relatedly, in the introduction, I believe it would be important to discuss the non-symmetry of moral obligations related to help/harm--we have obligations not to harm strangers but no obligation to help strangers. This is another reason I do not think the term "hyper altruism" is a good description for this task--given it is typically viewed as morally obligatory not to harm strangers, choosing not to harm them is not "hyper" altruistic (and again, I do not view it as obviously altruism at all).

    1. eLife Assessment

      This important study suggests that adolescent mice exhibit less accuracy than adult mice in a sound discrimination task when the sound frequencies are very similar. The evidence supporting this observation is solid and suggests that it arises from cognitive control differences between adolescent and adult mice. The adolescent period is largely understudied, despite its contribution to shaping the adult brain, which makes this study interesting for a broad range of neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      Praegel et al. explore the differences in learning an auditory discrimination task between adolescent and adult mice. Using freely-moving (Educage) and head-fixed paradigms, they compare behavioral performance and neuronal responses over the course of learning. The mice were initially trained for seven days on an easy pure frequency tone Go/No-go task (frequency difference of one octave), followed by seven days of a harder version (frequency difference of 0.25 octave). While adolescents and adults showed similar performance on the easy task, adults performed significantly better on the harder task. Quantifying the lick bias of both groups, the authors then argue that the difference in performance is not due to a difference in perception, but rather to a difference in cognitive control. The authors then used Neuropixels recordings across 4 auditory cortical regions to quantify the neuronal activity related to the behavior. At the single cell level, the data show earlier stimulus-related discrimination for adults compared to adolescents in both the easy and hard tasks. At the neuronal population level, adults displayed a higher decoding accuracy and lower onset latency in the hard task as compared to adolescents. Such differences were not only due to learning, but also to age, as concluded from recordings in novice mice. After learning, neuronal tuning properties had changed in adults but not in adolescents. Overall, the differences between adolescent and adult neuronal data correlate with the behavior results in showing that learning a difficult task is more challenging for younger mice.

      Strengths:

      - The behavioral task is well designed, with the comparison of easy and difficult tasks allowing for a refined conclusion regarding learning across age. The experiments with optogenetics and novice mice complement the research question in a convincing way.
      - The analysis, including the systematic comparison of task performance across the two age groups, is most interesting, and reveals differences in learning (or learning strategies?) that are compelling.
      - Neuronal recording during both behavioral training and passive sound exposure is particularly powerful, and allows interesting conclusions.

      Weaknesses:

      - The presentation of the paper must be strengthened. Inconsistencies, missing information or confusing descriptions should be fixed.
      - The recording electrodes cover regions in the primary and secondary cortices. It is well known that these two regions process sounds quite differently (for example, one has tonotopy, the other not), and separating recordings from both regions is important to conclude anything about sound representations. The authors show that the conclusions are the same across regions for Figure 4, but is it also the case for the subsequent analysis? Compared to the original manuscript, the authors have now done the analysis for AUDp and AUDv separately, and say that the differences are similar in both regions. The data, however, show that this is not the case (Fig S7). And even if it were the case, how would it be compatible with the published literature?

    3. Reviewer #2 (Public review):

      Summary:

      The authors aimed to find out how and how well adult and adolescent mice discriminate tones of different frequencies and whether there are differences in processing at the level of the auditory cortex that might explain differences in behavior between the two groups. Adolescent mice were found to be worse at sound frequency discrimination than adult mice. The performance difference between the groups was most pronounced when the sounds were close in frequency and thus difficult to distinguish and could, at least in part, be attributed to the younger mice's inability to withhold licking in no-go trials. By recording the activity of individual neurons in the auditory cortex when mice performed the task or were passively listening, as well as in untrained mice, the authors identified differences in the way that the adult and adolescent brains encode sounds and the animals' choice that could potentially contribute to the differences in behavior.

      Strengths:

      The study combines behavioural testing in freely-moving and head-fixed mice, optogenetic manipulation and high density electrophysiological recordings in behaving mice to address important open questions about age differences in sound-guided behavior and sound representation in the auditory cortex.

      Weaknesses:

      For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      A) The presentation of the paper must be strengthened. Inconsistencies, mislabelling, duplicated text, typos, and inappropriate colour code should be changed.

      We spotted and corrected several inconsistencies and mislabelling issues throughout the text and figures. Thanks!  

      B) Some claims are not supported by the data. For example, the sentence that says that "adolescent mice showed lower discrimination performance than adults" (l.22) should be rewritten, as the data does not show that for the easy task (Figure 1F and Figure 1H).

      We carefully reviewed the specific claims and fixed some of the wording so it adheres to the data shown.

      C) In Figure 7 for example, are the quantified properties not distinct across primary and secondary areas?

      We have now carried out additional analyses to test this. We found that while AUDp and AUDv exhibit distinct tuning properties, they show similar differences between adolescent and adult neurons (see Supplementary Table 6, Fig. S7-1a-h). Note that TEa and AUDd could not be evaluated due to low numbers of modulated neurons in this protocol.

      D) Some analysis interpretations should be more cautious. (..) A lower lick rate in general could reflect a weaker ability to withhold licking, as indicated on l.164, but also many other things, such as a lower frustration threshold, lower satiation, or more energy.

      That is a fair comment, and we refined our interpretations. Moreover, we also addressed whether impulsiveness impacted lick rates. In the Educage, we found that adolescent mice had shorter ITIs only after FAs (Fig. S2-1). In the head-fixed setup, we examined (1) the proportion of ITIs where licks occurred (Fig. S3-1c) and (2) the number of licks in these ITIs (Fig. S3-1d). We found no differences between adolescents and adults, indicating that the differences observed in the main task are not due to general differences in impulsiveness (Fig. S2-1, Fig. S3-1c, d). Finally, we note that potential differences in satiation were already addressed in the original manuscript by carefully examining the number of trials completed across the session. See also Reviewer 3, comment #1 below.

      Reviewer #2 (Public review):

      A) For some of the analyses that the authors conducted it is unclear what the rationale behind them is and, consequently, what conclusion we can draw from them.

      We reviewed the manuscript carefully and revised the relevant sections to clarify the rationale behind the analyses. See detailed responses to all the reviewer’s specific comments.

      B) The results of optogenetic manipulation, while very interesting, warrant a more in-depth discussion.

      We expanded our discussion on these experiments (L495-511) and also added an additional analysis to strengthen our findings (Fig. S3-2e).

      Reviewer #3 (Public review):

      (1) The authors report that "adolescent mice showed lower auditory discrimination performance compared to adults" and that this performance deficit was due to (among other things) "weaker cognitive control". I'm not fully convinced of this interpretation, for a few reasons. First, the adolescents may simply have been thirstier, and therefore more willing to lick indiscriminately. The high false alarm rates in that case would not reflect a "weaker cognitive control" but rather, an elevated homeostatic drive to obtain water. Second, even the adult animals had relatively high (~40%) false alarm rates on the freely moving version of the task, suggesting that their behavior was not particularly well controlled either. One fact that could help shed light on this would be to know how often the animals licked the spout in between trials. Finally, for the head-fixed version of the task, only d' values are reported. Without the corresponding hit and false alarm rates (and frequency of licking in the intertrial interval), it's hard to know what exactly the animals were doing.

      First, as requested, we added the Hit rates and FA rates for the head-fixed task (Fig. S3-1a). Second, as requested by the reviewer, we performed additional analyses in both the Educage and head-fixed versions of the task. Specifically, we analyzed the ITI duration following each trial outcome. We found that adolescent mice had shorter ITIs only after FAs (Fig. S2-1). In the head-fixed setup, we examined (1) the proportion of ITIs during which licks occurred (Fig. S3-1c) and (2) the number of licks in these ITIs (Fig. S3-1d). We found no differences between adolescents and adults, indicating that the differences observed in the main task are not due to general differences in impulsiveness (Fig. S2-1, Fig. S3-1c, d). See also comment #D of reviewer #1 above.

      B) There are some instances where the citations provided do not support the preceding claim. For example, in lines 64-66, the authors highlight the fact that the critical period for pure tone processing in the auditory cortex closes relatively early (by ~P15). However, one of the references cited (ref 14) used FM sweeps, not pure tones, and even provided evidence that the critical period for this more complex stimulus occurred later in development (P31-38). Similarly, on lines 72-74, the authors state that "ACx neurons in adolescents exhibit high neuronal variability and lower tone sensitivity as compared to adults." The reference cited here (ref 4) used AM noise with a broadband carrier, not tones.

      We carefully checked the text to ensure that each claim is accurately supported by the corresponding reference.

      C) Given that the authors report that neuronal firing properties differ across auditory cortical subregions (as many others have previously reported), why did the authors choose to pool neurons indiscriminately across so many different brain regions?

      We appreciate the reviewer’s concern. While we acknowledge that pooling neurons across auditory cortical subregions may obscure region-specific effects, our primary focus in this study is on developmental differences between adolescents and adults, which were far more pronounced than subregional differences.

      To address this potential limitation: (1) We analyzed firing differences across subregions during task engagement (see Fig. S4-1, S4-2, S4-3; Supplementary Tables 2 and 3). (2) We have now added new analyses for the passive listening condition in AUDp and AUDv (Fig. S7-1; Supplementary Table 6).

      These analyses support our conclusion that developmental stage has a greater impact on auditory cortical activity than subregional location in the contexts examined. For clarity and cohesion, the main text emphasizes developmental differences, while subregional analyses are presented in the Supplement.

      D) And why did they focus on layers 5/6? (Is there some reason to think that age-related differences would be more pronounced in the output layers of the auditory cortex than in other layers?)

      We agree that other cortical layers, particularly supragranular layers, are important for auditory processing and plasticity. Our focus on layers 5/6 was driven by both methodological and biological considerations. Methodologically, our electrode penetrations were optimized to span multiple auditory cortical areas, and deeper layers provided greater mechanical stability for chronic recordings. Biologically, layers 5/6 contain the principal output neurons of the auditory cortex and are well-positioned to influence downstream decision-making circuits. We acknowledge the limitation of our recordings to these layers in the manuscript (L268; L464-8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The presentation of the paper must be strengthened. As it is now, it makes it difficult to appreciate the strengths of the results. Here are some points that should be addressed:

      a) The manuscript is full of inconsistencies that should be fixed to improve the reader's understanding. For example, the description on l.217 and the Figure. S3-1b, the D' value of 0 rounded to 0.01 on l. 735 (isn't it rather the z-scored value that is rounded? A D' of 0 is not a problem), the definition of lick bias on l. 750 and the values in Fig.2, the legend of Figure 7F and what is displayed on the graph (is it population sparseness or responsiveness?), etc.

      We adjusted the legend and description of former Fig. S3-1b (now Fig. S3-2b).

      We now clarify that the rounded values refer to z-scored hit and false alarm rates that we used in the d’ calculation. We adjusted the definition of the lick bias in Fig. 2 and Fig. S3-1b (L804).

      We replaced ‘population responsiveness’ with ‘population sparseness’ throughout the figures, legend and the text.
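      The d' clarification above can be illustrated with a minimal sketch. It assumes the common convention of clipping extreme hit and false-alarm rates before z-scoring; the function name `d_prime` and the `floor` value of 0.01 are illustrative assumptions, not the authors' code.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejects, floor=0.01):
    """Return d' = z(hit rate) - z(false-alarm rate).

    `floor` is an assumed clipping value: rates of exactly 0 or 1 have an
    infinite z-score, so they are pulled into [floor, 1 - floor] first.
    """
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejects)

    # Clip the rates, not the resulting d', so that a true d' of 0 stays 0.
    hit_rate = min(max(hit_rate, floor), 1.0 - floor)
    fa_rate = min(max(fa_rate, floor), 1.0 - floor)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

      Under this convention a d' of 0 needs no special handling; only perfect or empty hit and false-alarm rates are adjusted.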

      b) References to figures are sometimes wrong (for example on l. 737,739).

      c) Some text is duplicated (for example l. 814 and l. 837).

      d) Typos should be corrected (for example l. 127, 'the', l. 787, 'upto').

      We deleted the incorrect references of this section, removed the duplicated text, and corrected the typos.

      e) Color code should be changed (for example the shades of blue for easy and hard tasks - they are extremely difficult to differentiate).

      After consideration, we decided to retain the blue color code (i.e., Fig. 1d, Fig. 3d, Fig. 4e-g, Fig. 5c, Fig. 6d–g), where the distinction between the shades of blue appears sufficiently clear and maintains visual consistency and aesthetic appeal. We did, however, make changes in the other color codes (Fig. 4, Fig. 5, Fig. 6, Fig. 7).

      f) Figure design should be improved. For example, why is a different logic used for displaying Figure 5A or B and Figure 1E?

      We adjusted the color scheme in Fig. 5. We chose to represent the data in Fig. 5 according to task difficulty, as this arrangement best illustrates the more pronounced deficits in population decoding in adolescents during the hard task.

      f) Why use a 3D representation in Figure 4G?

      The 3D representation in Fig. 4g was chosen to illustrate the 3-way interactions between onset-latency, maximal discriminability, and duration of discrimination.

      g) Figure 1A, lower right panel- should "response" not be completed by "lick", "no lick"?

      We changed the labels to “Lick” and “No Lick” in Fig. 1a.

      h) l.18 the age mentioned is misleading, because the learning itself actually started 20 days earlier than what is cited here.

      Corrected.

      i) Explain what AAV5-... is on l.212.

      We added an explanation of virus components (see L216-220).

      (2) The comparison of CV in Figure 2 H-J is interesting. I am curious to know whether the differences in the easy and hard tasks could be due to a decrease in CV in adults, rather than an increase in CV in adolescents? Also, could the difference in J be due to 3 outliers?

      We agree that the observed CV differences may reflect a reduction in variability in adults rather than an increase in adolescents. We have revised the Results section accordingly to acknowledge this interpretation.

      Regarding the concern about potential outliers in Fig. 2J, we tested the data for outliers using the isoutlier function in MATLAB (defining outliers as values exceeding three standard deviations from the mean) and found no such cases.
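      The three-standard-deviation criterion described above can be sketched as follows. This is an illustration in Python rather than the MATLAB call the authors used, and the function name `outliers_three_sd` is hypothetical. As far as we are aware, MATLAB's `isoutlier` uses a median-based rule by default, so the criterion stated here would correspond to its `'mean'` option.

```python
from statistics import mean, stdev


def outliers_three_sd(values, n_sd=3.0):
    """Return the values lying more than `n_sd` sample SDs from the mean."""
    centre = mean(values)
    spread = stdev(values)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - centre) > n_sd * spread]
```

      One caveat with this rule: a single value can lie at most (n - 1)/sqrt(n) sample standard deviations from the mean, so with ten or fewer data points nothing can ever be flagged, however extreme it looks.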

      (3) Figure 2c shows that there is no difference in perceptual sensitivity between adolescents and adults, whereas the conclusion from Figure 4 is that adolescents exhibit lower discriminability in stimulus-related activity. Aren't these results contradictory?

      This is a nuanced point. The similar slopes of the psychometric functions (Fig. 2c), indicating comparable perceptual sensitivity, and the lower AUC observed in the ACx of adolescents (Fig. 4) do not necessarily contradict each other. These two measures capture related but distinct issues: psychometric slopes reflect behavioral output, which integrates both sensory encoding and processing downstream of ACx, while the AUC analysis reflects stimulus-related neural activity in ACx, which may still include decision-related components.

      Note that stimulus-related neural discriminability outside the context of the task is not different between adolescent and adult experts (Fig. 7h; p = 0.9374, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). This suggests that there are differences that emerge when we measure during behavior. Also note that behavior may rely on processing beyond ACx, and it is possible that downstream areas compensate for weaker cortical discriminability in adolescents, but this issue merits further investigation.

      (4) Why do you think that the discrimination in hard tasks decreases with learning (Figure 6D vs Figure 6F)?

      This is another nuanced point, and we can only speculate at this stage. While it may appear counterintuitive that single-neuron discriminability (AUC) for the hard task is reduced after learning (Fig. 6D vs. 6F), we believe this may reflect a shift in sensory coding in expert animals. In a recent study (Haimson et al., 2024; Science Advances), we found that learning alters single-neuron responses in the easy versus hard task in complex and distinct ways, which may account for this result. It is also possible that, in expert mice, top-down mechanisms such as feedback from higher-order areas act to suppress or stabilize sensory responses in auditory cortex, reducing the apparent stimulus selectivity of single neurons (e.g., AUC), even as behaviorally relevant information is preserved or enhanced at the population level.

      Reviewer #2 (Recommendations for the authors):

      This is very interesting work and I enjoyed reading the manuscript. See below for my comments, queries and suggestions, which I hope will help you improve an already very good paper.

      We thank the reviewer for the meticulous and thoughtful review.

      (1) Line 107: x-axis of panel 1e says 'pre-adolescent'.

      (2) Line 130: replace 'less' with 'fewer'.

      (3) Line 153: 'both learned and catch trials': I find the terminology here a bit confusing. I would typically understand a catch trial to be a trial without a stimulus but these 'catch' trials here have a stimulus. It's just that they are not rewarded/punished. What about calling them probe trials instead?

      We corrected the labelling (1), reworded to ‘fewer’ and ‘probe trials’ (2,3).

      (4) Line 210: The results of the optogenetics experiments are very interesting. In particular, because the effect is so dramatic and much bigger than what has been reported in the literature previously, I believe. Lick rates are dramatically reduced suggesting that the mice have pretty much stopped engaging in the task and the authors very rightly state that the 'execution' of the behavior is affected. I think it would be worth discussing the implications of these results more thoroughly, perhaps also with respect to some of the lesion work. Useful discussions on the topic can be found, for instance, in Otchy et al., 2015; Hong et al., 2018; O'Sullivan et al., 2019; Ceballo et al., 2019 and Lee et al., 2024. Are the mice unable to hear anything in laser trials and that is why they stopped licking? If they merely had trouble distinguishing them then we would perhaps expect the psychometric curves to approach chance level, i.e. to be flat near the line indicating a lick rate of 0.5. Could the dramatic decrease in lick rate be a motor issue? Can we rule out spillover of the virus to relevant motor areas? (I understand all of the 200nL of the virus were injected at a single location) Or are the effects much more dramatic than what has been reported previously simply because the GtACR2 is much more effective at silencing the auditory cortex? Could the effect be down to off-target effects, e.g. by removing excitation from a target area of the auditory cortex, rather than the disruption of cortical processing?

      We have now expanded the discussion in the manuscript to more thoroughly consider alternative interpretations of the strong behavioral effect observed during ACx silencing (L495–511). In particular, we acknowledge that the suppression of licking may reflect not only impaired sensory discrimination but also broader disruptions to arousal, motivation, or motor readiness. We also discuss the potential impact of viral spread, circuit-level off-target effects, and the potency of GtACR2 as possible contributors. We highlight the need for future work using more graded or temporally precise manipulations to resolve these issues.

      (5) Line 226: Reference 19 (Talwar and Gerstein 2001) is not particularly relevant as it is mostly concerned with microstimulation-induced A1 plasticity. There are, however, several other papers that should be cited (and potentially discussed) in this context. In particular, O'Sullivan et al., 2019 and Ceballo et al., 2019 as these papers investigate the effects of optogenetic silencing on frequency discrimination in head-fixed mice and find relatively modest impairments. Also relevant may be Kato et al., 2015 and Lee et al., 2024, although they look at sound detection rather than discrimination.

      We changed the references and pointed the reader to the (new section) Discussion.

      (6) Line 253: 'engaged [in] the task.

      (7) Figure 4: It appears that panel S4-1d is not referred to anywhere in the main text.

      Fixed.

      (8) Line 260: Might be useful to explain a bit more about the motivation behind focusing on L5/L6. Are there mostly theoretical considerations, i.e. would we expect the infragranular layers to be more relevant for understanding the difference in task performance? Or were there also practical considerations, e. g. did the data set contain mostly L5/L6 neurons because those were easier to record from given the angle at which the probe was inserted? If those kinds of practical considerations played a role, then there is nothing wrong with that but it would be helpful to explain them for the benefit of others who might try a similar recording approach.

      There were no deep theoretical considerations for targeting L5/6.  Our focus on layers 5/6 was driven by both methodological and biological considerations. Methodologically, our electrode penetrations were optimized to span multiple auditory cortical areas, and deeper layers provided greater mechanical stability for chronic recordings. Biologically, layers 5/6 contain the principal output neurons of the auditory cortex and are well-positioned to influence downstream decision-making circuits. We acknowledge the limitation of our recordings to these layers in the manuscript (L268; L463–467). See also comment D of reviewer 3.

      (9) Supplementary Table 2: The numbers in brackets indicate fractions rather than percentages.

      Fixed.

      (10) Figure S4-3: The figure legend implies that the number of neurons with significant discriminability for the hard stimulus and significant discriminability for choice was identical. (adolescent neurons = 368, mice = 5, recordings = 10; adult n = 544, mice = 6, recordings = 12 in both cases). Presumably, that is not actually the case and rather the result of a copy/paste operation gone wrong. Furthermore, I think it would be helpful to state the fractions of neurons that can discriminate between the stimuli and between the choices that the animal made in the main text.

      Thank you for spotting the mistake. We corrected the n’s and added the percentage of neurons that discriminate stimulus and choice in the main text and the figure legend.

      (11) Line 301: 'We used a ... decoder to quantify hit versus correct reject trial outcomes': I'm not sure I understand the rationale here. For the single unit analysis hit and false alarm trials were compared to assess their ability to discriminate the stimuli. FA and CR trials were compared to assess whether neurons can encode the choice of the mice. But the hit and CR trials which are contrasted here differ in terms of both stimulus and behavior/choice so what is supposed to be decoded here, what is supposed to be achieved with this analysis?

      Thank you for this important point. You're correct that comparing hit and CR trials captures differences in both stimulus and choice, i.e., task-related differences. We chose this contrast for the population decoding analysis to achieve higher trial counts per session and similar numbers of trials in each class, both of which are necessary for the reliability of the analysis. While this approach does not isolate stimulus from choice encoding, it provides an overall measure of how well population activity distinguishes task-relevant outcomes. We explicitly acknowledge this issue in L313-314.

      (12) Line 332: What do you mean when you say the novice mice were 'otherwise fully engaged' in the task when they were not trained to do the task and are not doing the task?

      By "otherwise fully engaged," we mean that novice mice were actively participating in the task environment, similar to expert mice — they were motivated by thirst and licked the spout to obtain water. The key distinction is that novice mice had not yet learned the task rules and likely relied on trial-and-error strategies, rather than performing the task proficiently.

      (13) Line 334: 'regardless of trial outcome': Why is the trial outcome not taken into account? What is the rationale for this analysis? Furthermore, in novice mice a substantial proportion of the 'go' trials are misses. In expert mice, however, the proportion of 'miss trials' (and presumably false alarms) will by definition be much smaller. Given this, I find it difficult to interpret the results of this section.

      This approach was chosen to reliably decode a sufficient number of trials for each task difficulty (i.e. expert mice predominantly performed CRs on No-Go trials and novice mice often showed FAs). Utilizing all trial outcomes ensured that we had enough trials for each stimulus type to accurately estimate the AUCs. This approach avoids introducing biases due to uneven trial numbers across learning stages.
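
      As an illustration of the pooling described above, the following minimal sketch computes a rank-based AUC between Go and No-Go trials using every trial of each stimulus type, whatever its outcome. The spike counts and the function name `auc` are hypothetical, and this is not the authors' decoder, only one common way to estimate an AUC.

```python
def auc(go_responses, nogo_responses):
    """Rank-based AUC: probability that a randomly chosen Go-trial response
    exceeds a randomly chosen No-Go-trial response (ties count as 0.5)."""
    if not go_responses or not nogo_responses:
        raise ValueError("both trial groups must contain at least one trial")
    wins = 0.0
    for g in go_responses:
        for n in nogo_responses:
            if g > n:
                wins += 1.0
            elif g == n:
                wins += 0.5
    return wins / (len(go_responses) * len(nogo_responses))


# Hypothetical spike counts, pooled over hits and misses (Go)
# and over false alarms and correct rejections (No-Go).
go_trials = [5, 7, 8, 6, 4]
nogo_trials = [2, 3, 5, 1, 4]
print(f"AUC = {auc(go_trials, nogo_trials):.2f}")
```

      An AUC of 0.5 indicates no discrimination between the two stimulus types and 1.0 indicates perfect discrimination; with few trials per group the estimate becomes noisy, which is the concern that pooling across outcomes is meant to address.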

      (14) Line 378: 'differences between adolescents and adults arise primarily from age': Are there differences in any of the metrics shown in 7e-h between adolescents and adults?

      We confirm that differences between adolescents and adults are indeed present in some metrics but not others in Figure 7e–h. Specifically, while tuning bandwidth was similar in novice animals, it was significantly lower in adult experts (Fig. 7e; novice: p = 0.0882; expert: p = 0.0001, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). The population sparseness was similar in both novice and expert adolescent and adult neurons (Fig. 7f; novice: p = 0.2873; expert: p = 0.1017, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). The distance to the easy go stimulus was similar in novice animals, but lower in adult experts (Fig. 7g; novice: p = 0.7727; expert: p = 0.0001, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript). The neuronal d-prime was similar in both novice and expert adolescent and adult neurons (Fig. 7h; novice: p = 0.7727; expert: p = 0.0001, Kruskal-Wallis test after Tukey-Kramer correction for multiple comparisons; not discussed in the manuscript).

      (15) Line 475: '...well and beyond...': something seems to be missing in this statement.

      (16) Line 487: 'onto' should be 'into', I think.

      (17) Line 610 and 613: '3 seconds' ... '2.5 seconds': Was the response window 3s or 2.5s?

      (18) Line 638: 'set' should be 'setup', I believe.

      All the mistakes mentioned above were fixed. Thanks.

      (19) Line 643: 'Reward-reinforcement was delayed to 0.5 seconds after the tone offset': Presumably, if they completed their fifth lick later than 0.5 seconds after the tone, the reward delivery was also delayed?

      Apologies for the lack of clarity. In the head-fixed version, there was no lick threshold. Mice were reinforced after a single lick. If that lick occurred after the 0.5-second reinforcement delay following tone offset, the reward or punishment was delivered immediately upon licking.

      (20) Line 661: 'effect [of] ACx'.

      (21) Line 680: 'a base-station connected to chassis'. The sentence sounds incomplete.

      (22) Line 746: 'infliction', I believe, should say 'inflection'.

      (23) Line 769: 'non-auditory responsive units': Shouldn't that simply say 'non-responsive units'? The way it is currently written I understand it to mean that these units were responsive (to some other modality perhaps) but not to auditory stimulation.

      (24) Line 791: 'bins [of] 50ms'.

      (25) Line 811: 'all of' > 'of all'.

      (26) Line 814: Looks like the previous paragraph on single unit analysis was accidentally repeated under the wrong heading.

      (27) Line 817: 'encoded' should say 'calculated', I believe.

      All the mistakes mentioned above were fixed. Thanks.

      (28) Line 869: 'bandwidth of excited units': Not sure I understand how exactly the bandwidth, i.e. tuning width was measured.

      We acknowledge that our previous answer was unclear and expanded the Methods section. To calculate bandwidth, we identified significant tone-evoked responses by comparing activity during the tone window to baseline firing rates at 62 dB SPL (p < 0.05). For each neuron, we counted the number of contiguous frequencies with significant excitatory responses, subtracting isolated false positives to correct for chance. We then converted this count into an octave-based bandwidth by multiplying the number of frequency bins by the octave spacing between them (0.1661 octaves per step).
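
      The conversion from a count of frequency bins to octaves can be sketched as follows. This is a minimal illustration that assumes bandwidth is taken from the longest contiguous run of significantly excited frequencies; the chance-correction step mentioned above is not specified in enough detail to reproduce and is omitted. The function names and the example response vector are hypothetical.

```python
OCTAVES_PER_STEP = 0.1661  # octave spacing between adjacent tone frequencies (from the response)


def longest_contiguous_run(significant):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for is_significant in significant:
        current = current + 1 if is_significant else 0
        best = max(best, current)
    return best


def bandwidth_octaves(significant, octaves_per_step=OCTAVES_PER_STEP):
    """Bandwidth in octaves: number of contiguous significant bins times the bin spacing."""
    return longest_contiguous_run(significant) * octaves_per_step


# Hypothetical unit: one flag per tone frequency at a fixed level (62 dB SPL in the response),
# True where the tone-evoked response was significant relative to baseline (p < 0.05).
unit = [False, True, True, True, False, True, False]
print(f"bandwidth = {bandwidth_octaves(unit):.4f} octaves")
```

      In this sketch the isolated significant bin near the end of the vector does not extend the run, so it does not contribute to the bandwidth.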

      (29) Line 871: 'population sparseness': Is that the fraction of tone frequencies that produced a significant response? I would have thought that this measure is very highly correlated to your measure of bandwidth, to the point of being redundant, but I may have misunderstood how one or the other is calculated. Furthermore, the Y label of Figure 7f says 'responsiveness' rather than sparseness and that would seem to be the more appropriate term because, unless I am misunderstanding this, a larger value here implies that the neuron responded to more frequencies, i.e. in a less sparse manner.

      We have clarified the use of the term "population sparseness" and updated the Y-axis label in Figure 7f to better reflect this measure. This metric reflects the fraction of tone–attenuation combinations that elicited a significant excitatory response across the entire population of neurons, not within individual units.

      While this measure is related to bandwidth, it captures a distinct property of the data. Bandwidth quantifies how broadly or narrowly a single neuron responds across frequencies at a fixed intensity, whereas population sparseness reflects how distributed responsiveness is across the population as a whole. Although the two measures are related, since broadly tuned neurons often contribute to lower population sparseness, they capture distinct aspects of neural coding and are not redundant.

      (30) Line 881: I think this line should refer to Figure 7h rather than 7g.

      Fixed.

      Reviewer #3 (Recommendations for the authors):

      (1) In the Educage, water was only available when animals engaged in the task; however, there is no mention of whether/how animal weight was monitored.

      In the Educage, mice had continuous access to water by voluntarily engaging in the task, which they could perform at any time. Although body weight was not directly monitored, water access was essentially ad libitum, and mice performed hundreds of trials per day, thereby ensuring sufficient daily intake. This approach allowed us to monitor hydration (ad libitum food is supplied in the home cage). The 24/7 setup, including automated monitoring of trial counts and water consumption, was reviewed and approved by our institutional animal care and use committee (IACUC).

      (2) In Figure 2B-C and Figure 2E, the y-axis reads "lick rate". At first glance, I took this to mean "the frequency of licking" (i.e. an animal typically licks at a rate of 5 Hz). However, what the authors actually are plotting here is the proportion of trials on which an animal elicited >= 5 licks during the response window (i.e. the proportion of "yes" responses). I recommend editing the y-axis and the text for clarity.

      We replaced the y-label and adjusted the figure legend (Fig. 2).

      (3) I didn't see any examples of raw (filtered) voltage traces. It would be worth including some to demonstrate the quality of the data.

      We have added an example of a filtered voltage trace aligned to tone onset in Fig. S4-1a to illustrate data quality. In addition, all raw and processed voltage traces, along with relevant analysis code, are available through our GitHub repository and the corresponding dataset on Zenodo.

      (4) The description of the calculation of bias (C) in the methods section (lines 749-750) is incorrect. The correct formula is C = -0.5 * [z(hit rate) + z(fa rate)]. I believe this is the formula that the authors used, as they report negative C values. Please clarify or correct.

      Thanks for spotting this. It is now corrected.
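
      For reference, the corrected formula can be written as a short sketch using the inverse of the standard normal cumulative distribution for z. The d-prime expression shown alongside it is the standard signal-detection definition rather than something quoted from the manuscript, and the rates are hypothetical.

```python
from statistics import NormalDist


def z(p):
    """z-transform: inverse of the standard normal cumulative distribution."""
    return NormalDist().inv_cdf(p)


def dprime_and_criterion(hit_rate, fa_rate):
    """Return (d', C) with C = -0.5 * [z(hit rate) + z(FA rate)].

    Rates must lie strictly between 0 and 1; inv_cdf raises an error otherwise.
    """
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical session: frequent licking on both Go and No-Go trials.
d_prime, criterion = dprime_and_criterion(hit_rate=0.9, fa_rate=0.5)
print(f"d' = {d_prime:.2f}, C = {criterion:.2f}")
```

      With this sign convention a negative C corresponds to a liberal bias toward licking, which is consistent with the reviewer's remark that the reported C values are negative.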

      (5) The authors use the terms 'naïve' and 'novice' interchangeably. I suggest sticking with one term to avoid potential confusion.

      (6) Multiple instances: "less trials/day" should be "fewer trials/day"

      (7) Supplementary Table 2: The values reported are proportions, not percentages. Please correct.

      (8) Line 270: Table 2 does not show the number of neurons in the dataset categorized by region. Perhaps the authors meant Supplementary Table 2?

      Fixed. Thank you for pointing these mistakes out.

      (9) Figure 5C: the data from the hard task are entirely obscured by the data from the easy task. I recommend splitting it into two different plots.

      We agree and split the decoding of the easy and the hard task into two graphs (left: easy task; right: hard task). Thank you!

      (10) How many mice contributed to each analyzed data set? Could the authors provide a breakdown in a table somewhere of how many neurons were recorded in each mouse and which ones were included in which analyses?

      We added an overview of the analyzed datasets in supplementary Table 7. Please note that the number of mice and neurons used in each analysis is also reported in the main text and legends. Importantly, all primary analyses were conducted using LME models, which explicitly account for hierarchical data structure and inter-mouse variability, thereby addressing potential concerns about data imbalance or bias.

    1. eLife Assessment

      This study presents valuable findings on the role of dopamine receptor D2R in dopaminergic neurons DAN-c1 and mushroom body neurons (Y201-GAL4 pattern) on aversive and appetitive conditioning. The evidence supporting the claims of the authors is solid in the context of their behavioural paradigm. Controls using a reciprocal training protocol would have broadened the scope of their conclusions. The work will be of interest to researchers studying the role of dopamine during learning and memory.

    2. Reviewer #1 (Public review):

      Summary:

      Both flies and mammals have D1-like and D2-like dopamine receptors, yet the role of D2-like receptors in Drosophila learning and memory remains underexplored. This paper investigates the role of the D2-like dopamine receptor D2R in single pairs of dopaminergic neurons (DANs) during single-odor aversive learning in the Drosophila larva. First, confocal imaging is used to screen GAL4 driver strains that drive GFP expression in just single pairs of dopaminergic neurons. Next, thermogenetic manipulations of one pair of DANs (DAN-c1) suggest that DAN-c1 activity during larval aversive learning is important. Confocal imaging is then used to reveal expression of D2R in the DANs and mushroom body of the larval brain. Finally, optogenetic activation during training phenocopies D2R knockdown in these neurons: aversive learning is impaired when DAN-c1 is targeted, while appetitive and aversive learning are impaired when the mushroom body is manipulated. Finally, a model is proposed in which D2R limits excessive dopamine release to facilitate successful olfactory learning.

      Strengths:

      The paper convincingly reproduces prior findings that demonstrated D2R knockdown in DL1 DANs or the mushroom body impairs aversive olfactory learning in Drosophila larvae (Qi and Lee, 2014; doi:10.3390/biology3040831). These previous findings were built upon and extended with a comprehensive confocal imaging screen of 57 GAL4 drivers that identified tools driving GFP expression in individual DANs. One of the drivers, R76F02-AD; R55C10-DBD, was consistently shown to label DAN-c1 neurons and no other DANs in the larval brain. Confocal imaging is also used to demonstrate that GFP-tagged D2R is expressed in most DANs and the mushroom body. Behavioral experiments demonstrate that driving D2R knockdown in DAN-c1 neurons impairs aversive learning, as do other loss-of-function manipulations of DAN-c1 neurons.

      Limitations:

      (1) The single-odor paradigm used to train larvae does not have the advantages of a more conventional balanced or reciprocal training paradigm. The paper describes how the single-odor experimental design could be controlled for non-associative effects, but does not provide an independent validation of the control experiments performed by a different research group using different odors and genotypes 15 to 20 years ago (see Honjo and Furukubo-Tokunaga, 2005; doi:10.1523/jneurosci.2135-05.2005 and Honjo and Furukubo-Tokunaga, 2009; doi:10.1523/jneurosci.1315-08.2009). Whether the involvement of DAN-c1 for aversive learning generalizes to standard paradigms remains unclear (see Eschbach et al., 2020; doi:10.1038/s41593-020-0607-9 and Weber et al., 2023; doi:10.7554/elife.91387.1).

      (2) In 11 of 22 larval brains examined in the paper, R76F02-AD; R55C10-DBD appears to drive GFP expression in 1 to 8 additional non-dopaminergic neurons (Figure S1P and Table S3). Of the remaining 11 brains, 4 of their corresponding ventral nerve cords also have expression in 2 to 4 neurons (Table S3). Therefore, experiments involving the R76F02-AD; R55C10-DBD driver could be manipulating the activity of additional neurons in around 60% of larvae. The conclusions of the paper would be strengthened if key experiments were repeated with other GAL4 drivers that may label DAN-c1 with even greater specificity, such as SS03066 (Truman et al., 2023; doi:10.7554/elife.80594) or MB320C (Hige et al., 2015; doi:10.1016/j.neuron.2015.11.003).

      (3) Successful immunostaining with an anti-D2R antibody (Draper et al., 2007; doi:10.1002/dneu.20355 and Love et al., 2023; doi:10.1111/gbb.12836) could validate GFP-tagged D2R expression (Figure 3) in the same way that TH immunostaining was used throughout the paper to determine whether neurons were dopaminergic.

      (4) The paper proposes a model in which DAN-c1 activity conveys an aversive teaching signal (Figure 2f) but excessive artificial DAN-c1 activation causes excessive dopamine release that impairs aversive learning (Figures 2i and 5b). According to this model, thermogenetic DAN-c1 activation during training with water or sucrose conveys an aversive teaching signal that reduces performance (Figure 2i) whereas optogenetic DAN-c1 activation does not due to excessive dopamine release (Figures 5c and 5d). The model suggests that optogenetic DAN-c1 activation is strong enough to cause excessive dopamine release by itself whereas thermogenetic DAN-c1 activation can only achieve the same outcome when it occurs in conjunction with natural DAN-c1 activation evoked by quinine. Therefore, an experiment with weaker optogenetic DAN-c1 activation (with lower intensity light or pulsed at a lower frequency) during water or sucrose training would be expected to convey an aversive teaching signal rather than excessive dopamine release, reducing performance. Such an experiment could reconcile the differing thermogenetic and optogenetic results of the paper.

    3. Reviewer #2 (Public review):

      Summary:

      The study aimed to functionally identify individual DANs that mediate larval olfactory learning. The authors first searched for DAN-specific driver strains that mark single dopaminergic neurons and can subsequently be used to target genetic manipulations to those neurons. 56 GAL4 drivers identifying dopaminergic neurons were found (Table 1), and three of them drive the expression of GFP in a single dopaminergic neuron per third-instar larval brain hemisphere. The DAN driver R76F02-AD;R55C10-DBD appears to drive expression in a dopaminergic neuron innervating the lower peduncle (LP), which would be DAN-c1.

      The split-GFP reconstitution across synaptic partners (GRASP) technique was used to investigate the "direct" synaptic connections from DANs to the mushroom body. Potential synaptic contacts between DAN-c1 and MB neurons (at the lower peduncle) were detected.

      Single-odor associative learning was then performed using thermogenetic tools (Shi-ts1 and TrpA1). When larvae were trained at 34 °C, complete inactivation of dopamine release from DAN-c1 with Shibirets1 impaired aversive learning (Figure 2h), while Shibirets1 did not affect learning when larvae were trained at room temperature (22 °C). When paired with a gustatory stimulus (QUI or SUC), activation of DAN-c1 during training impaired both aversive and appetitive learning (Figure 2k).

      The expression pattern of D2R in fly brains was then examined, and D2R was found in dopaminergic neurons and the mushroom body (Figure 3). To inspect whether the pattern of GFP signals indeed reflected the expression of D2R, three D2R enhancer driver strains (R72C04, R72C08, and R72D03-GAL4) were crossed with the GFP-tagged D2R strain.

      D2R knockdown (UAS-RNAi) in dopaminergic neurons driven by TH-GAL4 impaired larval aversive learning. Using a microRNA strain (UAS-D2R-miR), a similar deficit was observed. Crossing the GFP-tagged D2R strain with a DAN-c1-mCherry strain demonstrated the expression of D2R in DAN-c1 (Figure 4a). Knockdown of D2R in DAN-c1 impaired aversive learning with the odorant pentyl acetate, while appetitive learning was unaffected (Figure 4e). Sensory and motor functions appeared unaffected by D2R suppression.

      To exclude possible chronic effects of D2R knockdown during development, optogenetics was applied at distinct stages of the learning protocol. ChR2 was expressed in DAN-c1, and blue light was applied at distinct stages of the learning protocol. Optogenetic activation of DAN-c1 during training impaired aversive learning, not appetitive learning (Figure 5b-d).

      Knockdown of D2Rs in MB neurons by D2R-miR impaired both appetitive and aversive learning (Figure 6a). Activation of MBNs during training impairs both larval aversive and appetitive learning.

      Finally, based on the data the authors propose a model where the effective learning requires a balanced level of activity between D1R and D2R (Figure 7).

      Strengths:

      The work is well written, clear, and concise. They use well documented strategies to examine GAL4 drivers with expression in a single DAN, behavioral performance in larvae with distinct genetic tools including those to do thermo and optogenetics in behaving flies. Altogether, the study was able to expand our understanding of the role of D2R in DAN-c1 and MB neurons in the larva brain.

      The study successfully examined the role of D2R in DAN-c1 and MB neurons in olfactory conditioning. The conclusions are well supported by the data and the model of adequate levels of cAMP (Figure 7b) appears to be able to explain a poor memory after insufficient or excessive cAMP signaling. The study provides insight into the role of D2R in associative learning expanding our understanding and might be a reference similarly to previous key findings (Qi and Lee, 2014, https://doi.org/10.3390/biology3040831).

    1. eLife Assessment

      In this highly innovative study, Carpenet C et al explore the use of nanobody-based PET imaging to track proliferative cells after in vivo transplantation in mice, in a fully immunocompetent setting. The development of a unique set of PET tracers and mouse strains to track genetically-unmodified transplanted cells in vivo is an important novel asset that could potentially facilitate cell tracking in different research fields. The evidence provided is compelling as the new method proposed might facilitate overcoming certain limitations of alternative approaches, such as full sized immunoglobulins and small molecules.

    2. Reviewer #1 (Public review):

      Summary:

      The topic of nanobody-based PET imaging is important, and holds great potential for real-world applications since nanobodies have many advantages over full sized immunoglobulins and small molecules.

      Strengths:

      The submitted manuscript contains quite a bit of interesting data from a collaborative team of well-respected researchers. The authors are to be congratulated for presenting results that may not have turned out the way they had hoped, and doing so in a transparent fashion.

      Weaknesses:

      However, the manuscript could be considered to be a collection of exploratory findings rather than a complete and mature scientific exposition. Most of the sample sizes were 3 per group, which is fine for exploratory work, but insufficient to draw strong, statistically robust conclusions for definitive results.

      Overall, the following specific limitations are noted as suggestions for future work:

      (1) The authors used DFO, which is well known to leak Zr, rather than the current standard for 89Zr PET which is DFO* (DFO-star)

      (2) The brain tissues were not capillary depleted, which limits interpretation. Capillary depletion, with quantitative assessment of the completion of the depletion process, is the standard in the field.

      (3) The authors have not experimentally tested the hypothesis that the PEG adduct reduced BBB transcytosis.

      (4) The results in Fig. 7 involving the placenta are interesting, but need confirmation using constructs with 18F labeling and without the PEG adduct.

      (5) If this line of investigation were to be translated to humans, an important consideration would be the relative safety of 89Zr and 64Cu. It is likely to be quite a bit worse than for 18F, since the 89Zr and 64Cu have longer half-lives, dissociate from their chelators, and lodge in off-target tissues.

      (6) A surprising and somewhat disappointing finding was the modest amount of BBB transcytosis. Clearly additional work will be needed before nanobody-based brain PET becomes feasible.

    3. Reviewer #2 (Public review):

      Summary:

      In this study the authors described a previously developed set of VHH-based PET tracers to track transplants (cancer cells, embryos) in a murine immune-competent environment.

      Strengths:

      Unique set of PET tracer and mouse strain to track transplanted cells in vivo without genetic modification of the transplanted cells. This is a unique asset and a first-in-kind.

      Weaknesses:

      None

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors): 

      Overall, the manuscript could be clearer and more beneficial to the readers with the following suggested revisions:  

      (1) The abstract should include information on the comparative performance of 89Zr 64Cu and 18F labeled nanobodies, especially noting the challenges with DFO-89Zr and NOTA-64Cu. 

      (2) The abstract should explicitly note the types of transplants assessed and the specific PET findings.

      (3) The abstract should note the negative results in terms of brain PET findings. 

      We thank reviewer 1 for these three suggestions. We have now included this information in the abstract.

      (4)  Based on the data shown in Fig. 1 and Table 1, it seems that the nanobodies bind to quite a few proteins other than TfR. This should be discussed frankly as a limitation. 

      The presence of multiple other bands and proteins identified by LC/MS in Figure 1 is typical for immunoprecipitation experiments, as performed under the conditions used: all proteins other than TfR that are identified in Table 1 are abundant cytoplasmic (cytoskeletal) and/or nuclear proteins.  More rigorous washing would perhaps have removed some of these contaminants at the risk of losing some of the specific signal as well. We have added a comment to this effect.  In an in vivo setting, this would be of minor concern, as these proteins would be inaccessible to our nanobodies. In fact, when VHH123 radioconjugates are injected in huTfr+/+ mice (or VHH188 in C57BL/6), we observe no specific signal – which supports this conclusion. 

      We therefore state: “We show that both V<sub>H</sub>Hs bind only to the appropriate TfR, with no obvious cross-reactivity to other surface-expressed proteins by immunoblot, LC/MSMS analysis of immunoprecipitates, SDS-PAGE of <sup>35</sup>S-labelled proteins and flow cytometry (Fig 1; Table 1).” We have added some clarification to make this clearer, and the full LC/MSMS data tables are now included in the supplemental materials as supplementary Table 1. We have also included subcellular localization information for each protein identified through LC/MSMS in Table 1.

      (5)  Why did the authors use DFO, which is well known to leak Zr, rather than the current standard for 89Zr PET, DFO* (DFO-star)? 

      We used DFO rather than DFO-star for two reasons: 1) we had already conducted and published numerous other studies using DFO-conjugated nanobodies and had not observed any release of <sup>89</sup>Zr, and 2) commercially sourced click-chemistry-enabled DFO-star (such as DFO*-DBCO) was not available at the time of the study. 

      (6) Figure 2B appears to show complex structures, more complex than just GGG-DFO-azide and GGG-NOTA-azide. This should be explained in detail. 

      We have added two supplemental figures and methods that recapitulate how we generated what we have termed as GGG-DFO-Azide and GGG-NOTA-Azide. We have updated the legend of Figure 2B. 

      (7) Why is there a double band in Suppl. Fig 9 for VHH123-NOTA-Azide? 

      Under optimal conditions, sortase A-mediated transpeptidation is efficient, resulting in the formation of a peptide bond between the C-terminally LPETG-tagged protein and the GGG-probe. However, extended reaction times or suboptimal concentrations of modified GGG-probes (which are often in limited supply) in the reaction mixture allow hydrolysis of the sortase A-LPET-protein intermediate. The hydrolysis product can no longer participate in a sortase A reaction. This explains the doublet in the reaction used to generate VHH123-NOTA-N<sub>3</sub>: the upper band is VHH123-NOTA-N<sub>3</sub> and the lower band is the hydrolysis product, VHH123-LPET. The hydrolysis product is unable to react with PEG<sub>20kDa</sub>-DBCO, which is why the lower band appears at the same position of migration in the next lane on the gel. We noticed that an adjacent lane was mislabelled as ‘VHH188-NOTA-PEG<sub>20kDa</sub>’ when in fact it was ‘VHH123-NOTA-PEG<sub>20kDa</sub>’. This has been corrected.

      The hydrolysis product, VHH123-LPET, has a short circulatory half-life and obviously lacks the PEG moiety as well as the chelator. It therefore cannot chelate <sup>64</sup>Cu. Its presence should not interfere with PET imaging. Since all animals were injected with the same measured dose of <sup>64</sup>Cu-labeled conjugate, the presence of an unlabeled TfR-binding competitor in the form of VHH123-LPET, at a << 1:1 molar ratio to the labelled nanobody, would be of no consequence.

      (8) More details should be provided about the tetrazine-TCO click chemistry for 18F labeling. 

      We have added supplementary methods and figures that detail how <sup>18</sup>F-TCO was generated. For the principle of TCO-tetrazine click-chemistry, a brief description was added in the text, as well as a reference to a review on the subject.

      (9) For the data shown in Figure 3H, the authors should state whether the brain tissues were capillary depleted, and if so, how this was performed and how complete the procedure was. 

      No capillary depletion of the brain tissues was performed, as this was challenging to perform in compliance with the radiosafety protocols in place at our institution. We have updated the legend of figure 3H and methods to include this important detail. Whole blood gamma-counting did not show any obvious difference of activity across the 4 groups in figure 3G (same mice as in figure 3H), which would go against the interpretation that activity differences in the brain (figure 3H) are solely attributable to residual activity from blood in the capillaries. 

      (10) The authors should experimentally test the hypotheses that the PEG adduct reduced BBB transcytosis. 

      Reviewer 1 is correct to point out that we have not tested un-PEGylated conjugates of <sup>64</sup>Cu and <sup>89</sup>Zr with the anti-TfR nanobodies and we currently do not have the means to perform additional experiments. However, the <sup>18</sup>F conjugates were not PEGylated, and these also fail to show any detectable signal in the CNS by PET/CT (see figure 4A). PEGylation alone cannot be the sole factor that limits transcytosis across the BBB.

      (11) It was interesting to note that the Cu appears to dissociate from the NOTA chelator. The authors should provide more information about the kinetics of this process.  

      We have not tested the kinetics of dissociation between <sup>64</sup>Cu and the NOTA conjugates in vitro, like we have done for <sup>89</sup>Zr and DFO (supplemental figure 2), because previous work (see references 35 and 36 by Dearling JL and Mirick GR and colleagues) has shown that NOTA and other copper chelators tend to release free copper radioisotopes in the liver, a commonly reported artifact. We have also included a new set of images that show the biodistribution of VHH123-NOTA-<sup>64</sup>Cu in huTfR+/+ mice, where we still observe a substantial signal in the liver, indicating release of <sup>64</sup>Cu from NOTA, in the absence of the anti-TfR VHH binding to its target. This was clearly not seen using the DFO-<sup>89</sup>Zr conjugates.  Binding of the VHH to TfR, followed by internalization, appears to be required for the release of <sup>89</sup>Zr from DFO, prompting us to investigate this phenomenon further.

      (12) The authors should increase the sample size, and test two different radiolabels for the transplant imaging results (Figs. 5 and 6), since these seem to be the ones they feel are the most important, based on the title and abstract. 

      We agree with reviewer 1 that more repeats would increase the significance of our findings, but we unfortunately do not have the means of performing additional experiments at this time (the lab at Boston Children’s Hospital has closed as Dr. Ploegh has retired). We believe that the results are compelling and will be of use to the in vivo imaging community.

      (13) Fig. 6G appears to show a false positive result for the kidney imaging. Is this real, or an artifact of small sample size?

      We agree with reviewer 1 that the kidney signals in figure 6 are somewhat puzzling. The difference between the tumor-bearing mice that received VHH123 and VHHEnh conjugates is not significant – with the obvious caveat that the VHHEnh group is comprised of only 2 mice, so sample size may well be a factor here. If we compare the signals of the VHH123 conjugate in tumor-bearing mice vs. tumor-free mice, the VHH123 conjugates would have cleared much faster in the tumor-free mice over 24 hours (since no epitope is present for VHH123 to bind to), thus weakening the kidney signal observed after 24 hours. The same would be true for all the other tissues – except for the liver (where free <sup>64</sup>Cu that leaks from NOTA accumulates). VHHEnh conjugates in tumor-bearing mice show a significant kidney signal – although no VHH123 target epitope is present in these mice. B16.F10 tumors at 4 weeks of growth tend to be necrotic and can passively retain any radiotracer – this generates the weak lung signal visible in Fig 6D – thus the radiotracer would clear at a slower rate than VHH123 conjugates in tumor-free mice giving a higher kidney signal at 24 hours. 

      No tumors were found in the kidneys post-necropsy. We attribute the differences in kidney signals to different kinetics of clearance of the radioconjugates. We have added this explanation to the results and discussion.

      (14) Are the results shown in Fig. 7 generalizable? The authors should test the constructs with 18F labeling and without the PEG adduct. 

      We agree with reviewer 1 that it would be very interesting to confirm these observations using 18F radioconjugates. The results should be generalizable, as the difference between signals can only be attributed to the presence of the recognized epitope in the placenta– which is in fact the only variable that differs between the two groups. At the time of conducting the study, we had not planned to perform the same experiments with 18F radioconjugates – partly because synthesis of 18F radioconjugates is more challenging (and costly) than the production of 89Zr-labeled nanobodies.  

      (15) The authors should discuss the relative safety of 89Zr and 64Cu. It is likely to be quite a bit worse than for 18F, since the 89Zr and 64Cu have longer half-lives, dissociate from their chelators, and lodge in off-target tissues. An alternative interpretation of the authors' data could be that 89Zr and 64Cu labeling in this context are unsuitable for the stated purposes of PET imaging. In this case, the key experiments shown in Figs. 5-7 should be repeated with the 18F labeled nanobody constructs. 

      Our vision was to offer a tool to the scientific community interested in in vivo tracking of cells in different preclinical disease models. The question of safety regarding 89Zr and 64Cu for clinical use was therefore not a factor we then considered. However, we have now included a section in the discussion about the potential safety issue of <sup>89</sup>Zr release and bone accumulation in clinical settings, especially for radioconjugates that target an internalizing surface protein.

      (16) The authors should remark on the somewhat surprisingly modest amount of BBB transcytosis in the discussion. What were the affinities of the nanobodies?

      The affinities and binding kinetics of both nanobodies were described in a separate work that is referenced in the introduction (references 21 and 22 by Wouters Y and colleagues). Through other methods that rely on a highly sensitive bio-assay, it was shown that both VHH123 and VHH188 are capable of transcytosis: both nanobodies coupled to a neurotensin peptide induced a drop of temperature after i.v. injection in matching mouse strains (VHH123 in C57BL/6 and VHH188 in huTfr +/+). The lack of any compelling CNS signal by PET/CT is discussed in the manuscript.

      (17) More details of the methods should be provided in the supplement. 

      a.  What was the source of the penta-mutant Sortase A-His6? 

      Sortase A pentamutant is produced in-house, by cytoplasmic expression in E. coli (BL21 strain), using a plasmid vector encoding a truncated and mutated version of Sortase A. References were added, as well as the Addgene repository number (51140).

      b.  What was the yield of the sortase reactions? 

      For small proteins, such as nanobodies/ V<sub>H</sub>Hs, we find that the yield of a sortase A reaction typically is > 75%. This is what we observed for all our conjugations. The methods section was updated to include this information.

      c.  What was the source of the GGG-Azide-DFO and GGG-Azide NOTA? Based on the structures shown in Fig. 2, these appear to be more complex than was noted in the text.

      We have now detailed the synthesis of GGG-DFO-Azide and GGG-NOTA-Azide in the supplementary methods.

      d.  More details about the source and purity of the tetrazine and TCO labeling reagents should be provided. 

      We have included information on the synthesis of GGG-tetrazine in the supplementary methods. Concerning the synthesis of <sup>18</sup>F-TCO, we have also included a detailed description of the compound in supplementary methods. The reaction between GGG-tetrazine and <sup>18</sup>F-TCO is now further detailed in the manuscript. 

      e.  The TCO-agarose slurry purification should be explained in more detail, and the results should be shown. 

      We have included a detailed procedure of how the TCO-agarose slurry purification was performed in the methods section. We had already included the Radio-Thin Layer Chromatography QC data of the final VHH123-18F and VHH188-18F purifications in the supplementary figures – which are obtained immediately after TCO-agarose slurry purification. The yields of the TCO-agarose slurry purification, in terms of the activity of each collected fraction, are now detailed in the methods section.

      f.   The CT parameters should be provided.  

      We have now added more information about the PET/CT imaging procedure in the methods section of the manuscript.

      Reviewer #2 (Recommendations for the authors): 

      Authors should discuss the possibility of the TfR as a rejection antigen. Murine TfR is foreign for hTfR+/+ mice and vice versa. 

      We have not discussed this possibility, as we believe the risk of rejection of huTfR+ cells in moTfR+ mice (or vice versa) is negligible. The cells and mice are of the same genetic background – save for the coding region of the ectodomain of the TfR (spanning amino acids ~194 to 390 of the full-length TfR, which is 763 AA). The pairwise identity between the human and mouse TfR ectodomains is 73% after alignment of both AA sequences using Clustal Omega. We agree that we cannot formally exclude the possibility of an immune rejection, and have now mentioned this possibility in the discussion.
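
      As an illustration of what a pairwise identity figure such as the 73% above summarizes, the following is a minimal sketch, assuming the two sequences have already been aligned (for example by Clustal Omega) and that identity is counted over columns where neither sequence has a gap. That convention, the function name `percent_identity`, and the toy fragments are assumptions made for illustration; they are not taken from the manuscript, and alignment tools may use a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator. Other
    conventions exist (e.g. dividing by the full alignment length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, NOT real TfR sequence.
identity = percent_identity("MKT-AYLV", "MKTQAYIV")
print(f"{identity:.1f}% identity over ungapped columns")
```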

      Is there any clinical use of the anti-human TfR receptor PET tracer? 

      We do not currently envision an application for the anti-human TfR VHH in PET/CT in a clinical setting.  

      Why is the in vivo anti-mouse TfR uptake level in C57BL/6 mice consistently higher than the anti-human TfR receptor PET tracer in hTfR+/+ mice? Is this due to differences in characteristics of the VHH's (e.g. affinity, internalization properties), or rather due to a different biological behavior of the hTfR-transgene (e.g. reduced internalization properties)?

      We indeed observed that VHH123 uptake and binding appear to be more robust than those of VHH188 to their respective targets. Moreover, at later times post-injection (> 48h), VHH188 appears to display a very low reactivity to C57BL/6 (moTfR+) cells (see Figure 3B). We attribute this to the respective affinities and specificities of both VHHs. We have not investigated the VHH binding kinetics of the mouse versus human ectodomain TfR proteins in vitro. Internalization should be mildly different at best, as <sup>89</sup>Zr release from DFO occurs with both VHHs in both C57BL/6 and huTfR +/+ mouse models (when injected in a matched configuration). The huTfR +/+ mice rely exclusively on the huTfr for their iron supply. They are healthy with no obvious pathological features. The behavior of the huTfr is therefore presumably similar, if not identical, to that of the mouse Tfr, bearing in mind that the huTfr and the mouse Tfr are both reliant on mouse Tf as their ligand.

      The anti-TfR VHHs were initially developed as a carrier for BBB-transport of VHH-based drug conjugates (previous publications). The data shown here reduces enthusiasm towards this application. Uptake in the brain is several log-factors lower than physiological uptake elsewhere. Potential consequences of off-brain uptake on potential toxicity of VHH-based drug-conjugates could be better emphasized in the discussion. 

      We did not observe a significant presence of the anti-TfR VHHs in the CNS by PET/CT. We have addressed several possibilities: longer circulation times post-injection may favor transcytosis of the VHHs through the BBB. However, because transcytosis requires endocytosis, <sup>89</sup>Zr may be released from its chelating moiety at this step. The only radiotracers with a covalent bond between the radioisotope and the VHHs in our work are the <sup>18</sup>F VHHs, but the signal acquisition window may have been too short to observe transcytosis and accumulation in the CNS. Another possible caveat is that PEGylation of the radiotracers may be an obstacle to transcytosis. The circulatory half-life of unpegylated VHHs is too short to allow adequate visualization at 24 hours post-injection, as the conjugates rapidly clear from the circulation (t ½ = 30 minutes or less). We have updated the discussion to address these points.
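
      The argument about the short circulatory half-life can be made concrete with a small calculation. This is a minimal sketch, assuming simple mono-exponential clearance with t ½ = 30 minutes and ignoring radioactive decay and any tissue retention; real clearance is generally multi-phasic, so the numbers are only an order-of-magnitude illustration. The function name `fraction_remaining` is hypothetical.

```python
def fraction_remaining(t_minutes: float, half_life_minutes: float) -> float:
    """Fraction of conjugate still circulating after t_minutes.

    Assumes mono-exponential clearance: N(t) / N(0) = 0.5 ** (t / t_half).
    """
    if half_life_minutes <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (t_minutes / half_life_minutes)


# Illustrative time points; 30 min is the upper bound quoted in the response.
for hours in (1, 4, 24):
    frac = fraction_remaining(hours * 60, half_life_minutes=30.0)
    print(f"{hours:>2} h post-injection: {frac:.3e} of the initial circulating amount")
```

      Under these assumptions, 24 hours corresponds to 48 half-lives, which is why a non-PEGylated conjugate would be expected to leave essentially no circulating signal at that time point.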

      In several locations (I have counted 5) a space is missing between words, please double-check. 

      We carefully checked the manuscript to remove any remaining typos.

      It is unclear to me why for the melanoma-tracking experiment the tracer is switched from the 89Zr-labeled variant to the 64Cu-labeled variant. 

      The decision to switch to the <sup>64</sup>Cu labeled VHHs for the melanoma experiment stemmed from a wish to 1) evaluate the performance of the <sup>64</sup>Cu-radioconjugates in detecting transplanted cells as we had done with the <sup>89</sup>Zr conjugates and 2) assess how the (non-specific) liver signal seen with <sup>64</sup>Cu contrasts with a specific signal.  

      typo in discussion: C57BL/6 instead of C57B/6         

      We have corrected the typo.

      It is unclear to me why in FIG1B cells are labeled with 35S. Is it correct that the signals seen are due to staining membranes with anti-TfR mAbs? Or is this an autoradiography of the gel? 

      In Figure 1B cells were labeled with 35S-Met/Cys, while the images shown are indeed those of Western Blots, using an anti-TfR monoclonal antibody as the primary antibody to detect human and mouse TfR retrieved by the anti-TfR VHHs. Autoradiography using the same lysates showed the presence of contaminants in the VHH eluates, as commonly seen in immunoprecipitates from metabolically labeled cells (as distinct from IP/Westerns). For this reason, we performed a Western Blot on the same samples to confirm TfR pull-down. As written in the results section, we also performed LC-MS analysis of the immunoprecipitated proteins to better characterize contaminating proteins (Table 1). To clarify this, we have now added the autoradiographs in supplementary data (supplementary figure 15) and added a reference to these observations in the results.

      ROI quantifications in all figures: these should be expressed as %ID/cc instead of %ID/g. Ex vivo tissue counts should be in %ID/g instead of cpm. 

      We have converted all ROI quantification figures to %ID/cc based on the assumption that 1 mL (1 cc) = 1 g. For ex vivo tissue counts, %ID/g has been calculated based on injected dose (except for figure 3G, where the comparisons in %ID/g are not possible due to the uncertain nature of bone marrow and whole blood). All figures have now been updated.
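
      The two units discussed here can be sketched as follows. This is a minimal illustration, assuming that tissue activity and injected activity are expressed in the same unit and decay-corrected to the same time point; the function names and example numbers are hypothetical and do not come from the manuscript.

```python
def percent_id_per_gram(tissue_activity: float,
                        injected_activity: float,
                        tissue_mass_g: float) -> float:
    """Ex vivo uptake as percent injected dose per gram of tissue (%ID/g)."""
    if injected_activity <= 0 or tissue_mass_g <= 0:
        raise ValueError("injected activity and tissue mass must be positive")
    return 100.0 * tissue_activity / injected_activity / tissue_mass_g


def percent_id_per_cc(roi_activity_per_ml: float,
                      injected_activity: float) -> float:
    """Image-derived uptake as percent injected dose per cc (%ID/cc).

    roi_activity_per_ml is the mean activity concentration in the ROI.
    """
    if injected_activity <= 0:
        raise ValueError("injected activity must be positive")
    return 100.0 * roi_activity_per_ml / injected_activity


# Hypothetical numbers (arbitrary activity units, same decay reference time).
print(percent_id_per_gram(tissue_activity=0.5, injected_activity=10.0, tissue_mass_g=0.25))
print(percent_id_per_cc(roi_activity_per_ml=0.2, injected_activity=10.0))
```

      Under the stated assumption of a tissue density of 1 g/mL, a value in %ID/cc is numerically equal to the same value in %ID/g; for tissues whose density departs from 1 g/mL, the two would differ by that density factor.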

      Fig4: it would be good to also see respective mouse controls (C57BL6 vs hTfR+/+) for the 64Cu- and 18F-labeled VHH123 tracers. Each radiolabeling methodology changes in vivo biodistribution and specificity, which can be better assessed by using appropriate controls. 

      We had performed these controls, but they were not included in the manuscript as they were deemed redundant with the results of Figure 3. We have now separated Figure 4 into two panels (Figure 4A and 4B), with Figure 4A showing the 1h timepoint post-injection of VHH123 radiotracers in C57BL/6 vs huTfr<sup>+/+</sup> and Figure 4B showing the 24h timepoint in the same configuration. ROI analyses were also done on the huTfR<sup>+/+</sup> controls and were included in Figure 4C as well.

      Fig7: is it correct that mouse imaging is performed at 24h p.i. and dissected embryo's at 72h p.i.? Why are there 2 days between each procedure of the same animals? 

      We acquired images at different timepoints, specifically at 1h, 24h, 48h and 72 hours after radio-tracer injection. As 72 h was the last timepoint, the mice were sacrificed the same day and embryo dissection performed thereafter, at 72 hours post radiotracer injection. We decided to show the 24h timepoint images as they were the most representative of the series, offering the best signal-to-noise ratio. The signal pattern did not change over the course from 24h to 72h. We have now added those timepoints in the supplementary data.

    1. eLife Assessment

      This study focuses on a previously reported positive correlation between translational efficiency and protein noise. Using mathematical modeling and analysis of experimental data the authors reach the valuable conclusion that this phenomenon arises due to ribosomal demand. While some aspects of the work appear to be incomplete, the results have the potential to be of value and interest to the field of gene expression.

    2. Reviewer #1 (Public review):

      Summary:

      The authors use analysis of existing data, mathematical modelling and new experiments to explore the relationship between protein expression noise, translation efficiency and transcriptional bursting.

      Strengths:

      The analysis of the old data and the new data presented is interesting and mostly convincing.

      Weaknesses:

      My main concern is the analysis presented in Figure 4. This is the core of the mechanistic analysis that suggests ribosomal demand can explain the observed phenomenon. Revisions have improved clarity, but I am confused by the assumptions used in the mathematical modelling of this section. As I said before, the authors' assumption that the fluctuations of a single gene's mRNA levels will significantly affect ribosome demand is puzzling. The authors seem to dismiss this, and maybe I am missing something. However, the specific forms used in the equations of Table S1 seem very phenomenological, and I am not sure how these can be taken as good approximations for modelling ribosome demand. Why does kc have this specific form? Why is such a sharp Hill number appropriate? How many total ribosomes per mRNA are assumed here (if this assumption is indeed needed)? Again, my intuition is that on average the total level of mRNA across all genes would stay constant, and therefore there are no big fluctuations in the ribosome demand due to the burstiness of transcription of individual genes (as this is on average compensated by a drop in the level of other transcripts). Should one not consider all transcripts and total ribosomes to be able to model ribosome demand?

    3. Reviewer #2 (Public review):

      This work by Pal et al. studied the relationship between protein expression noise and translational efficiency. They proposed a model based on ribosome demand to explain the positive correlation between them, which is new as far as I realize. Nevertheless, I found the evidence of the main idea that it is the ribosome demand generating this correlation is weak. Below are my major and minor comments.

      Major comments:

      (1) Besides a hypothetical numerical model, I did not find any direct experimental evidence supporting the ribosome demand model. Therefore, I think the main conclusions of this work are a bit overstated.

      (2) I found that the enhancement of protein noise due to high translational efficiency is quite mild, as shown in Figure 6A-B, which makes the biological significance of this effect unclear.

      (3) The captions for most of the figures are short and do not provide much explanation, making the figures difficult to read.

      (4) It would be helpful if the authors could define the meanings of noise (e.g., coefficient of variation?) and translational efficiency in the very beginning to avoid any confusion. It is also unclear to me whether the noise from the experimental data is defined according to protein numbers or concentrations, which is presumably important since budding yeasts are growing cells.

      (5) The conclusions from Figure 1D and 1E are not new. For example, the constant protein noise as a function of mean protein expression is a known result of the two-state model of gene expression, e.g., see Eq. (4) in Paulsson, Physics of Life Reviews 2005.

      (6) In Figure 4C-D, it is unclear to me how the authors changed the mean protein expression if the translation initiation rate is a function of variation in mRNA number and other random variables.

      (7) If I understand correctly, the authors somehow changed the translation initiation rate to change the mean protein expression in Figure 4C-D. However, the authors changed the protein sequences in the experimental data of Figure 6. I am not sure if the comparison between simulations and experimental data is appropriate.

      Comments on revisions:

      Updated Review: The authors have satisfactorily answered all of my questions and comments. The current manuscript is much clearer and stronger than the previous one. I do not have any other questions.

    4. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary:

      The authors use analysis of existing data, mathematical modelling, and new experiments, to explore the relationship between protein expression noise, translation efficiency, and transcriptional bursting.

      Strengths:

      The analysis of the old data and the new data presented is interesting and mostly convincing.

      Thank you for the constructive suggestions and comments. We address the individual comments below. 

      Weaknesses:

      (1) My main concern is the analysis presented in Figure 4. This is the core of mechanistic analysis that suggests ribosomal demand can explain the observed phenomenon. I am both confused by the assumptions used here and the details of the mathematical modelling used in this section. Firstly, the authors' assumption that the fluctuations of a single gene mRNA levels will significantly affect ribosome demand is puzzling. On average the total level of mRNA across all genes would stay very constant and therefore there are no big fluctuations in the ribosome demand due to the burstiness of transcription of individual genes. Secondly, the analysis uses 19 mathematical functions that are in Table S1, but there are not really enough details for me to understand how this is used, are these included in a TASEP simulation? In what way are mRNA-prev and mRNA-curr used? What is the mechanistic meaning of different terms and exponents? As the authors use this analysis to argue ribosomal demand is at play, I would like this section to be very much clarified.

      Thank you for raising two important points. Regarding the first point, we agree that the overall ribosome demand in a cell will remain mostly the same even with fluctuations in mRNA levels of a few genes. However, what we refer to in the manuscript is the demand for ribosomes for translating mRNA molecules of a single gene. This demand will vary with the changes in the number of mRNA molecules of that gene. When the mRNA copy number of the gene is low, the number of ribosomes required for translation is low. At a subsequent timepoint when the mRNA number of the same gene goes up rapidly due to transcriptional bursting, the number of ribosomes required would also increase rapidly. This would increase ribosome demand. The process of allocation of ribosomes for translation of these mRNA molecules will vary between cells, and this process can lead to increased expression variation of that gene among cells. We have now rephrased the section between the lines 321 and 331 to clarify this point.

      Regarding the second point, each of the 19 mathematical functions was individually tested in the TASEP model and stochastic simulation. The parameters ‘mRNA-curr’ and ‘mRNA-prev’ are the mRNA copy numbers at the present time point and the previous time point in the stochastic simulations, respectively. These numbers were calculated from the rate of production of mRNA, which is influenced by the transcriptional burst frequency and the burst size, as well as the rate of mRNA removal. We have now incorporated more details about the modelling part, along with explanations of the parameters and terms, in the revised manuscript (lines 390 to 411; lines 795 to 807).

      (2) Overall, the paper is very long and as there are analytical expressions for protein noise (e.g. see Paulsson Nature 2004), some of these results do not need to rely on Gillespie simulations. Protein CV (noise) can be written as three terms representing protein noise contribution, mRNA expression contribution, and bursty transcription contribution. For example, the results in panel 1 are fully consistent with the parameter regime, protein noise is negligible compared to transcriptional noise. 

      Thank you for referring to the paper on analytical expressions for protein noise. We introduced translational bursting and ribosome demand in our model, and these are linked to stochastic fluctuations in mRNA and ribosome numbers. In addition, our model couples transcriptional bursting with translational bursting and ribosome demand. Since these processes are all stochastic in nature, we felt that the stochastic simulation would be able to better capture the fluctuations in mRNA and protein expression levels originating from these processes. For consistency, we used stochastic simulations throughout even when the coupling between transcription and translation were not considered. 

      Reviewer #1 (Recommendations for the authors):  

      (1) Figure 1B shows noise as Distance to Median (DM) that can be positive or negative. It is therefore misleading that the authors say there is a 10-fold increase in noise (this would be relevant if the quantity was strictly positive). How is the 10-fold estimated? Similar comments apply to Figure 1F and the estimated 37-fold. I also wonder if the datasets combined from different studies are necessarily compatible.

      We have now changed the statements and mentioned the actual noise values for different classes of genes rather than the fold-changes (lines 111-113 and 143-145). We agree that the measurements for mRNA expression levels, protein synthesis rates and protein noise were obtained from experiments done by different research labs, and this could introduce more variation in the data. However, these experimental variations are likely to be random and are unlikely to bias any specific class of genes (in Fig. 1B and Fig. 1F) more than others.

      (2)   How Figure 1D has been generated seems confusing: the authors state this is based on the Gillespie algorithm, but in panel 1C and also in the methods, they are writing ODEs and Equations 3 and 4 stating the Euler method for the solution of ODEs. Also, I am concerned whether this has been done at steady-state. The protein noise for the two-state model can be analytically obtained, and instead of simulations, the authors could have just used the expression. Also, Figure 1D shows CV while the corresponding data in Figure 1B show mean-adjusted DM. So, I am not sure if the comparison is valid. I am also very confused about the fact that the authors show CV does not depend on the mean expression of proteins and mRNA. Analytical solutions suggest that an inverse relationship always exists between CV and mean, and this has also been experimentally observed (see for example Newman et al 2006).

      We used Gillespie algorithm for stochastic simulations and identified the time points when an event (for example, switching to ON or OFF states during transcriptional bursting) occurred. If an event occurred at a time point, the rates of the reactions were guided by the equations 3 and 4, as the rates of reactions were dependent on the number of mRNA (or protein) molecules present, production rates and removal rates. 
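
      To make the simulation scheme described above easier to follow, here is a minimal sketch of a Gillespie simulation for a generic two-state (ON/OFF) promoter with mRNA and protein production and removal. It is not the authors' code and does not reproduce their equations 3 and 4; the reaction set, the parameter names (`k_on`, `k_off`, `k_m`, `d_m`, `k_p`, `d_p`) and the values in the usage lines are assumptions chosen for illustration.

```python
import random


def gillespie_two_state(k_on, k_off, k_m, d_m, k_p, d_p, t_end, rng,
                        gene_on=False, m=0, p=0):
    """Simulate one cell up to t_end and return (gene_on, mRNA, protein).

    Reactions (all rates are hypothetical illustration parameters):
      OFF -> ON (k_on), ON -> OFF (k_off), transcription when ON (k_m),
      mRNA removal (d_m * m), translation (k_p * m), protein removal (d_p * p).
    """
    t = 0.0
    while True:
        # Propensities depend on the current promoter state and copy numbers.
        rates = [
            0.0 if gene_on else k_on,
            k_off if gene_on else 0.0,
            k_m if gene_on else 0.0,
            d_m * m,
            k_p * m,
            d_p * p,
        ]
        total = sum(rates)
        if total <= 0.0:
            break  # no reaction can fire any more
        t += rng.expovariate(total)  # waiting time to the next event
        if t > t_end:
            break
        # Pick which event fires, with probability proportional to its rate.
        threshold = rng.random() * total
        cumulative = 0.0
        event = None
        for i, r in enumerate(rates):
            if r <= 0.0:
                continue
            event = i  # falls back to the last possible event on rounding error
            cumulative += r
            if threshold < cumulative:
                break
        if event == 0:
            gene_on = True
        elif event == 1:
            gene_on = False
        elif event == 2:
            m += 1
        elif event == 3:
            m -= 1
        elif event == 4:
            p += 1
        else:
            p -= 1
    return gene_on, m, p


# Usage: simulate a small population of independent cells (arbitrary parameters).
rng = random.Random(0)
cells = [gillespie_two_state(0.5, 1.0, 5.0, 1.0, 2.0, 0.2, 20.0, rng)
         for _ in range(100)]
proteins = [p for _, _, p in cells]
print("mean protein copy number:", sum(proteins) / len(proteins))
```

      In this scheme, the event times are drawn stochastically while the propensities at each event are computed from the current molecule numbers and the production and removal rates, which matches the general description given in the response.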

      For all published datasets where we had measurements from many genes/promoters, we used the measures of adjusted noise (for mRNA noise) and Distance-to-median (DM, for protein noise). These measures of noise are corrected for the mean-dependence of expression noise (Newman et al., 2006). For simulations, which we performed for a single gene, and for experiments that we performed on a limited number of promoters, we used the coefficient of variation (CV) to quantify noise, as calculation of adjusted noise or DM was not possible for a single gene.
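
      For reference, the coefficient of variation used for single-gene simulations and experiments can be computed as sketched below. This is a minimal illustration, assuming the population standard deviation is used; whether the manuscript uses the population or sample form is not stated here, and the function name and example values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV = standard deviation / mean of per-cell expression values.

    Assumption: population standard deviation (statistics.pstdev).
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean


# Hypothetical per-cell protein signals (arbitrary units).
signals = [120.0, 95.0, 143.0, 110.0, 101.0]
print(f"CV = {coefficient_of_variation(signals):.3f}")
```

      Because CV by itself usually changes with mean expression, measures such as DM compare each gene's CV with the median CV of genes of similar mean expression, which is why they require measurements from many genes.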

      The work of Newman et al. (2006) measures noise values of different genes with different transcriptional burst characteristics and different mRNA and protein removal rates. We also see similar results in our simulations (Fig. 1E), where as we increase the mean expression by changing the transcriptional burst frequency, the protein noise goes down.     

      (3) Estimating parameters of gene expression using reference 44 ignores the effect of variability in capture efficiency and cell size. In a recent paper, Tang et al Bioinformatics 39 (7), btad395 2023 addressed this issue.

      Thank you for referring to the work of Tang et al. (2023). We note that cell size and capture efficiency have a small effect on the burst frequency (Kon) but a more pronounced effect on burst size (Tang et al., 2023). In our analysis, we considered only burst frequency, and even with likely small inaccuracies in our estimation of Kon, we can capture interesting associations of burst frequency with noise trends.

      (4) In the methods "αp = 0.007 per mRNA molecule per unit time", I believe it should be per protein molecule per unit time.

      Corrected.

      (5)  Figure 3 uses TASEP modelling but the details of this modelling are not described well.

      We have now expanded the description of the modelling approach in the revised manuscript (lines 391-412; lines 693-776 and lines 797-809). In addition, we have also added more details in the figure captions. 

      (6) Another overall issue is that when the authors talk about changes in burst frequency or changes in translation efficiency, it is not always clear, is this done while keeping all the other parameters constant therefore changing mean expressions, or is this done by keeping the mean expressions constant?

      To test for the association between mean protein expression and protein noise, we have varied the mean expression by changing the translation initiation rate (TLinit) for the most part of the manuscript while keeping other parameters constant. In figure 5, where we decoupled TLinit from ribosome traversal rate (V), we changed the mean protein expression by changing the ribosome traversal rate while keeping other parameters constant. We have now mentioned this in the manuscript. 

      (7)   I believe Figures 5 and 6 present the same data in different ways, I wonder if these can be combined or if some aspect of the data in Figure 5 could go to supplementary. Also, the statistical tests in Figure 5E and F are not clear what they are testing.

      We have now moved figures 5E and 5F to the supplement (Fig. S20). We have also added details of the statistical test in the figure caption. 

      Reviewer #2 (Public review): 

      This work by Pal et al. studied the relationship between protein expression noise and translational efficiency. They proposed a model based on ribosome demand to explain the positive correlation between them, which is new as far as I realize. Nevertheless, I found the evidence of the main idea that it is the ribosome demand generating this correlation is weak. Below are my major and minor comments.

      Thank you for your helpful suggestions and comments. We note that the direct experimental support required for the ribosome demand model would need experimental setups that are beyond the currently available methodologies. We address the individual comments below. 

      Major comments: 

      (1) Besides a hypothetical numerical model, I did not find any direct experimental evidence supporting the ribosome demand model. Therefore, I think the main conclusions of this work are a bit overstated.

      Direct experimental evidence of the hypothesis would require generation of ribosome occupancy maps of mRNA molecules of specific genes at the level of single cells and at time intervals that closely match the burst frequency of the genes. This is beyond the currently available methodologies. However, there are other lines of evidence that support our model. For example, earlier work in cell-free systems has shown that constraining cellular resources required for transcription or translation can increase expression heterogeneity (Caveney et al., 2017). In addition, the ribosome demand model had two predictions, both of which could be validated through modelling as well as from our experiments.

      To further investigate whether removing ribosome demand from our model could eliminate the positive mean-noise correlation for a gene, we have now tested two additional sets of models where we decoupled the translation initiation rate (TLinit) from the ribosome traversal speed (V). In the first model, we changed the mean protein expression by changing the translation initiation rate but keeping the ribosome traversal speed constant. Thus, in this scenario, ribosome demand varied according to the variation in the translation initiation rate. As expected, the positive correlation between mean expression and protein noise was maintained in this condition (Fig. 5B). In the second model, we changed the mean expression by changing the ribosome traversal speed but keeping the translation initiation rate (and therefore, the ribosome demand) constant. In this situation, the relationship between mean expression and protein noise turned negative (Fig. 5B and fig. S16). These results further pointed that the ribosome demand was indeed driving the positive relationship between mean expression and protein noise. 

      (2) I found that the enhancement of protein noise due to high translational efficiency is quite mild, as shown in Figure 6A-B, which makes the biological significance of this effect unclear.

      We agree with the reviewer’s comment that the effect of translational efficiency on protein noise may not be as substantial as the effect of transcriptional bursting, but it has been observed in studies across bacteria, yeast, and Arabidopsis (Ozbudak et al., 2003; Blake et al., 2003; Wu et al., 2022). In addition, the relationship between translational efficiency and protein noise is in contrast with the inverse relationship observed between mean expression and noise (Newman et al., 2006; Silander et al., 2012). We also note that the goal of the manuscript was not to evaluate the relative strength of these associations, but to understand the molecular basis of the influence of translational efficiency on protein noise. 

      (3) The captions for most of the figures are short and do not provide much explanation, making the figures difficult to read.

      We have revised the figure captions to include more details as per the reviewer’s suggestion. 

      (4)  It would be helpful if the authors could define the meanings of noise (e.g., coefficient of variation?) and translational efficiency in the very beginning to avoid any confusion. It is also unclear to me whether the noise from the experimental data is defined according to protein numbers or concentrations, which is presumably important since budding yeasts are growing cells. 

      For all published datasets where we had measurements from many genes/promoters, we used the measures of adjusted noise (for mRNA noise) and Distance-to-median (DM, for protein noise). These measures of noise are corrected for the mean-dependence of expression noise. For simulations, which we performed for a single gene, and for experiments that we performed on a limited number of promoters, we used the coefficient of variation (CV) to quantify noise, as calculation of adjusted noise or DM was not possible for a single gene. We now mention this in lines 123-124. We used the protein synthesis rate per mRNA as the measure of translational efficiency (Riba et al., 2019; line 100). Alternatively, we also used the tRNA adaptation index (tAI) as a measure of translational efficiency, as codon choice could also influence the translation rate per mRNA molecule (Tuller et al., 2010) (line 193).

      The protein noise was quantified from the signal intensity of GFP tagged proteins (Newman et al., 2006; and our data), which was proportional to protein numbers without considering cell volume. For quantification of noise at the mRNA level, single-cell RNA-seq data was used, which provided mRNA numbers in individual cells.  

      (5) The conclusions from Figures 1D and 1E are not new. For example, the constant protein noise as a function of mean protein expression is a known result of the two-state model of gene expression, e.g., see Equation (4) in Paulsson, Physics of Life Reviews 2005.

      Yes, they may not be new, but we included these results to set a baseline for comparison with the simulation results that appear later in the manuscript, where we included translational bursting and ribosome demand in our models. 

      (6) In Figure 4C-D, it is unclear to me how the authors changed the mean protein expression if the translation initiation rate is a function of variation in mRNA number and other random variables.

      The translation initiation rate varied from a basal translation initiation rate depending on the mRNA numbers and other variables. We changed the basal translation initiation rate to alter the mean protein expression levels. We have now elaborated the modelling section to incorporate these details in the revised manuscript (lines 404 to 412). 

      (7) If I understand correctly, the authors somehow changed the translation initiation rate to change the mean protein expression in Figures 4C-D. However, the authors changed the protein sequences in the experimental data of Figure 6. I am not sure if the comparison between simulations and experimental data is appropriate.

      This is an important observation. Even though we changed the basal translation initiation rate to change the mean expression (Fig. 4C-D), we noted in the description of the model that the changes in the translation initiation rate were also linked to changes in the translation elongation rate (Fig. 3D). Thus, an increase in the translation initiation rate was associated with faster ribosome traversal through an mRNA molecule. This has also been observed in an experimental study by Barrington et al. (2023). Therefore, the models can also be expressed in terms of the translation elongation rate or ribosome traversal speed, instead of the translation initiation rate, and this modification will not change the results of the simulations because the initiation rate and the elongation rate are interconnected.  

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      (1)  The discussion from lines 180 to 182 appears consistent with Figure 1E. It seems that the two-state model can already explain why the genes with high burst frequency and high protein synthesis rate showed a small protein noise. It is unclear to me the purpose of this discussion.

      Yes, the results from Fig. 1E were from stochastic simulations, whereas the results discussed in the lines 191 to 193 (in the revised manuscript) were based on our analysis of experimental data that is shown in Fig. 2D.

      (2)  If I understand correctly, "translational efficiency" is the same as "protein synthesis rate" in this work. It would be helpful if the authors could keep the same notation throughout the paper to avoid confusion.

      The protein synthesis rate per mRNA molecule is the best measure of translational efficiency, and we used the experimental data from Riba et al. (2019) for this purpose (line 99-100). Alternatively, we also used tRNA Adaptation Index (tAI) as a measure of translational efficiency, as the codon choice also influences the rate at which an mRNA molecule is translated (Tuller et al., 2010) (line 192). 

      (3) On line 227, does "higher translation rate" mean "higher translation initiation rate"? The same issues happen in a few places in this paper.

      Corrected now (line 243 in the revised manuscript and throughout the manuscript). 

      (4) The discussion from lines 296 to 301 is unclear. It is not obvious to me how the authors obtained the conclusion that lowering translational efficiency would decrease the protein expression noise.

      High translational efficiency will require more ribosomes and hence, will increase ribosome demand. If ribosome demand is the molecular basis of high expression noise for genes with bursty transcription and high translational efficiency, then we can expect a reduction in ribosome demand and a reduction in noise if we lower the translational efficiency. We have rephrased this section for clarity between the lines 334 and 339 in the revised manuscript.   

      (5)  On line 324, should slower translation mean a shorter distance between neighboring ribosomes? One can imagine the extreme limit in which ribosomes move very slowly so that the mRNA is fully packed with ribosomes. 

      Slower translation or ribosome traversal rate would also lower the translation initiation rate (Barrington et al., 2023). Slower traversal of ribosomes reduces the chances of collision in case of transient slow-down of ribosomes due to occurrence of one or more non-preferred codons. We have now clarified this part in the lines 360 to 369 in the revised manuscript.

      (6) The text from lines 423 to 433 can be put in Methods.

      We have already added this part to the methods section (lines 900 to 910) and have now minimized this discussion in the results section. 

      (7)  The discussion from lines 128 to 130 is unclear, and the statement appears to be consistent with the two-state model (see Figure 1E). The meaning of "initial mRNA numbers" is also unclear.

      An earlier study proposed that essential genes in yeast employ a high-transcription, low-translation strategy for expression, likely to maintain low expression noise in these genes and to prevent detrimental effects of high expression noise (Fraser et al., 2004). However, there has been no direct supporting evidence. Therefore, we used stochastic simulations to test whether differences in mRNA levels and translational efficiency of genes can lead to differences in protein noise. The discussion between lines 130 and 132 in the revised manuscript summarises the results of the simulations. 

      Initial mRNA numbers - mRNA copy numbers that are present in the cell at the start of stochastic simulations. However, we have now changed it to ‘mRNA levels’ in the revised manuscript for clarity (line 131 in the revised manuscript).

      (8)  On line 212, is the translation initiation rate TL_init the same thing as beta_p in Figure 3A?

      β_p refers to the rate of protein synthesis, which is influenced by the translational burst kinetics as well as the translation initiation rate, whereas TL_init refers to the translation initiation rate. So, these parameters are related, but are not the same.

    1. eLife Assessment

      Floeder and colleagues provide an important investigation that describes the experimental conditions that systematically produce "ramps" in dopamine signaling in the striatum. This somewhat nebulous feature of dopamine has been a significant part of recent theoretical and computational debates attempting to formally describe the different timescales on which dopamine functions. The current results are convincing and add context to that ongoing work.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Floeder et al. report that dopamine ramps in both Pavlovian and Instrumental conditions are shaped by reward interval statistics. Dopamine ramps are an interesting phenomenon because at first glance they do not represent the classical reward prediction errors associated with dopamine signaling. Instead, they seem somewhat to bridge the gap between tonic and phasic dopamine, with an intense discussion still being held in the field about what their actual behavioral role is. Here, in tests with head-fixed mice, and dopamine being recorded with a genetically encoded fluorescent sensor in the nucleus accumbens, the authors find that dopamine ramps were only present when intertrial intervals were relatively short and the structure of the task (Pavlovian cue or progression in a VR corridor) contained elements that indicated progression towards the reward (e.g., a dynamic cue). The authors propose that although these findings can be explained by classical theories of dopamine function, they are better explained by their model of Adjusted Net Contingency of Causal Relation (ANCCR). The results of this study provide constraints on future models of dopamine function, and are of high interest to the field.

    3. Reviewer #2 (Public review):

      In this manuscript by Floeder et al., the authors report a correlation between ITI duration and the strength of a dopamine ramp occurring in the time between a predictive conditioned stimulus and a subsequent reward. They found this relationship occurring within two different tasks with mice, during both a Pavlovian task as well as an instrumental virtual visual navigation task. Additionally, they observed this relationship only in conditions when using a dynamic predictive stimulus. The authors relate this finding to their previously published model ANCCR in which the time constant of the eligibility trace is proportionate to the reward rate within the task.

      The relationship between ITI duration and the extent of a dopamine ramp which the authors have reported is very intriguing and certainly provides an important constraint for models for dopamine function. As such, these findings are potentially highly impactful to the field.

    4. Reviewer #3 (Public review):

      Summary:

      Floeder and colleagues measure dopamine signaling in the nucleus accumbens core using fiber photometry of the dLight sensor, in Pavlovian and instrumental tasks in mice. They test some predictions from a recently proposed model (ANCCR) regarding the existence of "ramps" in dopamine that have been seen in some previous research, the characteristics of which remain poorly understood.

      They find that cues signaling a progression toward rewards (akin to a countdown) specifically promote ramping dopamine signaling in the nucleus accumbens core, but only when the intertrial interval just experienced was short. This work is discussed in the context of ongoing theoretical conceptions of dopamine's role in learning.

      This work is the clearest demonstration to date of concrete training factors that seem to directly impact whether or not dopamine ramps occur. The existence of ramping signals has long been a feature of debates in the dopamine literature and this work adds important context to that. Further, as a practical assessment of the impact of a relatively simple trial structure manipulation on dopamine patterns, this work will be important for guiding future studies. These studies are well done and thoughtfully presented. The additional data, analyses, and discussion in the revised version of the paper add strength and clarity to the conclusions.

      The current results raise interesting questions regarding what potential function, if any, cue-reward interval dopamine ramps serve. In the current data, licking behavior was similar on different trial types and was not related to ramping activity.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Floeder et al. report that dopamine ramps in both Pavlovian and Instrumental conditions are shaped by reward interval statistics. Dopamine ramps are an interesting phenomenon because at first glance they do not represent the classical reward prediction errors associated with dopamine signaling. Instead, they seem somewhat to bridge the gap between tonic and phasic dopamine, with an intense discussion still being held in the field about what their actual behavioral role is. Here, in tests with head-fixed mice, and dopamine being recorded with a genetically encoded fluorescent sensor in the nucleus accumbens, the authors find that dopamine ramps were only present when intertrial intervals were relatively short and the structure of the task (Pavlovian cue or progression in a VR corridor) contained elements that indicated progression towards the reward (e.g., a dynamic cue). The authors show that these findings are well explained by their previously published model of Adjusted Net Contingency of Causal Relation (ANCCR).

      Strengths:

      This descriptive study delineates some fundamental parameters that define dopamine ramps in the studied conditions. The short, objective, and to-the-point format of the manuscript is great and really does a service to potential readers. The authors are very careful with the scope of their conclusions, which is appreciated by this reviewer.

      We thank the reviewer for their overall support of the formatting and scope of the manuscript. 

      Weaknesses:

      The discussion of the results is very limited to the conceptual framework of the authors' preferred model (which the authors do recognize, but it still is a limitation). The correlation analysis presented in panel l of Figure 3 seems unnecessary at best and could be misleading, as it is really driven by the categorical differences between the two conditions that were grouped for this analysis. There are some key aspects of the data and their relationship with each other, the previous literature, and the methods used to collect them, that could have been better discussed and explored.

      We agree with the reviewer that a weakness of the discussion was the limited framing of the results within the ANCCR model. To address this, we have expanded our introduction and discussion sections to provide a more thorough explanation of our model and possible leading alternatives.

      We thank the reviewer for pointing out that Figure 3l may be misleading for readers; we removed this panel from the revised Figure 4.

      We have further addressed the specific concerns raised by the reviewer in their comments to the authors. Indeed, we agree with the reviewer that the original manuscript was narrow in its focus regarding relationships between different aspects of the data. To more thoroughly explore how key variables – including dopamine ramp slope and onset response as well as licking behavior slope – could relate to each other, we have added Extended Data Figure 8. In this figure, we show that no correlations exist between any of these key variables in either dynamic tone condition; it is our hope that this additional analysis highlights the significance of the clear relationship between dopamine ramp slope and ITI duration. 

      Reviewer #2 (Public Review):

      In this manuscript by Floeder et al., the authors report a correlation between ITI duration and the strength of a dopamine ramp occurring in the time between a predictive conditioned stimulus and a subsequent reward. They found this relationship occurring within two different tasks with mice, during both a Pavlovian task as well as an instrumental virtual visual navigation task. Additionally, they observed this relationship only in conditions when using a dynamic predictive stimulus. The authors relate this finding to their previously published model ANCCR in which the time constant of the eligibility trace is proportionate to the reward rate within the task.

      The relationship between ITI duration and the extent of a dopamine ramp which the authors have reported is very intriguing and certainly provides an important constraint for models for dopamine function. As such, these findings are potentially highly impactful to the field. I do have a few questions for the authors which are written below.

      We thank the reviewer for their interest in our findings and belief in their potential to be impactful in the field. 

      (1) I was surprised to see a lack of counterbalance within the Pavlovian design for the order of the long vs short ITI. Ramping of the lick rate does increase from the long-duration ITIs to the short-duration ITI sessions. Although of course, this increase in ramping of the licking across the two conditions is not necessarily a function of learning, it doesn't lend support to the opposite possibility that the timing of the dynamic CS hasn't reached asymptotic learning by the end of the long-duration ITI. The authors do reference papers in which overtraining tends to result in a reduction of ramping, which would argue against this possibility, yet differential learning of the dynamic CS would presumably be required to observe this effect. Do the authors have any evidence that the effect is not due to heightened learning of the timing of the dynamic CS across the experiment?

      We appreciate the reviewer expressing their surprise regarding the lack of counterbalance in our Pavlovian experimental design. We did not originally counterbalance the conditions because the ramps disappeared in the short ITI/fixed tone condition, indicating that their presence is not just a matter of total experience in the task. However, we agree that this is indirect rather than direct evidence. To address this drawback, we repeated the Pavlovian experiment in a new cohort of animals with a revised training order, switching conditions such that the short ITI/dynamic tone (SD) condition preceded the long ITI/dynamic tone (LD) condition (see revised Figure 2a). Despite this change in the training order, the main findings remain consistent: positive dLight slopes (i.e., dopamine ramps) are only observed in the SD condition (Figure 2b-d). 

      We thank the reviewer for raising these questions regarding licking behavior and learning and their relationship with dopamine ramps. Indeed, a closer look at the average licking behavior reveals subtle differences across conditions (Figure 1f and Extended Data Figure 5a). While the average lick rate during the ramp window does not differ across conditions (Extended Data Figure 5c), the ramping of the lick rate during this window is higher for dynamic tone conditions compared to fixed tone conditions (Extended Data Figure 5d). Despite these differences, we still believe that the main comparison between the dopamine slope in the SD vs LD condition remains valid given their similar lick ramping slopes. Furthermore, our primary measure of learning is not lick slope, but anticipatory lick rate during the 1 s trace preceding reward delivery, which is robustly nonzero across cohorts and conditions (Figure 1g and Extended Data Figure 5b). 

      Taken together, we hope that the results from our counterbalanced Pavlovian training and more rigorous analysis of lick behavior across conditions provide sufficient evidence to assuage concerns that the differences in ramping dopamine simply reflect differences in learning. 

      (2) The dopamine response, as measured by dLight, seems to drop after the reward is delivered. This reduction in responding also tends to be observed with electrophysiological recordings of dopamine neurons. It seems possible that during the short ITI sessions, particularly on the shorter ITI duration trials, that dopamine levels may still be reduced from the previous trial at the onset of the CS on the subsequent trial. Perhaps the authors can observe the dynamics of the recovery of the dopamine response following a reward delivery on longer-duration ITIs in order to determine how quickly dopamine is recovering following a reward delivery. Are the trials with very short ITIs occurring within this period that dopamine is recovering from the previous trial? If so, how much of the effect may be due to this effect? It should be noted that the lack of observance of a ramp on the condition of short-duration ITIs with fixed CSs provides a potential control for this effect, yet the extent to which a natural ramp might occur following sucrose deliveries should be investigated.

      We thank the reviewer for highlighting the possibility that ramps may be due to the dopamine response recovery following reward delivery. Given that peak reward dopamine responses tend to be larger in long ITI conditions, however, we felt that it was inappropriate to compare post-reward dopamine recovery times across conditions. Instead, we decided to directly compare the dLight slope 2 s before cue onset (“pre-cue window,” a proxy for recovery from the previous trial) with the dLight slope during our ramp window from 3 to 8 s after cue onset (Extended Data Figure 6a). There were no significant differences in pre-cue dLight slope across conditions (Extended Data Figure 6b); this suggests that the ramping slopes seen in the SD condition, but not in other conditions, are not simply due to the natural dopamine recovery response following reward delivery. Furthermore, if the dopamine ramps observed in the SD condition were a continuation of the post-reward dopamine recovery from the previous trial, we would expect to see a positive correlation between the dLight slope before and during the cue. However, there is no such correlation between the dLight slopes in the ramp window vs. pre-cue window in the SD condition (Extended Data Figure 6c-d). We believe that this observation, along with the built-in control of the SF condition mentioned by the reviewer, serves as evidence against the possibility of our ramp results being due to a natural ramp after reward delivery.
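      The window-based slope comparison described above can be sketched as follows. This is only an illustrative sketch: the function name, the 100 Hz sampling rate, and the synthetic trace are assumptions, while the window boundaries (2 s before cue onset, and 3 to 8 s after cue onset) follow the text. The slope is estimated here with an ordinary least-squares line fit, which may differ from the exact procedure used in the manuscript.

```python
import numpy as np

def window_slope(t, signal, start, stop):
    """Least-squares slope of `signal` over the time window [start, stop)."""
    mask = (t >= start) & (t < stop)
    slope, _intercept = np.polyfit(t[mask], signal[mask], 1)
    return slope

# Synthetic single-trial trace, time in seconds relative to cue onset
# (assumed 100 Hz sampling): flat before 3 s, then a linear ramp.
t = np.arange(-2.0, 8.0, 0.01)
signal = np.where(t >= 3.0, 0.5 * (t - 3.0), 0.0)

pre_cue_slope = window_slope(t, signal, -2.0, 0.0)  # pre-cue window
ramp_slope = window_slope(t, signal, 3.0, 8.0)      # ramp window
```

      Per-trial pairs of `pre_cue_slope` and `ramp_slope` could then be correlated across trials to ask whether the ramp is a continuation of post-reward recovery.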

      (3) The authors primarily relate the finding of the correlation between the ITI and the slope of the ramp to their ANCCR model by suggesting that shorter time constants of the eligibility trace will result in more precisely timed predictors of reward across discrete periods of the dynamic cue. Based on this prediction, would the change in slope be more gradual, and perhaps be more correlated with a broader cumulative estimate of reward rate than just a single trial?

      To clarify, we do not propose that a smaller eligibility trace time constant results in more precise timing per se. Instead, we believe that the rapid eligibility trace decay from smaller time constants gives greater causal predictive power for later periods in the dynamic cue (see Extended Data Figure 1) since the memory of the earlier periods of the cue is weaker. 
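      To make this intuition concrete, here is a minimal sketch that assumes the eligibility trace decays exponentially with a time constant T. The function names, the epoch times, and the two time constants are hypothetical values chosen only for illustration; they are not parameters from the manuscript.

```python
import math

def eligibility_trace(elapsed_s, time_constant_s):
    """Exponentially decaying memory of an event that occurred `elapsed_s` ago."""
    if time_constant_s <= 0:
        raise ValueError("time constant must be positive")
    return math.exp(-elapsed_s / time_constant_s)

# Hypothetical cue epochs, expressed as time elapsed before reward delivery.
EARLY_EPOCH_S = 8.0
LATE_EPOCH_S = 1.0

def early_to_late_ratio(time_constant_s):
    """Trace strength of the early cue epoch relative to the late cue epoch."""
    early = eligibility_trace(EARLY_EPOCH_S, time_constant_s)
    late = eligibility_trace(LATE_EPOCH_S, time_constant_s)
    return early / late

short_T_ratio = early_to_late_ratio(2.0)   # assumed small time constant
long_T_ratio = early_to_late_ratio(20.0)   # assumed large time constant
```

      With the smaller time constant, the memory of the early epoch is much weaker relative to the late epoch, which is the sense in which later periods of the dynamic cue gain relative predictive power.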

      We appreciate the reviewer’s curiosity regarding the influence of a broader cumulative estimate of reward vs. only the immediately preceding ITI on dopamine ramp slopes. Indeed, in several instrumental tasks (e.g., Krausz et al., Neuron, 2023), recent reward rate modulates the magnitude of dopamine ramps, making this an important variable to investigate. We chose to use linear regression for each mouse separately to analyze the relationship between the trial dopamine slope and the average previous ITI for the past 1 through 10 most recent trials. In the SD condition, as reported in our earlier manuscript, there was a significantly negative dependence of trial dopamine slope with the single previous ITI (i.e., if the previous ITI was long, the next trial tends to have a weaker ramp). This negative dependence, however, only held for a single previous trial; there was no clear relationship between the per-trial dopamine slope and the average of the past 2 through 10 ITIs (Extended Data Figure 7a). For the LD condition, on the other hand, there is no clear relationship between the per-trial dopamine slope and the average previous ITI for any of the past 1 through 10 trials, with one exception: there is a significantly negative dependence of trial dopamine slope with the average ITI of the previous 2 trials (Extended Data Figure 7b). This longer timescale relationship in the LD condition suggests that the adaptation of the eligibility trace time constant is nuanced and depends on the general ITI length. 
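      The per-mouse analysis described above can be sketched as follows. The function names and the toy arrays are hypothetical, and the sketch returns only the regression coefficient; the significance testing reported in the Extended Data figures is not reproduced here.

```python
import numpy as np

def mean_recent_iti(iti_before, k):
    """Average of the k most recent ITIs for each trial.

    iti_before[i] is the ITI immediately preceding trial i. The result has
    length n - k + 1 and its entry j corresponds to trial j + k - 1.
    """
    iti_before = np.asarray(iti_before, dtype=float)
    kernel = np.ones(k) / k
    return np.convolve(iti_before, kernel, mode="valid")

def slope_vs_recent_iti(ramp_slopes, iti_before, k):
    """Regression coefficient of per-trial ramp slope on the mean of the last k ITIs."""
    x = mean_recent_iti(iti_before, k)
    y = np.asarray(ramp_slopes, dtype=float)[k - 1:]
    beta, _intercept = np.polyfit(x, y, 1)
    return beta

# Toy data for one hypothetical mouse: ramp slope falls with the preceding ITI.
iti_before = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
ramp_slopes = -0.1 * iti_before
betas = {k: slope_vs_recent_iti(ramp_slopes, iti_before, k) for k in (1, 2, 3)}
```

      Running the same function for k from 1 through 10 on each mouse gives one coefficient per history length, which is the quantity compared across history lengths in the text.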

      In general, though we reason that the eligibility trace time constant should depend on overall event rates, we do not currently propose a real-time update rule for the eligibility trace time constant depending on recent event rates. Accordingly, we are currently agnostic about the actual time scale of history of recent event rate calculation that mediates the eligibility trace time constant. Our experimental results suggest that when the ITI is generally short for Pavlovian conditioning, the eligibility trace time constant adapts to ITI on a rapid timescale. However, only a small fraction of the variability of this rapid fluctuation is captured by recent ITI history. A more thorough investigation of this real-time update rule would need to be done in the future.

      Reviewer #3 (Public Review):

      Summary:

      Floeder and colleagues measure dopamine signaling in the nucleus accumbens core using fiber photometry of the dLight sensor, in Pavlovian and instrumental tasks in mice. They test some predictions from a recently proposed model (ANCCR) regarding the existence of "ramps" in dopamine that have been seen in some previous research, the characteristics of which remain poorly understood.

      They find that cues signaling a progression toward rewards (akin to a countdown) specifically promote ramping dopamine signaling in the nucleus accumbens core, but only when the intertrial interval just experienced was short. This work is discussed in the context of ongoing theoretical conceptions of dopamine's role in learning.

      Strengths:

      This work is the clearest demonstration to date of concrete training factors that seem to directly impact whether or not dopamine ramps occur. The existence of ramping signals has long been a feature of debates in the dopamine literature and this work adds important context to that. Further, as a practical assessment of the impact of a relatively simple trial structure manipulation on dopamine patterns, this work will be important for guiding future studies. These studies are well done and thoughtfully presented.

      We thank the reviewer for recognizing the context that our study adds to the dopamine literature and the potential for our experiments to guide future work. 

      Weaknesses:

      It remains somewhat unclear what limits are in place on the extent to which an eligibility trace is reflected in dopamine signals. In the current study, a specific set of ITIs was used, and one wonders if the relative comparison of ITI/history variables ("shorter" or "longer") is a factor in how the dopamine signal emerges, in addition to the explicit length ("short" or "long") of the ITI. Another experimental condition, where variable ITIs were intermingled, could perhaps help clarify some remaining questions.

      Though we used ITIs of fixed means, due to the exponential nature of their distribution, we did intermingle ITIs of various durations in both our long and short ITI conditions. The distribution of ITI durations is visualized in Figure 1c for Pavlovian conditioning and Extended Data Figure 9b for VR navigation. 

      The relative comparison between consecutive ITIs was not something we originally explored, so we thank the reviewer for wondering how it impacts the dopamine signal. To investigate this, we quantified both the change in ITI (+ or - Δ ITI for relatively longer or shorter, respectively) and the change in dopamine ramp slope between consecutive trials in the SD condition (Figure 3d). Across each mouse separately, we found a significantly negative relationship between Δ slope and Δ ITI (Figure 3e-f). Also, the average Δ slope was significantly greater for consecutive trials with a Δ ITI below -1 s compared to trials with a Δ ITI above +1 s (Figure 3g). Altogether, these findings suggest that relative comparison of ITIs does correlate with changes in the dopamine signal; a relatively longer ITI tends to have a weaker ramp, which fits in nicely with the expected inverse relationship between ITI and dopamine ramp slope from our ANCCR model.
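      A minimal sketch of this consecutive-trial comparison is given below. The function names and the toy values are hypothetical; only the ±1 s grouping threshold comes from the text, and the statistical tests are omitted.

```python
import numpy as np

def consecutive_changes(iti_before, ramp_slopes):
    """Trial-to-trial changes in ITI and in dopamine ramp slope."""
    d_iti = np.diff(np.asarray(iti_before, dtype=float))
    d_slope = np.diff(np.asarray(ramp_slopes, dtype=float))
    return d_iti, d_slope

def mean_delta_slope_by_group(d_iti, d_slope, threshold_s=1.0):
    """Mean change in slope for relatively shorter vs relatively longer ITIs.

    Returns (mean for delta ITI < -threshold, mean for delta ITI > +threshold).
    A group with no trials yields NaN.
    """
    shorter = d_slope[d_iti < -threshold_s]
    longer = d_slope[d_iti > threshold_s]
    mean_shorter = shorter.mean() if shorter.size else float("nan")
    mean_longer = longer.mean() if longer.size else float("nan")
    return mean_shorter, mean_longer

# Toy data: ITI (s) before each trial and the ramp slope on that trial.
iti_before = [3.0, 6.0, 2.0, 2.5, 7.0]
ramp_slopes = [0.4, 0.1, 0.5, 0.45, 0.0]
d_iti, d_slope = consecutive_changes(iti_before, ramp_slopes)
mean_shorter, mean_longer = mean_delta_slope_by_group(d_iti, d_slope)
```

      In this toy example the slope increases when the ITI becomes shorter and decreases when it becomes longer, which mirrors the direction of the relationship reported above.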

      In both tasks, cue onset responses are larger, and longer on long ITI trials. One concern is that this larger signal makes seeing a ramp during the cue-reward interval harder, especially with a fluorescence method like photometry. Examining the traces in Figure 1i - in the long, dynamic cue condition the dopamine trace has not returned to baseline at the time of the "ramp" window onset, but the short dynamic trace has. So one wonders if it's possible the overall return to baseline trend in the long dynamic conditions might wash out a ramp.

      This is a good point, and we thank the reviewer for raising it. Certainly, the cue onset response is significantly larger in long ITI conditions (see Figure 1i-j and Figure 4h-j). To avoid any bleed-over effect, we intentionally chose ramp window periods during later portions of the trial (in line with work from others, e.g., Kim et al., Cell, 2020). While the cue onset dopamine pulse seems to have flatlined by the start of the ramp window period, the dopamine levels clearly remain elevated relative to pre-cue baseline. This type of signal has been observed with fiber photometry in other Pavlovian conditioning paradigms with long cue durations (e.g., Jeong et al., Science, 2022). Because of the persistently elevated dopamine levels, it is certainly possible that a ramping signal during the cue is getting washed out; with the bulk fluorescence photometry technique we employed in this study, this possibility is unfortunately difficult to completely rule out. However, the long ITI/fixed tone (LF) condition could serve as a potential control given the overall similarity in the dopamine signal between the LF and LD conditions: both conditions have large cue onset responses with elevated dopamine throughout the duration of the cue (see Extended Data Figures 2c and 3c). Critically, the LD condition lacks a noticeable ramp despite the dynamic tone providing information on temporal proximity to reward, which is thought to be necessary for dopamine ramps to occur. Importantly, regardless of whether a ramp is masked in the long ITI dynamic condition, most studies investigate such a condition in isolation and would report the absence of dopamine ramps. Thus, at a descriptive level, we believe it remains true that observable dopamine ramps are only present when the ITI is short. 

      Not a weakness of this study, but the current results certainly make one ponder the potential function of cue-reward interval ramps in dopamine (assuming there is a determinable function). In the current data, licking behavior was similar on different trial types, and that is described as specifically not explaining ramp activity.

      We agree that this work naturally raises the question of the function of dopamine ramps. However, selective and precise manipulation of only the dopamine ramps without altering other features such as phasic responses, or inducing dopamine dips, is highly technically challenging at this moment; due to this challenge, we intentionally focused on the conditions that determine the presence or absence of dopamine ramps rather than their function. We agree with the reviewer that studying the specific function of dopamine ramps is an interesting future question. 

      Reviewing Editor:

      The reviewers felt the results are of considerable and broad interest to the neuroscience community, but that the framing in terms of ANCCR undermined the scope of the findings as did the brief nature of the formatting of the manuscript. In addition, the reviewers felt that the relationship between ramp dynamics, behavior, and ITI conditions requires more in-depth analyses. Relatedly, the lack of counterbalancing of the ITI durations was considered to be a drawback and needs to be addressed as it may affect the baseline. Addressing these issues in a satisfactory manner would improve the assessment of the manuscript to important/convincing.

      We truly appreciate the valuable feedback provided on this manuscript by all three reviewers and the reviewing editor. Based on this input, we have significantly revised the manuscript to address the issues brought up by the reviewers. Firstly, we have conducted additional experiments to counterbalance the ITI conditions for Pavlovian conditioning; this strengthened our results by confirming our original findings that ITI duration, rather than training order, is the key variable controlling the presence or absence of dopamine ramps. Secondly, we completed more rigorous analyses to further explore the relationship between dopamine dynamics, animal behavior, and ITI duration; we generally found no significant correlations between these variables, with a notable exception being our main finding between ITI duration and dopamine ramp slope. Finally, we revised and expanded our writing to both explain predictions from our ANCCR model in less technical language and explore how alternative theoretical frameworks could potentially explain our findings. In doing so, we hope that our manuscript is now more accessible and of interest to a broad audience of neuroscience readers.

      Reviewer #1 (Recommendations For The Authors):

      The study could be improved if the authors performed a more detailed comparison of how other theoretical frameworks, beyond ANCCR could account for the observed findings. Also, the correlation analysis presented in the panel I of Figure 3 seems unnecessary and potentially spurious, as the slope of the correlation is clearly mostly driven by the categorical differences between the two ITI conditions, which were combined for the analysis - it's not clear what is the value of this analysis beyond the group comparison presented in the following panel.

      Again, we thank the reviewer for elaborating on their concern regarding Figure 3l – we have removed it from the revised Figure 4. 

      The relationship between ramp dynamics with the behavior and the large differences in cue onset responses between short and long ITI conditions could have been better explored. If I understand correctly the overarching proposal of this and other publications by this group, then the differences in cue responses is determined by the spacing of rewards in a somewhat similar way that the ramps are. So, is there a trial-by-trial correlation between the amplitude of the cue responses and the slope of the ramps? Is there a correlation between any of these two measures with the licking behavior, and if so, does it change with the ITI condition? A more thorough exploration of these relationships would help support the proposal of the primacy of inter-event spacing in determining the different types of dopamine responses in learning.

      There are certainly interesting relationships between dopamine dynamics, behavior, and ITI that we failed to explore in our original manuscript – we appreciate the reviewer bringing them up. We found no correlation between dopamine ramp slope and cue onset response in either the SD or LD condition (Extended Data Fig 8a-b). Moreover, we found no correlation between either of these variables and the trial-by-trial licking behavior (Extended Data Fig 8c-f). Finally, there is no relationship between licking behavior and previous ITI duration (Extended Data Fig 8g-h), suggesting that behavioral differences do not account for differences in the dopamine ramp slope. Together, the lack of significant relationships between these other variables highlights the specific, clear relationship between ITI duration and dopamine ramp slope. 

      Finally, another issue I feel could have been better discussed is how the particular settings of both tasks might be biasing the results. For example, there is an issue to be considered about how the dopamine ramp dynamics reported here, especially the requirement of a dynamic cue for ramps to be present, square with the previous published results by one of the authors - Mohebi et al, Nature, 2019. In that manuscript, rats were executing a bandit task where, to this reviewer's understanding, there was no explicit dynamic cue aside from the standard sensory feedback of the rats moving around in the behavior boxes to approach a nose poke port. Is the idea that this sensory feedback could function as a dynamic cue? If that's the case, then this short-scale, movement-related feedback should also function as a dynamic cue in a freely moving Pavlovian condition, when the animals must also move towards a reward delivery port, right? Therefore, could it be that the experimental "requirement" of a dynamic cue is only present in a head-fixed condition? One could phrase this in a different way to Steelman and potentially further the authors' proposal: perhaps in any slightly more naturalistic setting, the interaction of the animals with their environment always functions as a dynamic cue indicating proximity to reward, and this relationship was experimentally isolated by the use of head fixation (but not explicitly compared with a freely moving condition) in the present study. I think that would be an interesting alternative to consider and discuss, and perhaps explore experimentally at some point.

      We thank the reviewer for raising this important point regarding the influence of our experimental settings on our results. At first glance, it could appear that our results demonstrating the necessity of a dynamic cue for ramps in a head-fixed setting do not fit neatly with other results in a freely moving setup (e.g., Collins et al., Scientific Reports, 2016; Mohebi et al., Nature, 2019). Exactly as the reviewer states though, we believe that sensory feedback from the environment in freely moving preparations serves the same function as a dynamic progression of cues. We have considered the implications of methodological differences between head-fixed and freely moving preparations in the discussion section. 

      Reviewer #2 (Recommendations For The Authors):

      This comment relates indirectly to comment 3, in that the authors intermix theory throughout the manuscript. I think this would be fine if the experiment was framed directly in terms of ANCCR, but the authors specifically mention that this experiment wasn't developed to distinguish between different theories. As such, it seems difficult to assess the scope of the comments regarding theory within the paper because they tend to be specifically related to ANCCR. For instance, the last comment has broad implications of how the ramp might be related to the overall reward rate, an interesting finding that constrains classes of dopamine models rather than evidence just for ANCCR. Perhaps adding a discussion section that allows the authors to focus more on theory would be beneficial for this manuscript.

      We appreciate this suggestion by the reviewer. We have updated both our introduction and discussion sections to elaborate more thoroughly on theory.

      Reviewer #3 (Recommendations For The Authors):

      The paper could potentially benefit from the use of more accessible language to describe the conceptual basis of the work, and the predictions, and a bit of reformatting away from the brief structure with lots of supplemental discussion.

      For example, in the introduction, the line - "Varying the ITI was critical because our theory predicts that the ITI is a variable controlling the eligibility trace time constant, such that a short ITI would produce a small time constant relative to the cue-reward interval (Supplementary Note 1)". As far as I can tell, this is meant to get across the notion that dopamine represents some aspect of the time between rewards - dopamine signals will differ for cues following short vs long intervals between rewards.

      As written, the language of the paper takes a fair bit of parsing, but the notions are actually pretty simple. This is partly due to the brief format the paper is written in, where familiarity with the previous papers describing ANCCR is assumed.

      From a readability standpoint, and the potential impact of the paper on a broad audience, perhaps this could be considered as a point for revision.

      We thank the reviewer for pointing out the drawbacks of our technical language and brief formatting. To address this, we have removed the majority of the supplementary notes and expanded our introduction and discussion sections. In doing so, we hope that the conceptual foundations of this work, and potential alternative theoretical explanations, are accessible and impactful for a broad audience of readers.

    1. eLife Assessment

      This valuable study by Wu and Zhou combines neurophysiological recordings and computational modelling to address an interesting question regarding the sequence of events from sensing to action. Neurophysiological evidence remains incomplete: explicit mapping of saccade-related activity in the same neurons and a better understanding of the influence of the spatial configuration of stimulus and targets would be required to pinpoint whether such activity might contribute, even partially, to the observed results and interpretations. These results are of interest for neuroscientists investigating decision-making.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors recorded activity in the posterior parietal cortex (PPC) of monkeys performing a perceptual decision-making task. The monkeys were first shown two choice dots of two different colors. Then, they saw a random dot motion stimulus. They had to learn to categorize the direction of motion as referring to either the right or left dot. However, the rule was based on the color of the dot and not its location. So, the red dot could either be to the right or left, but the rule itself remained the same. It is known from past work that PPC neurons would code the learned categorization. Here, the authors showed that the categorization signal depended on whether the executed saccade was in the same hemifield as the recorded PPC neuron or in the opposite one. That is, if a neuron categorized the two motion directions such that it responded stronger for one than the other, then this differential motion direction coding effect was amplified if the subsequent choice saccade was in the same hemifield. The authors then built a computational RNN to replicate the results and make further tests by simulated "lesions".

      Strengths:

      Linking the results to RNN simulations and simulated lesions.

      Weaknesses:

      Potential interpretational issues due to a lack of explicit evidence on the sizes and locations of the response fields of the neurons. For example, is the contra/ipsi effect explained by the fact that in the contra condition, the response target and the saccade might have infringed on the outer edges of the response fields?

    1. eLife Assessment

      The manuscript by Russell et al. investigates an important problem: the current lack of methods for early and accurate diagnosis of N. fowleri infection, which is >95% fatal. The authors provide solid evidence that a small RNA secreted by N. fowleri is detectable in biological fluids like blood and urine in a mouse model, and is present in cerebrospinal fluid and blood for a limited number of patient samples. This could potentially help with earlier diagnosis, which could save lives.

    2. Reviewer #1 (Public review):

      Summary:

      Early and accurate diagnosis is critical to treating N. fowleri infections, which often lead to death within 2 weeks of exposure. Current methods are based on sampling cerebrospinal fluid, and are invasive, slow, and sometimes unreliable. Therefore, there is a need for a new diagnostic method. Russell et al. address this need by identifying small RNAs secreted by Naegleria fowleri (Fig. 1) that are detectable by RT-qPCR in multiple biological fluids including blood and urine. SmallRNA-1 and smallRNA-2 were detectable in plasma samples of mice experimentally infected with 6 different N. fowleri strains, and were not detected in uninfected mouse or human samples (Fig. 4). Further, smallRNA-1 is detectable in the urine of experimentally infected mice as early as 24 hours post infection (Fig. 5). The study culminates with testing human samples (obtained from the CDC) from patients with confirmed N. fowleri infections; smallRNA-1 was detectable in cerebrospinal fluid in 6 out of 6 samples (Fig. 6B), and in whole blood from 2 out of 2 samples (Fig. 6C). These results suggest that smallRNA-1 could be a valuable diagnostic marker for N. fowleri infection, detectable in cerebrospinal fluid, blood, or potentially urine.

      Strengths:

      This study investigates an important problem, and comes to a potential solution with a new diagnostic test for N. fowleri infection that is fast, less invasive than current methods, and seems robust to multiple N. fowleri strains. The work in mice is convincing that smallRNA-1 is detectable in blood and urine early in infection. Analysis of patient blood samples shows that whole blood could be tested for smallRNA-1 to diagnose N. fowleri infections. The potential for human blood or urine to be tested for N. fowleri could lead to critical early interventions.

      Weaknesses:

      There are not many N. fowleri cases, so the authors were limited in the human samples available for testing. It is difficult to know how robust this biomarker is in whole blood, serum, or human urine due to little to no sample material being available for testing. This limitation is examined thoroughly in the discussion section, and additional tests are beyond the scope of this work.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Early and accurate diagnosis is critical to treating N. fowleri infections, which often lead to death within 2 weeks of exposure. Current methods, which rely on sampling cerebrospinal fluid, are invasive, slow, and sometimes unreliable. Therefore, there is a need for a new diagnostic method. Russell et al. address this need by identifying small RNAs secreted by Naegleria fowleri (Figure 1) that are detectable by RT-qPCR in multiple biological fluids including blood and urine. SmallRNA-1 and smallRNA-2 were detectable in plasma samples of mice experimentally infected with 6 different N. fowleri strains, and were not detected in uninfected mouse or human samples (Figure 4). Further, smallRNA-1 is detectable in the urine of experimentally infected mice as early as 24 hours post-infection (Figure 5). The study culminates with testing human samples (obtained from the CDC) from patients with confirmed N. fowleri infections; smallRNA-1 was detectable in cerebrospinal fluid in 6 out of 6 samples (Figure 6B), and in whole blood from 2 out of 2 samples (Figure 6C). These results suggest that smallRNA-1 could be a valuable diagnostic marker for N. fowleri infection, detectable in cerebrospinal fluid, blood, or potentially urine.

      Strengths: 

      This study investigates an important problem, and comes to a potential solution with a new diagnostic test for N. fowleri infection that is fast, less invasive than current methods, and seems robust to multiple N. fowleri strains. The work in mice is convincing that smallRNA-1 is detectable in blood and urine early in infection. Analysis of patient blood samples suggests that whole blood (but not plasma) could be tested for smallRNA-1 to diagnose N. fowleri infections.

      Thank you for the comments regarding the strengths of this study. We agree that our data for detecting the biomarker in biofluids from mice are convincing. In addition, our spike-in studies with human cerebrospinal fluid, plasma, and urine (Figure 6) suggest these biofluids from humans could be used for diagnosis.

      We appreciate the comment regarding plasma and recognize this was not fully explained in the manuscript. We do believe that plasma can be used to assess the biomarker. Firstly, we demonstrated equivalent sensitivity of the method to detect smallRNA-1 in plasma and urine in mice with end-stage PAM (Figure 5). In addition, spike-in samples of human plasma, cerebrospinal fluid, and urine demonstrated equivalent sensitivity of detecting the biomarker (Figure 6).

      The negative result for human plasma in Figure 6C requires clarification; this sample was convalescent plasma from a survivor. The patient presented to the hospital on August 7, 2016, was treated, made a remarkable recovery, and was released from the hospital later that month. The plasma sample in Figure 6C was collected on September 7, 2016, which is a month after treatment was initiated and weeks after the patient was symptom-free. Our interpretation of the convalescent plasma result is that the patient had cleared the active amoeba infection, which is why we did not detect the biomarker. We have added text in the discussion and in the legend for Figure 6 to clarify the convalescent plasma result.

      One additional caveat for consideration is that many of the samples we received from amoebae-infected humans were stored at room temperature for undefined periods of time before being moved to <-20°C (see details in Table S9). We can’t rule out possible sample degradation, but this is an unfortunate reality of obtaining human samples from individuals later confirmed to be infected with pathogenic free-living amoebae.

      Weaknesses: 

      (1) There are not many N. fowleri cases, so the authors were limited in the human samples available for testing. It is difficult to know how robust this biomarker is in whole blood (only 2 samples were tested, both had detectable smallRNA-1), serum (1 out of 1 sample tested negative), or human urine (presumably there is no material available for testing). This limitation is openly discussed in the last paragraph of the discussion section. 

      We agree the extremely limited availability of human samples is a limitation of this study. Given the rarity of these infections in the United States, even prospective studies to systematically collect samples would be very challenging. We hope that, by publishing the details of this biomarker detection method, we enable its use by diagnostic reference centers, especially in areas where outbreaks of multiple cases per year have been reported.

      (2) There seems to be some noise in the data for uninfected samples (Figures 4B-C, 5B, and 6C), especially for those with serum (2E). While this is often orders of magnitude lower than the positive results, it does raise questions about false positives, especially early in infection when diagnosis would be the most useful. A few additional uninfected human samples may be helpful. 

      We agree; however, we would like to point out that the progression of disease in humans and mice is similar. Typically, patients survive 10-14 days after presumed exposure, and mice have similar survival times following instillation of N. fowleri amoebae into a nare of the mouse. Therefore, detection of this biomarker as early as 72 h in mice is seemingly equivalent to the onset of initial symptoms in humans.

      Reviewer #2 (Public review): 

      Summary: 

      The authors sought to develop a rapid and non-invasive diagnostic method for primary amoebic meningoencephalitis (PAM), a highly fatal disease caused by Naegleria fowleri. Due to the challenges of early diagnosis, they investigated extracellular vesicles (EVs) from N. fowleri, identifying small RNA biomarkers. They developed an RT-qPCR assay to detect these biomarkers in various biofluids. 

      Strengths: 

      (1)  This study has a clear methodological approach, which allows for the reproducibility of the experiments. 

      (2) Early and Non-Invasive Diagnosis - The identification of a small RNA biomarker that can be detected in urine, plasma, and cerebrospinal fluid (CSF) provides a non-invasive diagnostic approach, which is crucial for improving early detection of PAM. 

      (3) High Sensitivity and Rapid Detection - The RT-qPCR assay developed in the study is highly sensitive, detecting the biomarker in 100% of CSF samples from human PAM cases and in mouse urine as early as 24 hours post-infection. Additionally, the test can be completed in ~3 hours, making it feasible for clinical use. 

      (4)  Potential for Disease Monitoring - Since the biomarker is detectable throughout the course of infection, it could be used not only for early diagnosis but also for tracking disease progression and monitoring treatment efficacy. 

      (5)  Strong Experimental Validation - The study demonstrates biomarker detection across multiple sample types (CSF, urine, whole blood, plasma) in both animal models and human cases, providing robust evidence for its clinical relevance. 

      (6) Addresses a Critical Unmet Need - With a >97% case fatality rate, PAM urgently requires improved diagnostics. This study provides one of the first viable liquid biopsy-based diagnostic approaches, potentially transforming how PAM is detected and managed. 

      Thank you for summarizing the strengths of the study.

      Weaknesses: 

      (1) Limited Human Sample Size - While the biomarker was detected in 100% of CSF samples from human PAM cases, the number of human samples analyzed (n=6 for CSF) is relatively small. A larger cohort is needed to validate its diagnostic reliability across diverse populations. 

      As noted in response to Reviewer #1 above, we agree this is a limitation of the study; however, we were fortunate to obtain even 15 µL samples of cerebrospinal fluid, plasma, serum, or whole blood from as many patients as we did. There is an urgent need for more systematic collection and storage of samples for rare diseases like primary amoebic meningoencephalitis so that advancements in diagnostics and biomarker discovery can be made. It is our sincere hope that, by publishing our detailed methods and experimental results in this manuscript, we enable additional hospitals and research centers to replicate our studies and help advance this or other techniques for early diagnosis of PAM.

      (2) Lack of Pre-Symptomatic or Early-Stage Human Data - Although the biomarker was detected in mouse urine as early as 24 hours post-infection, there is no data on whether it can be reliably detected before symptoms appear in humans, which is crucial for early diagnosis and treatment initiation. 

      It is difficult to envision a method to obtain these biofluids from infected humans prior to onset of symptoms. More likely the best we can hope for is that physicians include primary amoebic meningoencephalitis in their assessment of patients that present with prodromal symptoms of meningitis.

      (3)  Plasma Detection Challenges - While the biomarker was detected in whole blood, it was not detected in human plasma, which could limit the ease of clinical implementation since plasma-based diagnostics are more common. Further investigation is needed to understand why it is absent in plasma and whether alternative blood-based approaches (e.g., whole blood assays) could be optimized. 

      See response to Reviewer #1 above.

      Reviewer #1 (Recommendations for the authors): 

      (1) What is the evidence that these small RNAs are secreted specifically in EVs? I believe that they are, and ultimately it doesn't impact the conclusions, but I think the evidence here could be either stronger or presented in a more obvious way. 

      Our data demonstrate that smallRNA-1 is present in N. fowleri-derived EVs (Figure 2 and Supplemental Figure 7) and in the intact amoebae (Figure 3B). Initial sequencing data to identify these smallRNA biomarkers came from PEG-precipitated EVs (Figure S1), by using methods we previously published (22). The PEG-precipitated EVs were extracted specifically for spike-in studies. Finally, the smallRNAs in EVs were confirmed after extraction of EVs from 7 N. fowleri strains (Figure 2). We do not have evidence that they are secreted outside of EVs.

      (2) The figure legends would be more useful with some additional information. For example: why are there two points for Nf69 in Fig 2B? In Figure 3A-B, please add more detail as to what the graphs are showing (are they histograms binned by a number of amoebae? This does not seem obvious to me). 

      We agree the Figure legends should be edited for clarity and to add additional information. Both Figure legends have been updated.

      In Figure 2B, each point represents the mean of three technical replicates of EV preps for each N. fowleri strain.

      In Figure 3 the points indicate the Copy#/µL of a well from a 96-well plate. The histograms show the mean of these observations for each condition. 

      (3)  In Figure 2E, the FBS seems like it has near detectable levels of smallRNA-1 compared to Ac and Bm (albeit N. fowleri has 4 orders of magnitude higher levels than the FBS). Because cows are likely exposed to N. fowleri and have documented infections (e.g. doi: 10.1016/j.rvsc.2012.01.002), is it possible this signal is real? 

      Thank you for making this interesting observation. We agree that cows are likely to have significant exposure to N. fowleri, yet documented infections are rare. In this case we do not believe the near-detectable levels of smallRNA-1 in FBS were due to an infected donor animal. This noise was likely due to extracting RNA from concentrated FBS rather than FBS diluted in cell culture media. In addition, as shown in Supplemental Figure 4, the qPCR product from EVs extracted from FBS was not the same as that from the N. fowleri-derived EVs. Please note we used a PEG extraction reagent that separates lipid particles, so this is additional evidence the smallRNAs are present in EVs.

      (4)  In Figure 6A, why was the sample size greater for water and unspiked urine? Similarly, why is the number of infected mice so variable in Figure 4B? 

      In Figure 6A we assayed de-identified biofluids provided by Advent Hospital in Orlando, Florida. The plasma and serum samples were pooled from multiple individuals, whereas individual urine samples (n=8) were provided for this experiment. We have updated the legend for Figure 6A to include these details.

      For Figure 4B we used plasma collected at the end-stage of disease following infections with five different strains of N. fowleri. The sample sizes varied for two reasons. First, Nf69 was the strain used most by our lab and we had plasma from several in vivo experiments. The lower sample sizes for the other strains came from an experiment with 8 mice per group. Some of these strains were less virulent and did not succumb to disease with the number of amoebae inoculated in this experiment. Thus, plasma was only collected from animals that were euthanized due to severe N. fowleri infections. In follow-up studies (e.g., Figure 5B), plasma was collected every 24 hr for analysis.

      Very minor points: 

      (1)  The number of acronyms (FLA, PAM, EVs, CNS, CSF, LOD) could be reduced to make this paper more reader-friendly. 

      Acronyms that were used infrequently in the manuscript (FLA, CNS, LOD, mNGS, UC) have been edited to spell out the complete names. We kept the acronyms EVs and CSF because they are each used more than twenty times in the manuscript.

      (2)  The decimal point in the Cq values is formatted strangely. 

      The decimal points have been edited to normal format in both the manuscript and supplementary material.

      (3)  Figure 3C is not intuitive. I do not understand the logic for the placement of the different samples (was row A only amoebae, B only Veros, C blank, D a mix, and F more Veros?). 

      Thank you for this comment; we agree the microtiter plate schematic (Fig 3C) was misleading. We have revised Figure 3C to make the point that we tested amoebae alone, Vero cells alone, and we combined supernatants from Vero cells (alone) plus amoebae (alone) to confirm that 1) smallRNA-1 was only detected in amoeba-conditioned media, and 2) that Vero-conditioned media does not affect detection of smallRNA-1.

      Reviewer #2 (Recommendations for the authors): 

      Minor corrections: 

      The abbreviation 'Nf' for Naegleria fowleri is not appropriate in a scientific publication. According to taxonomic conventions, the correct way to abbreviate a scientific name is as follows: 

      The first mention should be written in full: Naegleria fowleri. 

      In subsequent mentions, the genus name should be abbreviated to its initial in uppercase, followed by a period, while the species name remains in lowercase: N. fowleri. 

      The same rule applies to Balamuthia mandrillaris and Acanthamoeba species, which should be abbreviated as B. mandrillaris and Acanthamoeba spp. after their first mention. 

      We agree, and each of the scientific names has been updated to the proper format. Please note Nf69 is the accepted nomenclature for this N. fowleri strain, so no changes were made when referring to this specific strain.

      Temperatures should be expressed in international units (°C). Please update the temperatures reported in Fahrenheit (°F) in the 'Materials and Methods' section, specifically in the 'Animal Studies' subsection. 

      These changes were made in the revised manuscript.

    1. eLife Assessment

      This convincing study, which is based on a survey of researchers, finds that women are less likely than men to submit articles to elite journals. It also finds that there is no relation between gender and reported desk rejection. The study is an important contribution to work on gender bias in the scientific literature.

    2. Reviewer #1 (Public review):

      Summary

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      - Women are less likely to submit their papers to highly influential journals (e.g., Nature, Science and PNAS).
      - Women are more likely to cite the demands of co-authors as a reason why they didn't submit to highly influential journals.
      - Women are also more likely to say that they were advised not to submit to highly influential journals.

      The paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists' careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates - or a lack thereof - should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process.

      Major comments

      What do you mean by bias?

      In the second paragraph of the introduction, it is claimed that "if no biases were present in the case of peer review, then we should expect the rate with which members of less powerful social groups enjoy successful peer review outcomes to be proportionate to their representation in submission rates." There are a couple of issues with this statement.

      First, the authors are implicitly making a normative assumption that manuscript submission and acceptance rates *should* be equalised across groups. This may very well be the case, but there can also be valid reasons - even when women are not intrinsically better at research than men - why a greater fraction of female-authored submissions are accepted relative to male-authored submissions (or vice versa). For example, if men are more likely to submit their less ground-breaking work, then one might reasonably expect that they experience higher rejection rates compared to women, conditional on submission.

      Second, I assume by "bias", the authors are taking a broad definition, i.e., they are not only including factors that specifically relate to gender but also factors that are themselves independent of gender but nevertheless disproportionately are associated with one gender or another (e.g., perhaps women are more likely to write on certain topics and those topics are rated more poorly by (more prevalent) male referees; alternatively, referees may be more likely to accept articles by authors they've met before, most referees are men and men are more likely to have met a given author if he's male instead of female). If that is the case, I would define more clearly what you mean by bias. (And if that isn't the case, then I would encourage the authors to consider a broader definition of "bias"!)

      Identifying policy interventions is not a major contribution of this paper

      I would take out the final sentence in the abstract. In my opinion, your survey evidence isn't really strong enough to support definitive policy interventions to address the issue and, indeed, providing policy advice is not a major - or even minor - contribution of your paper. (Basically, I would hope that someone interested in policy interventions would consult another paper that much more thoughtfully and comprehensively discusses the costs and benefits of various interventions!) While it's fine to briefly discuss them at the end of your paper - as you currently do - I wouldn't highlight that in the abstract as being an important contribution of your paper.

      Minor comments

      - What is the rationale for conditioning on academic rank and does this have explanatory power on its own - i.e., does it at least superficially potentially explain part of the gender gap in intention to submit?

    3. Reviewer #2 (Public review):

      Basson et al. present compelling evidence supporting a gender disparity in article submission to "elite" journals. Most notably, they found that women were more likely to avoid submitting to one of these journals based on advice from a colleague/mentor. Overall, this work is an important addition to the study of gender disparities in the publishing process.

      I thank the authors for addressing my concerns.

    4. Reviewer #4 (Public review):

      Main strengths

      The topic of the MS is very relevant given that across the sciences/academia, genders are unevenly represented, which has a range of potential negative consequences. To change this, we need to have the evidence on what mechanisms cause this pattern. Given that promotion and merit in academia are still largely based on the number of publications and the impact factor, one part of the gap likely originates from differences in publication rates of women compared to men.

      Women are underrepresented compared to men in journals with a high impact factor. While previous work has detected this gap and identified some potential mechanisms, the current MS provides strong evidence that this gap might be due to a lower submission rate of women compared to men, rather than the rejection rates. These results are based on a survey of close to 5000 authors. The survey seems to be conducted well (though I am not an expert in surveys), and data analysis is appropriate to address the main research aims. It was impossible to check the original data because of the privacy concerns.

      Interestingly, the results show no gender bias in rejection rates (desk rejection or overall) in three high-impact journals (Science, Nature, PNAS). However, submission rates are lower for women compared to men, indicating that gender biases might act through this pathway. The survey also showed that women are more likely to rate their work as not groundbreaking and are advised not to submit to prestigious journals, indicating that both intrinsic and extrinsic factors shape women's submission behaviour.

      With these results, the MS has the potential to inform actions to reduce gender bias in publishing, but also to inform assessment reform at a larger scale.

      I do not find any major weaknesses in the revised manuscript.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This paper summarises responses from a survey completed by around 5,000 academics on their manuscript submission behaviours. The authors find several interesting stylised facts, including (but not limited to):

      - Women are less likely to submit their papers to highly influential journals (*e.g.*, Nature, Science and PNAS).

      - Women are more likely to cite the demands of co-authors as a reason why they didn't submit to highly influential journals.

      - Women are also more likely to say that they were advised not to submit to highly influential journals.

      Recommendation

      This paper highlights an important point, namely that the submission behaviours of men and women scientists may not be the same (either due to preferences that vary by gender, selection effects that arise earlier in scientists' careers or social factors that affect men and women differently and also influence submission patterns). As a result, simply observing gender differences in acceptance rates---or a lack thereof---should not be automatically interpreted as evidence for or against discrimination (broadly defined) in the peer review process. I do, however, make a few suggestions below that the authors may (or may not) wish to address.

      We thank the author for this comment and for the following suggestions, which we take into account in our revision of the manuscript.

      Major comments

      What do you mean by bias?

      In the second paragraph of the introduction, it is claimed that "if no biases were present in the case of peer review, then 'we should expect the rate with which members of less powerful social groups enjoy successful peer review outcomes to be proportionate to their representation in submission rates." There are a couple of issues with this statement.

      - First, the authors are implicitly making a normative assumption that manuscript submission and acceptance rates *should* be equalised across groups. This may very well be the case, but there can also be important reasons why not -- e.g., if men are more likely to submit their less ground-breaking work, then one might reasonably expect that they experience higher rejection rates compared to women, conditional on submission.

      We do assume that normative statement: unless we believe that men’s papers are intrinsically better than women’s papers, the acceptance rate should be the same. But the referee is right: we have no way of controlling for the intrinsic quality of the work of men and women. That said, our manuscript does not show that there is a different acceptance rate for men and women; it shows that women are less likely to submit papers to a subset of journals that are of a lower Journal Impact Factor, controlling for their most cited paper, in an attempt to control for intrinsic quality of the manuscripts.

      - Second, I assume by "bias", the authors are taking a broad definition, i.e., they are not only including factors that specifically relate to gender but also factors that are themselves independent of gender but nevertheless disproportionately are associated with one gender or another (e.g., perhaps women are more likely to write on certain topics and those topics are rated more poorly by (more prevalent) male referees; alternatively, referees may be more likely to accept articles by authors they've met before, most referees are men and men are more likely to have met a given author if he's male instead of female). If that is the case, I would define more clearly what you mean by bias. (And if that isn't the case, then I would encourage the authors to consider a broader definition of "bias"!)

      Yes, the referee is right that we are taking a broad definition of bias. We provide a definition of bias on page 3, line 92. This definition is focused on differential evaluation which leads to differential outcomes. We also hedge our conversation (e.g., page 3, line 104) to acknowledge that observations of disparities may only be an indicator of potential bias, as many other things could explain the disparity. In short, disparities are a necessary but insufficient indicator of bias. We add a line in the introduction to reinforce this. The only other reference to the term bias comes on page 10, line 276. We add a reference to Lee here to contextualize.

      Identifying policy interventions is not a major contribution of this paper

      In my opinion, the survey evidence reported here isn't really strong enough to support definitive policy interventions to address the issue and, indeed, providing policy advice is not a major -- or even minor -- contribution of your paper, so I would not mention policy interventions in the abstract. (Basically, I would hope that someone interested in policy interventions would consult another paper that much more thoughtfully and comprehensively discusses the costs and benefits of various interventions!)

      We thank the referee for this comment. While we agree that our results do not lead to definitive policy interventions, we believe that our findings point to a phenomenon that should be addressed through policy interventions. Given that some interventions are proposed in our conclusion, we feel like stating this in the abstract is coherent.

      Minor comments

      - What is the rationale for conditioning on academic rank and does this have explanatory power on its own---i.e., does it at least superficially potentially explain part of the gender gap in intention to submit?

      The referee is right: academic rank was added to control for career age of researchers, with the assumption that this variable would influence submission behavior. However, the rank information we collected was for the time that the individual respondent took the survey, which could be different from the rank they held concerning their submission behaviors mentioned in the survey. That is why we didn't consider rank as an independent variable of interest. We also agree with the reviewer that it could be related to their submission behaviors in some cases. Our initial analysis shows that academic rank is not a significant predictor of whether researchers submitted to SNP, but does contribute significantly to the SNP acceptance rates and desk rejection rates of individuals in Medical Sciences.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Basson et al. study the representation of women in "high-impact" journals through the lens of gendered submission behavior. This work is clear and thorough, and it provides new insights into gender disparities in submissions, such as that women were more likely to avoid submitting to one of these journals based on advice from a colleague/mentor. The results have broad implications for all academic communities and may help toward reducing gender disparities in "high-impact" journal submissions. I enjoyed reading this article, and I have several recommendations regarding the methodology/reporting details that could help to enhance this work.

      We thank the referee for their comments.

      Strengths:

      This is an important area of investigation that is often overlooked in the study of gender bias in publishing. Several strengths of the paper include:

      (1) A comprehensive survey of thousands of academics. It is admirable that the authors retroactively reached out to other researchers and collected an extensive amount of data.

      (2) Overall, the modeling procedures appear thorough, and many different questions are modeled.

      (3) There are interesting new results, as well as a thoughtful discussion. This work will likely spark further investigation into gender bias in submission behavior, particularly regarding the possible gendered effect of mentorship on article submission.

      Thank you for those comments.

      Weaknesses:

      (1) The GitHub page should be further clarified. A detailed description of how to run the analysis and the location of the data would be helpful. For example, although the paper says that "Aggregated and de-identified data by gender, discipline, and rank for analyses are available on GitHub," I was unable to find such data.

      We added the link to the GitHub page, as well as more details on how to run the statistical analysis. Unfortunately, our IRB approval does not allow for the sharing of the raw data.

      (2) Why is desk rejection rate defined as "the number of manuscripts that did not go out for peer review divided by the number of manuscripts rejected for each survey respondent"? For example, in your Grossman 2020 reference, it appears that manuscripts are categorized as "reviewed" or "desk-rejected" (Grossman Figure 2). If there are gender differences in the denominator, then this could affect the results.

      We thank the referee for pointing this out. Actually, what the referee is proposing is how we calculated it in the manuscript; the calculation mentioned in the manuscript was a mistake. We corrected the manuscript.
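
      To illustrate why the choice of denominator matters here, the following is a minimal sketch, not the authors' code. It compares the originally stated definition (desk rejections divided by all rejections) with one reading of the referee's suggestion (desk rejections divided by all submissions). The function name and the respondent counts are hypothetical and chosen only for illustration.

```python
def desk_rejection_rates(desk_rejected, reviewed_rejected, accepted):
    """Return the desk rejection rate under two candidate denominators.

    All counts are per respondent. The first value divides by the number of
    rejected manuscripts; the second divides by the number of submitted
    manuscripts (desk-rejected + rejected after review + accepted).
    """
    rejected = desk_rejected + reviewed_rejected
    submitted = rejected + accepted
    if submitted == 0:
        raise ValueError("respondent reported no submissions")
    per_rejection = desk_rejected / rejected if rejected else float("nan")
    per_submission = desk_rejected / submitted
    return per_rejection, per_submission


# Hypothetical respondents (illustrative counts, not survey data).
respondent_a = desk_rejection_rates(desk_rejected=4, reviewed_rejected=1, accepted=5)
respondent_b = desk_rejection_rates(desk_rejected=4, reviewed_rejected=4, accepted=0)

print(respondent_a)  # rate per rejection, rate per submission
print(respondent_b)
```

      In this toy case the two respondents have the same number of desk rejections, yet their ranking flips depending on the denominator, which is the kind of effect the referee was concerned about if the denominator differs by gender.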

      (3) Have you considered correcting for multiple comparisons? Alternatively, you could consider reporting P-values and effect sizes in the main text. Otherwise, sometimes the conclusions can be misleading. For example, in Figure 3 (and Table S28), the effect is described as significant in Social Sciences (p=0.04) but not in Medical Sciences (p=0.07).

      We highly appreciate the suggestion. We’ve added Odds Ratio values and p-values to the main manuscript.
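
      As a rough illustration of the referee's point about multiple comparisons, the sketch below applies a Holm-Bonferroni adjustment to the two p-values quoted in the comment. This is an assumption-laden example rather than the authors' analysis: it treats those two tests as the whole family of comparisons, whereas the manuscript reports many more, and the function name is hypothetical.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni step-down adjustment for a list of p-values."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply by the number of hypotheses not yet processed, cap at 1.
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity of the adjusted values.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# p-values quoted by the referee; the family of size two is an assumption.
raw = {"Social Sciences": 0.04, "Medical Sciences": 0.07}
adjusted = dict(zip(raw, holm_adjust(list(raw.values()))))

for discipline, p in adjusted.items():
    print(discipline, round(p, 3), "significant" if p < 0.05 else "not significant")
```

      Under this assumed two-test family, neither effect stays below 0.05 after adjustment, which shows how a significant versus non-significant contrast between 0.04 and 0.07 can be fragile.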

      (4) More detail about the models could be included. It may be helpful to include this in each table caption so that it is clear what all the terms of the model were. For instance, I was wondering if journal or discipline are included in the models.

      We appreciate the suggestion. We’ve added model details to the figure and table captions in the manuscript and the supplemental materials.

      Reviewer #3 (Public Review):

      Summary:

      This is a strong manuscript by Basson and colleagues which contributes to our understanding of gender disparities in scientific publishing. The authors examine attitudes and behaviors related to manuscript submission in influential journals (specifically, Science, Nature and PNAS). The authors rightly note that much attention has been paid to gender disparities in work that is already published, but this fails to capture the unseen hurdles that occur prior to publication (which include decisions about where to publish, desk rejections, revisions and resubmissions, etc.). They conducted a survey study to address some of these components and their results are interesting:

      They find that women are less likely to submit their manuscript to Science, Nature or PNAS. While both men and women feel their work would be better suited for more specialized journals, women were more likely to think their work was 'less novel or groundbreaking.'

      A smaller proportion of respondents indicated that they were actively discouraged from submitting their manuscripts to these journals. In this instance, women were more likely to receive this advice than men.

      Lastly, the authors also looked at self-reported acceptance and rejection rates and found that there were no gender differences in acceptance or rejection rates.

      These data are helpful in developing strategies to mitigate gender disparities in influential journals.

      We thank the referee for their comments.

      Comments:

      The methods the authors used are appropriate for this study. The low response rate is common for this type of recruitment strategy. The authors provide a thoughtful interpretation of their data in the Discussion.

      We thank the referee for their comments.

      Reviewer #4 (Public Review):

      This manuscript covers an important topic of gender biases in the authorship of scientific publications. Specifically, it investigates potential mechanisms behind these biases, using a solid approach, based on a survey of researchers.

      Main strengths

      The topic of the MS is very relevant given that across sciences/academia representation of genders is uneven, and identified as concerning. To change this, we need to have evidence on what mechanisms cause this pattern. Given that promotion and merit in academia are still largely based on the number of publications and impact factor, one part of the gap likely originates from differences in publication rates of women compared to men.

      Women are underrepresented compared to men in journals with high impact factor. While previous work has detected this gap, as well as some potential mechanisms, the current MS provides strong evidence, based on a survey of close to 5000 authors, that this gap might be due to lower submission rates of women compared to men, rather than the rejection rates. The data analysis is appropriate to address the main research aims. The results interestingly show that there is no gender bias in rejection rates (desk rejection or overall) in three high-impact journals (Science, Nature, PNAS). However, submission rates are lower for women compared to men, indicating that gender biases might act through this pathway. The survey also showed that women are more likely to rate their work as not groundbreaking, and be advised not to submit to prestigious journals.

      With these results, the MS has the potential to inform actions to reduce gender bias in publishing, and actions to include other forms of measuring scientific impact and merit.

      We thank the referee for their comments.

      Main weakness and suggestions for improvement

      (1) The main message/further actions: I feel that the MS fails to sufficiently emphasise the need for a different evaluation system for researchers (and their research). While we might act to support women to submit more to high-impact journals, we could also (and several initiatives do this) consider a broader spectrum of merits (e.g. see https://coara.eu/ ). Thus, I suggest more space to discuss this route in the Discussion. Also, I would suggest changing the terms that imply that prestigious journals have a better quality of research or the highest scientific impact (line 40: journals of the highest scientific impact) with terms that actually state what we definitely know (i.e. that they have the highest impact factor). I think this could broaden the impact of the MS.

      We agree with the referee. We changed the wording on impact and added a few lines on this in the discussion.

      (2) Methods: while methods are all sound, in places it is difficult to understand what has been done or measured. For example, only quite late (as far as I can find, it's in the supplement) we learn the type of authorship considered in the MS is the corresponding authorship. This information should be clear from the very start (including the Abstract).

      We performed the suggested edits.

      Second, I am unclear about the question on the perceived quality of research work. Was this quality defined for researchers, as quality can mean different things (e.g. how robust their set-up was, how important their research question was)? If researchers have different definitions of what quality means, this can cause additional heterogeneity in responses. Given that the survey cannot be repeated now, maybe this can be discussed as a limitation.

      We agree that this can mean something different for researchers; it probably varies by discipline, but also by gender. But that was precisely the point: whether men/women considered their “best work” to be published in a higher-impact venue. While there may be heterogeneity in those perceptions, the fact that 1) men and women rate their research at the same level and 2) we control for disciplinary differences should mitigate some of that.

      I was surprised to see that discipline was considered as a moderator for some of the analyses but not for the main analysis on the acceptance and rejection rates.

      We appreciate the attention to detail. In our analysis of acceptance and rejection rates, we conducted separate regression analyses for each discipline to capture any field-specific patterns that might otherwise be obscured.

      We added more details on this to clarify.

      I was also surprised not to see publication charges as one of the reasons asked for not submitting to selected journals. Low and middle-income countries often have more women in science but are also less likely to support high publication charges.

      That is a good point. However, both Science and Nature have subscription options, which do not require any APCs.

      Finally, academic rank was asked of respondents but was not taken as a moderator.

      Academic rank is included in the regression as a control variable (Figure 1).

      Reviewer #2 (Recommendations For The Authors):

      In addition to the points in the "Weaknesses" section of my Public Review above, I have several suggestions to improve this work.

      (1) Can you please indicate what the error bars mean in each plot? I am assuming that they are 95% confidence intervals.

      We appreciate the attention to detail. Yes, they are 95% confidence intervals. We’ve clarified this in the captions of the corresponding figures. 

      (2) Can you provide a more detailed explanation for why the 7 journals were separated? I see that on page 3 of the supporting information you write that "Due to limited responses, analysis per journal was not always viable. The results pertaining to the journals were aggregated, with new categories based on the shared similarities in disciplinary foci of the journals and their prestige." Specifically, why did you divide the data into (somewhat arbitrary) categories as opposed to using all the data and including a journal term in your model?

      The survey covered 7 journals:

      • Science, Nature, and PNAS (S.N.P.)

      • Nature Communications and Science Advances (NC.SA.)

      • NEJM and Cell (NEJM.C.)

      We believe that the first three are a class of their own: they cover all fields (while NEJM and Cell are limited to (bio)medical sciences), and have a much higher symbolic capital than both Nature Comms and Science Advances (which are receiving cascading papers from Nature and Science, respectively). We believe that factors leading to submission to S.N.P. are much different than those leading to submission to the other groups of journals, which is why we separated the analysis in that manner.

      (3) You included random effects for linear regression but not for logistic regression. Please justify this choice or include additional logistic regression models with random effects.

      We used mixed-effect models for linear regressions (where number of submissions, acceptance rate, or rejection rate is the dependent variable). As mentioned in the previous comment, we tested using rank as the control variable and found it had a potential impact on the variables we analyzed using linear regressions in some disciplines. Therefore, we introduced it as a random effect for all the linear regression models.

      Reviewer #3 (Recommendations For The Authors):

      The limitations of this work are currently described in the Supplement. It may be helpful to bring several of these items into the Discussion so that they can be addressed more prominently.

      Added content

      Reviewer #4 (Recommendations For The Authors):

      (1) Line 40: add 'as leading authors of papers published in' before 'journals'

      Done

      (2) Explain what the direction of the 'relationship between' in line 62 is

      Added

      (3) Lines 101-102 - this is a bit unclear. Please provide some more info, including what these studies found.

      Added

      (4) Is 'sociodemographic' the best term in line 120?

      Yes, we believe so.

      (5) Results would benefit from a short intro with the info on the number of respondents, also by gender.

      Those are present at the end of the intro (and in the methods, at the end). We nonetheless added gender.

      (6) Line 134: add how many women and men submitted to Science, Nature, and PNAS

      Added. In all disciplines combined, 552 women and 1,583 men ever submitted to these three elite journals. More details can be found in SI Table 9.

      (7) Add 'Self-' before reported, line 141

      Added

      (8) Add sample sizes to Figs 1 and 2

      Those are in the appendix

      (9) Line 168 - unclear if this is ever or as their first choice

      We do not discriminate – it is whether they considered it at all.

      (10) Add sample size in line 177

      Added. 480 women and 1404 men across all disciplines reported desk rejections by S.N.P. journals.

      (11) I would like to see some discussion on the fact that the highest citation paper will also be a paper that the authors have submitted earlier in their careers given that citations will pile up over time.

      Those are actually quite evenly distributed. We modified the supplementary materials.

      (12) Data availability - be clear that supporting info contains only summary data. Also, while the Data availability statement refers to de-identified data on Github, the Github page only contains the code, and the note that 'The STAT code used for our analyses is shared. We are unable to share the survey response details publicly per IRB protocols.' Why were de-identified data not shared? This is extremely important to allow for the reproducibility of MS results. I would also suggest sharing data in a trusted repository (e.g. Dryad, ZENODO...) rather than on Github, as per current recommendations on the best practices for data sharing.

      Thank you for your careful reading and for highlighting the importance of clear data availability. We will revise our Data Availability Statement to explicitly state that the supporting information contains only summary data and that the complete analysis code is available on GitHub.

      We understand the importance of sharing de-identified data for reproducibility. However, our IRB strictly prohibits the sharing of any individual-level data, including de-identified files, to protect participant confidentiality. Consequently, the summary data included in the supporting information, together with the provided code, is intended to facilitate the verification of our core findings. Our previous statement regarding “de-identified” data sharing was inaccurate and thus has been removed. We apologize for the confusion.

      In light of your suggestion, we are also exploring depositing the summary data and code in a trusted repository (e.g., Dryad or Zenodo) to further align with current best practices for data sharing.

    1. eLife Assessment

      In this useful study, the authors perform voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) were recorded in the contralateral hemisphere. The authors conclude that synchronous ensembles of neurons are associated with theta rhythms but not with contralateral sharp wave-ripples. However, evidence for some of the paper's primary claims remains incomplete, due to limitations of the experimental approach.

    2. Joint Public Review:

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using innovative imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. The authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The single cell voltage imaging used in this study is a highly novel method that may allow recordings that were not previously possible using existing methods.

      Weaknesses:

      The strength of evidence remains incomplete because of the main claim that synchronous events are not associated with ripples. As was mentioned in previous rounds of review, ripples emerge locally and independently in the two hemispheres. Thus, obtaining ripple recordings from the contralateral hemisphere does not provide solid evidence for this claim. The papers the authors are citing to make the claim that "Additionally, we implanted electrodes in the contralateral CA1 region to monitor theta and ripple oscillations, which are known to co-occur across hemispheres (29-31)" do not support this claim. For example, reference 29 contains the following statement: "These findings suggest that ripples emerge locally and independently in the two hemispheres".

    3. Author response:

      The following is the authors’ response to the current reviews.

      We thank the editor and reviewers for their thoughtful evaluations. We would like to clarify that the revised manuscript does not make a general claim about the absence of ripple-associated synchronous population activity. Rather, we report only that the synchronous ensembles observed in our data were not associated with contralateral ripple oscillations. This distinction is clearly reflected in the revised Title, Abstract, Introduction, Results, and Discussion. We also explicitly acknowledged the methodological limitation of recording LFP from the contralateral side of the hippocampus.

      To further improve clarity and prevent potential misinterpretation, we are submitting a revised version (R4) in which we:

      (1) Replace the word "surprisingly" with the more neutral "Moreover";

      (2) Refer to ripple events consistently as "contralateral ripples (c-ripples)";

      (3) Expand the discussion of limitations inherent to contralateral LFP recordings.

      Additionally, while Buzsaki et al. (2003) wrote that "These findings suggest ripples emerge locally and independently in the two hemispheres", the same study also presents data and reports that "Ripple episodes occurred simultaneously in the left and right CA1 regions" (p. 206). Our original citation was intended to reflect this nuance. Nevertheless, to avoid any potential misinterpretation, we have removed the co-occurrence statement with its associated citations in the revised (R4) manuscript.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      For many years, there has been extensive electrophysiological research investigating the relationship between local field potential patterns and individual cell spike patterns in the hippocampus. In this study, using state-of-the-art imaging techniques, they examined spike synchrony of hippocampal cells during locomotion and immobility states. In contrast to conventional understanding of the hippocampus, the authors demonstrated that hippocampal place cells exhibit prominent synchronous spikes locked to theta oscillations.

      Strengths:

      The voltage imaging used in this study is a highly novel method that allows recording not only suprathreshold-level spikes but also subthreshold-level activity. With its high frame rate, it offers time resolution comparable to electrophysiological recordings.

      Comments on revisions: I have no further comments.

      We thank the reviewer for constructive reviews and for recognizing the strength of our study.

      Reviewer #2 (Public review):

      Summary:

      This study employed voltage imaging in the CA1 region of the mouse hippocampus during the exploration of a novel environment. The authors report that synchronous activity, involving almost half of the imaged neurons, occurred during periods of immobility. These events did not correlate with SWRs, but instead occurred during theta oscillations and were phase-locked to the trough of theta. Moreover, pairs of neurons with high synchronization tended to display non-overlapping place fields, leading the authors to suggest these events may play a role in binding a distributed representation of the context.

      Strengths:

      Technically this is an impressive study, using an emerging approach that allows single cell resolution voltage imaging in animals, that while head-fixed, can move through a real environment. The paper is written clearly and suggests novel observations about population level activity in CA1.

      Comments on revisions:

      I have no further major requests and thank the authors for the additional data and analyses.

      We thank the reviewer for recognizing the strength of our study and for appreciating the additional data and analyses we provided during the revision process.

      Reviewer #3 (Public review):

      Summary:

      In the present manuscript, the authors use a few minutes of voltage imaging of CA1 pyramidal cells in head-fixed mice running on a track while local field potentials (LFPs) are recorded. The authors suggest that synchronous ensembles of neurons are differentially associated with different types of LFP patterns, theta and ripples. The experiments are flawed in that the LFP is not "local" but rather collected from the other side of the brain.

      Strengths:

      The authors use a cutting-edge technique.

      Weaknesses:

      Although the authors have toned down their claims, the statement in the title ("Synchronous Ensembles of Hippocampal CA1 Pyramidal Neurons Associated with Theta but not Ripple Oscillations During Novel Exploration") is still unsupported.

      One could write the same title while voltage imaging one mouse and recording LFP from another mouse.

      To properly convey the results, the title should be modified to read

      "Synchronous Ensembles of Hippocampal CA1 Pyramidal Neurons Associated with Contralateral Theta but not with Contralateral Ripple Oscillations During Novel Exploration"

      Without making this change, the title - and therefore the entire work - is misleading at best.

      We thank the reviewer for the thoughtful and constructive suggestion regarding the title. We fully understand the concern that our original title may have overstated the specificity of the contralateral LFP recordings, potentially allowing for misinterpretation.

      In our results, synchronous ensembles are associated with intracellular theta oscillations recorded from the ipsilateral hippocampus and with extracellular theta, but not ripple, oscillations recorded from the contralateral hippocampus. To clarify this distinction and minimize the potential for misinterpretation, we have revised the abstract accordingly.

      Abstract (line 18):

      “… Notably, these synchronous ensembles were not associated with contralateral ripple oscillations but were instead phase-locked to theta waves recorded in the contralateral CA1 region. Moreover, the subthreshold membrane potentials of neurons exhibited coherent intracellular theta oscillations with a depolarizing peak at the moment of synchrony.”

      Based on this, we propose the following revised title, which we believe more effectively communicates the central finding of our study: 

      “Synchronous Ensembles of Hippocampal CA1 Pyramidal Neurons During Novel Exploration”. 

      Compared to the reviewer’s suggested title, this version offers a clearer and more concise summary of our findings while allowing important methodological details to be fully conveyed in the abstract and main text. While the suggested title accurately reflects the source of the LFP signals, it does not mention the intracellular theta oscillations recorded from the ipsilateral hippocampus, which are a critical part of our results. Including both the intracellular and extracellular recording contexts in the title would make it overly long and potentially less accessible to readers. In contrast, the revised title succinctly captures the core phenomenon, and the updated abstract now explicitly clarifies the relationship between the synchronous ensembles and both types of oscillatory signals. 

      We sincerely appreciate the reviewer’s input, which helped us refine both the language and the presentation of our findings. We hope these changes address the concern and clarify the scope of our work. 

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      (1) Change the title. Although the authors have toned down their claims, the statement in the title ("Synchronous Ensembles of Hippocampal CA1 Pyramidal Neurons Associated with Theta but not Ripple Oscillations During Novel Exploration") is still unsupported. One could write the same title while voltage imaging one mouse and recording LFP from another mouse. To properly convey the results, the title should be modified to read

      "Synchronous Ensembles of Hippocampal CA1 Pyramidal Neurons Associated with Contralateral Theta but not with Contralateral Ripple Oscillations During Novel Exploration"

      Without making this change, the title - and therefore the entire work - is misleading at best. But if you can manage that (and attend to comment #2 below), then the manuscript would not be making any false statements.

      Please see our reply in the public review above.

      (2) Report the exact locations of the contralateral recording electrodes. In their rebuttal, the authors supply a figure ("Author response image 1") in which they show damage to the neocortex and fluorescence signal in the CA1 pyramidal cell layer. This is useful, but it is unclear from which animal this histology was generated.

      Please include this (or another similar) photograph in Figure 1B, right next to the voltage imaging photograph. Indicate from which animal each photograph was obtained - ideally, provide the two photographs from the same animal. Second, please include such paired photographs - along with paired signals - for every animal that you are able to.

      If you can manage that, it will add credibility to the statement that the recordings are indeed from the contralateral CA1 pyramidal cell layer (as opposed to from the contralateral hemisphere).

      We thank the reviewer for this important point. We have followed the suggestion and now provide paired photographs showing LFP electrode tracks and voltage images from the same animal (see revised Figure 1B).

      In addition, we have included similar paired photographs for additional animals used in this study (see Figure 1-figure supplement 1).

      These updates directly support the claim that LFP recordings were obtained from the contralateral CA1 pyramidal layer, rather than from the contralateral hemisphere. We sincerely thank the reviewer for the valuable suggestion, which has substantially strengthened our manuscript.

    1. eLife Assessment

      This valuable study reveals surprising morphological diversity of Drosophila sensory neurons. Using serial block-face electron microscopy, the authors created detailed 3D reconstructions of large neuronal populations, convincingly finding significant structural variation both within and across distinct classes. These results form the basis for testable hypotheses on how neuronal arborization is optimized for particular sensory functions. This research will be highly relevant to biologists in the fields of physiology, insect chemosensation, and neuroscience.

    2. Reviewer #1 (Public review):

      The authors of this study use electron microscopy and 3D reconstruction techniques to study the morphology of distinct classes of Drosophila sensory neurons *across many neurons of the same class.* This is a comprehensive study attempting to look at nearly all the sensory neurons across multiple sensilla in the same animal to determine a) how much morphological variability exists between and within neurons of different and similar sensory classes and b) identify dendritic features that may have evolved to support particular sensory functions. This study builds upon the authors' previous work which allowed them to identify and distinguish sensory neuron subtypes in the EM volumes without additional staining so that reconstructed neurons could reliably be placed in the appropriate class. This work is unique in looking at a large number of individual neurons of the same class to determine what is consistent and what is variable about their class-specific morphologies.

      This means that in addition to providing specific structural information about these particular cells, the authors explore broader questions of how much morphological diversity exists between sensory neurons of the same class. This then informs our conceptualization about how different dendritic morphologies might affect specific sensory and physiological properties of neurons.

      The authors found that CO2-sensing neurons have an unusual, sheet-like morphology in contrast to the thin branches of odor-sensing neurons. They show that this morphology greatly increases the surface-area-to-volume ratio above what could be achieved by modest branching of thin dendrites, and posit that this might be important for their sensory function, though this was not directly tested in their study due to technical limitations. The study is mainly descriptive in nature, but thorough, and provides a nice jumping-off point for future functional studies. One interesting future analysis could be to examine all four cell types within a single sensillum together to see if there are any general correlations that could reveal insights about how morphology is determined and the relative contributions of intrinsic mechanisms vs interactions with neighboring cells. For example, if higher-than-average branching in one cell type correlated with higher-than-average branching in another type when within the same sensillum, it might suggest that differential amounts of extracellular growth or branching cues within a given sensillum drive any heterogeneity observed within a class across sensilla. Conversely, if higher branching in one cell type consistently leads to reduced length or branching of the other neurons within its sensillum, this might point to dendrite-dendrite interactions between cells undergoing competitive or repulsive interactions to define territories within each sensillum as a major determinant of the variability.
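
      The surface-area-to-volume argument in the paragraph above can be illustrated with a minimal geometric sketch. It compares a cylindrical dendrite with a flat sheet of equal cross-sectional area (and hence equal volume per unit length). All dimensions are hypothetical values chosen for illustration, not measurements from the study, and the idealized shapes are an assumption rather than the authors' morphometric method.

```python
import math


def cylinder_sa_to_vol(radius_um: float) -> float:
    """Lateral surface area / volume of a cylinder; the length cancels, leaving 2 / r."""
    return 2.0 / radius_um


def sheet_sa_to_vol(thickness_um: float, width_um: float) -> float:
    """Surface area / volume of a flat sheet with end faces ignored: 2 / t + 2 / w."""
    return 2.0 / thickness_um + 2.0 / width_um


# Hypothetical dimensions (micrometres), for illustration only.
radius_um = 0.10
thickness_um = 0.02
# Match cross-sectional areas so both shapes hold the same volume per unit length.
width_um = math.pi * radius_um ** 2 / thickness_um

cyl_ratio = cylinder_sa_to_vol(radius_um)
sheet_ratio = sheet_sa_to_vol(thickness_um, width_um)
print(f"cylinder: {cyl_ratio:.1f} per um, sheet: {sheet_ratio:.1f} per um")
```

      Under these assumed dimensions the sheet has the larger ratio, because flattening a fixed volume trades thickness for exposed membrane.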

      Strengths:

      This work provides a thorough morphometric analysis of the neurons of the *majority of all ab1 sensilla* across a single antenna. The authors use this analysis to 1) characterize the unique dendritic architecture of ab1C neurons relative to other ORNs including ab1D and 2) provide evidence of substantial morphological diversity even within a single subclass of neuron.

      Weaknesses:

      This is primarily a descriptive paper due to technical limitations since it is not currently technically feasible to determine individual ORN response properties and tie them to identified neurons with detailed EM-based ultrastructural analyses, nor to predictably alter dendritic morphology of these cells to directly test how different morphologies affect sensory function. However, the quantitative descriptive findings presented here will shape these future questions and are necessary for any such future work.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript employs serial block‐face electron microscopy (SBEM) and cryofixation to obtain high‐resolution, three‐dimensional reconstructions of Drosophila antennal sensilla containing olfactory receptor neurons (ORNs) that detect CO2. This method has been used previously by the same lab in Gonzales et al., 2021 (https://elifesciences.org/articles/69896), and Zhang et al., 2019, Nature Communications. The previous study by Zhang also correlated morphometric measurements from SBEM with asymmetric ephaptic activity for paired neurons using electrophysiology across multiple olfactory sensilla. This manuscript applies the same SBEM method to now characterize the ab1 sensillum, which houses the CO2-detecting ab1C neuron, but stops short of integrating neuronal activity with structural variability.

      The SBEM-based morphometric studies do however significantly advance preliminary observations from older two-dimensional TEM-based reports. Previous images of the putative CO2 neuron in Drosophila (Shanbhag et al., 1999) and in mosquitoes (McIver and Siemicki, 1975; Lu et al, 2007) reported that the dendritic architecture of the CO2 neuron was somewhat different (circular and flattened, lamellated) from other olfactory neurons in the antenna of insects. In this study, the authors confirm this different morphology but also classify it into distinct subtypes (loosely curled, fully curled, split, and mixed).

      Strengths:

      The study makes a convincing case that ab1C neurons exhibit a unique, dendritic morphology unlike the canonical cylindrical dendrites found in ab1D neurons. This observation extends previous qualitative TEM findings by not only confirming the presence of flattened lamellae in CO₂ neurons but also quantifying key morphometrics such as dendritic length, surface area, and volume, and calculating surface area-to-volume ratios. The enhanced ratios observed in the flattened segments are speculated to be linked to potential advantages in receptor distribution (e.g., Gr21a/Gr63a) and efficient signal propagation.

      Weaknesses:

      Although this quantitative approach is very robust compared to earlier reports, interpretations are somewhat limited by the absence of direct electrophysiological data to confirm whether ultrastructural differences translate into altered neuronal function. The biggest question remains unanswered: does the structural variation observed in the ab1C dendrites by SBEM have electrophysiological functional relevance?

      Surveys of ab1 sensilla with single-sensillum recordings (even a few from multiple Drosophila antennae), as they have done for ab2s and others in the past, would have measured spontaneous activity, spike amplitude, and response to CO2. This could have allowed for comparison of the frequency of functional variation, if any, to structural variation, and such a discussion would have strengthened the overall characterization. In the case of ab2 sensilla the authors find very little variance; could the same be true of ab1? In the absence of these data, it becomes hard to speculate whether the structural variation observed in the ab1C dendrites by SBEM has any functional relevance or simply reflects random variation in dendrite development.

      Additionally, artifacts could be a consideration, even though cryofixation is superior to chemical fixation. Although this is hard to address, all types of fixation in TEM cause some artifacts, as does serial sectioning. An understanding of the error rates for the SBEM method would have increased the confidence in the conclusions drawn. For example, what is the structural variation of SBEMs in the ab2 population, which shows very little electrophysiological variation? Can a comparison be done?

    4. Reviewer #3 (Public review):

      Summary:

      In the current manuscript entitled "Population-level morphological analysis of paired CO2- and odor-sensing olfactory neurons in D. melanogaster via volume electron microscopy", Choy, Charara et al. use volume electron microscopy and neuron reconstruction to compare the dendritic morphology of ab1C and ab1D neurons of the Drosophila basiconic ab1 sensillum. They aim to investigate the degree of dendritic heterogeneity within a functional class of neurons using ab1C and ab1D, which they can identify due to the unique feature of ab1 sensilla to house four neurons and the stereotypic location on the third antennal segment. This is a great use of volumetric electron imaging and neuron reconstruction to sample a population of neurons of the same type. Their data convincingly show that there is dendritic heterogeneity in both investigated populations and their sample size is sufficient to strongly support this observation. These data suggest that the phenomenon of dendritic heterogeneity is common in the Drosophila olfactory system and will stimulate future investigations into the developmental origin, functional implications and potential adaptive advantage of this feature.

      Moreover, the authors discovered that there is a difference between CO2- and odour-sensing neurons, of which the former show a characteristic flattened and sheet-like structure not observed in other sensory neurons sampled in this and previous studies. They hypothesize that this unique dendritic organization, which increases the surface area to volume ratio, might allow more efficient CO2 sensing by housing higher numbers of CO2 receptors. This is supported by previous attempts to express CO2 sensors in olfactory sensory neurons which lack this dendritic morphology, resulting in lower CO2 sensitivity compared to endogenous neurons.

      Overall, this detailed morphological description of olfactory sensory neurons' dendrites convincingly shows heterogeneity in two neuron classes with potential functional impacts for odour sensing.

      Strength:

      The volumetric EM imaging and reconstruction approach offers unprecedented detail in single-cell morphology and compares dendrite heterogeneity across a large fraction of ab1 sensilla.

      The authors identify specific shapes for ab1C sensilla potentially linked to their unique function in CO2 sensing.

      Weaknesses:

      While the morphological description is highly detailed, current methods prevent linking morphology to odour sensitivity or other properties of the neurons. Therefore, this study remains mainly descriptive and will require future work to link neuron structure and function.

    1. eLife Assessment

      This important work develops C. elegans as a model organism for studying effort-based discounting by asking the worms to choose between easy- and hard-to-digest bacteria. The authors provide convincing evidence that the nematodes are effort-discounting. However, evidence regarding the role of dopamine is incomplete, and this weakens the authors' connection of the behavior in C. elegans with mammals.

    2. Reviewer #1 (Public Review):

      Summary:

      Here, Millet et al. consider whether the nematode C. elegans 'discounts' the value of reward due to effort in a manner similar to that shown in other species, including rodents and humans. They designed a T-maze effort choice paradigm inspired by previous literature, but manipulated how effortful the food is to consume. C. elegans worms were sensitive to this novel manipulation, exhibiting effort-discounting-like behaviour that could be shaped by varying the density of food at each alternative in order to calculate an indifference point. This discounting-like behaviour was related to worms' rates of patch leaving, which differed between the low and high effort patches in isolation. The authors also found a potential relationship to dopamine signalling, and also that this discounting behaviour was not specific to lab-based strains of C. elegans.

      Strengths:

      The question is well-motivated, and the approach taken here is novel. The authors are careful in their approach to altering and testing the properties of the effortful, elongated bacteria. Similarly, they go to some effort to understand what exactly is driving behavioural choices in this context, both through the application of simple standard models of effort discounting and a kinetic analysis of patch leaving. The comparisons to various dopamine mutants further extend the translational potential of their findings. I also appreciate the comparison to natural isolate strains, as the question of whether this behaviour may be driven by some sort of strain-specific adaptation to the environment is not regularly addressed in mammalian counterparts. The manuscript is well-written, and the figures are clear and comprehensible.

      Weaknesses:

      Discounting is typically defined as the alteration of a subjective value by effort (or time, risk, etc.), which is then used to guide future decision-making. By adapting the standard T-maze task for C. elegans as a patch-leaving paradigm, the authors observe behaviour strongly consistent with discounting models, but that is likely driven by a different process, in particular by an online estimate of the type of food in the current patch, which then influences patch-leaving dynamics (Figure 3). This is fundamentally different from decision-making strategies relating to effort that have been described in the rodent and human literatures. Similarly, the calculation of indifference points at the group instead of at the individual level also suggests a different underlying process and limits the translational potential of their findings. The authors do not discuss the implications of these differences or why they chose not to attempt a more analogous trial-based experiment.

      In the case of both the dopamine and natural isolate experiments, the data are very noisy despite large (relative to other C. elegans experiments) sample sizes. In the dopamine experiment, disruption of dop-1, dop-2, and cat-2 had no statistically significant effect. There do not appear to be any corrections for multiple comparisons, and the single significant comparison, for dop-3, had a small effect size. More detailed behavioural analyses on both these and the wild isolate strains, for example by applying their kinetic analysis, would likely give greater insight as to what is driving these inconsistent effects.
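
      To make the multiple-comparisons concern concrete, the following is a minimal sketch of a Holm-Bonferroni adjustment. The raw p-values are hypothetical placeholders, not values from the manuscript, and Holm's method is only one of several corrections the authors could reasonably apply.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for four mutant-versus-control comparisons.
raw = [0.40, 0.25, 0.03, 0.60]
adj = holm_adjust(raw)
significant = [p <= 0.05 for p in adj]
print(adj, significant)
```

      In this made-up example, a single comparison that is nominally significant at 0.05 no longer passes the threshold once the four tests are considered together, which is the scenario the reviewers are asking the authors to rule out.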

    3. Reviewer #2 (Public Review):

      Summary:

      Millet et al. show that C. elegans systematically prefers easy-to-eat bacteria but will switch its choice when harder-to-eat bacteria are offered at higher densities, producing indifference points that fit standard economic discounting models. Detailed kinetic analysis reveals that this bias arises from unchanged patch-entry rates but significantly elevated exit rates on effortful food, and dop-3 mutants lose the preference altogether, implicating dopamine in effort sensitivity. These findings extend effort-discounting behavior to a simple nematode, pushing the phylogenetic boundary of economic cost-benefit decision-making.

      Strengths:

      (1) Extends the well-characterized concept of effort discounting into _C. elegans_, setting a new phylogenetic boundary and opening invertebrate genetics to economic-behavior studies.

      (2) Elegant use of cephalexin-elongated bacteria to manipulate "effort" without altering nutritional or olfactory cues, yielding clear preference reversals and reproducible indifference points.

      (3) Application of standard discounting models to predict novel indifference points is both rigorous and quantitatively satisfying, reinforcing the interpretation of worm behavior in economic terms.

      (4) The three-state patch model cleanly separates entry and exit dynamics, showing that increased leaving rates, rather than altered re-entry, drive choice biases.

      (5) Investigates the role of dopamine in this behavior to try to establish shared mechanisms with vertebrates.

      (6) Demonstration of discounting in wild strains (solid evidence).

      Weaknesses:

      (1) The kinetic model omits rich trajectory details, such as turning angles or hazard functions, that could distinguish a bona fide roaming transition from other exit behaviors.

      (2) Only _dop-3_ shows an effect, and the statistical validity of this result is questionable. It is not clear if the authors corrected for multiple comparisons, and the effect size is quite small and noisy, given the large number of worms tested. Other mutants do not show effects. Given these two concerns, the role of dopamine in C. elegans effort discounting was unconvincing.

      (3) With only five wild isolates tested (and variable data quality), it's hard to conclude that effort discounting isn't a lab-strain artifact or how broadly it varies in natural populations.

      (4) Detailed analysis of behavior beyond preference indices would strengthen the dopamine link and the claim of effort discounting in wild strains.

      (5) A few mechanistic statements (e.g., tying satiety exclusively to nutrient signals) would benefit from explicit citations or brief clarifications for non-worm specialists.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors establish a behavioral task to explore effort discounting in C. elegans. By using bacterial food that takes longer to consume, the authors show that, for equivalent effort, as measured by pumping rate, they obtain less food, as measured by fat deposition.

      The authors formalize the task by applying a formal neuroeconomic decision-making model that includes value, effort, and discounting. They use this to estimate the discounting that C. elegans applies based on ingestion effort by using a population-level 2-choice T-maze.

      They then analyze the behavioral dynamics of individual animals transitioning between on-food and off-food states. Harder to ingest bacteria led to increased food patch leaving.
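
      The link between leaving rate and preference can be sketched with a deliberately simplified two-state (on-patch/off-patch) Markov model. This is an assumption made for illustration: the manuscript is described as using a three-state model, and the rates below are hypothetical, not fitted values.

```python
def on_patch_fraction(entry_rate: float, exit_rate: float) -> float:
    """Steady-state fraction of time spent on a patch in a two-state Markov model."""
    return entry_rate / (entry_rate + exit_rate)


# Hypothetical rates (events per minute): identical entry, higher exit on effortful food.
entry_rate = 0.5
easy_exit_rate = 0.1
hard_exit_rate = 0.3

easy_fraction = on_patch_fraction(entry_rate, easy_exit_rate)
hard_fraction = on_patch_fraction(entry_rate, hard_exit_rate)
print(f"easy: {easy_fraction:.2f}, hard: {hard_fraction:.2f}")
```

      With entry rates held equal, a higher exit rate alone lowers occupancy of the effortful patch, which is the qualitative pattern the summary attributes to the kinetic analysis.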

      Finally, they examined a set of mutants defective in different aspects of dopamine signaling, as dopamine plays a key role in discounting in vertebrates and regulates certain aspects of C. elegans foraging.

      Strengths:

      The behavioral experiments and neuroeconomic analysis framework are compelling and interesting, and make a significant contribution to the field. While these foraging behaviors have been extensively studied, few include clearly articulated theoretical models to be tested.

      Demonstrating that C. elegans effort discounting fits model predictions and has stable indifference points is important for establishing these tasks as a model for decision making.

      Weaknesses:

      The dopamine experiments are harder to interpret. The authors point out the perplexing lack of an effect of dat-1 and cat-2. dop-3 leads to general indifference. I am not sure this is the expected result if the argument is a parallel functional role to discounting in vertebrates. dop-3 causes a range of locomotor phenotypes and may affect feeding (reduced fat storage), and thus, there may be a general defect in the ability to perform the task rather than anything specific to discounting.

      That said, some of the other DA mutants also have locomotor defects and do not differ from N2. But there is no clear result here - my concern is that global mutants in such a critical pathway exhibit such pleiotropy that it's difficult to conclude there is a clear and specific role for DA in effort discounting. This would require more targeted or cell-specific approaches.

      Meanwhile, there are other pathways known to affect responses to food and patch leaving decisions: serotonin, pigment-dispersing factor, tyramine, etc. The paper would have benefited from a clarification about why these were not considered as promising candidates to test (in addition to or instead of dopamine).

    1. eLife Assessment

      This study provides compelling evidence that action potential (AP) broadening is not a universal feature of homeostatic plasticity in response to chronic activity deprivation. By leveraging state-of-the-art methods across multiple brain regions and laboratories, the authors demonstrate that AP half-width remains largely stable, challenging previous assumptions in the field. These important findings help resolve longstanding inconsistencies in the literature and significantly advance our understanding of neuronal network homeostasis.

    2. Reviewer #1 (Public review):

      Summary:

      Ritzau-Jost et al. investigate the potential contribution of AP broadening in homeostatic upregulation of neuronal network activity with a specific focus on dissociated neuronal cultures. In cultures obtained from a few brain regions from mice or rats using different culture conditions and examined by different laboratories, AP half-width remained stable despite chronic activity block with TTX. The finding suggests that AP width is not significantly modulated by changes in sodium channel activity.

      Strengths:

      The collaborative nature of the study amongst the neuronal culture experts and the rigorous electrophysiological assessments provide compelling support for the main conclusion.

      Weaknesses:

      Given the negative nature of the results, a couple of remaining issues (such as the cell density of cultures and the presentation of imaging experiments with a voltage sensor) warrant further consideration. In addition, a discussion of the reasons for the seeming stability of AP half-width to sodium channel modulation might help extend the scope of the study beyond the presentation of a negative conclusion.

    3. Reviewer #2 (Public review):

      Summary:

      This study reexamined the idea that action potential broadening serves as a homeostatic mechanism to compensate for changes in network activity. The key finding was that, while action potential broadening does occur in certain neurons, such as CA3 pyramidal cells, it is far from a universal response. This is important because it helps resolve longstanding discrepancies in the field, thereby contributing to a better understanding of network dynamics. The replication of these findings across multiple laboratories further strengthened the study's rigor.

      Strengths:

      Mechanisms of network homeostasis are essential to understand network dynamics.

      Weaknesses:

      No weaknesses were noted by this reviewer.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript "Unreliable homeostatic action potential broadening in cultured dissociated neurons" by Ritzau-Jost et al. investigates action potential (AP) broadening as a mechanism underlying homeostatic synaptic plasticity. Given the existing variability in the literature concerning AP broadening, the authors address an important and timely research question of considerable interest to the field.

      The study systematically demonstrates cell-type- and model-specific AP broadening in hippocampal neurons after chronic treatment with either tetrodotoxin (TTX) or glutamatergic transmission blockers. The findings indicate AP broadening in CA3 pyramidal neurons in organotypic cultures after TTX treatment, but notably not in dissociated hippocampal neurons under identical conditions. However, blocking glutamatergic neurotransmission caused AP broadening in dissociated hippocampal neurons. Moreover, extensive evaluations in neocortical dissociated cultures robustly challenge previous findings by revealing a lack of AP broadening following TTX treatment. Additionally, the proposed role of BK-type potassium channels in mediating AP broadening is convincingly questioned through complementary electrophysiological and voltage-imaging experiments.

      Strengths:

      The manuscript exhibits an outstanding experimental design, employing state-of-the-art techniques and a rigorous multi-lab validation approach that greatly enhances scientific reliability. The experimental results are meticulously illustrated, and the conclusions drawn are justified and supported by the presented data. Furthermore, the manuscript is comprehensively and clearly written.

      Weaknesses:

      Concerning the statistical analyses employed, it is advisable to consider the Kruskal-Wallis test with corrections for multiple comparisons when evaluating more than two experimental groups.
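
      As a minimal sketch of the suggested test, the Kruskal-Wallis H statistic can be computed from pooled ranks as below. The three groups are hypothetical numbers, not data from the manuscript; the function assumes no tied values, and the p-value uses the chi-square approximation, which for three groups (two degrees of freedom) reduces to exp(-H/2).

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for independent groups; assumes no tied values."""
    pooled = sorted(value for group in groups for value in group)
    # Rank 1 is the smallest observation across all groups.
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    n_total = len(pooled)
    rank_term = sum(sum(rank[v] for v in group) ** 2 / len(group) for group in groups)
    return 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)


# Hypothetical AP half-widths (ms) for three conditions, for illustration only.
groups = [
    [1.1, 1.3, 1.2],
    [1.4, 1.6, 1.5],
    [1.9, 1.8, 1.7],
]
h_stat = kruskal_wallis_h(groups)
# Chi-square survival function with 2 degrees of freedom (valid for exactly 3 groups).
p_value = math.exp(-h_stat / 2.0)
print(f"H = {h_stat:.2f}, approximate p = {p_value:.4f}")
```

      A significant omnibus result would still need post hoc pairwise comparisons with a correction for multiple testing, which is the second half of the reviewer's recommendation.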

    1. eLife Assessment

      This valuable work investigates cooperative behaviors in adolescents using a repeated Prisoner's Dilemma game. The computational modeling approach used in the study is solid and well established, yet evidence supporting certain claims remains incomplete. The work could be strengthened with the consideration of additional experimental contexts, non-linear relationships between age and observed behavior, and modeling details. If these concerns are addressed, the results will be of interest to developmental psychologists, economists, and social psychologists.

    2. Reviewer #1 (Public review):

      Summary:

      Wu and colleagues aimed to explain previous findings that adolescents, compared to adults, show reduced cooperation following cooperative behaviour from a partner in several social scenarios. The authors analysed behavioural data from adolescents and adults performing a zero-sum Prisoner's Dilemma task and compared a range of social and non-social reinforcement learning models to identify potential algorithmic differences. Their findings suggest that adolescents' lower cooperation is best explained by a reduced learning rate for cooperative outcomes, rather than differences in prior expectations about the cooperativeness of a partner. The authors situate their results within the broader literature, proposing that adolescents' behaviour reflects a stronger preference for self-interest rather than a deficit in mentalising.
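
      The distinction between a lower learning rate and a different prior can be illustrated with a generic Rescorla-Wagner style update. This is a sketch under stated assumptions, not the authors' winning model: the function names, the learning rates, and the prior of 0.5 are all hypothetical.

```python
def update_expectation(expectation, partner_cooperated, lr_coop, lr_defect):
    """One Rescorla-Wagner style update of the expected probability of partner cooperation."""
    outcome = 1.0 if partner_cooperated else 0.0
    # Separate learning rates for cooperative and non-cooperative partner outcomes.
    learning_rate = lr_coop if partner_cooperated else lr_defect
    return expectation + learning_rate * (outcome - expectation)


def simulate(history, lr_coop, lr_defect, prior=0.5):
    """Apply the update across a sequence of partner choices, starting from a shared prior."""
    expectation = prior
    for partner_cooperated in history:
        expectation = update_expectation(expectation, partner_cooperated, lr_coop, lr_defect)
    return expectation


# Hypothetical scenario: the partner cooperates on five consecutive rounds.
history = [True] * 5
low_lr_agent = simulate(history, lr_coop=0.1, lr_defect=0.3)
high_lr_agent = simulate(history, lr_coop=0.4, lr_defect=0.3)
print(f"low lr_coop: {low_lr_agent:.3f}, high lr_coop: {high_lr_agent:.3f}")
```

      Both simulated agents start from the same prior, so any gap in their final expectations comes from the learning rate for cooperative outcomes alone.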

      Strengths:

      The work as a whole suggests that, in line with past work, adolescents prioritise value accumulation, and this can be, in part, explained by algorithmic differences in weighted value learning. The authors situate their work very clearly in past literature, and make clear the gap they are testing and trying to explain. The work also includes social contexts that move the field beyond non-social value accumulation in adolescents. The authors compare a series of formal approaches that might explain the results and establish generative and model-comparison procedures to demonstrate the validity of their winning model and individual parameters. The writing was clear, and the presentation of the results was logical and well-structured.

      Weaknesses:

      I have some concerns about the methods used to fit and approximate parameters of interest. Namely, models were fit at the individual level with maximum likelihood rather than hierarchical methods; hierarchical fitting may reduce some of the outliers noted in the supplement and may also improve model identifiability.

      There was also little discussion given to the structure of the Prisoner's Dilemma, and the strategy of the game (that defection is always dominant), meaning that the preferences of the adolescents cannot necessarily be distinguished from the incentives of the game, i.e. they may seem less cooperative simply because they want to play the dominant strategy, rather than because of a lower preference for cooperation if all else was the same.
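
      To make the dominance argument concrete, here is a minimal sketch with hypothetical payoffs (they are not taken from the study and only respect the usual ordering T > R > P > S). It checks that defecting is the best response to either partner action in a single round.

```python
# Hypothetical one-shot payoffs for the focal player, keyed by
# (own action, partner action). Values are illustrative only.
PAYOFF = {
    ("C", "C"): 3,  # R: reward for mutual cooperation
    ("C", "D"): 0,  # S: payoff when cooperating against a defector
    ("D", "C"): 5,  # T: temptation to defect
    ("D", "D"): 1,  # P: punishment for mutual defection
}


def best_response(partner_action):
    """Return the own action with the highest payoff against partner_action."""
    return max("CD", key=lambda own: PAYOFF[(own, partner_action)])


def defection_is_dominant():
    """True if defecting is the best response to every partner action."""
    return all(best_response(partner) == "D" for partner in "CD")
```

      Under these assumed payoffs a purely payoff-maximising player defects regardless of the partner, which is why lower cooperation alone cannot separate a weaker preference for cooperation from adherence to the dominant strategy.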

      Appraisal & Discussion:

      The authors have partially achieved their aims, but I believe the manuscript would benefit from additional methodological clarification, specifically regarding the use of hierarchical model fitting and the inclusion of Bayes Factors, to more robustly support their conclusions. It would also be important to investigate the source of the model confusion observed in two of their models.

      I am unconvinced by the claim that failures in mentalising have been empirically ruled out, even though I am theoretically inclined to believe that adolescents can mentalise using the same procedures as adults. While reinforcement learning models are useful for identifying biases in learning weights, they do not directly capture formal representations of others' mental states. Greater clarity on this point is needed in the discussion, or a toning down of this language.

      Additionally, a more detailed discussion of the incentives embedded in the Prisoner's Dilemma task would be valuable. In particular, the authors' interpretation of reduced adolescent cooperativeness might be reconsidered in light of the zero-sum nature of the game, which differs from broader conceptualisations of cooperation in contexts where defection is not structurally incentivised.

      Overall, I believe this work has the potential to make a meaningful contribution to the field. Its impact would be strengthened by more rigorous modelling checks and fitting procedures, as well as by framing the findings in terms of the specific game-theoretic context, rather than general cooperation.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript investigates age-related differences in cooperative behavior by comparing adolescents and adults in a repeated Prisoner's Dilemma Game (rPDG). The authors find that adolescents exhibit lower levels of cooperation than adults. Specifically, adolescents reciprocate partners' cooperation to a lesser degree than adults do. Through computational modeling, they show that this relatively low cooperation rate is not due to impaired expectations or mentalizing deficits, but rather a diminished intrinsic reward for reciprocity. A social reinforcement learning model with asymmetric learning rate best captured these dynamics, revealing age-related differences in how positive and negative outcomes drive behavioral updates. These findings contribute to understanding the developmental trajectory of cooperation and highlight adolescence as a period marked by heightened sensitivity to immediate rewards at the expense of long-term prosocial gains.

      Strengths:

      (1) Rigorous model comparison and parameter recovery procedure.

      (2) Conceptually comprehensive model space.

      (3) Well-powered samples.

      Weaknesses:

      (1) A key conceptual distinction between learning from non-human agents (e.g., bandit machines) and human partners is that the latter are typically assumed to possess stable behavioral dispositions or moral traits. When a non-human source abruptly shifts behavior (e.g., from 80% to 20% reward), learners may simply update their expectations. In contrast, a sudden behavioral shift by a previously cooperative human partner can prompt higher-order inferences about the partner's trustworthiness or the integrity of the experimental setup (e.g., whether the partner is truly interactive or human). The authors may consider whether their modeling framework captures such higher-order social inferences. Specifically, trait-based models, such as those explored in Hackel et al. (2015, Nature Neuroscience), suggest that learners form enduring beliefs about others' moral dispositions, which then modulate trial-by-trial learning. A learner who believes their partner is inherently cooperative may update less in response to a surprising defection, effectively showing a trait-based dampening of learning rate.

      (2) This asymmetry in belief updating has been observed in prior work (e.g., Siegel et al., 2018, Nature Human Behaviour) and could be captured using a dynamic or belief-weighted learning rate. Models incorporating such mechanisms (e.g., dynamic learning rate models as in Jian Li et al., 2011, Nature Neuroscience) could better account for flexible adjustments in response to surprising behavior, particularly in the social domain.

      (3) The developmental interpretation of the observed effects would be strengthened by considering possible non-linear relationships between age and model parameters. For instance, certain cognitive or affective traits relevant to social learning, such as sensitivity to reciprocity or reward updating, may follow non-monotonic trajectories, peaking in late adolescence or early adulthood. Fitting age as a continuous variable, possibly with quadratic or spline terms, may yield more nuanced developmental insights.

      (4) Finally, the two age groups compared - adolescents (high school students) and adults (university students) - differ not only in age but also in sociocultural and economic backgrounds. High school students are likely more homogenous in regional background (e.g., Beijing locals), while university students may be drawn from a broader geographic and socioeconomic pool. Additionally, differences in financial independence, family structure (e.g., single-child status), and social network complexity may systematically affect cooperative behavior and valuation of rewards. Although these factors are difficult to control fully, the authors should more explicitly address the extent to which their findings reflect biological development versus social and contextual influences.
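
      The suggestion in point (3) to treat age as a continuous variable with a quadratic term can be sketched as below. This is an illustrative least-squares fit on invented data, with hypothetical function names; it is not the authors' analysis, and a real analysis would also report uncertainty on the coefficients.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_quadratic(ages, values):
    """Fit value = b0 + b1*(age - mean) + b2*(age - mean)**2 by least squares.

    Returns ([b0, b1, b2], mean_age). Age is mean-centred so that the
    linear and quadratic terms are less correlated.
    """
    mean_age = sum(ages) / len(ages)
    rows = [[1.0, a - mean_age, (a - mean_age) ** 2] for a in ages]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    return solve_linear(xtx, xty), mean_age
```

      A negative quadratic coefficient b2 would indicate an inverted-U trajectory with a peak inside the sampled age range, a pattern that a two-group comparison cannot reveal.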

    4. Reviewer #3 (Public review):

      Summary:

      Wu and colleagues find that in a repeated Prisoner's Dilemma, adolescents, compared to adults, are less likely to increase their cooperation behavior in response to repeated cooperation from a simulated partner. In contrast, after repeated defection by the partner, both age groups show comparable behavior.

      To uncover the mechanisms underlying these patterns, the authors compare eight different models. They report that a social reward learning model, which includes separate learning rates for positive and negative prediction errors, best fits the behavior of both groups. Key parameters in this winning model vary with age: notably, the intrinsic value of cooperating is lower in adolescents. Adults and adolescents also differ in learning rates for positive and negative prediction errors, as well as in the inverse temperature parameter.
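
      For readers unfamiliar with this model class, the following is a minimal sketch of a learner with separate learning rates for positive and negative prediction errors, an intrinsic cooperation bonus, and a softmax choice rule with an inverse temperature. The functional form, parameter names (`alpha_pos`, `alpha_neg`, `w`, `beta`) and default payoffs are assumptions for illustration and may differ from the authors' winning model.

```python
import math


def update_expectation(p_coop, partner_cooperated, alpha_pos, alpha_neg):
    """Update the expected probability that the partner cooperates.

    A positive prediction error (partner cooperated more than expected)
    is scaled by alpha_pos, a negative one by alpha_neg.
    """
    outcome = 1.0 if partner_cooperated else 0.0
    prediction_error = outcome - p_coop
    alpha = alpha_pos if prediction_error >= 0 else alpha_neg
    return p_coop + alpha * prediction_error


def prob_cooperate(p_coop, w, beta, payoffs=(3.0, 0.0, 5.0, 1.0)):
    """Softmax probability of cooperating.

    payoffs = (R, S, T, P) are hypothetical values. w is an assumed
    intrinsic reward added to mutual cooperation; beta is the inverse
    temperature controlling choice determinism.
    """
    r, s, t, p = payoffs
    utility_coop = p_coop * (r + w) + (1 - p_coop) * s
    utility_defect = p_coop * t + (1 - p_coop) * p
    return 1.0 / (1.0 + math.exp(-beta * (utility_coop - utility_defect)))
```

      In this sketch, two learners with identical expectations (`p_coop`) can still differ in cooperation through `w` alone, which is the distinction between learned expectations and intrinsic value that the model comparison is meant to resolve.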

      Strengths:

      The modeling results are compelling in their ability to distinguish between learned expectations and the intrinsic value of cooperation. The authors skillfully compare relevant models to demonstrate which mechanisms drive cooperation behavior in the two age groups.

      Weaknesses:

      Some of the claims made are not fully supported by the data:

      The central parameter reflecting preference for cooperation is positive in both groups. Thus, framing the results as self-interest versus other-interest may be misleading.

      It is unclear why the authors assume adolescents and adults have the same expectations about the partner's cooperation, yet simultaneously demonstrate age-related differences in learning about the partner. To support their claim mechanistically, simulations showing that differences in cooperation preference (i.e., the w parameter), rather than differences in learning, drive behavioral differences would be helpful.

      Two different schedules of 120 trials were used: one with stable partner behavior and one with behavior changing after 20 trials. While results for order effects are reported, the results for the stable vs. changing phases within each schedule are not. Since learning is influenced by reward structure, it is important to test whether key findings hold across both phases.

      The division of participants at the legal threshold of 18 years should be more explicitly justified. The age distribution appears continuous rather than clearly split. Providing rationale and including continuous analyses would clarify how groupings were determined.

      Claims of null effects (e.g., in the abstract: "adults increased their intrinsic reward for reciprocating... a pattern absent in adolescents") should be supported with appropriate statistics, such as Bayesian regression.

      Once claims are more closely aligned with the data, the study will offer a valuable contribution to the field, given its use of relevant models and a well-established paradigm.

    1. eLife Assessment

      The authors investigated the potential role of IgG N-glycosylation in Haemorrhagic Fever with Renal Syndrome (HFRS), which may offer significant insights for understanding molecular mechanisms and for the development of therapeutic strategies for this infectious disease. The findings are useful to the field, although the strength of evidence to support the findings is incomplete. Several issues need to be addressed, including more detail on the background, methods, and results. Additional statistical tests should be performed, and the conclusions should reflect the correlational findings of the paper.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the potential role of IgG N-glycosylation in Haemorrhagic Fever with Renal Syndrome (HFRS), which may offer significant insights for understanding molecular mechanisms and for the development of therapeutic strategies for this infectious disease. However, several issues need to be addressed.

      Major Points:

      (1) The authors should provide a detailed description of the pathogenesis of Haemorrhagic Fever with Renal Syndrome (HFRS) and elaborate on the crucial role of IgG proteins in the disease's progression (line 65).

      (2) An additional discussion on the significance of glycosylation, particularly IgG N-glycosylation, in viral infections should be included in the Introduction section.

      (3) In the Abstract section, the authors state that HTNV-specific IgG antibody titers were detected and IgG N-glycosylation was analyzed. However, the analysis of plasma IgG N-glycans is described in the Methods section. Therefore, the authors should clarify the glycome analysis process. Was the specific IgG glycome profile similar to the total IgG N-glycome? Given the biological relevance of specific IgG in immunological diseases, characterizing the specific IgG N-glycome profile would be more significant than analyzing the total plasma IgG.

      (4) Further details regarding the N-glycome analysis should be provided, including the quantity of IgG protein used and the methodology employed for analyzing IgG N-glycans (lines 286-287).

      (5) Additional statistical analyses should be performed, including multiple comparisons with p-value adjustment, false discovery rate (FDR) control, and Pearson correlation (line 291).

      (6) Quality control should be conducted prior to the IgG N-glycome analysis. Additionally, both biological and technical replicates are essential to assess the reproducibility and robustness of the methods.

      (7) Multiple regression analysis should be conducted to evaluate the influence of genetic and environmental factors on the IgG N-glycome.

      (8) Line 196. Additional discussions should be included, focusing on the underlying correlation between the differential expression of B-cell glycogenes and the dysregulated IgG N-glycome profile, as well as the potential molecular mechanisms of IgG N-glycosylation in the development of HFRS.
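
      The false discovery rate control requested in point (5) could, for example, use the Benjamini-Hochberg procedure. The sketch below is a generic implementation with a hypothetical function name and invented p-values; it is not tied to the authors' glycan data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values, e.g. one per glycan trait compared between groups.
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

      Traits whose adjusted value falls below the chosen FDR level (commonly 0.05) would be reported as significant.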

    3. Reviewer #2 (Public review):

      Summary:

      This work sought to explore antibody responses in the context of hemorrhagic fever with renal syndrome (HFRS) - a severe disease caused by Hantaan virus infection. Little is known about the characteristics or functional relevance of IgG Fc glycosylation in HFRS. To address this gap, the authors analyzed samples from 65 patients with HFRS spanning the acute and convalescent phases of disease via IgG Fc glycan analysis, scRNAseq, and flow cytometry. The authors observed changes in Fc glycosylation (increased fucosylation and decreased bisection) coinciding with a 4-fold or greater increase in Hantaan virus-specific antibody titer. They suggest that these shifts contribute to disease recovery. The study also includes exploratory analyses linking IgG glycan profiles to glycosylation-related gene expression in distinct B cell subsets, using single-cell transcriptomics. Overall, this is an interesting study that combines serological profiling with transcriptomic data to shed light on humoral immune responses in an underexplored infectious disease. The integration of Fc glycosylation data with single-cell transcriptomic data is a strength. However, some improvements could be made in the clarity of both the Results and Materials and Methods sections, and some conclusions would benefit from greater caution, particularly in avoiding overinterpretation of correlative findings.

      Comments:

      (1) While it is great to reference prior publications in the Materials and Methods section, the current level of detail is insufficient to clearly understand the study design and experimental procedures performed. Readers should not be expected to consult multiple previous papers to grasp the core methodological aspects of the present paper. For instance, the categorization of HFRS patients into different clinical subtypes/courses, and the methods for measuring Fc glycosylation should be explicitly described in the Materials and Methods section of this manuscript.

      (2) The authors should explain the nature of their cohort in a bit more detail. While it appears that HFRS cases were identified based on IgM ELISA and/or PCR, these are indicators of Hantaan virus infection. My understanding is that not all Hantaan virus infections progress to HFRS. Thus, it is unclear whether all patients in the HFRS group actually had hemorrhagic fever. This distinction is critical for interpreting how the results observed relate to disease severity.

      (3) The authors state that: "A 4-fold or greater increase in HTNV-NP-specific antibody titers usually indicates a protective humoral immune response during the acute phase", but they do not cite any references or provide any context that supports this claim. Given that in their own words, one of the most significant findings in the study is changes in glycosylation coinciding with this 4-fold increase, it is important to ground this claim in evidence. Without this, the use of a 4-fold threshold appears arbitrary and weakens the rationale for using this immune state as a proxy for protective immunity.

      (4) The authors also claim that changes in Fc glycosylation influence recovery from HFRS - a point even emphasized in the manuscript title. However, this conclusion is not well supported by the data for two main reasons. First, the authors appear to measure bulk IgG Fc glycans, not Fc glycans of Hantaan virus-specific antibodies. While reasonable, this is something that should be communicated in the manuscript. Hantaan virus-specific antibodies are likely a very small fraction of total circulating IgG antibodies (perhaps ~1%), even during acute infection. As a result, changes in bulk Fc glycosylation may (or may not) accurately reflect the glycosylation state of Hantaan virus-specific antibodies. Second, even if the bulk Fc glycan shifts do mirror those of Hantaan virus-specific antibodies, it remains unclear whether these changes causally drive recovery or are merely a consequence of the infection being resolved. Thus, while the differences in Fc glycosylation observed are interesting - and it is tempting to speculate on their functional significance - the manuscript treats the observed correlations as causal mechanistic insight without sufficient data or justification.

      (5) Fc glycosylation is known to be influenced by covariates such as age and sex. While it is helpful that the authors stratified the patients by age group and looked for significant differences in glycosylation across them, a more robust approach would be to directly control for these covariates in the statistical analysis - such as by using a linear mixed effects model, in which disease state (e.g., acute vs. convalescent), age, and sex are treated as fixed effects, and subject ID is included as a random effect to account for repeated measures. This would allow the authors to assess whether observed differences in Fc glycosylation remain significant after accounting for potential confounders. This could be important given that some of the reported differences are quite small, for example, 94.29% vs. 94.89% fucosylation.

      (6) The manuscript states that there are limited studies on antibody glycosylation in the context of HFRS, but does not cite any relevant literature. If prior work exists, it should be cited to contextualize the current study. If no prior studies have been conducted or reported, to the authors' knowledge, that should be stated explicitly to show the novelty of the work.
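
      The linear mixed effects model proposed in comment (5) can be written out as follows. This is one possible specification under the stated assumptions (glycan trait y for subject i at sample j; the symbols are introduced here for illustration only):

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{phase}_{ij} + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{sex}_i + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

      Here the fixed effect \beta_1 captures the acute versus convalescent difference after adjusting for age and sex, and the random intercept u_i accounts for repeated measures from the same patient.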

    1. eLife Assessment

      This study presents a valuable technical advance in the long-term live imaging of limb regeneration at cellular resolution in Parhyale hawaiensis. The authors develop and carefully validate a method to continuously image entire regenerating legs over several days while minimizing photodamage and optimizing conditions for robust cell tracking, together with post-hoc in situ identification of cell types. The data are convincing, the methodology is rigorous and clearly documented, and the results will be of interest to researchers in regeneration biology, developmental biology, and advanced live imaging and cell tracking software development.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Building upon their previous work, the authors present an enhanced method for confocal live imaging of leg regeneration in the crustacean Parhyale hawaiensis. Parhyale is an emerging and tractable model system that offers insights into the evolution and mechanisms of development and regeneration. Çevrim et al. demonstrate the ability to image the complete leg regeneration process, spanning several days, with 10-20 minute time intervals and cellular resolution. They have concurrently optimized imaging conditions to enable cell tracking while minimizing phototoxicity. Additionally, they report successfully implementing HCR in situ hybridization in Parhyale, allowing for specific gene transcript staining at the endpoint of live imaging. This opens the possibility of assigning molecular identities to tracked cells.

      A key challenge in many regeneration models is achieving continuous imaging throughout the entire regenerative process, as many organisms are difficult to immobilize or cannot tolerate extended imaging without stress. This manuscript's major strength lies in providing practical solutions to these challenges in Parhyale, a compelling and accessible arthropod model for limb regeneration. The authors also employ complementary tools to analyze time-lapse movies and correlate them with endpoint staining. Together, these advances will serve as a useful resource for researchers studying regeneration in Parhyale or in other systems where parts of this workflow can be adapted.

      While the data demonstrating the methodological advancement and technical feasibility are solid, much of the benchmarking and regeneration characterization remains qualitative. This does not undermine the validity of the proof-of-principle, but limits the study's broader appeal.

    3. Reviewer #2 (Public review):

      The manuscript by Çevrim et al. presents a live-imaging workflow that captures the complete leg regeneration process in the crustacean Parhyale hawaiensis, at a resolution suitable for cell tracking and gene expression analysis. Building on earlier work describing selective stages of leg regeneration (Alwes et al., 2016), the authors recorded 22 confocal time-lapse movies, starting from amputation to full regeneration. They defined three distinct phases of regeneration (wound closure, cell proliferation and morphogenesis, and differentiation) based on cellular and morphological features.

      One movie was used to assess how imaging parameters (z-spacing, time intervals, and image quality) influence tracking reliability and the time required for manual proofreading, with an effort to minimize phototoxicity. Tracking was performed in the upper tissue layers using an improved version of the Mastodon plugin Elephant in Fiji. The same sample was fixed post-imaging for in situ hybridization using an HCR protocol adapted for adult legs, targeting the gene spineless. This enabled the alignment of gene expression with specific cell lineages and the identification of progenitor cells present at the time of amputation.

      In summary, the study provides a proof-of-principle for combining long-term live imaging, cell tracking, and gene expression analysis during regeneration. Given the labor-intensive nature of tracking over a 5-10 day time-lapse movie, the use of a single movie for this study is well justified. The workflow, from imaging to lineage reconstruction and molecular annotation, is successfully demonstrated and well documented with this dataset.

      Although the biological insights from the cell lineage and molecular mapping are still limited, the methodology offers significant potential in regenerative biology to uncover the cellular and molecular contributions to tissue and cell type re-formation.

      Confocal microscopy was used for live imaging, which restricted imaging to the upper 30 µm tissue layer. Light-sheet microscopy could have provided gentler imaging and enabled imaging from multiple angles to image the whole leg. While the authors acknowledge this possibility in the manuscript, they discarded it due to incompatibility between their mounting strategy and available light-sheet microscopes. As a future direction, optimizing the mounting approach for compatibility with light-sheet microscopes could enable more comprehensive tissue imaging.

    1. eLife Assessment

      This fundamental study demonstrates how a left-right bias in the relationship between numerical magnitude and space depends on brain lateralization. The evidence is compelling and will be of interest to researchers studying numerical cognition, brain lateralization, and cognitive brain development more broadly.

    2. Reviewer #1 (Public review):

      Functional lateralization between the right and left hemispheres is reported widely in animal taxa, including humans. However, it remains largely speculative as to whether the lateralized brains have a cognitive gain or a sort of fitness advantage. In the present study, by making use of the advantages of domestic chicks as a model, the authors are successful in revealing that the lateralized brain is advantageous in the number sense, in which numerosity is associated with spatial arrangements of items. Behavioral evidence is strong enough to support their arguments. Brain lateralization was manipulated by light exposure during the terminal phase of incubation, and the left-to-right numerical representation appeared when the distance between items gave a reliable spatial cue. The light-exposure induced lateralization, though quite unique in avian species, together with the lack of intense inter-hemispheric direct connections (such as the corpus callosum in the mammalian cerebrum), was critical for the successful analysis in this study. Specification of the responsible neural substrates in the presumed right hemisphere is expected in future research. The development of a comparable experimental manipulation in the mammalian brain, to address this general question (the functional significance of brain laterality), is also expected.

    3. Reviewer #2 (Public review):

      Summary:

      This is the first study to show how a L-R bias in the relationship between numerical magnitude and space depends on brain lateralisation, and moreover, how this is modulated by in ovo conditions.

      Strengths:

      Novel methodology for investigating the innateness and neural basis of a L-R bias in the relationship between number and space.

      Weaknesses:

      I would query the way the experiment was contextualised. They ask whether culture or innate pre-wiring determines the 'left-to-right orientation of the MNL [mental number line]'.

      The term, 'Mental Number Line' is an inference from experimental tasks. One of the first experimental demonstrations of a preference or bias for small numbers in the left of space and larger numbers in the right of space, was more carefully described as the spatial-numerical association of response codes - the SNARC effect (Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396).

      This has meant that the background to the study is confusing. First, they note correctly that many other creatures, including insects, can show this bias, though in none of these has neural lateralisation been shown to be a cause. Second, their clever experiment shows that an experimental manipulation creates the bias. If it were innate and common to other species, the experimental manipulation shouldn't matter. There would always be a L-R bias. Third, they seem to be asserting that humans have a left-to-right (L-R) MNL. This is highly contentious, and in some studies, reading direction affects it, as the original study by Dehaene et al showed; and in others, task affects direction (e.g. Bachtold, D., Baumüller, M., & Brugger, P. (1998). Stimulus-response compatibility in representational space. Neuropsychologia, 36, 731-735, not cited). Moreover, a very careful study of adult humans found no L-R bias (Karolis, V., Iuculano, T., & Butterworth, B. (2011). Mapping numerical magnitudes along the right lines: Differentiating between scale and bias. Journal of Experimental Psychology: General, 140(4), 693-706, not cited). Indeed, Rugani et al claim, incorrectly, that the L-R bias was first reported by Galton in 1880. There are two errors here: first, Galton was reporting what he called 'visualised numerals' and are typically referred to now as 'number forms' - spontaneous and habitual conscious visual representations - not an inference from a number line task. Second, Galton reported right-to-left, circular, and vertical visualised numerals, and no simple left-to-right examples (Galton, F. (1880). Visualised numerals. Nature, 21, 252-256.). So in fact did Bertillon, J. (1880). De la vision des nombres. La Nature, 378, 196-198, and more recently Seron, X., Pesenti, M., Noël, M.-P., Deloche, G., & Cornet, J.-A. (1992). Images of numbers, or "When 98 is upper left and 6 sky blue". Cognition, 44, 159-196, and Tang, J., Ward, J., & Butterworth, B. (2008). Number forms in the brain. Journal of Cognitive Neuroscience, 20(9), 1547-1556.

      If the authors are committed to chicks' MN Line they should test a series of numbers showing that the bias to left is greater for 2 and 3 than for 4 etc.

      What does all this mean? I think that the experiment should absolutely be published in eLife, but the paper should be shorn of its misleading contextualisation, including the term 'Mental Number Line'. The authors also speculate, usefully, on why chicks and other species might have a L-R bias. I don't think the speculations are convincing, but if there is an evolutionary basis for the bias, it should at least be discussed.

      In fact, I think it would make a very interesting special issue to bring up to date how and why the L-R bias exists, and where and why it does not.

      Karolis, V., Iuculano, T., & Butterworth, B. (2011). Mapping numerical magnitudes along the right lines: Differentiating between scale and bias. Journal of Experimental Psychology: General, 140(4), 693-706. doi:10.1037/a0024255

      Review of the revised version:

      The background and terminology in the text have been significantly altered and clarified: Spatial Numerical Association (SNA) instead of Mental Number Line (MNL) in the text, but with a discussion about how SNA might be the basis of MNL. This entails a link from SNA - a bias - to mental representation of a sequence of numerical magnitudes, which will need to be spelt out in subsequent work with a sequence of numbers rather than a single number, in this case 4. Could the effect be generalised to much larger numbers?

      Although the relationship between number and space seems fundamental, the key question is why the L-R SNA bias should exist at all. The authors take on this challenge and make important arguments for the evolutionary advantage of the bias (see lines 138ff, 375ff, 444ff), though this is likely still to be controversial.

      Subsequent work may clarify the interaction of brain lateralisation with culture, notably reading and writing direction (e.g. Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396), though this relationship has exceptions and challenges (e.g. Karolis, V., Iuculano, T., & Butterworth, B. (2011). Mapping numerical magnitudes along the right lines: Differentiating between scale and bias. Journal of Experimental Psychology: General, 140(4), 693-706).

      For example, would humans with more lateralised brains show a stronger bias? Would humans with reverse lateralisation show a R-L SNA?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Functional lateralization between the right and left hemispheres is reported widely in animal taxa, including humans. However, it remains largely speculative as to whether the lateralized brains have a cognitive gain or a sort of fitness advantage. In the present study, by making use of the advantages of domestic chicks as a model, the authors are successful in revealing that the lateralized brain is advantageous in the number sense, in which numerosity is associated with spatial arrangements of items. Behavioral evidence is strong enough to support their arguments. Brain lateralization was manipulated by light exposure during the terminal phase of incubation, and the left-to-right numerical representation appeared when the distance between items gave a reliable spatial cue. The light-exposure induced lateralization, though quite unique in avian species, together with the lack of intense inter-hemispheric direct connections (such as the corpus callosum in the mammalian cerebrum), was critical for the successful analysis in this study. Specification of the responsible neural substrates in the presumed right hemisphere is expected in future research. The development of a comparable experimental manipulation in the mammalian brain, to address this general question (the functional significance of brain laterality), is also expected.

      We sincerely appreciate the Reviewer's insightful feedback and his/her recognition of the key contributions of our study.

      Reviewer #2 (Public review):

      Summary:

      This is the first study to show how an L-R bias in the relationship between numerical magnitude and space depends on brain lateralisation and, moreover, how it is modulated by in ovo conditions.

      Strengths:

      Novel methodology for investigating the innateness and neural basis of an L-R bias in the relationship between number and space.

      We would like to thank the Reviewer for their valuable feedback and for highlighting the key contributions of our study.

      Weaknesses:

      I would query the way the experiment was contextualised. They ask whether culture or innate pre-wiring determines the 'left-to-right orientation of the MNL [mental number line]'.

      We thank the Reviewer for raising this point, which has allowed us to provide a more detailed explanation of this aspect. Rather than framing the left-to-right orientation of the mental number line (MNL) as exclusively determined by either cultural influences or innate pre-wiring, our study highlights the role of environmental stimulation. Specifically, prenatal light exposure can shape hemispheric specialization, which in turn contributes to spatial biases in numerical processing. Please see lines 115-118.

      The term, 'Mental Number Line' is an inference from experimental tasks. One of the first experimental demonstrations of a preference or bias for small numbers in the left of space and larger numbers in the right of space, was more carefully described as the spatial-numerical association of response codes - the SNARC effect (Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and numerical magnitude. Journal of Experimental Psychology: General, 122, 371-396).

      We have refined our description of the MNL and SNARC effect to ensure conceptual accuracy in the revised manuscript; please see lines 53-59.

      This has meant that the background to the study is confusing. First, the authors note, correctly, that many other creatures, including insects, can show this bias, though in none of these has neural lateralisation been shown to be a cause. Second, their clever experiment shows that an experimental manipulation creates the bias. If it were innate and common to other species, the experimental manipulation shouldn't matter. There would always be an L-R bias. Third, they seem to be asserting that humans have a left-to-right (L-R) MNL. This is highly contentious, and in some studies, reading direction affects it, as the original study by Dehaene et al showed; and in others, task affects direction (e.g. Bachtold, D., Baumüller, M., & Brugger, P. (1998). Stimulus-response compatibility in representational space. Neuropsychologia, 36, 731-735, not cited). Moreover, a very careful study of adult humans, found no L-R bias (Karolis, V., Iuculano, T., & Butterworth, B. (2011), not cited, Mapping numerical magnitudes along the right lines: Differentiating between scale and bias. Journal of Experimental Psychology: General, 140(4), 693-706). Indeed, Rugani et al claim, incorrectly, that the L-R bias was first reported by Galton in 1880. There are two errors here: first, Galton was reporting what he called 'visualised numerals', which are typically referred to now as 'number forms' - spontaneous and habitual conscious visual representations - not an inference from a number line task. Second, Galton reported right-to-left, circular, and vertical visualised numerals, and no simple left-to-right examples (Galton, F. (1880). Visualised numerals. Nature, 21, 252-256.). So in fact did Bertillon, J. (1880). De la vision des nombres. La Nature, 378, 196-198, and more recently Seron, X., Pesenti, M., Noël, M.-P., Deloche, G., & Cornet, J.-A. (1992). Images of numbers, or "When 98 is upper left and 6 sky blue". Cognition, 44, 159-196, and Tang, J., Ward, J., & Butterworth, B. (2008). Number forms in the brain. Journal of Cognitive Neuroscience, 20(9), 1547-1556.

      We sincerely appreciate the opportunity to discuss numerical spatialization in greater detail. We have clarified that an innate predisposition to spatialize numerosity does not necessarily exclude the influence of environmental stimulation and experience. We have proposed an integrative perspective, incorporating both cultural and innate factors, suggesting that numerical spatialization originates from neural foundations while remaining flexible and modifiable by experience and contextual influences. Please see lines 69–75.

      We have incorporated the Reviewer’s suggestions and cited all the recommended papers; please see lines 47–75.

      If the authors are committed to chicks' MN Line they should test a series of numbers showing that the bias to the left is greater for 2 and 3 than for 4, etc.

      What does all this mean? I think that the paper should be shorn of its misleading contextualisation, including the term 'Mental Number Line'. The authors also speculate, usefully, on why chicks and other species might have a L-R bias. I don't think the speculations are convincing, but at least if there is an evolutionary basis for the bias, it should at least be discussed.

      In the revised version of the manuscript, we have adopted the term Spatial Numerical Association (SNA). We thank the Reviewer for this valuable comment.

      We appreciated the Reviewer’s suggestion regarding the evolutionary basis of lateralization and have included considerations of its relevance in chicks and other species; please see lines 143-151 and 381-386.

      This paper is very interesting with its focus on why the L-R bias exists, and where and why it does not.

      We wish to thank the Reviewer again for his/her work.

      Reviewer #1 (Public review):

      (1) The Introduction needs to be edited to make it much more concise. The hypotheses (from line 67 to 81) and predictions (from line 107 to 124) must be thoroughly rephrased, because (a) general readers are not familiar with the hypotheses (emotional valence and BAFT), (b) the hypotheses may or may not be mutually exclusive, and therefore (c) the logical linkage between the hypotheses and the predicted results is not necessarily clear. Most general readers may be confused by the apparently complicated logical constructs of this study. Instead, it is recommended that the focus be placed on the functional contributions of brain lateralization to the cognitive development of number sense.

      We thank the Reviewer for these comments, which allowed us to improve the clarity of our hypotheses and predictions. We thoroughly rephrased them to ensure they are accessible to general readers and specified that the models may or may not be mutually exclusive. Additionally, we highlighted the functional contributions of brain lateralization to the cognitive development of number sense, addressing the suggested focal point. While we have shortened the introduction, we opted to retain essential background information to ensure readers are well-informed about the relevant scientific literature. Please review the entire introduction, particularly lines 84–118 and 218.

      (2) In relation to the above (a), abbreviations need to be reexamined. MNL (mental number line) appears early on lines 27 and 49, whereas the possibly related conceptual term SNA appeared first on line 213, without specification to "spatial numerical association".

      We thank the Reviewer for bringing this to our attention. We have addressed the suggestions, and the term SNA has been used specifically to refer to numerical spatialization in non-human animals. Please see lines 27-30.

      (3) By the way, what difference is there between MNL and SNA? Please specify the difference if it is important. If not important, is it possible that one of these two is consistently used in this report, at least in the Introduction?

      We clarified the distinction between MNL and SNA and have consistently used SNA in this report; please see lines 47-75.

      (4) In relation to the above (a and b), clarification of the hypotheses and their abbreviations in the form of a table or a graphical representation will strongly reinforce the general readers' understanding. It is also possible that some of these hypotheses are discussed later in the Discussion, rather than in Introduction.

      We appreciated this suggestion and have now clarified the hypotheses, also providing a table/graphical representation, aiming to enhance accessibility for general readers; please see lines 110-118, and 218.

      (5) Figures 1 and 2 are transparent and easily understandable; however, the statistical details in the Results may bother the readers as the main points are doubly represented in Figures 1, 2, and Table 1. These (statistics and Table 1) may go to the supplementary file, if the editor agrees.

      We would prefer to keep Table 1 and the statistical details as part of the main article to provide readers with a comprehensive overview of the experimental results. However, if the editors also suggest moving them to the supplementary file, we are open to making this adjustment.

      (6) In Figure 1D and E, and text lines 139-140. Figure 1D shows that the chick is looking monocularly with the right eye, but the text (line 139) says "left eye in use". Is this correct?

      We thank the reviewer for pointing out this incongruity. We have corrected the text to align with Figure 1D and E; please see lines 180-181.

      (7) Methods. The behavioral experiment was initiated on Wednesday (8 a.m.; line 479), but at what age? At what post-hatch day was the experiment terminated? A simple graphical illustration of the schedule will be quite helpful.

      We have added the requested details, specifying that experiments began on the third post-hatch day and ended on the fifth day; please see lines 533-539.

      Additionally, we have included a graphical illustration of the schedule to enhance clarity; please see line 666.  

      (8) Methods. How many chicks were excluded from the study in the course of Pre-training (line 525) and Training (line 535-536)? Was the exclusion rate high, or just negligible?

      We appreciate the reviewer's suggestion. We have now included the number of subjects excluded during the training phase; please see lines 593-597.

      We wish to thank the Reviewer again for his/her work.

    1. Reviewer #3 (Public review):

      Summary

      The paper presents an imaging and analysis pipeline for whole-mount gastruloid imaging with two-photon microscopy. The presented pipeline includes spectral unmixing, registration, segmentation, and a wavelength-dependent intensity normalization step, followed by quantitative analysis of spatial gene expression patterns and nuclear morphometry on a tissue level. The utility of the approach is demonstrated by several experimental findings, such as establishing spatial correlations between local nuclear deformation and tissue density changes, as well as the radial distribution pattern of mesoderm markers. The pipeline is distributed as a Python package, notebooks, and multiple napari plugins.

      Strengths

      The paper is well-written with detailed methodological descriptions, which I think would make it a valuable reference for researchers performing similar volumetric tissue imaging experiments (gastruloids/organoids). The pipeline itself addresses many practical challenges, including resolution loss within tissue, registration of large volumes, nuclear segmentation, and intensity normalization. Especially the intensity decay measurements and wavelength-dependent intensity normalization approach using nuclear (Hoechst) signal as reference are very interesting and should be applicable to other imaging contexts. The morphometric analysis is equally well done, with the correlation between nuclear shape deformation and tissue density changes being an interesting finding. The paper is quite thorough in its technical description of the methods (which are a lot), and their experimental validation is appropriate. Finally, the provided code and napari plugins seem to be well done (I installed a selected list of the plugins and they ran without issues) and should be very helpful for the community.

      Weaknesses

      I don't see any major weaknesses, and I would only have two issues that I think should be addressed in a revision:

      (1) The demonstration notebooks lack accompanying sample datasets, preventing users from running them immediately and limiting the pipeline's accessibility. I would suggest including a (selective) demo dataset that can be used to run the notebooks (e.g. for spectral unmixing) and/or providing easily accessible demo input sample data for the napari plugins (I saw that there is some sample data for the processing plugin, so this maybe could already be used for the notebooks?).

      (2) The results for the morphometric analysis (Figure 4) seem to be only shown in lateral (xy) views without the corresponding axial (z) views. I would suggest adding this to the figure and showing the density/strain/angle distributions for those axial views as well.

    2. Reviewer #2 (Public review):

      Summary:

      This study presents an integrated experimental and computational pipeline for high-resolution, quantitative imaging and analysis of gastruloids. The experimental module employs dual-view two-photon spectral imaging combined with optimized clearing and mounting techniques to image whole-mount immunostained gastruloids. This approach enables the acquisition of comprehensive 3D images that capture both tissue-scale and single-cell level information.

      The computational module encompasses both pre-processing of acquired images and downstream analysis, providing quantitative insights into the structural and molecular characteristics of gastruloids. The pre-processing pipeline, tailored for dual-view two-photon microscopy, includes spectral unmixing of fluorescence signals using depth-dependent spectral profiles, as well as image fusion via rigid 3D transformation based on content-based block-matching algorithms. Nuclei segmentation was performed using a custom-trained StarDist3D model, validated against 2D manual annotations, and achieving an F1 score of 85+/-3% at a 50% intersection-over-union (IoU) threshold. Another custom-trained StarDist3D model enabled accurate detection of proliferating cells and the generation of 3D spatial maps of nuclear density and proliferation probability. Moreover, the pipeline facilitates detailed morphometric analysis of cell density and nuclear deformation, revealing pronounced spatial heterogeneities during early gastruloid morphogenesis.
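
      For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of how an F1 score at a fixed IoU threshold can be computed from two label images. It is not the authors' evaluation code: the function names `iou_matrix` and `f1_at_iou`, the greedy one-to-one matching, and the toy arrays are all assumptions made for illustration.

```python
import numpy as np


def iou_matrix(gt, pred):
    """Pairwise IoU between labelled objects; label 0 is treated as background."""
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [j for j in np.unique(pred) if j != 0]
    iou = np.zeros((len(gt_ids), len(pred_ids)))
    for a, i in enumerate(gt_ids):
        g = gt == i
        for b, j in enumerate(pred_ids):
            p = pred == j
            union = np.logical_or(g, p).sum()
            iou[a, b] = np.logical_and(g, p).sum() / union if union else 0.0
    return iou


def f1_at_iou(gt, pred, threshold=0.5):
    """F1 score where a prediction counts as a true positive if IoU >= threshold."""
    iou = iou_matrix(gt, pred)
    n_gt, n_pred = iou.shape
    # Greedy one-to-one matching, highest IoU first (a hypothetical choice;
    # other matching strategies exist).
    pairs = sorted(
        ((iou[a, b], a, b) for a in range(n_gt) for b in range(n_pred)),
        reverse=True,
    )
    used_gt, used_pred, tp = set(), set(), 0
    for value, a, b in pairs:
        if value < threshold:
            break
        if a in used_gt or b in used_pred:
            continue
        used_gt.add(a)
        used_pred.add(b)
        tp += 1
    fp = n_pred - tp  # predicted objects without a matching annotation
    fn = n_gt - tp    # annotated objects that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


# Toy example: one nucleus is recovered exactly, the other is shifted by one pixel.
gt = np.zeros((6, 6), dtype=int)
gt[0:2, 0:2] = 1
gt[3:5, 3:5] = 2
pred = np.zeros((6, 6), dtype=int)
pred[0:2, 0:2] = 1
pred[3:5, 4:6] = 2
score = f1_at_iou(gt, pred, threshold=0.5)
```

      In this toy case the shifted object has an IoU of 1/3, so at a 50% threshold it counts as both a false positive and a false negative; lowering the threshold changes the score, which is why the threshold is always reported alongside F1.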

      All computational tools developed in this study are released as open-source, Python-based software.

      Strengths:

      The authors applied two-photon microscopy to whole-mount deep imaging of gastruloids, achieving in toto visualization at single-cell resolution. By combining spectral imaging with an unmixing algorithm, they successfully separated four fluorescent signals, enabling spatial analysis of gene expression patterns.

      The entire computational workflow, from image pre-processing to segmentation with a custom-trained StarDist3D model and subsequent quantitative analysis, is made available as open-source software. In addition, user-friendly interfaces are provided through the open-source, community-driven Napari platform, facilitating interactive exploration and analysis.

      Weaknesses:

      The computational module appears promising. However, the analysis pipeline has not been validated on datasets beyond those generated by the authors, making it difficult to assess its general applicability. Besides, the nuclei segmentation component lacks benchmarking against existing methods.

      Appraisal:

      The authors set out to establish a quantitative imaging and analysis pipeline for gastruloids using dual-view two-photon microscopy, spectral unmixing, and a custom computational framework for 3D segmentation and gene expression analysis. This aim is largely achieved. The integration of experimental and computational modules enables high-resolution in toto imaging and robust quantitative analysis at the single-cell level. The data presented support the authors' conclusions regarding the ability to capture spatial patterns of gene expression and cellular morphology across developmental stages.

      Impact and utility:

      This work presents a compelling and broadly applicable methodological advance. The approach is particularly impactful for the developmental biology community, as it allows researchers to extract quantitative information from high-resolution images to better understand morphogenetic processes. The data are publicly available on Zenodo, and the software is released on GitHub, making them highly valuable resources for the community.

    3. Reviewer #1 (Public review):

      Summary:

      The image analysis pipeline is tested in analysing microscopy imaging data of gastruloids of varying sizes, for which an optimised protocol for in toto image acquisition is established based on whole mount sample preparation using an optimal refractive index matched mounting media, opposing dual side imaging with two-photon microscopy for enhanced laser penetration, dual view registration, and weighted fusion for improved in toto sample data representation. For enhanced imaging speed in a two-photon microscope, parallel imaging was used, and the authors performed spectral unmixing analysis to avoid issues of signal cross-talk.

      In the image analysis pipeline, different pre-treatments are applied depending on the analysis to be performed (for nuclear segmentation - contrast enhancement and normalisation; for quantitative analysis of gene expression - corrections for optical artifacts inducing signal intensity variations). Stardist3D was used for the nuclear segmentation. The study analyses properties of gastruloid nuclear density, patterns of cell division, morphology, deformation, and gene expression.

      Strengths:

      The methods developed are sound, well described, and well-validated, using a sample challenging for microscopy, gastruloids. Many of the established methods are very useful (e.g. registration, corrections, signal normalisation, lazy loading bioimage visualisation, spectral decomposition analysis), facilitate the development of quantitative research, and would be of interest to the wider scientific community.

      Weaknesses:

      A recommendation should be added on when or under which conditions to use this pipeline.

    4. eLife Assessment

      This important study introduces a powerful imaging approach that enables deep-tissue visualization in gastruloids using two-photon microscopy, combined with spectral imaging and unmixing to achieve four-color 3D image acquisition. The evidence is compelling that many of the established methods are very helpful (e.g., registration, corrections, signal normalisation, lazy loading bioimage visualisation, spectral decomposition analysis), facilitate the development of quantitative research, and would be of interest to the wider scientific community.

    1. eLife Assessment

      The findings of this important study substantially advance our understanding of the transcription factors that can induce hair cell-like cells from human pluripotent stem cells. The presented evidence supporting these findings is compelling, including rigorous characterization of the effects of hair cell induction using both single-cell RNA sequencing and electrophysiological assessments.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Rainey et al investigated the effects of transcription factors, ATOH1, POU4F3, GFI1, and SIX1 on the induction of hair cells from human pluripotent stem cells. The authors used a doxycycline-inducible system to control transgene expression and demonstrated significant improvement in the efficiency of MYO7A+ hair cell differentiation compared to the retrovirus-mediated approach. Next, they characterized differentiated cells using single-cell RNA-seq and identified a population of hair cell-like cells with gene expression profiles similar to the fetal human vestibular hair cells. Finally, they revealed the electrophysiological properties of induced cells consistent with those of mechanosensitive hair cells.

      Strengths:

      A key finding in this study is the rapid induction of cells expressing multiple hair cell markers that takes place within 21 days after overexpression of the four transcription factors. Additionally, the authors demonstrate that doxycycline-mediated gene overexpression outperforms retroviral-mediated gene transfer in terms of both the efficiency and reproducibility of hair cell induction. Furthermore, the authors demonstrate that these induced hair cells can be used to study hair cell protection from cisplatin ototoxicity.

      Weaknesses:

      The authors conclude that the induced cells lack distinct hair cell subtypes. However, the characterization of generated hair cells in single-cell RNA-seq data is insufficient. Additional vestibular or cochlear hair cell-enriched marker gene and protein expression should be analyzed. Moreover, the morphological features and mechanotransduction channel activity of the induced hair cells have not been analyzed.

    3. Reviewer #2 (Public review):

      Summary:

      The study employs a specific set of transcription factors to promote lineage conversion of pluripotent stem cells into fetal hair cells. In pluripotent stem cells, an inducible expression system containing SIX1, ATOH1, POU4F3, and GFI1 (SAPG) was inserted into a safe harbor site. The stable cell line allows for doxycycline-inducible expression of transcription factors to generate induced hair cells (iHCs). These changes were observed in gene expression and electrophysiological properties. Comparing the transcriptome with iHCs derived from fibroblasts or primary human inner ear tissue suggested that it is similar to human hair cells. Although the iHCs did not have hair bundles - a key morphological feature of hair cells - the cellular system has immense potential for the field. The defined transcription factors allow for the dissection of gene regulatory networks and provide a molecular handle for the lineage conversion process. The results also suggest that the pluripotent stem cells were not directly converted into iHCs. Instead, there are several transitional cell states. These observations indicate that lineage conversion may still be hampered by yet undefined molecular obstacles and may help identify and overcome these in future work. The stable cell line allows for repeatable and large-scale screening studies, which is not feasible using primary human cells.

      Strengths:

      The cellular system is well-designed, with clearly described expression of the defined factors. Transient expression of the exogenous transcription factors SIX1, ATOH1, POU4F3, and GFI1 (SAPG) upon doxycycline induction is well-documented. Increased expression of endogenous SAPG factors suggests activation of self-regulatory feedback pathways during conversion. The stable iPS cell line provides a tool for the field to study lineage conversion or generate large numbers of iHCs.

      Single-nuclear RNA-seq distinguishes distinct cell clusters and cellular transition states, validating the system's utility. A comparison of previously published data from iHCs and human fetal hair cells also suggested that iHCs are similar to developing human hair cells at the transcriptome level. Whole-cell patch clamp recordings show the generation of excitable cells with heterogeneous ion channel properties, which suggests a change in the cell type.

      Weaknesses:

      The interpretation of the snRNA-seq results could be strengthened by explaining the three distinct clusters for uninduced cells and how they transition into the iHC trajectory.

      Although the analysis focuses on the cell cluster that represents iHCs (R5), a short discussion on what clusters R1-R4 (Figure 3B) represent would be useful. These cells do not express high levels of the SAPG factors even after 21 days of continuous doxycycline induction and may provide insight into hurdles that hamper lineage conversion.

      RNA velocity analysis on single-nuclear RNA-seq is impressive but requires clarification on inferring the pseudotime trajectory. Some rationale and explanation on how the ratio of unspliced to spliced mRNA in the nucleus can be used to infer the differentiation trajectory would strengthen the discussion.
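
      As background for the request above, the following is a minimal sketch of the commonly used steady-state formulation of RNA velocity, in which the unspliced-to-spliced ratio at equilibrium defines a per-gene degradation constant and deviations from it indicate induction or repression. It is a simplified illustration, not the analysis used in the manuscript: the function names, the least-squares fit through the origin, and the toy counts are assumptions.

```python
import numpy as np


def steady_state_gamma(unspliced, spliced):
    """Slope of unspliced vs spliced counts for one gene, fitted through the origin.

    Under the steady-state assumption u = gamma * s, so the slope estimates gamma.
    """
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    return float(u @ s / (s @ s))


def velocity(unspliced, spliced, gamma):
    """Per-cell velocity v = u - gamma * s (splicing rate normalised to 1)."""
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    return u - gamma * s


# Hypothetical counts for one gene in three cells assumed to be at steady state.
s_steady = [1.0, 2.0, 4.0]
u_steady = [0.5, 1.0, 2.0]
gamma = steady_state_gamma(u_steady, s_steady)

# A cell with an excess of unspliced transcripts relative to the steady-state line.
v_new = velocity([3.0], [2.0], gamma)
```

      A positive velocity means more unspliced transcript than expected at steady state, which is read as the gene being switched on; a negative value is read as the gene being switched off. Aggregating these per-gene signs and magnitudes across cells is what allows a direction along the differentiation trajectory to be inferred.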

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Robert N. Rainey et al. reported a new approach to induce hair cell-like cells from a human induced pluripotent stem cell line. The approach is based on the previously identified key transcription factors SIX1, ATOH1, POU4F3, and GFI1 (SAPG), which are essential for the conversion into induced hair cell-like cells in mice. The manuscript represents an advance over the authors' previous published work, which used the same transcription factors but viral gene delivery.

      Strengths:

      The manuscript is clear and well-written. The background is easy to follow for people outside of the field. The data are well-organized and well-described. The evidence is strong.

      Weaknesses:

      General comments:

      (1) The manuscript generated multiple valuable datasets for the field. However, the data are not deposited in the hearing field central resource for gene expression (umgear.org), and links are not provided in the figure legends to datasets or dataset collections in the gEAR. This is a major comment as it significantly decreases the utility of the datasets generated in the manuscript and decreases the ease of reuse of the data. This is a flaw that could be easily addressed by uploading the data and generating links to datasets in the body of the manuscript.

      (2) If a pulse of Dox induces the SAPG and starts the conversion process, it is not clear why the analyzed cells were treated for 21 days - a duration that can negatively affect the fate of converting hair cells.

      (3) Foxj1 is listed as a supporting cell-specific gene; however, it is expressed in the cochlear hair cells until the end of the first postnatal week.

      (4) It is not clear why cells were sorted for analysis of the retrovirally induced cells but not in the stable cell line, which also expressed tdTomato.

      (5) Figure 1D and Supplementary Figure 2: the authors state that the endogenous ATOH1 and POU4F3 expressions decrease after 7d. Should the authors have stats on the graphs?

      (6) Supplementary Figure 4: OCT4 should be replaced by POU5F1 (or vice versa) for consistency.

      (7) The authors show the induction or decrease of the exogenous transcription factor expressions by RT-qPCR. It would be nice, if possible, to also see either WB or immuno with antibodies directed against the tags.

      Bioinformatic comments:

      (1) In the previous study (Menendez et al. 2020), ATAC-seq and regulatory elements are employed in the analysis, while a similar analysis is missing in this study. It will be informative to show the motif enrichment analysis at promoter regions of differentially expressed genes (DEGs) in the most hair cell-like cluster 3 (RV-R3).

      (2) In the previous study (Menendez et al. 2020), it was stated that SAPG can convert supporting cells to hair cells, while in this study, the authors stated that "reprogramming with SAPG does not activate supporting cell networks in the stable cell line". Can the authors provide more analysis/comments on this difference?

      (3) The approach in this study tends to generate a very similar level of expression for the SAPG factors, while the real levels of expression might be different for actual transcriptional regulation, e.g., Figure 1C. How will this very close expression level of SAPG affect the features of the induced hair cell?

      (4) Figure 5B is missing a color bar to show the DEG strength in the heatmap. Why are Six1 and Gfi1 not shown in this heatmap?

    1. eLife Assessment

      This important study examines the relationship between cognition and mental health and investigates how brain, genetics, and environmental measures mediate that relationship. The methods and results are compelling and well-executed. Overall, this study will be of interest in the field of population neuroscience and in studies of mental health.

    2. Reviewer #1 (Public review):

      Summary:

      This work integrates two timepoints from the Adolescent Brain Cognitive Development Study to understand how neuroimaging, genetic and environmental data contribute to the predictive power of mental health variables in predicting cognition in a large early adolescent sample. Their multimodal and multivariate prediction framework involves a novel opportunistic stacking model to handle complex types of information to predict variables that are important in understanding mental health-cognitive performance associations.

      Strengths:

      The authors are commended for incorporating and directly comparing the contribution of multiple imaging modalities (task fMRI, resting state fMRI, diffusion MRI, structural MRI), neurodevelopmental markers, environmental factors and polygenic risk scores in a novel multivariate framework (via opportunistic stacking), as well as interpreting mental health-cognition associations with latent factors derived from Partial Least Squares. The authors also use a large well-characterized and diverse cohort of adolescents from the Adolescent Brain Cognitive Development (ABCD) Study. The paper is also strengthened by commonality analyses to understand the shared and unique contribution of different categories of factors (e.g., neuroimaging vs mental health vs polygenic scores vs sociodemographic and adverse developmental events) in explaining variance in cognitive performance.
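
      To make the idea of commonality analysis concrete, the following is a minimal sketch for the two-predictor-set case, where the variance explained jointly is split into a part unique to each set and a part they share. It is an illustration under stated assumptions, not the authors' implementation: the function names, the ordinary least squares fit, and the simulated data are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def commonality_two_sets(A, B, y):
    """Partition the jointly explained variance into unique and common parts."""
    r_a = r_squared(A, y)
    r_b = r_squared(B, y)
    r_ab = r_squared(np.column_stack([A, B]), y)
    return {
        "unique_A": r_ab - r_b,      # variance only A accounts for
        "unique_B": r_ab - r_a,      # variance only B accounts for
        "common": r_a + r_b - r_ab,  # variance A and B account for jointly
        "total": r_ab,
    }


# Simulated example: two correlated predictors of a hypothetical outcome.
rng = np.random.default_rng(0)
shared = rng.normal(size=500)
mental_health = shared + rng.normal(size=500)
neuroimaging = shared + rng.normal(size=500)
cognition = shared + 0.5 * neuroimaging + rng.normal(size=500)
parts = commonality_two_sets(mental_health, neuroimaging, cognition)
```

      By construction the three components sum to the total R², so the "common" term shows how much of the association between one predictor set and the outcome is also captured by the other set. With more than two sets the same logic extends to all subsets of predictors.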

      Weaknesses:

      The paper is framed with an over-reliance on the RDoC framework in the introduction, despite deviations from the RDoC framework in the methods. The field is also learning more about RDoC's limitations when mapping cognitive performance to biology. The authors also focus on a single general factor of cognition as the core outcome of interest as opposed to different domains of cognition. The authors could consider predicting mental health rather than cognition. Using mental health as a predictor could be limited by the included 9-11 year age range at baseline (where mental health concerns are likely to be low or not well captured), as well as the nature of how the data was collected, i.e., either by self-report or from parent/caregiver report.

      Comments on revisions:

      The authors have done an excellent job of addressing my comments. I have no other suggestions to add. Great work!

    3. Reviewer #2 (Public review):

      Summary:

      This paper by Wang et al. uses rich brain, behaviour, and genetics data from the ABCD cohort to ask how well cognitive abilities can be predicted from mental health-related measures, and how brain and genetics influence that prediction. They obtain an out-of-sample correlation of 0.4, with neuroimaging (in particular task fMRI) proving the key mediator. Polygenic scores contributed less.

      Strengths:

      This paper is characterized by the intelligent use of a superb sample (ABCD) alongside strong statistical learning methods and a clear set of questions. The outcome - the moderate level of prediction between brain, cognition, genetics and mental health - is interesting, and particularly important is the dissection of which features best mediate that prediction and how developmental and lifestyle factors play a role.

      Weaknesses:

      There are relatively few weaknesses to this paper. It has already undergone review at a different journal, and the authors clearly took the original set of comments into account in revising their paper. Overall, while the ABCD sample is superb for the questions asked, it would have been highly informative to extend the analyses to datasets containing more participants with neurological/psychiatric diagnoses (e.g. HBN, POND) or extending it into adolescent/early adult onset psychopathology cohorts. But it is fair enough that the authors want to leave that for future work.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This work integrates two timepoints from the Adolescent Brain Cognitive Development (ABCD) Study to understand how neuroimaging, genetic, and environmental data contribute to the predictive power of mental health variables in predicting cognition in a large early adolescent sample. Their multimodal and multivariate prediction framework involves a novel opportunistic stacking model to handle complex types of information to predict variables that are important in understanding mental health-cognitive performance associations. 

      Strengths: 

      The authors are commended for incorporating and directly comparing the contribution of multiple imaging modalities (task fMRI, resting state fMRI, diffusion MRI, structural MRI), neurodevelopmental markers, environmental factors, and polygenic risk scores in a novel multivariate framework (via opportunistic stacking), as well as interpreting mental health-cognition associations with latent factors derived from partial least squares. The authors also use a large well-characterized and diverse cohort of adolescents from the ABCD Study. The paper is also strengthened by commonality analyses to understand the shared and unique contribution of different categories of factors (e.g., neuroimaging vs mental health vs polygenic scores vs sociodemographic and adverse developmental events) in explaining variance in cognitive performance.

      Weaknesses: 

      The paper is framed with an over-reliance on the RDoC framework in the introduction, despite deviations from the RDoC framework in the methods. The field is also learning more about RDoC's limitations when mapping cognitive performance to biology. The authors also focus on a single general factor of cognition as the core outcome of interest as opposed to different domains of cognition. The authors could consider predicting mental health rather than cognition. Using mental health as a predictor could be limited by the included 9-11 year age range at baseline (where many mental health concerns are likely to be low or not well captured), as well as the nature of how the data was collected, i.e., either by self-report or from parent/caregiver report. 

      Thank you so much for your encouragement.

      We appreciate your comments on the strengths of our manuscript.

      Regarding the weaknesses, the reliance on the RDoC framework is by design. Even with its limitations, following RDoC allows us to investigate mental health holistically. In our case, RDoC enabled us to focus on a) a functional domain (i.e., cognitive ability), b) the biological units of analysis of this functional domain (i.e., neuroimaging and polygenic scores), c) potential contribution of environments, and d) the continuous individual deviation in this domain (as opposed to distinct categories). We are unaware of any framework with all these four features.

      Focusing on modelling biological units of analysis of a functional domain, as opposed to mental health per se, has some empirical support from the literature. For instance, in Marek and colleagues’ (2022) study, as mentioned by a previous reviewer, fMRI is shown to have a more robust prediction for cognitive ability than mental health. Accordingly, our reasons for predicting cognitive ability instead of mental health in this study are motivated theoretically (i.e., through RDoC) and empirically (i.e., through fMRI findings). We have clarified this reason in the introduction of the manuscript.

      We are aware of the debates surrounding the actual structure of functional domains where the originally proposed RDoC’s specific constructs might not fit the data as well as the data-driven approach (Beam et al., 2021; Quah et al., 2025). However, we consider this debate as an attempt to improve the characterisation of functional domains of RDoC, not an effort to invalidate its holistic, neurobiological and basic-functioning approach. Our use of a latent-variable modelling approach through factor analyses moves towards a data-driven direction. We made the changes to the second-to-last paragraph in the introduction to make this point clear:

      “In this study, inspired by RDoC, we a) focused on cognitive abilities as a functional domain, b) created predictive models to capture the continuous individual variation (as opposed to distinct categories) in cognitive abilities, c) computed two neurobiological units of analysis of cognitive abilities: multimodal neuroimaging and PGS, and d) investigated the potential contributions of environmental factors. To operationalise cognitive abilities, we estimated a latent variable representing behavioural performance across various cognitive tasks, commonly referred to as general cognitive ability or the g-factor (Deary, 2012). The g-factor was computed from various cognitive tasks pertinent to RDoC constructs, including attention, working memory, declarative memory, language, and cognitive control. However, using the g-factor to operationalise cognitive abilities caused this study to diverge from the original conceptualisation of RDoC, which emphasises studying separate constructs within cognitive abilities (Morris et al., 2022; Morris & Cuthbert, 2012). Recent studies suggest an improvement to the structure of functional domains by including a general factor, such as the g-factor, in the model, rather than treating each construct separately (Beam et al., 2021; Quah et al., 2025). The g-factor in children is also longitudinally stable and can forecast future health outcomes (Calvin et al., 2017; Deary et al., 2013). Notably, our previous research found that neuroimaging predicts the g-factor more accurately than predicting performance from separate individual cognitive tasks (Pat et al., 2023). Accordingly, we decided to conduct predictive models on the g-factor while keeping the RDoC’s holistic, neurobiological, and basic-functioning characteristics.”

      Reviewer #2 (Public review):

      Summary: 

      This paper by Wang et al. uses rich brain, behaviour, and genetics data from the ABCD cohort to ask how well cognitive abilities can be predicted from mental-health-related measures, and how brain and genetics influence that prediction. They obtain an out-of-sample correlation of 0.4, with neuroimaging (in particular task fMRI) proving the key mediator. Polygenic scores contributed less.

      Strengths: 

      This paper is characterized by the intelligent use of a superb sample (ABCD) alongside strong statistical learning methods and a clear set of questions. The outcome - the moderate level of prediction between the brain, cognition, genetics, and mental health - is interesting. Particularly important is the dissection of which features best mediate that prediction and how developmental and lifestyle factors play a role. 

      Thank you so much for the encouragement. 

      Weaknesses: 

      There are relatively few weaknesses to this paper. It has already undergone review at a different journal, and the authors clearly took the original set of comments into account in revising their paper. Overall, while the ABCD sample is superb for the questions asked, it would have been highly informative to extend the analyses to datasets containing more participants with neurological/psychiatric diagnoses (e.g. HBN, POND) or extend it into adolescent/early adult onset psychopathology cohorts. But it is fair enough that the authors want to leave that for future work. 

      Thank you very much for providing this valuable comment and for your flexibility.

      For the current manuscript, we have drawn inspiration from the RDoC framework, which emphasises the variation from normal to abnormal in normative samples (Morris et al., 2022). The ABCD samples align well with this framework.

      We hope to extend this framework to include participants with neurological and psychiatric diagnoses in the future. We have begun applying neurobiological units of analysis for cognitive abilities, assessed through multimodal neuroimaging and polygenic scores (PGS), to other datasets containing more participants with neurological and psychiatric diagnoses. However, this is beyond the scope of the current manuscript. We have listed this as one of the limitations in the discussion section:

      “Similarly, our ABCD samples were young and community-based, likely limiting the severity of their psychopathological issues (Kessler et al., 2007). Future work needs to test if the results found here are generalisable to adults and participants with stronger severity.”

      In terms of more practical concerns, much of the paper relies on comparing r or R2 measures between different tests. These are always presented as point estimates without uncertainty. There would be some value, I think, in incorporating uncertainty from repeated sampling to better understand the improvements/differences between the reported correlations. 

      This is a good suggestion. We have now included bootstrapped 95% confidence intervals in all of our scatter plots, showing the uncertainty of predictive performance.
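
      As a rough illustration of what such an interval involves, the sketch below computes a percentile bootstrap 95% confidence interval for the correlation between observed and predicted scores. It is a minimal example on simulated data, not the authors' code: the function names (`pearson_r`, `bootstrap_ci`), the number of resamples, the percentile method, and the simulated effect size are all assumptions made for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(observed, predicted, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for r (hypothetical helper).

    Test participants are resampled with replacement, r is recomputed
    on each resample, and the alpha/2 and 1 - alpha/2 percentiles of
    the resulting distribution are returned.
    """
    rng = random.Random(seed)
    n = len(observed)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson_r([observed[i] for i in idx],
                               [predicted[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated held-out scores standing in for real test-set data.
rng = random.Random(1)
observed = [rng.gauss(0, 1) for _ in range(300)]
predicted = [0.4 * o + rng.gauss(0, 1) for o in observed]

r = pearson_r(observed, predicted)
ci_low, ci_high = bootstrap_ci(observed, predicted)
print(f"r = {r:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

      Two models can then be compared by checking how far their intervals overlap, or, more directly, by bootstrapping the difference between their correlations on the same resampled participants.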

      The focus on mental health in a largely normative sample leads to the predictions being largely based on the normal range. It would be interesting to subsample the data and ask how well the extremes are predicted. 

      We appreciate this comment. Similar to our response to Reviewer 2’s Weakness #1, our approach has drawn inspiration from the RDoC framework, which emphasises the variation from normal to abnormal in normative samples (Morris et al., 2022). Subsampling the data would make us deviate from our original motivation. 

      Moreover, we used 17 mental health variables in our predictive models: 8 CBCL subscales, 4 BIS/BAS subscales and 5 UPPS subscales. It is difficult to subsample them. Perhaps a better approach is to test the applicability of our neurobiological units of analysis for cognitive abilities (multimodal neuroimaging and PGS) in other datasets that include more extreme samples. We are working on this line of studies at the moment, and hope to show that in our future work.

      Reviewer 2’s Weakness #4

      A minor query - why are only cortical features shown in Figure 3? 

      We presented both cortical and subcortical features in Figure 3. The cortical features are shown on the surface space, while the subcortical features are displayed on the coronal plane. Below is an example of these cortical and subcortical features from the ENBack contrast. The subcortical features are presented in the far-right coronal image.

      We separated the presentation of cortical and subcortical features because the ABCD uses the CIFTI format (https://www.humanconnectome.org/software/workbenchcommand/-cifti-help). CIFTI-format images combine cortical surface (in vertices) with subcortical volume (in voxels). For task fMRI, the ABCD parcellated cortical vertices using Freesurfer’s Destrieux atlas and subcortical voxels using Freesurfer’s automatically segmented brain volume (ASEG).

      Due to the size of the images in Figure 3, it may have been difficult for Reviewer 2 to see the subcortical features clearly. We have now added zoomed-in versions of this figure as Supplementary Figures 4–13.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the abstract, could the authors mention which imaging modalities contribute most to the prediction of cognitive abilities (e.g., working memory-related task fMRI)? 

      Thank you for the suggestion. Following this advice, we now mention which imaging modalities led to the highest predictive performance. Please see the abstract below.

      “Cognitive abilities are often linked to mental health across various disorders, a pattern observed even in childhood. However, the extent to which this relationship is represented by different neurobiological units of analysis, such as multimodal neuroimaging and polygenic scores (PGS), remains unclear. 

      Using large-scale data from the Adolescent Brain Cognitive Development (ABCD) Study, we first quantified the relationship between cognitive abilities and mental health by applying multivariate models to predict cognitive abilities from mental health in children aged 9-10, finding an out-of-sample r = .36. We then applied similar multivariate models to predict cognitive abilities from multimodal neuroimaging, polygenic scores (PGS) and environmental factors. Multimodal neuroimaging was based on 45 types of brain MRI (e.g., task fMRI contrasts, resting-state fMRI, structural MRI, and diffusion tensor imaging). Among these MRI types, the fMRI contrast, 2-Back vs. 0-Back, from the ENBack task provided the highest predictive performance (r = .4). Combining information across all 45 types of brain MRI led to the predictive performance of r = .54. The PGS, based on previous genome-wide association studies on cognitive abilities, achieved a predictive performance of r = .25. Environmental factors, including socio-demographics (e.g., parent’s income and education), lifestyles (e.g., extracurricular activities, sleep) and developmental adverse events (e.g., parental use of alcohol/tobacco, pregnancy complications), led to a predictive performance of r = .49.

      In a series of separate commonality analyses, we found that the relationship between cognitive abilities and mental health was primarily represented by multimodal neuroimaging (66%) and, to a lesser extent, by PGS (21%). Additionally, environmental factors accounted for 63% of the variance in the relationship between cognitive abilities and mental health. The multimodal neuroimaging and PGS then explained 58% and 21% of the variance due to environmental factors, respectively. Notably, these patterns remained stable over two years. 

      Our findings underscore the significance of neurobiological units of analysis for cognitive abilities, as measured by multimodal neuroimaging and PGS, in understanding both a) the relationship between cognitive abilities and mental health and b) the variance in this relationship shared with environmental factors.”
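
      The commonality analyses summarised in the abstract above partition explained variance into unique and shared components. The sketch below shows the two-predictor case on simulated data. It is only an illustration under stated assumptions: the function `commonality`, the variable names, and the simulated latent structure are hypothetical, and the in-sample R² used here is a simplification that may differ from the authors' procedure.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def commonality(y, a, b):
    """Two-predictor commonality analysis (hypothetical helper).

    R^2 of the joint model y ~ a + b is obtained in closed form from
    the three pairwise correlations, then split into the part unique
    to a, the part unique to b, and the part common to both.
    """
    r_ya, r_yb, r_ab = pearson_r(y, a), pearson_r(y, b), pearson_r(a, b)
    r2_a = r_ya ** 2
    r2_b = r_yb ** 2
    r2_ab = (r2_a + r2_b - 2 * r_ya * r_yb * r_ab) / (1 - r_ab ** 2)
    return {
        "unique_a": r2_ab - r2_b,
        "unique_b": r2_ab - r2_a,
        "common": r2_a + r2_b - r2_ab,
        "total": r2_ab,
    }


# Simulated data: one latent source drives all three variables.
rng = random.Random(7)
latent = [rng.gauss(0, 1) for _ in range(2000)]
cognition = [z + rng.gauss(0, 1) for z in latent]
mental_health_proxy = [z + rng.gauss(0, 1) for z in latent]
neuroimaging_proxy = [z + rng.gauss(0, 1) for z in latent]

parts = commonality(cognition, mental_health_proxy, neuroimaging_proxy)

# Share of the cognition-mental health relationship that is also
# captured by the neuroimaging proxy.
r2_mental_health = parts["unique_a"] + parts["common"]
shared_fraction = parts["common"] / r2_mental_health
print(parts, round(shared_fraction, 2))
```

      The quantity `shared_fraction` is the kind of percentage the abstract reports: the portion of variance that mental health explains in cognitive abilities that another proxy also explains.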

      (2) Could the authors clarify what they mean by "completing the transdiagnostic aetiology of mental health" in the introduction? (Second paragraph). 

      Thank you. 

      We intended to convey that understanding the transdiagnostic aetiology of mental health would be enhanced by knowing how neurobiological units of cognitive abilities, from the brain to genes, capture variations due to environmental factors. We realise this sentence might be confusing. Removing it does not alter the intended meaning of the paragraph, as we clarified this point later. The paragraph now reads:

      “According to the National Institute of Mental Health’s Research Domain Criteria (RDoC) framework (Insel et al., 2010), cognitive abilities should be investigated not only behaviourally but also neurobiologically, from the brain to genes. It remains unclear to what extent the relationship between cognitive abilities and mental health is represented in part by different neurobiological units of analysis -- such as neural and genetic levels measured by multimodal neuroimaging and polygenic scores (PGS). To fully comprehend the role of neurobiology in the relationship between cognitive abilities and mental health, we must also consider how these neurobiological units capture variations due to environmental factors, such as sociodemographics, lifestyles, and childhood developmental adverse events (Morris et al., 2022). Our study investigated the extent to which a) environmental factors explain the relationship between cognitive abilities and mental health, and b) cognitive abilities at the neural and genetic levels capture these associations due to environmental factors. Specifically, we conducted these investigations in a large normative group of children from the ABCD study (Casey et al., 2018). We chose to examine children because, while their emotional and behavioural problems might not meet full diagnostic criteria (Kessler et al., 2007), issues at a young age often forecast adult psychopathology (Reef et al., 2010; Roza et al., 2003). Moreover, the associations among different emotional and behavioural problems in children reflect transdiagnostic dimensions of psychopathology (Michelini et al., 2019; Pat et al., 2022), making children an appropriate population to study the transdiagnostic aetiology of mental health, especially within a framework that emphasises normative variation from normal to abnormal, such as the RDoC (Morris et al., 2022).”

      (3) It is unclear to me what the authors mean by this statement in the introduction: "Note that using the word 'proxy measure' does not necessarily mean that the predictive model for a particular measure has a high predictive performance - some proxy measures have better predictive performance than others". 

      We added this sentence to address a previous reviewer’s comment: “The authors use the phrasing throughout 'proxy measures of cognitive abilities' when they discuss PRS, neuroimaging, sociodemographics/lifestyle, and developmental factors. Indeed, the authors are able to explain a large proportion of variance with different combinations of these measures, but I think it may be a leap to call all of these proxy measures of cognition. I would suggest keeping the language more objective and stating these measures are associated with cognition.” 

      Because of this comment, we assumed that the reviewers wanted us to avoid the misinterpretation that a proxy measure implies high predictive performance. This term is used in machine learning literature (for instance, Dadi et al., 2021). We added the aforementioned sentence to reassure readers that using the term 'proxy measure' does not necessarily mean that the predictive model for a particular measure has high predictive performance. However, it seems that our intention led to an even more confusing message. Therefore, we decided to delete that sentence but keep an earlier sentence that explains the meaning of a proxy measure (see below).

      “With opportunistic stacking, we created a ‘proxy’ measure of cognitive abilities (i.e., predicted value from the model) at the neural unit of analysis using multimodal neuroimaging.”
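
      For readers unfamiliar with how stacking can use participants who lack some modalities, the sketch below shows one missing-value coding step that has been described for opportunistic stacking in the machine learning literature. It is assumed here for illustration and is not taken from the manuscript: the function name `code_missing`, the constant `big`, and the toy predictions are hypothetical, and the second-level learner that would produce the final proxy measure is deliberately left out.

```python
def code_missing(stage1_predictions, big=1000.0):
    """Duplicate each modality column and code missing values twice.

    `stage1_predictions` holds one row per participant; each entry is
    the first-level prediction from one modality, or None when that
    participant has no usable data for the modality. Every column is
    emitted twice: once with missing values set to +big and once with
    missing values set to -big, so a tree-based second-level learner
    can isolate missing cases with a single split.
    """
    coded = []
    for row in stage1_predictions:
        high = [big if p is None else p for p in row]
        low = [-big if p is None else p for p in row]
        coded.append(high + low)
    return coded


# Toy first-level predictions for three participants and two
# modalities (say, task fMRI and resting-state fMRI).
stage1 = [
    [0.20, -0.10],   # both modalities available
    [0.35, None],    # second modality missing
    [None, 0.05],    # first modality missing
]

for row in code_missing(stage1):
    print(row)
```

      The coded matrix would then be passed to a second-level model, for example a random forest, whose predicted values serve as the proxy measure; no participant is dropped merely because one modality is missing.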

      (4) Overall, despite comments from reviewers at another journal, I think the authors still refer to RDoC more than needed in the intro given the restructuring of the manuscript. For instance, at the end of page 4 and top of page 5, it becomes a bit confusing when the authors mention how they deviated from the RDoC framework, but their choice of cognitive domains is still motivated by RDoC. I think the chosen cognitive constructs are consistent with what is in ABCD and what other studies have incorporated into the g factor and do not require the authors to further justify their choice through RDoC. Also, there is emerging work showing that RDoC is limited in its ability to parse apart meaningful neuroimaging-based patterns; see for instance, Quah et al., Nature Communications 2025 (https://doi.org/10.1038/s41467-025-55831-z).

      Thank you very much for your comment. We have addressed it in our Response to Reviewer 1’s summary, strengths, and weaknesses above. We have rewritten the paragraph to clarify the relevance of our work to the RDoC framework and to recent studies aiming to improve RDoC constructs (including that from Quah and colleagues).

      (5) I am still on the fence about the use of 'proxy measures of cognitive abilities' given that it is defined as the predictive performance of mental health measures in predicting cognition - what about just calling these mental health predictors? Also, it would be easier to follow this train of thought throughout the manuscript. But I leave it to the authors if they decide to keep their current language of 'proxy measure of cognition'. 

      Thank you so much for your flexibility. As we explained previously, this ‘proxy measures’ term is used in machine learning literature (for instance, Dadi et al., 2021). We thought about other terms, such as “score”, which is used in genetics, i.e., polygenic scores (Choi et al., 2020), and has recently been used in neuroimaging, i.e., neuroscore (Rodrigue et al., 2024). However, using a ‘score’ is a bit awkward for mental health and socio-demographics, lifestyle and developmental adverse events. Accordingly, we decided to keep the term ‘proxy measures’.

      (6) It is unclear which cognitive abilities are being predicted in Figure 1, given the various domains that authors describe in their intro. Is it the g-factor from CFA? This should be clarified in all figure captions. 

      Yes, cognitive abilities are operationalised using a second-order latent variable, the g-factor from a CFA. We have now added the following sentence to Figures 1, 2, and 4 to make this point clearer. Thank you for the suggestion:

      “Cognitive abilities are based on the second-order latent variable, the g-factor, based on a confirmatory factor analysis of six cognitive tasks.”

      (7) I think it may also be worthwhile to showcase the explanatory power cognitive abilities have in predicting mental health or at least comment on this in the discussion. Certainly, there may be a bidirectional relationship here. The prediction direction from cognition to mental health may be an altogether different objective than what the paper currently presents, but many researchers working in psychiatry may take the stance (with support from the literature) that cognitive performance may serve as premorbid markers for later mental health concerns, particularly given the age range that the authors are working with in ABCD. 

      Thank you for this comment. 

      It is important to note that we do not make a directional claim in these cross-sectional analyses. The term "prediction" is used in a machine learning sense, implying only that we made an out-of-sample prediction (Yarkoni & Westfall, 2017). Specifically, we built predictive models on some samples (i.e., training participants) and applied our models to test participants who were not part of the model-building process. Accordingly, our predictive models cannot determine whether mental health “causes” cognitive abilities or vice versa, regardless of whether we treat mental health or cognitive abilities as feature/explanatory/independent variables or as target/response/outcome variables in the models. To demonstrate directionality, we would need to conduct a longitudinal analysis with many more repeated samples and use appropriate techniques, such as a cross-lagged panel model. It is beyond the scope of this manuscript and will need future releases of the ABCD data.
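
      The point about out-of-sample prediction and directionality can be made concrete with a small simulation. The sketch below is a minimal illustration, not the authors' pipeline: it fits a one-feature linear model on training participants, evaluates it on held-out participants, and then swaps the roles of feature and target. The function names, the 80/20 split, and the simulated effect size are assumptions.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares for one feature: returns slope, intercept."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def out_of_sample_r(feature, target, train_idx, test_idx):
    """Fit on training participants only, then score held-out ones."""
    slope, intercept = fit_line([feature[i] for i in train_idx],
                                [target[i] for i in train_idx])
    predicted = [slope * feature[i] + intercept for i in test_idx]
    observed = [target[i] for i in test_idx]
    return pearson_r(observed, predicted)


# Simulated scores; the generating direction is arbitrary.
rng = random.Random(42)
n = 1000
mental_health = [rng.gauss(0, 1) for _ in range(n)]
cognition = [0.5 * m + rng.gauss(0, 1) for m in mental_health]

indices = list(range(n))
rng.shuffle(indices)
train_idx, test_idx = indices[:800], indices[800:]

r_forward = out_of_sample_r(mental_health, cognition, train_idx, test_idx)
r_backward = out_of_sample_r(cognition, mental_health, train_idx, test_idx)
print(round(r_forward, 3), round(r_backward, 3))
```

      In this single-feature case the held-out correlation is the same whichever variable is treated as the target, which is why predictive performance alone says nothing about which variable influences the other.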

      We decided to use cognitive abilities as a target variable here, rather than a feature variable, mainly for theoretical reasons. This work was inspired by the RDoC framework, which emphasises functional domains. Cognitive abilities is the functional domain in the current study. We created predictive models to predict cognitive abilities based on a) mental health, b) multimodal neuroimaging, c) polygenic scores, and d) environmental factors. We could not treat cognitive abilities as a functional domain if we used them as a feature variable. For instance, if we predicted mental health (instead of cognitive abilities) from multimodal neuroimaging and polygenic scores, we would no longer capture the neurobiological units of analysis for cognitive abilities.

      We have now made it clearer in the discussion that our predictive models cannot establish the directionality of the effects:

      “Our predictive modelling revealed a medium-sized predictive relationship between cognitive abilities and mental health. This finding aligns with recent meta-analyses of case-control studies that link cognitive abilities and mental disorders across various psychiatric conditions (Abramovitch et al., 2021; East-Richard et al., 2020). Unlike previous studies, we estimated the predictive, out-of-sample relationship between cognitive abilities and mental disorders in a large normative sample of children. Although our predictive models, like other cross-sectional models, cannot determine the directionality of the effects, the strength of the relationship between cognitive abilities and mental health estimated here should be more robust than when calculated using the same sample as the model itself, known as in-sample prediction/association (Marek et al., 2022; Yarkoni & Westfall, 2017). Examining the PLS loadings of our predictive models revealed that the relationship was driven by various aspects of mental health, including thought and externalising symptoms, as well as motivation. This suggests that there are multiple pathways—encompassing a broad range of emotional and behavioural problems and temperaments—through which cognitive abilities and mental health are linked.”

      (8) There is a lot of information packed into Figure 3 in the brain maps; I understand the authors wanted to fit this onto one page, and perhaps a higher resolution figure would resolve this, but the brain maps are very hard to read and/or compare, particularly the coronal sections. 

      Thank you for this suggestion. We agree with Reviewer 1 that we need to have a better visualisation of the feature-importance brain maps. To ensure that readers can clearly see the feature importance, we added a zoomed-in version of the feature-importance brain maps as Supplementary Figures 4–13.

      (9) It would be helpful for authors to cluster features in the resting state functional connectivity correlation matrices, and perhaps use shorter names/acronyms for the labels. 

      Thank you for this suggestion. 

      We have now added a zoomed-in version of the feature importance for rs-fMRI as Supplementary Figure 7 (for baseline) and 12 (for follow-up).

      (10) Figures 4a) and 4b): please elaborate on "developmental adverse" in the title. I am assuming this is referring to childhood adverse events, or "developmental adversities". 

      Thank you so much for pointing this out. We meant ‘developmental adverse events’. We have made changes to this figure in the current manuscript.

      (11) For the "follow-up" analyses, I would recommend the authors present this using only the features that are indeed available at follow-up, even if the list of features is lower, otherwise it becomes a bit confusing with the mix of baseline and follow-up features. Or perhaps the authors could make this more clear in the figures by perhaps having a different color for baseline vs follow-up features along the y-axis labels. 

      Thank you for this advice. We have now added an indicator in the plot to show whether the features were collected in the baseline or follow-up. We also added colours to indicate which type of environmental factors they were. It is now clear that the majority of the features that were collected at baseline, but were used for the follow-up predictive model, were developmental adverse events.

      (12) Minor: Makowski et al 2023 reference can be updated to Makowski et al 2024, published in Cerebral Cortex. 

      Thank you for pointing this out. We have updated the citation accordingly. 

      References

      Abramovitch, A., Short, T., & Schweiger, A. (2021). The C Factor: Cognitive dysfunction as a transdiagnostic dimension in psychopathology. Clinical Psychology Review, 86, 102007. https://doi.org/10.1016/j.cpr.2021.102007

      Beam, E., Potts, C., Poldrack, R. A., & Etkin, A. (2021). A data-driven framework for mapping domains of human neurobiology. Nature Neuroscience, 24(12), 1733–1744. https://doi.org/10.1038/s41593-021-00948-9

      Calvin, C. M., Batty, G. D., Der, G., Brett, C. E., Taylor, A., Pattie, A., Čukić, I., & Deary, I. J. (2017). Childhood intelligence in relation to major causes of death in 68 year follow-up: Prospective population study. BMJ, j2708. https://doi.org/10.1136/bmj.j2708

      Casey, B. J., Cannonier, T., Conley, M. I., Cohen, A. O., Barch, D. M., Heitzeg, M. M., Soules, M. E., Teslovich, T., Dellarco, D. V., Garavan, H., Orr, C. A., Wager, T. D., Banich, M. T., Speer, N. K., Sutherland, M. T., Riedel, M. C., Dick, A. S., Bjork, J. M., Thomas, K. M., … ABCD Imaging Acquisition Workgroup. (2018). The Adolescent Brain Cognitive Development (ABCD) study: Imaging acquisition across 21 sites. Developmental Cognitive Neuroscience, 32, 43–54. https://doi.org/10.1016/j.dcn.2018.03.001

      Choi, S. W., Mak, T. S.-H., & O’Reilly, P. F. (2020). Tutorial: A guide to performing polygenic risk score analyses. Nature Protocols, 15(9), Article 9. https://doi.org/10.1038/s41596-020-0353-1

      Dadi, K., Varoquaux, G., Houenou, J., Bzdok, D., Thirion, B., & Engemann, D. (2021). Population modeling with machine learning can enhance measures of mental health. GigaScience, 10(10), giab071. https://doi.org/10.1093/gigascience/giab071

      Deary, I. J. (2012). Intelligence. Annual Review of Psychology, 63(1), 453–482. https://doi.org/10.1146/annurev-psych-120710-100353

      Deary, I. J., Pattie, A., & Starr, J. M. (2013). The Stability of Intelligence From Age 11 to Age 90 Years: The Lothian Birth Cohort of 1921. Psychological Science, 24(12), 2361–2368. https://doi.org/10.1177/0956797613486487

      East-Richard, C., R. -Mercier, A., Nadeau, D., & Cellard, C. (2020). Transdiagnostic neurocognitive deficits in psychiatry: A review of meta-analyses. Canadian Psychology / Psychologie Canadienne, 61(3), 190–214. https://doi.org/10.1037/cap0000196

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Kessler, R. C., Amminger, G. P., Aguilar-Gaxiola, S., Alonso, J., Lee, S., & Üstün, T. B. (2007). Age of onset of mental disorders: A review of recent literature. Current Opinion in Psychiatry, 20(4). https://journals.lww.com/co-psychiatry/fulltext/2007/07000/age_of_onset_of_mental_disorders_a_review_of.10.aspx

      Marek, S., Tervo-Clemmens, B., Calabro, F. J., Montez, D. F., Kay, B. P., Hatoum, A. S., Donohue, M. R., Foran, W., Miller, R. L., Hendrickson, T. J., Malone, S. M., Kandala, S., Feczko, E., Miranda-Dominguez, O., Graham, A. M., Earl, E. A., Perrone, A. J., Cordova, M., Doyle, O., … Dosenbach, N. U. F. (2022). Reproducible brain-wide association studies require thousands of individuals. Nature, 603(7902), 654–660. https://doi.org/10.1038/s41586-022-04492-9

      Michelini, G., Barch, D. M., Tian, Y., Watson, D., Klein, D. N., & Kotov, R. (2019). Delineating and validating higher-order dimensions of psychopathology in the Adolescent Brain Cognitive Development (ABCD) study. Translational Psychiatry, 9(1), 261. https://doi.org/10.1038/s41398-019-0593-4

      Morris, S. E., & Cuthbert, B. N. (2012). Research Domain Criteria: Cognitive systems, neural circuits, and dimensions of behavior. Dialogues in Clinical Neuroscience, 14(1), 29–37.

      Morris, S. E., Sanislow, C. A., Pacheco, J., Vaidyanathan, U., Gordon, J. A., & Cuthbert, B. N. (2022). Revisiting the seven pillars of RDoC. BMC Medicine, 20(1), 220. https://doi.org/10.1186/s12916-022-02414-0

      Pat, N., Riglin, L., Anney, R., Wang, Y., Barch, D. M., Thapar, A., & Stringaris, A. (2022). Motivation and Cognitive Abilities as Mediators Between Polygenic Scores and Psychopathology in Children. Journal of the American Academy of Child and Adolescent Psychiatry, 61(6), 782-795.e3. https://doi.org/10.1016/j.jaac.2021.08.019

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2023). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, 33(6), 2682–2703. https://doi.org/10.1093/cercor/bhac235

      Quah, S. K. L., Jo, B., Geniesse, C., Uddin, L. Q., Mumford, J. A., Barch, D. M., Fair, D. A., Gotlib, I. H., Poldrack, R. A., & Saggar, M. (2025). A data-driven latent variable approach to validating the research domain criteria framework. Nature Communications, 16(1), 830. https://doi.org/10.1038/s41467-025-55831-z

      Reef, J., Diamantopoulou, S., van Meurs, I., Verhulst, F., & van der Ende, J. (2010). Predicting adult emotional and behavioral problems from externalizing problem trajectories in a 24-year longitudinal study. European Child & Adolescent Psychiatry, 19(7), 577–585. https://doi.org/10.1007/s00787-010-0088-6

      Rodrigue, A. L., Hayes, R. A., Waite, E., Corcoran, M., Glahn, D. C., & Jalbrzikowski, M. (2024). Multimodal Neuroimaging Summary Scores as Neurobiological Markers of Psychosis. Schizophrenia Bulletin, 50(4), 792–803. https://doi.org/10.1093/schbul/sbad149

      Roza, S. J., Hofstra, M. B., Van Der Ende, J., & Verhulst, F. C. (2003). Stable Prediction of Mood and Anxiety Disorders Based on Behavioral and Emotional Problems in Childhood: A 14-Year Follow-Up During Childhood, Adolescence, and Young Adulthood. American Journal of Psychiatry, 160(12), 2116–2121. https://doi.org/10.1176/appi.ajp.160.12.2116

      Yarkoni, T., & Westfall, J. (2017). Choosing Prediction Over Explanation in Psychology: Lessons From Machine Learning. Perspectives on Psychological Science, 12(6), 1100–1122. https://doi.org/10.1177/1745691617693393

    1. eLife Assessment

      This important manuscript introduces a genetic tool utilizing mutant mitfa-Cas9 expressing zebrafish to knock out genes to analyze melanocyte function in development and tumorigenesis. The data are convincing and the authors cover potential caveats from their model that might impact its utility for future work. This work significantly adds to the existing approaches in the field, as the mitfa:Cas9 strategy taken here provides a roadmap for generating similar platforms for using other tissue-specific regulators and Cas proteins in the future.

    2. Reviewer #1 (Public review):

      Summary:

      Perlee et al. sought to generate a zebrafish line where CRISPR-based gene editing is exclusively limited to the melanocyte lineage, allowing assessment of cell-type restricted gene knockouts. To achieve this, they knocked in Cas9 to the endogenous mitfa locus, as mitfa is a master regulator of melanocyte development. The authors use multiple candidate genes - albino, sox10, tuba1a, ptena/ptenb, tp53 - to demonstrate that their system induces lineage-restricted gene editing. This method allows researchers to bypass embryonic lethal and non-cell autonomous phenotypes emerging from whole body knockout (sox10, tuba1a), drive directed phenotypes, such as depigmentation (albino), and induce lineage-specific tumors, such as melanomas (ptena/ptenb, tp53, when accompanied with expression of BRAFV600E). The main weakness of the manuscript is that the mechanistic explanations proposed to underlie the presented phenotypes are minimally interrogated, but nonetheless interesting and motivating for future experimentation. Overall, there is a clear use for this genetic methodology, and its implementation will be of value to many in vivo researchers.

      Strengths:

      The strongest component of this manuscript is the genetic control offered by the mitfa:Cas9 system and the ability to make stable, lineage-specific knockouts in zebrafish. This is exemplified by the studies of tuba1a, where the authors nicely show non-cell autonomous mechanisms have obfuscated the role of this gene in melanocyte development. In addition, the mitfa:Cas9 system is elegantly straightforward and can be easily implemented in many labs. Mostly, the figures are clean, controls are appropriate, and phenotypes are reproducible. The invented method is a welcome addition to the arsenal of genetic tools used in zebrafish. The authors kindly and honestly responded to reviewer criticism, which has led to an improved manuscript and a pleasant review process.

      Weaknesses:

      The authors argue that the benefit of their system is the maintenance of endogenous regulatory elements. However, no direct comparison is made with other tools that offer similar genetic control, such as MAZERATI. This is a missed opportunity to provide researchers the ability to evaluate these two similar genetic approaches. There is a slight concern that tumor onset with this system is hindered by the heterozygous state it imparts to the lineage master regulator (here, mitfa). The authors do a good job at addressing these issues in the Discussion, but experimentation would have been appreciated. Additionally, the authors claim 86% of mitfa+ cells express Cas9. The image shown in Figure 1C does not do a convincing job at showing this percentage.

      Another weakness of the manuscript regards minimally investigated mechanistic explanations for each biological vignette. Detailed mechanistic information is indeed out-of-scope for this manuscript, which intends to prove the efficacy of a genetic tool. Readers are cautioned to use the mechanistic insights from these vignettes as inspiration rather than bona fide truth.

      The authors performed the necessary experiments to address each of the reviewers' concerns and thereby quell any substantial issues raised during the first review. They have additionally edited their language appropriately to make their claims more accurate. Their efforts during the review process are appreciated.

      Conclusion:

      The authors were highly receptive to reviewer comments and improved their manuscript from the first submission. The authors were successful in their goal of creating a rapid genetic approach to study cell-type specific genetic insults in vivo. They have presented multiple interesting and convincing stories to support the power of their invented methodology. The refined mechanisms underlying their observed phenotypes may be lacking but this does not take away from the methodological benefit this manuscript provides to the large field of in vivo researchers.

    3. Reviewer #3 (Public review):

      Summary:

      Perlee et al. present a method for generating cell-type restricted knockouts in zebrafish, focusing on melanocytes. For this method, the authors knock-in a Cas9 encoding sequence into the mitfa locus. This mitfaCas9 line has restricted Cas9 expression, allowing the authors to generate melanocyte-specific knockouts rapidly by follow-up injection of sgRNA expressing transposon vectors.

      The paper presents some interesting vignettes to illustrate the utility of their approach. These include 1) a derivation of albino mutant fish as a demonstration of the method's efficiency, 2) an interrogation and novel description of tuba1a/tuba1c as a potential non-autonomous contributor to melanosome dispersion, and 3) the generation of sox10 deficient melanoma tumors that show "escape" of sox10 loss through upregulation of sox9. The latter two examples highlight the usefulness of cell-type targeted knockouts (Body-wide sox10 and tuba1a loss elicit developmental defects). Additionally, the tumor models involve highly multiplexed sgRNAs for tumor initiation which is nicely facilitated by the stable Cas9.

      Strengths:

      The approach is clever and could prove very useful for studying melanocytes and other cell types. As the authors hint at in their discussion, this approach would become even more powerful with the generation of other Cas9-restricted lineages so a single sgRNA construct can be screened across many lineages rapidly (or many sgRNA and fish lines screened combinatorially).

      The biological findings used to demonstrate the power of the approach are interesting in their own right. The non-autonomous effect of tuba1a/tuba1c loss on melanosome dispersion is striking and demonstrates very nicely how one could use Perlee et al.'s approach to search for similar mechanisms systematically. The dual targeting nature of the tuba1a/tuba1c sgRNA also suggests similar approaches might be explored for knocking out paralogs. The observation of the sox9 escape mechanism with sox10 loss is a beautiful demonstration of the relevance of SOX10/SOX9's reciprocal regulation in vivo. This system would be a very nice model for further interrogating mechanisms/interventions surrounding Sox10 in melanoma.

      Finally, the figure presentation is very nice. This work involves complex genetic approaches, including multiple fish generations and multiplexed construct injections. The vector diagrams and breeding schemes in the paper make everything very clear/"grok-able," and the paper was enjoyable to read.

      Weaknesses:

      The authors' claims are grounded and tested rigorously. The major weaknesses that we raised in the first round of reviews were either addressed experimentally or are now detailed as limitations in the text. Congrats on the beautiful paper!

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Perlee et al. sought to generate a zebrafish line where CRISPR-based gene editing is exclusively limited to the melanocyte lineage, allowing assessment of cell-type restricted gene knockouts. To achieve this, they knocked in Cas9 to the endogenous mitfa locus, as mitfa is a master regulator of melanocyte development. The authors use multiple candidate genes - albino, sox10, tuba1a, ptena/ptenb, tp53 - to demonstrate their system induces lineage-restricted gene editing. This method allows researchers to bypass embryonic lethal and non-cell autonomous phenotypes emerging from whole body knockout (sox10, tuba1a), drive directed phenotypes, such as depigmentation (albino), and induce lineage-specific tumors, such as melanomas (ptena/ptenb, tp53, when accompanied with expression of BRAFV600E). While the genetic approaches are solid, the argued increase in efficiency of this model compared to current tools was untested, and therefore unable to be assessed. Furthermore, the mechanistic explanations proposed to underlie their phenotypes are mostly unfounded, as discussed further in the Weaknesses section. Despite these concerns, there is still a clear use for this genetic methodology and its implementation will be of value to many in vivo researchers.

      Strengths:

      The strongest component of this manuscript is the genetic control offered by the mitfa:Cas9 system and the ability to make stable, lineage-specific knockouts in zebrafish. This is exemplified by the studies of tuba1a, where the authors nicely show non-cell autonomous mechanisms have obfuscated the role of this gene in melanocyte development. In addition, the mitfa:Cas9 system is elegantly straightforward and can be easily implemented in many labs. Mostly, the figures are clean, controls are appropriate, and phenotypes are reproducible. The invented method is a welcome addition to the arsenal of genetic tools used in zebrafish.

      Weaknesses:

      The major weaknesses of the manuscript include the overly bold descriptions of the value of the model and the superficial mechanistic explanations for each biological vignette.

      The authors argue that a major advantage of this system is its high efficiency. However, no direct comparison is made with other tools that achieve the same genetic control, such as MAZERATI. This is a missed opportunity to provide researchers the ability to evaluate these two similar genetic approaches. In addition, Fig.1 shows that not all melanocytes express Cas9. This is a major caveat that goes unaddressed. It is of paramount importance to understand the percentage of mitfa+ cells that express Cas9. The histology shown is unclear and too zoomed out to draw any insightful conclusions, especially in Fig.S1. It would also be beneficial to see data regarding Cas9 expression in adult melanocytes, which are distinct from embryonic melanocytes in zebrafish. Moreover, this system still requires the injection of a plasmid encoding gRNAs of interest, which will yield mosaicism. A prime example of this discrepancy is in Fig.6, where sox10 is clearly still present in "sox10 KO" tumors.

      We agree with these points. While our method has the advantage of endogenous knockin (thus keeping all regulatory elements), you are correct that we did not make a direct comparison with existing technologies like MAZERATI, and therefore we cannot make comparative claims about efficiency. Based on this, we have revised the manuscript to remove these points, reduce the strength/boldness of the claims, and make it more clear what our system achieves in comparison to existing systems. In reference to the other specific points you raise above about mosaicism and extent of Cas9 expression:

      - We have added a paragraph to address the advantages and disadvantages of mitfaCas9 compared to expression of Cas9 with lineage-specific promoters including MAZERATI in the discussion.  

      - Figure 1C has been revised to more clearly show the overlap of mitfa and Cas9 in melanocytes. 

      - We then quantified the percentage of mitfa+ cells expressing Cas9 from the in situ hybridizations (Supplemental Figure S1D). We did attempt to look at Cas9 protein expression in both embryonic and adult melanocytes by immunofluorescence. Unfortunately, the Cas9 antibodies commercially available did not work on the zebrafish embryos or adult tailfins, so we are limited in proper quantification to the in situs in the embryos.

      The authors argue that their model allows rapid manipulation of melanocyte gene expression. Enthusiasm for the speed of this model is diminished by minimal phenotypes in the F0, as exemplified in Fig.2. Although the authors say >90% of fish have loss of pigmentation, this is misleading as the phenotype is a very weak, partial loss. Only in the F1 generation do robust phenotypes emerge, which takes >6 months to generate. How this is more efficient than other tools that currently exist is unclear and should be discussed in more detail.

      This needed clarification, and we have now modified the Discussion to reflect this more accurately. What we were trying to show is that both F0 and F1 fish can be useful in screening for the effect of any given gene. In the F0, while you are correct that the phenotype is indeed weak/partial, it is also quantifiable and therefore can be used as a rapid screen for potential effects of knockout, so it can help with speed. The major advantage of the F1 generation is that we can generate fully penetrant phenotypes for recessive genes since the fish just needs to have 1 copy of the Cas9/sgRNA instead of 2. This means we do not have to go to F2 or F3 generations, which really does save time. But we agree this could be achieved using MAZERATI, and so we have added these considerations to the manuscript, as we feel these are important.

      In Figure 3, the authors find that melanocyte-specific knockout of sox10 leads to only a 25% reduction in melanocytes in the F1 generation. This is in contradiction to prior literature cited describing sox10 as indispensable for melanocyte development. In addition, the authors argue that sox10 is required for melanocyte regeneration. This claim is not accurate, as >50% of melanocytes killed upon neocuproine treatment can regenerate. This data would indicate that sox10 is required for only a subset of melanocytes to develop (Fig.3C) and for only a subset to regenerate (Fig.3G). This is an interesting finding that is not discussed or interrogated further.

      We too were initially very puzzled by this result. We do not completely understand it, but we have two thoughts about it. The first could be timing. sox10 usually starts to be expressed around the 1-somite stage, and so in the original sox10/colourless mutant (which truly has no melanocytes), sox10 will be lost during those early stages. In contrast, mitf comes on later (around 18 hpf), so this might indicate that there is a subset of melanocytes that are dependent upon this early expression of sox10. This may indicate that there could be different functions of sox10 early in melanocyte development versus later timepoints after melanocytes have already been specified. This might also help explain our findings during regeneration. The second could be genetic compensation. Since in the other parts of the paper we seem to see a somewhat reciprocal relationship between sox10 and sox9, it is conceivable that loss of sox10 in the melanocytes could be compensated for by sox9 (or even other genes) in our CRISPR approach (as opposed to the ENU allele in colourless). Since we really do not fully understand this, we have added a section to the Discussion about this issue, mentioning these possibilities but leaving open other yet to be defined mechanisms.

      Tumor induction by this model is weak, as indicated by the tumor curves in Figs.5,6. This might be because these fish are mitfa heterozygous. Whereas the avoidance of mitfa overexpression driven by other models including MAZERATI is a benefit of this system, the effect of mitfa heterozygosity on tumor incidence was untested. This is an essential question unaddressed in the manuscript.

      We agree that in the BRAF;p53 group especially tumor incidence is very low, although PTEN loss does accelerate it. One possibility is exactly as you stated, and that mitfa heterozygosity is the etiology. The other possibility is that in the MAZERATI approach (https://pubmed.ncbi.nlm.nih.gov/30385465/) the authors used the casper background as opposed to the wild-type T5D as we did in our study. In unpublished observations, we have found that casper (with miniCoopR rescue) is markedly more sensitive to melanoma induction compared to WT fish in this setting. In fact, in looking at our BRAF;p53 curves compared to the original Patton paper curves (https://pubmed.ncbi.nlm.nih.gov/15694309/) which were also done in a WT background with no miniCoopR, they are fairly similar. This might indicate that casper + miniCoopR particularly sensitizes the fish to melanoma. However, because we do not fully know the reasons for this, we have now included both of these possible reasons in the Discussion.

      In Fig.6, the authors recapitulate previous findings with their model, showing sox10 KO inhibits tumor onset. The tumors that do develop are argued to be highly invasive, have mesenchymal morphology, and undergo phenotypic switching from sox10 to sox9 expression. The data presented do not sufficiently support these claims. The histology is not readily suggestive of invasive, mesenchymal melanomas. Sox10 is still present in many cells and sox9 expression is only found in a small subset (<20%). Whether sox10-null cells are the ones expressing sox9 is untested. If sox9-mediated phenotypic switching is the major driver of these tumors, the authors would need to knock out sox9 and sox10 simultaneously and test whether these "rare" types of tumors still emerge. Additional histological and genetic evaluation is required to make the conclusions presented in Fig.6. It feels like a missed opportunity that the authors did not attempt to study genes of unknown contribution to melanoma with their system.

      We did not mean to overstate the admittedly early observations from these fish. Invasiveness in the fish models can be difficult to precisely quantify, and therefore is somewhat qualitative. While we did not mean to imply that every cell that loses sox10 will become sox9 positive (which is clearly not the case), the human single-cell RNA-seq data does suggest these are somewhat mutually exclusive populations (https://pubmed.ncbi.nlm.nih.gov/32753671/). This phenomenon has also long been observed even prior to single-cell approaches (https://pubmed.ncbi.nlm.nih.gov/25629959/). So while we agree our data is not definitive in this regard, it is consistent with the literature and was presented mainly to provide areas for future exploration with the model. 

      Overall, this manuscript introduces a solid method to the arsenal of zebrafish genetic tools but falls short of justifying itself as a more efficient and robust approach than what currently exists. The mechanisms provided to explain observed phenotypes are tenuous. Nonetheless, the mitfa:Cas9 approach will certainly be of value to many in vivo biologists and lays the foundation to generate similar methods using other tissue-specific regulators and other Cas proteins.

      We hope that, by toning down the language around what we have observed and providing as honest an assessment as possible of what might be occurring, the manuscript will be helpful for future studies aiming to knock out genes in the melanocyte lineage.

      Reviewer #2 (Public review):

      Summary:

      This manuscript describes a genetic tool utilizing mutant mitfa-Cas9 expressing zebrafish to knock out genes to analyze their function in melanocytes in a range of assays from developmental biology to tumorigenesis. Overall, the data are convincing and the authors cover potential caveats from their model that might impact its utility for future work.

      Strengths:

      The authors do an excellent job of characterizing several gene deletions that show the specificity and applicability of the genetic mitfa-Cas9 zebrafish to studying melanocytes.

      Weaknesses:

      Variability across animals not fully analyzed.

      To more clearly show variability across animals, we calculated the percentage of mitfa+ cells that express Cas9 across n=7 mitfaCas9 embryos. We also expanded Supplemental Figure 2 to show loss of pigmentation across n=7 individual adult MG-albino F2 fish instead of one representative image.

      Reviewer #3 (Public review):

      Summary:

      Perlee et al. present a method for generating cell-type restricted knockouts in zebrafish, focusing on melanocytes. For this method, the authors knock-in a Cas9 encoding sequence into the mitfa locus. This mitfaCas9 line has restricted Cas9 expression, allowing the authors to generate melanocyte-specific knockouts rapidly by follow-up injection of sgRNA expressing transposon vectors.

      The paper presents some interesting vignettes to illustrate the utility of their approach. These include 1) a derivation of albino mutant fish as a demonstration of the method's efficiency, 2) an interrogation and novel description of tuba1a as a potential non-autonomous contributor to melanocyte dispersion, and 3) the generation of sox10 deficient melanoma tumors that show "escape" of sox10 loss through upregulation of sox9. The latter two examples highlight the usefulness of cell-type targeted knockouts (Body-wide sox10 and tuba1a loss elicit developmental defects). Additionally, the tumor models involve highly multiplexed sgRNAs for tumor initiation which is nicely facilitated by the stable Cas9.

      Strengths:

      The approach is clever and could prove very useful for studying melanocytes and other cell types. As the authors hint at in their discussion, this approach would become even more powerful with the generation of other Cas9-restricted lineages so a single sgRNA construct can be screened across many lineages rapidly (or many sgRNA and fish lines screened combinatorially).

      The biological findings used to demonstrate the power of the approach are interesting in their own right. If it proves true, tuba1a's non-autonomous effects on melanosome dispersion are striking, and this example demonstrates very nicely how one could use Perlee et al.'s approach to search for other non-autonomous mechanisms systematically. Similarly, the observation of the sox9 escape mechanism with sox10 loss is a beautiful demonstration of the relevance of SOX10/SOX9's reciprocal regulation in vivo. This system would be a very nice model for further interrogating mechanisms/interventions surrounding Sox10 in melanoma.

      Finally, the figure presentation is very nice. This work involves complex genetic approaches including multiple fish generations and multiplexed construct injections. The vector diagrams and breeding schemes in the paper make everything very clear/"grok-able," and the paper was enjoyable to read.

      Weaknesses:

      The mitfa-driven GFP on their sgRNA-expressing cassette is elegant, but it makes one wonder why the endogenous knock-in is necessary. It would strengthen the motivation of the work if the authors could detail the potential advantages and disadvantages of their system compared to expressing Cas9 with a lineage-specific promoter from a transposon in their introduction or discussion.

      We agree this needed a better and clearer explanation. There are many excellent examples of promoter driven Cas9 approaches. Within melanocytes, Ablain and others have developed the MAZERATI system (https://pubmed.ncbi.nlm.nih.gov/30385465/) which is very powerful, especially for melanoma development. In our minds, the major advantage of endogenous knockin is that we retain all of the natural regulatory elements (many of which are not known) and so small promoter fragments always run the risk of missing certain types of regulation. While these regulatory elements may not matter under homeostatic conditions, they may become very important under perturbation, stress or disease states. This is why it is common, for example, in the mouse field, to knock in things like Cre into endogenous loci. We have now added a clarification of this to the manuscript.

      Related to the above - is mitfa haplosufficient? If the mitfaCas9/+ fish have any notable phenotypes, it would be worth noting for others interested in using this approach to study melanoma and pigmentation.

      In normal melanocytes, mitfa is haplosufficient. There are no visible differences between mitfaCas9/+ and wild-type fish at any stages of development (Figure S1C). Although we did not directly compare tumor growth in mitfa-/+ and mitfa+/+ fish in this study, it is possible that the disruption of mitfa in mitfaCas9/+ fish affects melanoma development. Most zebrafish melanoma models involve the overexpression of mitfa with MiniCoopR vectors and it would be interesting in future studies to determine how mitfa heterozygosity affects melanoma initiation or progression. 

      A core weakness (and also potential strength) of the system is that introduced edits will always be non-clonal (Fig 2H/I). The activity of individual sgRNAs should always be validated in the absence of any noticeable phenotype to interpret a negative result. Additionally, caution should be taken when interpreting results from rare events involving positive outgrowth (like tumorigenesis) to account for the fact that many cells in the population might not have biallelic null alleles (i.e., 100% of the gene product removed).

      Along those lines: in my opinion, the tuba1a results are the most provocative finding in the paper, but they lack key validation. With respect to cutting activity, the Alt-R and transgenic sgRNA expression approaches are not directly comparable. Since there is no phenotype in the melanocyte specific tuba1a knockouts, the authors must confirm high knockout efficiency with this set of reagents before making the claim there is a non-autonomous phenotype. This can be achieved with GFP+ sorting and NGS like they performed with their albino melanocytes.

      The whole-body tuba1a knockout phenotype is expected to be pleiotropic, and this expectation might mask off-target effects. Controls for knockout specificity should be included. For instance, confidence in the claims would greatly increase if the dispersed melanosome phenotype could be recovered with guide-resistant tuba1a re-expression and if melanocyte-restricted tuba1a re-expression failed to rescue. As a less definitive but adequate alternative, the authors could also test if another guide or a morpholino against tuba1a phenocopies the described Alt-R edited fish.

      Thank you for your thoughtful suggestions, which led us to an important discovery. While validating the original tuba1a guide RNA, we found that tuba1a sg1 also targets tuba1c, a gene that shares 99.78% homology with tuba1a in zebrafish. To determine which gene was responsible for the melanocyte phenotype, we designed multiple new guide RNAs specifically targeting either tuba1a or tuba1c and used Alt-R to globally knock them out in zebrafish embryos. However, none of these guides successfully replicated the phenotype (Sanger sequencing validation for the most efficient tuba1a and tuba1c guides is provided below).

      Ultimately, we identified a new guide RNA (5’-GGTCTACAAAGACAGCCCTA-3’) that successfully phenocopied the original tuba1a sg1 melanocyte phenotype. Tuba1c—but not tuba1a—was predicted to have a mismatch at the 3’ end of the guide sequence, which is typically expected to inhibit target cleavage. Surprisingly, despite this mismatch, we observed robust cleavage in both tuba1a and tuba1c. Since the melanocyte phenotype was only reproducible when both tuba1a and tuba1c were targeted, this suggests potential compensatory interactions between these highly similar genes. We have updated the text and figures to reflect this finding and have included validation of this second guide RNA (tuba1a/c sg2) in Supplemental Figure 3.
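
      The mismatch reasoning above can be illustrated with a minimal sketch that reports where a guide and a candidate target site differ, counted from the 5' end. The guide sequence is the one quoted in the response; the second sequence is a HYPOTHETICAL stand-in with a single 3'-end mismatch and is not the real tuba1c site, which is not given here. The function name is likewise an illustrative assumption, and the sketch says nothing about actual cleavage efficiency.

```python
def mismatch_positions(guide: str, target: str) -> list[int]:
    """Return 1-based positions (from the 5' end) where guide and target differ."""
    if len(guide) != len(target):
        raise ValueError("guide and target must have the same length")
    pairs = zip(guide.upper(), target.upper())
    return [i + 1 for i, (g, t) in enumerate(pairs) if g != t]


# Guide sequence quoted in the author response (5'->3').
GUIDE = "GGTCTACAAAGACAGCCCTA"

# HYPOTHETICAL target site: the guide with its final (3'-most) base changed.
# This is an assumption for illustration only, not the real tuba1c sequence.
HYPOTHETICAL_SITE = GUIDE[:-1] + "G"

if __name__ == "__main__":
    print(mismatch_positions(GUIDE, GUIDE))              # perfect match
    print(mismatch_positions(GUIDE, HYPOTHETICAL_SITE))  # single 3'-end mismatch
```

      A position-by-position comparison like this only flags predicted mismatches; as the response notes, cleavage still had to be confirmed experimentally.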

      As you suggested, we also conducted GFP+ sorting and NGS to confirm knockout of both tuba1a and tuba1c in melanocytes of mitfaCas9 fish (Figure S3G). The knockout percentages were comparable to those observed in our previous experiment with MG-albino fish. This also confirms that this method can be used to sort and sequence GFP+ cells even when pigmentation is retained, which was not the case for albino fish.

      I have similar questions about the sox10 escapers, but these suggestions are less critical for supporting the authors' claims (especially given the nice staining). Are the sox10 tumors relatively clonal with respect to sox10 mutations? And are the sox10 tumor mutations mostly biallelic frameshifts or potential missense mutations/single mutations that might not completely remove activity? I am particularly curious as SOX10 doesn't seem to be completely absent (and is still very high in some nuclei) in the immunohistochemistry.

      We attempted to address this question by performing DNA sequencing on the FFPE blocks that we had retained from the original study. While our sequencing facility said this should be possible, we could not consistently generate high enough quality DNA to make a definitive statement either way. While we are very curious to know what the nature of the mutations is in these “escapers”, the student who performed these studies has now graduated, and it would take us several additional months to a year to fully address it. Given this, we would prefer to leave this open question to a future paper, but have addressed this limitation in the Discussion.

      Recommendations for the authors:

      Reviewing Editor:

      Overall, the reviewers felt and eLife concurs that your manuscript is insightful and appropriate for publication. Reviewers were impressed by your generating a zebrafish line where CRISPR-based gene editing is exclusively limited to the melanocyte lineage, allowing assessment of cell-type restricted gene knockouts. Your use of multiple candidate genes to demonstrate that your system induces lineage-restricted gene editing is compelling and will be of interest to the broad readership of eLife. This method will allow researchers to bypass embryonic lethal and non-cell autonomous phenotypes emerging from whole body knockout, drive directed phenotypes, such as depigmentation, and induce lineage-specific tumors, such as melanomas. This said, the argued increase in efficiency of this model compared to current tools was untested, and therefore it remains difficult for a reader to assess the extent to which your new model represents a major advance over prior ones. Of additional concern are the mechanistic explanations proposed to underlie the phenotypes, as these are largely unfounded. Thus, in preparing your final publication version of the paper, eLife strongly encourages you to fully address the reviewers' thoughtful comments. In particular, the boldness of the claims made in the manuscript should be reduced. Terms like "highly efficient" and "rapid" are unsupported due to the lack of comparison with other well-established methods, like MAZERATI.

      As discussed in our responses to each of the reviewer points above, we agree with both of these points. We have reduced the boldness of the claims and added a better discussion of the different approaches. We also address the potential mechanisms of our observations, and where and why we still lack an understanding of what gives rise to those phenotypes.

      There are also some minor discrepancies that should be edited in the manuscript: Fig.2A plasmid description is written oppositely in text; Fig.3 labels G-H are swapped in the legend description; Fig.5A MTdT is unexplained. This is a non-exhaustive list, and the authors are encouraged to carefully read through their manuscript to revise other minor mistakes and formatting errors.

      Figure 2A was revised to show the correct orientation of mitfa:GFP and the guide RNA cassette as described in the text. Figure 3 legend was fixed. We have gone through the manuscript again to make sure we have not made any other errors, to the best of our knowledge.

      The biggest concern is the expression of cas9 and the weak histological support shown in Fig.1 and Fig.S1. It would be a benefit to all readers and potential future users to know how robust cas9 expression is in the melanocyte lineage. It would be helpful if there is a way to analyze the percentage of cells that are mutated in each animal to understand the variability that can exist across animals with the method.

      We have revised Figure 1C to show additional melanocytes and added a new quantification of Cas9 RNA expression in melanocytes (S1D). 

      The analysis of the scRNA sequencing could also be described more fully.

      More details have been added to the scRNA sequencing analysis including the functions that were used. 

      The final major concern is whether this model is genuinely more valuable than MAZERATI. A more elaborate discussion would benefit potential future users to guide their decisions regarding which tool best suits their experimental goals.

      As noted above, we agree with this statement. The reviewers are correct in that we did not directly compare our system to MAZERATI, and therefore cannot make any claims about efficiency in a comparative regard. Therefore, in our revised Discussion, we talk about the relative strengths and weaknesses of each approach, and emphasize that our approach mainly has the advantage of retaining endogenous regulatory elements for mitfa, but that each user should decide which is the best approach for their problem.

      There are also some minor concerns that should be addressed.

      Are the mitfaCas9 fish used as homozygotes before the first cross? If so, might be nice to include their nacre-like phenotype in diagrams like Fig 2A.

      For these studies, heterozygous mitfaCas9 fish were used for all breedings and progeny were sorted for BFP+ eyes. This enabled the comparison to sibling controls without Cas9 expression. 

      BFP+ eye screening for mitfaCas9 is elegant and included nicely in the diagrams. Are germline sgRNA integrants identified in F1 with melanocyte GFP? Or present at a high enough efficiency that this is not relevant? This would be good to include in the diagrams.

      Germline sgRNA integrants are identified with melanocyte GFP in embryos. Figure 2A has been edited to show GFP expression. 

      Most cells are GFP positive in S3C (the F0 "mosaic"). It might be nice to show a single GFP stripe like in the other panels for direct comparison of edited/non-edited in the same fish.

      This figure (now S3E) has been edited to show a clear comparison between GFP+ and GFP- cells in the same fish. 

      177 - CRISPR-Seq is basically amplicon sequencing. This would measure efficiency but not "specificity" as described. Off-target activity would have to be measured at other loci etc. Not necessary to do, but I don't think measured.

      In this case, “specificity” refers to cell type specificity, not genomic specificity. We are measuring cell type specificity by comparing on-target cutting in GFP+ cells (melanocytes) versus GFP- cells (non-mitfa expressing cells). We did not look at off-target activity of Cas9 in this study and have edited the text to make this clearer. 
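
The cell-type specificity comparison described above can be pictured as a difference between editing fractions in the two sorted populations. The following is a minimal sketch, not the authors' analysis pipeline: the function names are hypothetical, and the read counts in the usage note are made-up placeholders rather than data from the study.

```python
def editing_fraction(edited_reads, total_reads):
    """Fraction of amplicon reads carrying an indel at the target site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return edited_reads / total_reads


def cell_type_specificity(gfp_pos, gfp_neg):
    """Compare on-target editing in GFP+ versus GFP- sorted cells.

    Each argument is an (edited_reads, total_reads) tuple for one
    sorted population. Returns both fractions and their difference.
    """
    pos = editing_fraction(*gfp_pos)
    neg = editing_fraction(*gfp_neg)
    return {"gfp_pos": pos, "gfp_neg": neg, "difference": pos - neg}
```

For example, `cell_type_specificity((800, 1000), (50, 1000))` would report editing fractions of 0.8 and 0.05 for the hypothetical GFP+ and GFP- populations. A large positive difference is what lineage-restricted editing would be expected to look like; it says nothing about off-target cutting at other loci.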

      219 -"several gaps were visible"

      Fixed

      286 - TUBA1A should be italicized

      Fixed

      399 - SOX9's most enriched dependency in DepMap is cutaneous melanoma and its top coessential gene is SOX10. I'm not sure the SOX9/SOX10 interaction couldn't be parsed from DepMap alone.

      This is true, and the DepMap was actually somewhat of an inspiration for our own studies. We have modified the line to acknowledge this and explain the main advantage of our system is in vivo confirmation of what the DepMap had alluded to.

      433 - "fewer animals since all F1 animals (even those for recessive alleles) are informative."

      The fact that this approach is faster and more efficient per animal is important to highlight (and very believable), but is this technically true given not all F1 fish will have Cas9 or a germline sgRNA integration?

      In considering this statement, we agree with you and decided to remove it from the text.

      We hope the comments in both the public and private reviews will help improve the manuscript.

      Reviewer #1 (Recommendations for the authors):

      Overall, the boldness of the claims made in the manuscript should be reduced. Terms like "highly efficient" and "rapid" are unsupported due to the lack of comparison with other well-established methods, like MAZERATI.

      As discussed above, we agree with this and have now modified the manuscript to better reflect what our system achieves in comparison to well-developed systems such as MAZERATI. Because we have not done a direct comparison, we are not able to make any claims about comparative efficiency, and instead focus on the main potential benefit of a knock-in approach, which is the maintenance of endogenous regulatory elements.

      There are some minor discrepancies that should be edited in the manuscript: Fig.2A plasmid description is written oppositely in text; Fig.3 labels G-H are swapped in the legend description; Fig.5A MTdT is unexplained. This is a non-exhaustive list, and the authors are encouraged to carefully read through their manuscript to revise other minor mistakes and formatting errors.

      Figure 2A was revised to show the correct orientation of mitfa:GFP and the guide RNA cassette as described in the text. Figure 3 legend was fixed. We have gone through the manuscript again to make sure we have not made any other errors, to the best of our knowledge.

      The biggest concern is the expression of cas9 and the weak histological support shown in Fig.1 and Fig.S1. It would be a benefit to all readers and potential future users to know how robust cas9 expression is in the melanocyte lineage.

      We have revised Figure 1C to show additional melanocytes and added a new quantification of Cas9 RNA expression in melanocytes (S1D). 

      The second major concern is whether this model is genuinely more valuable than MAZERATI. A more elaborate discussion would benefit potential future users to guide their decision regarding which tool best suits their experimental goals.

      As noted above, we agree with this statement. The reviewers are correct in that we did not directly compare our system to MAZERATI, and therefore cannot make any claims about efficiency in a comparative regard. Therefore, in our revised Discussion, we talk about the relative strengths and weaknesses of each approach, and emphasize that our approach mainly has the advantage of retaining endogenous regulatory elements for mitfa, but that each user should decide which is the best approach for their problem.

      We hope the comments in both the public and private reviews will help improve the manuscript.

      Reviewer #2 (Recommendations for the authors):

      The authors show the indel charts for the CRISPR mutations generated in the supplement. However, I wonder if there is a way to analyze the percentage of cells that are mutated in each animal to understand the variability that can exist across animals with the method.

      We have revised Figure 1C to show additional melanocytes and added a new quantification of Cas9 RNA expression in melanocytes (S1D). 

      The analysis of the scRNA sequencing could be described more fully.

      More details have been added to the scRNA sequencing analysis including the functions that were used. 

      Reviewer #3 (Recommendations for the authors):

      This was an excellent read, and I'm very interested in seeing it in its final form. Congratulations! My larger critiques are outlined in the public reviews. A few smaller points:

      Are the mitfaCas9 fish used as homozygotes before the first cross? If so, might be nice to include their nacre-like phenotype in diagrams like Fig 2A.

      For these studies, heterozygous mitfaCas9 fish were used for all breedings and progeny were sorted for BFP+ eyes. This enabled the comparison to sibling controls without Cas9 expression. 

      BFP+ eye screening for mitfaCas9 is elegant and included nicely in the diagrams. Are germline sgRNA integrants identified in F1 with melanocyte GFP? Or present at a high enough efficiency that this is not relevant? This would be good to include in the diagrams.

      Germline sgRNA integrants are identified with melanocyte GFP in embryos. Figure 2A has been edited to show GFP expression. 

      Most cells are GFP positive in S3C (the F0 "mosaic"). It might be nice to show a single GFP stripe like in the other panels for direct comparison of edited/non-edited in the same fish.

      This figure (now S3E) has been edited to show a clear comparison between GFP+ and GFP- cells in the same fish. 

      177 - My understanding is that CRISPR-Seq is basically amplicon sequencing. This would measure efficiency but not "specificity" as described. Off-target activity would have to be measured at other loci etc. Not necessary to do in my opinion, but I don't think measured.

      In this case, “specificity” refers to cell type specificity, not genomic specificity. We are measuring cell type specificity by comparing on-target cutting in GFP+ cells (melanocytes) versus GFP- cells (non-mitfa expressing cells). We did not look at off-target activity of Cas9 in this study and have edited the text to make this clearer. 

      219 -"several gaps were visible"

      Fixed

      286 - TUBA1A should be italicized

      Fixed

      399 - I think I understand the logic of the DepMap argument, and the importance of studying tumor initiation in vivo stands for itself. But here is maybe not the best example (or might need clarification)? - SOX9's most enriched dependency in DepMap is cutaneous melanoma and its top co-essential gene is SOX10. I'm not sure the SOX9/SOX10 interaction couldn't be parsed from DepMap alone.

      This is true, and the DepMap was actually somewhat of an inspiration for our own studies. We have modified the line to acknowledge this and explain the main advantage of our system is in vivo confirmation of what the DepMap had alluded to.

      433 - "fewer animals since all F1 animals (even those for recessive alleles) are informative."

      The fact that this approach is faster and more efficient per animal is important to highlight (and very believable), but is this technically true given not all F1 fish will have Cas9 or a germline sgRNA integration?

      In considering this statement, we agree with you and decided to remove it from the text.

    1. eLife Assessment

      This important study reveals that Excitatory Amino Acid Transporters play a role in chromatic information processing in the retina. The combination of (double) mutants, behavioral assays, immunohistochemistry, and electroretinograms provides solid evidence supporting the appropriately conservative conclusions. The work will be of interest to neurobiologists working on color vision or retinal processing.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Garbelli et al. investigates the roles of excitatory amino acid transporters (EAATs) in retinal bipolar cells. The group previously identified that EAAT5b and EAAT7 are expressed at the dendritic tips of bipolar cells, where they connect with photoreceptor terminals. The previous study found that the light responses of bipolar cells, measured by electroretinogram (ERG) in response to white light, were reduced in double mutants, though there was little to no reduction in light responses in single mutants of either EAAT5b or EAAT7.

      The current study further explores the roles of EAAT5b and EAAT7 in bipolar cells' chromatic responses. The authors found that bipolar cell responses to red light, but not to green or UV-blue light, were reduced in single mutants of both EAAT5b and EAAT7. In contrast, UV-blue light responses were reduced in double mutants. Additionally, the authors observed that EAAT5b, but not EAAT7, is strongly localized in the UV cone-enriched area of the eye, known as the "Strike Zone (SZ)." This led them to investigate the impact of the EAAT5b mutation on prey detection performance, which is mediated by UV cones in the SZ. Surprisingly, contrary to the predicted role of EAAT5b in prey detection, EAAT5b mutants did not show any changes in prey detection performance compared to wild-type fish. Interestingly, EAAT7 mutants exhibited enhanced prey detection performance, though the underlying mechanisms remain unclear.

      The distribution of EAAT7 protein in the outer plexiform layer across the eye correlates with the distribution of red cones. Based on this, the authors tested the behavioral performance driven by red light in EAAT5b and EAAT7 mutants. The results here were again somewhat contrary to predictions based on ERG findings and protein localization: the optomotor response was reduced in EAAT5b mutants, but not in EAAT7 mutants.

      Strengths:

      Although the paper lacks cohesive conclusions, as many results contradict initial predictions as mentioned above, the authors discuss possible mechanisms for these contradictions and suggest future avenues for study. Nevertheless, this paper demonstrates a novel mechanism underlying chromatic information processing.

      The manuscript is well-written, the data are well-presented, and the analysis is thorough.

      Weaknesses:

      I have only a minor comment. The authors present preliminary data on mGluR6b distribution across the eye. Since this result is based on a single fish, I recommend either adding more samples or removing this data, as it does not significantly impact the paper's main conclusions.

      Comments on revisions:

      The authors addressed all of the concerns that I had in the original manuscript.

    3. Reviewer #2 (Public review):

      Garbelli et al. set out to elucidate the function of two glutamate transporters, EAAT5b and EAAT7, in the functional and behavioral responses to different wavelengths of light. The question is an interesting one because these transporters are well-positioned to affect responses to light, and their distribution in the retina suggests that they could play differential roles in visual behaviors. However, the resolution of the functional and behavioral data presented here means that the conclusions are necessarily a bit vague.

      In Figure 1, the authors show that the double KO has a decreased ERG response to UV/blue and red wavelengths. However, the individual mutations both only affect the response to red light, suggesting that they might affect behaviors such as OMR that typically rely on this part of the visual spectrum. However, there was no significant change in the response to UV/blue light of any intensity, making it unclear whether the mutations could individually play roles in detection of UV prey. Based on the later behavioral data, it seems likely that at least the EAAT7 KO should affect retinal responses to UV light, but it may be that the ERG does not have the spatial or temporal resolution to detect the difference, or that the presence of blue light overwhelmed any effect of the individual knockouts on the response to UV light.

      In Figures 5 and 6, the authors compare the two knockouts to wild-type fish in terms of their sensitivity to UV prey in a hunting assay. The EAAT5b KO showed no significant impairment in UV sensitivity, while the EAAT7 KO fish actually had an increased hunting response to UV prey. However, there is no comparison of the KO and WT responses to different UV intensities, only in bulk, so we cannot conclude that the EAAT7 KO is allowing the fish to detect weaker prey-like stimuli.

      In Figure 7, the EAAT5b KO seems to cause a decrease in OMR behavior to red grating stimuli, but only one stimulus is tested, so it is unclear whether this is due to a change in visual sensitivity or resolution.

      The conclusions made in the manuscript are appropriately conservative; the abstract states that these transporters somehow influence prey detection and motion sensing, and this is likely true.

      In terms of impact on the field, this work highlights the potential importance of these two transporters to visual processing, but further studies will be required to say how important they are and exactly what they are doing.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Garbelli et al. investigates the roles of excitatory amino acid transporters (EAATs) in retinal bipolar cells. The group previously identified that EAAT5b and EAAT7 are expressed at the dendritic tips of bipolar cells, where they connect with photoreceptor terminals. The previous study found that the light responses of bipolar cells, measured by electroretinogram (ERG) in response to white light, were reduced in double mutants, though there was little to no reduction in light responses in single mutants of either EAAT5b or EAAT7.

      The current study further explores the roles of EAAT5b and EAAT7 in bipolar cells' chromatic responses. The authors found that bipolar cell responses to red light, but not to green or UV-blue light, were reduced in single mutants of both EAAT5b and EAAT7. In contrast, UV-blue light responses were reduced in double mutants. Additionally, the authors observed that EAAT5b, but not EAAT7, is strongly localized in the UV cone-enriched area of the eye, known as the "Strike Zone (SZ)." This led them to investigate the impact of the EAAT5b mutation on prey detection performance, which is mediated by UV cones in the SZ. Surprisingly, contrary to the predicted role of EAAT5b in prey detection, EAAT5b mutants did not show any changes in prey detection performance compared to wild-type fish. Interestingly, EAAT7 mutants exhibited enhanced prey detection performance, though the underlying mechanisms remain unclear.

      The distribution of EAAT7 protein in the outer plexiform layer across the eye correlates with the distribution of red cones. Based on this, the authors tested the behavioral performance driven by red light in EAAT5b and EAAT7 mutants. The results here were again somewhat contrary to predictions based on ERG findings and protein localization: the optomotor response was reduced in EAAT5b mutants, but not in EAAT7 mutants.

      Strengths:

      Although the paper lacks cohesive conclusions, as many results contradict initial predictions as mentioned above, the authors discuss possible mechanisms for these contradictions and suggest future avenues for study. Nevertheless, this paper demonstrates a novel mechanism underlying chromatic information processing.

      The manuscript is well-written, the data are well-presented, and the analysis is thorough.

      We are happy about the perceived strengths of our manuscript.

      Weaknesses:

      I have only a minor comment. The authors present preliminary data on mGluR6b distribution across the eye. Since this result is based on a single fish, I recommend either adding more samples or removing this data, as it does not significantly impact the paper's main conclusions.

      We agree that the mGluR6 result is statistically underpowered (we would never claim otherwise). The data are based on only one clutch of fish, comprising 11 eyes. Since the data are in the supplement and not part of the main story, we would like to keep them to spur further investigations into the anisotropic distribution of synaptic proteins.

      Reviewer #2 (Public review):

      Garbelli et al. set out to elucidate the function of two glutamate transporters, EAAT5b and EAAT7, in the functional and behavioral responses to different wavelengths of light. The question is an interesting one, because these transporters are well positioned to affect responses to light, and their distribution in the retina suggests that they could play differential roles in visual behaviors. However, the low resolution of both the functional and behavioral data presented here means that the conclusions are necessarily a bit vague.

      In Figure 1, the authors show that the double KO has a decreased ERG response to UV/blue and red wavelengths. However, the individual mutations only affect the response to red light, suggesting that they might affect behaviors such as OMR which typically rely on this part of the visual spectrum. However, there was no significant change in the response to UV/blue light of any intensity, making it unclear whether the mutations could individually play roles in the detection of UV prey. Based on the later behavioral data, it seems likely that at least the EAAT7 KO should affect retinal responses to UV light, but it may be that the ERG does not have the spatial or temporal resolution to detect the difference, or that the presence of blue light overwhelmed any effect of the individual knockouts on the response to UV light.

      In Figures 5 and 6, the authors compare the two knockouts to wild-type fish in terms of their sensitivity to UV prey in a hunting assay. The EAAT5b KO showed no significant impairment in UV sensitivity, while the EAAT7 KO fish actually had an increased hunting response to UV prey. However, there is no comparison of the KO and WT responses to different UV intensities, only in bulk, so we cannot conclude that the EAAT7 KO is allowing the fish to detect weaker prey-like stimuli.

      We have now reported, in both the Results and the Methods sections, that comparisons of intensity-specific responses were non-significant in all analyses (Chi-square test, p>0.05). We decided not to add this information to the figure, as it does not add to the data and risks cluttering an already complex graph.
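
For readers who want to see what such an intensity-specific comparison could look like, the following is a minimal sketch of a Pearson chi-square test on a 2x2 table (responded versus not responded, for two genotypes at one intensity). It is an illustration under stated assumptions, not the authors' script: the table layout, the absence of a continuity correction, and the function name are all assumptions, and the counts in the usage note are invented.

```python
import math


def chi_square_2x2(resp_a, n_a, resp_b, n_b):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) comparing response counts between two groups.

    resp_a, resp_b: number of trials with a response in each group.
    n_a, n_b: total number of trials in each group.
    Returns (chi2, p_value).
    """
    observed = [[resp_a, n_a - resp_a], [resp_b, n_b - resp_b]]
    row_totals = [n_a, n_b]
    col_totals = [resp_a + resp_b, (n_a - resp_a) + (n_b - resp_b)]
    total = n_a + n_b

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("expected count of zero; test undefined")
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the survival function of the chi-square
    # distribution reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With invented counts of 12 responses in 30 trials for one genotype and 18 in 30 for the other, `chi_square_2x2(12, 30, 18, 30)` gives a statistic of 2.4 and a p-value above 0.05, which is the kind of non-significant per-intensity result described above. With small expected counts, an exact test may be more appropriate than this approximation.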

      As reviewer #2 rightly states, we cannot conclude that EAAT7 KO is allowing the fish to detect weaker prey-like stimuli. We only intend to suggest that a lack of EAAT7 might facilitate prey detection events, as the total number of hunting events is increased compared to WT.

      In Figure 7, the EAAT5b KO seems to cause a decrease in OMR behavior to red grating stimuli, but only one stimulus is tested, so it is unclear whether this is due to a change in visual sensitivity or resolution.

      We fully agree that further experiments presenting different stimuli in the setup may very well reveal more details on the nature of the observed defect and thank reviewer #2 for the suggestion. We feel that identifying the reason for the defect lies outside the scope of this paper, but it should definitely be investigated in future studies.

      The conclusions made in the manuscript are appropriately conservative; the abstract states that these transporters somehow influence prey detection and motion sensing, and this is probably true. However, it is unclear to what extent and how they might be acting on these processes, so the conclusions are a bit unsatisfying.

      In terms of impact on the field, this work highlights the potential importance of these two transporters to visual processing, but further studies will be required to say how important they are and what they are doing. The methods presented here are not novel, as UV prey and red OMR stimuli and behaviors have previously been described.

      We agree that this study is not fully conclusive, but it is a first step towards clarifying the role of glutamate transporters in shaping visual behavior.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Suggestions for improved or additional experiments, data, or analyses:

      Figure 3:

      (a) What is the intensity of the light emitted by the UV and yellow LEDs and experienced by the larva, e.g. in nW? This is necessary in order to be able to compare and replicate the results.

      Stimulus intensities in microwatts are now reported in the Materials and Methods section.

      (b) In Figure 3D, are all the example eye movement events hunting initiations? Does right eye/left eye positive or negative angle change denote convergence?

      As indicated in the figure legend, hunting initiations are indicated by black dots on the graph. In Stytra’s eye tracking system, eye convergence is indicated by an increase in the left eye angle and a decrease in the right eye angle. Both these points have now been clarified in the figure legend.

      (c) Also in 3D, the tail angle plot and x-axis are too small to read.

      Figure 3D has been reformatted to be more legible.

      (d) How much eye convergence constitutes a response? In order to compare the findings to previous studies of prey capture, it would be best to use a bimodal distribution of eye angles to set a convergence threshold for each fish (e.g. Paride et al., eLife 2019), but there should at least be a clear threshold mentioned.

      We have expanded the explanation of how the response detection paradigm was calculated. We acknowledge that this analysis has limitations in terms of comparability with previous studies, as it was developed de novo, based on the format of eye coordinate data provided by Stytra and refined through iterative comparison with experimental video recordings. Since the threshold was defined relative to the average noise level of the trace, it is difficult to specify an exact value. However, we are happy to share the Python scripts used for the analysis to facilitate further investigation.
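
To make the idea of a noise-relative threshold concrete, the following is a minimal sketch of how convergence onsets could be flagged from paired eye-angle traces. It is not the authors' script. It follows the sign convention stated in the response (convergence is an increase in the left eye angle and a decrease in the right eye angle), but the vergence signal, the use of the median absolute deviation as a robust stand-in for the "average noise level", the multiplier `k`, and the function name are all assumptions made for illustration.

```python
from statistics import median


def detect_convergence_onsets(left_angles, right_angles, k=5.0):
    """Return sample indices where eye convergence begins.

    left_angles, right_angles: equal-length sequences of eye angles.
    k: hypothetical multiplier setting how far above the noise level
       the vergence signal must rise to count as a response.
    """
    if len(left_angles) != len(right_angles):
        raise ValueError("traces must have the same length")

    # Convergence raises the left angle and lowers the right angle,
    # so their difference grows when the eyes converge.
    vergence = [l - r for l, r in zip(left_angles, right_angles)]

    baseline = median(vergence)
    # Median absolute deviation: a noise estimate that is not inflated
    # by the convergence events themselves.
    noise = median([abs(v - baseline) for v in vergence])
    threshold = baseline + k * max(noise, 1e-9)

    onsets = []
    above = False
    for i, v in enumerate(vergence):
        if v > threshold and not above:
            onsets.append(i)  # first sample of a supra-threshold run
        above = v > threshold
    return onsets
```

Because the threshold here scales with each trace's own noise, it has no single fixed value in degrees, which matches the difficulty the response describes in quoting one number. A per-fish threshold derived from a bimodal eye-angle distribution, as the reviewer suggests, would be an alternative design.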

      (e) The previous study using artificial UV prey stimuli to trigger hunting (Khan et al., Current Biology 2023) should be acknowledged.

      This is indeed an embarrassing omission, not excused by the first version of this section being drafted before the Khan publication. We have now cited this important study.

      Figure 5:

      Was the response at any individual intensity significantly lower in the mutant? If not, this should be clearly stated.

      Yes, and this is now clearly stated in the main text

      Figure 6:

      Again, it would be more informative to know for which intensities the KO response was significantly greater than WT.

      This is now also clearly stated in the main text

      Figure 7:

      (a) What are the intensity units?

      We have now clarified in the figure that the intensity shown in the graph is digital intensity.

      (b) Similar to Figures 5 and 6, it would be more informative to know at which intensities the KO response was significantly different from WT.

      We now report the measured optical powers relative to the digital intensities in the Materials and Methods sections.

      Suggestion for writing:

      The discussion was a bit discursive. A more structured discussion, sequentially explaining each of the key results, would be easier for the reader to follow. And, it would be helpful to have hypotheses for how these transporter mutants could cause each of the changes in visual behaviors that were observed.

      We agree that the discussion needed improvement. We have completely rewritten it and hope that it now puts our results into context more concisely.

    1. eLife Assessment

      This study proposes a useful assay to identify relative social ranks in mice incorporating the competitive drive for two basic resources - food and living space. Using this new protocol, the authors provide solid evidence of stable ranking among male and female pairs, while reporting more fluctuant hierarchies among triads of males. The evidence is, however, limited by the lack of ethologically based validation, assessment of the influence of competitor recognition, and proof of concept of application to neuroscience. This manuscript may be of interest to those interested in social behavior and related neuroscience.

    2. Reviewer #1 (Public review):

      Summary

      The authors present a new protocol to assess social dominance in pairs and triads of C57BL/6j mice, based on a competition to access a hidden food pellet. Using this new protocol, the authors have been able to identify stable ranking among male and female pairs, while reporting more fluctuant hierarchies among triads of males. Ranking readout identified with this new apparatus was compared to the outcome obtained with the same animals competing in the tube and in the warm spot tests, which have been both commonly used during the last decade to identify social ranks in rodents under laboratory conditions.

      Strengths

      FPCT allows for an easy and fast identification of a winner and loser in a context of food competition. The apparatus and the protocol are relatively easy and quick to implement in the lab and free from any complex post processing/analysis, which qualifies it for wide distribution, particularly within laboratories that do not have the resources to implement more sophisticated protocols. Hierarchical readout identified through the FPCT correlates with social ranks identified with the tube and the warm spot tests, which have been widely adopted during the last decade and allow for study comparison.

      Weaknesses

      While the FPCT is validated by the tube and the warm spot test, this paper would have gained strength by providing a more ethologically based validation. Tube and warm spot tests have been shown to provide conflicting results and might not be a sufficient measurement for social ranking (see Varholik et al, Scientific Reports, 2019; Battivelli et al, Biological Psychiatry, 2024). Instead, a general consensus pushing toward more ethological approaches for neuroscience studies is emerging.

      Other papers have already successfully identified social ranks in dyadic food competition, using a relatively simple scoring protocol (see, for example, Merlot et al., 2006), within a more naturalistic set-up that allows the 2 opponents to interact directly while competing for the food. A potential issue with the FPCT is that, because the opponents are isolated from each other, the inhibition normally expected in subordinates accessing food in the presence of a dominant could be diminished, and subordinates that would usually avoid the dominant could be more motivated to push for access to the food pellet.

      Comments on revisions:

      We thank the authors for the significant improvement of the English in the revised version and for the replacement of some conceptual terms that now seem more relevant and appropriate. We only noticed that the term "society" remains in use, although it might not be appropriate to describe a mouse colony (see previous review).

      Conclusive remarks

      Although this protocol aims to provide a novel approach to evaluate social ranks in mice, it is not clear how it really brings a significant advance in neuroscience research. The FPCT dynamic is very similar to the one observed in the tube test, where mice compete to navigate forward in a narrow space, constraining the opponent to go backwards. The main difference between the FPCT and the tube test is the presence of food between the opponents. In the tube test, food reward was initially used to increase motivation to cross the tube and push the opponent upon the testing day. This component has been progressively abandoned, precisely because it was not necessary for the mice to compete in the tube.

      This paper would really bring a significant contribution to the field by providing a neuronal imaging or manipulation correlate to the behavioral outcome obtained by the application of the FPCT.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, the authors have devised a novel assay to measure relative social rank in mice that is aimed at incorporating multiple aspects of social competition while minimizing direct contact between animals. Forming a hierarchy often involves complex social dynamics related to competitive drives for different fundamental resources, including access to food, water, territory and sexual mates. This makes the study of social dominance and its neural underpinnings hard, warranting the development of new tools and methods that can help understand both social function as well as dysfunction.

      Strengths:

      This study showcases an assay called the Food Pellet Competition Test, where cagemate mice compete for food, without direct contact, by pushing a block in a tube from opposite directions. When the task is run with stranger mice, it leads to more variable outcomes, suggesting that co-housing helps stabilize outcomes. The authors have attempted to quantify motivation to obtain the food independent of other factors by running the assay under two conditions: one where the food is accessible and one where it isn't. This assay results in high outcome consistency across days for pair-housed females and males and for male groups of three. Further, the determined social ranks correlate strongly with two common assays: the tube test and the warm spot test.

      Weaknesses:

      This new assay has limited ethological validity since mice do not compete for food without touching each other with a block in the middle. In addition, the assay may only be valid for a single trial per day, limiting its utility for neural recordings and manipulations to a single sample per mouse. The authors' claim, as currently stated in the Results, for the new control experiment in Figure 1H-J is not warranted given that 6/8 mice had majority winning or losing across all strangers.

    4. Reviewer #3 (Public review):

      Summary:

      The laboratory mouse is an ideal animal to study the neural and psychological underpinnings of social dominance behavior because of its economic cost and the animals' readiness to display dominant and subordinate behaviors in simple and testable environments. Here, a new and novel method for measuring dominance and the individual social status of mice is presented using a food competition assay. Historically, food competition assays have been avoided because they occur in an open arena or the home cage, and it can be difficult to assess who gets priority access to the resource and to avoid aggressive interactions such as bite wounding. Now, the authors have designed a narrow rectangular arena separated in half by a sliding floor-to-ceiling obstacle, where the mice placed at opposite sides of the obstacle compete by pushing the obstacle to gain priority access to a food pellet resting on the arena floor under the obstacle. One can also place the food pellet within the obstacle to restrict priority access to the food and measure the time or effort spent pushing the obstacle back and forth. As hypothesized, the outcomes in the food competition test were significantly consistent with those of the more common tube test (space competition) and warm spot competition test. This suggests that these animals have a stereotypic dominance organization that exists across multiple resource domains (i.e., food, space, and temperature). Only male and female C57 mice in same-sex pairs or triads were tested.

      Strengths:

      The design of the apparatus and the inclusion of females are significant strengths within the study.

      Weaknesses:

      There are at least two major weaknesses of the study: the test with unfamiliar non-cagemates and not providing the mice time to recognize who they are competing with.

      The authors conclude in the first section of the results that they "did not detect significant difference in winning/losing results between unfamiliar non-cagemate male mice." Given the data and analysis provided, I believe this statement is false. My understanding is that the authors would like to show that the establishment of social relationships (i.e., familiarity) is necessary for FPCT to distinguish social ranks of mice. There are many ways to test this. The simplest would be to randomly pair unfamiliar non-cagemates that are housed in isolation with one another and see if they perform at chance, individually. The more involved empirical way would be to measure the ranks of mice in a social group, then test them with unfamiliar non-cagemate mice to see if they maintain their outcomes regardless of social familiarity, or return to chance outcomes when paired with non-cagemates. Figure 1I clearly shows that they did not perform at chance. Since the outcome is win or lose, then the probability of getting all of one outcome 4 times in a row would be 1 in 16. The data shows that this occurred twice, so 2 mice of 8 had the same outcome 4 times in a row (i.e., Mouse B3 and A1). So, they did not perform at chance. I am not even sure if there are enough animals here to test this question. One may need to consult a mathematician. Moreover, the original tube-test study by Lindzey et al. 1961 (https://www.nature.com/articles/191474a0) used unfamiliar non-cagemate male mice, and showed that 100% of the A/alb strain won more than half of their oppositions against C3H and DBA/8 mice. Thus, A/alb mice were more "dominant" mice relative to C3H or DBA/8. Taking into consideration the results, is mouse A1 naturally dominant? So maybe it doesn't matter what mouse you pair with it, it will always win? If this is true, is "individual identification of the partner" actually necessary to get this outcome? All they have to do is push to get the food reward, does it matter who is on the other side? If one wants to measure social dominance relationships, then it should matter who is on the other side. If one would like to measure attributes of dominant behavior (e.g., pushing), then one may do so and not insinuate a social link. Studying dominance relationships (i.e., social ranking) of animals is an extremely difficult task. We must ensure that we are not assigning something about a relationship that does not exist. Please read "Dominance: The baby and the bathwater" by Irwin Bernstein, https://annas-archive.org/scidb/10.1017/s0140525x00009614/
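      The chance-level argument above can be made explicit with a short calculation. The following is only a minimal sketch under stated assumptions, not an analysis taken from the manuscript: it assumes each mouse has exactly 4 trials, that each trial is a fair coin flip under the null hypothesis, and (as a simplification, since opponents' outcomes are linked) that the 8 mice are independent. The function names are hypothetical.

```python
from math import comb


def prob_all_same(n_trials: int, p_win: float = 0.5) -> float:
    """Probability that one mouse wins every trial or loses every trial.

    Assumes independent trials with a fixed win probability (null: 0.5).
    """
    return p_win ** n_trials + (1 - p_win) ** n_trials


def prob_at_least(k: int, n_mice: int, p: float) -> float:
    """Binomial tail: probability that at least k of n_mice show the pattern.

    Assumes mice are independent, which is a simplification here because
    one mouse's win is its opponent's loss.
    """
    return sum(
        comb(n_mice, i) * p ** i * (1 - p) ** (n_mice - i)
        for i in range(k, n_mice + 1)
    )


# Per-mouse probability of 4 identical outcomes (all wins OR all losses).
p_streak = prob_all_same(4)

# Probability that 2 or more of 8 mice show such a streak by chance alone.
p_two_or_more = prob_at_least(2, 8, p_streak)

print(f"P(4 identical outcomes for one mouse) = {p_streak:.4f}")
print(f"P(at least 2 of 8 mice with a streak) = {p_two_or_more:.4f}")
```

      Note that 1 in 16 is the probability of a streak in one specified direction (for example, 4 wins); counting either direction gives 1 in 8. Because the independence assumption does not strictly hold for paired opponents, a permutation or exact test on the actual pairing design would be needed for a definitive answer.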

      Unlike the tube test and warm spot test, the food competition test presented here provides no opportunity for the animals to identify their opponent. That is, they cannot sniff their opponent's fur or anogenital region, which would allow them an opportunity to identify them individually. Thus, as the authors state, the test only measures a psychological motivation to get a food reward. Notably, the outcome in the direct and indirect testing of food competition is in agreement, leaving many to wonder whether they are measuring the social relationship or the effort an individual puts forth in attaining a food reward regardless of the social opponent. Specifically, in the direct test, an individual can retrieve the food reward by pushing the obstacle out of the way first. In the indirect test, the animals cannot retrieve the reward and can only push the obstacle back and forth, which contains the reward inside. In Figure 2F, you can see that winners spent more time pushing the block in the indirect test--albeit not significantly. Thus, whether the test measures a social relationship or just the likelihood to gain priority access to food is unclear. To rectify this issue, the authors could provide an opportunity for the animals to interact before lowering the obstacle and raising(?) a food reward. They may also create a very long one-sided apparatus to measure the amount of effort an individual mouse puts forth in the indirect test with only one individual-or any situation with just one mouse where the moving obstacle is not pushed back, and the animal can just keep pushing until they stop. This would require another experiment. It also may not tell us much more since it remains unclear whether inbred mice can individually identify one another (see https://doi.org/10.1098/rspb.2000.1057 for more details).

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors present a new protocol to assess social dominance in pairs and triads of C57BL/6j mice, based on a competition to access a hidden food pellet. Using this new protocol, the authors have been able to identify stable ranking among male and female pairs, while reporting more fluctuant hierarchies among triads of males. Ranking readouts identified with this new apparatus were compared to the outcomes obtained with the same animals competing in the tube and in the warm spot tests, which have been both commonly used during the last decade to identify social ranks in rodents under laboratory conditions.

      Strengths:

      FPCT allows for easy and fast identification of a winner and a loser in the context of food competition. The apparatus and the protocol are relatively easy and quick to implement in the lab and free from any complex post-processing/analysis, which qualifies it for wide distribution, particularly within laboratories that do not have the resources to implement more sophisticated protocols. Hierarchical readouts identified through the FPCT correlate with social ranks identified with the tube and the warm spot tests, which have been widely adopted during the last decade and allow for study comparison.

      Weaknesses:

      While the FPCT is validated by the tube and the warm spot test, this paper would have gained strength by providing a more ethologically based validation. Tube and warm spot tests have been shown to provide conflicting results and might not be a sufficient measurement for social ranking (see Varholik et al, Scientific Reports, 2019; Battivelli et al, Biological Psychiatry, 2024). Instead, a general consensus pushing toward more ethological approaches for neuroscience studies is emerging.

      We thank all the reviewers for recognizing the strengths of the FPCT setup and the data. We also thank the reviewers for pointing out weaknesses and giving us valuable suggestions that helped us improve the quality of our manuscript through revision.

      In this manuscript, we found that the ranking results of the FPCT were largely consistent with those of the tube and warm spot tests. We did not expect this finding, because we assumed that the different competitive targets of the different paradigms would appeal to the mice differently and allow them to exert their specific advantages. However, the consistency between the FPCT and tube test was observed in the pairs of female mice, pairs of male mice and triads of male mice. The consistency between the FPCT, tube test and warm spot test was observed in pairs of male mice and triads of male mice. Thus, we concluded that social rank order is stable in mice.

      We acknowledge that it would be better if this conclusion could be validated by more ethological approaches such as urine-marking analysis and a water competition test. However, we do not rule out inconsistency of ranking results between two or more paradigms. In fact, there were inconsistent cases in our experiments. The inconsistency of ranking results between paradigms, even between the FPCT and tube test, could be amplified if the tests were run with different experimental protocols and conditions. This is because many factors can affect the readouts, such as formation of the colony, tasks, test protocols, habituation and training. With the tube test itself, both stable (refs. 1,2) and unstable (ref. 3) ranking results have been reported.

      Other papers have already successfully identified social ranks in dyadic food competition, using a relatively simple scoring protocol (see, for example, Merlot et al., 2006), within a more naturalistic set-up that allows the two opponents to interact directly while competing for the food. A potential issue with the FPCT is that, because the opponents are isolated from each other, the normal inhibition expected in subordinates when accessing food in the presence of a dominant could be diminished, and subordinates that would usually avoid the dominant could be more motivated to push for access to the food pellet.

      The hierarchical structure of a mouse colony could be established on the basis of physical aspects, such as muscular strength and vigor in fighting, and psychological aspects, such as boldness, focused motivation and active self-awareness of status. In currently available food contest paradigms, where the mice compete through bodily interaction, the physical and psychological aspects are intermingled in the interpretation of the mice’s winning/losing. In the FPCT, the opponents are isolated from each other so that the importance of direct bodily interaction in a competition is minimized, which helps expose the psychological factors contributing to the establishment and/or expression of social status in mice. In this study, the overall stable ranking results across the FPCT, tube test and warm spot test indicate that an animal's sense of status is part of a comprehensive identity, or self-recognition, of individuals in an established mouse social colony.

      There are issues with use of the English language throughout the text. Some sentences are difficult to understand and should be clarified and/or synthesized.

      We thank the reviewer for pointing out language issues. We have carefully corrected the grammar errors.

      Open question:

      Is food restriction mandatory? Is a palatable food pellet not sufficient to trigger competition? Food restriction has numerous behavioral and physiological consequences that would be better avoided in order to interpret behavioral outcomes in the FPCT clearly (see, for example, Tucci et al., 2006).

      We thank the reviewer for raising this question. In preliminary experiments, we found that food restriction was mandatory and that a palatable food pellet alone was not sufficient to trigger competition. To limit the potential influence of food restriction on competitive behavior, the mice underwent only a 24-hour food deprivation period at the beginning of training, followed by mild restriction of food supply that met basic energy requirements.

      Conclusive remarks:

      Although this protocol attempts to provide a novel approach to evaluate social ranks in mice, it is not clear how it really brings a significant advance in neuroscience research. The FPCT dynamic is very similar to the one observed in the tube test, where mice compete to navigate forward in a narrow space, constraining the opponent to go backward. The main difference between the FPCT and the tube test is the presence of food between the opponents. In the tube test, a food reward was initially used to increase motivation to cross the tube and push the opponent upon the testing day. This component has been progressively abandoned, precisely because it was not necessary for the mice to compete in the tube.

      This paper would really bring a significant contribution to the field by providing a neuronal imaging or manipulation correlate to the behavioral outcome obtained by the application of the FPCT.

      We thank the reviewer for this comment on the significance of the FPCT paradigm. In this manuscript, we think it is interesting to report that the ranking results were consistent across the FPCT, tube test and warm spot test. This finding indicates that an animal's sense of status might be part of a comprehensive identity, or self-recognition, of individuals in an established social colony.

      Moreover, we are conducting research on the biological consequences and mechanisms of social competition. We hope to publish the results of this ongoing project in the near future.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors have devised a novel assay to measure relative social rank in mice that is aimed at incorporating multiple aspects of social competition while minimizing direct contact between animals. Forming a hierarchy often involves complex social dynamics related to competitive drives for different fundamental resources including access to food, water, territory, and sexual mates. This makes the study of social dominance and its neural underpinnings hard, warranting the development of new tools and methods that can help understand both social functions as well as dysfunction.

      Strengths:

      This study showcases an assay called the Food Pellet Competition Test where cagemate mice compete for food, without direct contact, by pushing a block in a tube from opposite directions. The authors have attempted to quantify motivation to obtain the food independent of other factors such as age, weight, sex, etc. by running the assay under two conditions: one where the food is accessible and one where it isn't. This assay results in an impressive outcome consistency across days for pair-housed females and males and for male groups of three. Further, the determined social ranks correlate strongly with two common assays: the tube test and the warm spot test.

      Weaknesses:

      This new assay has limited ethological validity since mice do not compete for food without touching each other with a block in the middle. In addition, the assay may only be valid for a single trial per day, limiting its utility for neural recordings and manipulations to a single sample per mouse. Although the authors attempt to measure motivation as a factor driving who wins the social competition, the data are limited. This novel assay requires training across days with some mice reaching criteria before others. From the data reported, it is unclear what effects training can have on the outcome of social competition. Beyond the data shown, the language used throughout the manuscript and the rationale for the design of this novel assay are difficult to understand.

      We thank the reviewer for the valuable comments on the strengths and weaknesses of our manuscript.

      The design rationale of the FPCT was to (1) provide researchers with a new choice of food competition paradigm and (2) expose psychological factors influencing the establishment and/or expression of social status in mice by avoiding direct physical competition between contenders (see revised Abstract and the last paragraph in the Introduction).

      As a result, the consistent ranking across the FPCT, tube test and warm spot test might indicate that an animal's sense of status is part of a comprehensive identity, or self-recognition, of individuals in an established social colony.

      We suggest performing one FPCT trial per day per mouse, as the mice might lose interest in the food pellet if they are tested several times in a day, but it is practical to perform the FPCT assay over several days.

      Regarding the training, we suggest 4-5 days of training, as we did. In this revision, we added training data that show how the latency to obtain the food changed over the course of training (Figure 1). On the last day of training, the mice went directly to push the block and eat the food after they entered the arena.

      We thank the reviewer for pointing out language issues. We have carefully corrected the errors.

      Reviewer #3 (Public review):

      Summary:

      The laboratory mouse is an ideal animal to study the neural and psychological underpinnings of social dominance behavior because of its economic cost and the animals' readiness to display dominant and subordinate behaviors in simple and testable environments. Here, a new and novel method for measuring dominance and the individual social status of mice is presented using a food competition assay. Historically, food competition assays have been avoided because they occur in an open arena or the home cage, and it can be difficult to assess who gets priority access to the resource and to avoid aggressive interactions such as bite wounding. Now, the authors have designed a narrow rectangular arena separated in half by a sliding floor-to-ceiling obstacle, where the mice placed at opposite sides of the obstacle compete by pushing the obstacle to gain priority access to a food pellet resting on the arena floor under the obstacle. One can also place the food pellet within the obstacle to restrict priority access to the food and measure the time or effort spent pushing the obstacle back and forth. As hypothesized, the outcomes in the food competition test were significantly consistent with those of the more common tube test (space competition) and warm spot competition test. This suggests that these animals have a stereotypic dominance organization that exists across multiple resource domains (i.e., food, space, and temperature). Only male and female C57 mice in same-sex pairs or triads were tested.

      Strengths:

      The design of the apparatus and the inclusion of females are significant strengths within the study.

      Weaknesses:

      There are at least two major weaknesses of the study: neglecting the value of test inconsistency and not providing the mice time to recognize who they are competing with.

      Several studies have demonstrated that although inbred mice in laboratory housing share similar genetics and environment, they can form diverse types of hierarchical organizations (e.g., loose, stable, despotic, linear, etc.) and there are multiple resource domains in the home cage that mice compete over (e.g., space, food, water, temperature, etc.). The advantage of using multiple dominance assays is to understand the nuances of hierarchical organizations better. For example, some groups may have clear dominant and subordinate individuals when competing for food, but the individuals may "change or switch" social status when competing for space. Indeed, social relationships are dynamic, not static. Here, the authors have provided another test to measure another dimension of dominance: food competition. Rather than highlight this advantage, the authors highlight that the test is in agreement with the standard tube test and warm spot test and that C57 mice have stereotypic dominance across multiple domains. While some may find this great, it will leave many to continue using the tube test only (which measures the dimension of space competition) and avoid measuring food competition. If the reader looks at Figures 6E, F, and G they will see examples of inconsistency across the food competition test, tube test, and warm spot test in triads of mice. These groups are quite interesting and demonstrate the diversity of social dynamics in groups of inbred mice in highly standardized environmental conditions. Scientists interested in dominance should study groups that are consistent and inconsistent across multiple dimensions of dominance (e.g., space, food, mates, etc.).

      Unlike the tube test and warm spot test, the food competition test presented here provides no opportunity for the animals to identify their opponent. That is, they cannot sniff their opponent's fur or anogenital region, which would allow them an opportunity to identify them individually. Thus, as the authors state, the test only measures psychological motivation to get a food reward. Notably, the outcome in the direct and indirect testing of food competition is in agreement, leaving many to wonder whether they are measuring the social relationship or the effort an individual puts forth in attaining a food reward regardless of the social opponent. Specifically, in the direct test, an individual can retrieve the food reward by pushing the obstacle out of the way first. In the indirect test, the animals cannot retrieve the reward and can only push the obstacle back and forth, which contains the reward inside. In Figure 4E, you can see that winners spent more time pushing the block in the indirect test. Thus, whether the test measures a social relationship or just the likelihood of gaining priority access to food is unclear. To rectify this issue, the authors could provide an opportunity for the animals to interact before lowering the obstacle and raising(?) a food reward. They may also create a very long one-sided apparatus to measure the amount of effort an individual mouse puts forth in the indirect test with only one individual - or any situation with just one mouse where the moving obstacle is not pushed back, and the animal can just keep pushing until they stop. This would require another experiment. It also may not tell us much more since it remains unclear whether inbred mice can individually identify one another (see https://doi.org/10.1098/rspb.2000.1057 for more details).

      A minor issue is that the write-up of the history of food competition assays and female dominance research is inaccurate. Food competition assays have a long history since at least the 1950s and many people study female dominance now.

      Food competition: https://doi.org/10.1080/00223980.1950.9712776, https://psycnet.apa.org/fulltext/1953-03267-001.pdf, https://doi.org/10.1016/j.bbi.2003.11.007, https://doi.org/10.1038/s41586-022-04507-5

      Female dominance: history https://doi.org/10.1016/j.cub.2023.03.020, https://doi.org/10.1016/S0031-9384(01)00494-2, https://doi.org/10.1037/0735-7036.99.4.411

      We thank the reviewer very much for the many helpful comments and suggestions.

      In this manuscript, we want to present the overall, on-average consistency of ranking results between the FPCT, tube test and warm spot test as an unexpected finding. We agree that the inconsistency of social ranking that occurred between trials and between paradigms should not be ignored. In the revision, we added a description and discussion of the inconsistent results across the different test paradigms (paragraph 2 in section 3 of the Results; last 2 sentences of paragraph 4 in the Discussion).

      Although the two opponents were separated from each other, they were able to see and sniff each other because the block is transparent, there are holes in the lower portion of the block, and there is a gap between the block and the chamber (Supplementary Figures 1 and 2). In the female but not the male groups, the presence of a cagemate opponent during test 1 significantly disturbed the mice and increased their latency to get the food, compared with the last day of training when there was no opponent (Figure 3A). This indicates that a mouse, at least a female mouse, can detect the presence of an opponent on the opposite side of the chamber. To further examine whether social relationships influence the readouts of the FPCT, we performed additional experiments in which two groups of non-cagemate mice competed. We did not detect obviously different ranks between the two groups (Figure 1H-1J), suggesting that establishment of a social colony is necessary for the FPCT to distinguish social ranks of mice.

      We thank the reviewer for reminding us to recognize the history of food competition assays. We have added citations and discussion of the related literature, both for male (paragraph 2 in the Introduction; paragraph 3 in the Discussion) and female (paragraph 1 of section 3 in the Results; paragraph 4 in the Discussion) mice.

      Reviewer #1 (Recommendations for the authors):

      There are issues with use of the English language throughout the text. Some sentences are difficult to understand and should be clarified and/or synthesized.

      We thank the reviewer for the constructive comments and helpful corrections.

      “Despite that 6 in 9 groups of mice display some extent of flipped ranking (Figures 6B-6G) and only 3 in 9 groups displayed continuously unaltered ranking (Figure 6H) during a total of 9 trials consisting of 3 trials of FPCT, 3 trials of tube test and 1 trial of WST, an obvious stable linear intragroup hierarchy was observed throughout all the trials and tasks"

      The above sentence has been re-written as: The ranking result showed that 6 in 9 groups of mice displayed some extent of flipped ranking (Figures 4B-4G), and only 3 in 9 groups displayed continuously unaltered ranking (Figure 4H). Averagely, in the totally 27 trials consisting of 12 trials of FPCT, 12 trials of tube test and 3 trials of WST, an obvious stable linear intragroup hierarchy was observed across all the trials and tasks (paragraph 1 of section 4 in the Results).

      "it is hard to attribute winning a competition in a shared space to stronger motivation rather than muscular superiority".

      The above sentence has been deleted and re-written in paragraph 1 of section 4 in the Results and paragraph 3 in the Discussion.

      "Unexpectedly, in most of the trials the mice preserved the winner or loser identity acquired in FPCT into tube test and WST (Figures 5L-5O)".

      Why is this unexpected? Instead, it looks like this result is expected (the tube test has been successfully applied to identify ranks in females, see Leclair et al, eLife, 2021).

      We thank the reviewer for raising this point. The FPCT differs from the tube test and warm spot test in at least two respects: competition for food vs space, and absence vs presence of direct bodily interaction during competition. Some mice might be active in food competition but not in space competition, while others might show the opposite pattern. Some mice might be good at physical contests, while others might be good at playing tricks. These factors led us to expect task-specific ranking results.

      Vocabulary issues:

      "Stereotypic", to talk about rank stability in a different context, does not look appropriate. In behavioral neuroscience, stereotypy more commonly refers to abnormal repetitive behaviors. The stability that the authors seem to indicate with the word "stereotype" refers rather to the concept of "consistency" or "stability".

      We thank the reviewer for this detailed explanation. We have chosen to use "stability" to describe the data.

      "Society", to talk about groups or colonies of animals sounds a bit odd. Society evokes more abstract concepts more likely to fit with human organization. I suggest the use of "group" or "colony".

      "Hide" to qualify the block preventing access to the food pellet. It is said that the block is transparent. We suggest the use of "inaccessible" instead of hidden.

      We strongly encourage the authors to further edit the entire script to improve language.

      We thank the reviewer for the kind correction. We have corrected the above vocabulary misuse.

      Technical issues / typos:

      Figure 1. The picture does not seem optimal to visualize the apparatus.

      Missing unit legend in Figure 4E.

      Supplementary videos 2 and 4 are missing.

      We have added a frontal view of the apparatus (Supplementary Figure 1), added a unit to Figure 2F (previously Figure 4E), and will make sure to upload the missing videos.

      Reviewer #2 (Recommendations for the authors):

      While the assay shows promise as a tool for studying social dominance, the study suffers from some limitations such as lack of ethological relevance. In addition, there is a lack of rationale and methodological clarity in the manuscript that can impact the ability of other scientists to be able to perform this novel assay.

      (1) Related to lack of scientific rigor:

      a. In the first paragraph of the introduction, the authors mention that "disability in social recognition and unsatisfied social status are associated with brain diseases such as autism, depression and schizophrenia". Both papers that they cited refer to mouse models, not humans (which is the species that is attributed these diagnoses clinically). In addition, neither citation discusses schizophrenia. While social dysfunctions can indeed be related to these diseases, to my knowledge this is not caused by a change in "social status" and there is no human data with patient populations and social status. Therefore, this sentence is inaccurate and there is no research that demonstrates that.

      We thank the reviewer for raising this point. To express this view and cite the literature more accurately, we revised the sentence in the 1st paragraph of the Introduction as follows: “Impaired awareness of social competition has been documented in individuals with autism spectrum disorder (ASD)4,5, and reduced social interaction has been characterized in corresponding animal models6. Similarly, maladaptive responses to social status loss has been associated with patient depressive disorders7,8 and animal models of depression1,9”. The reviewer is right that no disease in patients is causally related to social status, and only depression has been proposed to be associated with a change of social status7,8.

      b. In the second paragraph of the introduction, the authors mention a scarcity of research papers with designs for food competition-based social hierarchy assays for mice. At least two such papers have been published in the past few years (DOIs https://doi.org/10.1038/s41586-021-04000-5 and https://doi.org/10.1038/s41586-022-04507-5). The authors should acknowledge the existence of these and other assays and discuss how their work would be related. In the same paragraph, they also mention that existing assays suffer from "hierarchy instability" and "complex calculations" without showing any citations or details for these claims.

      We thank the reviewer for raising this point. We acknowledge that some food competition tests are available to measure social hierarchy in mice. Relative to space competition, however, food competition tests have not been used as commonly or widely, and no food competition paradigm has been accepted as generally as space competition paradigms such as the tube test and the warm spot test. To improve the language and scientific expression, we revised the sentences as follows: “Relative to space competition, food competition tests for mice have been designated and applied less commonly in animal studies despite its long history 28-30. Several issues could be thought to be the underlying limitations for the application of food competition paradigms. First, there are methodological issues in some of these approaches, such as long video recording duration and difficulty in analyzing animal’s behaviors during competitive physical interaction in videos, hindering their application by laboratories that cannot afford sophisticated equipment and analysis”. Corresponding citations have been updated (see paragraph 3 in the Introduction).

      c. The authors say that their study is the first to demonstrate that female mice follow social ranks. This is not the first study to do so and the authors should acknowledge existing publications that have done the same (eg DOI https://doi.org/10.7554/eLife.71401).

      We have followed the reviewer’s suggestion to increase citations regarding social ranking of female mice tested by competition paradigms, especially food competition paradigms (see paragraph 1 of section 3 in the Results; paragraph 4 in the Discussion).

      (2) Related to problems with interpretation of data:

      a. The authors showed the assay works for females and males in pairwise housing, but two mice don't make a hierarchy, as hierarchies require a minimum of three individuals. Therefore, whether the assay works for females caged in three is an important question that is unaddressed in this study and is a caveat. The authors extended the competition assay to male mice that are housed in cages of three. It would be important to show whether the assay generalizes well for female mice with this three-animal housing as well as discuss the effect of using even bigger groups of mice on the results of the assay.

      We thank the reviewer for raising questions related to the interpretation of data and for the insightful suggestions. We agree that it is interesting and important to probe whether FPCT works for a group of three female mice. Although the social rankings of pairs of male and female mice were not significantly different (new Figure 2D-2F and 3F-3H), those of triads of male and female mice could be different. We have tested triads of male mice and found that the mice displayed an overall linear hierarchical ranking. We would like to use FPCT to investigate the rankings of triads of female mice and even bigger groups of mice in the future. In the present manuscript we address the feasible application of the FPCT in smaller groups. In the Discussion, we added content commenting on the effect of group size on social competition tests (see paragraph 4 in the Discussion).

      b. The authors claim that "test 2" of their assay helps assert the motivation of mice for social competition as in Figure 4E. This could simply be a readout of how strong the mice are (muscle mass). To claim that this is indeed related to motivation during the FPCT assay, the authors should show the correlation of this readout with the latency to push the block during the social competition task.

      We thank the reviewer for raising this question. The dimensions that establish social structures include physical and psychological factors. In the FPCT paradigm, the two contenders are separated, so physical factors are minimized and psychological factors should play a more important role in competition than in previously reported food competition paradigms. Therefore, in the revised manuscript we attribute the ranking results mainly to psychological factors, rather than only to motivation, which is just one of numerous psychological factors (paragraph 3 of Discussion). Moreover, in the Discussion we point out that we could not exclude that physical factors still participate in determining competitive outcomes, since some mouse pairs pushed the block simultaneously (paragraph 3 of Discussion).

      c. The authors mention that they are interested to understand which factors lead to the outcome of the competition such as age, sex, physical strength, training level, and intensity of psychological motivation. However, in all their runs of the assay, they always matched these variables between the competitors. They should clarify that they were instead controlling for these variables. Another thing to note here is that while they controlled the body mass of the animals, that isn't the same as physical strength, as a lighter mouse can have more muscle mass than a heavier mouse. They should either specify this limitation or quantify the additional metric of "muscle mass" which is a much better proxy for physical strength. Thus, the claim that the outcome of the competition is solely affected by motivation is not convincing since they didn't rule out the others such as quantifying the rate of learning during training and strength.

      We thank the reviewer for addressing this question. As in our response to question (b), we acknowledge that it is not accurate to ascribe the outcomes of FPCT to psychological motivation alone. In the revised manuscript, the dimensions of factors contributing to the outcomes of FPCT have been simplified to physical and psychological factors. We consider that psychological factors could be the main driver of mice participating in FPCT (see paragraph 3 of Discussion).

      d. In the discussion, the authors mention that their task only requires a single day of food deprivation (the day before the first trial) while other assays suffer from a continued food deprivation protocol. However, the authors also use 10g per cage as the amount of food instead of giving them ad libitum access. Limited food is a food deprivation method. Thus, this is an inaccurate claim.

      We thank the reviewer for raising this point. We have clarified the food restriction required for FPCT in the revision. To enhance the appeal of the food pellet, the mice were deprived of food for 24 hours while access to water remained normal. After these 24 hours of food deprivation, each cage of mice was given 10 g of food every morning to meet their daily food requirements until the end of the test (see FPCT procedure section in Methods and materials).

      e. In the second section of the results, the authors run their assay with female mice that are housed in cages of two. This section suffers from the same limitations as the first and can be improved by showing the training data, correlations of competition outcome with "motivation" and ruling out the other factors that could contribute to the outcome. Further, the authors saying that their FPCT assay is enough to show that female mice follow a social hierarchy by itself is a weak claim. They should instead include their cross-validation with the others to strengthen it.

      We thank the reviewer for raising this question. We have taken the reviewer’s suggestion to show the training data (Figures 1E, 2A and 3A). As the factors contributing to the outcomes of FPCT are diverse, we prefer not to control for or determine the exact factor in the current manuscript. We agree with the reviewer that cross-validation with different paradigms is advisable for studies that rank social hierarchy, as the ranking results can vary with tasks, procedures and operations.

      f.  In the last paragraph of the introduction, the authors mention how their assay involves "peaceful competition" since the mice are not in direct contact and hence cannot exhibit aggression. The authors do not address the limitation that a lack of physical contact actually makes the assay less ethological. Further, since the mice are housed in groups of two and three, it is not guaranteed that the mice will not be aggressive during their time in the home cage, which could affect their behavior during the competition assay. Whether the assay causes more aggression in the cage due to the lack of physical contact during the competition is not addressed in this study.

      We thank the reviewer for raising this point. Diverse factors affect the outcomes of a food competition test, some of them psychological and others physical. We agree that a lack of physical contact makes the assay less ethological. However, once social statuses have been established by housing a group of mice together for long enough during habituation, the win/lose outcomes in the FPCT could be a readout of the expression of those statuses, since the mice cannot exhibit aggression in the test. We have revised the Introduction and Discussion (paragraph 3 of Discussion). Thank you.

      (3) Related to lack of methodological rigor and rationale clarity:

      a. In the first section of the results, the authors run their assay with male mice that are housed in cages of two. While the data that they display is promising, we do not see how mice change behavior across days of training and how that relates to the outcome of the competition. It would be valuable to also show the training data for the mice, answering questions related to competency and any inter-animal variabilities prior to rank assessment. Plotting the training data across all days would be helpful for the other parts of the results as well. This is especially important because the methods mention that mice are trained until they get to the criterion, so this means that different individuals get different amounts of training.

      We thank the reviewer for addressing the importance of showing training data. We have taken the reviewer’s suggestion and shown the training data (Figures 1E, 2A and 3A).

      b.  It is unclear why the assay was run only once per mouse pair per day since most protocols for the tube test involve multiple repetitions each day while alternating the side from which the mice enter. The authors should address whether a single trial per day is enough to show consistent results and that it wouldn't vary with more.

      We suggest running the FPCT once or twice per mouse per day under the mild food restriction, training and test procedures described in this manuscript. Frequent tests might gradually diminish the mice’s interest in the food pellet because the mice were not fully deprived of food. According to our data, the outcomes of FPCT over 4 consecutive days were stable overall.

      c. In the results the authors say that they "raised 3 male mice", which may be incorrect because they report in the methods buying the mice, and they housed all their mice for only three days before running the assay, which might be too little for the hierarchy to stabilize. The authors should comment on what was the range of the cohabitation across different cages and whether it had an impact on the results.

      According to our experiments, housing the mice for 3 days is enough to establish a mouse social colony with a relatively stable status structure. Prolonged housing may produce a similar, a more stable or a more dynamic social colony.

      d. There are also some formatting and/or convention issues in the results. The first figure callout in the results is for Figure 4 instead of Figure 1 (which is the standard). This is because the authors do not explain how the mice are trained for the task in the results section and show limited data about the training of the task. Not showing comprehensive training data would make replication of this study very difficult.

      We thank the reviewer for raising this question. We have re-arranged the figures. The new arrangement starts with a schematic drawing of the FPCT procedure and the training data (Figure 1).

      e. The authors don't report the exact p-values in the figures.

      We report the significance levels in the figures of the revised manuscript. Thank you.

      (4) The writing of the manuscript suffers from a lack of clarity in most sections of the manuscript.

      Here are several examples that are critical:

      a. In the title and abstract, it isn't clear what the authors mean by "stereotype". It could be a behavior during the competition, or that the social ranks across assays are correlated or that the rank for the new assay is consistent across days.

      b. There are several instances where the authors anthropomorphize mice using human features such as "urbanization" and "society" which are not established factors affecting mouse hierarchy. This further extends to anthropomorphizing mice in ways that are not standard such as an animal being "timid" or "bold" which would be hard to measure in mice, if not impossible.

      c. Across the social dominance literature, relative social rank is described using more general "dominant" and "subordinate" titles instead of "superior" and "inferior" that are sometimes used in the manuscript. The authors should follow the standard language so that readers understand.

      d. In the third paragraph of the introduction, the authors say "Thus, it is more likely expected that different paradigms to weigh the social competency and status may lead to diverse readouts, given that competitive factors are included in competition paradigms." This sentence suffers from multiple syntax errors, thereby reducing clarity.

      e. There are several typos in the manuscript such as using "dominate" instead of "dominant", "grades" instead of "outcomes" and "forth" instead of "fourth", to give a few examples.

      We thank the reviewer for the careful reading of the manuscript and the very helpful comments. We have taken the above suggestions and improved the writing of the manuscript. For example, "stereotype" was replaced by “stability”, mice "society" is now expressed as "colony", and the sentence “Thus, it is more.... in competition paradigms” has been deleted.

      Reviewer #3 (Recommendations for the authors):

      (1) The justification for the design of this new test paradigm is unclear. In the abstract, you state that the field needs a reliable, valid, and easily executable test. Your test provides this, as you state, but how is it better than the tube test? Does the tube test suffer from task-specific win-or-lose outcomes? Can you provide evidence for this? The authors of the Nature Protocols protocol for the tube test (https://doi.org/10.1038/s41596-018-0116-4) "strongly suggest using more than two dominance measures, for example, by also carrying out the warm spot test, or territory urine marking or ultrasonic courtship vocalization assays." This would suggest that results from the tube test can be task-specific, but I am not convinced that you have demonstrated that results from your food competition test are not task-specific. Indeed, by your title, one must run multiple tests.

      This same problem is apparent in the introduction. In the second paragraph, there is a discussion of the tube test, warm spot test, and food competition tests. What is the problem with these tests?

      I believe that social dominance relationships are complex and dynamic social relationships indicating who has priority access to a resource between multiple animals that live together. In these living situations, several resources can often be competed over, for example space, food, mates, and temperature. Currently, we have tests to measure space via the tube test or urine marking, mates via ultrasonic vocalization, temperature via warm spot test, and food via food competition assays. The tube test, urine marking assay, and ultrasonic vocalization test have been demonstrated to be reliable, valid, and easily executable. However, the food competition assays are often difficult to execute because it is difficult to interpret the dominant behaviors and aggressive behaviors like bite wounding can occur during the test. Here, you present a new food competition assay to address these issues and show that it can be used in conjunction with other assays to measure social dominance across multiple resources easily. In doing so, you revealed that many same-sex groups of C57 mice have a stereotypic pattern of dominance behavior when competing across multiple types of resources: space, temperature, and food.

      I ask that you please rebut if you disagree with me, and adjust your abstract, introduction, and discussion accordingly.

      We thank the reviewer for all the constructive comments. We have adjusted the Abstract, Introduction and Discussion of the manuscript.

      We recognize and appreciate the value of the tube test, the warm spot test and many other competition tests, including food competition tests. The tube test and the warm spot test are space competition tasks. Relative to space competition, food competition tests for mice have been designed and applied less commonly in animal studies. Several issues (such as methodological issues, aggressive behaviors occurring during competition, and prolonged food deprivation) could underlie the limited application of food competition paradigms (paragraph 3 in the Introduction). Therefore, we clarify that the justification for the design of FPCT was “to have a new choice of food competition paradigm for mice, and to facilitate the exposure of psychological aspects contributing to the winning/losing outcomes in competitions” (last paragraph in the Introduction).

      FPCT differs from the tube test and the warm spot test in at least two ways. FPCT is a food competition task in which the mice need no physical contact during competition, while the tube test and WST are space competition tasks in which the mice need direct physical contact during competition. Therefore, we expected inconsistent evaluations of competitiveness and rankings when comparing FPCT with the typically available competition paradigms, namely the tube test and WST (last paragraph in the Introduction).

      (2) The design of the test needs to be described before the results. You can either move the methods section before the results or add a paragraph in the introduction to better describe the test. Here, you can also reference Figures 1 through 3 so that the figures are presented in the order in which they are mentioned in the paper. (It is very confusing that the first reference to a figure is Figure 4, when it should be Figure 1).

      We thank the reviewer for raising this point and for the suggestions. We have added a new section (section 1) in the Results. In the revised manuscript, the figures in the Results start with Figure 1, which shows a schematic drawing of the FPCT procedure, training data and some test results (Figure 1).

      (3)  The sentence describing Figure 4H. You argue that this shows that the mice are well and equally trained. It also shows that they have the same motivation or preference for the food.

      We thank the reviewer for this helpful comment. Data in the previous Figures 4H and 5I are presented as new Figures 2A and 3A, respectively, in the revised manuscript. These retrospective analyses of the training data showed similar levels of training in food retrieval and a similar craving state for food (Sections 2 and 3 in the Results).

      (4)  "Social ranking of multiple cagemate mice using FPCT, tube test and WST"

      Here, you claim that "comparison of inter-task consistency revealed that the ranks evaluated by FPCT, tube test and WST did not differ from each other...Figure 6K." Okay, however, it is important to discuss the three cases when there wasn't consistency between the tests! Figure 6E-G.

      We thank the reviewer for raising this point. In the revised manuscript, we added a description and discussion of the inconsistencies between the different test paradigms (paragraph 2 in section 3 of the Results, last 2 sentences of paragraph 4 in the Discussion).

      (5)  Replace all instances of "gender" with "sex". Animals do not have a gender.

      (6)  Adjust the strain of the mice to C57BL/6JNifdc.

      We have replaced "gender" with "sex" and “C57BL/6J” with “C57BL/6JNifdc”. Thank you for your careful correction.

      (7)  What is the justification for running the warm spot test for one day and the other tests for four days?

      From the consecutive FPCT and tube tests, we already knew that the ranking results were stable overall. This stability was still observed on the day of the warm spot test. A drawback of frequent warm spot testing is that the mice experience considerable stress from exposure to the ice-cold environment. Therefore, we terminated the competition testing after only one trial of the warm spot test.

      (8)  Grammar

      The second sentence of the abstract: ...recognized as a valuable...

      Results, sentence after "...was observed (Figure 4G)." it should be "Fourth"

      We have corrected these and other grammar errors. We thank the reviewers for their very careful review and all their helpful comments.

    1. eLife Assessment

      In this study, Wang et al. mined publicly available RNA-seq data from The Genotype-Tissue Expression (GTEx) database spanning multiple tissues to ask how transcriptomes change with age and in both sexes. The authors provide solid evidence reporting widespread gene expression changes and alternative splicing events that vary in an age- and sex-dependent manner. An important finding is that many of these changes coincide with the time sex hormones begin to decline; additionally, the rate of aging is faster in males than in females.

    2. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Wang et al analyze ~17,000 transcriptomes from 35 human tissues from the GTEx database and address transcriptomic variations due to age and sex. They identified both gene expression changes as well as alternative splicing events that differ among sexes. Using breakpoint analysis, the authors find sex dimorphic shifts begin with declining sex hormone levels with males being affected more than females. This is an important pan-tissue transcriptomic study exploring age and sex-dependent changes although not the first one.

      Strengths:

      (1) The authors use sophisticated modeling and statistics for differential, correlational and predictive analysis.

      (2) The authors consider important variables such as genetic background, ethnicity, sampling bias, sample sizes, detected genes etc.

      (3) This is likely the first study to evaluate alternative splicing changes with age and sex at a pan-tissue scale.

      (4) Sex dimorphism with age is an important topic and is thoroughly analyzed in this study.

    3. Reviewer #3 (Public review):

      Summary:

      In this study, Wang et al utilized the available GTEx data to compile a comprehensive analysis that attempts to reveal aging-related sex-dimorphic gene expression as well as alternative splicing changes in humans. The key conclusions based upon their analysis are that 1) there are extensive sex-dimorphisms during aging, with distinct patterns of change in gene expression and alternative splicing (AS), 2) the male-biased age-associated AS events have a stronger association with Alzheimer's disease, and 3) the female-biased events are often regulated by several sex-biased splicing factors that may be controlled by estrogen receptors. They further performed breakpoint analysis and revealed that in males there are two main breakpoints, around ages 35 and 50, while in females there is only one breakpoint, at 45.

      Strengths:

      This study sets an ambitious goal, leveraging the extensive GTEx dataset to investigate aging-related, sex-dimorphic gene expression and alternative splicing changes in humans. The research addresses a significant question, as our understanding of sex-dimorphic gene expression in the context of human aging is still in its early stages. Advancing our knowledge of these molecular changes is vital for identifying therapeutic targets for age-related diseases and extending human healthspan. The study is highly comprehensive, and the authors are commendable for their attempted thorough analysis of both gene expression and alternative splicing, an area often overlooked in similar studies.

    1. eLife Assessment

      Peukes et al. report compelling ultrastructures of excitatory synapses in the mouse forebrain that will serve as a reference for future work in the field. Their important findings using correlated fluorescence and cryo-electron tomography challenge the textbook view of synaptic structure that emerged from chemically fixed and metal-stained tissues. Instead of a post-synaptic density, these authors reveal the architecture of the cytoskeleton, neurotransmitter receptor clusters, and organelles in the 'synaptoplasm'.

    2. Reviewer #1 (Public review):

      The authors survey the ultrastructural organization of glutamatergic synapses by cryo-ET and image processing tools using two complementary experimental approaches. The first approach employs so-called "ultra-fresh" preparations of brain homogenates from a knock-in mouse expressing a GFP-tagged version of PSD-95, allowing Peukes and colleagues to specifically target excitatory glutamatergic synapses. In the second approach, direct in-tissue (using cortical and hippocampal regions) targeting of the glutamatergic synapses employing the same mouse model is presented. In order to ascertain whether the isolation procedure causes any significant changes in the ultrastructural organization (and possibly synaptic macromolecular organization) the authors compare their findings using both of these approaches. The quantitation of the synaptic cleft height reveals an unexpected variability, while the STA analysis of the ionotropic receptors provides insights into their distribution with respect to the synaptic cleft.

      The main novelty of this study lies in the continuous claims by the authors that the sample preservation methods developed here are superior to any others previously used. This leads them as well to systematically downplay or directly ignore a substantial body of previous cryo-ET studies of synaptic structure. Without comparisons with the cryo-ET literature, it is very hard to judge the impact of this work in the field. Furthermore, the data does not show any better preservation in the so-called "ultra-fresh" preparation than in the literature, perhaps to the contrary as synapses with strangely elongated vesicles are often seen. Such synapses have been regularly discarded for further analysis in previous synaptosome studies (e.g. Martinez-Sanchez 2021). Whilst the targeting approach using a fluorescent PSD95 marker is novel and seems sufficiently precise, the authors use a somewhat outdated approach (cryo-sectioning) to generate in-tissue tomograms of poor quality. To what extent such tomograms can be interpreted in molecular terms is highly questionable. The authors also don't discuss the physiological influence of 20% dextran used for high-pressure freezing of these "very native" specimens.

      Lastly, a large part of the paper is devoted to image analysis of the PSD which is not convincing (including a somewhat forced comparison with the fixed and heavy-metal staining room temperature approach). Despite being a technically challenging study, the results fall short of expectations.

    3. Reviewer #2 (Public review):

      Summary:

      The authors set out to visualize the molecular architecture of the adult forebrain glutamatergic synapses in a near-native state. To this end, they use a rapid workflow to extract and plunge-freeze mouse synapses for cryo-electron tomography. In addition, the authors use knock-in mice expressing PSD95-GFP in order to perform correlated light and electron microscopy to clearly identify pre- and postsynaptic membranes. By thorough quantification of tomograms from plunge- and high-pressure frozen samples, the authors show that the previously reported 'post-synaptic density' does not occur at high frequency and is therefore not a defining feature of a glutamatergic synapse.

      Subsequently, the authors are able to reproduce the frequency of post-synaptic density when preparing conventional electron microscopy samples, thus indicating that density prevalence is an artifact of sample preparation. The authors go on to describe the arrangement of cytoskeletal components, membraneous compartments, and ionotropic receptor clusters across synapses.

      Demonstrating that the frequency of the post-synaptic density in prior work is likely an artifact and not a defining feature of glutamatergic synapses is significant. The descriptions of distributions and morphologies of proteins and membranes in this work may serve as a basis for future investigation by readers interested in these features.

      Strengths:

      The authors perform a rigorous quantification of the molecular density profiles across synapses to determine the frequency of the post-synaptic density. They prepare samples using two cryogenic electron microscopy sample preparation methods, as well as one set of samples using conventional electron microscopy methods. The authors can reproduce previous reports of the frequency of the post-synaptic density by conventional sample preparation, but not by either of the cryogenic methods, thus strongly supporting their claim.

    4. Reviewer #3 (Public review):

      Summary:

      The authors use cryo-electron tomography to thoroughly investigate the complexity of purified, excitatory synapses. They make several major interesting discoveries: polyhedral vesicles that have not been observed before in neurons; analysis of the intermembrane distance, and a link to potentiation, essentially updating distances reported from plastic-embedded specimens; and the finding that the postsynaptic density does not appear as a dense accumulation of proteins in all vitrified samples (less than half), a feature which served as a hallmark to identify excitatory plastic-embedded synapses.

      Strengths:

      (1) The presented work is thorough: the authors compare purified, endogenously labeled synapses to wild-type synapses to exclude artifacts that could arise through the homogenization step, and, in addition, analyse plastic embedded, stained synapses prepared using the same quick workflow, to ensure their findings have not been caused by way of purification of the synapses. Interestingly, the 'thick lines of PSD' are evident in most of their stained synapses.

      (2) I commend the authors on the exceptional technical achievement of preparing frozen specimens from a mouse within two minutes.

      (3) The approaches highlighted here can be used in other fields studying cell-cell junctions.

      (4) The tomograms will be deposited upon publication which will enable neurobiologists and researchers from other fields to carry on data evaluation in their field of expertise since tomography is still a specialized skill and they collected and reconstructed over 100 excellent tomograms of synapses, which generates a wealth of information to be also used in future studies.

      (5) The authors have identified ionotropic receptor positions and that they are linked to actin filaments, and appear to be associated with membrane and other cytosolic scaffolds, which is highly exciting.

      (6) The authors achieved their aims to study neuronal excitatory synapses in great detail, were thorough in their experiments, and made multiple fascinating discoveries. They challenge dogmas that have been in place for decades and highlight the benefit of implementing and developing new methods to carefully understand the underlying molecular machines of synapses.

      Impact on community:

      The findings presented by Peukes et al. pertaining to synapse biology change dogmas about the fundamental understanding of synaptic ultrastructure. The work presented by the authors, particularly the associated change of intermembrane distance with potentiation and the distinct appearance of the PSD as an irregular amorphous 'cloud' will provide food for thought and an incentive for more analysis and additional studies, as will the discovery of large membranous and cytosolic protein complexes linked to ionotropic receptors within and outside of the synaptic cleft, which are ripe for investigation. The findings and tomograms available will carry far in the synapse fields and the approach and methods will move other fields outside of neurobiology forward. The method and impactful results of preparing cryogenic, unlabeled, unstained, near-native synapses may enable the study of how synapses function at high resolution in the future.

    5. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      The authors survey the ultrastructural organization of glutamatergic synapses by cryo-ET and image processing tools using two complementary experimental approaches. The first approach employs so-called "ultra-fresh" preparations of brain homogenates from a knock-in mouse expressing a GFP-tagged version of PSD-95, allowing Peukes and colleagues to specifically target excitatory glutamatergic synapses. In the second approach, direct in-tissue (using cortical and hippocampal regions) targeting of the glutamatergic synapses employing the same mouse model is presented. In order to ascertain whether the isolation procedure causes any significant changes in the ultrastructural organization (and possibly synaptic macromolecular organization) the authors compare their findings using both of these approaches. The quantitation of the synaptic cleft height reveals an unexpected variability, while the STA analysis of the ionotropic receptors provides insights into their distribution with respect to the synaptic cleft.

      The main novelty of this study lies in the continuous claims by the authors that the sample preservation methods developed here are superior to any others previously used. This leads them as well to systematically downplay or directly ignore a substantial body of previous cryo-ET studies of synaptic structure. Without comparisons with the cryo-ET literature, it is very hard to judge the impact of this work in the field. Furthermore, the data does not show any better preservation in the so-called "ultra-fresh" preparation than in the literature, perhaps to the contrary as synapses with strangely elongated vesicles are often seen. Such synapses have been regularly discarded for further analysis in previous synaptosome studies (e.g. Martinez-Sanchez 2021). Whilst the targeting approach using a fluorescent PSD95 marker is novel and seems sufficiently precise, the authors use a somewhat outdated approach (cryo-sectioning) to generate in-tissue tomograms of poor quality. To what extent such tomograms can be interpreted in molecular terms is highly questionable. The authors also don't discuss the physiological influence of 20% dextran used for high-pressure freezing of these "very native" specimens.

      Lastly, a large part of the paper is devoted to image analysis of the PSD which is not convincing (including a somewhat forced comparison with the fixed and heavy-metal staining room temperature approach). Despite being a technically challenging study, the results fall short of expectations. 

      Our manuscript contains a discussion of both conventional EM and cryoET of synapses. We apologise if we have omitted referencing or discussing any earlier cryoET work. This was certainly not our intention, and we include a more complete discussion of published cryoET work on synapses in our revised manuscript.

      The reviewer is concerned that the synaptic vesicles in some synapse tomograms are “stretched” and that this may reflect poor preservation.  We would like to point out that such non-spherical synaptic vesicles have also been previously reported in cryoET of primary neurons grown on EM grids (Tao et al., J. Neuro, 2018). Indeed, there is no reason per se to suppose synaptic vesicles are always spherical and there are many diverse families of proteins expressed at the synapse that shape membrane curvature (BAR domain proteins, synaptotagmin, epsins, endophilins and others). We will add further discussion of this issue in the revised manuscript.

      The reviewer regards ‘cryo-sectioning’ as outdated and cryoET data from these preparations as “poor quality”. We respectfully disagree. Preparing brain tissues for cryoET is generally considered to be challenging. The first successful demonstration of preparing such samples was before the advent of the cryoEM resolution revolution (with electron counting detectors) by Zuber et al. (Proc. Natl. Acad. Sci., 2005) preparing cryo-sections/CEMOVIS of in vitro brain cultures. We followed this technique to prepare tissue cryo-sections for cryoET in our manuscript. Recently, cryoFIB-SEM liftout has been developed as an alternative method to prepare tissue samples for cryoET (Mahamid et al., J. Struct. Biol., 2015) and only more recently has this method become available to more laboratories. Both techniques introduce damage as has been described (Han et al., J. Microsc., 2008; Lucas et al., Proc. Natl. Acad. Sci., 2023). Importantly, no like-for-like, quantitative comparison of these two methodologies has yet been performed. We have recently demonstrated that the molecular structure of amyloid fibrils within human brain is preserved down to the protein fold level in samples prepared by cryo-sectioning (Gilbert et al., Nature, 2024). We will add further detail on the process by which we excluded poor quality tomograms from our analysis, which we described in detail in our methods section.

      The reviewer asks what the physiological effect is of adding 20% w/v ~40,000 Da dextran. This is a reasonable concern since this could in principle exert osmotic pressure on the tissue sample. While we did not investigate this ourselves, earlier studies have (Zuber et al., 2005), showing that this concentration of dextran did not damage cell membranes and had no detectable effect on cell structure.

      The reviewer is not convinced by our analysis of the apparent molecular density of macromolecules in the postsynaptic compartment that in conventional EM is called the postsynaptic density. However, the reviewer provides no reasoning for this assessment nor alternative approaches that could be attempted. We would like to add that we have tested multiple different approaches to objectively measure molecular crowding in cryoET data, that give comparable results. We believe that our conclusion – that we do not observe an increased molecular density conserved at the postsynaptic membrane, and that the PSD that we and others observed by conventional EM does not correspond to a region of increased molecular density - is well supported by our data.  We and the other reviewers consider this an important and novel observation.

      Reviewer #2 (Public review)

      Summary: 

      The authors set out to visualize the molecular architecture of the adult forebrain glutamatergic synapses in a near-native state. To this end, they use a rapid workflow to extract and plunge-freeze mouse synapses for cryo-electron tomography. In addition, the authors use knock-in mice expressing PSD95-GFP in order to perform correlated light and electron microscopy to clearly identify pre- and postsynaptic membranes. By thorough quantification of tomograms from plunge- and high-pressure frozen samples, the authors show that the previously reported 'post-synaptic density' does not occur at high frequency and is therefore not a defining feature of a glutamatergic synapse.

      Subsequently, the authors are able to reproduce the frequency of post-synaptic density when preparing conventional electron microscopy samples, thus indicating that density prevalence is an artifact of sample preparation. The authors go on to describe the arrangement of cytoskeletal components, membranous compartments, and ionotropic receptor clusters across synapses.

      Demonstrating that the frequency of the post-synaptic density in prior work is likely an artifact and not a defining feature of glutamatergic synapses is significant. The descriptions of distributions and morphologies of proteins and membranes in this work may serve as a basis for future investigation by readers interested in these features.

      Strengths: 

      The authors perform a rigorous quantification of the molecular density profiles across synapses to determine the frequency of the post-synaptic density. They prepare samples using two cryogenic electron microscopy sample preparation methods, as well as one set of samples using conventional electron microscopy methods. The authors can reproduce previous reports of the frequency of the post-synaptic density by conventional sample preparation, but not by either of the cryogenic methods, thus strongly supporting their claim. 

      We thank the reviewer for their generous assessment of our manuscript.

      Reviewer #3 (Public review): 

      Summary: 

      The authors use cryo-electron tomography to thoroughly investigate the complexity of purified, excitatory synapses. They make several major interesting discoveries: polyhedral vesicles that have not been observed before in neurons; analysis of the intermembrane distance, and a link to potentiation, essentially updating distances reported from plastic-embedded specimens; and the finding that the postsynaptic density does not appear as a dense accumulation of proteins in all vitrified samples (less than half), a feature which served as a hallmark to identify excitatory plastic-embedded synapses.

      Strengths: 

      (1) The presented work is thorough: the authors compare purified, endogenously labeled synapses to wild-type synapses to exclude artifacts that could arise through the homogenization step, and, in addition, analyse plastic embedded, stained synapses prepared using the same quick workflow, to ensure their findings have not been caused by way of purification of the synapses. Interestingly, the 'thick lines of PSD' are evident in most of their stained synapses.

      (2) I commend the authors on the exceptional technical achievement of preparing frozen specimens from a mouse within two minutes.

      (3) The approaches highlighted here can be used in other fields studying cell-cell junctions.

      (4) The tomograms will be deposited upon publication which will enable neurobiologists and researchers from other fields to carry on data evaluation in their field of expertise since tomography is still a specialized skill and they collected and reconstructed over 100 excellent tomograms of synapses, which generates a wealth of information to be also used in future studies.

      (5) The authors have identified ionotropic receptor positions and that they are linked to actin filaments, and appear to be associated with membrane and other cytosolic scaffolds, which is highly exciting.

      (6) The authors achieved their aims to study neuronal excitatory synapses in great detail, were thorough in their experiments, and made multiple fascinating discoveries. They challenge dogmas that have been in place for decades and highlight the benefit of implementing and developing new methods to carefully understand the underlying molecular machines of synapses.

      Weaknesses: 

      The authors show informative segmentations in their figures but none have been overlaid with any of the tomograms in the submitted videos. It would be helpful for data evaluation by a broad audience to be able to view these together as videos to study these tomograms and extract more information. Deposition of segmentations associated with the tomograms would be tremendously helpful to neurobiologists, cryo-ET method developers, and others to push the boundaries.

      Impact on community: 

      The findings presented by Peukes et al. pertaining to synapse biology change dogmas about the fundamental understanding of synaptic ultrastructure. The work presented by the authors, particularly the associated change of intermembrane distance with potentiation and the distinct appearance of the PSD as an irregular amorphous 'cloud' will provide food for thought and an incentive for more analysis and additional studies, as will the discovery of large membranous and cytosolic protein complexes linked to ionotropic receptors within and outside of the synaptic cleft, which are ripe for investigation. The findings and tomograms available will carry far in the synapse fields and the approach and methods will move other fields outside of neurobiology forward. The method and impactful results of preparing cryogenic, unlabelled, unstained, near-native synapses may enable the study of how synapses function at high resolution in the future.

      We thank the reviewer for their supportive assessment of our manuscript.  We thank the reviewer for suggesting overlaying segmentations with videos of the raw tomographic volumes. We will include this in our revised manuscript.

      Reviewer #1 (Recommendations for the authors): 

      Major comments: 

      (1) The previous literature on synaptic cryo-ET studies is systematically ignored. The results presented here (and their novelty) must be compared directly with this body of work, rather than with classical EM.

      Our submitted manuscript included a 3-paragraph discussion of earlier synaptic cryoET studies, albeit we apologize that a seminal citation was missing, which we have corrected in our revised manuscript. We have now also included an additional brief discussion related to several more recent cryoET studies (see citations below) that were published after our pre-print was first deposited in 2021.

      (1) Held, R.G., Liang, J., and Brunger, A.T. (2024). Nanoscale architecture of synaptic vesicles and scaffolding complexes revealed by cryo-electron tomography. Proc. Natl. Acad. Sci. 121, e2403136121. https://doi.org/10.1073/pnas.2403136121.

      (2) Held, R.G., Liang, J., Esquivies, L., Khan, Y.A., Wang, C., Azubel, M., and Brunger, A.T. (2024). In-Situ Structure and Topography of AMPA Receptor Scaffolding Complexes Visualized by CryoET. bioRxiv, 2024.10.19.619226. https://doi.org/10.1101/2024.10.19.619226.

      (3) Matsui, A., Spangler, C., Elferich, J., Shiozaki, M., Jean, N., Zhao, X., Qin, M., Zhong, H., Yu, Z., and Gouaux, E. (2024). Cryo-electron tomographic investigation of native hippocampal glutamatergic synapses. eLife 13, RP98458. https://doi.org/10.7554/elife.98458.

      (4) Glynn, C., Smith, J.L.R., Case, M., Csöndör, R., Katsini, A., Sanita, M.E., Glen, T.S., Pennington, A., and Grange, M. (2024). Charting the molecular landscape of neuronal organisation within the hippocampus using cryo electron tomography. bioRxiv, 2024.10.14.617844. https://doi.org/10.1101/2024.10.14.617844.

      We discuss the above papers in our revised manuscript with the following:

      “Since submission of our manuscript, several reports of synapse cryoET from within cultured primary neurons (Held et al., 2024a, 2024b) and mouse brain (Glynn et al., 2024; Matsui et al., 2024) were prepared by cryoFIB-milling. These new datasets are largely consistent with the data reported here. CryoFIB-SEM has the advantage of overcoming the local knife damage caused by cryo-sectioning but introduces amorphization across the whole sample that diminishes the information content (Al-Amoudi et al., 2005; Lovatt et al., 2022; Lucas and Grigorieff, 2023). We have recently shown cryoET data is capable of revealing subnanometer resolution in-tissue protein structure from vitreous cryo-sections (Gilbert et al., 2024) and near-atomic structures within cryo-sections have recently been demonstrated (Elferich et al., 2025).”

      Although there is variation between individual synapses, PSDs are clearly visible in several previous cryo-ET studies (even if it's not as striking as in heavy-metal stained samples). In fact, although the contrast of the images is generally poor, PSDs are also visible in several examples shown in Figure 1 - Supplement 3. Not being able to detect them seems more of a problem of the workflow used here than of missing features. The authors should also discuss why heavy-metal stains would accumulate on a non-existing structure (PSD) in conventional EM.

      We agree that apparent higher molecular density can be observed in example tomographic data of earlier cryoET studies. We also report individual examples of similar synapses in our dataset. A key strength of our approach is that we have assessed the molecular architecture of large numbers of adult brain synapses acquired by an unbiased approach (solely guided by PSD95 cryoCLEM), which indicate that a higher molecular density proximal to the postsynaptic membrane is not a conserved feature of glutamatergic synapses in the adult brain. There is no rationale for our cryoCLEM approach being a ‘problem of the workflow’.

      The reviewer misunderstands the weaknesses of conventional/room temperature EM workflows (including resin-embedding and freeze substitution). It is unavoidable that most proteins are damaged by denaturation and/or washed away by washing samples in organic solvents (methanol/acetone that directly denature most proteins) during tissue preparation for conventional EM. It is therefore conceivable that in such preparations a relative increase in contrast proximal to the postsynaptic membrane (‘PSD’) would appear if cytoplasmic proteins were washed away during these harsh organic solvent washing steps, leaving only those denatured proteins that are tethered to the postsynaptic membrane. It is not that the PSD is absent in cryoEM, rather that this difference in molecular crowding is not evident when tissues are imaged directly by cryoEM and have not undergone the harsh sample preparation required for conventional/room temperature EM.

      (2) Whether the synapses examined here are in a more physiological state than those analyzed in other papers remains absolutely unclear. For example, the quality of the tomographic slice shown in Figure 1C is poor, with the majority of synaptic vesicles looking suspiciously elongated. 

      We addressed this in our public reviews.

      (3) How were actin filaments segmented and quantified (e.g. for Fig 1E)? Apart from actin, can the authors show some examples of other macromolecular complexes (e.g. ribosomes) that they are able to identify in synapses (based on the info in supplementary tables)? Also, the mapping of glutamatergic receptors is not convincing, as the molecules were picked manually. To analyze their distribution, they should be mapped as comprehensively as possible by e.g. template matching.

      Actin filaments identified by ~7 nm diameter with ~70° branch points were manually segmented in IMOD. The number of filaments was counted per postsynaptic compartment. We have amended the methods section to include this description.

      “In the PoSM, F-actin formed a network with ~70° branch points (Figure 1–figure supplement 1C) likely formed by Arp2/3, as expected (Pizarro-Cerdá 2017, Fäßler 2020). Putative filament copy number in the PoSM was estimated by manual segmentation in IMOD.” Manual picking was validated by the quality of the subtomogram average, which, although it reached only modest resolution (25 Å), is consistent with the identification of ionotropic glutamate receptors.

      (4) In the section "Synaptic organelles" the authors should provide some general information on the average number and size of synaptic vesicles (for the in-tissue tomograms).

      We have provided this information in the methods section:

      “The average diameter of synaptic vesicles was 40.2 nm and the minimum and maximum dimensions ranged from 20 to 57.8 nm, measured from the outside of the vesicle that included ellipsoidal synaptic vesicles similar to those previously reported (Tao et al., 2018).” A detailed survey of the presynaptic compartment, including the number of presynaptic vesicles was not the focus of our manuscript. We have deposited all tomograms from our dataset for any further data mining.

      Can the "flat tubular membranes compartments" be attributed to ER? The angular vesicles certainly have a typical ER appearance, as such morphology has been seen in several cryo-ET studies of neuronal and non-neuronal cells.

      In neuronal cells we regard it as unsafe to describe an intracellular organelle as being endoplasmic reticulum on the basis of morphology alone (e.g. smooth ER, described widely in conventional EM) because of the apparent diversity of distinct organelles. As described in our methods section, we could have confidence that a membrane compartment is ER when we observe ribosomes tethered to the membrane. In instances where flat/tubular membranes did not have associated ribosomes, we take the cautious view that there is not sufficient evidence to define these as ER.

      Importantly, polyhedral vesicles were distinct from the flat/tubular membranes that resembled ER and are at present organelles of unknown identity. It will be important in future experiments to determine what are the protein constituents of these distinct organelle types to understand both their functions and how these distinct membrane architectures are assembled.

      Therefore, the sentences in lines 198-199 are simply wrong. Additionally, features of even higher membrane curvature are common in the ER (e.g. Collado et al., Dev Cell 2019). 

      We thank the reviewer for bringing our attention to this excellent paper (Collado et al.). We agree that the sentence describing the curvature being higher than all other membranes except mitochondrial cristae is wrong. We have removed this sentence in the revised manuscript.

      (5) The quality of the tomographic data for the in-tissue sample is low, likely due to cryo-sectioning-induced artifacts, as extensively documented in the literature. Additionally, the authors used 20% dextran as cryo-protectant for high-pressure freezing, which contrasts with statements like those in lines 342-344. Given that several publications describing the in-tissue targeting of synapses (e.g. from Eric Gouaux's lab) are available, the quality of the tomographic data presented in this work is underwhelming and limits the conclusions that can be drawn, not providing a solid basis for future studies of in-tissue synapse targeting. However, the complete workflow (excluding the sectioning part) can be adapted for a cryo-FIB approach. The authors should discuss the limitations of their approach.

      Our manuscript preprint was deposited in bioRxiv several years before Matsui/Gouaux’s recent eLife paper that reported a novel workflow for in-tissue cryoET. It is difficult to directly compare data from our and Matsui/Gouaux’s approach because the latter reported a dataset of only 3 tomograms. Note also that Matsui/Gouaux followed our approach of using 20% dextran 40,000 as a cryo-preservative. The use of 20% dextran 40,000 as a cryo-protectant was first established by Zuber et al., 2005 (PMID: 16354833) and shown to avoid hyper-osmotic pressure and cell membrane rupture. However, Matsui/Gouaux additionally included 5% sucrose in their cryoprotectant. We did not include sucrose as a cryo-preservative because this exerts osmotic pressure and was not necessary to achieve vitreous tissues in our workflow.

      Before high-pressure freezing, Matsui/Gouaux also incubated tissue slices in a HEPES-buffered artificial cerebrospinal fluid (that included 2 mM CaCl2 but did not include glucose as an energy source) for 1 h at room temperature to label AMPA receptors with Fab fragment-Au conjugates. Under these conditions, neurons can elicit both physiological and excitotoxic action potentials (even though AMPARs were themselves antagonised with ZK-200775). The absence of glucose is a concern, and it is unclear to what extent tissue viability is affected by this incubation step. In contrast, we chose to use an NMDG-based artificial cerebrospinal fluid for slice preparation and high-pressure freezing that is a well-established method for preserving neuronal viability (Ting et al., 2018).

      We addressed the supposed limitations of cryo-sectioning versus cryoFIB-SEM in our public response. In particular, we have recently shown that cryo-sectioning produced a subnanometer resolution in-tissue structure of a protein, which has so far only been achieved for ribosomes within cryoFIB-SEM sample preparations. A discussion of cryo-sectioning versus cryoFIB-SEM must be informed by new data that directly compares these methods, which is not the subject of our eLife paper. We also cite a recent preprint directly comparing cryoFIB-milled lamellae with cryo-sections and showing that near-atomic resolution structures can also be obtained from the latter sample preparations (Elferich et al., 2025).

      (6) The authors show (in Supplementary) putative tethers connecting SV and the plasma membrane. Is it possible to improve the image quality (e.g. some sort of filtering or denoising) so that the tethers appear more obvious? Can the authors observe connectors linking synaptic vesicles? 

      We have tested multiple iterative reconstruction and denoising approaches, including SIRT and noise2noise filtering in Isonet. We observed instances of macromolecular complexes linking one synaptic vesicle with another. However, there was no question we sought to answer by performing a quantitative analysis of these linkers.

      (7) Figure 4F is missing. 

      Thank you for spotting this omission. We have corrected this in the revised manuscript.

      (8) Most quantifications lack statistical analyses. These need to be included, and only statistically significant findings should be discussed. Terms like "significantly" (e.g. Line 144) should only be used in these cases.

      We used the term ‘significantly’ in the results section (line 143 and line 166 in the revised text), where we cite Figures 1H and 2F, which show analyses in which we did in fact perform statistical tests (t-tests with Bonferroni correction) comparing the voxel intensities in regions of the cytoplasm that are proximal versus distal to the postsynaptic membrane. We have amended the main text to include the details of the statistical test that we performed. Also, we neglected to include a description of the statistical test in line 241, which cites Figure 3G. We have corrected this in the revised text.
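
      For readers unfamiliar with the Bonferroni correction named above, a minimal sketch follows. It is an illustration only, not the authors' analysis code: the function name `bonferroni`, the threshold `alpha=0.05`, and the raw p-values are all hypothetical placeholders, and the t-tests that would produce such p-values are not shown.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags) under Bonferroni correction.

    Each raw p-value is multiplied by the number of comparisons m and
    capped at 1.0; a comparison is called significant if the adjusted
    value falls below alpha.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj < alpha for p_adj in adjusted]
    return adjusted, reject


# Hypothetical raw p-values, one per proximal-versus-distal comparison.
# These numbers are invented for illustration and are not from the study.
raw = [0.001, 0.02, 0.04]
adjusted, reject = bonferroni(raw)
```

      With three comparisons, each raw p-value is tripled before being compared with the threshold, so a raw value that looks significant on its own may no longer pass after correction.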

      Minor comments: 

      (1) Can the authors comment on why only 1-2 grids are prepared per mouse brain (in M&M -section)?

      We prepared only two grids in order to have prepared samples within 2 minutes, to limit deterioration of the sample.

      (2) Figure 1 Supplement 2 and its legend are confusing (averaging of non-aligned versus aligned post-synaptic membrane). Can the authors describe more clearly their molecular density profile analysis?

      We apologise that this figure legend was insufficient. We have included a detailed description of our molecular density profile analysis in the methods section entitled ‘Molecular density profile analysis’. In the revised manuscript we have now also included a citation to this methods section in the Figure 1–figure supplement 2 legend.

      (3) Please clarify with higher precision where the areas were recorded in relation to the fluorescent spots (e.g. Figures 3A-C).

      We have included a white rectangular annotation in the cryoCLEM inset panels of Figures 3A-C to indicate the field of view of each corresponding tomographic slice. This shows that PSD95-GFP puncta localise to the postsynaptic compartments in each tomogram.

      (4) Figure 4 Supplement 2D is not clear: the connection between receptors and actin should be shown in a segmentation.

      We agree with the reviewer. A ‘connection’ is not clear, which is expected because the cytoplasmic domain of ionotropic glutamate receptor subunits is composed of a non-globular/intrinsically disordered sequence. We have amended our description of the proximity of actin cytoskeleton to ionotropic glutamate receptor clusters in the main text, replacing “associated with” with “adjacent to”.

      (5) Line 341: the reference is referred to by a number (56) at the end of the sentence, rather than by name.

      Good spot. We have corrected this in the revised manuscript.

      (6) Line 968: tomograms is misspelled. 

      Good spot. We have corrected this error (line 1018 in our revised manuscript).

      Reviewer #2 (Recommendations for the authors): 

      (1) On page 11: "The position of (i)onotropic receptor...". 

      Good spot. We have corrected this.

      (2) On page 13: "Slightly higher relative molecular density..." this line ends with a citation to reference '56', but the works cited are not numbered.

      Good spot. We have corrected this in the revised manuscript.

      (3) On page 46: "as described in (69)..." the works cited are not numbered. 

      Good spot. We have corrected this in the revised manuscript.

      Reviewer #3 (Recommendations for the authors):

      (1) The title does not do the work justice. The authors make many exciting discoveries, e.g. PSD appearance, new polyhedral vesicles, ionotropic receptor positions, and intermembrane distance changes even within the synaptic cleft, but title their manuscript "The molecular infrastructure of glutamatergic synapses in the mammalian forebrain". It is also a bit misleading, since one would have expected more molecular detail and molecular maps as part of the work, so the authors may think about updating the title to reflect their exciting work.

      We thank the reviewer for recognising the exciting discoveries in our manuscript. Summarising all these in a title is challenging. We intend ‘molecular infrastructure’ to mean a structure composed of many molecules including proteins (by analogy ‘transport infrastructure’ is composed of many roads, ports and train lines).

      (2) It would be in the spirit of eLife and open science if the authors could submit their segmentations alongside the tomographic data to either EMPIAR or pdb-dev (if they accept it) or the new CZII cryoET data portal for neurobiologists, method developers, and others to use. 

      We agree with the reviewer. We have deposited the subtomogram-averaged map of the AMPA receptor in EMDB, and all tilt series and 4x binned tomographic reconstructions described in our manuscript (Figure 1 – table 1 and Figure 2 – table 2), together with segmentations, in EMPIAR.

      (3) Methods: the authors establish an exciting new workflow to get from living mice to frozen specimens within 2 minutes and perform many unique analyses that would be useful to different fields. Their methods section overall is well described and contains criteria and details that should allow others to apply experiments to their scientific problems. However, it would be very helpful to expand on the methods in the 'annotation and analysis [...]' and "Subtomogram averaging" sections, to at least in short describe the steps without having to embark on a reference journey for each method and generally provide more detail. For the annotation section, the software used for annotation is not listed. Table 1 only contains the list of the counts of organelles etc. identified in each tomogram, no processing details. 

      We have revised the methods section ‘annotation and analysis’ including software used (IMOD). We have also included a slightly more detailed description of subtomogram averaging. We did not include ‘processing details’ because there are none - identification of constituents in each tomogram was carried out manually, as described in the methods section.

      (4) Some of the tomograms submitted as videos may have slipped through as an early version since they appear to originate from imperfectly aligned tilt series; vesicles and membranes can be observed 'rubberbanding'. The authors should go through and check their videos.

      We thank the referee for suggesting we double check our tomogram videos. All movies are representative tomographic reconstructions from ultra-fresh synapse preparations (Figure 1 – videos 1-7) and synapses in tissue cryo-sections (Figure 2 – videos 1-2). We have double checked that the videos correspond to tomograms that were aligned as well as possible. In general, tissue cryo-section tomograms reconstructed less well than ultra-fresh synapse tomograms, which limits the information content of these data, as expected. Consequently, the reconstructions shown in these videos were all reconstructed as best we could (testing multiple approaches in IMOD and more recent software packages, e.g. AreTomo). While we think it is important to share all tomograms, regardless of quality, we were careful to exclude from analysis any tomograms that did not contain sufficient information (as described in the methods section).

      Minor suggestions: 

      (1) Page 13, line 341, reference 56, but references are not numbered. Please update.

      Good spot. We have corrected this in the revised manuscript.

      (2) Page 33, line 746: the figure legend is not referencing the correct figure panels; G-K should be I-K.

      We have amended the Figure 3 legend to “(G-K) Snapshots and quantification of membrane remodeling within glutamatergic synapses”.

      (3) Page 33, line 750; reads 'same as E', but should be 'same as G'. 

      Good spot. We have corrected this in the revised manuscript.

      (4) Page 35, Figure 4: Please use more labels: Figure 4B: it would be helpful to use different colors for each view and match to the tomogram - then non-experts could easily relate the projections and real data; Figure 4C: please label domains; Figure 4F: the figure panel got lost. 

      This is an interesting idea. While our subtomogram average of 2522 subvolumes provided decent evidence that these are ionotropic receptors, we are reluctant to label specific putative domains of individual subvolumes in the raw tomographic slice because the resolution of the raw tomogram (particularly in the Z-direction) is worse and may not be sufficient to definitively resolve each domain layer. We hope the reviewer appreciates our cautious approach.

      (5) Page 42, line 933: incomplete sentence. 

      Good spot. We have corrected this in the revised manuscript.

      (6) Page 46, line 1038; Reference 69 is in brackets, but references are not numbered. Please update.

      Good spot. We have corrected this in the revised manuscript.

    1. eLife Assessment

      This paper is an important overview of the currently published literature on low-intensity focussed ultrasound stimulation (TUS) in humans, providing a meta-analysis of this literature that explores which stimulation parameters might predict the directionality of the physiological stimulation effects. The overall synthesis, except for the section on TPS and AD, is convincing though could be streamlined at places. The database proposed by the paper has the potential to become a key community resource if carefully curated and developed.

    2. Reviewer #1 (Public review):

      Summary:

      This paper is a relevant overview of the currently published literature on low-intensity focused ultrasound stimulation (TUS) in humans, with a meta-analysis of this literature that explores which stimulation parameters might predict the directionality of the physiological stimulation effects.

      The pool of papers to draw from is small, which is not surprising given the nascent technology. It seems nevertheless relevant to summarize the current field in the way done here, not least to mitigate and prevent some of the mistakes that other non-invasive brain stimulation techniques have suffered from, most notably the theory- and data-free permutation of the parameter space.

      The meta-analysis concludes that there are, at best, weak trends toward specific parameters predicting the direction of the stimulation effects. The data have been incorporated into an open database that will ideally continue to be populated by the community and thereby become a helpful resource as the field moves forward.

      Strengths:

      The current state of human TUS is concisely and well summarized. The methods of the meta-analysis are appropriate. The database is a valuable resource.

      Suggestions:

      - The paper remains lengthy and somewhat unfocused, to the detriment of readability. One can understand that the authors wish to include as much information as possible, but this reviewer is sceptical that this will aid the use of the databank, or help broaden the readership. For one, there is a good chunk of repetition throughout. The intro also oscillates somewhat between TMS, tDCS and TUS. While the former two help contextualize the issue, this doesn't seem necessary. In the section on clinical applications of TUS and possible outcomes of TUS, there's an imbalance of content across examples. That's in part because of the difference in knowledge base, but some sections could probably be shortened, e.g. stroke. In any case, the authors may want to consider whether it is worth making some additional effort in pruning the paper.

      - The terms or concept of enhancement and suppression warrant a clearer definition and usage. In most cases, the authors refer to E/S of neural activity. Perhaps using terms such as "neural enhancement" etc helps distinguish these from eg behavioural or clinical effects. Crucially, how one maps onto the other is not clear. But in any case, a clear statement that the changes outlined on lines 277ff do not

      - Re tb-TUS (lines 382ff), it is worth acknowledging here that independent replication is very limited (eg Bao et al 2024; Fong et al bioRxiv 2024) and seems to indicate rather different effects

      - The comparison with TPS is troublesome. For one, that original study was incredibly poorly controlled and designed. Cherry-picking individual (badly conducted) proof-of-principle studies doesn't seem a great way to go about it, as one can find a match for any desired use or outcome.

      Moreover, other than the concept of "pulsed" stimulation, it is not clear why that original study would motivate the use of TUS in the way the authors propose; both types of stimulation act in very different ways (if TPS "acts" at all). But surely the cited TPS study does not "demonstrate the capability for TUS for pre-operative cognitive mapping". As an aside, why the authors feel the need to state the "potential for TPS... to enhance cognitive function" is unclear, but it is certainly a non-sequitur. This reviewer feels quite strongly that simplistic analogies such as the one here are unnecessary and misleading, and don't reflect the thoughtful discussion of the rest of the paper. In the other clinical examples, the authors build their suggestions on other TUS studies, which seems more sensible.