  1. Mar 2025
    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This study identified three independent components of glucose dynamics ("value," "variability," and "autocorrelation") and reported important findings indicating that they play a key role in predicting coronary plaque vulnerability. Although the generalizability of the results needs further investigation due to the limited sample size and validation cohort limitations, this study makes several notable contributions: validation of autocorrelation as a new clinical indicator, theoretical support through mathematical modeling, and development of a web application for practical implementation. These contributions are likely to attract broad interest from researchers in both diabetology and cardiology and may suggest the potential for a new approach to glucose monitoring that goes beyond conventional glycemic control indicators in clinical practice.

      Strengths:

      The most notable strength of this study is the identification of three independent elements in glycemic dynamics: value, variability, and autocorrelation. In particular, the metric of autocorrelation, which has not been captured by conventional glycemic control indices, may bring a new perspective for understanding glycemic dynamics. In terms of methodological aspects, the study uses an analytical approach combining various statistical methods such as factor analysis, LASSO, and PLS regression, and enhances the reliability of results through theoretical validation using mathematical models and validation in other cohorts. In addition, the practical aspect of the research results, such as the development of a Web application, is also an important contribution to clinical implementation.

      We appreciate reviewer #1 for the positive assessment and for the valuable and constructive comments on our manuscript.

      Weaknesses:

      The most significant weakness of this study is the relatively small sample size of 53 study subjects. This sample size limitation leads to a lack of statistical power, especially in subgroup analyses, and to limitations in the assessment of rare events.

      We appreciate the reviewer’s concern regarding the sample size. We acknowledge that a larger sample size would increase statistical power, especially for subgroup analyses and the assessment of rare events.

      We would like to clarify several points regarding the statistical power and validation of our findings. Our sample size determination followed established methodological frameworks, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations (a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4) indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section. Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients.
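
      For readers who wish to check the arithmetic, the figure of 47 participants can be reproduced with the standard Fisher z approximation for the sample size of a correlation test. The sketch below is illustrative only: the function name `required_n_for_correlation` is hypothetical, a two-sided test is assumed, and the response does not state which software or formula was actually used.

```python
import math
from statistics import NormalDist


def required_n_for_correlation(r, alpha=0.05, power=0.8):
    """Approximate sample size needed to detect a correlation r.

    Uses the Fisher z-transformation and assumes a two-sided test
    (an assumption; the source does not specify one- or two-sided).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    fisher_z = math.atanh(r)                       # Fisher z of the expected r
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)


# Parameters quoted in the response: alpha = 0.05, power = 0.8, r = 0.4
n_required = required_n_for_correlation(0.4)
print(n_required)
```

      Under these assumptions the approximation gives 47, which matches the threshold quoted above and is below the cohort size of 53.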

      Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7. (n=57), Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS biology 16.7 (2018): e2005143. (n=57), and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18. (n=32).

      Furthermore, the primary objective of our study was not to assess rare events, but rather to demonstrate that glucose dynamics can be decomposed into three main factors - mean, variance and autocorrelation - whereas traditional measures have primarily captured mean and variance without adequately reflecting autocorrelation. We believe that our current sample size effectively addresses this objective.

      Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components.

      However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      To address the sample size considerations, we will add the following sentences in the Discussion section:

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed to improve the predictive utility and generalizability of our findings.

      We appreciate the reviewer’s feedback and believe that these clarifications will strengthen the manuscript.

      In terms of validation, several challenges exist, including geographical and ethnic biases in the validation cohorts, lack of long-term follow-up data, and insufficient validation across different clinical settings. In terms of data representativeness, limiting factors include the inclusion of only subjects with well-controlled serum cholesterol and blood pressure and the use of only short-term measurement data.

      We appreciate the reviewer’s comment regarding the challenges associated with validation. In terms of geographic and ethnic diversity, our study includes validation cohorts from diverse populations, including 64 Japanese, 53 American and 100 Chinese individuals. These cohorts include a wide range of metabolic states, from healthy individuals to those with diabetes, ensuring validation across different clinical conditions. In addition, we recognize the limited availability of publicly available datasets with sufficient sample sizes for factor decomposition that include both healthy individuals and those with type 2 diabetes (Zhao, Qinpei, et al. “Chinese diabetes datasets for data-driven machine learning.” Scientific Data 10.1 (2023): 35.). The main publicly available datasets with relevant clinical characteristics have already been analyzed in this study using unbiased approaches.

      However, we fully agree with the reviewer that expanding the geographic and ethnic scope, including long-term follow-up data, and validation in different clinical settings would further strengthen the robustness and generalizability of our findings. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      Regarding the validation considerations, we will add the following sentences to the Discussion section:

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed to improve the predictive utility and generalizability of our findings.

      Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.

      In terms of elucidation of physical mechanisms, the study is not sufficient to elucidate the mechanisms linking autocorrelation and clinical outcomes or to verify them at the cellular or molecular level.

      We appreciate the reviewer’s point regarding the need for further elucidation of the physical mechanisms linking glucose autocorrelation to clinical outcomes. We fully agree with the reviewer that the detailed molecular and cellular mechanisms underlying this relationship are not yet fully understood, as noted in our Discussion section.

      However, we would like to emphasize the theoretical basis that supports the clinical relevance of autocorrelation. Our results show that glucose profiles with identical mean and variability can exhibit different autocorrelation patterns, highlighting that conventional measures such as mean or variance alone may not fully capture inter-individual metabolic differences. Incorporating autocorrelation analysis provides a more comprehensive characterization of metabolic states. Consequently, incorporating autocorrelation measures alongside traditional diabetes diagnostic criteria - such as fasting glucose, HbA1c and PG120, which primarily reflect only the “mean” component - can improve predictive accuracy for various clinical outcomes. While further research at the cellular and molecular level is needed to fully validate these findings, it is important to note that the primary goal of this study was to analyze the characteristics of glucose dynamics and gain new insights into metabolism, rather than to perform molecular biology experiments.
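
      The statement that identical mean and variability can coexist with different autocorrelation can be illustrated with a toy example: shuffling a time series leaves its mean and variance unchanged but destroys its temporal structure. The synthetic trace, the 5-minute spacing, and the helper `lag1_autocorrelation` below are assumptions made for illustration; this is not the simulation model described in the manuscript.

```python
import math
import random
from statistics import fmean, pvariance


def lag1_autocorrelation(values):
    """Lag-1 autocorrelation coefficient of a sequence."""
    m = fmean(values)
    num = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    den = sum((v - m) ** 2 for v in values)
    return num / den


# Hypothetical smooth glucose-like trace: 288 points = 24 h at 5-min spacing
smooth = [110 + 25 * math.sin(2 * math.pi * i / 96) for i in range(288)]

# Same values in random order: mean and variance are preserved exactly
rng = random.Random(0)
shuffled = smooth[:]
rng.shuffle(shuffled)

for name, series in (("smooth", smooth), ("shuffled", shuffled)):
    print(
        name,
        round(fmean(series), 2),
        round(pvariance(series), 2),
        round(lag1_autocorrelation(series), 3),
    )
```

      Both series report the same mean and variance, while the lag-1 autocorrelation is close to 1 for the smooth trace and close to 0 for the shuffled one, so an index based only on mean or variance cannot tell them apart.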

      Furthermore, our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved Detection of Decreased Glucose Handling Capacities via Novel Continuous Glucose Monitoring-Derived Indices: AC_Mean and AC_Var.” medRxiv (2023): 2023-09.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study.

      Rather than a limitation, we view these currently unexplored associations as an opportunity for further research. The identification of autocorrelation as a key glycemic feature introduces a new dimension to metabolic regulation that could serve as the basis for future investigations exploring the molecular mechanisms underlying these patterns.

      While we agree that further research at the cellular and molecular level is needed to fully validate these findings, we believe that our study provides a strong theoretical framework supporting the clinical utility of autocorrelation analysis in glucose monitoring. This framework could serve as the basis for future investigations of the molecular mechanisms underlying these autocorrelation patterns, which adds to the broad interest of this study. Regarding the physical mechanisms linking autocorrelation and clinical outcomes, we will add the following sentences to the Discussion section:

      This study also provided evidence that autocorrelation can vary independently from the mean and variance components using simulated data. In addition, simulated glucose dynamics indicated that even individuals with high AC_Var did not necessarily have high maximum and minimum blood glucose levels. This study also indicated that these three components qualitatively corresponded to the four distinct glucose patterns observed after glucose administration, which were identified in a previous study (Hulman et al., 2018). Thus, the inclusion of autocorrelation in addition to mean and variance may improve the characterization of inter-individual differences in glucose regulation and improve the predictive accuracy of various clinical outcomes.

      Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2023), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.

      Reviewer #2 (Public review):

      Sugimoto et al. explore the relationship between glucose dynamics - specifically value, variability, and autocorrelation - and coronary plaque vulnerability in patients with varying glucose tolerance levels. The study identifies three independent predictive factors for %NC and emphasizes the use of continuous glucose monitoring (CGM)-derived indices for coronary artery disease (CAD) risk assessment. By employing robust statistical methods and validating findings across datasets from Japan, America, and China, the authors highlight the limitations of conventional markers while proposing CGM as a novel approach for risk prediction. The study has the potential to reshape CAD risk assessment by emphasizing CGM-derived indices, aligning well with personalized medicine trends.

      Strengths:

      (1) The introduction of autocorrelation as a predictive factor for plaque vulnerability adds a novel dimension to glucose dynamic analysis.

      (2) Inclusion of datasets from diverse regions enhances generalizability.

      (3) The use of a well-characterized cohort with controlled cholesterol and blood pressure levels strengthens the findings.

      (4) The focus on CGM-derived indices aligns with personalized medicine trends, showcasing the potential for CAD risk stratification.

      We appreciate reviewer #2 for the positive assessment and for the valuable and constructive comments on our manuscript.

      Weaknesses:

      (1) The link between autocorrelation and plaque vulnerability remains speculative without a proposed biological explanation.

      We appreciate the reviewer’s point about the need for a clearer biological explanation linking glucose autocorrelation to plaque vulnerability. We fully agree with the reviewer that the detailed biological mechanisms underlying this relationship are not yet fully understood, as noted in our Discussion section.

      However, we would like to emphasize the theoretical basis that supports the clinical relevance of autocorrelation. Our results show that glucose profiles with identical mean and variability can exhibit different autocorrelation patterns, highlighting that conventional measures such as mean or variance alone may not fully capture inter-individual metabolic differences. Incorporating autocorrelation analysis provides a more comprehensive characterization of metabolic states. Consequently, incorporating autocorrelation measures alongside traditional diabetes diagnostic criteria - such as fasting glucose, HbA1c and PG120, which primarily reflect only the “mean” component - can improve predictive accuracy for various clinical outcomes.

      Furthermore, our previous research has shown that glucose autocorrelation reflects changes in insulin clearance (Sugimoto, Hikaru, et al. “Improved Detection of Decreased Glucose Handling Capacities via Novel Continuous Glucose Monitoring-Derived Indices: AC_Mean and AC_Var.” medRxiv (2023): 2023-09.). The relationship between insulin clearance and cardiovascular disease has been well documented (Randrianarisoa, Elko, et al. “Reduced insulin clearance is linked to subclinical atherosclerosis in individuals at risk for type 2 diabetes mellitus.” Scientific reports 10.1 (2020): 22453.), and the mechanisms described in this prior work may potentially explain the association between glucose autocorrelation and clinical outcomes observed in the present study.

      Rather than a limitation, we view these currently unexplored associations as an opportunity for further research. The identification of autocorrelation as a key glycemic feature introduces a new dimension to metabolic regulation that could serve as the basis for future investigations exploring the molecular mechanisms underlying these patterns.

      While we agree that further research at the cellular and molecular level is needed to fully validate these findings, we believe that our study provides a strong theoretical framework supporting the clinical utility of autocorrelation analysis in glucose monitoring. This framework could serve as the basis for future investigations of the molecular mechanisms underlying these autocorrelation patterns, which adds to the broad interest of this study. Regarding the physical mechanisms linking autocorrelation and clinical outcomes, we will add the following sentences to the Discussion section:

      This study also provided evidence that autocorrelation can vary independently from the mean and variance components using simulated data. In addition, simulated glucose dynamics indicated that even individuals with high AC_Var did not necessarily have high maximum and minimum blood glucose levels. This study also indicated that these three components qualitatively corresponded to the four distinct glucose patterns observed after glucose administration, which were identified in a previous study (Hulman et al., 2018). Thus, the inclusion of autocorrelation in addition to mean and variance may improve the characterization of inter-individual differences in glucose regulation and improve the predictive accuracy of various clinical outcomes.

      Despite increasing evidence linking glycemic variability to oxidative stress and endothelial dysfunction in T2DM complications (Ceriello et al., 2008; Monnier et al., 2008), the biological mechanisms underlying the independent predictive value of autocorrelation remain to be elucidated. Our previous work has shown that glucose autocorrelation is influenced by insulin clearance (Sugimoto et al., 2023), a process known to be associated with cardiovascular disease risk (Randrianarisoa et al., 2020). Therefore, the molecular pathways linking glucose autocorrelation to cardiovascular disease may share common mechanisms with those linking insulin clearance to cardiovascular disease. Although previous studies have primarily focused on investigating the molecular mechanisms associated with mean glucose levels and glycemic variability, our findings open new avenues for exploring the molecular basis of glucose autocorrelation, potentially revealing novel therapeutic targets for preventing diabetic complications.

      (2) The relatively small sample size (n=270) limits statistical power, especially when stratified by glucose tolerance levels.

      We appreciate the reviewer’s concern regarding sample size and its potential impact on statistical power, especially when stratified by glucose tolerance level. We fully agree that a larger sample size would increase statistical power, especially for subgroup analyses.

      We would like to clarify several points regarding the statistical power and validation of our findings. Our sample size determination followed established methodological frameworks, including the guidelines outlined by Muyembe Asenahabi, Bostely, and Peters Anselemo Ikoha. “Scientific research sample size determination.” (2023). These guidelines balance the risks of inadequate sample size with the challenges of unnecessarily large samples. For our primary analysis examining the correlation between CGM-derived measures and %NC, power calculations (a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4) indicated that a minimum of 47 participants was required. Our sample size of 53 exceeded this threshold and allowed us to detect statistically significant correlations, as described in the Methods section. Moreover, to provide transparency about the precision of our estimates, we have included confidence intervals for all coefficients.

      Furthermore, our sample size aligns with previous studies investigating the associations between glucose profiles and clinical parameters, including Torimoto, Keiichi, et al. “Relationship between fluctuations in glucose levels measured by continuous glucose monitoring and vascular endothelial dysfunction in type 2 diabetes mellitus.” Cardiovascular Diabetology 12 (2013): 1-7. (n=57), Hall, Heather, et al. “Glucotypes reveal new patterns of glucose dysregulation.” PLoS biology 16.7 (2018): e2005143. (n=57), and Metwally, Ahmed A., et al. “Prediction of metabolic subphenotypes of type 2 diabetes via continuous glucose monitoring and machine learning.” Nature Biomedical Engineering (2024): 1-18. (n=32).

      Regarding the classification of glucose dynamics components, we have conducted additional validation across diverse populations including 64 Japanese, 53 American, and 100 Chinese individuals. These validation efforts have consistently supported our identification of three independent glucose dynamics components.

      However, we acknowledge the importance of further validation on a larger scale. To address this, we conducted a large follow-up study of over 8,000 individuals with two years of follow-up (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      To address the sample size considerations, we will add the following sentences in the Discussion section:

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed to improve the predictive utility and generalizability of our findings.

      (3) Strict participant selection criteria may reduce applicability to broader populations.

      We appreciate the reviewer’s comment regarding the potential impact of strict participant selection criteria on the broader applicability of our findings. We acknowledge that extending validation to more diverse populations would improve the generalizability of our findings.

      Our study includes validation cohorts from diverse populations, including 64 Japanese, 53 American and 100 Chinese individuals. These cohorts include a wide range of metabolic states, from healthy individuals to those with diabetes, ensuring validation across different clinical conditions. However, we acknowledge that further validation in additional populations and clinical settings would strengthen our conclusions. To address this, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which confirmed our main finding that glucose dynamics consist of mean, variance, and autocorrelation. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, it provides further support for the clinical relevance and generalizability of our findings.

      We will add the following text to the Discussion section to address these considerations:

      Although our analysis included four datasets with a total of 270 individuals, and our sample size of 53 met the required threshold based on power calculations with a type I error of 0.05, a power of 0.8, and an expected correlation coefficient of 0.4, we acknowledge that the sample size may still be considered relatively small for a comprehensive assessment of these relationships. To further validate these findings, larger prospective studies with diverse populations are needed to improve the predictive utility and generalizability of our findings.

      Although our LASSO and factor analysis indicated that CGM-derived measures were strong predictors of %NC, this does not mean that other clinical parameters, such as lipids and blood pressure, are irrelevant in T2DM complications. Our study specifically focused on characterizing glucose dynamics, and we analyzed individuals with well-controlled serum cholesterol and blood pressure to reduce confounding effects. While we anticipate that inclusion of a more diverse population would not alter our primary findings regarding glucose dynamics, it is likely that a broader data set would reveal additional predictive contributions from lipid and blood pressure parameters.

      (4) CGM-derived indices like AC_Var and ADRR may be too complex for routine clinical use without simplified models or guidelines.

      We appreciate the reviewer’s concern about the complexity of CGM-derived indices such as AC_Var and ADRR for routine clinical use. We acknowledge that for these indices to be of practical use, they must be both interpretable and easily accessible to healthcare providers.

      To address this concern, we have developed an easy-to-use web application that automatically calculates these measures, including AC_Var, mean glucose levels, and glucose variability. This tool eliminates the need for manual calculations, making these indices more practical for clinical implementation.

      Regarding interpretability, we acknowledge that establishing specific clinical guidelines would enhance the practical utility of these measures. For example, defining a cut-off value for AC_Var above which the risk of diabetes complications increases significantly would provide clearer clinical guidance. However, given our current sample size limitations and our predefined objective of investigating correlations among indices, we have taken a conservative approach by focusing on the correlation between AC_Var and %NC rather than establishing definitive cutoffs. This approach intentionally avoids problematic statistical practices like p-hacking. It is not realistic to expect a single study to accomplish everything from proposing a new concept to conducting large-scale clinical trials to establishing clinical guidelines. Establishing clinical guidelines typically requires the accumulation of multiple studies over many years. Recognizing this reality, we have been careful in our manuscript to make modest claims about the discovery of new “correlations” rather than exaggerated claims about immediate routine clinical use.

      To address this limitation, we conducted a large follow-up study of over 8,000 individuals (Sugimoto, Hikaru, et al. “Stratification of individuals without prior diagnosis of diabetes using continuous glucose monitoring” medRxiv (2025)), which proposed clinically relevant cutoffs and reference ranges for AC_Var and other CGM-derived indices. As this large study was beyond the scope of the present manuscript due to differences in primary objectives and analytical approaches, it was not included in this paper; however, by integrating automated calculation tools with clear clinical thresholds, we expect to make these measures more accessible for clinical use.

      We will add the following text to the Discussion section to address these considerations:

      While CGM-derived indices such as AC_Var and ADRR hold promise for CAD risk assessment, their complexity may present challenges for routine clinical implementation. To improve usability, we have developed a web-based calculator that automates these calculations. However, the definition of clinically relevant thresholds and reference ranges requires further validation in larger cohorts.

      (5) The study does not compare CGM-derived indices to existing advanced CAD risk models, limiting the ability to assess their true predictive superiority.

      We appreciate the reviewer’s comment regarding the comparison of CGM-derived indices with existing CAD risk models. Given that our study population consisted of individuals with well-controlled total cholesterol and blood pressure levels, a direct comparison with the Framingham Risk Score for Hard Coronary Heart Disease (Wilson, Peter WF, et al. “Prediction of coronary heart disease using risk factor categories.” Circulation 97.18 (1998): 1837-1847.) may introduce inherent bias, as these factors are key components of the score.

      Nevertheless, to further assess the predictive value of the CGM-derived indices, we performed additional analyses using linear regression to predict %NC. Using the Framingham Risk Score, we obtained an R² of 0.04 and an Akaike Information Criterion (AIC) of 330. In contrast, our proposed model incorporating the three glycemic parameters - CGM_Mean, CGM_Std, and AC_Var - achieved a markedly higher R² of 0.36 and a lower AIC of 321, indicating a better fit to %NC.
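
      The two fit statistics quoted here can be computed from observed and predicted values as sketched below, assuming ordinary least squares with Gaussian errors and the AIC convention that counts the residual variance as an estimated parameter. The function names and the toy numbers are hypothetical; the response does not specify which software or AIC convention was used, and conventions that differ only by an additive constant give the same model ranking on a fixed dataset.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_obs = sum(observed) / len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - rss / tss


def gaussian_aic(observed, predicted, n_coefficients):
    """AIC = 2k - 2 log L for a Gaussian linear model.

    k counts the regression coefficients (including the intercept)
    plus one for the residual variance (an assumed convention).
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    log_likelihood = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    k = n_coefficients + 1
    return 2 * k - 2 * log_likelihood


# Toy data only; these are not values from the study
observed = [1.0, 2.0, 3.0, 4.0]
good_fit = [1.1, 1.9, 3.2, 3.8]   # predictions from a model with slope + intercept
poor_fit = [2.5, 2.5, 2.5, 2.5]   # intercept-only model (predicts the mean)

print(r_squared(observed, good_fit), gaussian_aic(observed, good_fit, 2))
print(r_squared(observed, poor_fit), gaussian_aic(observed, poor_fit, 1))
```

      A higher R² and a lower AIC both favor the first model in this toy case; AIC additionally penalizes the extra coefficient, which is why it is reported alongside R² when models with different numbers of predictors are compared.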

      We will add the following text to the Results section:

      The regression model including CGM_Mean, CGM_Std and AC_Var to predict %NC achieved an R² of 0.36 and an Akaike Information Criterion (AIC) of 321. Each of these indices showed statistically significant independent positive correlations with %NC. In contrast, the model using conventional glycemic markers (FBG, HbA1c, and PG120) yielded an R² of only 0.05 and an AIC of 340. Similarly, the model using the Framingham Risk Score for Hard Coronary Heart Disease (Wilson et al., 1998) showed limited predictive value, with an R² of 0.04 and an AIC of 330.
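
      The two fit statistics compared above can be sketched as follows. This is a minimal illustration, assuming a linear model with Gaussian errors; the function names and the toy numbers are hypothetical and are not taken from the study, and statistical packages differ in which parameters they count toward the AIC penalty.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_obs = sum(observed) / len(observed)
    rss = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    tss = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - rss / tss


def gaussian_aic(observed, predicted, n_params):
    """AIC = 2k - 2 ln(L) for a model with Gaussian errors.

    n_params (k) is the number of estimated quantities; whether the
    error variance is counted is a convention that varies by package.
    """
    n = len(observed)
    rss = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Maximized Gaussian log-likelihood with variance estimated as RSS / n.
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * n_params - 2 * log_lik


# Hypothetical toy values, not data from the study.
y = [10.0, 12.0, 15.0, 19.0, 24.0]
y_hat = [10.5, 12.5, 15.0, 18.5, 23.5]
print(r_squared(y, y_hat), gaussian_aic(y, y_hat, n_params=3))
```

      A higher R² and a lower AIC both favor a model, which is the direction of the comparison reported above; AIC additionally penalizes each extra parameter by 2.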

      (6) Varying CGM sampling intervals (5-minute vs. 15-minute) were not thoroughly analyzed for impact on results.

      We appreciate the reviewer’s comment regarding the potential impact of different CGM sampling intervals on our results. To assess the robustness of our findings across different sampling frequencies, we performed a downsampling analysis by converting our 5-minute interval data to 15-minute intervals. The AC_Var value calculated from 15-minute intervals was significantly correlated with that calculated from 5-minute intervals (R = 0.99, 95% CI: 0.97-1.00). Furthermore, the regression model using CGM_Mean, CGM_Std, and AC_Var from 15-minute intervals to predict %NC achieved an R² of 0.36 and an AIC of 321, identical to the model using 5-minute intervals. These findings indicate that our results are robust to variations in CGM sampling frequency.

      We will add this analysis to the Results section:

      The AC_Var value calculated from 15-minute intervals was significantly correlated with that calculated from 5-minute intervals (R = 0.99, 95% CI: 0.97-1.00). Consequently, the regression model including CGM_Mean, CGM_Std and AC_Var from 15-minute intervals to predict %NC achieved an R² of 0.36 and an AIC of 321.
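
      The downsampling check described above can be sketched as below. Several points are assumptions rather than details given in the response: that the conversion keeps every third reading (rather than averaging), that AC_Var is the variance of the autocorrelation coefficients over a range of lags, and the lag range itself, which is a placeholder. The trace is synthetic.

```python
import math


def downsample(readings, factor=3):
    """Keep every `factor`-th reading (5-minute -> 15-minute when factor=3)."""
    return readings[::factor]


def autocorrelation(series, lag):
    """Sample autocorrelation coefficient at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return num / denom


def ac_var(series, max_lag):
    """Population variance of the autocorrelation coefficients at lags 1..max_lag.

    Assumption: this mirrors the definition of AC_Var; the lag range
    should be taken from the paper's Methods, not from this sketch.
    """
    coeffs = [autocorrelation(series, lag) for lag in range(1, max_lag + 1)]
    mean = sum(coeffs) / len(coeffs)
    return sum((c - mean) ** 2 for c in coeffs) / len(coeffs)


# Synthetic one-day trace at 5-minute resolution (288 readings), in mg/dL.
five_min = [100 + 20 * math.sin(2 * math.pi * i / 72) for i in range(288)]
fifteen_min = downsample(five_min)

print(ac_var(five_min, max_lag=30), ac_var(fifteen_min, max_lag=10))
```

      Here the lag count is scaled with the sampling interval so that both calls span the same 150 minutes; whether the original analysis held the number of lags or the time span fixed is not stated in the response.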

      Reviewer #3 (Public review):

      Summary:

      This is a retrospective analysis of 53 individuals over 26 features (12 clinical phenotypes, 12 CGM features, and 2 autocorrelation features) to examine which features were most informative in predicting percent necrotic core (%NC) as a parameter for coronary plaque vulnerability. Multiple regression analysis demonstrated a better ability to predict %NC from 3 selected CGM-derived features than 3 selected clinical phenotypes. LASSO regularization and partial least squares (PLS) with VIP scores were used to identify 4 CGM features that most contribute to the precision of %NC. Using factor analysis they identify 3 components that have CGM-related features: value (relating to the value of blood glucose), variability (relating to glucose variability), and autocorrelation (composed of the two autocorrelation features). These three groupings appeared in the 3 validation cohorts and when performing hierarchical clustering. To demonstrate how these three features change, a simulation was created to allow the user to examine these features under different conditions.

      We appreciate reviewer #3 for the valuable and constructive comments on our manuscript.

      Review:

      The goal of this study was to identify CGM features that relate to %NC. Through multiple feature selection methods, they arrive at 3 components: value, variability, and autocorrelation. While the feature list is highly correlated, the authors take steps to ensure feature selection is robust. It is unclear what each component (value, variability, and autocorrelation) includes: while similar CGM indices fall within each component, some indices appear relevant to value in one dataset and to variability in the validation datasets.

      We appreciate the reviewer’s comment regarding the classification of CGM-derived measures into the three components: value, variability, and autocorrelation. As the reviewer correctly points out, some measures may load differently between the value and variability components in different datasets. However, we believe that this variability reflects the inherent mathematical properties of these measures rather than a limitation of our study.

      For example, the HBGI clusters differently across datasets due to its dependence on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S7A). Conversely, in populations with a wider range of mean glucose levels, HBGI correlates more strongly with mean glucose levels (Fig. 3A). This context-dependent behavior is expected given the mathematical properties of these measures and does not indicate an inconsistency in our classification approach.

      Importantly, our main findings remain robust: CGM-derived measures systematically fall into three components (value, variability, and autocorrelation). Traditional CGM-derived measures primarily reflect either value or variability, and this categorization is consistently observed across datasets. While specific indices such as HBGI may shift classification depending on population characteristics, the overall structure of CGM data remains stable.

      To address these considerations, we will add the following text to the Discussion section:

      Some indices, such as HBGI, showed variation in classification across datasets, with some populations showing higher factor loadings in the “value” component and others in the “variability” component. This variation occurs because HBGI calculations depend on the number of glucose readings above a threshold. In populations where mean glucose levels are predominantly below this threshold, the HBGI is more sensitive to glucose variability (Fig. S7A). Conversely, in populations with a wider range of mean glucose levels, the HBGI correlates more strongly with mean glucose levels (Fig. 3A). Despite these differences, our validation analyses confirm that CGM-derived indices consistently cluster into three components: value, variability, and autocorrelation.
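
      The threshold behavior of HBGI described above can be illustrated with a short sketch. It assumes the commonly used Kovatchev risk transformation for glucose in mg/dL, under which readings below roughly 112.5 mg/dL contribute nothing to HBGI; the response does not state which implementation was used, and the readings are hypothetical.

```python
import math


def high_risk(bg_mg_dl):
    """High-glucose risk of one reading (assumed Kovatchev transformation, mg/dL)."""
    f = 1.509 * (math.log(bg_mg_dl) ** 1.084 - 5.381)
    # Only readings on the hyperglycemic side of the scale (f > 0) count.
    return 10.0 * f * f if f > 0 else 0.0


def hbgi(readings_mg_dl):
    """High Blood Glucose Index: mean high-glucose risk over all readings."""
    return sum(high_risk(bg) for bg in readings_mg_dl) / len(readings_mg_dl)


# Same mean (110 mg/dL), different variability: only the variable trace
# crosses the threshold, so HBGI tracks variability here.
print(hbgi([110.0, 110.0]), hbgi([90.0, 130.0]))

# Both traces sit above the threshold with no variability: HBGI tracks the mean.
print(hbgi([150.0, 150.0]), hbgi([170.0, 170.0]))
```

      The first pair mimics a population whose mean glucose lies mostly below the threshold, and the second a population with a wider range of means, matching the two regimes described above.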

      We are sceptical about statements of significance without documentation of p-values.

      We appreciate the reviewer’s concern regarding statistical significance and the documentation of p values.

      First, given the multiple comparisons in our study, we used q values rather than p values, as shown in Figure S1. Q values provide a more rigorous statistical framework for controlling the false discovery rate in multiple testing scenarios, thereby reducing the likelihood of false positives.
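
      One common way to obtain such false-discovery-rate-adjusted values is the Benjamini-Hochberg procedure, sketched below. The response does not say which procedure produced the q values in Figure S1, so this choice is an assumption, and the input p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p values from four tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

      Each adjusted value is the smallest false discovery rate at which the corresponding test would be declared significant.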

      Second, our statistical reporting follows established guidelines, including those of the New England Journal of Medicine (Harrington, David, et al. “New guidelines for statistical reporting in the journal.” New England Journal of Medicine 381.3 (2019): 285-286.), which recommend that “reporting of exploratory end points should be limited to point estimates of effects with 95% confidence intervals” and advise authors to “replace p values with estimates of effects or association and 95% confidence intervals”. According to these guidelines, p values should not be reported in this type of study. We determined significance based on whether these 95% confidence intervals excluded zero, a statistical method for determining whether an association is significantly different from zero (Tan, Sze Huey, and Say Beng Tan. “The correct interpretation of confidence intervals.” Proceedings of Singapore Healthcare 19.3 (2010): 276-278.).
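
      The "interval excludes zero" criterion can be sketched for a correlation coefficient as follows. The Fisher z-transformation used here is an assumption chosen for illustration; the response does not state how the 95% confidence intervals were computed. The sample size of 53 comes from the study, while the correlation values are hypothetical.

```python
import math


def pearson_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z-transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    lo, hi = interval
    return lo > 0 or hi < 0


# Hypothetical correlations evaluated at the study's sample size (n = 53).
for r in (0.5, 0.1):
    ci = pearson_ci(r, n=53)
    print(r, ci, excludes_zero(ci))
```

      Under this approximation, an interval that excludes zero corresponds to a two-sided test at the 5% level.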

      For the sake of transparency, we provide p values for readers who may be interested, although we emphasize that they should not be the basis for interpretation, as discussed in the referenced guidelines. Specifically, in Figure 1, the p values for CGM_Mean, CGM_Std, and AC_Var were 0.02, 0.02, and <0.01, respectively, while those for FBG, HbA1c, and PG120 were 0.83, 0.91, and 0.25, respectively. In Figure 3C, the p values for factors 1–5 were 0.03, 0.03, 0.03, 0.24, and 0.87, respectively, and in Figure S10B, the p values for factors 1–3 were <0.01, <0.01, and 0.20, respectively.

      We appreciate the opportunity to clarify our statistical methodology and are happy to provide additional details if needed.

      While hesitations remain, the ability of these authors to find groupings of these many CGM metrics in relation to %NC is of interest. The believability of the associations is impeded by an obtuse presentation of the results with core data (i.e. correlation plots between CGM metrics and %NC) buried in the supplement while main figures contain plots of numerical estimates from models which would be more usefully presented in supplementary tables.

      We appreciate the reviewer’s comment regarding the presentation of our results and recognize the importance of ensuring clarity and accessibility of the core data.

      The central finding of our study is twofold: first, that the numerous CGM-derived measures can be systematically classified into three distinct components (mean, variance, and autocorrelation), and second, that each of these components is independently associated with %NC. This insight cannot be derived simply from examining scatter plots of individual correlations, which are provided in the Supplementary Figures. Instead, it emerges from our statistical analyses in the main figures, including multiple regression models that reveal the independent contributions of these components to %NC.

      However, we acknowledge the reviewer’s concern regarding the accessibility of key data. To improve clarity, we will move several scatter plots from the Supplementary Figures to the main figures to allow readers to more directly visualize the relationships between CGM-derived measures and %NC. We believe this revision will improve the transparency and readability of our results while maintaining the rigor of our analytical approach.

      Given the small sample size in the primary analysis, there is a lot of modeling done with parameters estimated where simpler measures would serve and be more convincing as they require less data manipulation. A major example of this is that the pairwise correlation/covariance between CGM_mean, CGM_std, and AC_var is not shown and would be much more compelling in the claim that these are independent factors.

      We appreciate the reviewer’s feedback on our statistical analysis and data presentation. The correlations between CGM_Mean, CGM_Std, and AC_Var are documented in Figure S1B. However, to improve accessibility and clarity, we will move these correlation analyses to the main figures. Regarding our modeling approach, we chose LASSO and PLS methods because they are well-established techniques that are particularly suited to scenarios with many input variables and a relatively small sample size. These methods have been extensively validated in the literature as robust approaches for variable selection under such conditions (Tibshirani R. 1996. Regression shrinkage and selection via the lasso. J R Stat Soc 58:267–288. Wold S, Sjöström M, Eriksson L. 2001. PLS-regression: a basic tool of chemometrics. Chemometrics Intellig Lab Syst 58:109–130. Pei X, Qi D, Liu J, Si H, Huang S, Zou S, Lu D, Li Z. 2023. Screening marker genes of type 2 diabetes mellitus in mouse lacrimal gland by LASSO regression. Sci Rep 13:6862. Wang C, Kong H, Guan Y, Yang J, Gu J, Yang S, Xu G. 2005. Plasma phospholipid metabolic profiling and biomarkers of type 2 diabetes mellitus based on high-performance liquid chromatography/electrospray mass spectrometry and multivariate statistical analysis. Anal Chem 77:4108–4116.).

      Lack of methodological detail is another challenge. For example, the time period of CGM metrics or CGM placement in the primary study in relation to the IVUS-derived measurements of coronary plaques is unclear. Are they temporally distant or proximal/ concurrent with the PCI?

      We appreciate the reviewer’s important question regarding the temporal relationship between CGM measurements and IVUS-derived plaque assessments. As described in our previous work (Otowa‐Suematsu, Natsu, et al. “Comparison of the relationship between multiple parameters of glycemic variability and coronary plaque vulnerability assessed by virtual histology–intravascular ultrasound.” Journal of Diabetes Investigation 9.3 (2018): 610-615.), all individuals underwent continuous glucose monitoring for at least three consecutive days within the seven-day period prior to the PCI procedure. To improve clarity for readers, we will include this methodological detail in the revised manuscript.

      A patient undergoing PCI for coronary intervention would be expected to have physiological and iatrogenic glycemic disturbances that do not reflect their baseline state. This is not considered or discussed.

      We appreciate the reviewer’s concern regarding potential glycemic disturbances associated with PCI. As described in our previous work (Otowa‐Suematsu, Natsu, et al. “Comparison of the relationship between multiple parameters of glycemic variability and coronary plaque vulnerability assessed by virtual histology–intravascular ultrasound.” Journal of Diabetes Investigation 9.3 (2018): 610-615.), all CGM measurements were performed before the PCI procedure. This temporal separation ensures that the glycemic patterns analyzed in our study reflect the baseline metabolic state of the patients, rather than any physiological or iatrogenic effects of PCI. To avoid any misunderstanding, we will clarify this temporal relationship in the revised manuscript.

      The attempts at validation in external cohorts, Japanese, American, and Chinese are very poorly detailed. We could only find even an attempt to examine cardiovascular parameters in the Chinese data set but the outcome variables are unspecified with regard to what macrovascular events are included, their temporal relation to the CGM metrics, etc. Notably macrovascular event diagnoses are very different from the coronary plaque necrosis quantification. This could be a source of strength in the findings if carefully investigated and detailed but due to the lack of detail seems like an apples-to-oranges comparison.

      We appreciate the reviewer’s comment regarding the validation cohorts and the need for greater clarity, particularly in the Chinese dataset. We acknowledge that our initial description lacked sufficient methodological detail, and we will expand the Methods section to provide a more comprehensive explanation.

      For the Chinese dataset, the data collection protocol was previously documented (Zhao, Qinpei, et al. “Chinese diabetes datasets for data-driven machine learning.” Scientific Data 10.1 (2023): 35.). Briefly, trained research staff used standardized questionnaires to collect demographic and clinical information, including diabetes diagnosis, treatment history, comorbidities, and medication use. Physical examinations included anthropometric measurements, and body mass index was calculated using standard protocols. CGM monitoring was performed using the FreeStyle Libre H device (Abbott Diabetes Care, UK), which records interstitial glucose levels at 15-minute intervals for up to 14 days. Laboratory measurements, including metabolic panels, lipid profiles, and renal function tests, were obtained within six months of CGM placement. While previous studies have linked necrotic core to macrovascular events (Xie, Yong, et al. “Clinical outcome of nonculprit plaque ruptures in patients with acute coronary syndrome in the PROSPECT study.” JACC: Cardiovascular Imaging 7.4 (2014): 397-405.), we acknowledge the limitations of the cardiovascular outcomes in the Chinese data set. These outcomes were extracted from medical records rather than standardized diagnostic procedures or imaging studies. To address these concerns, we will expand the Discussion section to clarify the differences in outcome definitions and methodological approaches between the data sets.

      Finally, the simulations at the end are not relevant to the main claims of the paper and we would recommend removing them for the coherence of this manuscript.

      We appreciate the reviewer’s feedback regarding the relevance of the simulation component of our manuscript. The primary contribution of our study goes beyond demonstrating correlations between CGM-derived measures and %NC; it highlights three fundamental components of glycemic patterns (mean, variability, and autocorrelation) and their independent relationships with coronary plaque characteristics.

      The simulations are included to illustrate how glycemic patterns with identical means and variability can have different autocorrelation structures. Because temporal autocorrelation can be conceptually difficult to interpret, these visualizations were intended to provide intuitive examples for the readers.
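
      The point that mean and variability do not determine autocorrelation can be shown with a toy example. This is a minimal sketch and not the simulation model used in the manuscript: a smooth synthetic trace is compared with a shuffled copy of the same readings, so the two share a mean and standard deviation by construction.

```python
import math
import random
import statistics


def lag1_autocorrelation(series):
    """Sample autocorrelation between each reading and the next one."""
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    return num / denom


# Smooth synthetic trace: 288 readings with slow oscillations (mg/dL).
smooth = [100 + 20 * math.sin(2 * math.pi * i / 48) for i in range(288)]

# Same readings in random order: identical mean and SD, different time structure.
shuffled = smooth[:]
random.Random(0).shuffle(shuffled)

for name, trace in (("smooth", smooth), ("shuffled", shuffled)):
    print(name, statistics.fmean(trace), statistics.pstdev(trace),
          lag1_autocorrelation(trace))
```

      The smooth trace has a lag-1 autocorrelation close to 1, while shuffling removes most of the temporal structure even though the distribution of values is unchanged.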

      However, we recognize the reviewer’s concern about the coherence of the manuscript. In response, we will streamline the simulation section by removing technical simulations that do not directly support our primary conclusions, while retaining only those that enhance understanding of the three glycemic components.

    1. eLife Assessment

      The authors have demonstrated the use of adenine base editors delivered via adeno-associated viruses to introduce edits in the mitochondrial genome. The manuscript describes the methodology well, and the conclusions are convincingly supported by the results. The valuable results highlight the potential of these base editors to model mtDNA variations in somatic tissues in animal models.

    2. Reviewer #1 (Public review):

      Summary:

      This study represents an incremental step toward mitochondrial DNA editing but raises several concerns regarding its impact and broader applicability. The reported in vitro editing efficiency of 17% in mitotic cells, with non-specific editing across multiple A:T sites, offers limited improvement over prior technologies like DdCBE. Editing efficiency for the Mt-Atp6 gene was even lower (~4%), rendering it unlikely to produce functional changes relevant to mitochondrial function or bioenergetics.

      While the modified TadA8e(V28R) mutant alleviated toxicity and enabled sufficient AAV production for in vivo experiments, the low in vivo editing efficiency (~4%) after 4 weeks was disappointing and unlikely to be biologically meaningful. Furthermore, the use of P1 postnatal tissues, which are still developing, raises questions about their suitability as models for postmitotic tissues, especially since the brain - a key organ affected by mitochondrial diseases - was excluded from the analysis.

      Despite demonstrating feasibility for mitochondrial adenine base editing, the study highlights significant limitations, underscoring the need for further optimization. The reviewer also suggests adopting clearer terminology, such as "pathological variant" instead of "mutation," to enhance precision.

      Strengths:

      The study demonstrates the feasibility of adenine base editing in mitochondrial DNA, marking a step forward in expanding mitochondrial genome engineering capabilities. A notable strength is the development of a modified TadA8e(V28R) mutant, which successfully mitigated toxicity and enabled sufficient AAV production for in vivo experiments. This technical advancement addresses a key challenge in mitochondrial gene editing and provides a foundation for improving delivery methods and reducing off-target effects.

      Additionally, the study highlights the potential for targeted mitochondrial DNA modifications using optimized TALEs, achieving A:T to G:C conversions in multiple genes. While the in vitro editing efficiency remains modest, the approach represents an important proof-of-concept for potentially advancing mitochondrial editing technologies, particularly in the context of addressing pathological variants.

      Weaknesses:

      The major weaknesses of the study center around its low editing efficiency, both in vitro and in vivo. In vitro editing achieved only 17% efficiency in mitotic cells, while the efficiency for the Mt-Atp6 gene was even lower, around 4%. This level of editing is unlikely to produce meaningful functional or biological changes, particularly in cells with pathological mtDNA variants. Similarly, in vivo, editing efficiency after a 4-week exposure period remained at approximately 4%, which is insufficient to support claims of effective mitochondrial genome editing. Another significant limitation is the lack of editing specificity, as observed changes occurred at multiple A:T sites within and across the editing window rather than being confined to a single position, raising concerns about precision and off-target effects.

      The use of P1 postnatal mouse tissues also raises questions about the relevance of the model, as these tissues are still undergoing development and may not truly reflect postmitotic states. This casts doubt on whether the findings are transferable to mature tissues, such as the adult brain, which is frequently affected by mitochondrial diseases. Furthermore, the exclusion of brain tissue from the analysis limits the study's applicability to neurological disorders, a key area of mitochondrial disease research. The rationale for excluding brain tissue is not addressed, leaving an important gap in the study's scope.

      The findings also lack novelty, as the reported low efficiency and lack of specificity are consistent with previous studies, making it unclear whether this work represents a significant advancement over existing technologies.

      Collectively, these weaknesses underscore the need for further optimization of the approach, improved targeting specificity, and validation in more relevant models to demonstrate therapeutic potential.

    3. Reviewer #2 (Public review):

      The authors have demonstrated the use of adenine base editors delivered via adeno-associated viruses to introduce edits in the mitochondrial genome. The manuscript describes the methodology well, and the conclusions are aptly supported by the results. It highlights the potential of these base editors to model mtDNA variations in somatic tissues in animal models.

      However, there are a few comments that need to be addressed:

      (1) Limitations of the small sample size need to be explained clearly for the results described.

      (2) It will be beneficial for the readers if some light is shed on the possible reasons why the efficiencies of adenine base editing are lower than those reported for published cytosine base editors to introduce edits in the mitochondrial DNA.

      (3) The conclusion should more explicitly address the limitations and future directions on low editing efficiency and what can be possible optimization steps.

      (4) In Figure 1, A-to-G editing for the genes Mt-Cytb, Mt-CoII, and Mt-Atp6 appears to be strand-specific for the different architectures of adenine base editors. Do authors have a possible hypothesis if one of the strands is more favorable to editing depending on where the TadA8 binds or is it random?

    1. eLife Assessment

      Shah and colleagues take advantage of the presence of maternal and somatic ribosomes in zebrafish and confirm their differential expression during development. The authors convincingly show that ribosomes previously found expressed during oogenesis are also expressed in primordial germ cells and that hybrid maternal and somatic ribosomes are formed during development. The question of ribosome heterogeneity, the expression and function of maternal versus somatically provided ribosomes are of broad interest and this fundamental work sets new directions for future functional studies of this interesting phenomenon.

    2. Reviewer #1 (Public review):

      In all animals, the fertilized egg is transcriptionally silent, and thus early embryonic development relies on maternally deposited factors. A key mode of regulation is translational control to produce the proteins needed by the developing embryo. In zebrafish as well as other animals, there are distinct ribosomes: those coming from the maternal pool (maternal ribosomes produced in the germ line/oocytes), and those produced from new transcription after genome activation (somatic ribosomes). In zebrafish, the maternal pool consists of a "maternal" rRNA produced from rDNA on chromosome 4, that has previously been shown to be amplified or expressed specifically in the germ line and in oocytes. The observed sex-specific expression of m-rDNA has led to models that it is involved in sex differentiation and/or maternal control of early embryonic development, both as mediators of translation and as a source of raw materials needed to produce new ribosomes. The work to date in the field indicates that maternal and somatic ribosomes are distinct in their expression profiles but whether they have unique, or gene-specific activities awaits determining if somatic rDNA can functionally replace m-rDNA.

      In this manuscript, the authors investigated the expression profiles, protein composition, and ability of maternal and somatic ribosome components to interact with one another and their association with polysomes. This study reports sequence differences between maternal and somatic ribosomal components as well as proteomics and structural analysis of ribosome composition in oocytes and early development. This analysis shows that ribosome subunit composition changes over developmental time but did not uncover evidence suggesting maternal or somatic ribosome-specific ribosomal protein paralog use. The key findings of this work are:

      (1) Observation of hybrid ribosomes composed of subunits of maternal and somatic origin in the embryo.

      (2) Detection of both maternal and somatic ribosomes in polysomes, indicating maternal and somatic ribosomes both support translation in the embryos and may not be functionally unique.

      (3) Persistent expression of m-rRNA in germ cells, suggesting m-ribosomes, as the main ribosome type present, are important for translation in germ cells.

      The question of ribosome heterogeneity and the function of maternal versus somatic rDNA and ribosomes is of great interest to the broader scientific community. Overall, the manuscript is clearly written and the solid data provided support the main ideas and conclusions.

      Specific points are detailed below.

      (1) In Figure 1D the m-rRNA abundance goes down at 3dpf, then up again while the s-rRNA steadily increases and peaks at 3dpf then drops thereafter. As presented in the graph it is unclear if this up-then-down trend is consistently observed or not. There are bars on the graph for m-rRNA but not for s-rRNA, thus it is unclear how many times this experiment was performed for the s-rRNA or how variable the results were from sample to sample. Beyond this technical point, if the pattern is consistent, this is an interesting observation as it would signal either a shift in rDNA transcription to silence the somatic locus and/or post-transcriptional targeted degradation of the somatic rRNA in germ cells.

      (2) Although qualified by the authors to some extent, the conclusion regarding maternal ribosomes and specificity related to the translation of germ line-specific transcripts is potentially confusing or misleading. Since the maternal form appears to be the only or predominant form of ribosomes in the germ cells at this stage, these would be the only ribosomes available for translation in germ cells. So, any RNA being translated in the germ cells, even RNAs that are not specifically expressed in the germline would be "enriched in association with" and translated by the maternal ribosomes in germ cells. Additional supporting evidence would be required to support the conclusion that the maternal ribosomes are specifically dedicated to the translation of germ cell-specific RNAs, like nanos3, rather than just general translation in germ cells. Consistent with a more general role for the maternal ribosomes in translation in germ cells, differential codon use has been previously documented for the RNAs produced in oocytes (aka maternal RNAs) (for example Bazzini et al EMBO 2016; Mishima and Tomari Mol Cell 2016), and tRNA genes were recently reported by Wilson and Postlethwait to reside along with the maternal 5S genes and maternal-specific spliceosome components in the region of chromosome 4 that is differentially activated in oocytes and testis (region 2 coding genes are silenced in the ovary but maternal ribosome-related genes are expressed in the ovary; region 4 contains the maternal 45S gene). Further, some of the authors of this manuscript have reported a shift in tRNA repertoire and a change in iso-decoder expression at the onset of gastrulation (Rappol et al, Nucleic Acids Research 2024). Technical limitations pose challenges to definitively testing the hypothesis, but it would be helpful to place the findings here in the context of the published work.

      (3) "An alternate and non-exclusive hypothesis is that the maternal rDNA locus may be involved in PGC fate and sex determination in zebrafish." It would be helpful to further discuss the published evidence supporting this hypothesis. In accord with a potential role for m-rDNA in ovary differentiation, differential methylation of m-rDNA has been previously reported, with high methylation in testis and low methylation in ovaries. Further, several groups have shown that treating fish with broad inhibitors of methyltransferases causes testis-biased differentiation of the gonad. Finally, Moser et al (Philosophical Transactions of the Royal Society B 2024) recently published work in which CRISPR-Cas9 was used to target the 45S m-rDNA promoter and interfere with its expression. The mutants with these promoter mutations developed as fertile males, consistent with a role for m-rDNA in ovary differentiation. A recent paper from Moser et. al. (Philosophical Transactions of the Royal Society B 2024) showing that disrupting the m-rDNA locus leads to male-only development should be discussed. This paper does not exclude the possibility of a maternal role for the ribosomes since only one female was recovered among the 45S-m-rDNA mutants. The expression data in Figure 1D of this manuscript showing that m-rRNA levels go down and then up in PGCs indicates the PGCs are making their own m-rRNA. This observation together with the recovery of fertile males reported in the Moser et al study (Philosophical Transactions of the Royal Society B 2024) doesn't seem to support a requirement for m-rDNA in PGC fate or germ cell-specific translation, at least in testis, since the mutant males produce sperm and are fertile.

      (4) Although the rationale for examining rRNAs in adult tumors, cultured zebrafish cell lines, and during fin regeneration is clear based on the published literature showing elevated embryonic rRNAs, this line of investigation doesn't add much to this study and is a bit of a distraction. That said, the observation that in contrast to published work, neither the maternal (early embryo) nor the specific rRNAs examined are upregulated in these contexts is important and warrants communication with the research community.

      (5) The numbers of embryos and stages are not consistently stated in the manuscript. For example, in the "Isolation of zebrafish ribosome." and "isolation of monosomes" sections of the methods, the stage and number of embryos used for the IPs are not clearly stated. These important details should be stated throughout the manuscript so that others can perform future studies in a manner that will facilitate comparisons.

      (6) The terminology used for the RiboFLAG experiments is potentially confusing or misleading. Specifically, different terms are used to describe the source of the ribosomes (Figure 5, Figure S7, Figure S8 and in the text). For example, "transmission" is used to describe "maternal transmission" for Mat-RiboFLAG, and "paternal transmission" is used for Som-RiboFLAG, and in Figure 5 and Figure S8 "maternally provided" and "paternally provided" are used. However, these terms may be confusing or unintentionally misleading because transmission and provided refer to two different things. In the case of Mat-RiboFLAG, the terms refer to the maternal Rpl10-FLAG ribosomes, which the progeny receive from their mother independent of whether or not they express the transgene. On the other hand, for Som-RiboFLAG, the terms refer to the transgene rather than the Rpl10-FLAG ribosomes that will be produced by the embryo using the transgene they inherited from their father. Consider instead sticking to "maternal" and "somatic", or alternatively "zygotic expression" and "maternal expression" or "zygotic ribosomes" and "maternal ribosomes".

    3. Reviewer #2 (Public review):

      Summary:

      The study expands previous knowledge on the dual ribosome system in zebrafish by demonstrating the expression of maternal ribosomes in the primordial germ cells as well as the formation of hybrid ribosomes combining subunits of maternal and somatic ribosomes. Although the distinction between the two types is clear at the rRNA level, this is not paralleled at the protein level. An attempt to associate the expression of germ-line-specific transcripts to maternal ribosomes remains inconclusive. Thus, evidence for the functional specialisation of ribosomes in this system is still lacking.

      Strengths:

      The experiments are well-conducted and the main conclusions are well-supported.

      Weaknesses:

      The attempt to take advantage of the system to provide an example of functional ribosome specialisation is justified and the expression of maternal-type ribosomes in the germ line may still be key to understanding the expression of classes of mRNA. However, an alternative possibility related to genome evolution and sex determination is equally relevant.

      Assessment following the structure of the manuscript:

      Shah et al.: "A dual ribosomal system in zebrafish soma and germline"

      The zebrafish dual ribosome system is attractive because it offers a favourable setting to look for ribosome specialization and my impression is that this is exactly what the authors set out to do rather than to try to understand why zebrafish have this unusual setup. If this is correct, the title and the abstract should better reflect the authors' aim and main results. The title suggests to the non-specialist that the dual ribosome system is a novel find which obviously is not the case.

      I was a bit confused when reading the introduction. In the first paragraph, it was unclear to me if the degradation of maternal ribosomes is an active process different from normal turnover. I also found the third paragraph slightly out of tune with the discussion section. The dual ribosome setting at the level of ribosomal RNA genes represents an extreme case of sequence heterogeneity and appears to be sporadic in nature in that it only is reported from Plasmodium and zebrafish. The Xenopus example is 5S rRNA (as also mentioned in the discussion section), and the Drosophila example is protein composition, only. If a broader view of ribosome types is intended, there will be more examples, e.g. Trypanosomes that express different stage-dependent ribosomes at the level of rRNA modifications. The occurrence of dual ribosomes in fish should be placed in context with insight from other fish genomes, e.g. Medaka, which has only one type of ribosomes. Also, the duality in zebrafish is not restricted to ribosomes, but also comprises two types of spliceosomes. These observations suggest that the phenomenon should be investigated in the context of genome evolution. This is appropriately brought up in the discussion section, but I believe it would serve the reading of the manuscript if this was made clear from the beginning. With respect to the structural aspects, I am puzzled why one of the few other papers studying this system, Ramachandran et al. RNA 2020 (PMID: 32912962) is not referenced. This paper is focused on ribose methylation of the two types of ribosomal RNA and should be relevant to several aspects of the present study.

      The manuscript reports three novel and important findings. First, the maternal-type ribosomes are expressed in PGCs, where they furthermore are shown to translate germ line-specific transcripts, and in the male germ line. Regardless, the authors wisely decide to maintain the classical terminology of maternal and somatic ribosomes. Second, both types of ribosomes are polysome-associated and thus translationally active at 24 hpf when they are found in equal amounts. An elaborate experiment shows that hybrid ribosomes are formed at this stage. Finally, a RIP experiment fails to show selectivity in ribosomal recruitment of a germ line-specific mRNA based on the nanos3 3´-UTR. There are several other results, but these are mainly confirmatory or negative, albeit of good quality and important to communicate.

      The part of the study that describes differences in protein composition is a bit difficult to follow, partly because of the complexity of the results, and partly because of the disappointment that no parallel changes in proteins to the clear differences in rRNA were observed. Except for the discussion of eS8 in relation to subunit bridging, it is purely descriptive. There is quite a literature on paralog expression (e.g. in yeast and humans) and perhaps it would be possible to relate to the literature in a way that could provide more meaning to the observations. From the M&M section, it appears that the proteomics data were already published in the Leesch and Lorenzo-Orts et al. paper (Nature 2023). They are here found in Table S1 which is presented in a minimal fashion, from which it is time-consuming to extract meaningful information, e.g. on how stringently the ribosomes were prepared.

      The hybrid-ribosome observation is convincing, but additional information on the choice of cycloheximide concentration would be helpful to rule out other interpretations.

      The experiment on translation of primordial germ cell-specific transcripts by maternal ribosomes is a key experiment. Unfortunately, the experiment failed to show selectivity compared to somatic ribosomes, and in my reading, the promise in the abstract of "preferential association" is not quite justified. More importantly, this experiment is not exhaustive, and a more elaborate discussion on the limitations of the experiment and other approaches would be helpful.

      The discussion section is interesting. Importantly, the authors make the non-specialist aware of the peculiarities of laboratory strains of zebrafish with respect to the lack of sex chromosomes and a possible connection between the rDNA locus and sex determination. This information is critical to include in a journal that has a broad readership. I was unable to follow the argument about the 3´half of 5.8S "to play a role" in ribosome degradation based on Locati et al., 2018 (which is missing from the reference list) and "serve as a target for degradation of maternal ribosomes". Kinetic effects on the degradation pattern of rRNA are frequently observed and difficult to interpret.

    4. Reviewer #3 (Public review):

      Summary:

      Ribosomes are generally considered homogeneous complexes with no inherent role in regulating translation. However, recent studies have found heterogeneity in the composition of ribosome accessory factors, proteins, and ribosomal RNA. Moreover, there is evidence that distinct ribosomal isoforms are produced at different developmental stages in Xenopus, Drosophila, and zebrafish. In Drosophila, germline-derived ribosomes have a different protein composition to those produced by somatic cell types. In zebrafish, germline vs. somatic ribosomes have been shown to incorporate distinct rRNA isoforms. However, the functional significance of ribosome heterogeneity is not known.

      The manuscript by Shah et al. uses the power of the zebrafish to test the hypothesis that maternal ribosome isoforms have a distinct function relative to ribosome isoforms produced by somatic cells after the maternal-to-zygotic transition (MTZ). They confirm previous findings that all maternal rRNA are derived from the maternal-specific rRNA locus on Chromosome 4. Additionally, proteomic analysis showed that maternal and somatic ribosomes also differ in protein composition. Using ribosome tagging experiments they showed that maternally derived subunits can form functional heteroduplexes (hybrids) with somatic-derived subunits. Finally, they show that maternal-derived ribosomes continue to be expressed in germ cells where they preferentially associate with the maternally derived and germline localized nanos3 mRNA. This suggests a possible role of maternal ribosomes in germ cell-specific translational regulation.

      Strengths:

      The authors use the experimental power of zebrafish to test the hypothesis that maternal and somatic-derived ribosomes have distinct functions. They use state-of-the-art proteomics, molecular modeling, and transgenesis techniques. For the most part, the data presented is clear and supports their conclusions.

      Weaknesses:

      Using pulldown experiments they show that maternal ribosomes associate with the PGC-enriched nanos3 RNA, suggesting a role for the maternal isoform in germline-specific translation. However, they acknowledge that the level of enrichment is similar to the level of maternal vs. somatic isoforms that localize to PGCs. The nanos3 mRNA is unique in that it is actively degraded in somatic cells shortly after MTZ so is never present in cells that express the somatic isoforms. Therefore, the association of nanos3 with maternal ribosomes shows that these ribosomes can associate with germline-specific RNAs, but does not provide compelling evidence for a maternal isoform-specific role in translational regulation.

    1. eLife Assessment

      The manuscript presents a useful analysis of the relationship between climate variables and malaria incidence, for local temperature and rainfall and the global climate driver of ENSO from 2008 to 2019 in a lowland region of East Africa, with wavelet analyses and linear regressions after time series decomposition. The paper is convincing albeit not novel in its application of wavelets to the analysis of this type of time series data for a vector-borne infection. It is less persuasive on what is learned about the role of climate variability (non-seasonal climate effects), and it is also unclear how the analysis informs climate change and malaria, and this motivation for the work is not warranted as it pertains to longer time scales than those considered. The work should be better placed in the context of what is known for malaria in East Africa and in different transmission settings.

    2. Reviewer #1 (Public review):

      Summary:

      This study investigates the relationship between climate variables and malaria incidence from monthly records, for rainfall, temperature, and a measure of ENSO, in a lowland region of Kenya in East Africa. Wavelet analyses show significant variability at the seasonal (6-month) scale, with some variation in its signal over time, and some additional variability at the 12-month scale for some variables. As conducted, the analyses show weak (non-significant) signals at the interannual time scales (longer than seasonal). Cross-wavelet analysis also highlights the 6-month scale and the association of malaria and climate variables at that scale, with some signal at 12 months, reflecting the role of climate in seasonality. Evidence is presented for some small changes in the lags of the response of malaria to the seasonal climate drivers over time.

      Strengths:

      Although there have been many studies of climate drivers of malaria dynamics in East Africa, these analyses have been largely focused on highlands where these drivers are expected to exhibit the strongest signal of association with disease burden at interannual and longer time scales. It is therefore of interest to take advantage of a relatively long time series of cases to examine the role of climate variables in more endemic malaria in lowlands.

      Weaknesses:

      (1) Major comments:

      The work is not sufficiently placed in the context of what is known about climate variability in East Africa, and the role of climate variables in the temporal variation of malaria cases in this region. This context includes the relationship between large (global/regional) drivers of interannual climate variability such as ENSO (and the Indian Ocean Dipole) and local temporal patterns in rainfall and temperature. There is for example literature on the influence of those drivers and the short and long rains in East Africa. That is, phenomena such as ENSO would influence malaria through those local climate variables. This context should be considered when formulating and interpreting the analyses.

      There are conceptual problems with the design of the analyses which can limit the findings on association. It is not surprising that rainfall would exhibit a clear association at seasonal scales. It is nevertheless valuable to confirm this as the authors have done and to examine the faster than 12-month scale, given the typical pattern of two rainfall seasons in this area. However, the results on temperature are less clear. If rainfall is the main limiting factor for the transmission season, the temperature variation that would matter can be during the rainy periods. One would then see an association with temperature only in particular windows of time during the year, when rainfall is sufficient (see for example, Rodo et al. Nat. Commun. 2022, for this finding in a highland region of Ethiopia). For this situation, there would be no clear association with temperature when all months are considered, and one would not find a significant relationship (or a lagged one) between peak times in this climate factor and malaria's seasonal cases. It would be difficult for the wavelet analysis to reveal such an effect. Another consideration is whether to use an ENSO variable that includes seasonality or to use an ENSO index computed as an anomaly, to focus on interannual variability. That is, it is most relevant to consider how ENSO influences time scales of variation longer than seasonal (the multiannual variation in seasonal epidemics) and for this purpose, one would typically rely on an anomaly. This choice would better enable one to see whether there is a role of ENSO at interannual time scales. It would also make sense to analyze with cross-wavelets the effect of ENSO on local climate factors, temperature, and rainfall, and not only on malaria. This would allow us to establish evidence for a chain of causality, from a global driver of interannual variability to local climate variability to malaria incidence.
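
      To make the distinction between a seasonal ENSO variable and an anomaly-based index concrete, here is a minimal sketch. It assumes a regularly sampled monthly series; the function name `monthly_anomaly` and the synthetic SST values are illustrative and are not taken from the manuscript or from any particular ENSO product.

```python
import numpy as np

def monthly_anomaly(series, start_month=1):
    """Subtract each calendar month's long-term mean from a monthly series.

    Hypothetical helper: assumes no gaps and at least one value per calendar month.
    """
    series = np.asarray(series, dtype=float)
    # Calendar month index (0..11) for every observation.
    months = (np.arange(series.size) + (start_month - 1)) % 12
    # Long-term mean of each calendar month, i.e. the seasonal climatology.
    climatology = np.array([series[months == m].mean() for m in range(12)])
    return series - climatology[months]

# Synthetic 12-year monthly series: a seasonal cycle plus a slower 4-year swing.
t = np.arange(144)
seasonal = 1.5 * np.sin(2 * np.pi * t / 12)
interannual = 0.8 * np.sin(2 * np.pi * t / 48)
sst = 27.0 + seasonal + interannual

# The anomaly removes the mean level and the seasonal cycle,
# leaving the interannual component.
anomaly = monthly_anomaly(sst)
```

      In this synthetic case the anomaly retains only the slower component, which is the part of the signal relevant to time scales longer than seasonal.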

      The multiresolution analysis and associated analysis of lag variations were confusing and difficult to follow as presented: (1) the lags chosen by the multiresolution analysis do not match the phase differences of the cross-wavelet analysis if I followed what was presented. On page 8, phase differences are expressed in months. I do not understand then the following statements on page 9: "The phase differences obtained by the cross-wavelet transforms were turned into lags, allowing us to plot the evolution of the lags over time". The resulting lags in Figure 6 are shorter than the phase differences provided in the text on page 8. (2) The phase difference of the cross-wavelet analyses for malaria and temperature is also too long for this climate factor to explain an effect on the vector and then on the disease. (3) In Table 3, the regression results that are highlighted are those for Land Surface Temperatures (LST) and ENSO, with a weak but significant negative linear correlation, and for LST and bednet coverage, and this is considered part of the lag analysis. The previous text and analyses up to that point do not seem to consider the relationship of ENSO and local climate variables, or that between local climate variables and bednets (which would benefit from some context for the causal pathways this would reflect).
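
      For reference, the usual conversion from a cross-wavelet phase difference to a time lag at a given period is lag = phase / (2π) × period. The sketch below illustrates this under the assumption that phases are expressed in radians; the function name `phase_to_lag` is hypothetical and the code is not the authors' implementation.

```python
import numpy as np

def phase_to_lag(phase_rad, period_months):
    """Convert a phase difference (radians) at a given period into a lag in months."""
    # Wrap the phase into (-pi, pi] so the shortest lead or lag is reported.
    wrapped = np.angle(np.exp(1j * np.asarray(phase_rad, dtype=float)))
    return wrapped * period_months / (2 * np.pi)

# A quarter-cycle phase difference at the 6-month and 12-month scales.
lag_6 = phase_to_lag(np.pi / 2, 6)
lag_12 = phase_to_lag(np.pi / 2, 12)

# A three-quarter-cycle lag is indistinguishable from a quarter-cycle lead.
lag_wrapped = phase_to_lag(3 * np.pi / 2, 6)
```

      Because a lag obtained this way is only defined modulo the period, the same phase difference corresponds to different lags at the 6-month and 12-month scales, so the period used for the conversion should be stated alongside each reported lag.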

      The conclusion in the Abstract: "Our study underlines the importance of considering long-term time scales when assessing malaria dynamics. The presented wavelet approach could be applicable to other infectious diseases" needs to be reformulated. The use of "long-term" time scales for those of ENSO and interannual variability is not consistent with the climate literature, where long-term could be interpreted as decadal and longer. The time scales beyond those of seasonality, especially those of climate variability, have been addressed in many malaria studies. It is not compelling to have the significance of this study be the importance of considering those time scales. This is not new. I recommend focusing on what has been done for lowland malaria and endemic regions (for example, in Laneri et al. PNAS 2015) as there has been less work for those regions than for seasonal epidemic ones of low transmission (e.g. altitude fringes and desert ones, e.g. Laneri et al. PloS Comp. Biol. 2010; Roy et al. Mal. J. 2015). Also, wavelet analyses have been used extensively by now to consider the association of climate variables and infectious diseases at multiple time scales. There is here an additional component of the analysis but the decomposition that underlies the linear regressions is also not that new, as decompositions of time series have been used before in this area. In summary, I recommend a more appropriate and compelling conclusion on what was learned about malaria at this location and what it may tell us about other, similar, locations, but not malaria dynamics everywhere.

      The conversion from monthly cases to monthly incidence needs a better explanation in the Methods, rather than a referral to another paper. This is a key aspect of the data. It may be useful to plot the monthly time series of both variables in the Supplement, for comparison.

      There is plenty of evidence of the seasonal role of rainfall on malaria's seasonality in many regions. The literature cited here to support this well-known association is quite limited. It would be useful to provide a context that better reflects the literature and some context for the environmental conditions of this lowland region that would explain the dominant role of rainfall on malaria seasonality. Two papers (from 2017 and 2019) are cited in the second paragraph of the introduction as showing that "key climatic factors are rainfall and temperatures". This is a misrepresentation of the field. That these factors matter to malaria in general has been known for a very long time given that the vectors are mosquitoes, and the cited studies are particular ones that examine the mechanistic basis of this link for modeling purposes. Either these papers are presented as examples, with a more accurate description of what they add to the earlier literature or earlier literature should be acknowledged. Also, what has been much less studied is the role of these variables at interannual time scales, as potentially mediating the effects of global drivers in teleconnections.

      (2) Minor comments:

      In relation to the conceptual issues raised above, it would be valuable to consider whether the negative association with temperature persists if one considers mean temperature during the rainy seasons only, against the total cases in the transmission season each year (as in Rodó et al. 2021). This would allow one to disentangle whether the negative association reflects a robust result or an artifact of an interaction between temperature and rainfall so that the former matters when the latter is permissive for transmission.

      The conclusion in the Discussion "This suggests that minor climate variations have a limited impact on malaria incidence at shorter time scales, whereas climatic trends may play a more substantial role in shaping long-term malaria dynamics" is unsubstantiated. There is no clear result in the paper on climatic trends that I can see.

      The Abstract writes: "The true impact of climate change...". This paper is not about climate change but about climate seasonality and variability. This text needs to be changed to make it consistent with the content of the paper.

      Page 2, Introduction: The statement on Pascual et al. 2008 is not completely accurate. This paper shows an interplay of climate variability and disease dynamics, but not cycles that are completely independent of climate.

      Page 2, next sentence: "More recently, such cycles have been attributed to global climate drivers such as ENSO (Cazelles et al., 2023)". This writing is also somewhat unclear. Are you referring to the cycles for the same location in Kenya? Or generically, to the interannual variability of malaria?

      There are multiple places in the writing that could be edited.

    3. Reviewer #2 (Public review):

      Summary:

      The analysis of long malaria time series to investigate the complex relationship between malaria incidence and climate is hampered by the non-stationarity introduced by both changing control interventions and irregular climate events such as the El Niño Southern Oscillation (ENSO).

      Strengths:

      By applying wavelets the authors were able to investigate the effect of the major climate factors such as rainfall, air and land temperature, and sea surface temperature (as a measure for ENSO) while at the same time taking into account changing bednet coverage. The wavelet approach is both flexible and powerful and was able to demonstrate well that shorter-term seasonal fluctuation in malaria incidence in Western Kenya is driven by rainfall patterns, while providing some evidence that temperature and SST may predict fluctuations at longer timescales.

      Weaknesses:

      While flexible and able to deal with non-stationarity, the wavelet approach does not really allow investigation of multiple factors at the same time but is limited to uni- and bivariate analyses. This limits the interpretability of the effect of complex climate patterns while also 'adjusting' for the changing control environment. There is also some concern that the choice of the wavelet and transforms used for different analyses (Morlet, Coiflet, maximal overlap discrete transform) may affect the results. The reasons for choosing these particular wavelets and transforms are not always evident.

      The attempt to investigate the effect of longer-term or irregular-period climate events is laudable. However, why were the analyses restricted to only ENSO (measured as SST)? Other climate factors such as the Indian Ocean Dipole (i.e. the difference in SST between the western and eastern Indian Ocean) are also known to affect climate and rainfall patterns in Eastern Africa.

      Nevertheless, this work is a compelling demonstration of the utility of wavelets for the analyses of (non-stationary) epidemiological time series data.

    1. eLife Assessment

      This work derives a valuable general theory unifying theories of efficient information transmission in the brain with population homeostasis. The general theory provides an explanation for firing rate homeostasis at the level of neural clusters with firing rate heterogeneity within clusters. Applying this theory to the primary visual cortex, the authors present solid evidence that accounts for stimulus-specific and neuron-specific adaptation.

    2. Reviewer #1 (Public review):

      This work derives a general theory of optimal gain modulation in neural populations. It demonstrates that population homeostasis is a consequence of optimal modulation for information maximization with noisy neurons. The developed theory is then applied to the distributed distributional code (DDC) model of the primary visual cortex to demonstrate that homeostatic DDCs can account for stimulus-specific adaptation.

      What I consider to be the most important contribution of this work is the unification of efficient information transmission in neural populations with population homeostasis. The former is an established theoretical framework, and the latter is a well-known empirical phenomenon - the relationship between them has never been fully clarified. I consider this work to be an interesting and relevant step in that direction.

      The theory proposed in the paper is rigorous and the analysis is thorough. The manuscript begins with a general mathematical setting to identify normative solutions to the problem of information maximization. It then gradually builds towards questions about approximate solutions, neural implementation and plausibility of these solutions, applications of the theory to specific models of neural computation (DDC), and finally comparisons to experimental data in V1. Such a connection of different levels of abstraction is an obvious strength of this work.

      Overall I find this contribution interesting and assess it positively. At the same time, I have three major points of criticism, which I believe the authors should address. I list them below, followed by a number of more specific comments and feedback.

      Major comments:

      (1) Interpretation of key results and relationship between different parts of the manuscript. The manuscript begins with an information-transmission ansatz which is described as "independent of the computational goal" (e.g. p. 17). While information theory indeed is not concerned with what quantity is being encoded (e.g. whether it is sensory periphery or hippocampus), the goal of the studied system is to *transmit* the largest amount of bits about the input in the presence of noise. In my view, this does not make the proposed framework "independent of the computational goal". Furthermore, the derived theory is then applied to a DDC model which proposes a very specific solution to inference problems. The relationship between information transmission and inference is deep and nuanced. Because the writing is very dense, it is quite hard to understand how the information transmission framework developed in the first part applies to the inference problem. How does the neural coding diagram in Figure 3 map onto the inference diagram in Figure 10? How does the problem of information transmission under constraints from the first part of the manuscript become an inference problem with DDCs? I am certain that authors have good answers to these questions - but they should be explained much better.

      (2) Clarity of writing for an interdisciplinary audience. I do not believe that in its current form, the manuscript is accessible to a broader, interdisciplinary audience such as eLife readers. The writing is very dense and technical, which I believe unnecessarily obscures the key results of this study.

      (3) Positioning within the context of the field and relationship to prior work. While the proposed theory is interesting and timely, the manuscript omits multiple closely related results which in my view should be discussed in relationship to the current work. In particular:

      A number of recent studies propose normative criteria for gain modulation in populations:

      - Duong, L., Simoncelli, E., Chklovskii, D. and Lipshutz, D., 2024. Adaptive whitening with fast gain modulation and slow synaptic plasticity. Advances in Neural Information Processing Systems
      - Tring, E., Dipoppa, M. and Ringach, D.L., 2023. A power law describes the magnitude of adaptation in neural populations of primary visual cortex. Nature Communications, 14(1), p.8366.
      - Młynarski, W. and Tkačik, G., 2022. Efficient coding theory of dynamic attentional modulation. PLoS Biology
      - Haimerl, C., Ruff, D.A., Cohen, M.R., Savin, C. and Simoncelli, E.P., 2023. Targeted V1 co-modulation supports task-adaptive sensory decisions. Nature Communications

      The Ganguli and Simoncelli framework has been extended to a multivariate case and analyzed for a generalized class of error measures:

      - Yerxa, T.E., Kee, E., DeWeese, M.R. and Cooper, E.A., 2020. Efficient sensory coding of multidimensional stimuli. PLoS Computational Biology
      - Wang, Z., Stocker, A.A. and Lee, D.D., 2016. Efficient neural codes that minimize LP reconstruction error. Neural Computation, 28(12),

      More detailed comments and feedback:

      (1) I believe that this work offers the possibility to address an important question about novelty responses in the cortex (e.g. Homann et al, 2021 PNAS). Are they encoding novelty per-se, or are they inefficient responses of a not-yet-adapted population? Perhaps it's worth speculating about.

      (2) Clustering in populations - typically in efficient coding studies, tuning curve distributions are a consequence of input statistics, constraints, and optimality criteria. Here the authors introduce randomly perturbed curves for each cluster - how to interpret that in light of the efficient coding theory? This links to a more general aspect of this work - it does not specify how to find optimal tuning curves, just how to modulate them (already addressed in the discussion).

      (3) Figure 8 - where do Hz come from as physical units? As I understand there are no physical units in simulations.

      (4) Inference with DDCs in changing environments. To perform efficient inference in a dynamically changing environment (as considered here), an ideal observer needs some form of posterior-prior updating. Where does that enter here?

      (5) Page 6 - "We did this in such a way that, for all ν, the correlation matrices, ρ(ν), were derived from covariance matrices with a 1/n power-law eigenspectrum (i.e., the ranked eigenvalues of the covariance matrix fall off inversely with their rank), in line with the findings of Stringer et al. (2019) in the primary visual cortex." This is a very specific assumption, taken from a study of a specific brain region - how does it relate to the generality of the approach?
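
      To illustrate what the quoted construction amounts to, the following is a minimal sketch, assuming random orthonormal eigenvectors and eigenvalues proportional to 1/rank. The function name `power_law_correlation`, the dimension, and the seed are hypothetical; this is not the authors' code.

```python
import numpy as np

def power_law_correlation(n, seed=0):
    """Build a correlation matrix from a covariance with a 1/n eigenspectrum."""
    rng = np.random.default_rng(seed)
    # Random orthonormal eigenvectors from the QR decomposition of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    # Ranked eigenvalues fall off inversely with their rank: 1, 1/2, ..., 1/n.
    eigvals = 1.0 / np.arange(1, n + 1)
    # Covariance Q diag(eigvals) Q^T (column scaling via broadcasting).
    cov = (q * eigvals) @ q.T
    # Divide by the standard deviations to obtain the correlation matrix.
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std), cov

rho, cov = power_law_correlation(50)
```

      Note that only the covariance matrix has an exact 1/n spectrum in this sketch; the normalization to unit diagonal changes the eigenvalues of the resulting correlation matrix.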

    3. Reviewer #2 (Public review):

      Summary:

      Using the theory of efficient coding, the authors study how neural gains may be adjusted to optimize coding by noisy neural populations while minimizing metabolic costs. The manuscript first presents mathematical results for the general case where the computational goals of the neural population are not specified (the computation is implicit in the assumed tuning curves) and then develops the theory for a specific probabilistic coding scheme. The general theory provides an explanation for firing rate homeostasis at the level of neural clusters with firing rate heterogeneity within clusters, and the specific application further captures stimulus-specific and neuron-specific adaptation in the visual cortex.

      The mathematical derivations, simulations, and application to visual cortex data are solid as far as I can tell.

      In the current format, the significance is difficult to assess fully: the manuscript is a bit sprawling, in the first half the general theory is lengthy and technical, and then in the second half a few phenomena are addressed without a clear relation between them (rate homeostasis, rate heterogeneity, synaptic homeostasis, V1 adaptation, divisive normalization), requiring several ad-hoc choices and assumptions.

      Strengths:

      The problem of efficient coding is a long-standing and important one. This manuscript contributes to that field by proposing a theory of efficient coding through gain adjustments, independent of the computational goals of the system. The main result is a normative explanation for firing rate homeostasis at the level of neural clusters (groups of neurons that perform a similar computation) with firing rate heterogeneity within each cluster. Both phenomena are widely observed, and reconciling them under one theory is important.

      The mathematical derivations are thorough as far as I can tell. Although the model of neural activity is artificial, the authors make sure to include many aspects of cortical physiology, while also keeping the models quite general.

      Section 2.5 derives the conditions in which homeostasis would be near-optimal in the cortex, which appear to be consistent with many empirical observations in V1. This indicates that homeostasis in V1 might be indeed close to the optimal solution to code efficiently in the face of noise.

      The application to the data of Benucci et al 2013 is the first to offer a normative explanation of stimulus-specific and neuron-specific adaptation in V1.

      Weaknesses:

      The novelty and significance of the work are not presented clearly. The relation to other theoretical work, particularly Ganguli and Simoncelli and other efficient coding theories, is explained in the Discussion but perhaps would be better placed in the Introduction, to motivate some of the many choices of the mathematical models used here.

      The manuscript is very hard to read as is; it almost feels like this could be two different papers. The first half seems like a standalone document, detailing the general theory with interesting results on homeostasis and optimal coding. The second half, from Section 2.7 on, presents a series of specific applications that appear somewhat disconnected, are neither clearly motivated nor pursued in depth, and require ad-hoc assumptions.

      For instance, it is unclear if the main significant finding is the role of homeostasis in the general theory or the demonstration that homeostatic DDC with Bayes Ratio coding captures V1 adaptation phenomena. It would be helpful to clarify if this is being proposed as a new/better computational model of V1 compared to other existing models.

      Early on in the manuscript (Section 2.1), the theory is presented as general in terms of the stimulus dimensionality and brain area, but then it is only demonstrated for orientation coding in V1.

      The manuscript relies on a specific response noise model, with arbitrary tuning curves. Using a population model with arbitrary tuning curves and noise covariance matrix as the basis for a study of coding optimality is problematic because not all combinations of tuning curves and covariances are achievable by neural circuits (e.g. https://pubmed.ncbi.nlm.nih.gov/27145916/).

      The paper Benucci et al 2013 shows that homeostasis holds for some stimulus distributions, but not others i.e. when the 'adapter' is present too often. This manuscript, like the Benucci paper, discards those datasets. But from a theoretical standpoint, it seems important to consider why that would be the case, and if it can be predicted by the theory proposed here.

    1. eLife Assessment

      This fundamental study provides compelling evidence that TRPV4 plays a crucial role in mechanical sensing during cancer cell transition from non-invasive to invasive states, and offers novel insights into metastasis. By employing multiple experimental approaches, including pharmacological and genetic manipulation, as well as advanced imaging techniques, the authors demonstrate a strong correlation between TRPV4 dynamics, calcium homeostasis, and cell volume plasticity. The findings significantly enhance our understanding of mechanotransduction in cancer and present TRPV4 as a promising therapeutic target for inhibiting metastasis.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Bu et al examined the dynamics of the TRPV4 channel during cell overcrowding in carcinoma conditions. They investigated how cell crowding (or high cell confluence) triggers a mechano-transduction pathway involving TRPV4 channels in high-grade ductal carcinoma in situ (DCIS) cells that leads to large cell volume reduction (or cell volume plasticity) and a pro-invasive phenotype.

      In vitro, this pathway is highly selective for highly malignant invasive cell lines derived from a normal breast epithelial cell line (MCF10CA) compared to the parent cell line, but not present in another triple-negative invasive breast epithelial cell line (MDA-MB-231). The authors convincingly showed that enhanced TRPV4 plasma membrane localization correlates with high-grade DCIS cells in patient tissue samples. Specifically in invasive MCF10DCIS.com cells they showed that overcrowding or over-confluence leads to a decrease in cell volume and intracellular calcium levels. This condition also triggers the trafficking of TRPV4 channels from intracellular stores (nucleus and potentially endosomes) to the plasma membrane (PM). When these over-confluent cells are incubated with a TRPV4 activator, there is an acute and substantial influx of calcium, attesting to the fact that a high number of TRPV4 channels are present on the PM. Long-term incubation of these over-confluent cells with the TRPV4 activator results in the internalization of the PM-localized TRPV4 channels.

      In contrast, cells plated at lower confluence primarily have TRPV4 channels localized in the nucleus and cytosol. Long-term incubation of these cells at lower confluence with a TRPV4 inhibitor leads to the relocation of TRPV4 channels to the plasma membrane from intracellular stores and a subsequent reduction in cell volume. Similarly, incubation of these cells at low confluence with PEG 3000 (a hyperosmotic agent) promotes the trafficking of TRPV4 channels from intracellular stores to the plasma membrane.

      Strengths:

      The study is elegantly designed and the findings are novel. Their findings on this mechano-transduction pathway involving TRPV4 channels, calcium homeostasis, cell volume plasticity, motility and invasiveness will have a great impact in the cancer field and are potentially applicable to other fields as well. Experiments are well-planned and executed, and the data are convincing. The authors investigated TRPV4 dynamics using multiple different strategies (overcrowding, hyperosmotic stress, pharmacological and genetic means) and showed a good correlation between different phenomena.

    3. Reviewer #2 (Public review):

      Metastasis poses a significant challenge in cancer treatment. During the transition from non-invasive cells to invasive metastatic cells, cancer cells usually experience mechanical stress due to a crowded cellular environment. The molecular mechanisms underlying mechanical signaling during this transition remain largely elusive. In this work, the authors utilize an in vitro cell culture system and advanced imaging techniques to investigate how non-invasive and invasive cells respond to cell crowding, respectively.

      The results clearly show that pre-malignant cells exhibit a more pronounced reduction in cell volume and are more prone to spreading compared to non-invasive cells. Furthermore, the study identifies that TRPV4, a calcium channel, relocates to the plasma membrane both in vitro and in vivo (patient samples). Activation and inhibition of the TRPV4 channel can modulate the cell volume and cell mobility. These results unveil a novel mechanism of mechanical sensing in cancer cells, potentially offering new avenues for therapeutic intervention targeting cancer metastasis by modulating TRPV4 activity. This is a very comprehensive study, and the data presented in the paper are clear and convincing. The study represents a very important advance in our understanding of the mechanical biology of cancer.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this study, Bu et al examined the dynamics of the TRPV4 channel during cell overcrowding in carcinoma conditions. They investigated how cell crowding (or high cell confluence) triggers a mechano-transduction pathway involving TRPV4 channels in high-grade ductal carcinoma in situ (DCIS) cells that leads to large cell volume reduction (or cell volume plasticity) and a pro-invasive phenotype.

      In vitro, this pathway is highly selective for highly malignant invasive cell lines derived from a normal breast epithelial cell line (MCF10CA) compared to the parent cell line, but not present in another triple-negative invasive breast epithelial cell line (MDA-MB-231). The authors convincingly showed that enhanced TRPV4 plasma membrane localization correlates with high-grade DCIS cells in patient tissue samples. Specifically in invasive MCF10DCIS.com cells they showed that overcrowding or over-confluence leads to a decrease in cell volume and intracellular calcium levels. This condition also triggers the trafficking of TRPV4 channels from intracellular stores (nucleus and potentially endosomes) to the plasma membrane (PM). When these over-confluent cells are incubated with a TRPV4 activator, there is an acute and substantial influx of calcium, attesting to the fact that a high number of TRPV4 channels are present on the PM. Long-term incubation of these over-confluent cells with the TRPV4 activator results in the internalization of the PM-localized TRPV4 channels.

      In contrast, cells plated at lower confluence primarily have TRPV4 channels localized in the nucleus and cytosol. Long-term incubation of these cells at lower confluence with a TRPV4 inhibitor leads to the relocation of TRPV4 channels to the plasma membrane from intracellular stores and a subsequent reduction in cell volume. Similarly, incubation of these cells at low confluence with PEG 3000 (a hyperosmotic agent) promotes the trafficking of TRPV4 channels from intracellular stores to the plasma membrane.

      Strengths:

      The study is elegantly designed and the findings are novel. Their findings on this mechano-transduction pathway involving TRPV4 channels, calcium homeostasis, cell volume plasticity, motility and invasiveness will have a great impact in the cancer field and are potentially applicable to other fields as well. Experiments are well-planned and executed, and the data are convincing. The authors investigated TRPV4 dynamics using multiple different strategies (overcrowding, hyperosmotic stress, pharmacological and genetic means) and showed a good correlation between different phenomena.

      All of my previous concerns have been addressed. The quality of the manuscript has improved significantly.

      We are deeply grateful to the reviewer for their thoughtful assessment and invaluable suggestions, including crucial additional experiments and more effective presentation and description of our findings, which have greatly enhanced the quality of our manuscript.

      Reviewer #2 (Public review):

      Summary:

      Metastasis poses a significant challenge in cancer treatment. During the transition from non-invasive cells to invasive metastatic cells, cancer cells usually experience mechanical stress due to a crowded cellular environment. The molecular mechanisms underlying mechanical signaling during this transition remain largely elusive. In this work, the authors utilize an in vitro cell culture system and advanced imaging techniques to investigate how non-invasive and invasive cells respond to cell crowding, respectively.

      The results clearly show that pre-malignant cells exhibit a more pronounced reduction in cell volume and are more prone to spreading compared to non-invasive cells. Furthermore, the study identifies that TRPV4, a calcium channel, relocates to the plasma membrane both in vitro and in vivo (patient samples). Activation and inhibition of the TRPV4 channel can modulate the cell volume and cell mobility. These results unveil a novel mechanism of mechanical sensing in cancer cells, potentially offering new avenues for therapeutic intervention targeting cancer metastasis by modulating TRPV4 activity. This is a very comprehensive study, and the data presented in the paper are clear and convincing. The study represents a very important advance in our understanding of the mechanical biology of cancer.

      We sincerely appreciate the reviewer’s insightful evaluation and invaluable recommendations for key additional experiments, which have significantly strengthened our manuscript.

    1. eLife Assessment

      This important study explores the interplay between gene dosage and gene mutations in the evolution of antibiotic resistance. The authors provide compelling evidence connecting proteostasis with gene duplication during experimental evolution in a model system. This paper is likely to be of interest to researchers studying antibiotic resistance, proteostasis, and bacterial evolution.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Jena et al. addresses important questions on the fundamental mechanisms of genetic adaptation, specifically, does adaptation proceed via changes of copy number (gene duplication and amplification "GDA") or by point mutation. While this question has been worked on (for example by Tomanek and Guet) the authors add several important aspects relating to resistance against antibiotics and they clarify the ability of Lon protease to reduce duplication formation (previous work was more indirect).

      A key finding Jena et al. present is that point mutations after significant competition displace GDA. A second one is that alternative GDA constantly arise and displace each other (see work on GDA-2 in Figure 3). Finally, the authors found epistasis between resistance alleles that was contingent on lon. Together this shows an intricate interplay of lon proteolysis for the evolution and maintenance of antibiotic resistance by gene duplication.

      Strengths:

      The study has several important strengths: (i) the work on GDA stability and competition of GDA with point mutations is a very promising area of research and the authors contribute new aspects to it, (ii) rigorous experimentation, (iii) very clearly written introduction and discussion sections. To me, the best part of the data is that deletion of lon stimulates GDA, which has not been shown with such clarity until now.

      Weaknesses:

      Previously raised minor weaknesses and technical questions have been adequately resolved in the revised manuscript. As the experiments and their results are described in great detail the interested reader needs stamina. The details will, however, be informative to the specialist.

    3. Reviewer #3 (Public review):

      Summary:

      This is an important paper that investigates the relationship between proteolytic stability of an antibiotic target enzyme and the evolution of antibiotic resistance via increased gene copy number. The target of the antibiotic trimethoprim is dihydrofolate reductase (DHFR). In Escherichia coli, DHFR is encoded by folA and the major proteolysis housekeeping protease is Lon (lon). In this manuscript, the authors report the result of the experimental evolution of a lon mutant strain of E. coli in response to sub-inhibitory concentrations of the antibiotic trimethoprim then investigate the relationship between proteolytic stability of DHFR mutants and the evolution of folA gene duplication. After 25 generations of serial passaging in a fixed concentration of trimethoprim, the authors found that folA duplication events were more common during evolution of the lon strain, than the wt strain. However, with continued passaging, some folA duplications were replaced by a single copy of folA containing a trimethoprim resistance-conferring point mutation. Interestingly, evolution of the lon strain in the setting of increasing concentrations of trimethoprim resulted in evolved strains with different levels of DHFR expression. In particular, some strains maintained two copies of a mutant folA that encoded an unstable DHFR. In a lon+ background, this mutant folA did not express well and did not confer trimethoprim resistance. However, in the lon- background, it displayed higher expression and conferred high-level trimethoprim resistance. The authors concluded that maintenance of the gene duplication event (and the absence of Lon) compensated for the proteolytic instability of this mutant DHFR. In summary, they provide evidence that the proteolytic stability of an antibiotic target protein is an important determinant of the evolution of target gene copy number in the setting of antibiotic selection.

      Strengths:

      The major strength of this paper is identifying an example of antibiotic resistance evolution that illustrates the interplay between the proteolytic stability and copy number of an antibiotic target in the setting of antibiotic selection. The results are rigorous and convincingly support the conclusions. This paper will be of interest to any biologist that studies the evolution of resistance mechanisms or gene duplication.

      Weaknesses:

      The impact of this finding is somewhat limited given that it is a single example that occurred in a lon strain of E. coli. Although the specific mechanism is unlikely to occur naturally, this study represents an important and convincing proof of the principle that gene duplication can provide increased expression demand for an unstable resistance determinant in the setting of antibiotic selection.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Jena et al. addresses important questions on the fundamental mechanisms of genetic adaptation, specifically, does adaptation proceed via changes of copy number (gene duplication and amplification "GDA") or by point mutation. While this question has been worked on (for example by Tomanek and Guet) the authors add several important aspects relating to resistance against antibiotics and they clarify the ability of Lon protease to reduce duplication formation (previous work was more indirect).

      A key finding Jena et al. present is that point mutations after significant competition displace GDA. A second one is that alternative GDA constantly arise and displace each other (see work on GDA-2 in Figure 3). Finally, the authors found epistasis between resistance alleles that was contingent on lon. Together this shows an intricate interplay of lon proteolysis for the evolution and maintenance of antibiotic resistance by gene duplication.

      Strengths:

      The study has several important strengths: (i) the work on GDA stability and competition of GDA with point mutations is a very promising area of research and the authors contribute new aspects to it, (ii) rigorous experimentation, (iii) very clearly written introduction and discussion sections. To me, the best part of the data is that deletion of lon stimulates GDA, which has not been shown with such clarity until now.

      Weaknesses:

      The minor weaknesses of the manuscript are a lack of clarity in parts of the results section (Point 1) and the methods (Point 2).

      We thank the reviewer for their comments and suggestions on our manuscript. We also appreciate the succinct summary of primary findings that the Reviewer has taken cognisance of in their assessment, in particular the association of the Lon protease with the propensity for GDAs as well as its impact on their eventual fate. We have now revised the manuscript for greater clarity as suggested by Reviewer #1.

      Reviewer #2 (Public review):

      Summary:

      In this strong study, the authors provide robust evidence for the role of proteostasis genes in the evolution of antimicrobial resistance, and moreover, for stabilizing the proteome in light of gene duplication events.

      Strengths:

      This strong study offers an important interaction between findings involving GDA, proteostasis, experimental evolution, protein evolution, and antimicrobial resistance. Overall, I found the study to be relatively well-grounded in each of these literatures, with experiments that spoke to potential concerns from each arena. For example, the literature on proteostasis and evolution is a growing one that includes organisms (even micro-organisms) of various sorts. One of my initial concerns involved whether the authors properly tested the mechanistic bases for the role of Lon in promoting duplication events. The authors assuaged my concern with a set of assays (Figure 8).

      More broadly, the study does a nice job of demonstrating the agility of molecular evolution, with responsible explanations for the findings: gene duplications are a quick-fix, but can be out-competed relative to their mutational counterparts. Without Lon protease to keep the proteome stable, the cell allows for less stable solutions to the problem of antibiotic resistance.

      The study does what any bold and ambitious study should: it contains large claims and uses multiple sorts of evidence to test those claims.

      Weaknesses:

      While the general argument and conclusion are clear, this paper is written for a bacterial genetics audience that is familiar with the manner of bacterial experimental evolution. From the language to the visuals, the paper is written in a boutique fashion. The figures are even difficult for me - someone very familiar with proteostasis - to understand. I don't know if this is the fault of the authors or the modern culture of publishing (where figures are increasingly packed with information and hard to decipher), but I found the figures hard to follow with the captions. But let me also consider that the problem might be mine, and so I do not want to unfairly criticize the authors.

      For a generalist journal, more could be done to make this study clear, and in particular, to connect to the greater community of proteostasis researchers. I think this study needs a schematic diagram that outlines exactly what was accomplished here, at the beginning. Diagrams like this are especially important for studies like this one that offer a clear and direct set of findings, but conduct many different sorts of tests to get there. I recommend developing a visual abstract that would orient the readers to the work that has been done.

      The reviewer’s comments regarding data presentation are well-taken. Since we already had a diagrammatic model that sums up the chief findings of our study (Figure 9), we have now provided schematics in Figures 1, 3, 5 and 8 to clarify the workflow of smaller sections of the study. We hope that these diagrams provide greater clarity with regards to the experiments we have conducted.

      Next, I will make some more specific suggestions. In general, this study is well done and rigorous, but doesn't adequately address a growing literature that examines how proteostasis machinery influences molecular evolution in bacteria.

      While this paper might properly test the authors' claims about protein quality control and evolution, the paper does not engage a growing literature in this arena and is generally not very strong on the use of evolutionary theory. I recognize that this is not the aim of the paper, however, and I do not question the authors' authority on the topic. My thoughts here are less about the invocation of theory in evolution (which can be verbose and not relevant), and more about engagement with a growing literature in this very area.

      The authors mention Rodrigues 2016, but there are many other studies that should be engaged when discussing the interaction between protein quality control and evolution.

      A 2015 study demonstrated how proteostasis machinery can act as a barrier to the usage of novel genes: Bershtein, S., Serohijos, A. W., Bhattacharyya, S., Manhart, M., Choi, J. M., Mu, W., ... & Shakhnovich, E. I. (2015). Protein homeostasis imposes a barrier to functional integration of horizontally transferred genes in bacteria. PLoS genetics, 11(10), e1005612

      A 2019 study examined how Lon deletion influenced resistance mutations in DHFR specifically: Guerrero RF, Scarpino SV, Rodrigues JV, Hartl DL, Ogbunugafor CB. The proteostasis environment shapes higher-order epistasis operating on antibiotic resistance. Genetics. 2019 Jun 1;212(2):565-75.

      A 2020 study did something similar: Thompson, Samuel, et al. "Altered expression of a quality control protease in E. coli reshapes the in vivo mutational landscape of a model enzyme." Elife 9 (2020): e53476.

      And there's a new review (preprint) on this very topic that speaks directly to the various ways proteostasis shapes molecular evolution:

      Arenas, Carolina Diaz, Maristella Alvarez, Robert H. Wilson, Eugene I. Shakhnovich, and C. Brandon Ogbunugafor. "Proteostasis is a master modulator of molecular evolution in bacteria."

      I am not simply attempting to list studies that should be cited, but rather, this study needs to be better situated in the contemporary discussion on how protein quality control is shaping evolution. This study adds to this list and is a unique and important contribution. However, the findings can be better summarized within the context of the current state of the field. This should be relatively easy to implement.

      We thank the reviewer for their encouraging assessment of our manuscript as well as this important critique regarding the context of other published work that relates proteostasis and molecular evolution. Indeed, this was a particularly difficult aspect for us given the different kinds of literature that were needed to make sense of our study. We have now added the references suggested by the reviewer as well as others to the manuscript. We have also added a paragraph in the discussion section (Lines 463-476) that addresses this aspect and hopefully fills the lacuna that the reviewer points out in this comment.

      Reviewer #3 (Public review):

      Summary:

      This paper investigates the relationship between the proteolytic stability of an antibiotic target enzyme and the evolution of antibiotic resistance via increased gene copy number. The target of the antibiotic trimethoprim is dihydrofolate reductase (DHFR). In Escherichia coli, DHFR is encoded by folA and the major proteolysis housekeeping protease is Lon (lon). In this manuscript, the authors report the results of the experimental evolution of a lon mutant strain of E. coli in response to sub-inhibitory concentrations of the antibiotic trimethoprim and then investigate the relationship between proteolytic stability of DHFR mutants and the evolution of folA gene duplication. After 25 generations of serial passaging in a fixed concentration of trimethoprim, the authors found that folA duplication events were more common during the evolution of the lon strain, than the wt strain. However, with continued passaging, some folA duplications were replaced by a single copy of folA containing a trimethoprim resistance-conferring point mutation. Interestingly, the evolution of the lon strain in the setting of increasing concentrations of trimethoprim resulted in evolved strains with different levels of DHFR expression. In particular, some strains maintained two copies of a mutant folA that encoded an unstable DHFR. In a lon+ background, this mutant folA did not express well and did not confer trimethoprim resistance. However, in the lon- background, it displayed higher expression and conferred high-level trimethoprim resistance. The authors concluded that maintenance of the gene duplication event (and the absence of Lon) compensated for the proteolytic instability of this mutant DHFR. In summary, they provide evidence that the proteolytic stability of an antibiotic target protein is an important determinant of the evolution of target gene copy number in the setting of antibiotic selection.

      Strengths:

      The major strength of this paper is identifying an example of antibiotic resistance evolution that illustrates the interplay between the proteolytic stability and copy number of an antibiotic target in the setting of antibiotic selection. If the weaknesses are addressed, then this paper will be of interest to microbiologists who study the evolution of antibiotic resistance.

      Weaknesses:

      Although the proposed mechanism is highly plausible and consistent with the data presented, the analysis of the experiments supporting the claim is incomplete and requires more rigor and reproducibility. The impact of this finding is somewhat limited given that it is a single example that occurred in a lon strain and compensatory mutations for evolved antibiotic resistance mechanisms are described. In this case, it is not clear that there is a functional difference between the evolution of copy number versus any other mechanism that meets a requirement for increased "expression demand" (e.g. promoter mutations that increase expression and protein stabilizing mutations).

      We thank the reviewer for their in-depth assessment of our work and appreciate their concerns regarding reproducibility and rigor in analysis of our data. We have now incorporated this feedback and provided necessary clarifications/corrections in the revised version of our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Major Points:

      (1) The authors show that a deletion of lon increases the ability for GDA and they argue that this is adaptive during TMP treatment because it increases the dosage of folA (L. 129). However, the highest frequency of GDA occurred in drug-free conditions (see Figure 1C). This indicates that GDA is selected in drug-free media and is potentially selected against by certain antibiotics. It would help for the authors to discuss this possibility more clearly.

      We thank the reviewer for this astute observation. It is indeed striking that the GDA mutation (i.e. the GDA-2 mutation) selected in a lon-deficient background does not come up in the presence of antibiotics. To probe this further, we have now measured the relative fitness of a representative population of lon-knockout from short-term evolution in drug-free LB (population #3) that harbours GDA-2 against its ancestor (marked with ΔlacZ). These competition experiments were performed in LB (in which GDA-2 emerged spontaneously), as well as in LB supplemented with antibiotics at the concentrations used during the short-term evolution.

      Values of relative fitness, w (mean ± SD from 3 measurements), are provided below:

      LB: 1.4 ± 0.2

      LB + Trimethoprim: 1.6 ± 0.2

      LB + Spectinomycin: 0.9 ± 0.2

      LB + Erythromycin: 1.3 ± 0.3

      LB + Nalidixic acid: 1.5 ± 0.2

      LB + Rifampicin: 1.4 ± 0.2

      These data show an increase in relative fitness in drug-free LB, as would be expected. Interestingly, we also observe an increase in relative fitness in LB supplemented with antibiotics, except spectinomycin. This result supports the idea that GDA-2 is a “media adaptation” and provides a general fitness advantage to the lon knockout. However, as the reviewer pointed out, we should expect to see GDA-2 emerge spontaneously in antibiotic-supplemented media as well. We think that this does not happen because the fitness advantage of drug-specific mutations (GDAs or point mutations) far exceeds the advantage of a media adaptation GDA. As a result, we only see the specific mutations that provide high benefit against the antibiotic, at least over the relatively short duration of 20-25 generations. It is noteworthy that the GDA-2 mutation does come up in LTMPR1 when it is passaged over >200 generations in drug-free media, but shows fluctuating frequency over time. We expect, therefore, that given enough time we may detect the GDA-2 mutations even in antibiotic-supplemented media.

      We note, however, that a major caveat in the above fitness calculations is that we cannot be sure that the competing ancestor has no GDA-2 mutations during the course of the experiment. Thus, the above fitness values are only indicative and not definitive. We have therefore not included these data in the revised manuscript.
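
      For orientation, relative fitness values of the kind listed above are commonly obtained from head-to-head competition counts. The response does not state which formula was used, so the following is only a minimal sketch under an assumption: that w is taken as the ratio of Malthusian parameters (log fold-change of each competitor over the assay). The function name and all colony counts are hypothetical and are not data from the study.

```python
import math
import statistics


def relative_fitness(evolved_start, evolved_end, ancestor_start, ancestor_end):
    """Relative fitness w as a ratio of Malthusian parameters.

    Assumption (not stated in the source): w = ln(E_end / E_start) / ln(A_end / A_start),
    where E is the evolved competitor and A is the marked ancestor.
    """
    m_evolved = math.log(evolved_end / evolved_start)
    m_ancestor = math.log(ancestor_end / ancestor_start)
    return m_evolved / m_ancestor


# Hypothetical CFU/mL counts for three replicate competitions:
# (evolved start, evolved end, ancestor start, ancestor end)
replicates = [
    (1.0e6, 5.0e8, 1.0e6, 1.0e8),
    (1.2e6, 4.0e8, 1.1e6, 9.0e7),
    (0.9e6, 6.0e8, 1.0e6, 1.2e8),
]

values = [relative_fitness(*counts) for counts in replicates]
mean_w = statistics.mean(values)
sd_w = statistics.stdev(values)  # sample SD across the three measurements
print(f"w = {mean_w:.2f} ± {sd_w:.2f}")
```

      Under this definition, w > 1 means the evolved population grew faster than its ancestor during the assay. The caveat raised above still applies: if the ancestor itself acquires GDA-2 during the competition, its growth is overestimated and w is biased towards 1.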

      (2) It is unclear if the isolates WTMPR1 - 5 and LTMPR1 - 5 were pure clones. The authors write in L.488 "Colonies were randomly picked, cultured overnight in drug-free LB and frozen in 50% glycerol at -80C until further use." And in L. 492 "For long-term evolution, trimethoprim-resistant isolates LTMPR1, WTMPR4 and WTMPR5 were first revived from frozen stocks in drug-free LB overnight." From these descriptions, it is possible that the isolates contained a fraction of cells of other genotypes since colonies are often formed by more than one cell and thus, unless pure-streaked, a subpopulation is present and would in drug-free media be maintained. The possibility of pre-existing subpopulations is important for all statements relating to "reversal".

      This is indeed a valid concern. As far as we can tell all our initial isolates (i.e. WTMPR1-5 and LTMPR1-5) are pure clones at least as far as SNPs are concerned. This is based on whole genome sequencing data that we have reported earlier in Patel and Matange, eLife (2021), where we described the evolution and isolation of WTMPR1-5 and the present study for LTMPR1-5. All SNPs detected were present at a frequency of 100%. For clones with GDAs, however, there is no way to eliminate a sub-population that has a lower or higher gene copy number than average from an isolate. This is because of the inherent instability of GDAs that will inevitably result in heterogeneous gene copy number during standard growth. In this sense, there is most certainly a possibility of a pre-existing subpopulation within each of the clones that may have reversed the GDA. Indeed, we believe that it is this inherent instability that contributes to their rapid loss during growth in drug-free media.

      Minor Points:

      (1) L. 406. "allowing accumulation of IS transposases in E. coli" Please specify that it is the accumulation of transposase proteins (and not genes).

      We have made this change.

      (2) L. 221 typo. Known "to" stabilize.

      We have made this change.

      Reviewer #2 (Recommendations for the authors):

      Most of my suggestions are found in the public review. I believe this to be a strong study, and some slight fixes can solidify its presence in the literature.

      We have attempted to address the two main critiques by Reviewer 2. To simplify the understanding of our data, we have provided small schematics at various points in the paper to clarify the experimental pipelines used by us. We have also provided additional discussion situating our study in the emerging area of proteostasis and molecular evolution. We hope that our revisions have addressed these lacunae in our manuscript.

      Reviewer #3 (Recommendations for the authors):

      Major Points:

      (1) The manuscript is generally a bit difficult to follow. The writing is overly complicated and lacks clarity at times. It should be simplified and improved.

      We have made several revisions to the text, as well as provided schematics in some of our figures which hopefully make our paper easier to understand.

      (2) I cannot find the raw variant summary data for the lon strain evolution experiment in trimethoprim (after 25 generations). Were there any other mutations identified? If not, this should be explicitly stated in the text and the variant output summary from sequencing included as supplemental data.

      We apologise for this oversight. We have now provided these data as Table 1.

      (3) What is the trimethoprim IC50 of the starting (pre-evolution) strains (i.e. wt and lon)? I can't find this information, but it is critical to interpretation.

      We had reported these values earlier in Matange N., J Bact (2020). Wild type and lon-knockout have similar MIC values for trimethoprim, though the lon mutant shows a higher IC50 value. We have now mentioned this in the results section (Line 100-101) and also provided the reference for these data.

      (4) What was the average depth of coverage for WGS? This information is necessary to assess the quality of the variant calling, especially for the population WGS.

      All genome sequencing data have a coverage of at least 100x. We have added this detail to the methods section (Line 580-581).

      (5) Five replicate evolution experiments (25 generations, or 7x 10% daily batch transfers) were performed in trimethoprim for the wt and lon strains. Duplication of the folA locus occurred in 1/5 and 4/5 experiments, respectively. It is not entirely clear what type of sampling was actually done to arrive at these numbers (this needs to be stated more clearly), but presumably 1 random colony was chosen at the end of the passaging protocol for each replicate. Based on this result, the authors conclude that folA duplication occurred more frequently in the lon strain, however, this is not rigorously supported by a statistical evaluation. With N=5, one cannot rigorously conclude that a 20% frequency and 80% frequency are significantly different. Furthermore, it's not entirely clear what the mechanism of resistance is for these strains. For example, in one colony sequenced (LTMPR5), it appears no known resistance mechanism (or mutations?) were identified, and yet the IC50 = 900 nM, which is also similar to other strains.

      Indeed, we agree with the reviewer that we don’t have the statistical power to rigorously make this claim. However, since the lon-knockout showed us a greater frequency of GDA across 3 different environments we are fairly confident that loss of lon enhances the overall frequency for GDA mutations. This idea is also supported by a number of previous papers that related GDAs and IS-element transpositions with Lon, viz. Nicoloff et al, Antimicrob Agent Chemother (2007), Derbyshire et al. PNAS (1990), Derbyshire and Grindley, Mol Microbiol (1996). We have therefore not provided further justification in the revised manuscript.
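
      As a rough illustration of the statistical point raised here, the reported counts (folA duplication in 1 of 5 wild-type and 4 of 5 lon-knockout isolates) can be put through a two-sided Fisher's exact test. The choice of test and the function name below are assumptions made for illustration; neither the review nor the response specifies a test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum over every table that is no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        prob(k)
        for k in range(k_min, k_max + 1)
        if prob(k) <= p_obs * (1 + 1e-9)
    )


# Rows: wild type, lon knockout. Columns: isolates with / without folA duplication.
p_value = fisher_exact_two_sided(1, 4, 4, 1)
print(f"two-sided p = {p_value:.3f}")
```

      With five isolates per strain the test returns p of roughly 0.21, which is consistent with the reviewer's remark that 20% and 80% cannot be distinguished rigorously at N=5.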

      We had indeed sampled a random isolate from each of the 5 populations and have added a schematic to figure 1 that provides greater clarity.

      Having relooked at the sequencing data for LTMPR1-5 isolates (Table 1), we realised that both LTMPR4 and LTMPR5 harbour mutations in the pitA gene. We had missed this locus during the previous iteration of this manuscript and misidentified an mgrB mutation in LTMPR4. PitA codes for a metal-phosphate symporter. We have observed mutations in pitA in earlier evolution experiments with trimethoprim as well (Vinchhi and Yelpure et al. mBio 2023). Interestingly, in LTMPR5 there was a deletion of pitA, along with 17 other contiguous genes mediated by IS5. To test if loss of pitA is beneficial in trimethoprim, we tested the ability of a pitA knockout to grow on trimethoprim supplemented plates. Indeed, loss of pitA conferred a growth advantage to E. coli on trimethoprim, comparable to loss of mgrB, indicating that the mechanism of resistance of LTMPR5 may be due to loss of pitA. We have added these data to the Supplementary Figure 1 of the revised manuscript and provided a brief description in Lines 103-108. How pitA deficiency confers trimethoprim resistance is yet to be investigated. The mechanism is likely to be by activating some intrinsic resistance mechanism as loss of pitA also conferred a fitness benefit against other antibiotics. This work is currently underway in our lab and hence we do not provide any further mechanism in the present manuscript.

      (6) Although measurement error/variance is reported, statistical tests were not performed for any of the experiments. This is critical to support the rigor and reproducibility of the conclusions.

      We have added statistical testing wherever appropriate to the revised manuscript.

      (7) Lines 150-155 and Figure 2E: Putting a wt copy of mgrB back into the WTMPR4 and LTMPR1 strains would be a better experiment to dissect out the role of mgrB versus the other gene duplications in these strains on fitness. Without this experiment, you cannot confidently attribute the fitness costs of these strains to the inactivation of mgrB alone.

      We agree with the reviewer that our claim was based on a correlation alone. We have now added some new data to confirm our model (Figure 2 E, F). The costs of mgrB mutations come from hyperactivation of PhoQP. In earlier work we have shown that the costs (and benefit) of mgrB mutations can be abrogated in media supplemented with Mg<sup>2+</sup>, which turns off the PhoQ receptor (Vinchhi and Yelpure et al. mBio, 2023). We use this strategy to show that like the mgrB-knockout, the costs of WTMPR4, WTMPR5 and LTMPR1 can be almost completely alleviated by adding Mg<sup>2+</sup> to growth media. These results confirm that the source of fitness cost of TMP-resistant bacteria was not linked to GDA mutations, but to hyperactivation of PhoQP.

      (8) Figure 3F and G: Does the top symbol refer to the starting strain for the 'long-term' evolution? If so, why does WTMPR4 not have the mgrB mutation (it does in Figure 1)? Based on your prior findings, it seems odd that this strain would evolve an mgrB loss of function mutation in the absence of trimethoprim exposure.

      We thank the reviewer for pointing this error out. We have made the correction in the revised manuscript.

      (9) Figure 6A: If the marker is neutral, it should be maintained at 0.1% throughout the 'neutrality' experiment. In both plots, the proportion of some marked strains goes up and then down. This suggests either ongoing evolution (these competitions take place over 105 generations), or noisy data. I suspect these data are just inherently noisy. I don't see error bars in the plots. Were these experiments ever replicated? It seems that replicating the experiments might be able to separate out noise from signal and perhaps clarify this point and better confirm the hypothesis that the point mutants are more fit.

      These experiments were indeed noisy and the apparent enrichment is most likely a measurement error rather than a real change in frequency of competing genotypes. We have now provided individual traces for each of the competing pairs with mean and SD from triplicate observations at each time point.

      (10) Figure 6A: Please indicate which plotted line refers to which 'point mutant' using different colors. These mutants have different trimethoprim IC50s and doubling times, so it would be nice to be able to connect each mutant to its specific data plot.

      We thank the reviewer for this suggestion. We have now colour coded the different strain combinations as suggested.

      (11) Lines 284-285: I disagree that the IC50s are similar. The C-35T mutant has IC50 that is 2x that of LTMPR1. Perhaps more telling is that, compared to the folA duplication strain from the same time-point (which also carries the rpoS mutation), all of the point mutants have greater IC50s (~2x greater). 2-fold changes in IC50 are significant. It would seem that the point-mutants were likely not competing against LTMPR1 at the time they arose, so LTMPR1 might not be the best comparator if it was extinguished from the population early. I'm assuming this is why you chose a contemporary isolate (and, also, rpoS mutant) for the competition experiments. This should be explained more clearly.

      We thank the reviewer for this comment. Indeed, the reviewer is correct about the rationale behind the use of a contemporary isolate and we have provided this clarification in the revised manuscript (Line 287-289). Also, the reviewer is correct in pointing out that a two-fold difference in IC50 cannot be ignored. However, the key point here would be in assessing the differences in growth rates at the antibiotic concentration used during competition (i.e. 300 ng/mL). We are unable to see a direct correlation between the growth rates and enrichment in culture, indicating that the observed trends are unlikely to be driven by ‘level of resistance’ alone. We have added these clarifications to the modified manuscript (Lines 299-301).

      Minor Points:

      (1) Line 13: Add a comma before 'Escherichia'

      We have made this change.

      (2) Line 14: Consider changing "mutations...were beneficial in trimethoprim" to "mutations...were beneficial under trimethoprim exposure"

      We have made this change.

      (3) Line 32: Is gene dosage really only "relative to the genome"? Is it not simply its relative copy number generally? Consider changing to "The dosage of a gene, or its relative copy number, can impact its level of expression..."

      We have made this change.

      (4) Line 38: The idea that GDAs are 1000x more frequent than point mutations seems an overgeneralization.

      We agree with the reviewer and have softened our claim.

      (5) Line 50: The term "hard-wired" is confusing. Please be more specific.

      We have modified this statement to “…GDAs are less stable than point mutations….”.

      (6) Line 52-53: What do you mean by "there is also evidence to suggest that...more common in bacteria than appreciated"? Are you implying the field is naïve to this fact? If there is "evidence" of this, then a reference should be included. However, it's not clear why this is important to state in the article. I would consider simply removing this sentence. Less is more in this case.

      We have removed this statement.

      (7) Lines 59-60: Enzymes catalyze reactions. Please also state the substrates for DHFR. Consider, "It catalyzes the NADPH-dependent reduction of dihydrofolate to tetrahydrofolate, an important co-factor for..."

      We have made this change.

      (8) Line 72: Please change to, "In E. coli, DHFR is encoded by folA." You do not need to state this is a gene, as it is implicit with lowercase italics.

      We have made this change.

      (9) Lines 72-86: This paragraph is a bit confusing to read, as it has several different ideas in it. Consider breaking it into two paragraphs at Line 80, "In this study,...". The first paragraph could just review the trimethoprim resistance mechanisms in E. coli and so would change the first sentence (Line 72) to reflect this topic: "In E. coli, DHFR is encoded by folA and several different resistance mechanisms have been characterized." Then, just describe each mechanism in turn. Also, by "hot spots" it would seem you are referring to "point mutations" in the gene that alter the protein sequence and cluster onto the 3D protein structure when mapped? Please be more specific with this sentence for clarity.

      We have made these changes.

      (10) Lines 92-93: Please also state the MIC value of the strain to specifically define "sub-MIC". Alternatively, you could also state the fraction MIC (e.g. 0.1 x MIC).

      We have modified this statement to “…in 300 ng/mL of trimethoprim (corresponding to ~0.3 x MIC) for 25 generations.”

      (11) Lines 95-96. Remove, "These sequencing have been reported earlier, ...(2021)". You just need to cite the reference.

      We have made this change.

      (12) Line 96: Remove the word "gene".

      We have made this change.

      (13) Figure 1 and Figure 4C: The color scheme is tough for those with the most common type of color blindness. Red/green color deficiency causes a lot of difficulty with Red/gray, red/green, green/gray. Consider changing.

      We thank the reviewer for bringing this to our notice. We have modified the colour scheme throughout the manuscript.

      (14) Figure 1: Was there a trimethoprim resistance mechanism identified for LTMPR5?

      As stated by us in response to major comment #5, LTMPR5’s resistance seems to come from a novel mechanism involving loss of the pitA gene.

      (15) Line 349-351: Please briefly define "lower proteolytic stability" as a relative susceptibility to proteolytic degradation and make sure it is clear to the reader that this causes less DHFR. This needs to be clarified because it is confusing how a mutation that causes DHFR proteolytic instability would lead to an increase in trimethoprim IC50. So, you also need to mention that some mutations can cause both increased trimethoprim inhibition and lower proteolytic stability simultaneously. It seems the Trp30Arg mutation is an example of this, as this mutation is associated with a net increase in trimethoprim resistance despite the competing effects of the mutation on enzyme inhibition and DHFR levels.

      We thank the reviewer for this comment and agree that the text in the original manuscript did not fully convey the message. We have made modifications to this section (Lines 359-363) in the revised manuscript in agreement with the reviewer’s suggestions.

    1. eLife Assessment

      This is an important study with solid evidence that multi-voxel fMRI activity patterns for threat-conditioned stimuli are altered by learning CS-US contingencies. The analyses are dense but mostly rigorous. The protocol is quite nuanced and complex, but the authors have done a fair job of explaining and presenting the results, and the results could be further improved by adjustment for multiple comparisons. The readability could be improved for an audience without highly-specialised knowledge of the field and the fMRI analytical approach.

    2. Reviewer #1 (Public review):

      Summary:

      The authors conducted a human neuroimaging study investigating the role of context in the representation of fear associations when the contingencies between a conditioned stimulus and shock unconditioned stimulus switch between contexts. The novelty of the analysis centered on neural pattern similarity to derive a measure of context and cue stability and generalization across different regions of the brain. Given the complexity and nuance of the results, it is kind of difficult to provide a concise summary. But during fear and reversal, there was cue generalization (between current CS+ cues) in the canonical fear network, and "item stability" for cues that changed their association with the shock in the IFG and precuneus. Reinstatement was quantified as pattern similarity for items or sets of cues from the earlier phases to the test phases, and they found different patterns in the IFG and dmPFC. A similar analytical strategy was applied to contexts.
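
      For readers unfamiliar with pattern-similarity measures, the sketch below illustrates the general idea under simplifying assumptions: "item stability" is taken as the mean pairwise correlation between repeated presentations of the same cue, and "cue generalization" as the mean correlation between presentations of two different cues. The voxel patterns, variable names and the use of Pearson correlation are hypothetical and are not taken from the manuscript.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long activity patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def mean_similarity(patterns_a, patterns_b=None):
    """Mean pairwise correlation within one set, or between two sets."""
    if patterns_b is None:
        pairs = list(combinations(patterns_a, 2))
    else:
        pairs = [(a, b) for a in patterns_a for b in patterns_b]
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 4-voxel patterns, three presentations per cue.
cue_a = [[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.2], [0.9, 1.8, 3.2, 3.9]]
cue_b = [[4.0, 3.0, 2.0, 1.0], [3.8, 3.1, 2.2, 0.9], [4.1, 2.9, 1.9, 1.2]]

item_stability = mean_similarity(cue_a)             # same cue, different repetitions
cue_generalization = mean_similarity(cue_a, cue_b)  # different cues
print(item_stability, cue_generalization)
```

      In this toy case the same-cue similarity is high and the between-cue similarity is strongly negative; real searchlight analyses compute such values within a sphere of voxels around each location.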

      Strengths:

      Overall, I found this to be a novel use of MVPA to study the role of context in the reversal/extinction of human fear conditioning that yielded interesting results. The paper was overall well-written, with a strong introduction and fairly detailed methods and results. The lack of any univariate contrast results from the test phases was used as motivation for the neural pattern similarity approach, which I appreciated as a reader.

      Weaknesses:

      This is quite a complicated protocol and analysis plan. The authors did a decent job explaining it, given the complexity of the approach and the dense results. But it did take reading it a couple of times to start to understand it. I'm not sure if there is a simpler way to describe the approach though. Just an observation. But perhaps there is a better way to explain the density of the different comparisons between the multiple cues and contexts. It can be difficult to totally avoid jargon in a complex scientific article, but the paper is very jargon-y.

      Here are a few more comments and stray observations, in no particular order of importance.

      (1) I had a difficult time unpacking lines 419-420: "item stability represents the similarity of the neural representation of an item to other representations of this same item."

      (2) The authors use the phrase "representational geometry" several times in the paper without clearly defining what they mean by this.

      (3) The abstract is quite dense and will likely be challenging to decipher for those without a specialized knowledge of both the topic (fear conditioning) and the analytical approach. For instance, the goal of the study is clearly articulated in the first few sentences, but then suddenly jumps to a sentence stating "our data show that contingency changes during reversal induce memory traces with distinct representational geometries characterized by stable activity patterns across repetitions..." this would be challenging for a reader to grok without having a clear understanding of the complex analytical approach used in the paper.

      (4) Minor: I believe it is STM200 not the STM2000.

      (5) Line 146: "...could be particularly fruitful as a means to study the influence of fear reversal or extinction on context representations, which have never been analyzed in previous fear and extinction learning studies." I direct the authors to Hennings et al., 2020, Contextual reinstatement promotes extinction generalization in healthy adults but not PTSD, as an example of using MVPA to decipher reinstatement of the extinction context during test.

      (6) This is a methodological/conceptual point, but it appears from Figure 1 that the shock occurs 2.5 seconds after the CS (and context) goes off the screen. This would seem to be more like a trace conditioning procedure than a standard delay fear conditioning procedure. This could be a trivial point, but there have been numerous studies over the last several decades comparing differences between these two forms of fear acquisition, both behaviorally and neurally, including differences in how trace vs delay conditioning is extinguished.

      (7) In Figure 4, it would help to see the individual data points derived from the model used to test significance between the different conditions (reinstatement between Acq, reversal, and test-new).

    3. Reviewer #2 (Public review):

      Summary:

      This is a timely and original study on the geometry of macroscopic (2.5 mm) brain representations of multiple cues and contexts in Pavlovian fear conditioning. The authors report that these representations differ between initial learning, and reversal learning, and remain stable during extinction.

      Strengths:

      The authors address an important question and use a rigorous experimental methodology.

      Weaknesses:

      The findings are limited (a) by the chosen spatial resolution (2.5 mm) which is far away from what modern fMRI can achieve, and (b) by the statistical analysis method. While transparently reported, their voxel-wise correction for multiple comparisons rests on a false discovery rate (i.e. 5% of the reported findings should be considered false positives) and there is no correction for the number of hypothesis tests (with an exception in some post hoc tests). Furthermore, there are some minor presentation issues that the authors could address to improve clarity.

    4. Author response:

      We would like to sincerely thank the editors and reviewers for their thoughtful comments, which provide valuable insights, and will help us enhance the overall quality of our manuscript. We will address all comments comprehensively in our revised submission.

      It appears to us that two major concerns were raised by the reviewers and highlighted by the editor, regarding statistical methodology and manuscript readability.

      As a provisional response, we would like to summarize our approach for addressing them in our revised manuscript:

      (1) Statistical Methodology

      Two specific concerns were raised regarding the statistical methods:

      First, regarding FDR versus FWE correction in our voxelwise (searchlight) analyses. We recognize that our methods section might have created some confusion on this point. While we stated that "all analyses are FDR-corrected unless noted otherwise", this was meant to refer only to ROI-based analyses. For all voxel-wise analyses, including searchlight RSA analyses, we actually employed FWE correction. This was briefly mentioned in the section on univariate analyses. However, we did not emphasize this information in the searchlight section of the methods, and we understand that this might have created some confusion.

      To clarify: we used (1) FWE correction for all voxel-based analyses and (2) FDR correction for ROI-based analyses (which could thus be considered exploratory). However, to fully address the concerns raised by the reviewers, and avoid potential confusion for future readers, we will use exclusively FWE correction methods in the revised version of the manuscript. If some category of ROI-based analysis yields only non-significant results when corrected with FWE, we plan to report the uncorrected p-values and point out the exploratory nature of these results.
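
      The practical difference between the two families of correction can be sketched with a small example. Bonferroni is used here only as the simplest family-wise error (FWE) procedure and Benjamini-Hochberg as the standard FDR procedure; the p-values are hypothetical, and the manuscript's actual FWE method for voxel-wise maps is not specified in this response.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """FWE control: each test is compared against alpha / number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """FDR control: reject the k smallest p-values, where k is the largest
    rank whose sorted p-value satisfies p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Hypothetical p-values from five ROI-based tests.
p_values = [0.001, 0.008, 0.012, 0.030, 0.200]
fwe = bonferroni_reject(p_values)
fdr = benjamini_hochberg_reject(p_values)
print("FWE (Bonferroni):", fwe)
print("FDR (Benjamini-Hochberg):", fdr)
```

      On these toy values the FWE procedure retains two tests and the FDR procedure four, which shows why a switch to FWE correction can turn some ROI-based findings non-significant.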

      Second, regarding the alpha threshold adjustment for searchlight analyses involving multiple comparisons within the same experimental phase: We acknowledge this concern and will address it thoroughly in our revision.

      (2) Manuscript Readability

      We agree that readability should be improved despite the paradigm's inherent complexity. In our revision, we will:

      - Replace non-essential technical terminology with clearer descriptions

      - Improve writing quality in particularly dense or conceptually complex sections

      - Enhance the overall structure to better guide readers through our methods and findings

    1. eLife assessment

      The study presents a useful computational analysis of how the ratio between excitatory and inhibitory neural numbers affects coding capacity. The authors show that increasing the proportion of inhibitory neurons (as observed in upper cortical layers compared to the input recipient layer 4) increases the dimensionality of neural activity and improves the encoding of time-varying stimuli. However, the evidence about the role of the inhibitory population in coding is incomplete because numerical results are neither supported by analytical mathematical results nor include controls for changes in firing thresholds or subtypes of inhibitory neurons.

    2. Reviewer #1 (Public review):

      Summary:

      The authors seek to understand the role of different ratios of excitatory to inhibitory (EI) neurons, which in experimental studies of the cerebral cortex have been shown to range from 4 to 9. They do this through a simulation study of sparsely connected networks of excitatory and inhibitory neurons.

      Their main finding is that the participation ratio and decoding accuracy increase as the E/I ratio decreases. This suggests higher computational complexity.

      This is the start of an interesting computational study. However, there is no analysis to explain the numerical results, although there is a long literature of reduced models for randomly connected neural networks which could potentially be applied here. (For example, it seems that the authors could derive a mean field expression for the expected firing rate and variance - hence CV - which could be used to target points in parameter space (vs. repeated simulation in Figures 1,2).) The paper would be stronger and more impactful if this was attempted.
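
      For reference, a minimal sketch of the participation ratio, assuming the standard definition PR = (sum of covariance eigenvalues)^2 / (sum of squared eigenvalues). For a symmetric covariance matrix C this equals trace(C)^2 divided by the sum of its squared entries, so no eigendecomposition is needed. The toy activity matrices are hypothetical and the manuscript's exact estimator may differ.

```python
def covariance_matrix(activity):
    """activity: one list of values over time bins per neuron."""
    n_t = len(activity[0])
    centered = []
    for row in activity:
        mean = sum(row) / n_t
        centered.append([v - mean for v in row])
    return [
        [sum(a * b for a, b in zip(ri, rj)) / (n_t - 1) for rj in centered]
        for ri in centered
    ]


def participation_ratio(cov):
    """PR = trace(C)^2 / trace(C^2); trace(C^2) is the sum of squared
    entries when C is symmetric."""
    trace = sum(cov[i][i] for i in range(len(cov)))
    sum_sq = sum(v * v for row in cov for v in row)
    return trace ** 2 / sum_sq


# Three neurons that co-vary perfectly: activity lies on a single dimension.
correlated = [[0, 1, 2, 3], [0, 2, 4, 6], [1, 2, 3, 4]]
# Three neurons with mutually uncorrelated, equal-variance activity.
independent = [[1, 1, -1, -1], [1, -1, 1, -1], [1, -1, -1, 1]]

pr_low = participation_ratio(covariance_matrix(correlated))
pr_high = participation_ratio(covariance_matrix(independent))
print(pr_low, pr_high)
```

      The measure ranges from 1 (all variance along one dimension) to the number of neurons (variance spread evenly), so it depends on both the number of neurons analyzed and their correlations.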

      Strengths:

      Some issues I appreciated are:

      (1) The use of a publicly available simulator (Brian), which helps reproducibility. I would also request that the authors supply submission or configuration scripts (if applicable, I don't know Brian).

      (2) A thorough exploration of the parameter space of interest (shown in Figure 2).

      (3) A good motivation for the underlying question: other things being equal, how does the E/I ratio impact computational capacity?

      Weaknesses:

      (1) Lack of mathematical analysis of the network model

      Major issues I recommend that the authors address (not sure whether these are "weaknesses"):

      (1) In "Coding capacity in different layers of visual cortex" the authors measure PR values from layers 2/3 and 4 in VISp and find that layer 2/3 has a higher PR than layer 4.

      But in Dahmen et al. 2020 (https://doi.org/10.1101/2020.11.02.365072 ), the opposite was found (see Figure 2d of Dahmen et al.): layer 2 had a lower PR than layer 4. Can the authors explain how that difference might arise? i.e. were they analyzing the same data sets? If so why the different results? Could it have to do with the way the authors subsample for the E/I ratio?

      From the Methods of that paper: "Visual stimuli were generated using scripts based on PsychoPy and followed one of two stimulus sequences ("brain observatory 1.1" and "functional connectivity"). We focused on spontaneous neural activity registered while the animal was not performing any task. In each session, the spontaneous activity condition lasted 30 minutes while the animal was in front of a screen of mean grey luminance. We, therefore, analyzed 26 of the original 58 sessions corresponding to the "functional connectivity" subdataset as they included such a period of spontaneous activity." This suggests to me they may have analyzed recordings with the other stimulus sequence; however, the hypothesis that E/I ratio should modulate dimensionality would not seem to "care" about which stimulus sequence was used.

      (2) In Discussion (pg. 20, line 383): "They showed that brain regions closer to sensory input, like the thalamus, have higher dimensionality than those further away, such as the visual cortex." How is this consistent with the hypothesis that "higher dimensionality might be linked to more complex cognitive functions"?

      (3) What is the probability of connection between different populations? e.g. the probability of there being a synaptic connection between any two E cells? I could not find a statement about this. It should be included in the Methods.

      (4) pg. 27, line 540: "Synchronicity within the network" For each cell pair, the authors use the maximum cross-correlation over time lag. I don't think I have seen this before. Can the authors explain why they use this measurement, vs (a) integrated cross-correlation or (b) cross-correlation at some time scale? Also, it seems like this fails to account for neuron pairs for which there is a strong inhibitory correlation.

      (5) "When stimulated, a time-varying input, μext(t), is applied to 2,000 randomly selected excitatory neurons." I would guess that computing PR would depend on the overlap of the 500 neurons analyzed and this population. Do the authors check or control for that?

      (5b) Related: to clarify, are the 500 neurons chosen for the analysis equally likely to be E or I neurons?
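
      The concern in point (4) about taking the maximum cross-correlation over time lags can be illustrated with a toy pair of binned spike trains in which one neuron is silent whenever the other fires. The spike trains, bin size and normalization below are hypothetical and only sketch the general issue; they do not reproduce the manuscript's synchrony measure.

```python
from math import sqrt


def cross_correlogram(x, y, max_lag):
    """Normalized cross-correlation between two binned spike trains
    for integer lags in [-max_lag, max_lag]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((v - mx) ** 2 for v in x))
    sy = sqrt(sum((v - my) ** 2 for v in y))
    cc = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            s = t + lag
            if 0 <= s < n:
                total += (x[t] - mx) * (y[s] - my)
        cc[lag] = total / (sx * sy)
    return cc


# Neuron y is silent exactly when neuron x fires (strong negative coupling).
x = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y = [1 - v for v in x]

cc = cross_correlogram(x, y, max_lag=2)
max_over_lags = max(cc.values())                    # measure questioned in point (4)
strongest_lag = max(cc, key=lambda k: abs(cc[k]))   # lag with the largest |correlation|
integrated = sum(cc.values())                       # alternative (a): summed over lags
print(max_over_lags, strongest_lag, cc[strongest_lag], integrated)
```

      Here the zero-lag correlation is -1, yet the maximum over lags is a modest positive value from a neighbouring lag, so the strong negative relationship is not reflected in the reported measure.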

    3. Reviewer #2 (Public review):

      Summary:

      Alizadeh et al. investigate how varying cellular E/I (excitatory/inhibitory) composition impacts coding across cortical layers. They build on findings from a recent study (Huang et al., 2022) that demonstrated a decrease in the fraction of inhibitory neurons from L2/3 to L4. Using a network of excitatory and inhibitory leaky integrate-and-fire neurons, they systematically assess how these anatomical features influence the dimensionality of network activity and coding capacity. Their key finding is that increasing the proportion of inhibitory neurons enhances the dimensionality of activity and improves the encoding of time-varying stimuli.

      Strengths:

      The authors use a clear methodology and well-established model of network activity that allows them to relate network parameters to the coding properties. They systematically evaluate the impact of the key features of the inhibitory population. Thus, in addition to changing the fraction of inhibitory cells, they control for the firing threshold of inhibitory neurons and the connection strength between inhibitory and excitatory cells. Furthermore, they show their modeling results are aligned with the analysis of the spiking activity in L2/3 vs. L4 from the Allen Institute data.

      Weaknesses:

      One general shortcoming of this approach is that it focuses on a small preselected number of network features. For example, it is unclear to what extent the results would be affected by other aspects of the organization of cortical columns, such as subclasses of inhibitory cells (SOM, VIP, PV), specific differences in synapses, realistic population sizes, or even connectivity between layers. Similarly, the models of L2/3 and L4 are constrained based on a limited set of observations, and it has not been demonstrated whether the same findings hold true for V1 recordings analyzed by the authors.

      The modeling relies on anatomical data from the barrel cortex, but the decoding comparison is based on V1 data. This raises questions about how anatomical differences between regions may influence the conclusions.

      The coding capacity appears inversely correlated with the firing rate, which in this study is largely influenced by the properties of the inhibitory population. It would be important to confirm that the observed changes in coding capacity and participation ratio are not solely driven by firing rate changes.

    1. eLife Assessment

      This study presents a useful pipeline for de novo design of antimicrobial peptides active both against bacteria and viruses. The method is based on deep learning, using a GAN generator and a regression tasked to predict antimicrobial activity. The experimental evidence supporting the conclusions is solid, with 24 validated peptides, although some additional justifications of the computational strategy would be a plus. This work will be of interest to the community working on machine learning for biomedical applications and specifically on antimicrobial peptides.

    2. Reviewer #1 (Public review):

      This manuscript presents a pipeline incorporating a deep generative model and peptide property predictors for the de novo design of peptide sequences with dual antimicrobial/antiviral functions. The authors synthesized and experimentally validated three peptides designed by the pipeline, demonstrating antimicrobial and antiviral activities, with one leading peptide exhibiting antimicrobial efficacy in animal models.

      Overall, the authors have addressed each major comment through new experiments, particularly by validating 24 peptides, clarifying alignment methods, and demonstrating sequence novelty. These additions have strengthened the manuscript. To further refine the work, it would be helpful to briefly describe any steps taken to mitigate GAN pathologies (such as mode collapse), provide a short rationale for the use of five AVP classifiers and how they complement each other, and clearly present the expanded experimental data (including MIC values and antiviral results) in the main text. Finally, the authors should also compare their approach with recently described deep-learning-enabled antibiotic discovery methods.

    3. Reviewer #2 (Public review):

      Summary:

      This study marks a noteworthy advance in the targeted design of AMPs, leveraging a pioneering deep learning framework to generate potent bifunctional peptides with specificity against both bacteria and viruses. The introduction of a GAN for generation and a GCN-based AMPredictor for MIC predictions is methodologically robust and a major stride in computational biology. Experimental validation in vitro and in animal models, notably with the highly potent P076 against a multidrug-resistant bacterium and P002's broad-spectrum viral inhibition, underpins the strength of their evidence. The findings are significant, showcasing not just promising therapeutic candidates, but also demonstrating a replicable means to rapidly develop new antimicrobials against the threat of drug-resistant pathogens.

      Strengths:

      The de novo AMP design framework combines a generative adversarial network (GAN) with an AMP predictor (AMPredictor), which is a novel approach in the field. The integration of deep generative models and graph-encoding activity regressors for discovering bifunctional AMPs is cutting-edge and addresses the need for new antimicrobial agents against drug-resistant pathogens. The in vitro and in vivo experimental validations of the AMPs provide strong evidence to support the computational predictions. The successful inhibition of a spectrum of pathogens in vitro and in animal models gives credibility to the claims. The discovery of effective peptides, such as P076, which demonstrates potent bactericidal activity against multidrug-resistant A. baumannii with low cytotoxicity, is noteworthy. This could have far-reaching implications for addressing antibiotic resistance. The demonstrated activity of the peptides against both bacterial and viral pathogens suggests that the discovered AMPs have a wide therapeutic potential and could be effective against a range of pathogens.

      Comments on revisions: I have no further comments on revisions.

    4. Reviewer #3 (Public review):

      Summary:

      Dong et al. described a deep learning-based framework of antimicrobial (AMP) generator and regressor to design and rank de novo antimicrobial peptides (AMPs). For generated AMPs, they predicted their minimum inhibitory concentration (MIC) using a model that combines the Morgan fingerprint, contact map and ESM language model. For their selected AMPs based on predicted MIC, they also use a combination of antiviral peptide (AVP) prediction models to select AMPs with potential antiviral activity. They experimentally validated 3 candidates for antimicrobial activity against S. aureus, A. baumannii, E. coli, and P. aeruginosa, and their toxicity on mouse blood and three human cell lines. The authors select their most promising AMP (P076) for in vivo experiments in A. baumannii-infected mice. They finally test the antiviral activity of their 3 AMPs against viruses.

      Strengths:

      - The development of de novo antimicrobial peptides (AMPs) with the novelty of being bifunctional (antimicrobial and antiviral activity).

      - Novel, combined approach to AMP activity prediction from their amino acid sequence.

      Weaknesses:

      - I missed the justification for combined antiviral and antibacterial activities. As the authors responded, less than 10% of the training data has antiviral activity. Therefore, I do not understand how the high percentage of antiviral activities was achieved. Especially reading that the antiviral filtering did not have an influence on the number of antiviral peptides obtained.

      - I had difficulty in reading the story because of the use of acronyms without referring to their full name for the first time, and incomplete information annotation in figures and captions.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      This manuscript presents a pipeline incorporating a deep generative model and peptide property predictors for the de novo design of peptide sequences with dual antimicrobial/antiviral functions. The authors synthesized and experimentally validated three peptides designed by the pipeline, demonstrating antimicrobial and antiviral activities, with one leading peptide exhibiting antimicrobial efficacy in animal models. However, the manuscript as it stands, has several major limitations on the computational side.

      Thanks for your comments. 

      Major issues:

      (1) The choice of GAN as the generative model. There are multiple deep generative frameworks (e.g., language models, VAEs, and diffusion models), and GANs are known for their training difficulty and mode collapse. Could the authors elaborate on the specific rationale behind choosing GANs for this task?

      We thank the reviewer for raising this concern about GAN models. We agree that GANs have limitations, such as training difficulty, but we cannot deny their potential in generating biological sequences, especially in AMP generation. GAN and VAE are the two most commonly used generative models in the field of AMP design (Curr Opin Struct Biol 2023, 83:102733). AMPGAN (J Chem Inf Model, 2021, 61, 2198-2207.), Multi-CGAN (J Chem Inf Model 2024, 64, 1, 316–326), PepGAN (ACS Omega, 2020, 5, 22847-22851) and others have demonstrated the applicability of GANs to peptide design. Moreover, PandoraGAN (Sn Comput Sci 2023, 4, 607), one of the few works on AVP generation, is also based on a GAN architecture. A GAN updates the generator weights directly through backpropagation from the discriminator rather than through a manually defined, complicated loss function, which alleviates the reliance on input data. Our current results demonstrate that the trained GAN generator can produce novel sequences with high antimicrobial activity, validated both in silico and in vitro.

      (2) The pipeline is supposed to generate peptides showing dual properties. Why were antiviral peptides not used to train the GAN? Would adding antiviral peptides into the training lead to a higher chance of getting antiviral generations?

      A major mechanism of antimicrobial peptides is to disrupt cell membranes. Thus, some antimicrobial peptides are reported to have broad-spectrum antibacterial and antiviral activities, since viruses, especially enveloped viruses, share a membrane structure with bacteria. In the APD3 database, 244 of 3940 AMPs are labeled with antiviral activities. In contrast, most reported antiviral peptides inhibit viruses by binding to specific targets (proteins and nucleic acids) related to viral proliferation, so they may not have antibacterial effects. Therefore, we trained the GAN with the AMP dataset. We chose this AMP dataset mainly for AMPredictor (with detailed logMIC labels against E. coli) and then used the same dataset to train a GAN for simplicity.

      In the revised manuscript, we also tested adding available antiviral peptides from AVPdb to train the GAN model. The number of AVPs is 1,788 after removing overlaps with the AMP dataset used. The GAN architecture and hyperparameters remain the same. After generating a batch of sequences with this trained generator, we scored them by AMPredictor and filtered them with five AVP classifiers. As expected, the predicted MIC values shifted to higher performance with 17 sequences < 5 μM and 39 sequences < 10 μM; the previous numbers in the manuscript are 26 and 42. Among the 39 sequences < 10 μM, 13 passed all five AVP classifiers and 17 passed four (33.3% and 43.6%, respectively). The previous ratios are 40.5% and 35.7% (17 and 15 out of 42). The two generators perform roughly the same for generating AVPs (76.9% vs. 76.1%) as evaluated by our rule (4 or more positives), but the generator trained solely with AMPs provided more AVPs with higher probability (5 positives).
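
      The proportions quoted above can be rechecked directly from the counts given in this response. The short sketch below only redoes that arithmetic; the dictionary labels are ours. Note that 32 out of 42 rounds to 76.2%, marginally different from the 76.1% quoted, which appears to be a matter of rounding only.

```python
def percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Counts quoted in the response:
# (passed all five classifiers, passed four, total with predicted MIC < 10 µM)
generators = {
    "v1 (AMPs only)": (17, 15, 42),
    "v2 (AMPs + AVPs)": (13, 17, 39),
}

summary = {}
for name, (five, four, total) in generators.items():
    summary[name] = {
        "five_positive_pct": percent(five, total),
        "four_positive_pct": percent(four, total),
        "four_or_more_pct": percent(five + four, total),
    }
    print(name, summary[name])
```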

      We also experimentally tested dozens of generated peptides from two versions of generators (v1 for training solely on AMPs, v2 for training with AVPs, Figure 2 in revised manuscript). The ‘antiviral’ feature of a peptide was checked when significant inhibition was observed in immunofluorescence assays against HSV-1 at the concentration of 10 µM. Six and seven antiviral peptides were found out of 12 tested peptides from generators v1 and v2, respectively. Therefore, the success rates for two versions of generators are about 60% (including three reported peptides in the original manuscript) and show no significant difference.

      (3) For the antimicrobial peptide predictor, where were the contact maps of peptides sourced from?

      The contact maps of AMPs were predicted from ESM, which were obtained at the same time when obtaining the ESM embeddings (Methods section, Page 24, Line 538: Pretrained language model esm1b_t33_650M_UR50S was used to provide the embeddings and the contact maps.)

      (4) Morgan fingerprint can be used to generate amino acid features. Would it be better to concatenate ESM features with amino acid-level fingerprints and use them as node features of GNN?

      We thank the reviewer for this suggestion. We tested using both ESM and fingerprint (FP) features on graph nodes, and the result is shown in Author response table 1. AMPredictor (ESM on nodes, FP after GNN) still performed slightly better than concatenating FP with the node features on four regression metrics.

      Author response table 1.

      Results of AMPredictor with fingerprint on nodes 

      (5) Although the number of labeled antiviral peptides may be limited, the input features (ESM embeddings) should be predictive enough when coupled with shallow neural networks. Have the authors tried simple GNNs on antiviral prediction and compared the prediction performance to those of existing tools?

      We thank the reviewer for his/her suggestion on AVP predictions. We haven’t tried it. An important reason is that we focused on developing regressors instead of binary classifiers. Currently available AVP data with numerical labels do not support training a reliable regressor, because of their limited amount as well as their heterogeneous virus targets and experimental assays. Therefore, we decided to use reported AVP classifiers as an additional filter following AMPredictor. Since using only one classifier may lead to bias, we chose five AVP classifiers as ensemble votes.
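
      The ensemble vote can be sketched as a simple counting rule. The snippet below assumes the "4 or more positives" rule stated earlier in this response; the peptide IDs and the vote values are hypothetical, and no real classifier is called.

```python
def passes_ensemble(votes, min_positive=4):
    """True when at least `min_positive` classifiers label the peptide antiviral."""
    return sum(1 for vote in votes if vote) >= min_positive


# Hypothetical votes from five AVP classifiers for three made-up peptide IDs.
example_votes = {
    "pep_A": [True, True, True, True, True],
    "pep_B": [True, True, False, True, True],
    "pep_C": [True, False, False, True, False],
}

# Keep only the peptides that reach the voting threshold.
selected = [pid for pid, votes in example_votes.items() if passes_ensemble(votes)]
print(selected)
```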

      (6) Instead of using global alignment to get match scores, the authors should use local alignment.

      We calculated the match scores by global alignment, following AMPGAN v2 (J Chem Inf Model 2021, 61, 2198−2207), CLaSS (Nat Biomed Eng 2021 5, 613–623), and AMPTrans-lstm (Comput Struct Biotechnol J 2022, 21, 463-471), to check the similarity between the generated sequences and any sequences in the training set. In addition, we also used local alignment to check the novelty of the peptides (see the next question).
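
      To make the distinction between the two alignment modes concrete, the sketch below computes a global (Needleman-Wunsch) and a local (Smith-Waterman) score with the same toy scoring scheme. The match, mismatch, and gap values and the example sequences are assumptions for illustration; they are not the parameters or substitution matrix used in the manuscript or in the cited tools.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-1):
    """Return (global_score, local_score) for sequences a and b.

    Global: Needleman-Wunsch; local: Smith-Waterman. Linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    glob = [[0] * cols for _ in range(rows)]
    loc = [[0] * cols for _ in range(rows)]
    # Global alignment pays for leading gaps; local alignment does not.
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap
    best_local = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,
                             glob[i - 1][j] + gap,
                             glob[i][j - 1] + gap)
            # Local scores are floored at zero so an alignment can restart anywhere.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])
    return glob[rows - 1][cols - 1], best_local


# Toy sequences: a short motif embedded in a longer peptide (hypothetical).
global_score, local_score = alignment_scores("AAKLLKAA", "KLLK")
print(global_score, local_score)
```

      In this toy case the shared motif is fully recovered by the local score, while the global score is pulled down by the unmatched flanking residues, which is why the two modes can rank the same pair of peptides differently.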

      (7) How novel are the validated peptides? The authors should run a sequence alignment to get the most similar known AMP for each validated peptide, and analyze whether they are similar.

      We have listed the AMP segments most similar to our generated peptides from the training set and the DRAMP database (28,233 sequences after filtering out those containing irregular characters). BLAST parameters were set as in CLaSS (Nat Biomed Eng 2021 5, 613–623) for short peptides. The lowest E-value of P001 aligned with the training set is 1.2, and no hits were found for P001 with DRAMP. The two E-values of P002 are 1.4 and 0.46. P076 had no hits in the training set and got a high E-value of 7.0 with DRAMP. Detailed alignments are shown below. This result indicates that our three validated AMPs are novel.

      Since we generated more sequences using two versions of generator for validation, we also checked the BLAST E-value of these validated peptides. The results are listed in Table S3. All sequences obtained E-values > 0.1 and some of them had no hits when aligned with the training set or the DRAMP database. 

      Author response image 1.

      Alignments of three validated peptides.

      (8) Only three peptides were synthesized and experimentally validated. This is too few and unacceptable in this field currently. The standard is to synthesize and characterize several dozens of peptides at the very least to have a robust study.

      We thank the reviewer for the suggestion and prompted our models to generate >10 times more peptides in the revised manuscript. We have synthesized and tested more peptides in vitro and added these results to the revised manuscript (Figure 2). From two versions of generators (trained with or without AVPs), we selected 24 peptides in total for antibacterial and antiviral validations. All 24 peptides showed antibacterial activity towards at least one bacterial strain, and 13 peptides were screened out through the quick antiviral test. This result indicates the capability of our design method for bifunctional AMPs with a notable success rate (60%).

      Reviewer #2 (Public Review):

      Summary:

      This study marks a noteworthy advance in the targeted design of AMPs, leveraging a pioneering deep learning framework to generate potent bifunctional peptides with specificity against both bacteria and viruses. The introduction of a GAN for generation and a GCN-based AMPredictor for MIC predictions is methodologically robust and a major stride in computational biology. Experimental validation in vitro and in animal models, notably with the highly potent P076 against a multidrug-resistant bacterium and P002's broad-spectrum viral inhibition, underpins the strength of their evidence. The findings are significant, showcasing not just promising therapeutic candidates, but also demonstrating a replicable means to rapidly develop new antimicrobials against the threat of drug-resistant pathogens.

      Strengths:

      The de novo AMP design framework combines a generative adversarial network (GAN) with an AMP predictor (AMPredictor), which is a novel approach in the field. The integration of deep generative models and graph-encoding activity regressors for discovering bifunctional AMPs is cutting-edge and addresses the need for new antimicrobial agents against drug-resistant pathogens. The in vitro and in vivo experimental validations of the AMPs provide strong evidence to support the computational predictions. The successful inhibition of a spectrum of pathogens in vitro and in animal models gives credibility to the claims. The discovery of effective peptides, such as P076, which demonstrates potent bactericidal activity against multidrug-resistant A. baumannii with low cytotoxicity, is noteworthy. This could have far-reaching implications for addressing antibiotic resistance. The demonstrated activity of the peptides against both bacterial and viral pathogens suggests that the discovered AMPs have a wide therapeutic potential and could be effective against a range of pathogens.

      We thank the reviewer for the comments.

      Reviewer #3 (Public Review):

      Summary:

      Dong et al. described a deep learning-based framework of antimicrobial (AMP) generator and regressor to design and rank de novo antimicrobial peptides (AMPs). For generated AMPs, they predicted their minimum inhibitory concentration (MIC) using a model that combines the Morgan fingerprint, contact map, and ESM language model. For their selected AMPs based on predicted MIC, they also use a combination of antiviral peptide (AVP) prediction models to select AMPs with potential antiviral activity. They experimentally validated 3 candidates for antimicrobial activity against S. aureus, A. baumannii, E. coli, and P. aeruginosa, and their toxicity on mouse blood and three human cell lines. The authors select their most promising AMP (P076) for in vivo experiments in A. baumannii-infected mice. They finally test the antiviral activity of their 3 AMPs against viruses.

      Strengths:

      - The development of de novo antimicrobial peptides (AMPs) with the novelty of being bifunctional (antimicrobial and antiviral activity).

      - Novel, combined approach to AMP activity prediction from their amino acid sequence.

      Weaknesses:

      (1) I missed justification on why training AMPs without information of their antiviral activity would generate AMPs that could also have antiviral activity with such high frequency (32 out of 104).

      Thanks for your inquiry. A major mechanism of antimicrobial peptides is to disrupt cell membranes. Thus, some antimicrobial peptides are reported to have broad-spectrum antibacterial and antiviral activities, since viruses, especially enveloped viruses, share a membrane structure with bacteria. In the APD3 database, 244 of 3940 AMPs are labeled with antiviral activities. However, several reported antiviral peptides inhibit viruses by binding to specific targets (proteins and nucleic acids) related to viral proliferation, so they may not have antibacterial effects. Therefore, we trained the GAN with the AMP dataset. We chose this AMP dataset mainly for AMPredictor (with detailed logMIC labels against E. coli) and then used the same dataset to train a GAN for simplicity. In addition, it’s not 32 antiviral candidates out of 104 but 32 out of 42 peptides with predicted MIC < 10 µM, because we did the filtering process stepwise.

      In revision, we also tested adding available antiviral peptides from AVPdb to train the GAN model (generator v2). The number of AVPs is 1,788 after removing overlaps with the AMP dataset used. The GAN architecture and hyperparameters remain the same. We used generator v2 to obtain a batch of sequences and screened out bifunctional candidates following the same procedure. 30 out of 39 peptides with predicted MIC < 10 µM passed four or five AVP predictors. Therefore, the two generators perform roughly the same for generating AVP candidates (76.9% vs. 76.1%).

      (2) The justification for AMP predictor advantages over previous tools lacks rationale, comparison with previous tools (e.g., with the very successful AMP prediction approach described by Ma et al. 10.1038/s41587-022-01226-0), and proper referencing.

      Thanks for your suggestion. Ma et al. proposed ensemble binary classification models to mine AMPs from metagenomes successfully. However, we concentrated on the development of regression models. As a regressor, AMPredictor predicts the specific logMIC value of the input sequences instead of giving a yes/no answer. Since the training settings and evaluation metrics are different for the classification and regression tasks, we could not compare AMPredictor with Ma et al. directly. Instead, we compared the performance of AMPredictor with some regression baseline models (Figure S2a) and our model outperformed them. 
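
      For reference, the regression metrics named in this response (MSE, RMSE, Pearson correlation coefficient, and CI) can be computed as sketched below. We assume here that CI denotes the concordance index, and the values in the example are invented for illustration; they are not results from the manuscript.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, Pearson r, and concordance index for paired values."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    pearson = cov / math.sqrt(var_t * var_p)
    # Concordance index: fraction of comparable pairs ranked in the same order,
    # counting ties in the prediction as half-concordant.
    concordant, comparable = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal labels are not comparable
            comparable += 1
            dt = y_true[i] - y_true[j]
            dp = y_pred[i] - y_pred[j]
            if dp == 0:
                concordant += 0.5
            elif (dt > 0) == (dp > 0):
                concordant += 1.0
    return {"mse": mse, "rmse": math.sqrt(mse), "pearson": pearson,
            "ci": concordant / comparable}


# Invented logMIC-like values, for illustration only.
metrics = regression_metrics([0.0, 1.0, 2.0, 3.0], [0.5, 0.5, 2.5, 2.5])
print(metrics)
```

      A binary classifier does not produce the continuous predictions these metrics require, which is the practical reason a direct comparison with a classification model is not straightforward.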

      (3) Experimental validation of three de novo AMPs is a very low number compared to recent similar studies.

      Thanks for pointing out this shortcoming. We have synthesized and tested more peptides in vitro and added these results to the revised manuscript (Figure 2). From two versions of generators (trained with or without AVPs), we selected 24 peptides in total for antibacterial and antiviral validations. All 24 peptides showed antibacterial activity towards at least one bacterial strain, and 13 peptides were screened out through the quick antiviral test. This result indicates the capability of our design method for bifunctional AMPs with a notable success rate (60%).

      (4) I have concerns regarding the in vivo experiments including i) the short period of reported survival compared to recent studies (10.1038/s41587-022-01226-0, 10.1016/j.chom.2023.07.001, 10.1038/s41551-022-00991-2) and ii) although in Figure 2 f and g statistics have been provided, log scale y-axis would provide a better comparative representation of different conditions.

      Thank you for your suggestions. 

      i) In the current study, we monitored the survival of mice with peritoneal bacterial infection for 48 h. Because abdominal bacterial infection can induce severe sepsis and cause mouse death within 40 h (Sci Adv 2019, 5(7), eaax1946), 48 h is sufficient to evaluate the therapeutic efficacy of antimicrobial peptides (Nat Biotechnol 2019, 37(10), 1186-1197).

      ii) In Figure 2f and 2g (3f and 3g in the revised manuscript), the y-axis has already been in log-scale and tick labels are marked in scientific notation.

      (5) I had difficulty reading the story because of the use of acronyms without referring to their full name for the first time, and incomplete annotation in figures and captions.

      Thank you for pointing this out. We have checked the manuscript carefully and modified the figure captions during revision.

      Reviewer #2 (Recommendations For The Authors):

      (1) To validate the generalizability of the model, it would be prudent to include data on AMPs targeting a broader range of bacteria and viruses. This could help ensure that the peptides designed are not narrowly focused on E. coli but are effective against a more extensive set of pathogens. 

      Thanks for your suggestions. We incorporated only AMPs with E. coli activity labels, since E. coli is the most common strain in the available AMP databases. For a regression model such as AMPredictor, the fitting target should be defined precisely, which means restricting the set of target bacteria. Other articles have also focused on E. coli labels (Nat Commun 2023, 14, 7197; mSystems 2023, 8, e0034523).

      We used the same processed dataset to train the GAN generator for simplicity. Most reported AMPs have the potential to target various microbes. We have counted the antimicrobial labels of these peptides in our dataset, shown in Figure S1b. In addition to E. coli, some of the peptides target Gram-positive S. aureus, the fungus C. albicans, and other bacterial species as well. Our experimental validation also reveals the wide spectrum of the designed peptides, which inhibit Gram-negative, Gram-positive, and drug-resistant bacteria as well as enveloped viruses. With the expansion of well-curated AMP databases, we expect to update the model with larger-scale datasets in the near future.

      (2) Conduct sensitivity analyses to understand how minor changes in the peptide sequences impact the model’s predictions. This will reduce the chances of overlooking potential AMP candidates due to the model’s inability to capture subtle changes.

      Thank you for this valuable suggestion. We kept similar known peptide sequences in the training sets, considering that a single mutation may affect their antimicrobial performance. We took P001 as an example to perform the sensitivity analysis by site saturation mutagenesis in silico. Author response image 2 shows the change in antimicrobial activity scores as predicted by AMPredictor. Since the predicted MIC of P001 is 0.949 µM (the experimentally measured value is 0.80 µM), most single mutations lead to higher scores (i.e., worse performance), especially substitutions to Asp (D) and Glu (E) residues, which carry negative charges. The largest change for a single amino acid replacement is 25.51 (W6D). Although this value may not reflect the actual change, it is large enough to be distinguished when screening and ranking candidate sequences.

      Author response image 2.

      Site saturation mutagenesis of P001. Color shows the change in MIC against E. coli as predicted by AMPredictor (lower score is better).
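
      The in silico scan described above amounts to enumerating every single-residue substitution and rescoring each mutant. A minimal sketch follows, under stated assumptions: `toy_predictor` is a placeholder scoring function invented for this example (it is not AMPredictor), and the parent sequence is hypothetical (it is not P001).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_scan(sequence, predict):
    """Score every single-residue substitution relative to the parent sequence."""
    baseline = predict(sequence)
    changes = {}
    for pos, wild_type in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa == wild_type:
                continue
            mutant = sequence[:pos] + aa + sequence[pos + 1:]
            # Label in the usual form, e.g. W6D: Trp at position 6 replaced by Asp.
            changes[f"{wild_type}{pos + 1}{aa}"] = predict(mutant) - baseline
    return changes


def toy_predictor(sequence):
    """Placeholder score (NOT AMPredictor): acidic residues raise it, basic lower it."""
    acidic = sum(sequence.count(a) for a in "DE")
    basic = sum(sequence.count(a) for a in "KR")
    return 10.0 + 5.0 * acidic - 1.0 * basic


parent = "KWKLFW"  # hypothetical sequence, not P001
scan = saturation_scan(parent, toy_predictor)
largest = max(scan, key=scan.get)  # substitution with the largest score increase
print(len(scan), largest, scan[largest])
```

      Replacing `toy_predictor` with a call to a trained regressor would give a scan of the kind shown in the image, with one score change per position and substituted residue.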

      (3) Given the relatively short length of the peptides, typically ranging from 10 to 20 residues, the authors might consider employing a fully-connected graph in the peptide’s graphical representation. This approach could potentially simplify the model without sacrificing the descriptive power due to the limited size of the peptides.

      Thanks for your suggestions. We tested fully-connected graph edge encodings, and the results on the test set are shown in Author response table 2 below. We found that AMPredictor with the peptide contact map still performed better on the Pearson correlation coefficient and CI, while using fully-connected graphs gave a slightly improved RMSE and MSE. Nonetheless, using a fully-connected graph demands about 10 times the memory and a higher computational cost because of the more complicated message passing. Therefore, including structural information is still the preferred choice.

      Author response table 2.

      Results of AMPredictor with different graph edge encodings

      (4) Upon reviewing Table S1, it is apparent that the application of ESM embeddings alone achieves commendable prediction accuracy. It would be intriguing to investigate whether the adoption of the more recent ESM models, specifically the second-generation ESM2 t36_3B, t48_15B, and t33_650M, could enhance model performance beyond that observed using the esm1b_t33_650M_UR50S model described in the manuscript.

      Thanks for your suggestions. Here, we included various ESM2 models’ outputs as our node features and presented the results in Author response table 3. Notably, the dimensions of esm2_t36_3B and esm2_t48_15B are 2560 and 5120, respectively, while both esm2_t33_650M and esm1b_t33_650M are 1280 dimensions. 

      Interestingly, we found that larger models do not lead to improved performance. The ESM-1b version still holds the best metrics in RMSE, MSE, and Pearson correlation coefficient. This indicates that the choice of pretrained model version depends on the specific downstream task.

      Author response table 3.

      Results of AMPredictor with different ESM versions

      (5) It may be pertinent to reevaluate the use of the MM-PBSA approach within the scope of this study. Typically, MM-PBSA is utilized to estimate the free energy differences between the bound and unbound states of solvated molecules. The application of MM-PBSA to calculate binding energies between proteins and membranes is unconventional and infrequently documented in the literature. Therefore, it is recommended that the authors consider omitting this portion of the manuscript, or provide a robust justification for its inclusion and application in this context.

      Thanks for your comments on MM/PBSA methods. Several published studies have used this approach to calculate peptide-membrane binding free energy (Langmuir 2016, 32, 1782-1790; J Cell Biochem 2018, 119, 9205-9216; J Chem Inf Model 2019, 59, 3262-3276; Molecular Therapy Oncolytics 2019, 16, 7-19; Microbiology Spectrum 2023, 11, e0320622; J Chem Inf Model 2023, 63, 5823-5833), and we followed their settings, such as the dielectric constant. All of these works built similar all-atom systems including cationic antimicrobial peptides and membrane bilayers, and utilized the MM/PBSA method to describe the absorption process of the peptide from an unbound initial state. The order of magnitude of our calculation results is consistent with other reported works. Additionally, computational results may provide supporting evidence, and we discussed that this quantitative energy calculation should be considered along with other observed metrics.

      Reviewer #3 (Recommendations For The Authors):

      The weaknesses I mentioned in the Public Review may be addressed by improving the writing and presentation and corrections to the text and figures.

      Thanks for your suggestion. We have carefully checked and improved the presentation of text and figures in the revised manuscript.

    1. eLife Assessment

      This study is important, and the findings add substantially to the evidence base regarding CCR5 antagonist drugs for neuroprotection and stroke management. The authors adhered to the expected systematic review and meta-analysis standards, and the presented evidence is convincing.

    2. Joint Public Review:

      This is an interesting, timely, and high-quality study on the potential neuroprotective capabilities of C-C chemokine receptor type 5 (CCR5) antagonists in ischemic stroke. The focus is on preclinical investigations.

      An outstanding feature is that stroke patient representatives have directly participated in the work. Although this is often called for, it is hardly realized in research practice, so the work goes beyond established standards.

      The included studies were assessed regarding the therapeutic impact and their adherence to current quality assurance guidelines such as STAIR and SRRR, another important feature of this work. While overall results were promising, there were some shortcomings regarding guideline adherence.

      The paper is very well written and concise yet provides much highly useful information. It also has very good illustrations, and extremely detailed and transparent supplements.

      [Editors' note: The authors have responded appropriately to the comments shared by the reviewers. The authors have provided a good academic justification for not needing to update the literature search, as one of the reviewers had suggested.]

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper is well-organized, with clearly defined sections. The systematic review methodology is thorough, with clear eligibility criteria, search strategy, and data collection methods. The risk of bias assessment is also detailed and useful for evaluating the strength of evidence. The involvement of a patient panel is notable and positive, ensuring the research addresses real-world concerns and aligning scientific inquiry with patient perspectives. The statistical approach used for the analysis seems appropriate.

      The authors are encouraged to take into account the following points:

      As the authors have acknowledged, there is a high risk of bias across all included studies, particularly in randomization, selective outcome reporting, and incomplete data, which could be highlighted more explicitly in the paper's discussion section, particularly the potential implications for the generalizability of the results. The authors can also suggest mitigation strategies for future studies (e.g., better randomization, blinding, reporting standards, etc.).

      We agree that it is important to highlight mitigation strategies that will allow preclinical researchers to more transparently report future studies. We have directed readers to ensure reporting in alignment with the ARRIVE 2.0 guidelines for further details on reporting of preclinical studies, as follows in paragraph two of the Discussion, “Future studies should carefully incorporate all elements of the ARRIVE 2.0 guidelines to help ensure that all results are transparently reported and improve confidence in the findings.(41)”

      None of the studies include female animals, and the use of young adult animals (instead of aged models) limits the applicability of the findings to the human stroke population, where stroke incidence is higher in older adults and perhaps the gender issue must be included to reflect the translational aspects. The authors can add to the paper's discussion section that perhaps future preclinical studies should include both sexes and aged animals to align better with the clinical population and improve the translation of findings. Another point is the comorbidity. Comorbidities such as diabetes and hypertension are prevalent in stroke patients. How can these be considered in preclinical designs? The authors should emphasize the importance of future research incorporating such comorbid models to enhance clinical relevance. None of the studies had independent replication of their findings, which is a key limitation, especially for a field with high translational expectations. This should be highlighted as a critical next step for validating the efficacy of CCR5 antagonists.

      We agree that these are important evidence gaps to address. Although we highlighted these gaps in paragraph 3 of the Discussion, we have now added a more explicit call to action for researchers to address these gaps at the end of the relevant paragraph as follows, “Future preclinical research should aim to address these evidence gaps to further increase the clinical relevance and comprehensiveness of evidence for CCR5 antagonists in stroke.”

      The studies assessed limited cognitive outcomes (only one reported a cognitive outcome). Given the importance of cognitive recovery post-stroke, this is a gap to highlight in the discussion. Future studies should include more diverse and comprehensive behavioral assessments, including cognitive and emotional domains, to fully evaluate the therapeutic potential.

      We have expanded on this important point in paragraph four of the Discussion, which explores the alignment of the preclinical literature to the CAMAROS trial, as follows, “Finally, clinically relevant secondary outcomes in the CAMAROS trial, such as cognitive and emotional domains as measured by the Montreal Cognitive Assessment (MoCA) and Stroke Aphasia Depression Questionnaire (SADQ) were not modelled in the preclinical literature. Although one study included a cognitive outcome, the other treatment parameters of this study were not aligned to the CAMAROS trial. Future preclinical studies should assess a more diverse and comprehensive battery of clinically relevant behavioural tasks, which could be based on the range of outcomes employed in the CAMAROS trial, or those found in the SRRR recommendations.(9)”

      This addition highlights the lack of supporting preclinical evidence for cognitive recovery post-stroke. We also offer recommendations on discrete ways to address this gap in future preclinical studies by taking inspiration from the outcomes used in CAMAROS as well as the SRRR guidelines used throughout our assessment of the CCR5 literature.  

      The timing of CCR5 administration across studies varies widely (from pre-stroke to several days post-stroke) complicating the interpretation and comparison of results. The authors are encouraged to add that future preclinical studies could focus on narrowing the therapeutic window to more clinically relevant time points.

      We agree with the reviewer and feel that this recommendation is currently captured in paragraph three of our Discussion: “However, demonstration of efficacy under a wider range of conditions, such as in aged animals, females, animals with stroke-related comorbidities, more clinically relevant timing of dose administrations, or in conjunction with rehabilitative therapies are necessary to provide further confidence in these findings.” As mentioned above, we added a new sentence to the end of this paragraph to make it more explicit that these are gaps that should be addressed by future preclinical research: “Future preclinical research should aim to address these evidence gaps to further increase the clinical relevance and comprehensiveness of evidence for CCR5 antagonists in stroke.” We also added the word “clinically” to the original sentence mentioned above to more explicitly align with the reviewer’s recommendation.

      The paper identifies some alignment with clinical trials, but there are several gaps, too, particularly in the types of behavioral tests used in preclinical studies versus those in clinical trials. If this systematic review and meta-analysis aim to formulate a set of recommendations for future studies, it is important that the authors also propose specific preclinical behavioral tasks that could better align with clinical measures used in trials, like functional assessments related to human stroke outcomes.

      As mentioned above, we added a sentence to Discussion paragraph four, the comparison to the CAMAROS trial, that provides recommendations as to the behavioural tasks that would be useful to employ in future studies. Namely, “Future preclinical studies should assess a more diverse and comprehensive battery of clinically relevant behavioural tasks, which could be modelled after the range of outcomes employed in the CAMAROS trial, or those found in the SRRR recommendations.(9)” The SRRR recommendations that we reference here provide discrete consensus recommendations for interested readers on behavioural task selection, as well as priority rankings based on rodent species, to better align with clinical measures used in trials.

      The discussion needs some revisions. It could benefit from an expanded explanation of CCR5's mechanistic role in neuroplasticity and stroke recovery. For instance, linking CCR5 antagonism more closely with molecular pathways related to synaptic repair and remyelination would enhance the quality of the discussion and understanding of the drugs' potential.

      We have provided a synthesis of CCR5’s proposed mechanistic roles in the Supplementary Materials, Figure S1 (for a summary pathway diagram), and Table S3 (for a list of potential mechanistic pathways and supporting evidence presented in each paper). Given our focus on study quality and alignment with translational recommendations, we felt that it was more appropriate to not focus on mechanistic elements in the Discussion.  Indeed, the appraisal of the quality of support for each potential mechanism was beyond the scope of our present analysis.  

      While the tool is used to assess the risk of bias, it might be helpful to integrate a broader framework for evaluating the quality of included studies. This could include sample size justifications, statistical power analysis, or the use of pre-registration in animal studies. These elements can also introduce bias or minimize those if in place.

      We agree these are important, and the SYRCLE risk of bias tool we used addresses many major domains of bias mentioned by the reviewer (e.g., selection bias, performance bias, detection bias, attrition bias, reporting bias). For example, the SYRCLE “selective outcome reporting” domain addresses pre-registration by asking, “Was the study protocol available and were all of the study’s pre-specified primary and secondary outcomes reported in the current manuscript?” The SYRCLE Risk of Bias tool represents the current state of the art for risk of bias assessment in preclinical systematic reviews and aligns well with similar tools used clinically, such as the Cochrane Risk of Bias tool. Although the tool does not assess statistical power, we would note that this is considered to be a separate issue from internal validity, which is why it is not assessed even by the Cochrane risk of bias tool used in clinical systematic reviews.

      Please also highlight confounding factors that might have influenced the results in the included studies, such as variation in stroke models, dosing regimens, or behavioral assessment methods.

      We agree that exploring potential confounding factors is an important element of the assessment. We highlight potential confounding factors in several parts of the Results and Discussion, such as in our Synthesis of Behavioural Outcomes section, “…equivalent infarct volumes were not demonstrated between the treated and control groups in this cohort, which could potentially lead to confounding effects.” and Comprehensiveness of Preclinical Evidence section, “All studies tested both behavioral and histological outcomes and demonstrated neuroprotective effects, but most studies failed to measure and control post-stroke temperature, which could potentially confound the observed neuroprotection (Table S4).(32) Most histological measurements were also assessed at <72 hours, which could confound the observed neuroprotective effects if cell death was merely delayed.(32) For CCR5 antagonists as a post-stroke recovery-inducing treatment, one experiment assessed the effects of initiating CCR5 administration in a similar post-stroke phase as the CAMAROS trial. This experiment (Joy et al.)(6) did not demonstrate that each treatment group had equivalent baseline stroke volumes, which may potentially confound observed behavioral effects.”

      Although there are many factors that could potentially confound the observed results, we believe that we have addressed some of the most prominent examples that are known in the preclinical stroke literature. We expanded our statement in the final sentence of the Results to highlight this, “Overall, our assessments highlight a variety of knowledge gaps, potential confounding factors, and areas of misalignment between the preclinical evidence and clinical trial parameters that could be improved with further preclinical experimentation.”

      There is some discussion of the meta-analysis' limitations due to the few studies, but this point could be more thoroughly addressed. Please consider including a more critical discussion of the limitations of pooling data from heterogeneous study designs, stroke models, and outcome measures. What can this lead to? Is it reliable to do so, or does it lack scientific rigor? The authors are encouraged to formulate a balanced discussion, adding positive and negative aspects.

      We appreciate the reviewer’s insightful comment regarding the limitations related to pooling data from heterogeneous study designs, stroke models, and outcome measures. We have added to the original limitations described in the first paragraph of our Discussion with additional text to provide a better balance about the potential risks and benefits of the meta-analysis strategy that we undertook in the present study.

      “Pooling data across heterogeneous experimental designs, animal/stroke models, and treatment parameters, as we have done with the infarct volume analysis in the present study, can introduce variability that increases the risk of overestimating or underestimating the true effect of the intervention.(38) Treatment effects observed across model systems and therapeutic compounds may represent different biological mechanisms. Despite this potential limitation, meta-analysis can provide valuable insights, especially in preclinical settings where the sample sizes of individual studies may be too small to detect significant effects on their own. In these cases, pooling data across studies can help identify overarching estimates of benefits and harm, highlight subgroups of interest, and help guide areas of future research. As described in the results above, we attempted to mitigate the risks of inappropriate data pooling through careful investigation of heterogeneity, subgroup analyses, and differentiation between outcomes where we felt that meta-analytic pooling was (infarct volume) and was not (behavioural outcomes) appropriate. Overall, we believe that our results indicate that further investigation is warranted to determine the optimal timing of administration and behavioral domains under which CCR5 antagonists exhibit the strongest post-stroke neuroprotective and recovery-inducing effects.”

      The conclusion should more explicitly acknowledge that while CCR5 antagonists show potential, the findings are still preliminary due to the limitations in the preclinical studies (high bias risk, lack of diverse animal models). Overall, the conclusion can end with a call for rigorous, well-controlled, and replicated studies with improved alignment to clinical populations and trials to show that the conclusion remains inconclusive, considering what has been analyzed here.

      We modified our concluding paragraph to highlight that the current evidence should be considered preliminary, as follows, “In conclusion, CCR5 antagonists show promise in preclinical studies for stroke neuroprotection, corresponding reduction in impairment, as well as improved functional recovery related to neural repair in the late sub-acute/early chronic phase. However, high risk of bias and the limited (or no) evidence in clinically relevant domains underscore the need for more rigorous and transparent preclinical research to further strengthen the current preliminary evidence available in the literature.”

      Reviewer #2 (Public review):

      Summary:

      This is an interesting, timely, and high-quality study on the potential neuroprotective capabilities of C-C chemokine receptor type 5 (CCR5) antagonists in ischemic stroke. The focus is on preclinical investigations.

      Strengths:

      The results are timely and interesting. An outstanding feature is that stroke patient representatives have directly participated in the work. Although this is often called for, it is hardly realized in research practice, so the work goes beyond established standards.

      The included studies were assessed regarding the therapeutic impact and their adherence to current quality assurance guidelines such as STAIR and SRRR, another important feature of this work. While overall results were promising, there were some shortcomings regarding guideline adherence.

      The paper is very well written and concise yet provides much highly useful information. It also has very good illustrations and extremely detailed and transparent supplements.

      Weaknesses:

      Although the paper is of very high quality, a couple of items may require the authors' attention to increase the impact of this exciting work further. Specifically:

      Major aspects:

      (1) I hope I did not miss that (apologies if I did), but when exactly was the search conducted? Is it possible to screen the recent literature (maybe up to 12/2024) to see whether any additional studies were published?

      We added the following statements to the “Information sources and search strategy” section of Materials and Methods to clarify the timing and intention of our search strategy, “The search was conducted October 25, 2022, to align with the listed launch date of the CAMAROS trial (September 15, 2022). Our intention in doing so was to collate and assess all preclinical evidence that could have feasibly informed the clinical trial. We sought to assess the comprehensiveness of evidence and readiness for translation of CCR5 antagonist drugs at the time of their actual translation into human clinical trials, as well as the alignment of the CAMAROS trial design to the existing preclinical evidence base.”

      Although we agree that an update of the search provides valuable information for the field, we believe that the studies entering the literature after the launch of the CAMAROS trial fill a different conceptual niche than those prior to trial launch (since newer preclinical studies explicitly did not inform decisions to move to clinical trials or clinical trial design). It is our view that newer studies should be assessed from a lens of how effectively they close knowledge gaps that were present at trial launch and emulate the conditions of clinical trial populations and design parameters (which represent the de facto most “clinically relevant” conditions). Such an analysis would require a different approach that is outside the scope and aims of the present study. The present study provides an assessment of the preclinical literature up to the date of the translation of CCR5 antagonist drugs into human clinical trials (via the CAMAROS trial), which we believe will serve as a valuable prospective benchmark for evaluating the predictiveness of preclinical evidence after the results of the CAMAROS trial emerge.

      (2) Please clearly define the difference between "study" and "experiment," as this is not entirely clear. Is an "experiment" a distinct investigation within a particular publication (=study) that can describe more than one such "experiment"? Thanks for clarifying.

      We have now added definitions for “studies” and “experiments” immediately after the first time they are mentioned in paragraph one of the Study Selection section of Results, as follows: “Herein, “studies” refer to the published articles as a unit, while “experiments” refer to distinct investigations within each published article used to test various hypotheses (i.e., a subunit of “studies” comprised of a select cohort of animals).”

      (3) Is there an opportunity to conduct a correlation analysis between the quality of a study (for instance, after transforming the ROB assessment into a kind of score) and reported effect sizes for particular experiments or studies? This might be highly interesting.

      This is an interesting suggestion, which under different circumstances could provide insights into potential associations between study quality and effect size, as have been observed in the literature (e.g., Macleod et al., 2008; PMID:18635842). However, we are unable to assess this relationship in the present dataset as all studies were scored as “high risk of bias”, meaning that there was no variability in terms of observed study quality.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Minor aspects:

      (1) The scope of the work is perfectly in line with very recent STAIR recommendations, which strongly suggest assessing potential interventions that may augment impact and improve outcomes in recanalization procedures (Wechsler et al., doi: 10.1161/STROKEAHA.123.044279; PMID 37886850). The authors may wish to discuss their work in light of these recent recommendations.

      We thank the reviewer for highlighting the more recent STAIR recommendation document, as well as its focus on assessing interventions in conjunction with recanalization procedures. An item related to the importance of combining novel interventions with established recanalization procedures was included as part of Table S4 but was not highlighted in the main text. We have added to the final paragraph of the Results section “Comprehensiveness of preclinical evidence” to highlight that no studies tested CCR5 antagonist drugs in conjunction with recanalization procedures as follows, “…no studies assessed behavioural effects on upper extremity skilled reaching / grasping or potential interactions of CCR5 antagonists with rehabilitative therapies or established recanalization procedures (Table S4).(35–38)” The Wechsler reference provided by the reviewer has now been cited as well.

      (2) The authors may wish to consider the term "cerebroprotective" rather than "neuroprotective" unless neurons are the only cells to which a respective statement applies.

      We agree that “cerebroprotective” is the more appropriate term and have thus substituted it wherever we previously used “neuroprotective”.

      (3) The paper features a mixture between American (e.g., "hemorrhagic") and British English (e.g., "favours"). Although this is not untypical for Canadian English, deciding on one or the other may be an option.

      Given eLife’s basis in the UK, we have modified the language used throughout to be consistent with British English style.

    1. eLife Assessment

      This manuscript provides valuable mechanistic insight into NSCLC progression, both in terms of tumour metastasis and the development of chemoresistance. The authors draw upon a range of techniques and assays and the evidence shown is solid and has been strengthened by incorporation of suggestions by the two reviewers. The work presented will be of interest to cancer biologists and more broadly to those interested in NSCLC translational studies.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript entitled "Phosphodiesterase 1A Physically Interacts with YTHDF2 and Reinforces the Progression of Non-Small Cell Lung Cancer" explores the role of PDE1A in promoting NSCLC progression by binding to the m6A reader YTHDF2 and regulating the mRNA stability of several novel target genes, consequently activating the STAT3 pathway and leading to metastasis and drug resistance.

      Strengths:

      The study addresses a novel mechanism involving PDE1A and YTHDF2 interaction in NSCLC, contributing to our understanding of cancer progression.

    3. Reviewer #2 (Public review):

      Summary

      This revised manuscript investigates the role of PDE1 in NSCLC progression and the mechanism by which it acts. The authors provide evidence that PDE1 binds to the m6A reader YTHDF2 and, through this interaction, regulates the STAT3 signaling pathway, promoting metastasis and angiogenesis.

      Strength:

      The study uncovers a novel PDE1A/YTHDF2/SOCS2/STAT3 pathway in NSCLC progression and the findings provide a potential treatment strategy for NSCLC patients with metastasis.

      Weakness:

      In the discussion, the revised version states that "the role of YTHDF2 in PDE1A-driven tumor metastasis should be elucidated in future studies". However, given that the physical interaction of PDE1A and YTHDF2 plays a critical role in PDE1A-mediated NSCLC metastasis, testing whether YTHDF2 mimics the effect of PDE1A in metastasis would strengthen the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript entitled "Phosphodiesterase 1A Physically Interacts with YTHDF2 and Reinforces the Progression of Non-Small Cell Lung Cancer" explores the role of PDE1A in promoting NSCLC progression by binding to the m6A reader YTHDF2 and regulating the mRNA stability of several novel target genes, consequently activating the STAT3 pathway and leading to metastasis and drug resistance.

      Strengths:

      The study addresses a novel mechanism involving PDE1A and YTHDF2 interaction in NSCLC, contributing to our understanding of cancer progression.

      Weaknesses:

      The following issues should be addressed:

      (1) The body weight changes and/or survival times of each group in the in vivo metastasis studies should be provided.

      Thank you for your suggestion! We have already provided the body weight of each group in the in vivo metastasis studies in Figure S4D and Figure S5D (see below).

      (2) In Figure 7, the direct binding between YTHDF2 and the potential target genes should be further validated by silencing YTHDF2 to observe the half-life of the mRNA levels of target genes, in addition to silencing PDE1A.

      Thank you for your suggestion! We have found that siYTHDF2 does not significantly affect expression of SOCS2 in NSCLC cells (see author response image 1 below). We hypothesize that YTHDF2 functions as an m6A reader that recognizes the target mRNA; thus, when YTHDF2 is silenced by siRNA, some residual expression remains in the cells, allowing it to continue recognizing its targets and exerting its function. Therefore, SOCS2 mRNA expression was not significantly affected. However, PDE1A functions as a degrader of mRNA; thus, when it is disrupted, the effect on mRNA degradation could be strong.

      Author response image 1.

      SOCS2 mRNA expression after siYTHDF2 in NSCLC cells

      (3) In Figure 7, the potential methylation sites of "A" on the target genes such as SOCS2 should be verified by mutation analysis, followed by m6A IP or reporter assays.

      Thank you for your suggestion! m6A IP or reporter assays may be carried out in the future to detect the potential methylation sites. We have added this point to the manuscript: “Meanwhile, YTHDF2 might act as an m6A RNA “reader” by interacting with PDE1A, but the mechanism might need further investigation”.

      (4) In Figure 6G, the correlation between the mRNA levels of STAT3 and YTHDF2 needs clarification. According to the authors' mechanism, the STAT3 pathway is activated, rather than upregulation of mRNA levels (or protein levels, as shown in Figure 6F). Figure 7 does not provide evidence that STAT3 is a bona fide target gene regulated by YTHDF2.

      Thank you for your suggestion! The reviewer is right: YTHDF2 activates the STAT3 pathway rather than upregulating STAT3 mRNA levels, so the correlation between YTHDF2 mRNA and STAT3 mRNA is not suitable for this study. Moreover, this correlation is not as strong as we expected (Pearson coefficient 0.37). Thus, we have deleted Figure 6G in the revised version.

      (5) The final figure, which discusses sensitization to cisplatin by PDE1A suppression, does not appear to be closely related to the interaction or regulation of PDE1A/YTHDF2. If the authors claim this is an m6A-associated event, additional evidence is needed. Otherwise, this part could be removed from the manuscript.

      Thank you for your suggestion! We have already deleted Figure 8 just as the reviewer suggested.

      Reviewer #2 (Public review):

      This manuscript aims to investigate the biological impact and mechanisms of phosphodiesterase 1A (PDE1A) in promoting non-small cell lung cancer (NSCLC) progression. They first analyzed several databases and used three established NSCLC cell lines and a normal cell line to demonstrate that PDE1A is overexpressed in lung cancer and its expression negatively correlated with the outcomes of patients. Based on this data, they suggested PDE1A could be considered as a novel prognostic predictor in lung cancer treatment and progression. To study the biological function of PDE1A in NSCLC, they focused on testing the effect of inhibition of PDE1A genetically and pharmacologically on cell proliferation, migration, and invasion in vitro. They also used an experimental metastasis model via tail vein injection of H1299 cells to test if PDE1A promoted metastasis. By database analysis, they also decided to investigate if PDE1A promoted angiogenesis by co-culturing NSCLC cells with HUVECs as well as assessing the tumors from the subcutaneous xenograft model. However, in this model, whether PDE1A modulation impacted tumor metastasis was not examined. To address the mechanism of how PDE1A promotes metastasis, the authors again performed a bioinformatic and GSEA enrichment analysis and confirmed PDE1A indeed activated STAT3 signaling to promote migration. In combination with IP followed by mass spectrometry, they found PDE1A is a partner of YTHDF2; the cooperation of PDE1A and YTHDF2 negatively regulated SOCS2 mRNA, as demonstrated by RIP assay, and ultimately activated STAT3 signaling. Finally, the authors shifted the direction from metastasis to chemoresistance; specifically, they found that PDE1A inhibition sensitized NSCLC cells to cisplatin through MET and NRF2 signaling.

      Strength:

      Overall, the manuscript was well-written and the majority of the data supported the conclusions. The authors used a series of methods including cell lines, animal models, and database analysis to demonstrate the novel roles and mechanism of how PDE1 promotes NSCLC invasion and metastasis as well as cisplatin sensitivity. Given that PDE1A inhibitors have been pursued for clinical use, this study provided valuable findings that have translational potential for NSCLC treatment.

      Weaknesses:

      The role of YTHDF2 in PDE1A-promoted tumor metastasis was not investigated. To make the findings more clinical and physiologically relevant, it would be interesting to test if inhibition of PDE1A impacts metastasis using lung cancer orthotopic and patient-derived xenograft models. It is also important to use a cisplatin-resistant NSCLC cell line to test if a PDE1A inhibitor has the potential to sensitize cisplatin in vitro and in vivo.

      Thank you for your suggestion! The role of YTHDF2 in PDE1A-promoted tumor metastasis may need in vivo analysis. Therefore, we discussed the point in the discussion section “In addition, it is worth testing if PDE1A inhibition affects metastasis in lung cancer orthotopic and patient-derived xenograft models. The role of YTHDF2 in PDE1A-driven tumor metastasis should be elucidated in future studies”.

      The reviewer is absolutely right: it is very important to use a cisplatin-resistant NSCLC cell line to test the potential effect of PDE1A in sensitization to cisplatin. The current data cannot support the conclusion, and more data are needed to reach a final conclusion. As suggested by reviewer 1, we have deleted these data in this version.

      Furthermore, this study relied heavily on different database analyses. Although these provided novel and compelling data that were followed up and confirmed in the paper, it is critical to have a detailed statistical description section on data acquisition throughout the manuscript.

      Thank you for your suggestion! We have added detailed statistical descriptions to the figure legends.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Scale Bar Display: Scale bars should be included in Figures 4F, 5F, and 6E to ensure clarity and accuracy in the presented microscopic images.

      Thank you for your suggestion! We have already added the scale bars on Figures 4F, 5F, and 6E.

      (2) HE Staining Images: The authors are suggested to provide more images for HE staining of lungs to offer a comprehensive visual representation and to substantiate the findings.

      Thank you for your suggestion! We have already provided more images for HE staining of lungs in Figure S4E and Figure S5E.

      Reviewer #2 (Recommendations for the authors):

      It would be helpful to clarify several points in the manuscript for better understanding.

      (1) The HELF cells were described as an epithelial cell line (page 7, line 118) and as fibroblasts (page 12, line 288), which needs to be clarified. It is not clear if the cells used in this study were periodically authenticated.

      Thank you for your suggestion! We have revised the description of HELF cells; they are human lung fibroblasts.

      (2) More details could be added to the methods such as the amount of Matrigel coated for invasion assay and the components for the lysis buffer and IP buffer.

      Thank you for your suggestion! We have already added more details in the Methods section.

      (3) Providing the rationale for using 20% FBS instead of using some chemoattractants such as EGF, LPA, or HGF or a low level of FBS for migration will be helpful.

      Thank you for your suggestion! Although chemoattractants are suitable for cell migration experiments, 20% FBS is also suitable. We list below examples from the literature that use this system.

      (1) Xiaolin Peng, Zhengming Wang, Yang Liu. et al. Oxyfadichalcone C inhibits melanoma A375 cell proliferation and metastasis via suppressing PI3K/Akt and MAPK/ERK pathways, Life Sciences, 2018, 206, 35-44. https://doi.org/10.1016/j.lfs.2018.05.032

      (2) Rong, S., Dai, B., Yang, C. et al. HNRNPC modulates PKM alternative splicing via m6A methylation, upregulating PKM2 expression to promote aerobic glycolysis in papillary thyroid carcinoma and drive malignant progression. J Transl Med, 2024, 22, 914. https://doi.org/10.1186/s12967-024-05668-9

      (4) For the HPA analysis in Figure 1, it would be great to assess how many lung cancer cases are NSCLC and define IDO/area for the y-axis.

      Thank you for your suggestion! Nineteen samples were analyzed, all of which are NSCLC samples, and we have revised our manuscript accordingly. We also made a mistake: the label should be IOD/area, which means integral optical density/area. We have revised the Figures and Figure legends.

      (5) On page 23, line 480, "Therefore, this study reveals the effect and mechanism of PDEA1 in promoting HCC metastasis...", should HCC be NSCLC?

      Thank you for your suggestion! We have already revised the manuscript accordingly.

      (6) Specific scramble siRNAs should be clearly shown in their respective figures. In Figure 7F, it is not clear why DMSO, rather than scramble siRNA, was used as the control.

      Thank you for your suggestion! It was our mistake to show DMSO in Figure 5F. DMSO is the negative control for Figure 5G, and we have revised Figures 5F and 5G accordingly.

  2. Feb 2025
    1. eLife Assessment

      This valuable study addresses a central question in systems neuroscience (validation of active inference models of exploration) using a combination of behaviour, neuroimaging, and modelling. The data provided offers solid evidence that humans do perceive, choose and learn in a manner consistent with the essential ingredients of active inference, and that quantities that correlate with relevant parameters of this active inference scheme are encoded in different regions of the brain.

    2. Reviewer #1 (Public review):

      Summary:

      This paper presents a compelling and comprehensive study of decision-making under uncertainty. It addresses a fundamental distinction between belief-based (cognitive neuroscience) formulations of choice behavior and reward-based (behavioral psychology) accounts. Specifically, it asks whether active inference provides a better account of planning and decision making, relative to reinforcement learning. To do this, the authors use a simple but elegant paradigm that includes choices about whether to seek both information and rewards. They then assess the evidence for active inference and reinforcement learning models of choice behavior, respectively. After demonstrating that active inference provides a better explanation of behavioral responses, the neuronal correlates of epistemic and instrumental value (under an optimized active inference model) are characterized using EEG. Significant neuronal correlates of both kinds of value were found in sensor and source space. The source space correlates are then discussed sensibly, in relation to the existing literature on the functional anatomy of perceptual and instrumental decision-making under uncertainty.

      Comments on revisions:

      Many thanks for attending to my previous comments. I think your manuscript is now easier to read - and your new (Bayesian) analyses are described clearly.

    3. Reviewer #3 (Public review):

      Summary:

      This paper aims to investigate how the human brain represents different forms of value and uncertainty that participate in active inference within a free-energy framework, in a two-stage decision task involving contextual information sampling, and choices between safe and risky rewards, which promotes shifting between exploration and exploitation. They examine neural correlates by recording EEG and comparing activity in the first vs second half of trials and between trials in which subjects did and did not sample contextual information, and perform a regression with free-energy-related regressors against data "mapped to source space."

      Strengths:

      This two-stage paradigm is cleverly designed to incorporate several important processes of learning, exploration/exploitation and information sampling that pertain to active inference. Although scalp/brain regions showing sensitivity to the active-inference related quantities do not necessarily suggest what role they play, they are illuminating and useful as candidate regions for further investigation. The aims are ambitious, and the methodologies are impressive. The paper lays out an extensive introduction to the free energy principle and active inference to make the findings accessible to a broad readership.

      Weaknesses:

      It is worth noting that the high lower-cutoff of 1 Hz in the bandpass filter, included to reduce the impact of EEG noise, would remove from the EEG any sustained, iteratively updated representation that evolves with learning across trials, or choice-related processes that unfold slowly over the course of the 2-second task windows. It is thus possible there are additional processes related to the active inference quantities that are missed here. This is not a flaw as one must always try to balance noise removal against signal removal in filter settings - it is just a caveat. As the authors also note, the regions showing up as correlated with model parameters change depending on source modelling method and correction for multiple comparisons, warranting some caution around the localisation aspect.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      This paper presents a compelling and comprehensive study of decision-making under uncertainty. It addresses a fundamental distinction between belief-based (cognitive neuroscience) formulations of choice behavior and reward-based (behavioral psychology) accounts. Specifically, it asks whether active inference provides a better account of planning and decision making, relative to reinforcement learning. To do this, the authors use a simple but elegant paradigm that includes choices about whether to seek both information and rewards. They then assess the evidence for active inference and reinforcement learning models of choice behavior, respectively. After demonstrating that active inference provides a better explanation of behavioral responses, the neuronal correlates of epistemic and instrumental value (under an optimized active inference model) are characterized using EEG. Significant neuronal correlates of both kinds of value were found in sensor and source space. The source space correlates are then discussed sensibly, in relation to the existing literature on the functional anatomy of perceptual and instrumental decision-making under uncertainty.

      We are deeply grateful for your careful review of our work and your suggestions. Your insights have helped us identify areas where we can strengthen the arguments and clarify the methodology. We hope to apply the idea of active inference to our future work, emphasizing the integrity of perception and action.

      Reviewer #1 (Recommendations For The Authors):

      Many thanks for attending to my previous suggestions. I think your presentation is now much clearer and nicely aligned with the active inference literature.

      There is one outstanding issue. I think you have overinterpreted the two components of epistemic value in Equation 8. The two components that you have called the value of reducing risk and the value of reducing ambiguity are not consistent with the normal interpretation. These two components are KL divergences that measure the expected information gain about parameters and states respectively.

      If you read the Schwartenbeck et al paper carefully, you will see that the first (expected information gain about parameters) is usually called novelty, while the second (expected information gain about states) is usually called salience.

      This means you can replace "the value of reducing ambiguity" with "novelty" and "the value of reducing risk" with "salience".

      For your interest, "risk" and "ambiguity" are alternative ways of decomposing expected free energy. In other words, you can decompose expected free energy into (negative) expected information gain and expected value (as you have done). Alternatively, you can rearrange the terms and express expected free energy as risk and ambiguity. Look at the top panel of Figure 4 in:

      https://www.sciencedirect.com/science/article/pii/S0022249620300857

      I hope that this helps.
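
      The two arrangements of expected free energy described above can be sketched as follows. This is a hedged illustration in notation commonly used in the active inference literature, not a reproduction of the authors' Equation 8. It assumes a state-only model, so the parameter term (novelty) is omitted, and it assumes that $Q(s \mid o, \pi)$ approximates the true posterior and that $Q(o \mid s, \pi) = P(o \mid s)$. Here $\pi$ is a policy, $s$ hidden states, $o$ outcomes, and $P(s)$, $P(o)$ encode prior preferences.

```latex
\begin{align}
G(\pi) &= \mathbb{E}_{Q(o, s \mid \pi)}\big[\ln Q(s \mid \pi) - \ln P(o, s)\big]
  && \text{(definition)} \\
G(\pi) &\approx
  -\underbrace{\mathbb{E}_{Q(o \mid \pi)}\Big[ D_{\mathrm{KL}}\big[ Q(s \mid o, \pi) \,\|\, Q(s \mid \pi) \big] \Big]}_{\text{expected information gain about states (salience)}}
  -\underbrace{\mathbb{E}_{Q(o \mid \pi)}\big[ \ln P(o) \big]}_{\text{expected value}} \\
G(\pi) &=
  \underbrace{D_{\mathrm{KL}}\big[ Q(s \mid \pi) \,\|\, P(s) \big]}_{\text{risk}}
  +\underbrace{\mathbb{E}_{Q(s \mid \pi)}\big[ \mathrm{H}[P(o \mid s)] \big]}_{\text{ambiguity}}
\end{align}
```

      The second line follows from factorizing $P(o, s) = P(s \mid o)\,P(o)$ and the third from factorizing $P(o, s) = P(o \mid s)\,P(s)$, which is why the same quantity can be read either as information gain plus value or as risk plus ambiguity.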

      We deeply thank you for your recommendations about the interpretation of the epistemic value in Equation 8. We have now corrected them to Novelty and Salience:

      In addition, in order to avoid terminology conflicts with active inference and to describe these two different uncertainties, we replaced Ambiguity in the article with Novelty, referring to the uncertainty that can be reduced by sampling, and replaced Risk with Variability, referring to the uncertainty inherent in the environment (variance).

      Reviewer #2 (Public Review):

      Summary:

      Zhang and colleagues use a combination of behavioral, neural, and computational analyses to test an active inference model of exploration in a novel reinforcement learning task.

      Strengths:

      The paper addresses an important question (validation of active inference models of exploration). The combination of behavior, neuroimaging, and modeling is potentially powerful for answering this question.

      I appreciate the addition of details about model fitting, comparison, and recovery, as well as the change in some of the methods.

      We are deeply grateful for your careful review of our work and your suggestions. We are also very sorry that, in our last response, we did not address a few of your suggestions appropriately in our manuscript. We hope to respond to these suggestions well in this revision. Thank you for your contribution to ensuring the scientific rigor and reproducibility of the work.

      The authors do not cite what is probably the most relevant contextual bandit study, by Collins & Frank (2018, PNAS), which uses EEG.

      The authors cite Collins & Molinaro as a form of contextual bandit, but that's not the case (what they call "context" is just the choice set). They should look at the earlier work from Collins, starting with Collins & Frank (2012, EJN).

      We deeply thank you for your comments. We have now added the relevant citations to the manuscript (line 46):

      “These studies utilized different forms of multi-armed bandit tasks, e.g., the restless multi-armed bandit tasks (Daw et al., 2006; Guha et al., 2010), risky/safe bandit tasks (Tomov et al., 2020; Fan et al., 2022; Payzan et al., 2013), contextual multi-armed bandit tasks (Collins & Frank, 2018; Schulz et al., 2015; Collins & Frank, 2012)”

      Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876-879.

      Guha, S., Munagala, K., & Shi, P. (2010). Approximation algorithms for restless bandit problems. Journal of the ACM (JACM), 58(1), 1-50.

      Tomov, M. S., Truong, V. Q., Hundia, R. A., & Gershman, S. J. (2020). Dissociable neural correlates of uncertainty underlie different exploration strategies. Nature Communications, 11(1), 2371.

      Fan, H., Gershman, S. J., & Phelps, E. A. (2023). Trait somatic anxiety is associated with reduced directed exploration and underestimation of uncertainty. Nature Human Behaviour, 7(1), 102-113.

      Payzan-LeNestour, E., Dunne, S., Bossaerts, P., & O’Doherty, J. P. (2013). The neural representation of unexpected uncertainty during value-based decision making. Neuron, 79(1), 191-201.

      Collins, A. G., & Frank, M. J. (2018). Within-and across-trial dynamics of human EEG reveal cooperative interplay between reinforcement learning and working memory. Proceedings of the National Academy of Sciences, 115(10), 2502-2507.

      Schulz, E., Konstantinidis, E., & Speekenbrink, M. (2015, April). Exploration-exploitation in a contextual multi-armed bandit task. In International conference on cognitive modeling (pp. 118-123).

      Collins, A. G., & Frank, M. J. (2012). How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. European Journal of Neuroscience, 35(7), 1024-1035.

      Placing statistical information in a GitHub repository is not appropriate. This needs to be in the main text of the paper. I don't understand why the authors refer to space limitations; there are none for eLife, as far as I'm aware.

      We deeply thank you for your comments. We calculated the average t-value of the brain regions with significant results over the significant time, and added the t-value results to the main text and supplementary materials.

      In answer to my question about multiple comparisons, the authors have added the following: "Note that we did not attempt to correct for multiple comparisons; largely, because the correlations observed were sustained over considerable time periods, which would be almost impossible under the null hypothesis of no correlations." I'm sorry, but this does not make sense. Either the authors are doing multiple comparisons, in which case multiple comparison correction is relevant, or they are doing a single test on the extended timeseries, in which case they need to report that. There exist tools for this kind of analysis (e.g., Gershman et al., 2014, NeuroImage). I'm not suggesting that the authors should necessarily do this, only that their statistical approach should be coherent. As a reference point, the authors might look at the aforementioned Collins & Frank (2018) study.

      We deeply thank you for your comments. We have now replaced all our results with the results after false discovery rate correction and added relevant descriptions (lines 357-358):

      “The significant results after false discovery rate (FDR) (Benjamini et al., 1995, Gershman et al., 2014) correction were shown in shaded regions. Additional regression results can be found in Supplementary Materials.”

      Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57(1), 289-300.

      Gershman, S. J., Blei, D. M., Norman, K. A., & Sederberg, P. B. (2014). Decomposing spatiotemporal brain patterns into topographic latent sources. NeuroImage, 98, 91-102.

      After FDR correction, our results have changed slightly. We have updated our Results and Discussion section.

      It should be acknowledged that the changes in these results may reflect a certain degree of error in our data (perhaps because the EEG data are too noisy or because of the average template we used, ‘fsaverage’). Therefore, we added relevant discussion to the Discussion section (lines 527-529):

      “It should be acknowledged that our EEG-based regression results are somewhat unstable, and the brain regions with significant regression are inconsistent before and after FDR correction. In future work, we should collect more precise neural data to reduce this instability.”
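
      The false discovery rate correction referred to in this response can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure. This is not the authors' analysis code: the function name, the alpha level of 0.05, and the example p-values are all hypothetical, chosen only to show how a set of per-time-point p-values is thresholded.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected at FDR level alpha.

    Hypothetical helper for illustration; not taken from the authors' pipeline.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Reject every hypothesis ranked at or below largest_k,
    # even if its own p-value missed its own threshold.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_k:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values standing in for regression tests at eight time points.
    p_per_time_point = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
    print(benjamini_hochberg(p_per_time_point))
```

      With the made-up values above, only the two smallest p-values survive correction, although five of the eight fall below an uncorrected threshold of 0.05. A shift of this kind is one way that the set of significant regions can change after correction.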

      I asked the authors to show more descriptive comparison between the model and the data. Their response was that this is not possible, which I find odd given that they are able to use the model to define a probability distribution on choices. All I'm asking about here is to show predictive checks which build confidence in the model fit. The additional simulations do not address this. The authors refer to figures 3 and 4, but these do not show any direct comparison between human data and the model beyond model comparison metrics.

      We deeply thank you for your comments. We now compare the participants’ behavioral data and the model’s predictions trial by trial (Figure 5). We can clearly see the participants’ behavioral strategies in different states and trials and the model’s prediction accuracy. We have added the discussion related to Figure 5 (lines 309-318):

      “Figure 5 shows the comparison between the active inference model and the behavioral data, where we can see that the model can fit the participants’ behavioral strategies well. In the “Stay-Cue” choice, participants always tend to choose to ask the ranger and rarely choose not to ask. When the context was unknown, participants chose the “Safe” option or the “Risky” option very randomly, and they did not show any aversion to variability. When given “Context 1”, where the “Risky” option gave participants a high average reward, participants almost exclusively chose the “Risky” option, which provided more information in the early trials and was found to provide more rewards in the later rounds. When given “Context 2”, where the “Risky” option gave participants a low average reward, participants initially chose the “Risky” option and then tended to choose the “Safe” option. We can see that participants still occasionally chose the “Risky” option in the later trials of the experiment, which the model does not capture. This may be due to the influence of forgetting. Participants chose the “Risky” option again to establish an estimate of the reward distribution.”

      Reviewer #2 (Recommendations For The Authors):

      In the supplement, there are missing references ("[?]").

      Thank you very much for pointing this out. We have now fixed this error.

      Reviewer #3 (Public review):

      Summary:

      This paper aims to investigate how the human brain represents different forms of value and uncertainty that participate in active inference within a free-energy framework, in a two-stage decision task involving contextual information sampling, and choices between safe and risky rewards, which promotes shifting between exploration and exploitation. They examine neural correlates by recording EEG and comparing activity in the first vs second half of trials and between trials in which subjects did and did not sample contextual information, and perform a regression with free-energy-related regressors against data "mapped to source space."

      Strengths:

      This two-stage paradigm is cleverly designed to incorporate several important processes of learning, exploration/exploitation and information sampling that pertain to active inference. Although scalp/brain regions showing sensitivity to the active-inference related quantities do not necessarily suggest what role they play, they are illuminating and useful as candidate regions for further investigation. The aims are ambitious, and the methodologies impressive. The paper lays out an extensive introduction to the free energy principle and active inference to make the findings accessible to a broad readership.

      Weaknesses:

      In its revised form the paper is complete in providing the important details. Though not a serious weakness, it is important to note that the high lower-cutoff of 1 Hz in the bandpass filter, included to reduce the impact of EEG noise, would remove from the EEG any sustained, iteratively updated representation that evolves with learning across trials, or choice-related processes that unfold slowly over the course of the 2-second task windows.

      We are deeply grateful for your careful review of our work and your suggestions. We are very sorry that we did not modify our filter frequency (it would be a lot of work to modify it). Thank you very much for pointing this out. We noticed the shortcoming of the high lower-cutoff of 1 Hz in the bandpass filter. We will carefully consider the filter frequency when preprocessing data in future work. Thank you very much!

    1. eLife Assessment

      In this important study, Li and others identified cell membrane receptors for juvenile hormone (JH), a terpenoid hormone in insects that regulates their development and reproduction. While intracellular receptors for JH are well characterized, membrane receptors for JH have remained elusive for many years. The authors provide convincing evidence indicating that two receptor tyrosine kinases (RTKs), CAD96CA and FGFR1, modulate the genomic effects of JH by phosphorylating the intracellular receptors in the cotton bollworm, Helicoverpa armigera. Although differential functions of the two RTKs and potential effects of the other endogenous ligands of these RTKs on JH signaling remain unclear, this study lays a foundation for future studies.

    2. Reviewer #2 (Public review):

      Summary:

      Juvenile hormone (JH) is a pleiotropic terpenoid hormone in insects that mainly regulates their development and reproduction. In particular, its developmental functions are described as the "status quo" action, as its presence in the hemolymph (the insect blood) prevents metamorphosis-initiating effects of ecdysone, another important hormone in insect development, and maintains the juvenile status of insects.

      While such canonical functions of JH are known to be mediated by its intracellular receptor complex composed of Met and Tai, there have been multiple reports suggesting the presence of cell membrane receptor(s) for JH, which mediate non-genomic effects of this terpenoid hormone. In particular, the presence of receptor tyrosine kinases (RTKs) that phosphorylate Met/Tai in response to JH and thus indirectly affect the canonical JH signaling pathway has been strongly suggested. Given the importance of JH in insect physiology and the fact that the JH signaling pathway is a major target of insect growth regulators, elucidating the identity and functions of putative JH membrane receptors is of great significance from both basic and applied perspectives.

      In the present study, the authors identified candidate receptors for such cell membrane JH receptors, CAD96CA and FGFR1, in the cotton bollworm, Helicoverpa armigera.

      Strengths:

      Their in vitro analyses are conducted thoroughly using multiple methods, which overall support their claim that these receptors can bind to JH and mediate their non-genomic effects.

      Their CRISPR-Cas-mediated mutagenesis in vivo shows that mutation of the two RTKs causes acceleration of pupation, which is consistent with the mutant phenotype of the intracellular JH receptor, Met1. Although this is different from the typical phenotype one would expect from JH signaling deficiency in lepidopteran insects (i.e. precocious metamorphosis), the results overall support their claim that these two RTKs modulate genomic JH effects by phosphorylating the intracellular receptors.

      Weaknesses:

      Although their loss-of-function analyses suggest that the two RTKs likely have redundant functions in vivo, it is unclear whether they have any different functions in mediating JH functions in different physiological contexts. It also remains unknown whether other endogenous ligands for these RTKs affect canonical, genomic JH signaling in vivo.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      […] Weaknesses:

      Unfortunately, the revised manuscript does not show significant improvement. While the identification of the receptors is highly convincing, important issues about the biological relevance remain unaddressed. First, the main point I raised about the first version of this article is that the redundancy and/or specificity of the two receptors should be clarified, even though I understand that it cannot be deeply investigated here. I believe that this point, shared by all reviewers, is highly relevant for the scope of this work. In this revised version, it is still unclear how to reconcile gain and loss-of-function experiments and the different expression profiles of the receptors. Second, the newly added explanations and pieces of discussion provided about the mild in vivo phenotypes of early pupation upon Cad96ca or Fgfr1 knock-out do not clarify the issue but instead put emphasis on methodological issues. Indeed, it is not clear whether the mild phenotypes reflect the biological role of Cad96ca and Fgfr1, or the redundancy of these two RTKs (and/or others), or some issue with the knock-out strategy (partial efficiency, mosaicism...). Finally, parts of the updated discussion and the modifications to the figures are confusing.

      Thank you for raising these questions. We performed additional experiments using CRISPR/Cas9, including editing Met1 individually (single knockout), Cad96ca and Fgfr1 together (double knockout), and Met1, Cad96ca and Fgfr1 together (triple knockout). The results showed that single mutation of either Cad96ca or Fgfr1 caused precocious pupation. The double mutation of Cad96ca and Fgfr1 caused earlier pupation and death compared to the single mutation of Cad96ca or Fgfr1. The triple mutation of Met1, Cad96ca and Fgfr1 caused the most serious effect on pupation time and death. These data suggest that CAD96CA and FGFR1 can each transmit the JH signal to prevent pupation, both independently and cooperatively, and that JH exerts its complete regulatory role through both its cell membrane receptors and its intracellular receptor. We added the results to Lines 242-263 and the discussion to Lines 328-375.

      CAD96CA and FGFR1 have similar functions in JH signaling, including transmitting the JH signal for Kr-h1 expression, maintenance of larval status, rapid intracellular calcium increase, phosphorylation of the transcription factors MET1 and TAI, and high affinity for JH III. CAD96CA and FGFR1 are essential in the JH signaling pathway, and loss of function of either one is sufficient to trigger strong effects on pupation, suggesting that each can transmit the JH signal individually. The difference is that CAD96CA expression has no tissue specificity, whereas the Fgfr1 gene is highly expressed in the midgut. One possibility is that CAD96CA and FGFR1 act in tissues by forming homodimers, or heterodimers with each other or with other RTKs, which needs to be addressed in future studies. CAD96CA and FGFR1 transmit JH III signals in three different insect cell lines, suggesting their conserved roles in other insects.

      The mild phenotypes shown in the previous figure, Fig 4E, were counted from all surviving individuals injected with gRNA, including both mutated and non-mutated individuals. In fact, none of the mutants pupated on time. Following the first round of reviewers' comments, we recognized that it was inappropriate to count all surviving gRNA-injected individuals, so in the second version we replaced the figure with one that counts the phenotypes of successfully mutated individuals only, to avoid confusion about the phenotypes.

      Reviewer #2 (Public review):

      […] Weaknesses:

      Results of their in vivo experiments, particularly those of their loss-of-function analyses using CRISPR mutants are still preliminary, and the results rather indicate that these membrane receptors do not have any physiologically significant roles in vivo. More specifically, previous studies in lepidopteran species have clearly and repeatedly shown that precocious metamorphosis is the hallmark phenotype for all JH signaling-deficient larvae. In contrast, the present study showed that Cad96ca and Fgfr1 G0 mutants only showed slight acceleration in their pupation timing, which is not a typical phenotype one would expect from JH signaling deficiency. This is inconsistent with their working model provided in Figure 6, which indicates that these cell membrane JH receptors promote the canonical JH signaling by phosphorylating Met/Tai. If the authors argue that this slight acceleration of pupation is indeed a major JH signaling-deficient phenotype in Helicoverpa, they need to provide more data to support their claim by analyzing CRISPR mutants of other genes involved in JH signaling, such as Jhamt and Met. An alternative explanation is that there is functional redundancy between CAD96CA and FGFR1 in mediating phosphorylation of Met/Tai. This possibility can be tested by analyzing double knockouts of these two receptors. Currently, the validity of their calcium imaging analysis in Figure 5 is also questionable. When performing calcium imaging in cultured cells, it is critically important to treat all the cells at the end of each experiment with a hormone or other chemical reagents that universally induce calcium increase in each particular cell line. Without such positive control, the validity of calcium imaging data remains unknown, and readers cannot properly evaluate their results.

      Thank you for the comments. We took your suggestions and performed additional experiments, editing Met1 individually (single knockout), Cad96ca and Fgfr1 together (double knockout), and Met1, Cad96ca and Fgfr1 together (triple knockout) using CRISPR/Cas9. We added the results to Lines 242-263 and the discussion to Lines 328-375.

      About the calcium imaging in cultured cells (now Fig 6), our goal was to examine the roles of CAD96CA and FGFR1 in JH-triggered cellular responses. The experiment was well designed and controlled, and the results were validated. For example, JH III induced intracellular Ca²⁺ release and extracellular Ca²⁺ influx in Sf9 and S2 cells, but DMSO did not. Knockdown of Cad96ca and Fgfr1 significantly decreased JH III-induced intracellular Ca²⁺ release and extracellular Ca²⁺ influx (Figure 6A, B), and Kr-h1 expression (Figure 6—figure supplement 1A and B), suggesting that CAD96CA and FGFR1 have a general function in transmitting the JH signal in S. frugiperda and D. melanogaster.

      Wild-type mammalian HEK-293T cells showed no significant changes in calcium ion levels under JH III induction, because mammalian cells have no CAD96CA or FGFR1 (Figure 6C). However, when insect CAD96CA or FGFR1 was overexpressed in HEK-293T cells, JH III triggered rapid cytosolic Ca²⁺ release and influx (Figure 6D).

      An increase in Ca²⁺ was not detected in the CAD96CA-M3 and CAD96CA-M4 mutants under JH III induction (Figure 6E), nor in FGFR1-M4 (Figure 6F). These results confirmed that CAD96CA and FGFR1 play roles in transmitting the JH III signal.

      We carefully revised the description of the results and methods to help people understand the study.

      Reviewer #3 (Public review):

      […] Weaknesses:

      The authors have provided evidence that the Cad96Ca and FGFR1 RTK receptors contribute to JH signaling through CRISPR/Cas9, inducing precocious metamorphosis, although not to the same extent as absence of JH. Therefore, it still remains unclear whether these RTKs are completely required for pathway activation or only necessary for high activation levels during the last larval stage. While the authors have included some additional data, the mechanism by which different RTKs function in transducing JH signaling in a tissue specific manner is still unclear. As the authors note in the discussion, it is possible that other RTKs may also play a role in facilitating the transduction of JH signaling. Lastly, the study does not yet explain how RTKs with known ligands could also bind JH and contribute to JH signaling activation. Although receptor promiscuity has been suggested as a possible mechanism, future studies could explore whether activation of RTK pathways by their known ligands induces certain levels of JH transducer phosphorylation, which, in the presence of JH, could contribute to full pathway activation without the need for direct JH-RTK binding.

      Thank you for your comments. To address your questions, we carried out additional experiments. The relevant results have been incorporated into Lines 242-263, and the corresponding discussion has been added to Lines 328-375.

      We agree with your suggestion that future studies should resolve questions such as how different RTKs function in transducing JH signaling in a tissue-specific manner; whether other RTKs can transduce the JH signal; how RTKs with known ligands could also bind JH and contribute to JH signaling activation; and how the RTK pathways are activated by their ligands.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) First, some of the new paragraphs, repeatedly used in the point-by-point answer to the reviewers, are highly confusing and need proofreading (i.e. 225-230; 320-340)

      Thank you for your advice. We have carefully revised the manuscript and the point-by-point answer to avoid repetition.

      (2) While the double knock-down or knock-out of Cad96ca and Fgfr1 is expected to provide valuable information regarding their respective functions, the authors indicated that they wouldn't provide experiments in that direction. It is not clear to me if they have tried or not. The Crispr/Cas9 approach might be difficult to put in place to test this interaction. However, couldn't the authors try the double knock-down compared to single knock-downs using dsRNA? This method gave convincing results to test the role of the putative receptors in mediating JH-induced developmental delay in vivo (Figure 1).

      Thank you for your suggestion. We added experiments using CRISPR/Cas9 to edit Met1 individually (single knockout), Cad96ca and Fgfr1 together (double knockout), and Met1, Cad96ca and Fgfr1 together (triple knockout). The new evidence fully defined the physiological roles of these receptors in JH signaling in vivo. We added the results to Lines 242-263 and the discussion to Lines 328-375.

      (3) Concerning the effect of Crispr knock-out on pupation timing, this paragraph was added: "The low death rate after Cad96ca and Fgfr1 knockout might be because of following reasons, including the editing efficiency (67% and 61% for Cad96ca mutant and Fgfr1 mutant, respectively), the chimera of the gene knockout at the G0 generation, and the redundant RTKs that play similar roles in JH signaling". A similar explanation applies to the pupation phenotype itself... I am therefore wondering whether the Crispr/Cas9 approach (at the G0 generation) is the best strategy. Since the dsRNA knock-down gave efficient (and probably more reproducible) results according to Figure 1B-C, why not use the same approach for analyzing loss-of-function phenotypes?

      (4) Similarly, this new paragraph regarding the knock-out strategy by Crispr is problematic: "However, in the Cad96ca mutant, 86% of the larvae (an editing efficiency of 67% by TA clone analysis) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 24 h earlier. In the Fgfr1 mutant, 91% of the larvae (an editing efficiency of 61%) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 23 h earlier" (lines 225-230).

      - How does the editing efficiency relate to the mutation efficiency a few lines earlier (not clearly explained in the methods)? Were the animals homozygous or heterozygous for the mutations?
      - A shortened feeding stage can only be invoked if previous developmental transitions are unaffected. Such a statement should be supported by a better description of the developmental timing phenotype (as suggested already by reviewer 2).

      Thank you for your questions in (3) and (4). The editing efficiencies of 67% and 61% for Cad96ca and Fgfr1 in individuals were calculated from the PCR products, indicating that the mutants produced by CRISPR/Cas9 editing are mosaics. We moved this content to the Methods section and added detailed methods (Lines 705-717).

      We added to the discussion (Lines 367-380): "The phenotypes of gene mutation in H. armigera are somehow different from those obtained by homozygous mutation in other animals, due to the mosaic mutation by CRISPR/Cas9. In addition, RNAi of Cad96ca and Fgfr1 was observed precocious pupation as was the case in CRISPR/Cas9, suggesting the RNAi can be used for the study of gene function in insect, especially when the gene editing is embryonic lethal".

      We removed the improper description of the phenotypes in the results, such as that of the feeding stage. We also added experiments editing Met1 individually (single knockout), Cad96ca and Fgfr1 together (double knockout), and Met1, Cad96ca and Fgfr1 together (triple knockout) to define the physiological roles of these receptors in JH signaling in vivo.

      (5) Importantly, I don't understand where the new version of Figure 4E stems from. The "pupation on time" (blue) category present in the first version of the figure has now disappeared for mutant animals. Why? In the first version, my understanding was that, among the mutant animals, around 50% had precocious pupation. In the new version of Figure 4E, the "pupation on time" category is missing, and the percentages of early pupation are therefore strongly increased... The explanations provided in the text are not clear regarding the reanalysis of the mutant phenotypes. In the first version of the manuscript, the following explanation was given: "In 61 survivors of Cas9 protein and Cad96ca-gRNA injection, 30 mutants were identified by the earlier pupation and sequencing (an editing efficiency of 49.2%)". Were all animals sequenced, or only the 30 displaying earlier pupation? Were the 31 others not sequenced or did they have no mutation? Could it be, as suggested by the first version of the figure, that some mutant animals did not display early pupation? It was indeed stated in the text that: "CRISPR/Cas9 editing by Cad96ca-gRNA or Fgfr1-gRNA injection resulted in earlier pupation (Figure 4D) for about 23-24 h by comparison with normal pupation in 46% and 54% of larvae, respectively". This new version of the figure should be explained.

      Thank you for the reminder. The "pupation on time" phenotype appeared in the first version because we counted the phenotypes of all surviving individuals injected with gRNA, that is, the survivors in Figure 4C, which include both mutated and non-mutated individuals. Following the comments from the first round of review, we realized that it was inappropriate to count all surviving gRNA-injected individuals, since the pupation-on-time phenotype does not occur in the mutants. Therefore, in the second version, we replaced the figure, counting only the phenotypes of the successfully mutated individuals, namely the mutants in Figure 4C.

    1. eLife Assessment

      In this valuable study, Li and others identified cell membrane receptors for juvenile hormone (JH), a terpenoid hormone in insects important for their development and reproduction. While intracellular receptors for JH have been well characterized, membrane receptors for JH remained elusive for many years. Although the authors provide solid evidence to indicate that the receptor tyrosine kinases they identified bind to JH in vitro and induce non-genomic responses in cultured cells, their loss-of-function phenotypes are not consistent with known JH functions, so additional work is required to define physiological roles of these receptors.

    2. Reviewer #1 (Public review):

      Summary:

      Juvenile Hormone (JH) plays a key role in insect development and physiology. Although the intracellular receptor for JH was identified long ago, a number of studies have shown that part of JH functions should be fulfilled through binding to an unknown membrane receptor, which was proposed to belong to the RTK family. In this study, the authors screened all RTKs from the H. armigera genome for their ability to mediate responses to JH III treatment both in cultured cells and in developing animals. They also present convincing evidence that CAD96CA and FGFR1 directly bind JH III, and that their role might be conserved in other insect species.

      Strengths:

      Altogether, the experimental approach is very complete and elegant, providing evidence for the role of CAD96CA and FGFR1 in JH signalling using different techniques and in different contexts. I believe that this work will open new perspectives to study the role of JH and better understand what is the contribution of signalling through membrane receptors for JH-dependent developmental processes.

      Weaknesses:

      Unfortunately, the revised manuscript does not show significant improvement. While the identification of the receptors is highly convincing, important issues about the biological relevance remain unaddressed.

      First, the main point I raised about the first version of this article is that the redundancy and/or specificity of the two receptors should be clarified, even though I understand that it cannot be deeply investigated here. I believe that this point, shared by all reviewers, is highly relevant for the scope of this work. In this revised version, it is still unclear how to reconcile gain and loss-of-function experiments and the different expression profiles of the receptors.

      Second, the newly added explanations and pieces of discussion provided about the mild in vivo phenotypes of early pupation upon Cad96ca or Fgfr1 knock-out do not clarify the issue but instead put emphasis on methodological issues. Indeed, it is not clear whether the mild phenotypes reflect the biological role of Cad96ca and Fgfr1, or the redundancy of these two RTKs (and/or others), or some issue with the knock-out strategy (partial efficiency, mosaicism...).

      Finally, parts of the updated discussion and the modifications to the figures are confusing.

    3. Reviewer #2 (Public review):

      Summary:

      Juvenile hormone (JH) is a pleiotropic terpenoid hormone in insects that mainly regulates their development and reproduction. In particular, its developmental functions are described as the "status quo" action, as its presence in the hemolymph (the insect blood) prevents metamorphosis-initiating effects of ecdysone, another important hormone in insect development, and maintains the juvenile status of insects.

      While such canonical functions of JH are known to be mediated by its intracellular receptor complex composed of Met and Tai, there have been multiple reports suggesting the presence of cell membrane receptor(s) for JH, which mediate non-genomic effects of this terpenoid hormone. In particular, the presence of receptor tyrosine kinase(s) that phosphorylate Met/Tai in response to JH and thus indirectly affect the canonical JH signaling pathway has been strongly suggested. Given the importance of JH in insect physiology and the fact that the JH signaling pathway is a major target of insect growth regulators, elucidating the identity and functions of putative JH membrane receptors is of great significance from both basic and applied perspectives.

      In the present study, the authors identified candidate receptors for such cell membrane JH receptors, CAD96CA and FGFR1, in the cotton bollworm Helicoverpa armigera.

      Strengths:

      Their in vitro analyses are conducted thoroughly using multiple methods, which overall supports their claim that these receptors can bind to JH and mediate their non-genomic effects.

      Weaknesses:

      Results of their in vivo experiments, particularly those of their loss-of-function analyses using CRISPR mutants are still preliminary, and the results rather indicate that these membrane receptors do not have any physiologically significant roles in vivo. More specifically, previous studies in lepidopteran species have clearly and repeatedly shown that precocious metamorphosis is the hallmark phenotype for all JH signaling-deficient larvae. In contrast, the present study showed that Cad96ca and Fgfr1 G0 mutants only showed slight acceleration in their pupation timing, which is not a typical phenotype one would expect from JH signaling deficiency. This is inconsistent with their working model provided in Figure 6, which indicates that these cell membrane JH receptors promote the canonical JH signaling by phosphorylating Met/Tai.

      If the authors argue that this slight acceleration of pupation is indeed a major JH signaling-deficient phenotype in Helicoverpa, they need to provide more data to support their claim by analyzing CRISPR mutants of other genes involved in JH signaling, such as Jhamt and Met. An alternative explanation is that there is functional redundancy between CAD96CA and FGFR1 in mediating phosphorylation of Met/Tai. This possibility can be tested by analyzing double knockouts of these two receptors.

      Currently, the validity of their calcium imaging analysis in Figure 5 is also questionable. When performing calcium imaging in cultured cells, it is critically important to treat all the cells at the end of each experiment with a hormone or other chemical reagents that universally induce calcium increase in each particular cell line. Without such positive control, the validity of calcium imaging data remains unknown, and readers cannot properly evaluate their results.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Li et al. identified CAD96CA and FGF1 among 20 receptor tyrosine kinase receptors as mediators of JH signaling. By performing a screen in HaEpi cells with overactivated JH signaling, the authors pinpointed two main RTKs that contribute to the transduction of JH. Using the CRISPR/Cas9 system to generate mutants, the authors confirmed that these RTKs are required for normal JH activation, as precocious pupariation was observed in their absence. Additionally, the authors demonstrated that both CAD96CA and FGF1 exhibit a high affinity for JH, and their activation is necessary for the proper phosphorylation of Tai and Met, transcription factors that promote the transcriptional response. Finally, the authors provided evidence suggesting that the function of CAD96CA and FGF1 as JH receptors is conserved across insects.

      Strengths:

      The data provided by the authors are convincing and support the main conclusions of the study, providing ample evidence to demonstrate that phosphorylation of the transducers Met and Tai mainly depends on the activity of two RTKs. Additionally, the binding assays conducted by the authors support the function of CAD96CA and FGF1 as membrane receptors of JH. The study's results validate, at least in H. armigera, the predicted existence of membrane receptors for JH.

      Weaknesses:

      The authors have provided evidence that the Cad96Ca and FGF1 RTK receptors contribute to JH signaling through CRISPR/Cas9, inducing precocious metamorphosis, although not to the same extent as absence of JH. Therefore, it still remains unclear whether these RTKs are completely required for pathway activation or only necessary for high activation levels during the last larval stage.

      While the authors have included some additional data, the mechanism by which different RTKs function in transducing JH signaling in a tissue-specific manner is still unclear. As the authors note in the discussion, it is possible that other RTKs may also play a role in facilitating the transduction of JH signaling.

      Lastly, the study does not yet explain how RTKs with known ligands could also bind JH and contribute to JH signaling activation. Although receptor promiscuity has been suggested as a possible mechanism, future studies could explore whether activation of RTK pathways by their known ligands induces certain levels of JH transducer phosphorylation, which, in the presence of JH, could contribute to full pathway activation without the need for direct JH-RTK binding.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Juvenile Hormone (JH) plays a key role in insect development and physiology. Although the intracellular receptor for JH was identified long ago, a number of studies have shown that part of JH functions should be fulfilled through binding to an unknown membrane receptor, which was proposed to belong to the RTK family. In this study, the authors screened all RTKs from the H. armigera genome for their ability to mediate responses to JH III treatment both in cultured cells and in developing animals. They also present convincing evidence that CAD96CA and FGFR1 directly bind JH III, and that their role might be conserved in other insect species.

      Strengths:

      Altogether, the experimental approach is very complete and elegant, providing evidence for the role of CAD96CA and FGFR1 in JH signalling using different techniques and in different contexts. I believe that this work will open new perspectives to study the role of JH and better understand what is the contribution of signalling through membrane receptors for JH-dependent developmental processes.

      Weaknesses:

      I don't see major weaknesses in this study. However, I think that the manuscript would benefit from further information or discussion regarding the relationship between the two newly identified receptors. Experiments (especially in HEK-293T cells) suggest that CAD96CA and FGFR1 are sufficient on their own to transduce JH signalling. However, they are also necessary since loss-of-function conditions for each of them are sufficient to trigger strong effects (while the other is supposed to be still present).

      Thank you for the suggestion. We have added the discussion in the text: "CAD96CA and FGFR1 have similar functions in JH signaling, including transmitting JH signal for Kr-h1 expression, larval status maintaining, rapid intracellular calcium increase, phosphorylation of transcription factors MET1 and TAI, and high affinity to JH III. CAD96CA and FGFR1 are essential in the JH signal pathway, and loss-of-function for each is sufficient to trigger strong effects on pupation. The difference is that CAD96CA expression has no tissue specificity, and the Fgfr1 gene is highly expressed in the midgut; possibly, it plays a significant role in the midgut. Other possibility is that they play roles by forming heterodimer with each other or other RTKs, which needs to be addressed in future study. CAD96CA and FGFR1 transmit JH III signals in three different insect cell lines, suggesting their conserved roles in other insects.".

      In addition, despite showing different expression patterns, the two receptors seem to display similar developmental functions according to loss-of-function phenotypes. It is therefore unclear how to draw a model for membrane receptor-mediated JH signalling that includes both CAD96CA and FGFR1.

      Thank you for your question. We have modified the figure and the legends to make the concept clear.

      Reviewer #2 (Public Review):

      Summary:

      Juvenile hormone (JH) is a pleiotropic terpenoid hormone in insects that mainly regulates their development and reproduction. In particular, its developmental functions are described as the "status quo" action, as its presence in the hemolymph (the insect blood) prevents metamorphosis-initiating effects of ecdysone, another important hormone in insect development, and maintains the juvenile status of insects. While such canonical functions of JH are known to be mediated by its intracellular receptor complex composed of Met and Tai, there have been multiple reports suggesting the presence of cell membrane receptor(s) for JH, which mediate non-genomic effects of this terpenoid hormone. In particular, the presence of receptor tyrosine kinase(s) that phosphorylate Met/Tai in response to JH and thus indirectly affect the canonical JH signaling pathway has been strongly suggested. Given the importance of JH in insect physiology and the fact that the JH signaling pathway is a major target of insect growth regulators, elucidating the identification and functions of putative JH membrane receptors is of great significance from both basic and applied perspectives. In the present study, the authors identified candidate receptors for such cell membrane JH receptors, CAD96CA and FGFR1, in the cotton bollworm Helicoverpa armigera.

      Strengths:

      Their in vitro analyses are conducted thoroughly using multiple methods, which overall supports their claim that these receptors can bind to JH and mediate their non-genomic effects.

      Weaknesses:

      Results of their in vivo experiments, particularly those of their loss-of-function analyses using CRISPR mutants are still preliminary, and the results rather indicate that these membrane receptors do not have any physiologically significant roles in vivo. More specifically, previous studies in lepidopteran species have clearly and repeatedly shown that precocious metamorphosis is the hallmark phenotype for all JH signaling-deficient larvae. In contrast, the present study showed that Cad96ca and Fgfr1 G0 mutants only showed a slight acceleration in their pupation timing, which is not a typical phenotype one would expect from JH signaling deficiency. This is inconsistent with their working model provided in Figure 6, which indicates that these cell membrane JH receptors promote the canonical JH signaling by phosphorylating Met/Tai.

      If the authors argue that this slight acceleration of pupation is indeed a major JH signaling-deficient phenotype in Helicoverpa, they need to provide more data to support their claim by analyzing CRISPR mutants of other genes involved in JH signaling, such as Jhamt and Met. An alternative explanation is that there is functional redundancy between CAD96CA and FGFR1 in mediating phosphorylation of Met/Tai. This possibility can be tested by analyzing double knockouts of these two receptors.

      Thank you for your question and suggestion. Cadherin 96ca (CAD96CA) and fibroblast growth factor receptor 1 (FGFR1) were finally determined to be JH cell membrane receptors on the basis of their roles in JH-regulated gene expression, maintenance of larval status, the JH-induced rapid increase of intracellular calcium levels, JH-induced phosphorylation of MET and TAI, and their JH-binding affinity. Their roles as JH cell membrane receptors were further confirmed by knockdown and knockout in vivo and in cell lines, and by heterologous overexpression in mammalian HEK-293T cells. Figure 6 was drafted on the basis of this solid evidence.

      The slight acceleration of pupation in Cad96ca and Fgfr1 G0 mutants is one type of evidence of JH signaling deficiency. Other evidence includes the altered expression of a set of genes and the blockage of the JH-induced rapid intracellular calcium increase.

      Kr-h1 is a typical indicator gene downstream of Jhamt in JH signaling, so we used it as an indicator to examine JH signaling. Jhamt, Met, or other genes might be affected in Cad96ca and Fgfr1 G0 mutants, which can be examined in a future study.

      We have discussed the question that Cad96ca and Fgfr1 G0 mutants only showed a slight acceleration in their pupation timing: "Homozygous Cad96ca null Drosophila die at late pupal stages (Wang et al., 2009). However, we found that 86% of the larvae of the Cad96ca mutant successfully pupated in G0 generation, although earlier than the control. Similarly, null mutation of Fgfr1 or Fgfr2 in mouse is embryonic lethal (Arman et al., 1998; Deng et al., 1994; Yamaguchi et al., 1994). In D. melanogaster, homozygous Htl (Fgfr) mutant embryos die during late embryogenesis, too (Beati et al., 2020; Beiman et al., 1996; Gisselbrecht et al., 1996). However, in H. armigera, 91% of larvae successfully pupated in G0 generation after Fgfr1 knockout. The low death rate after Cad96ca and Fgfr1 knockout might be because of following reasons, including the editing efficiency (67% and 61% for Cad96ca mutant and Fgfr1 mutant, respectively), the chimera of the gene knockout at the G0 generation, and the redundant RTKs that play similar roles in JH signaling, similar to the redundant roles of MET and Germ-cell expressed bHLH-PAS (GCE) in JH signaling (Liu et al., 2009), which needs to obtain alive G1 homozygote mutants and double knockout of these two receptors in future study. We indeed observed that the eggs did not hatch successfully after mixed-mating of G0 Cad96ca mutant or Fgfr1 mutant, respectively, but the reason was not addressed further due to the embryonic death. By the similar reasons, most of the Cad96ca and Fgfr1 mutants showed a slight acceleration of pupation (about one day) without the typical precocious metamorphosis (at least one instar earlier) phenotype caused by JH signaling defects (Daimon et al., 2012; Fukuda, 1944; Riddiford et al., 2010) and JH pathway gene deletions (Abdou et al., 2011; Liu et al., 2009). On the other hand, JH can regulate gene transcription by diffusing into cells and binding to the intracellular receptor MET to conduct JH signal, which might affect the results of gene knockdown and knockout.".

      Currently, the validity of their calcium imaging analysis in Figure 5 is also questionable. When performing calcium imaging in cultured cells, it is critically important to treat all the cells at the end of each experiment with a hormone or other chemical reagents that universally induce calcium increase in each particular cell line. Without such positive control, the validity of calcium imaging data remains unknown, and readers cannot properly evaluate their results.

      Thank you for your question. For Figure 5, our goal was to demonstrate that JH can induce calcium mobilization through CAD96CA and FGFR1. Controls have been established between different experimental groups within the same cell, as well as between different cells. Adding a positive-control group would make the results more complex.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Li et al. identified CAD96CA and FGF1 among 20 receptor tyrosine kinase receptors as mediators of JH signaling. By performing a screen in HaEpi cells with overactivated JH signaling, the authors pinpointed two main RTKs that contribute to the transduction of JH. Using the CRISPR/Cas9 system to generate mutants, the authors confirmed that these RTKs are required for normal JH activation, as precocious pupariation was observed in their absence. Additionally, the authors demonstrated that both CAD96CA and FGF1 exhibit a high affinity for JH, and their activation is necessary for the proper phosphorylation of Tai and Met, transcription factors that promote the transcriptional response. Finally, the authors provided evidence suggesting that the function of CAD96CA and FGF1 as JH receptors is conserved across insects.

      Strengths:

      The data provided by the authors are convincing and support the main conclusions of the study, providing ample evidence to demonstrate that phosphorylation of the transducers Met and Tai mainly depends on the activity of two RTKs. Additionally, the binding assays conducted by the authors support the function of CAD96CA and FGF1 as membrane receptors of JH. The study's results validate, at least in H. armigera, the predicted existence of membrane receptors for JH.

      Weaknesses:

      The study has several weaknesses that need to be addressed. Firstly, it is not clear what criteria were used by the authors to discard several other RTKs that were identified as repressors of JH signaling. For example, while NRK and Wsck may not fulfill all the requirements to become JH receptors, other evidence, such as depletion analysis and target gene expression, suggests they are involved in proper JH signaling activation.

      Thank you for your question. We screened the RTKs sequentially: we first examined the roles of the 20 RTKs identified in the H. armigera genome in JH-regulated gene expression to obtain primary candidates, and then screened these candidates by their roles in maintaining larval status, the JH-induced rapid increase of intracellular calcium levels, JH-induced phosphorylation of MET and TAI, and affinity to JH. WSCK was not involved in the phosphorylation of MET and TAI and was discarded during subsequent screening. NRK did not bind JH III, so it did not meet the screening criteria and was discarded.

      We added this information to the Introduction: "We screened the RTKs sequentially, including examining the roles of 20 RTKs identified in the H. armigera genome in JH regulated-gene expression to obtain primary candidates, followed by screening of the candidates by their roles in maintaining larval status, JH induced-rapid increase of intracellular calcium levels, JH induced-phosphorylation of MET and TAI, and affinity to JH. The cadherin 96ca (CAD96CA) and fibroblast growth factor receptor 1 (FGFR1) were finally determined as JH cell membrane receptors by their roles in JH regulated-gene expression, maintaining larval status, JH induced-rapid increase of intracellular calcium levels, JH induced-phosphorylation of MET and TAI, and their JH-binding affinity. Their roles as JH cell membrane receptors were further determined by knockdown and knockout of them in vivo and cell lines, and overexpression of them in mammal HEK-293T heterogeneously.".

      We added to the discussion: "This study found six RTKs that respond to JH induction by participating in JH induced-gene expression and intracellular calcium increase, however; they exert different functions in JH signaling, and finally CAD96CA and FGFR1 are determined as JH cell membrane receptors by their roles in JH induced-phosphorylation of MET and TAI and binding to JH III. We screen the RTKs transmitting JH signal primarily by examining some of JH induced-gene expression. By examining other genes or by other strategies to screen the RTKs might find new RTKs functioning as JH cell membrane receptors; however, the key evaluation indicators, such as the binding affinity of the RTKs to JH and the function in transmitting JH signal to maintain larval status are essential.".

      Secondly, the expression of the six RTKs, which, when knocked down, were able to revert JH signaling activation, was mainly detected in the last larval stage of H. armigera. However, since JH signaling is active throughout larval development, it is unclear whether these RTKs are completely required for pathway activation or only needed for high activation levels at the last larval stage.

      Thank you for the question. We knocked down the genes at the last larval stage to observe pupation, which is a relatively simple and easily observed readout for examining the role of a gene in JH-maintained larval status. The results from the CRISPR/Cas9 experiments showed: "Most wild-type larvae showed a phenotype of pupation on time. However, in the Cad96ca mutant, 86% of the larvae (an editing efficiency of 67% by TA clone analysis) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 24 h earlier. In the Fgfr1 mutant, 91% of the larvae (an editing efficiency of 61%) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 23 h earlier (Figure 4D and E). The data suggested that CAD96CA and FGFR1 support larval growth and prevent pupation in vivo.".

      Additionally, the mechanism by which different RTKs exert their functions in a specific manner is not clear. According to the expression profile of the different RTKs, one might expect some redundant role of those receptors. In fact, the lack of reversion of phosphorylation of Tai and Met upon depletion of Wsck in cells with overactivated JH signalling seems to support this idea.

      Nevertheless, and despite the overlapping expression of the different receptors, all RTKs seem to be required for proper pathway activation, even in the case of FGF1 which seems to be only expressed in the midgut. This is an intriguing point unresolved in the study.

      Thank you for your comments. Yes, according to our study, different RTKs exert their functions in a specific manner. We have expanded the discussion: "This study found six RTKs that respond to JH induction by participating in JH-induced gene expression and intracellular calcium increase; however, they exert different functions in JH signaling, and finally CAD96CA and FGFR1 are determined as JH cell membrane receptors by their roles in JH-induced phosphorylation of MET and TAI and binding to JH III. We screened the RTKs transmitting JH signal primarily by examining some JH-induced gene expression. Examining other genes or using other strategies to screen the RTKs might find new RTKs functioning as JH cell membrane receptors; however, the key evaluation indicators, such as the binding affinity of the RTKs to JH and the function in transmitting JH signal to maintain larval status, are essential."

      Finally, the study does not explain how RTKs with known ligands could also bind JH and contribute to JH signaling activation. In Drosophila, FGF1 is activated by pyramus and thisbe for mesoderm development, while CAD96CA is activated by collagen during wound healing. Now the authors claim that in addition to these ligands, the receptors also bind to JH. However, it is unclear whether these RTKs are activated by JH independently of their known ligands, suggesting a specific binding site for JH, or if they are only induced by JH activation when those ligands are present in a synergistic manner. Alternatively, activation of the RTK pathways by their known ligands may induce certain levels of JH transducer phosphorylation, which, in the presence of JH, contributes to full pathway activation without JH-RTK binding being necessary.

      Thank you for your professional questions. It is exciting and challenging to explore the molecular mechanism by which multiple ligands transmit signals through the same receptor. It requires a long-term research plan and in-depth studies. We added discussion in the text: "CAD96CA (also known as Stitcher, Ret-like receptor tyrosine kinase) activates upon epidermal wounding in Drosophila embryos (Tsarouhas et al., 2014) and promotes growth and suppresses autophagy in the Drosophila epithelial imaginal wing discs (O'Farrell et al., 2013). There is a CAD96CA in the genome of H. armigera, which has not been functionally studied. Here, we reported that CAD96CA prevents pupation by transmitting JH signal as a JH cell membrane receptor. We also showed that CAD96CA of other insects has a universal function of transmitting JH signal to trigger Ca2+ mobilization, as demonstrated by the study in Sf9 cell lines of S. frugiperda and S2 cell lines of D. melanogaster.

      FGFRs control cell migration and differentiation in the developing embryo of D. melanogaster (Muha and Muller, 2013). The ligand of FGFR is FGF in D. melanogaster (Du et al., 2018). FGF binds FGFR and triggers cell proliferation, differentiation, migration, and survival (Beenken and Mohammadi, 2009; Lemmon and Schlessinger, 2010). Three FGF ligands and two FGF receptors (FGFRs) are identified in Drosophila (Huang and Stern, 2005). The Drosophila FGF-FGFR interaction is specific. Different ligands have different functions. The activation of FGFRs by specific ligands can affect specific biological processes (Kadam et al., 2009). The FGFR in the membrane of Sf9 cells can bind to Vip3Aa (Jiang et al., 2018). One FGF and one FGFR are in the H. armigera genome, which has yet to be studied functionally. The study found that FGFR prevents insect pupation by transmitting JH signal as a JH cell membrane receptor. Exploring the molecular mechanism and output by which multiple ligands transmit signals through the same receptor is exciting and challenging."

      Reviewer #1 (Recommendations For The Authors):

      As an experimental suggestion, I will only propose that authors test the double knock-down/knock-out or overexpression of CAD96CA and FGFR1 to give some hints into how redundant/independent the two receptors are.

      Thank you very much for your professional advice. We agree that a double knockout of CAD96CA and FGFR1 is very important for resolving the redundancy or independence of the two receptors, and that it would make our research more complete. Unfortunately, due to experimental difficulty and time constraints, we did not provide supplementary experiments. In this study, we aimed to screen for the cell membrane receptors of JH and therefore focused on which RTKs can function as receptors. This article is a preliminary study to identify the cell membrane receptors of JH. To further understand the relationship between the two membrane receptors, we will conduct in-depth research in future work.

      Apart from that, here are some minor points about the manuscript:

      Figure 2A: changing the scale on the y-axis would help to better see the different genotypes (similar to the way it is presented in Figure 5).

      Thank you for the reminder. We have changed the scale in Figure 2A.

      Figure 4J: image settings could be improved to better highlight the green fluorescence.

      Thank you for your advice. We have improved the image in Figure 4J.

      In general, the manuscript would benefit from some proofreading since a number of sentences are incorrect.

      Thank you for the reminder. We have carefully revised the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) Although the authors note that there are 21 RTK genes in Drosophila (line 55), I can only see 16 Drosophila RTKs in Figure 1 - Figure Supplement 1. Some important Drosophila RTKs such as breathless are missing. The authors need to redraw the phylogenetic tree.

      Thank you for the reminder. We have presented the new phylogenetic tree in Figure 1-figure supplement 1.

      (2) The accelerated pupation phenotype in Cad96ca and Fgfr1 G0 mutants needs to be better described. In particular, it is critical to examine which developmental stage(s) are shortened in these mutant larvae. Refer to a similar study on a JH biosynthetic enzyme in Bombyx (PMID: 22412378) regarding how to describe the developmental timing phenotype.

      Thank you for your advice. We have redrawn Figure 4E and added the explanation in the text: "In 61 survivors of Cas9 protein plus Cad96ca-gRNA injection, 30 mutants were sequenced, giving a mutation efficiency of 49.2%. Similarly, in the 65 survivors of Cas9 protein plus Fgfr1-gRNA injection, 35 mutants were sequenced, giving a mutation efficiency of 53.8% (Figure 4C). The DNA sequences, deduced amino acids and off–target were analyzed (Figure 4—figure supplement 1). Most wild-type larvae showed a phenotype of pupation on time. However, in the Cad96ca mutant, 86% of the larvae (an editing efficiency of 67% by TA clone analysis) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 24 h earlier. In the Fgfr1 mutant, 91% of the larvae (an editing efficiency of 61%) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 23 h earlier (Figure 4D and E). The data suggested that CAD96CA and FGFR1 support larval growth and prevent pupation in vivo."

      (3) The editing efficiency described in lines 211-213 is obscure. Does this indicate the percentage of animals with noisy sequencing spectra or the percentage of mutation rates analyzed by TA cloning?

      Thank you for the reminder. We have revised the description in the text: "In 61 survivors of Cas9 protein plus Cad96ca-gRNA injection, 30 mutants were sequenced, giving a mutation efficiency of 49.2%. Similarly, in the 65 survivors of Cas9 protein plus Fgfr1-gRNA injection, 35 mutants were sequenced, giving a mutation efficiency of 53.8% (Figure 4C). The DNA sequences, deduced amino acids and off–target were analyzed (Figure 4—figure supplement 1). Most wild-type larvae showed a phenotype of pupation on time. However, in the Cad96ca mutant, 86% of the larvae (an editing efficiency of 67% by TA clone analysis) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 24 h earlier. In the Fgfr1 mutant, 91% of the larvae (an editing efficiency of 61%) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 23 h earlier (Figure 4D and E). The data suggested that CAD96CA and FGFR1 support larval growth and prevent pupation in vivo."
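
      For readers checking the figures quoted above, the following minimal sketch reproduces the two mutation-efficiency percentages. It assumes that "mutation efficiency" means the number of sequenced mutants divided by the number of injected survivors, which is consistent with the quoted counts; the function name is hypothetical and not taken from the manuscript. The 67% and 61% editing efficiencies from TA clone analysis are a separate measure and are not derived here.

```python
def mutation_efficiency(mutants: int, survivors: int) -> float:
    """Return mutants as a percentage of survivors, rounded to one decimal.

    Assumption (not stated explicitly in the source): efficiency is the
    share of injected survivors confirmed as mutants by sequencing.
    """
    if survivors <= 0:
        raise ValueError("survivors must be a positive integer")
    if not 0 <= mutants <= survivors:
        raise ValueError("mutants must be between 0 and survivors")
    return round(100.0 * mutants / survivors, 1)


# Counts quoted in the author response
cad96ca = mutation_efficiency(30, 61)  # Cad96ca-gRNA injection
fgfr1 = mutation_efficiency(35, 65)    # Fgfr1-gRNA injection
print(f"Cad96ca: {cad96ca}%  Fgfr1: {fgfr1}%")
```

      Under this assumption, the computed values agree with the 49.2% and 53.8% reported in the revised text.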

      (4) In Figures 4F and G, the authors examined expression levels of some JH/ecdysone responsive genes only at 0 hr-old 6th instar larvae. This single developmental stage is not enough for this analysis. In particular, the expression level of Fgfr1 only goes up in the mid-6th instar according to their own data (Figure 1-Figure Supplement 4), so it is critical to examine expression levels of these genes at least throughout the 6th larval instar.

      Thank you for your advice. Indeed, it is essential to detect the expression levels of JH/ecdysone response genes throughout the sixth instar. Because we observed that the mutants have a shorter feeding stage in the sixth instar, we examined the expression levels of the JH/ecdysone response genes at the early sixth instar. Because the number of mutants obtained in the experiment was small and non-destructive sampling could not be performed during the sixth instar, there were not enough samples to test. In the future, we will generate Cad96ca Fgfr1 double mutants to carry out these studies and detect the expression levels of JH/ecdysone response genes throughout the sixth instar.

      (5) As mentioned above, some important Drosophila RTKs such as breathless are missing in their analyses. As breathless is a close paralog of heartless (Htl), I am sure that Drosophila breathless is also orthologous to Helicoverpa FGFR1. The authors therefore need to analyze breathless in Figure 5B in addition to Htl.

      Thank you for your advice. We added experiments and the results are shown in Figure 5B and Figure 5—figure supplement 1.

      (6) More discussion about the reason why dsNrk and dsWsck can provide resistance to JHIII in Figure 1 is required.

      Thank you for your advice. We added an explanation in the discussion: "It is generally believed that the primary role of JH is to antagonize 20E during larval molting (Riddiford, 2008). The knockdown of Cad96ca, Nrk, Fgfr1, and Wsck showed phenotypes resistant to JH III induction and the decrease of Kr-h1 and increase of Br-z7 expression, but knockdown of Vegfr and Drl only decreased Kr-h1, without an increase of Br-z7. Br-z7 is involved in 20E-induced metamorphosis in H. armigera (Cai et al., 2014), whereas Kr-h1 is a JH early response gene that mediates JH action (Minakuchi et al., 2009) and represses Br expression (Riddiford et al., 2010). The high expression of Br-z7 is possibly due to the down-regulation of Kr-h1 in Cad96ca, Nrk, Fgfr1 and Wsck knockdown larvae. The different expression profiles of Br-z7 in Vegfr and Drl knockdown larvae suggest other roles of Vegfr and Drl in JH signaling, which need further study."

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors should consider optimizing their experimental approach by depleting the six candidate RTKs in an early larval stage rather than using a sensitized background with JH application in the last larval stage.

      Thank you for your valuable suggestion. We knocked down the genes at the last larval stage to observe pupation, which is a relatively simple and easily observed endpoint for examining the role of a gene in JH-maintained larval status. The results from CRISPR/Cas9 experiments showed: "Most wild-type larvae showed a phenotype of pupation on time. However, in the Cad96ca mutant, 86% of the larvae (an editing efficiency of 67% by TA clone analysis) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 24 h earlier. In the Fgfr1 mutant, 91% of the larvae (an editing efficiency of 61%) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 23 h earlier (Figure 4D and E). The data suggested that CAD96CA and FGFR1 support larval growth and prevent pupation in vivo." Determining the roles of the other RTKs throughout larval development will require future work, since many additional experiments are needed.

      (2) Including a positive control for JH signaling, such as met or tai, would strengthen the assays and provide a benchmark for evaluating the downregulation of target genes and phenotype reversion upon JH application. This addition, especially in Figure 1, would enhance the interpretability of the results.

      Thank you for your suggestion. We agree that adding the detection of Met or Tai as a positive control would strengthen the assays. Our laboratory has reported in previous studies that knockdown of Met leads to decreased expression of genes in the JH signaling pathway and precocious pupation (PMID: 24872508), so we did not repeat this experiment in this study. In the future, when performing Cad96ca and Fgfr1 double-mutant experiments, a Met mutant can be generated as a control to provide more reference points for interpreting the results.

      (3) I recommend revising the manuscript to improve readability, particularly in the Results section, where descriptions of the binding part are particularly dense.

      Thank you for your advice. We have carefully revised the manuscript.

      (4) In line 122, please add the reference Wang et al., 2016.

      Thank you for the reminder. We have added the reference in line 125 of the new manuscript.

      (5) The authors should clarify why they chose to test the possible binding to JH of only Cad96CA, FGFR1, and NRK after conducting various assays while including OTK in the study as a negative control. This explanation should be included in the text.

      Thank you for the suggestion. We added the explanation, as described in the text: "We screened the RTKs sequentially, including examining the roles of 20 RTKs identified in the H. armigera genome in JH-regulated gene expression to obtain primary candidates, followed by screening of the candidates by their roles in maintaining larval status, JH-induced rapid increase of intracellular calcium levels, JH-induced phosphorylation of MET and TAI, and affinity to JH. The cadherin 96ca (CAD96CA) and fibroblast growth factor receptor 1 (FGFR1) were finally determined as JH cell membrane receptors by their roles in JH-regulated gene expression, maintaining larval status, JH-induced rapid increase of intracellular calcium levels, JH-induced phosphorylation of MET and TAI, and their JH-binding affinity. Their roles as JH cell membrane receptors were further determined by their knockdown and knockout in vivo and in cell lines, and by their heterologous overexpression in mammalian HEK-293T cells."

      "Since Cad96CA, FGFR1, and NRK were not only involved in JH-regulated Kr-h1 expression, JH III-induced delayed pupation, and calcium levels increase, but also involved in MET and TAI phosphorylation, we further analyzed their binding affinity to JH III. OTK did not respond to JH III, so we used it as a control protein on the cell membrane to exclude the possibility of nonspecific binding.".

      (6) The observed embryonic lethality of cad96ca and FGF1 mutants in Drosophila contrasts with the ability of the respective mutants in H. armigera to reach the pupal stage. The authors should discuss this significant difference.

      Thank you for the suggestion. We added the explanation in the discussion, as described in the text: "Homozygous Cad96ca null Drosophila die at late pupal stages (Wang et al., 2009). However, we found that 86% of the larvae of the Cad96ca mutant successfully pupated in G0 generation, although earlier than the control. Similarly, null mutation of Fgfr1 or Fgfr2 in mouse is embryonic lethal (Arman et al., 1998; Deng et al., 1994; Yamaguchi et al., 1994). In D. melanogaster, homozygous Htl (Fgfr) mutant embryos die during late embryogenesis, too (Beati et al., 2020; Beiman et al., 1996; Gisselbrecht et al., 1996). However, in H. armigera, 91% of larvae successfully pupated in G0 generation after Fgfr1 knockout. The low death rate after Cad96ca and Fgfr1 knockout might be due to the following reasons: the editing efficiency (67% and 61% for Cad96ca mutant and Fgfr1 mutant, respectively), the chimeric gene knockout in the G0 generation, and the redundant RTKs that play similar roles in JH signaling, similar to the redundant roles of MET and Germ-cell expressed bHLH-PAS (GCE) in JH signaling (Liu et al., 2009). Resolving this will require obtaining live G1 homozygous mutants and a double knockout of these two receptors in future study. We indeed observed that the eggs did not hatch successfully after mixed-mating of G0 Cad96ca mutant or Fgfr1 mutant, respectively, but the reason was not addressed further due to the embryonic death. For similar reasons, most of the Cad96ca and Fgfr1 mutants showed a slight acceleration of pupation (about one day) without the typical precocious metamorphosis (at least one instar earlier) phenotype caused by JH signaling defects (Daimon et al., 2012; Fukuda, 1944; Riddiford et al., 2010) and JH pathway gene deletions (Abdou et al., 2011; Liu et al., 2009). On the other hand, JH can regulate gene transcription by diffusing into cells and binding to the intracellular receptor MET to conduct JH signal, which might affect the results of gene knockdown and knockout."

      (7) Building upon the previous point, it is noteworthy that the cad96ca and FGF1 mutants exhibit only a 24-hour early pupation phenotype, contrasting with the 48-hour early pupation induced by Kr-h1 depletion. This discrepancy suggests that while the function of these RTKs is necessary, it may not be sufficient to fully activate JH signaling. The expression profile of these receptors, primarily observed in the last larval stage, supports this hypothesis.

      Thank you for your suggestion. We added the explanation in the discussion, as described in the text: "Homozygous Cad96ca null Drosophila die at late pupal stages (Wang et al., 2009). However, we found that 86% of the larvae of the Cad96ca mutant successfully pupated in G0 generation, although earlier than the control. Similarly, null mutation of Fgfr1 or Fgfr2 in mouse is embryonic lethal (Arman et al., 1998; Deng et al., 1994; Yamaguchi et al., 1994). In D. melanogaster, homozygous Htl (Fgfr) mutant embryos die during late embryogenesis, too (Beati et al., 2020; Beiman et al., 1996; Gisselbrecht et al., 1996). However, in H. armigera, 91% of larvae successfully pupated in G0 generation after Fgfr1 knockout. The low death rate after Cad96ca and Fgfr1 knockout might be due to the following reasons: the editing efficiency (67% and 61% for Cad96ca mutant and Fgfr1 mutant, respectively), the chimeric gene knockout in the G0 generation, and the redundant RTKs that play similar roles in JH signaling, similar to the redundant roles of MET and Germ-cell expressed bHLH-PAS (GCE) in JH signaling (Liu et al., 2009). Resolving this will require obtaining live G1 homozygous mutants and a double knockout of these two receptors in future study. We indeed observed that the eggs did not hatch successfully after mixed-mating of G0 Cad96ca mutant or Fgfr1 mutant, respectively, but the reason was not addressed further due to the embryonic death. For similar reasons, most of the Cad96ca and Fgfr1 mutants showed a slight acceleration of pupation (about one day) without the typical precocious metamorphosis (at least one instar earlier) phenotype caused by JH signaling defects (Daimon et al., 2012; Fukuda, 1944; Riddiford et al., 2010) and JH pathway gene deletions (Abdou et al., 2011; Liu et al., 2009). On the other hand, JH can regulate gene transcription by diffusing into cells and binding to the intracellular receptor MET to conduct JH signal, which might affect the results of gene knockdown and knockout."

      (8) The expression profile of the RTK hits described in Supplementary Figure 4A appears to be limited to the last larval stage until pupation. The authors should clarify whether these receptors are expressed earlier, and the meaning of the letters in the plot should be described in the figure legend.

      Thank you for the suggestion. We added the explanation in the Figure 1—figure supplement 4 legend, as described in the text: "The expression profiles of Vegfr1, Drl, Cad96ca, Nrk, Fgfr1, and Wsck during development. 5F: fifth instar feeding larvae; 5M: fifth instar molting larvae; 6th-6 h to 6th-120 h: sixth instar at 6 h to sixth instar 120 h larvae; P0 d to P8 d: pupal stage at 0-day to pupal stage at 8-day; F: feeding stage; M: molting stage; MM: metamorphic molting stage; P: pupae."

      We are very sorry, but due to time limitations, we will investigate the expression profiles of the RTKs throughout the larval stages in future work.

      (9) In Figure 4, panels F and G, the levels of Kr-h1 are shown in cad96ca and FGF1 mutants in the last larval stage. The authors should indicate whether Kr-h1 levels are also low in earlier larval stages or only detected in the last larval stage, as this would imply that these RTKs are only required at this stage.

      Thank you for your suggestion. In this study, the feeding stage of the Cad96ca and Fgfr1 mutants was shortened in the sixth instar, and they entered the metamorphic molting stage earlier. We therefore detected the expression of Kr-h1 in the sixth instar. It is an excellent idea to detect the expression of Kr-h1 at various larval stages to analyze the stages in which CAD96CA and FGFR1 play a role and to study the relationship between CAD96CA and FGFR1 in the future.

      (10) While Figure 5 demonstrates JH-triggered calcium ion mobilization in Sf9 cells and S2 cells, the authors should also include data on JH signaling target genes, such as Kr-h1, for a more comprehensive analysis.

      Thank you for your advice. We added experiments, as described in the text: "To demonstrate the universality of CAD96CA and FGFR1 in JH signaling in different insect cells, we investigated JH-triggered calcium ion mobilization and Kr-h1 expression in Sf9 cells developed from S. frugiperda and S2 cells developed from D. melanogaster. Knockdown of Cad96ca and Fgfr1 (named Htl or Btl in D. melanogaster), respectively, significantly decreased JH III-induced intracellular Ca2+ release and extracellular Ca2+ influx, and Kr-h1 expression (Figure 5A, B, Figure 5—figure supplement 1A and B). The efficacy of RNAi of Cad96ca and Fgfr1 was confirmed in the cells (Figure 5—figure supplement 1C and D), suggesting that CAD96CA and FGFR1 had a general function to transmit JH signal in S. frugiperda and D. melanogaster.".

      (11) The authors should consider improving the quality of images and some plots, particularly enlarging panels showing larval and pupal phenotypes, such as Figure 1B and Supplementary Figure C. Additionally, adding a plot showing the statistical analysis of the phenotype in Supplementary Figure C would enhance clarity. Some plots are overly busy and difficult to read due to small size, such as Figure 1C, Figure 2A, and all the plots in Figure 3. Figure 4E also requires improvement for better readability.

      Thank you for your suggestion. We have adjusted Figure 1B, Figure 1C, Figure 1—figure supplement 1C, Figure 2A and Figure 4E. However, for Figure 3, we have not found a better way to arrange and adapt them, considering the overall arrangement of the results and the page space, so we keep them in their original state.

    1. eLife Assessment

      This is a fundamental body of work reporting anatomical, molecular, and functional mapping of the central complex in Drosophila. There were a few concerns of a minor nature, and all were addressed by the authors. The tools generated and the findings, which include characterization of neuromodulators used by different cells, will undoubtedly serve as a foundation for future studies of this brain region. The data are compelling and likely to have a major impact.

    2. Reviewer #1 (Public review):

      Summary:

      This work is meant to help create a foundation for future studies of the Central Complex, which is a critical integrative center in the fly brain. The authors present a systematic description of cellular elements, cell type classifications, behavioral evaluations and genetic resources available to the Drosophila neuroscience community.

      Strengths:

      The work contributes new, useful and systematic technical information in compelling fashion to support future studies of the fly brain. It also continues to set a high and transparent standard by which large-scale resources can be defined and shared.

      Weaknesses:

      Manuscript revisions by the authors addressed all proposed weaknesses from the original version.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Wolff et al. describe an impressive collection of newly created split-GAL4 lines targeting specific cell types within the central complex (CX) of Drosophila. The CX is an important area in the brain that has been implicated in the regulation of many behaviors including navigation and sleep/wake. The authors advocate that to fully understand how the CX functions, cell-specific driver lines need to be created. In that respect, this manuscript will be of great value to all neuroscientists trying to elucidate complex behaviors using the fly model. In addition, and providing a further very important finding, the authors went on to assess the expression of neurotransmitters/neuropeptides and their receptors in different cells of the CX. These findings will also be of great interest to many and will help further studies aimed at understanding the CX circuitries. The authors then investigated how different CX cell types influence sleep and wake. While the description of the new lines and their neurochemical identity is excellent, the behavioral screen seems to be unfinished and could have been more fully developed.

      Strengths:

      (1) The description of dozens of cell-specific split-GAL4 lines is extremely valuable to the fly community. The strength of the fly system relies on the ability to manipulate specific neurons to investigate their involvement in a specific behavior. Recently, the need to use extremely specific tools has been highlighted by the identification of sleep-promoting neurons located in the VNC of the fly as part of the expression pattern of the most widely used dorsal-Fan Shaped Body (dFB) GAL4 driver. These findings should serve as a warning to every neurobiologist: make sure that your tool is clean. In that respect, the novel lines described in this manuscript are fantastic tools that will help the fly community.

      (2) The description of neurotransmitter/neuropeptides expression pattern in the CX is of remarkable importance and will help design experiments aimed at understanding how the CX functions.

      Weaknesses:

      (1) I find the behavioral (sleep) screen of this manuscript to be incomplete. It appears to me that this part of the paper is not as developed as it could be. The authors have performed neuronal activation using thermogenetic and/or optogenetic approaches. For some cell types, only thermogenetic activation is shown. There is no silencing data and/or assessment of sleep homeostasis or arousal threshold. The authors find that many CX cell types modulate sleep and wake but it's difficult to understand how these findings fit with one another. It seems that each CX cell type is worthy of its own independent study and paper. I am fully aware that a thorough investigation of every CX neuronal type in sleep and wake regulation is a herculean task. So, altogether I think that this manuscript will pave the way for further studies on the role of CX neurons in sleep regulation.

      (2) Linked to point 1, it is possible that the activation protocols used in this study are insufficient for some neuronal types. The authors have used 29°C for thermogenetic activation (instead of the most widely used 31°C) and a 2Hz optogenetic activation protocol. The authors should comment on the fact that they may have missed some phenotypes by using these mild activation protocols.

      (3) There are multiple spelling errors in the manuscript that need to be addressed.

      Comments on revisions:

      I am satisfied with the authors' response. This paper provides excellent starting points for additional studies into the role of different CX cell types in sleep and wake.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This work is meant to help create a foundation for future studies of the Central Complex, which is a critical integrative center in the fly brain. The authors present a systematic description of cellular elements, cell type classifications, behavioral evaluations and genetic resources available to the Drosophila neuroscience community.

      Strengths:

      The work contributes new, useful and systematic technical information in compelling fashion to support future studies of the fly brain. It also continues to set a high and transparent standard by which large-scale resources can be defined and shared.

      Weaknesses:

      manuscript p. 1

      "The central complex (CX) of the adult Drosophila melanogaster brain consists of approximately 2,800 cells that have been divided into 257 cell types based on morphology and connectivity (Scheffer et al., 2020; Hulse et al. 2021; Wolff et al., 2015)."

      The 257 accumulated cell types have informational names (e.g., PBG2‐9.s‐FBl2.b‐NO3A.b) in addition to their associations with specific Gal4 lines and specific EM Body IDs. All this is very useful. I have one suggestion to help a reader trying to get a "bird's eye view" of such a large amount of detailed and multi-layered information. Give each of the 257 CX cell types an arbitrary number: 1 to 257. In fact, Supplemental File 2 lists ~277 cell types each with a number in sequence, so perhaps in principle, it is there. This could expedite the search function when a reader is trying to cross-reference CX cell type information from the text, to the Figures and/or to the Supplemental Figures. Also, the use of (arbitrary) cell type numbers could expedite the explanation of which cell types are included in any compilation of information (e.g., which ones were tested for specific NT expression).

      In this report we adhered to the nomenclature introduced in Hulse et al. 2021. We agree that the nomenclature of cell types in the CX is imperfect. There are inherent limitations to what can be done with present data. Even between the hemibrain and FAFB/Flywire EM datasets, it was not possible to derive a one-to-one correspondence in many cases, largely because we do not yet have enough information to distinguish between natural variation within a cell type and distinct cell types (see Schlegel et al. 2024). Moreover, many cell type distinctions depend on connectivity differences that are observable only in EM datasets but not in LM images. Several research groups are currently engaged in a comprehensive and collaborative effort to update the CX nomenclature that will extend over the next few months as additional connectomes become available. This work will require hundreds of hours of effort from anatomical and computational experts in multiple laboratories who have a strong interest in the CX. Since the correspondence between the established Hulse et al. nomenclature we use and this new nomenclature will be made clear, it will be easy to transfer our data to that new nomenclature. For all these reasons, we believe we should not unilaterally introduce any new naming systems at this time.

      manuscript p 2

      "Figure 2 and Figure 2-figure supplements 1-4 show the expression of 52 new split-GAL4 lines with strong GAL4 expression that is largely limited to the cell type of interest. .... We also generated lines of lesser quality for other cell types that in total bring overall coverage to more than three quarters of CX cell types."

      This section describes the generation and identification of specific split Gal4 lines, and the presentation is generally excellent. It represents an outstanding compendium of information. My reading of the text suggests ~200 cell types have Gal4 lines that are of immediate use (having high specificity or very close to it). Use of an arbitrary number system (mentioned above) could augment that description for the reasons stated. For example, which of the 257 cell types are represented by split Gal4 lines that constitute the ~1/3 representing "high-quality lines"? A second comment relates to this study's functional analysis of the contributions of CX cell types to sleep physiology. The recent literature contains renewed interest in the specific expression patterns of Gal4 lines that can promote sleep-like behaviors. In particular, Gal4 line expression outside the brain (in the VNC and outside the CNS) has been raised as an important element that needs to be included for interpretation of sleep regulation. This present study offers useful information about a large number of expression patterns, including mention of VNC expression in many cases, as well as a basis with which to seek additional information. However, perhaps I missed it, but I could not find a short description of the overall strategy used to describe the expression patterns and feel that could be helpful. Were all Gal4 lines studied for expression in the VNC? And in the peripheral NS? It is probably published elsewhere, but even a short reprise would still be useful.

      We added a couple of sentences to clarify that the lines were imaged in the adult female brain and VNC and many were also imaged in males. These data, including the ability to download the original confocal stacks, are contained in an on-line web source cited in the text. We also make clear that we did not assay expression outside of the brain, optic lobes and VNC. Therefore, we cannot rule out expression in the peripheral nervous system (other than detected in the axons of sensory neurons in the CNS) or in muscle or other non-neuronal cell types.

      manuscript p 9

      Neurotransmitter expression in CX cell types

      "To determine what neurotransmitters are used by the CX cell types, we carried out fluorescent in situ hybridization using EASI-FISH (Eddison and Irkhe, 2022; Close et al., 2024) on brains that also expressed GFP driven from a cell-type-specific split GAL4 line. In this way, we could determine what neurotransmitters were expressed in over 100 different CX cell types based on ...."

      Reading this description, I was uncertain whether the >100 cell types mentioned were tested with all the NT markers by EASI-FISH? Also, assigning arbitrary numbers to the cell types (same suggestion as above) could help the reader more readily ascertain which were the ~100 cell types classified in this context.

      The specific probes used for each cell type are indicated in Figure 9 and in Supplemental File 1.

      manuscript p 10

      "Our full results are summarized below, together with our analysis of neuropeptide expression in the same cell types."

      I recommend specifying which Figures and Tables contain the "full results" indicated.

      We changed the wording to read:

      “Our full results are summarized, together with our analysis of neuropeptide expression in the same cell types, in Figures 5-9 and in Supplemental File 1.”

      NP expression in CX cell types

      Similar to the comments regarding studies of NT expression: were all ~100 cell types tested with each of the 17 selected NPs? Arbitrary numerical identifiers could be useful for the reader to determine which cell types/lines were tested and which were not yet tested.

      We expanded the description in Methods to now read:

      “For neurotransmitters, the specific probes used for each cell type are indicated in Figure 9 and in Supplemental File 1. For neuropeptides, each of the 17 selected NP probes shown in Figure 5—figure supplement 1 was used on all cell types in Figure 9 except those marked by “—” in the neuropeptide column.”

      manuscript p. 11

      "The neuropeptide expression patterns we observed fell into two broad categories."

      This section presents information that is extensive and extremely useful. It supports consideration of peptidergic cell signaling at a circuits level and in a systematic fashion that will promote future progress in this field. I have two comments. First, regarding the categorization of two NP expression patterns, discernible by differences in cell number: this idea mirrors one present in prior literature. Recently the classification of the transcription factor DIMM summarizes this same two-way categorization (e.g., doi: 10.1371/journal.pone.0001896). That included the fact that a single NP can be utilized by cells of either category.

      We inserted a sentence to acknowledge this earlier work:

      “Such large neurosecretory cells often express the transcription factor DIMM (Park et al. 2008).”

      Second, regarding this comment:

      "In contrast, neuropeptides like those shown in Figure 6 appear to be expressed in dozens to hundreds of cells and appear poised to function by local volume transmission in multiple distinct circuits."

      Signaling by NPs in this second category (many small cells) suggests more local diffusion, a smaller geographic expanse compared to "volume" signaling by the sparser larger peptidergic cells. Given this, I suggest re-consideration in using the term "volume" in this instance, perhaps in favor of "local" or "paracrine". This is only a suggestion and in fact rests almost entirely on speculation/ interpretation, as the field lacks a strong empirical basis to say how far NPs diffuse and act. A recent study in the fly brain of peptide co-transmitters (doi: 10.1016/j.cub.2020.04.025) provides an instructive example in which differences between the spatial extents of long-range (peptide 1) versus short-range (peptide 2) NP signaling may be inferred in vivo.

      We have modified the text to now read:

      “those shown in Figure 6 are expressed in dozens to hundreds of cells and appear poised to function by transmission to nearby cells in multiple distinct circuits.”  

      Spab was mentioned (Figure 6 legend) but discarded as a candidate NP to include based on a personal communication, as was Nplp1. The manuscript did not include reasons to do so, nor include a reference to spab peptide. I suggest including explicit reasons to discard candidate NPs.

      While there is strong supportive evidence for many NPs in Drosophila, the evidence that other transcripts encode NPs is more circumstantial, often relying simply on sequence analysis and without convincing evidence for a specific cognate receptor. We note that Spab is not listed as a neuropeptide in the current release of FlyBase. In these cases, we relied on the opinion of individuals with extensive experience in studying Drosophila NPs. The results obtained with the probes for Spab and Nplp1 are still available in Supplemental File 1.

      In Fig 9-supplement 1, neurotransmitter biosynthetic enzymes were measured by RNA-seq for given CX cell types to augment the cell type classification. The same methods could be used to support cell type classification regarding putative peptidergic character (in Figure 9 supplement 2) by measuring expression levels of critical, canonical neuropeptide biosynthetic enzymes. These include the proprotein convertase dPC2 (amon); the carboxypeptidase dCPD/E (silver); and the amidating enzymes dPHM; dPal1; dPal2. PHM is most related to DBM (dopamine beta monooxygenase), the rate limiting enzyme for DA production, and greater than 90% of Drosophila neuropeptides are amidated. If the authors are correct in surmising widespread use of NPs by CX cell types (and I expect they are), there could be diagnostic value to report expression levels of this enzyme set across many/most CX cell types.

      In our admittedly limited experience, most cells express these enzymes and the level we observed in confirmed NP expressing cell types was not reproducibly higher.  (The complete data for all genes for the cell types we assayed are available from our deposition in the NCBI Gene Expression Omnibus with accession number GSE271123.) Given our small sample size we chose not to comment on this in the paper.

      Comment #6

      Screen of effects on Sleep behavior

      This work is large in scope and as suggested likely presents excellent starting points for many follow-up studies. I again suggest assigning stable number identities to the elements described. In this case, not cell types, but split Gal4 lines. This would expedite the cross-referencing of results across the four Supplemental Files 3-6. For example, line SS00273 is entry line #27 in S Files 3 and 4, but line entry #18 in S Files 5 and 6.

      We believe the interested reader can make this correspondence by searching the supplemental files, which are Excel spreadsheets. We note that both driver lines and cell types have stable identifiers that are used across Figures and Tables: the line numbers (for example, SS00273) for driver lines and the Hulse et al. cell type names for cell types.

      manuscript p 26

      Clock to CX

      "Not surprisingly, the connectome reveals that many of the intrinsic CX cell types with sleep phenotypes are connected by wired pathways (Figure 12 and Figure 12-figure supplement 1)."

      Do intrinsic CX cells with sleep phenotypes also connect by wired pathways to CX cells that do not have sleep phenotypes?

      Yes, but we do not have high confidence that negative sleep phenotypes in our assays indicate no role in sleep.

      "The connectome also suggested pathways from the circadian clock to the CX. Links between clock output DN1 neurons to the ExR1 have been described in Lamaze et al. (2018) and Guo et al. (2018), and Liang et al. (2019) described a connection from the clock to ExR2 (PPM3) dopaminergic neurons."

      The introduction to this section indicates a focus on connectome-defined synaptic contacts. Whereas the first two studies cited featured both physiological and anatomic evidence to support connectivity from clock cells to CX, the third did not describe any anatomical connections, and that connection may in fact be due to diffuse, not synaptic, signaling.

      I could not easily discern the difference between Figs 12 and 12-S1? These appear to be highly-related circuit models, wherein the second features more elements. Perhaps spell out the basis for the differences between the two models to avoid ambiguity.

      We clarified that the supplemental diagram differs from the one in the main text by the inclusion of additional connections:

      “The strongest of these connections are diagrammed in Figure 12, with Figure 12—figure supplement 1 also showing additional weaker connections.”

      "...the cellular targets of Dh31 released from ER5 are unknown, however previous work (Goda et al., 2017; Mertens et al., 2005; Shafer et al., 2008) has shown that Dh31 can activate the PDF receptor raising the possibility of autocrine signaling."

      Regarding pharmacological evidence for Dh31 activation of Pdfr: strong in vivo evidence was developed in doi: 10.1016/j.neuron.2008.02.018: a strong pdfr mutation greatly reduces the response to synthetic Dh31 in neurons that normally express Pdfr.

      We added the Shafer et al., 2008 reference. 

      manuscript p 30

      "Unexpectedly, we found that all neuropeptide-expressing cell types also expressed a small neurotransmitter."

      Did this conclusion apply only to CX cell types? - or was it also true for large peptidergic neurons? Prior evidence suggests the latter may not express small transmitters (doi: 10.1016/j.cub.2009.11.065). The question pertains to the broader biology of peptidergic neurons, and is therefore outside the strict scope of the main focus area - the CX. However, the text did initially consider peptidergic neurons outside the CX, so the information may be pertinent to many readers.

      We did not look at other cell types in the current study and so cannot provide an answer.

      Reviewer #2 (Public review):

      Summary:

      In this paper, Wolff et al. describe an impressive collection of newly created split-GAL4 lines targeting specific cell types within the central complex (CX) of Drosophila. The CX is an important area in the brain that has been involved in the regulation of many behaviors including navigation and sleep/wake. The authors advocate that to fully understand how the CX functions, cell-specific driver lines need to be created. In that respect, this manuscript will be of very important value to all neuroscientists trying to elucidate complex behaviors using the fly model. In addition, and providing a further very important finding, the authors went on to assess neurotransmitter/neuropeptides and their receptors expression in different cells of the CX. These findings will also be of great interest to many and will help further studies aimed at understanding the CX circuitries. The authors then investigated how different CX cell types influence sleep and wake. While the description of the new lines and their neurochemical identity is excellent, the behavioral screen seems to be limited.

      Strengths:

      (1) The description of dozens of cell-specific split-GAL4 lines is extremely valuable to the fly community. The strength of the fly system relies on the ability to manipulate specific neurons to investigate their involvement in a specific behavior. Recently, the need to use extremely specific tools has been highlighted by the identification of sleep-promoting neurons located in the VNC of the fly as part of the expression pattern of the most widely used dorsal-Fan Shaped Body (dFB) GAL4 driver. These findings should serve as a warning to every neurobiologist: make sure that your tool is clean. In that respect, the novel lines described in this manuscript are fantastic tools that will help the fly community.

      (2) The description of neurotransmitter/neuropeptides expression pattern in the CX is of remarkable importance and will help design experiments aimed at understanding how the CX functions.

      Weaknesses:

      (1) I find the behavioral (sleep) screen of this manuscript to be limited. It appears to me that this part of the paper is not as developed as it could be. The authors have performed neuronal activation using thermogenetic and/or optogenetic approaches. For some cell types, only thermogenetic activation is shown. There is no silencing data and/or assessment of sleep homeostasis or arousal threshold. The authors find that many CX cell types modulate sleep and wake but it's difficult to understand how these findings fit one with the other. It seems that each CX cell type is worthy of its own independent study and paper. I am fully aware that a thorough investigation of every CX neuronal type in sleep and wake regulation is a herculean task. So, altogether I think that this manuscript will pave the way for further studies on the role of CX neurons in sleep regulation.

      (2) Linked to point 1, it is possible that the activation protocols used in this study are insufficient for some neuronal types. The authors have used 29°C for thermogenetic activation (instead of the most widely used 31°C) and a 2Hz optogenetic activation protocol. The authors should comment on the fact that they may have missed some phenotypes by using these mild activation protocols.

      Our primary goal was to test the feasibility of using these tools in assessing sleep and wake function of neurons within the CX. In the process we uncovered several new neurons within the DFB-EB network that control sleep and make connections with previously identified sleep regulating neurons. For all single cell type lines and lines with sparse patterns and no VNC expression we present both optogenetic and thermogenetic data. The lines for which we only have thermogenetic but no optogenetic data are those which have multiple cell types or VNC expression. We felt that optogenetic data for these non-specific or contaminated lines would not reliably indicate a role for individual cell types in sleep regulation.

      Many previous studies that have used 31 degrees have done so for shorter durations and often using different times of the day for manipulations. The lack of consistency between studies using this temperature may be due in part to the fact that 31 degrees alters behaviors of flies (including controls) and, for this reason, is usually not used for 24-hour activation durations.

      To keep the screen consistent and ensure we capture changes in both daytime and nighttime sleep we used 29 degrees. The behavior of control flies is not as disrupted or altered at this temperature, and 29 degrees for activation is routinely used in behavioral experiments.

      We similarly selected an optogenetic stimulation protocol that minimizes the response of flies to the red-light pulses. We chose this protocol because we found, in earlier experiments in a different project, that this level of stimulation was able to elicit activation phenotypes across a range of cell types (including several known clock neurons). However, we cannot rule out false negatives in both the TrpA and optogenetic experiments and agree that we might have missed some phenotypes.

      Finally, as the reviewer rightfully points out, a thorough, detailed investigation of each cell type is a herculean task. We screened in both genders with very sparse, and often cell-type-specific, driver lines while using two distinct modes of activation and different methods for assessing sleep. For these reasons, we believe the GAL4 lines we identified provide excellent starting points for the additional investigations that will be required to better understand the roles of specific cell types.

      (3) There are multiple spelling errors in the manuscript that need to be addressed.

      Reviewer #3 (Public review):

      Summary:

      The authors created and characterized genetic tools that allow for precise manipulation of individual or small subsets of central complex (CX) cell types in the Drosophila brain. They developed split-GAL4 driver lines and integrated this with a detailed survey of neurotransmitter and neuropeptide expression and receptor localization in the central brain. The manuscript also explores the functional relevance of CX cell types by evaluating their roles in sleep regulation and linking circadian clock signals to the CX. This work represents an ambitious and comprehensive effort to provide both molecular and functional insights into the CX, offering tools and data that will serve as a critical resource for researchers.

      Strengths:

      (1) The extensive collection of split-GAL4 lines targeting specific CX cell types fills a critical gap in the genetic toolkit for the Drosophila neuroscience community.

      (2) By combining anatomical, molecular, and functional analyses, the authors provide a holistic view of CX cell types that is both informative and immediately useful for researchers across diverse disciplines.

      (3) The identification of CX cell types involved in sleep regulation and their connection to circadian clock mechanisms highlights the functional importance of the CX and its integrative role in regulating behavior and physiological states.

      (4) The authors' decision to present this work as a single, comprehensive manuscript rather than fragmenting it into smaller publications each focusing on separate central complex components is commendable. This decision prioritizes accessibility and utility for the broader neuroscience community, which will enable researchers to approach CX-related questions with a ready-made toolkit.

      Weaknesses:

      While the manuscript is an outstanding resource, it leaves room for more detailed mechanistic exploration in some areas. Nonetheless, this does not diminish the immediate value of the tools and data provided.

      Appraisal:

      The authors have succeeded in achieving their aims of creating well-characterized genetic tools and providing a detailed survey of neurochemical and functional properties in the CX. The results strongly support their conclusions and open numerous avenues for future research. The work effectively bridges the gap between genetic manipulation, molecular characterization, and functional assessment, enabling a deeper understanding of the CX's diverse roles.

      Impact and Utility

      This manuscript will have a significant and lasting impact on the field, providing tools and data that facilitate new discoveries in the study of the CX, sleep regulation, circadian biology, and beyond. The genetic tools developed here are likely to become a standard resource for Drosophila researchers, and the comprehensive dataset on neurotransmitter and neuropeptide expression will inspire investigations into the interplay between neuromodulation and classical neurotransmission.

      Additional Context

      The breadth and depth of the resources presented in this manuscript justify its publication without further modification. By delivering an integrated dataset that spans anatomy, molecular properties, and functional relevance, the authors have created a resource that will serve the neuroscience community for years to come.

      Recommendations for the authors:

      Reviewing Editor:

      The reviewers suggest that a nomenclature, perhaps a numbering system, be adopted for different cell types and Gal4 drivers in order to facilitate reading of the manuscript and cross-referencing.

      We agree that a comprehensive reanalysis of the CX nomenclature is in order, but it is premature for us to attempt that as part of this study. This is best done after additional connectomes are generated to help resolve the degree of variation in morphology and connectivity between the same cell in multiple animals.

      Reviewer #3 (Recommendations for the authors):

      The authors have characterized a large number of split-GAL4 drivers targeting individual or small subsets of CX cell types. This manuscript delivers a detailed anatomical, molecular, and functional mapping of the CX.

      By integrating data on neurotransmitters, neuropeptides, and their receptors, the authors provide a holistic view of CX cell types that will undoubtedly serve as a foundation for future studies.

      The use of these genetic tools to identify CX cell types affecting sleep, as well as those linking the circadian clock to the CX, represents a significant advance. These findings hint at the diverse and integrative roles of the CX in regulating both behavior and physiological states.

      The authors' decision to present this work as a single, comprehensive manuscript rather than fragmenting it into smaller publications each focusing on separate central complex components is commendable. This decision prioritizes accessibility and utility for the broader neuroscience community, which will enable researchers to approach CX-related questions with a ready-made toolkit.

      While the manuscript leaves room for further exploration and mechanistic studies, the breadth and depth of the resources presented are more than sufficient to justify publication in their current form.

      The data on neuropeptide and receptor expression patterns, especially the observation that all examined CX cell types co-express a small neurotransmitter, opens intriguing new avenues of inquiry into the interplay between classical neurotransmission and neuromodulation in this region.

      This manuscript has provided a much-needed resource for the Drosophila neuroscience community and beyond. This work will facilitate important discoveries in CX function, sleep regulation, circadian biology, and more.

    1. eLife Assessment

      This important study challenges conventional life-history theory by demonstrating that reproductive-survival trade-offs are minimal in birds, except when reproductive effort is experimentally exaggerated. The evidence is solid, drawing from a meta-analysis of over 30 bird species, and effectively separates the effects of individual quality from reproductive costs. The findings will be of broad interest to evolutionary biologists and ecologists studying life-history trade-offs and reproductive strategies.

    2. Reviewer #4 (Public review):

      Summary:

      This is an important study that underscores that reproduction-survival trade-offs are not manifested (contrary to what generally accepted theory predicts) across a range of studies on birds. This has been studied by a meta-analytical approach, gathering data from a set of 46 papers (30 bird species). The overall conclusion is that there are no trade-offs apparent unless experimental manipulations push the natural variability to extreme values. In the wild, the general pattern for within-species variation is that birds with (naturally) larger clutches survive better.

      Likely impact:

      I think this is an important contribution to a slow shift in how we perceive the importance of trade-offs in ecology and evolution in general. The current view still is that one individual excelling in one measure of its life history (i.e. receiving benefits) must struggle (i.e. pay costs) in another part. However, a positive correlation between all aspects of life history traits is possible within an individual (such as due to developmental conditions or fitting to a particular environment). Simply, some individuals can perform generally better (be of better quality) than others.

    3. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #4 (Public review):

      We would like to thank the reviewer for their careful consideration of our manuscript. The suggestions have been useful in improving our manuscript. Please see our responses to the specific comments below.

      Summary:

      This is an important study that underscores that reproduction-survival trade-offs are not manifested (contrary to what generally accepted theory predicts) across a range of studies on birds. This has been studied by a meta-analytical approach, gathering data from a set of 46 papers (30 bird species). The overall conclusion is that there are no trade-offs apparent unless experimental manipulations push the natural variability to extreme values. In the wild, the general pattern for within-species variation is that birds with (naturally) larger clutches survive better.

      Strengths:

      I agree this study highlights important issues and provides good evidence of what it claims, using appropriate methods.

      Weaknesses:

      I also think, however, that it would benefit from broadening its horizon beyond bird studies. The conclusions can be reinforced through insights from other taxa. General reasoning is that there is positive pleiotropy (i.e. individuals vary in quality and therefore some are more fit (perform better) than others). Of course, this is within their current environment (biotic, abiotic, social, ...), with consequences for maintaining genetic variation across generations, as outlined in Maklakov et al. 2015 (https://doi.org/10.1002/bies.201500025). This explains the outcomes of this study very well and would make them less controversial and surprising for a more general audience.

      I have two fish examples in my mind where this trade-off is also discounted. Of course, given that it is beyond brood-caring birds, the wording in those studies is slightly different, but the evolutionary insight is the same. First, within species but across populations, Reznick et al. (2004, DOI: 10.1038/nature02936) demonstrated a positive correlation between reproduction and parental survival in guppies. Second, an annual killifish study (2021, DOI: 10.1111/1365-2656.13382) showed, within a population, a positive association between reproduction and (reproductive) aging.

      In fruit flies, there is also a strong experimental study demonstrating the absence of reproduction-lifespan trade-offs (DOI: 10.1016/j.cub.2013.09.049).

      I suggest that incorporating insights from those studies would broaden the scope and reach of the current manuscript.

      We would like to thank the reviewer for this useful insight and for highlighting these studies. We have added detail in our discussion around positive correlations observed in the wild, and how positive pleiotropy has been presented as an explanation. We have also added the suggested studies as references to demonstrate the reproduction-lifespan trade-off has been shown to be absent. See lines 257-260.

      Likely impact:

      I think this is an important contribution to a slow shift in how we perceive the importance of trade-offs in ecology and evolution in general. The current view still is that one individual excelling in one measure of its life history (i.e. receiving benefits) must struggle (i.e. pay costs) in another part. However, a positive correlation between all aspects of life history traits is possible within an individual (such as due to developmental conditions or fitting to a particular environment). Simply, some individuals can perform generally better (be of better quality) than others.

      We would like to thank the reviewer for highlighting the importance of our study. We hope our study will help the research community reflect on the importance of trade-offs between life-history traits and consider other possible explanations as to why variation in life-history traits is maintained within species.

    1. eLife Assessment

      This important study presents a transcriptomic analysis of enterochromaffin cells in the intestine. The evidence supporting the authors' claims is solid, although the functional analysis is focused on the Piezo2-expressing subset in the colon. The work will be of interest to biologists working on intestinal mucosal biology.

    2. Reviewer #2 (Public review):

      Summary:

      The authors investigated the expression profile of enterochromaffin (EC) cells after creating a new tryptophan hydroxylase 1 (Tph1) GFP-reporter mouse using scRNAseq and confirmative RNAscope analysis. They distinguish 14 clusters of Tph1+ cells found along the gut axis. The manuscript focuses on two of these, (i) a multihormonal cell type shown to express markers of pathogen/toxin and nutrient detection in the proximal small intestine, and (ii) an EC cluster in the distal colon, which expresses Piezo2, rendering these cells mechanosensitive. In vivo and ex vivo data explore the role of the mechanosensitive EC population for intestinal/colonic transit, using chemogenetic activation, diphtheria-toxin receptor dependent cell ablation and conditional gut epithelial specific Piezo2 knock-out. Whilst some of these data are confirmative of previous reports - Piezo2 has been implicated in mechanosensitive serotonin release previously, as referred to by the authors - the data are solid and emphasize the importance of mechanosensitive serotonin release for colonic propulsion. The transcriptomic data will guide future research.

      Strengths:

      The transcriptomic data, whilst confirmative, are more granular than previous data sets. New tools are employed to establish a role of mechanosensitive EC cells in colonic and thus total intestinal transit.

      Weaknesses:

      (1) The proposed villus/crypt distribution of the 14 cell types is not verified adequately. The RNAscope and immunohistochemistry samples presented do not allow assessment of whether this interpretation is correct - spatial transcriptomics, now approaching single-cell resolution, would likely help to verify this claim.

      (2) The physiological function and/or functionality of most of the transcriptomically enriched gene products has not been assessed. Whilst a role for Piezo2-expressing cells in colonic transit is convincingly demonstrated, the nature of the mechanical stimulus or the stimulus-secretion coupling downstream of Piezo2 activation is not clear.

      Comments on revisions: I am happy with the manuscript as is.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors have performed extensive work generating reporter mice and performing single-cell analysis combined with in situ hybridization to arrive at 14 clusters of enterochromaffin (EC) cells. Then, they focus on Piezo channel expression in distal EC cells and find that these channels might play a role in regulating colonic motility. Overall, this is an informative study that comprehensively classifies EC cells in different regions of the small and large intestine. From a functional point of view, however, the authors seem to ignore the fact that the expression of Piezo-2-IRES-Cre is broad, which would raise concerns regarding their physiological conclusions.

      The authors may wish to consider the following specific points: 

      It is surprising that the number of ileal EC cells is less than that of the distal colon, and it would be interesting to know whether the authors can comment about ileal EC cells. It is unclear why ileal ECs were not included in the study, even though they are mentioned in the diagram (Fig. 2c).

      We have discussed the rationale for excluding ileal ECs in the methods section under “Elimination of ileal GFP+ cells”. In our initial scRNA-seq experiment, our yield of epithelial cells and GFP positive cells was low, and a large proportion of these cells appeared to not have fully committed to the EC lineage. Also to note, we have previously seen fewer ECs in the distal ileum than upper small intestine and colon (PMID: 26803512). Given the low yield, and some uncertainty regarding the nature of the ileal EC population sorted by our methods, we considered that data from ileal ECs may not be an accurate representation of ileal EC cell diversity. Thus, we did not use ileal ECs in our second scRNA-seq experiment.

      Based on their analysis, there are 10 EC cell clusters in SI while there are only 4 clusters in the colon. The authors should comment on whether this is reflective of lesser diversity among colonic ECs or due to the smaller number of colonic ECs collected.

      The 4 clusters identified in the colon are consistent with a previous publication (Glass et al., Mol. Metab. 2017, PMID: 29031728), supporting the idea that these clusters are representative of the major clusters of colonic ECs. Nonetheless, we anticipate that with greater sample sizes (in any region) further subtypes could be resolved.

      The authors previously described that distal colonic EC cells exhibit various morphologies (Kuramoto et al., 2021). Do Ascl1(+) EC cells particularly co-localize with EC cells with long basal processes? Also, to validate the RNA seq data, the authors might show co-localization between Piezo2/Ascl1/Tph1 in distal EC cells. It would be interesting to see whether Ascl1-CreER (which is available in Jax) specifically labels distal colonic EC cells as this could provide a good genetic tool to specifically manipulate distal colonic EC cells.

      We have shown co-localization between Piezo2/Ascl1/Tph1 in Supplementary Figure 6a. Unfortunately, we did not study cell morphology in the Ascl1 smRNA-FISH experiments, as these used thin cryosections, whereas morphological assessment of EC processes is best performed with thick (>60 µm) sections. It would be interesting to know whether neuronal-like expression profiles correlate with neuronal-like morphology, which could be addressed in future studies with spatial transcriptomics.

      The authors used Piezo2-IRES-Cre mice, whose expression is rather broad. They might examine the distribution of Chrm3-mCitrine in the intestine (IF/IHC would be straightforward). And if the expression is in other cell types (which is most likely the case), they should justify that the observed phenotype derives from Piezo2-expressing EC cells. Alternatively, they could use Piezo2-Cre;ePetFlp (or Vil-Flp);Chrm3 to specifically express DREADD receptors in distal colonic EC cells. Also, what does 5HT release look like in jejunal EC cells in Piezo-CHRM3 mice?

      Unfortunately we no longer have access to the animals to do these experiments.

      For the same reasons as above, DTR experiments may also be non-specific. For example, based on the IF staining (Fig. 6b,d), there seems to be a loss of Tph1+ cells in the proximal colon of Piezo2-DTR mice, so the effects of the Piezo2-DTR likely extend beyond the distal colon. 

      Figures 6b and d show distal colon, not proximal colon. Our Tph1+ cell counts indicate there was no loss of Tph1+ cells in the proximal colon following intraluminal administration of DT.

      It is unclear why the localized loss of Piezo2 in Piezo2-DTR mice alters small intestinal transit (Fig. 6g,h). The authors should discuss the functional differences observed between Piezo2-DTR (intraluminal app) and Vil1Piezo2 KO mice i.e., small intestinal transit, 5HT release, etc. Are these differences due to the residual Piezo2 expression in Piezo2 KO mice? In this context, the authors may want to discuss their findings in the context of recent papers, such as those from the Patapoutian and Ginty groups. 

      We have made the following amendment to speculate on the reason for delayed small intestinal transit in the DTR experiments:

      “There are several possible explanations for this. Some Piezo2+ cells in the small intestine could have been depleted. Alternatively, 5-HT released from Piezo2+Tph1+ cells in the distal colon may provide feedback to the small intestine to accelerate motility, and thus depletion of these cells would result in slower intestinal transit.”

      We have also added a comment speculating on why we did not see similar slowing of small intestinal transit in the Villin-Cre Piezo2 KO:

      “No difference was observed in small intestine transit… in contrast to the DTR experiments, in which small intestinal transit was delayed. This could be due to the depletion of EC cells in the DTR experiments, whereas they are retained in the Villin-Cre Piezo2 KO mice. 5-HT secretion from ECs can be induced by other stimulants (even when Piezo2 is knocked out), and thus colonic 5-HT could be providing feedback to the small intestine to accelerate motility in the Villin-Cre Piezo2 KO mice. Residual Piezo2 expression in these mice could also be contributing to this effect.”

      We have added a comment on neural Piezo2 in the discussion:

      “However, in contrast to Piezo2 signalling in ECs, which results in accelerated gut transit, Piezo2 signalling in DRG neurons appears to slow transit (refs: Wolfson et al., Cell 2023; PMID: 37541195; Servin-Vences et al., Cell 2023, PMID: 37541196).”

      Reviewer #2 (Public Review):

      Summary:

      The authors investigated the expression profile of enterochromaffin (EC) cells after creating a new tryptophan hydroxylase 1 (Tph1) GFP-reporter mouse using scRNAseq and confirmative RNAscope analysis. They distinguish 14 clusters of Tph1+ cells found along the gut axis. The manuscript focuses on two of these, (i) a multihormonal cell type shown to express markers of pathogen/toxin and nutrient detection in the proximal small intestine, and (ii) an EC cluster in the distal colon, which expresses Piezo2, rendering these cells mechanosensitive. In vivo and ex vivo data explore the role of the mechanosensitive EC population for intestinal/colonic transit, using chemogenetic activation, diphtheria toxin receptor-dependent cell ablation and conditional gut epithelium-specific Piezo2 knock-out. Whilst some of these data are confirmative of previous reports - Piezo2 has been implicated in mechanosensitive serotonin release previously, as referred to by the authors - the data are solid and emphasize the importance of mechanosensitive serotonin release for colonic propulsion. The transcriptomic data will guide future research.

      Strengths:

      The transcriptomic data, whilst confirmative, is more granular than previous data sets. Employing new tools to establish a role of mechanosensitive EC cells for colonic and thus total intestinal transit. 

      Weaknesses: 

      (1) The proposed villus/crypt distribution of the 14 cell types is not verified adequately. The RNAscope and immunohistochemistry samples presented do not allow assessment of whether this interpretation is correct - spatial transcriptomics, now approaching single-cell resolution, would be likely to help verify this claim.

      Spatial transcriptomics would be excellent in validating the spatial distribution of the EC cell types in future studies. In our work, although the villus/crypt cluster annotations are assumptions (based on the differential expression of Neurog3, Tac1, and Sct, which is well supported by the literature), we have validated the spatial segregation of key markers. We quantified the crypt/villus location of Cartpt, Ucn3, and Trpm2 overlap with Tph1 (Figure 2d), Oc3, Cck, and Tph1 (Figure 3d), and TK/5-HT (Supplementary Fig 2d). This work supports our predictions on the spatial distribution of these clusters.

      (2) The physiological function and/or functionality of most of the transcriptomically enriched gene products has not been assessed. Whilst a role for Piezo2 expressing cells for colonic transit is convincingly demonstrated, the nature of the mechanical stimulus or the stimulus-secretion coupling downstream of Piezo2 activation is not clear.

      While we have not investigated the mechanical forces involved in activating Piezo2, we can at least say that physiological mechanical stimulation activates Piezo2, as we measured fecal pellet output in the DTR experiments. 

      Reviewer #2 (Recommendations For The Authors):

      (1) Please state (even more) clearly if/that the apparently GFP+/Tph1+ cells which clustered with the GFP- cells (Suppl. Fig1d/e) were excluded from the subsequent analysis. The detectable Chg-a/b expression in the GFP- cells in Suppl. Fig1f seems to suggest that these (if they have been included in the GFP- group here) are genuine ECs. How do these cells relate to the non-EC cells in Fig1d, which seem to lack Tph1 expression? And given the information in the methods, what percentage of these cells derived from the ileum?

      To clarify, data shown in Suppl. Fig 1d/e/f was from our first single cell profiling experiment whereas our subsequent clustering analysis utilizes data from a second (independent) single cell profiling experiment (e.g. Fig1d). 

      In the first profiling experiment, 23% of GFP+ cells clustered with GFP- cells, and for the purposes of Suppl. Figures 1d/e/f, we called these “non-ECs”. In the second profiling experiment (e.g. shown in Fig 1d) we performed a more detailed cluster analysis focusing on only GFP+ cells. In this second experiment, 19% of GFP+ cells were identified as “non-EC cells” based on the presence of markers for stem cells, transit amplifying cells (TACs), immature enterocytes, mature enterocytes, colonocytes, T lymphocytes and mucosal mast cells (see Fig 1d and Suppl. Fig 1g). Similar to the first profiling dataset, many of the GFP+ “non-EC cells” in the second dataset express Tph1, Chga, and Chgb, generally at lower levels than the “EC cells” (Suppl. Fig1i). It is possible that the stem cell and transit amplifying cell clusters are cells that are differentiating into EC cells. However, given that they have not fully committed to the lineage yet, we do not consider it appropriate to classify them as “EC cells”. With regard to the other “non-EC” clusters, we do not think that the expression of EC cell marker genes (Tph1, Chga, and Chgb) is evidence enough to call them genuine “EC cells”, given the concurrent expression of markers of other lineages (e.g. enterocyte and mast cell markers, Suppl. Fig 1g). The expression of Tph1 in murine mast cells is known; however, the expression in enterocytes is unexpected and could be a result of imperfect/incomplete differentiation. Since the ileum was not included in the second profiling experiment, we do not think the GFP+ “non-EC cells” are an artifact from the ileum.

      We have made some adjustments in the first section of the results to clarify some thoughts on this matter:

      “It is possible that some GFP is expressed in cells that have not yet fully committed to the EC lineage, or that there is some expression in cells outside this lineage, for example, in mast cells. Given the small sample size, we did not further investigate these cells in this dataset. In Supplementary Figures 1 d and f we refer to the GFP+ cells that clustered with the GFP- cells as “non-EC cells”.”

      “It is possible that the stem cell and transit amplifying cell clusters include cells that are in the process of differentiating into EC cells. However, given that they have not fully committed to the lineage, we do not consider it appropriate to classify them as “EC cells” for the purposes of analyzing EC cell types in this study.”

      (2) The authors state: "Notably, OSR2 and HOXB13 were restricted to the ileum and rectum respectively in humans (Fig. 1f)." - the statement regarding OSR2 seems too strong, given that only the ileal part of the human small intestine was examined and that there is a small signal in the proximal colon in Figure 1f.

      Thanks, we have made the following amendment:

      "Notably, OSR2 and HOXB13 were preferentially enriched in the ileum and rectum respectively in these human samples (Fig. 1f)."

      (3) Please clarify Suppl Fig2g/h labelling as villus and crypt enrichment ("...enrichment in villus clusters (g) or crypt clusters (h)."), when enrichment for some genes in cluster 4 is shown in both g and h. Why was duodenal cluster 6 excluded from this subset of data?

      We suspect (although have not proven) that cluster 4 is at a later stage in maturation/migration than cluster, as indicated by a somewhat ‘middle ground’ level of Sct expression, and generally being ‘in between’ the villus clusters and cluster 5 in expression levels of differentially expressed genes shown in Suppl Fig 2g/h. We have not included cluster 6 as it is transcriptionally quite distinct from the other clusters. We have added the following comment to the figure legend to clarify the status of cluster 4:

      “Note that cluster 4 shares some features in common with crypt and villus clusters and may represent cells at an intermediate stage of development.”

      (4) "Using smRNA-FISH, we further mapped Olfr558 and Il12a transcripts to a separate subset of EC cells expressing Cpb2 (Fig. 4b,c), confirming the presence of two subpopulations of EC cells associated with different physiological roles in the proximal colon." - Claiming populations with different physiological functionality seems a strong statement given the relatively weak Cpb2 signals observed and that mRNA detection necessarily is a transcriptomic time limited snap-shot. Please reformulate.

      We have made the following revision:

      “Using smRNA-FISH, we further mapped Olfr558 and Il12a transcripts to a separate subset of EC cells expressing Cpb2 (Fig. 4b,c), supporting the idea that there are subpopulations of EC cells in the proximal colon with gene transcripts associated with different physiological roles.”

      (5) What are the white signals in the overlay in Fig5a, given that the Piezo1 probe (white) apparently did not give any staining by itself? Please consider a positive control for the Piezo1 probe.

      The white signals in the overlay are Piezo1 staining that we do observe at what we consider background levels (also visible in the single-channel image).

      (6) "Systematic administration of DT led to lethality in the Piezo2-DTR mice within 12 hours, but not in the Rosa26LSL-DTR or Piezo2-cre mice (data not shown), likely due to the essential function of Piezo2 in respiration" - presumably this should be corrected to "Systemic administration ...".

      Thanks, this has been corrected to "Systemic administration ...".

      (7) "Although gastric emptying (GE) was not affected in the Piezo2-DTR animals after DT treatment, small intestine transit (SIT) time, a measurement to assess the motility of small intestine, presented a small but statistically significant slowdown in the former group (Fig. 6g,h), suggesting that some Piezo2+ cells in the small intestine were depleted." - alternatively there could, of course, be a slowing of SIT in response to slower colonic transit independent of small intestinal epithelial Piezo2 or 5HT - to me this seems more likely given that even proximal colonic cells are spared in Fig6c and this should be discussed.

      Thanks, that is a good point. We have made an amendment, which is shown in response to reviewer 1.

      (8) In the context of the Villin-Cre experiments it should be discussed that other colonic EECs also express Piezo2, which might contribute to the observed phenotypes.

      In our study, 97.7% of Piezo2+ cells in the distal colon had detectable Tph1 expression, suggesting that there is not a significant degree of overlap with other EEC types.

      (9) MC4R is several times referred to as a nutrient-sensing moiety (e.g. in the discussion: "...and receptors associated with nutrient sensing (Casr and Mc4r), ...") - whilst the melanocortin system is important for nutrient homeostasis, MC4R is itself not a "nutrient sensor", a term usually reserved for the detection of macronutrients, such as amino acids, fatty acids, and monosaccharides; please reformulate.

      We have amended this to “nutrient sensing and homeostasis”.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The objective of this study was to infer the population dynamics (rates of differentiation, division, and loss) and lineage relationships of clonally expanding NK cell subsets during an acute immune response.

      Strengths:

      A rich dataset and thorough analysis of a particular class of stochastic models.

      Weaknesses:

      The stochastic models used are quite simple; each population is considered homogeneous with first-order rates of division, death, and differentiation. In Markov process models such as these, there is no dependence of cellular behavior on its history of divisions. In recent years models of clonal expansion and diversification, in the settings of T and B cells, have progressed beyond this picture. So I was a little surprised that there was no mention of the literature exploring the role of replicative history in differentiation (e.g. Bresser Nat Imm 2022), nor of the notion of family 'division destinies' (either in division number or the time spent proliferating, as described by the Cyton and Cyton2 models developed by Hodgkin and collaborators; e.g. Heinzel Nat Imm 2017). The emerging view is that variability in clone (family) size may arise predominantly from the signals delivered at activation, which dictate each precursor's subsequent degree of expansion, rather than from the fluctuations deriving from division and death modeled as Poisson processes.

      As you pointed out, the Gerlach and Buchholz Science papers showed evidence for highly skewed distributions of family sizes and correlations between family size and phenotypic composition. Is it possible that your observed correlations could arise if the propensity for immature CD27+ cells to differentiate into mature CD27- cells increases with division number? The relative frequency of the two populations would then also be impacted by differences in the division rates of each subset - one would need to explore this. But depending on the dependence of the differentiation rate on division number, there may be parameter regimes (and time points) at which the more differentiated cells can predominate within large clones even if they divide more slowly than their immature precursors. One might not then be able to rule out the two-state model. I would like to see a discussion or rebuttal of these issues.

      We thank the reviewer for the insightful comment. We are currently in the process of developing alternate models based on the above comment and the references (Bresser Nat Imm 2022 and Heinzel Nat Imm 2017). We plan to include the results from the analysis in the revised version.

      Reviewer #2 (Public review):

      Summary:

      Wethington et al. investigated the mechanistic principles underlying antigen-specific proliferation and memory formation in mouse natural killer (NK) cells following exposure to mouse cytomegalovirus (MCMV), a phenomenon predominantly associated with CD8+ T cells. Using a rigorous stochastic modeling approach, the authors aimed to develop a quantitative model of NK cell clonal dynamics during MCMV infection.

      Initially, they proposed a two-state linear model to explain the composition of NK cell clones originating from a single immature Ly49+CD27+ NK cell at 8 days post-infection (dpi). Through stochastic simulations and analytical investigations, they demonstrated that a variant of the two-state model incorporating NK cell death could explain the observed negative correlation between NK clone sizes at 8 dpi and the percentage of immature (CD27+) NK cells (Page 8, Figure 1e, Supplementary Text 1). However, this two-state model failed to accurately reproduce the first (mean) and second (variance and covariance) moments of the measured CD27+ and CD27- NK cell populations within clones at 8 dpi (Figure 1g).

      To address this limitation, the authors increased the model's complexity by introducing an intermediate maturation state, resulting in a three-stage model with the transition scheme: CD27+Ly6C- → CD27-Ly6C- → CD27-Ly6C+. This three-stage model quantitatively fits the first and second moments under two key constraints: (i) immature CD27+ NK cells exhibit faster proliferation than CD27- NK cells, and (ii) there is a negative correlation (upper bound: -0.2) between clone size and the fraction of CD27+ cells. The model predicted a high proliferation rate for the intermediate stage and a high death rate for the mature CD27-Ly6C+ cells.

      Using NK cell reporter mouse data from Adams et al. (2021), which tracked CD27+/- cell population dynamics following tamoxifen treatment, the authors validated the three-stage model. This dataset allowed discrimination between NK cells originating from the bone marrow and those pre-existing in peripheral blood at the onset of infection. To test the prediction that mature CD27- NK cells have a higher death rate, the authors measured Ly49H+ NK cell viability in the mouse spleen at different time points post-MCMV infection. Experimental data confirmed that mature (CD27-) NK cells exhibited lower viability compared to immature (CD27+) NK cells during the expansion phase (days 4-8 post-infection).

      Further mathematical analyses using a variant of the three-stage model supported the hypothesis that the higher death rate of mature CD27- cells contributes to a larger proportion of CD27- cells in the dead cell compartment, as introduced in the new variant model.

      Altogether, the authors proposed a three-stage quantitative model of antigen-specific expansion and maturation of naïve Ly49H+ NK cells in mice. This model delineates a maturation trajectory: (i) CD27+Ly6C- (immature) → (ii) CD27-Ly6C- (mature I) → (iii) CD27-Ly6C+ (mature II). The findings highlight the highly proliferative nature of the mature I (CD27-Ly6C-) phenotype and the increased cell death rate characteristic of the mature II (CD27-Ly6C+) phenotype.
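      For readers unfamiliar with this model class, the three-stage scheme summarized above can be sketched as a continuous-time Markov process with first-order division, death, and differentiation rates, simulated with the Gillespie algorithm. This is only a minimal illustration, not the authors' implementation: the function names, the rate dictionary layout, and all numerical rate values below are hypothetical placeholders and are not the fitted parameters from the study.

```python
import random

# Hypothetical per-cell rates (per day); arbitrary placeholders, NOT fitted values.
# Stage order: 0 = CD27+Ly6C- (immature), 1 = CD27-Ly6C- (mature I), 2 = CD27-Ly6C+ (mature II)
EXAMPLE_RATES = {
    "div": (1.0, 0.8, 0.1),     # division rate of each stage
    "death": (0.05, 0.05, 0.6), # death rate of each stage
    "diff": (0.4, 0.5),         # differentiation rates: stage 0 -> 1 and stage 1 -> 2
}


def simulate_clone(rates, t_end, rng):
    """Gillespie simulation of one clone founded by a single immature cell.

    Returns the cell counts (n0, n1, n2) of the three stages at time t_end.
    """
    n = [1, 0, 0]
    t = 0.0
    while True:
        # Collect every event with a positive propensity (rate times cell count).
        events = []
        for i in range(3):
            if rates["div"][i] * n[i] > 0:
                events.append((rates["div"][i] * n[i], i, "div"))
            if rates["death"][i] * n[i] > 0:
                events.append((rates["death"][i] * n[i], i, "death"))
        for i in range(2):
            if rates["diff"][i] * n[i] > 0:
                events.append((rates["diff"][i] * n[i], i, "diff"))
        total = sum(a for a, _, _ in events)
        if total == 0:
            break  # clone is extinct, or no event can occur any more
        # The waiting time to the next event is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick one event with probability proportional to its propensity.
        u = rng.random() * total
        acc = 0.0
        for a, i, kind in events:
            acc += a
            if u < acc:
                break
        if kind == "div":
            n[i] += 1
        elif kind == "death":
            n[i] -= 1
        else:  # differentiation moves one cell to the next stage
            n[i] -= 1
            n[i + 1] += 1
    return tuple(n)


def simulate_clones(n_clones, rates, t_end, seed=0):
    """Simulate independent clones with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    return [simulate_clone(rates, t_end, rng) for _ in range(n_clones)]


if __name__ == "__main__":
    for clone in simulate_clones(5, EXAMPLE_RATES, t_end=8.0, seed=1):
        print(clone)
```

      In such a sketch, the CD27+ count of a clone is the first entry and the CD27- count is the sum of the other two, so the per-clone means, variances, and covariances discussed in the review could be estimated directly from the returned tuples.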

      Strengths:

      By designing models capable of explaining correlations, first and second moments, and employing analytical investigations, stochastic simulations, and model selection, the authors identified the key processes underlying antigen-specific expansion and maturation of NK cells. This model distinguishes the processes of antigen-specific expansion, contraction, and memory formation in NK cells from those observed in CD8+ T cells. Understanding these differences is crucial not only for elucidating the distinct biology of NK cells compared to CD8+ T cells but also for advancing the development of NK cell therapies currently under investigation.

      Weaknesses:

      The conclusions of this paper are largely supported by the available data. However, a comparative analysis of model predictions with more recent works in the field would be desirable. Moreover, certain aspects of the simulations, parameter inference, and modeling require further clarification and expansion, as outlined below:

      (1) Initial Conditions and Grassmann Data: The Grassmann data is used solely as a constraint, while the simulated values of CD27+/CD27- cells could have been directly fitted to the Grassmann data, which assumes a 1:1 ratio of CD27+/CD27- at t = 0. This approach would allow for an alternative initial condition rather than starting from a single CD27+ cell, potentially improving model applicability.

      We thank the reviewer for this comment. We are working on performing the above analysis and plan to include results from the analysis in the revised manuscript.

      (2) Correlation Coefficients in the Three-State Model: Although the parameter scan of the three-state model (Figure 2) demonstrates the potential for achieving negative correlations between colony size and the fraction of CD27+ cells, the authors did not present the calculated correlation coefficients using the estimated parameter values from fitting the three-state model to the data. Including these simulations would provide additional insight into the parameter space that supports negative correlations and further validate the model.

      We will include the above calculation in the revised manuscript.

      (3) Viability Dynamics and Adaptive Response: The authors measured the time evolution of CD27+/- dynamics and viability over 30 days post-infection (Figure 4). It would be valuable to test whether the three-state model can reproduce the adaptive response of CD27- cells to MCMV infection, particularly the observed drop in CD27- viability at 5 dpi (prior to the 8 dpi used in the study) and its subsequent rebound at 8 dpi. Reproducing this aspect of the experiment is critical to determine whether the model can simultaneously explain viability dynamics and moment dynamics. Furthermore, this analysis could enable sensitivity analysis of CD27- viability with respect to various model parameters.

      We will include some discussion of potential mechanisms of cell viability in this experiment.
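      As an illustration of the calculation requested in point (2), the correlation between clone size and the fraction of CD27+ cells could be computed from per-clone counts (simulated or measured) along the following lines. This is a hedged sketch: the function names are hypothetical, and the use of the Pearson coefficient is an assumption, since the reviews do not state which correlation measure the study used.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


def size_fraction_correlation(clones):
    """Correlate clone size with the CD27+ fraction.

    `clones` is an iterable of (n_cd27_pos, n_cd27_neg) counts, one pair per
    clone. Empty clones are skipped because their fraction is undefined.
    """
    sizes, fractions = [], []
    for n_pos, n_neg in clones:
        size = n_pos + n_neg
        if size == 0:
            continue
        sizes.append(size)
        fractions.append(n_pos / size)
    return pearson(sizes, fractions)
```

      A value computed this way from clones simulated at the fitted parameters could then be compared against the upper bound of -0.2 mentioned in the summary above.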

    2. eLife Assessment

      This study combines mathematical models and experimental data to analyse the emergence of heterogeneity within clonal NK cell responses during antigen-specific cell expansion. Although it comprises different experimental data and tests different theoretical hypotheses, the main claims remain incomplete and would benefit from the consideration of several previous findings about clonal immune responses and corresponding mathematical approaches. The study presents valuable findings with the potential to provide key insights about NK cell development if proposed claims could be confirmed by additional analyses.

    3. Reviewer #1 (Public review):

      Summary:

      The objective of this study was to infer the population dynamics (rates of differentiation, division, and loss) and lineage relationships of clonally expanding NK cell subsets during an acute immune response.

      Strengths:

      A rich dataset and thorough analysis of a particular class of stochastic models.

      Weaknesses:

      The stochastic models used are quite simple; each population is considered homogeneous with first-order rates of division, death, and differentiation. In Markov process models such as these, there is no dependence of cellular behavior on its history of divisions. In recent years models of clonal expansion and diversification, in the settings of T and B cells, have progressed beyond this picture. So I was a little surprised that there was no mention of the literature exploring the role of replicative history in differentiation (e.g. Bresser Nat Imm 2022), nor of the notion of family 'division destinies' (either in division number or the time spent proliferating, as described by the Cyton and Cyton2 models developed by Hodgkin and collaborators; e.g. Heinzel Nat Imm 2017). The emerging view is that variability in clone (family) size may arise predominantly from the signals delivered at activation, which dictate each precursor's subsequent degree of expansion, rather than from the fluctuations deriving from division and death modeled as Poisson processes.

      As you pointed out, the Gerlach and Buchholz Science papers showed evidence for highly skewed distributions of family sizes and correlations between family size and phenotypic composition. Is it possible that your observed correlations could arise if the propensity for immature CD27+ cells to differentiate into mature CD27- cells increases with division number? The relative frequency of the two populations would then also be impacted by differences in the division rates of each subset - one would need to explore this. But depending on the dependence of the differentiation rate on division number, there may be parameter regimes (and time points) at which the more differentiated cells can predominate within large clones even if they divide more slowly than their immature precursors. One might not then be able to rule out the two-state model. I would like to see a discussion or rebuttal of these issues.

    4. Reviewer #2 (Public review):

      Summary:

      Wethington et al. investigated the mechanistic principles underlying antigen-specific proliferation and memory formation in mouse natural killer (NK) cells following exposure to mouse cytomegalovirus (MCMV), a phenomenon predominantly associated with CD8+ T cells. Using a rigorous stochastic modeling approach, the authors aimed to develop a quantitative model of NK cell clonal dynamics during MCMV infection.

      Initially, they proposed a two-state linear model to explain the composition of NK cell clones originating from a single immature Ly49+CD27+ NK cell at 8 days post-infection (dpi). Through stochastic simulations and analytical investigations, they demonstrated that a variant of the two-state model incorporating NK cell death could explain the observed negative correlation between NK clone sizes at 8 dpi and the percentage of immature (CD27+) NK cells (Page 8, Figure 1e, Supplementary Text 1). However, this two-state model failed to accurately reproduce the first (mean) and second (variance and covariance) moments of the measured CD27+ and CD27- NK cell populations within clones at 8 dpi (Figure 1g).

      To address this limitation, the authors increased the model's complexity by introducing an intermediate maturation state, resulting in a three-stage model with the transition scheme: CD27+Ly6C- → CD27-Ly6C- → CD27-Ly6C+. This three-stage model quantitatively fits the first and second moments under two key constraints: (i) immature CD27+ NK cells exhibit faster proliferation than CD27- NK cells, and (ii) there is a negative correlation (upper bound: -0.2) between clone size and the fraction of CD27+ cells. The model predicted a high proliferation rate for the intermediate stage and a high death rate for the mature CD27-Ly6C+ cells.

      Using NK cell reporter mouse data from Adams et al. (2021), which tracked CD27+/- cell population dynamics following tamoxifen treatment, the authors validated the three-stage model. This dataset allowed discrimination between NK cells originating from the bone marrow and those pre-existing in peripheral blood at the onset of infection. To test the prediction that mature CD27- NK cells have a higher death rate, the authors measured Ly49H+ NK cell viability in the mouse spleen at different time points post-MCMV infection. Experimental data confirmed that mature (CD27-) NK cells exhibited lower viability compared to immature (CD27+) NK cells during the expansion phase (days 4-8 post-infection).

      Further mathematical analyses using a variant of the three-stage model supported the hypothesis that the higher death rate of mature CD27- cells contributes to a larger proportion of CD27- cells in the dead cell compartment, as introduced in the new variant model.

      Altogether, the authors proposed a three-stage quantitative model of antigen-specific expansion and maturation of naïve Ly49H+ NK cells in mice. This model delineates a maturation trajectory: (i) CD27+Ly6C- (immature) → (ii) CD27-Ly6C- (mature I) → (iii) CD27-Ly6C+ (mature II). The findings highlight the highly proliferative nature of the mature I (CD27-Ly6C-) phenotype and the increased cell death rate characteristic of the mature II (CD27-Ly6C+) phenotype.

      Strengths:

      By designing models capable of explaining correlations, first and second moments, and employing analytical investigations, stochastic simulations, and model selection, the authors identified the key processes underlying antigen-specific expansion and maturation of NK cells. This model distinguishes the processes of antigen-specific expansion, contraction, and memory formation in NK cells from those observed in CD8+ T cells. Understanding these differences is crucial not only for elucidating the distinct biology of NK cells compared to CD8+ T cells but also for advancing the development of NK cell therapies currently under investigation.

      Weaknesses:

      The conclusions of this paper are largely supported by the available data. However, a comparative analysis of model predictions with more recent works in the field would be desirable. Moreover, certain aspects of the simulations, parameter inference, and modeling require further clarification and expansion, as outlined below:

      (1) Initial Conditions and Grassmann Data: The Grassmann data is used solely as a constraint, while the simulated values of CD27+/CD27- cells could have been directly fitted to the Grassmann data, which assumes a 1:1 ratio of CD27+/CD27- at t = 0. This approach would allow for an alternative initial condition rather than starting from a single CD27+ cell, potentially improving model applicability.

      (2) Correlation Coefficients in the Three-State Model: Although the parameter scan of the three-state model (Figure 2) demonstrates the potential for achieving negative correlations between colony size and the fraction of CD27+ cells, the authors did not present the calculated correlation coefficients using the estimated parameter values from fitting the three-state model to the data. Including these simulations would provide additional insight into the parameter space that supports negative correlations and further validate the model.

      (3) Viability Dynamics and Adaptive Response: The authors measured the time evolution of CD27+/- dynamics and viability over 30 days post-infection (Figure 4). It would be valuable to test whether the three-state model can reproduce the adaptive response of CD27- cells to MCMV infection, particularly the observed drop in CD27- viability at 5 dpi (prior to the 8 dpi used in the study) and its subsequent rebound at 8 dpi. Reproducing this aspect of the experiment is critical to determine whether the model can simultaneously explain viability dynamics and moment dynamics. Furthermore, this analysis could enable sensitivity analysis of CD27- viability with respect to various model parameters.

    1. eLife Assessment

      The study by Ma et al. provides fundamental findings and compelling evidence on pyrotinib after trastuzumab-based adjuvant therapy in patients with HER2-positive breast cancer (PERSIST), a multicenter phase II trial. The findings enhance the understanding of HER2-positive breast cancer. The claims are fully supported by the types of experiments that were performed.

    2. Reviewer #1 (Public review):

      Summary:

      This study introduces a novel therapeutic strategy for patients with high-risk HER2-positive breast cancer and demonstrates that the incorporation of pyrotinib into adjuvant trastuzumab therapy can improve invasive disease-free survival.

      Strengths:

      The study features robust logic and high-quality data. Data from 141 patients across 23 centers were analyzed, thereby effectively mitigating regional biases and endowing the research findings with high applicability.

      Weaknesses:

      (1) Introduction and Discussion: Update the literature regarding the efficacy of pyrotinib combined with trastuzumab in treating HER2-positive advanced breast cancer.

      (2) Did all the data have a normal distribution? Expand the description of statistical analysis.

      (3) The novelty and innovative potential of your manuscript compared to the published literature should be described in more detail in the abstract and discussion section.

      (4) Figure legend should provide a bit more detail about what readers should focus on.

      (5) P-values should be clarified for the analysis.

      (6) The order (A, B, and C) in Figure 3 should be labeled in the upper left corner of the Figure.

      Comments on revisions:

      The authors responded well to my questions.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study introduces a novel therapeutic strategy for patients with high-risk HER2-positive breast cancer and demonstrates that the incorporation of pyrotinib into adjuvant trastuzumab therapy can improve invasive disease-free survival.

      Strengths:

      The study features robust logic and high-quality data. Data from 141 patients across 23 centers were analyzed, thereby effectively mitigating regional biases and endowing the research findings with high applicability.

      Weaknesses:

      (1) Introduction and Discussion: Update the literature regarding the efficacy of pyrotinib combined with trastuzumab in treating HER2-positive advanced breast cancer.

      Thank you for this helpful suggestion. The literature regarding the efficacy of pyrotinib combined with trastuzumab in treating HER2-positive advanced breast cancer referenced in our manuscript was the PHILA study, but we mistakenly cited its corrections (reference 14). We revised this reference as suggested.

      Changes in the text: Page 6, line 347-353.

      (2) Did all the data have a normal distribution? Expand the description of statistical analysis.

      As the sample size increases, the sampling distribution of the mean follows a normal distribution even when the underlying distribution of the original variable is non-normal, allowing the use of a normal distribution to calculate their confidence interval. We believe it is unnecessary to specifically describe whether the data followed a normal distribution in this study. Therefore, we did not revise the statistical section.
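
      The central-limit argument made in this response can be illustrated with a small simulation. The sketch below draws from a deliberately skewed (exponential) distribution, chosen here purely as an assumption, and compares the skewness of the raw values with that of means of samples of size 141. It illustrates the general principle only and says nothing about how the trial data themselves are distributed.

```python
import random
import statistics


def skewness(xs):
    """Population skewness: mean of the cubed standardized values."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def clt_demo(n=141, n_raw=5000, n_means=2000, seed=0):
    """Return (skewness of raw draws, skewness of sample means of size n)."""
    rng = random.Random(seed)
    # Raw observations from a strongly right-skewed distribution.
    raw = [rng.expovariate(1.0) for _ in range(n_raw)]
    # Means of repeated samples of size n from the same distribution.
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_means)
    ]
    return skewness(raw), skewness(means)


skew_raw, skew_means = clt_demo()
print(f"skewness of raw values:   {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```

      The sample means are far closer to symmetric than the raw values, which is the property the response relies on. How quickly this happens depends on the underlying distribution and on the size of each group being summarized.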

      (3) The novelty and innovative potential of your manuscript compared to the published literature should be described in more detail in the abstract and discussion section.

      Thank you for your suggestion. The word count for abstracts recommended by eLife is around 250 words. Therefore, we did not compare the present study with published literature in detail in the abstract, as this might exceed the recommended word limit. We revised the discussion section to provide a more detailed comparison between published literature and our study, and to analyze the novelty of our findings accordingly.

      Changes in the text: Page 11, line 177-180.

      (4) Figure legend should provide a bit more detail about what readers should focus on.

      Thank you for this suggestion. We did not revise the figure legend of Figure 1, as it provides a common description. For the figure legend of Figure 2, we added the method used to estimate the invasive disease-free survival curve. For the figure legend of Figure 3, we added more details regarding methods and numbers of patients in different subgroups.

      Changes in the text: Page 7, line 463-472.

      (5) P-values should be clarified for the analysis.

      Thank you for this comment. All subgroup analyses were post-hoc and lacked predefined hypotheses. Kaplan-Meier curves were used to present the subgroup results with the aim of performing descriptive statistics rather than inferential statistics. Therefore, we did not calculate their p-values.

      (6) The order (A, B, and C) in Figure 3 should be labeled in the upper left corner of the Figure.

      Thanks for this comment. We revised Figure 3 accordingly.

      Changes in the text: Figure 3.

      Reviewer #2 (Public review):

      In this manuscript, Cao et al. evaluated the efficacy and safety of 12 months of pyrotinib after trastuzumab-based adjuvant therapy in patients with high-risk, HER2-positive early or locally advanced breast cancer. Notably, the 2-year iDFS rate reached 94.59% (95% CI: 88.97-97.38) in all patients, and 94.90% (95% CI: 86.97-98.06) in patients who completed 1-year treatment with pyrotinib. This is an interesting and encouraging result, given that in the ExteNET study the 2-year iDFS rate was 93.9% (95% CI: 92.4-95.2) in the 1-year neratinib group, the 5-year iDFS rate was 90.2%, and 1-year treatment with neratinib did not translate into an OS benefit after 8 years of follow-up. Readers will therefore eagerly anticipate the long-term follow-up results of the current PERSIST study, as well as the results of the phase III clinical trial (NCT03980054).

      I have the following comments:

      (1) The introduction of the differences between pyrotinib and neratinib in terms of mechanism, efficacy, resistance, etc. is supposed to be included in the text so that authors could better highlight the clinical significance of the current trial.

      Thanks for this comment.

      In terms of mechanism, pyrotinib and neratinib are both irreversible pan-HER tyrosine kinase inhibitors that target HER1, HER2 and HER4 by covalently binding to ATP binding sites. Overall, the similarities between them far outweigh the differences. This is the reason why we referenced the ExteNET study, which used neratinib as extended adjuvant therapy, for the sample size calculation.

      Regarding efficacy, no head-to-head studies comparing the efficacy of pyrotinib and neratinib have been reported to date, and any comparison using historical data from different studies is inevitably biased by differences in treatment regimens, study populations, assessment criteria, etc.

      Regarding resistance, only a few studies with small sample size and case reports have investigated their mechanisms of resistance, and the underlying mechanisms have not been fully understood.

      Collectively, we believe that the similarities in the mechanisms of these two drugs far outweigh their differences, and their efficacy and resistance cannot be reasonably compared. Moreover, the sample size calculation was conducted based on the premise that the two drugs are similar. After careful consideration, we believe that overanalyzing the differences between neratinib and pyrotinib would shift the focus of this manuscript. Therefore, we did not discuss their differences in the article.

      (2) Please make sure that a total of 141 patients were enrolled in the study, 38 patients had a treatment duration of less than or equal to 6 months, and a total of 92 and 31 patients completed 1-year and 6-month treatment of extended adjuvant pyrotinib, respectively, which means 7 patients had a treatment duration of fewer than 6 months.

      Thank you for raising this relevant question. A total of 141 patients were enrolled in the study and received study treatment, and 92 and 31 patients completed 1-year and 6-month treatment of extended adjuvant pyrotinib, respectively. Of the remaining 18 patients, 16 had a treatment duration of fewer than 6 months, and 2 had a treatment duration longer than 6 months but less than 1 year.
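
      The patient counts given in this response can be checked with simple arithmetic. The variable names below are illustrative; the numbers are those stated in the response.

```python
# Counts as stated in the author response.
enrolled = 141
completed_1_year = 92
completed_6_months = 31

# Patients who completed neither the 1-year nor the 6-month course.
remaining = enrolled - completed_1_year - completed_6_months

under_6_months = 16
between_6_and_12_months = 2

# The two reported subgroups should account for every remaining patient.
assert remaining == under_6_months + between_6_and_12_months
print(remaining)
```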

      (3) The previous surgery history should be provided, and how many patients received lumpectomy, and mastectomy.

      Thank you for your suggestion. All patients in the present study underwent breast cancer surgery. Unfortunately, we did not collect data on the specific types of surgeries performed.

      Recommendations for the authors:

      Reviewing Editor:

      I have carefully reviewed the content and findings of your study, and while I recognize the potential impact of your research, there are several critical aspects that need to be addressed to fully appreciate the contribution of your work.

      Significance of Findings:

      Your study provides valuable insights into the efficacy and safety of pyrotinib as an extended adjuvant therapy following trastuzumab-based treatment in patients with high-risk HER2-positive breast cancer. The 2-year invasive disease-free survival (iDFS) rate of 94.59% is notably high and suggests that pyrotinib could be a promising option for patients who have completed trastuzumab therapy. This is particularly significant given the unmet need for effective therapies that can extend disease-free survival in this patient population.

      Strength of Evidence:

      The strength of the evidence presented is supported by the multicenter phase II trial design, which included a substantial number of patients across 23 centers in China. The rigorous methodology, including the use of the Kaplan-Meier method for estimating iDFS and the application of the Brookmeyer-Crowley method for confidence intervals, adds to the credibility of your findings. However, the single-arm study design without a control group limits the ability to draw definitive conclusions about the comparative effectiveness of pyrotinib.
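
      For readers unfamiliar with the Kaplan-Meier method mentioned above, a minimal sketch of the product-limit estimator follows. The follow-up times and event flags in the usage example are invented for illustration and are not trial data; confidence intervals are not computed here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. invasive disease) occurred at that time,
             0 if the patient was censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_censored = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Everyone observed at time t leaves the risk set afterwards.
        at_risk -= n_events + n_censored
    return curve


# Hypothetical example: months of follow-up and event indicators.
example = kaplan_meier([6, 12, 12, 18, 24, 24], [1, 0, 1, 0, 0, 0])
print(example)
```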

      In conclusion, your study presents intriguing findings that contribute to the field of breast cancer therapy. However, the current evidence, while suggestive of pyrotinib's potential, requires further validation in controlled trials to confirm its efficacy and optimal use in clinical practice. I encourage you to address the issues raised and consider resubmitting a revised version of your work.

      Thank you for your comments. We acknowledge the limitation of our single-arm study design without a control group and agree that it restricts definitive conclusions about the comparative effectiveness of pyrotinib. This limitation was noted in our manuscript. Furthermore, we have revised our manuscript in response to the issues raised by the reviewers.

    1. eLife Assessment

      Through cellular, developmental, and physiological analysis, this valuable study identifies a gene that regulates the relative growth of roots and shoots under salt stress. The holistic approach taken provides convincing evidence that this member of a larger tandemly duplicated gene family together with an upstream regulator contributes to salt tolerance. The manuscript will be of interest to plant biologists studying mechanisms of abiotic stress tolerance and gene family evolution.

    2. Reviewer #1 (Public review):

      The authors aim to assess the effect of salt stress on root:shoot ratio, identify the underlying genetic mechanisms, and evaluate their contribution to salt tolerance. To this end, the authors systematically quantified natural variations in salt-induced changes in root: shoot ratio. This innovative approach considers the coordination of root and shoot growth rather than exploring biomass and development of each organ separately. Using this approach, the authors identified a gene cluster encoding eight paralog genes with a domain-of-unknown-function 247 (DUF247), with the majority of SNPs clustering into SR3G (At3g50160). In the manuscript, the authors utilized an integrative approach that includes genomic, genetic, evolutionary, histological, and physiological assays to functionally assess the contribution of their genes of interest to salt tolerance and root development.

      Comments on latest version:

      The authors have largely addressed my concerns and comments. I have no additional comments for this round of review.

    3. Reviewer #2 (Public review):

      Summary:

      Salt stress is a significant and growing concern for agriculture in some parts of the world. While the effects of sodium excess have been studied in Arabidopsis and (many) crop species, most studies have focused on Na uptake, toxicity and overall effects on yield, rather than on developmental responses to excess Na, per se. The work by Ishka and colleagues aims to fill this gap.

      Working from an existing dataset that exposed a diverse panel of A. thaliana accessions to control, moderate, and severe salt stress, the authors identify candidate loci associated with altering the root:shoot ratio under salt stress. Following a series of molecular assays, they characterize a DUF247 protein which they dub SR3G, which appears to be a negative regulator of root growth under salt stress.

      Overall, this is a well-executed study which demonstrates the functional role played by a single gene in plant response to salt stress in Arabidopsis.

      Comments on latest version:

      All of the issues that I raised in previous reviews have been addressed by the authors. That said, there are several points that I see have come up in subsequent reviews that remain unresolved.

      In response to Reviewer 1, comment 2, regarding differences in expression, the authors are misinterpreting simple statistical results. They say that they performed Tukey tests for differences of means, finding, for example, that two means have the same group assignments (in this case, both "c,d") but then argue that "we still observed a clear reduction in WRKY75 transcript abundance." This is not how statistical tests work - we cannot perform a formal test for means and then just do an eyeball test. They also misinterpret the result in which one mean is assigned "b,c,d" and a second "c,d" - these are statistically overlapping means.

      Having said this, I do think that the subtle differences in expression between these different alleles are not critical to the central message of the study. It can be difficult to recapitulate results between labs, much less between different synthetic alleles. I think, in this case, we can let readers decide for themselves whether the reported differences - or lack thereof - are important for follow-up work.

    4. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aim to assess the effect of salt stress on root:shoot ratio, identify the underlying genetic mechanisms, and evaluate their contribution to salt tolerance. To this end, the authors systematically quantified natural variations in salt-induced changes in root:shoot ratio. This innovative approach considers the coordination of root and shoot growth rather than exploring biomass and the development of each organ separately. Using this approach, the authors identified a gene cluster encoding eight paralog genes with a domain-of-unknown-function 247 (DUF247), with the majority of SNPs clustering into SR3G (At3g50160). In the manuscript, the authors utilized an integrative approach that includes genomic, genetic, evolutionary, histological, and physiological assays to functionally assess the contribution of their genes of interest to salt tolerance and root development.

      Comments on revisions:

      As the authors correctly noted, variations across samples, genotypes, or experiments make achieving statistical significance challenging. Should the authors choose to emphasize trends across experiments to draw biological conclusions, careful revisions of the text, including titles and figure legends, will be necessary to address some of the inconsistencies between figures (see examples below). However, I would caution that this approach may dilute the overall impact of the work on SR3G function and regulation. Therefore, I strongly recommend pursuing additional experimental evidence wherever possible to strengthen the conclusions.

      (1) Given the phenotypic differences shown in Figures S17A-B, 10A-C, and 6A, the statement that "SR3G does not play a role in plant development under non-stress conditions" (lines 680-681) requires revision to better reflect the observed data.

      Thank you to the reviewer for the comment. We appreciate the acknowledgment that variations among experiments are inherent to biological studies. Figures 6A and S17 represent the same experiment, which initially indicated a phenotype for the sr3g mutant under salt stress. To ensure that growth changes were specifically normalized for stress conditions, we calculated the Stress Tolerance Index (Fig. 6B). In Figure 10, we repeated the experiment including all five genotypes, which supported our original observation that the sr3g mutant exhibited a trend toward reduced lateral root number under 75 mM NaCl compared to Col-0, although this difference was not significant (Fig. 10B). Additionally, we confirmed that the wrky75 mutant showed a significant reduction in main root growth under salt stress compared to Col-0, consistent with findings reported in The Plant Cell by Lu et al. 2023. For both main root length and lateral root number, we demonstrated that the double mutants of wrky75/sr3g displayed growth comparable to wild-type Col-0. This result suggests that the sr3g mutation compensates for the salt sensitivity of the wrky75 mutant.

      We completely agree with the reviewer that there is a variation in our results regarding the sr3g phenotype under control conditions, as presented in Fig. 6A/Fig. S17 and Fig. 10A-C. In Fig. 6A/Fig. S17, we did not observe any consistent trends in main root or lateral root length for the sr3g mutant compared to Col-0 under control conditions. However, in Fig. 10A-C, we observed a significant reduction in main root length, lateral root number, and lateral root length for the sr3g mutant under control conditions. We believe this may align with SR3G’s role as a negative regulator of salt stress responses. While loss of this gene benefits plants in coping with salt stress, it might negatively impact overall plant growth under non-stress conditions. This interpretation is further supported by our findings on the root suberization pattern in sr3g mutants under control conditions (Fig. 8B), where increased suberization in root sections 1 to 3, compared to Col-0, could inhibit root growth. While SR3G's role in overall plant fitness is intriguing, it is beyond the scope of this study. We cannot rule out the possibility that SR3G contributes positively to plant growth, particularly root growth. That said, we observed no differences in shoot growth between Col-0 and the sr3g mutant under control conditions (Fig. 7). Additionally, we calculated the Stress Tolerance Index for all aspects of root growth shown in Fig. 10 and presented it in Fig. S25.

      To address the reviewer's request to rephrase the statement "SR3G does not play a role in plant development under non-stress conditions" (lines 680-681): this statement is now found in lines 652-653 and corresponds to Fig. 7, where we evaluated rosette growth in the WT and sr3g mutant under both control and salt stress conditions. We did not observe any significant differences or even trends between the two genotypes under control conditions, confirming the accuracy of the statement. To clarify further, we have revised it to “SR3G does not play a role in rosette growth and development under non-stress conditions”.

      (2) I agree with the authors that detecting expression differences in lowly expressed genes can be challenging. However, as demonstrated in the reference provided (Lu et al., 2023), a significant reduction in WRKY75 expression is observed in T-DNA insertion mutant alleles of WRKY75. In contrast, Fig. 9B in the current manuscript shows no reduction in WRKY75 expression in the two mutant alleles selected by the authors, which suggests that these alleles cannot be classified as loss-of-function mutants (line 745). Additionally, the authors note that the wrky75 mutant exhibits reduced main root length under salt stress, consistent with the phenotype reported by Lu et al. (2023). However, other phenotypic discrepancies exist between the two studies. For example, 1) Lu et al. (2023) report that wrky75 root length is comparable to WT under control conditions, whereas the current manuscript shows that wrky75 root growth is significantly lower than WT; 2) under salt stress, Lu et al. (2023) show that wrky75 accumulates higher levels of Na+, whereas the current study finds Na+ levels in wrky75 indistinguishable from WT. To confirm the loss of WRKY75 function in these T-DNA insertion alleles the authors should provide additional evidence (e.g., Western blot analysis).

      We sincerely appreciate the reviewer acknowledging the challenge of detecting expression differences in lowly expressed genes, such as transcription factors. Transcription factors are typically expressed at lower levels compared to structural or enzymatic proteins, as they function as regulators where small quantities can have substantial effects on downstream gene expression.

      That said, we respectfully disagree with the reviewer’s interpretation that there is no reduction in WRKY75 expression in the two mutant lines tested in Fig. 9C. Among the two independent alleles examined, wrky75-3 showed a clear reduction in expression compared to WT Col-0 under both control and salt stress conditions. Using the Tukey test to compare all groups, we observed distinct changes in the assigned significance letters for each case:

      Col/root/control (cd) vs wrky75-3/root/control (cd): Although the same significance letter was assigned, we still observed a clear reduction in WRKY75 transcript abundance. More importantly, the variation in expression is notably lower compared to Col-0.

      Col/shoot/control (bcd) vs wrky75-3/shoot/control (a): This is a significant reduction compared to Col-0.

      Col/root/salt (cd) vs wrky75-3/root/salt (bcd): Once again, the reduction in WRKY75 transcript levels corresponds to changes in the assigned significance letters.

      Col/shoot/salt (bc) vs wrky75-3/shoot/salt (ab): Once again, the reduction in WRKY75 transcript levels corresponds to changes in the assigned significance letters.
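
      The four comparisons listed above can be read against the usual compact-letter-display convention for Tukey's test, under which two groups differ significantly only if they share no letter. The sketch below applies that convention to the letters quoted in the response; the function and dictionary names are illustrative, and the letters are taken from the text rather than recomputed from data.

```python
def share_letter(letters_a, letters_b):
    """True if two compact-letter-display labels have a letter in common."""
    return bool(set(letters_a) & set(letters_b))


# (Col-0 letters, wrky75-3 letters) as quoted in the response.
comparisons = {
    "root/control": ("cd", "cd"),
    "shoot/control": ("bcd", "a"),
    "root/salt": ("cd", "bcd"),
    "shoot/salt": ("bc", "ab"),
}

# A pair is significantly different only when no letter is shared.
significantly_different = {
    name: not share_letter(col, mutant)
    for name, (col, mutant) in comparisons.items()
}

for name, differs in significantly_different.items():
    print(f"{name}: {'different' if differs else 'not distinguishable'}")
```

      Read this way, only the shoot/control comparison is statistically distinguishable; the other three pairs share at least one letter.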

      To address the reviewer’s comment regarding the significant reduction in WRKY75 expression observed in T-DNA insertion mutant alleles of WRKY75 in the reference by Lu et al., 2023, we would like to draw the reviewer’s attention to the following points:

      a) Different alleles: The authors in The Plant Cell used different alleles than those used in our study, with one of their alleles targeting regions upstream of the WRKY75 gene. While we identified one of their described alleles (WRKY75-1, SALK_101367) on the T-DNA express website, which targets upstream of WRKY75, the other allele (wrky75-25) appears to have been generated through a different mechanism (possibly an RNAi line) that is not defined in the Plant Cell paper and does not appear on the T-DNA express website. The authors mentioned in the acknowledgement that they received these seeds as gifts from other labs: “We thank Prof. Hongwei Guo (Southern University of Science and Technology, China) and Prof. Diqiu Yu (Yunnan University, China) for kindly providing the WRKY75pro:GUS, 35Spro:WRKY75-GFP, wrky75-1, and wrky75-25 seeds. We thank Man-cang Zhang (Electrophysiology platform, Henan University) for performing the NMT experiment.”

      However, in our study, we selected two different T-DNAs that target the coding regions. While this may explain slight differences in the observed responses, both studies independently link WRKY75 to salt stress, regardless of the alleles used. For your reference, we have included a screenshot of the different alleles used.

      Author response image 1.

      b) Different developmental stages: They measured WRKY75 expression in 5-day-old seedlings. In our experiment, we used seedlings grown on 1/2x MS for 4 days, followed by transfer to treatment plates with or without 75 mM NaCl for one week. As a result, we analyzed older plants (12 days old) for gene expression analysis. Despite the difference in developmental stage, we were still able to observe a reduction in gene expression.

      c) Different tissues: The authors of The Plant Cell used whole seedlings for gene expression analysis, whereas we separated the roots and shoots and measured gene expression in each tissue type individually. This approach is logical, as WRKY75 is a root cell-specific transcription factor with higher expression in the roots compared to the shoots, as demonstrated in our analysis (Fig. 9C).

      Based on the reasoning above, we did work with loss-of-function mutants of WRKY75, particularly wrky75-3. To more accurately reflect the nature of the mutation, we have changed the term "loss-of-function" to "knock-down" in line 717.

      The reviewer mentioned phenotypic discrepancies between the two studies. We agree that there are some differences, particularly in the magnitude of responses or expression levels. However, despite variations in the alleles used, developmental stages, and tissue types, both studies reached the same conclusion: WRKY75 is involved in the salt stress response and acts as a positive regulator. We have discussed the differences between our study and The Plant Cell in the section above, summarizing them into three main points: different alleles, different developmental stages, and different tissue types.

      To address the reviewer’s comment regarding "Lu et al. (2023) report that wrky75 root length is comparable to WT under control conditions, whereas the current manuscript shows that wrky75 root growth is significantly lower than WT": We evaluated root growth differently than The Plant Cell study. In The Plant Cell (Fig. 5, H-J), root elongation was measured in 10-day-old plants with a single time point measurement. They transferred five-day-old wild-type, wrky75-1, wrky75-25, and WRKY75-OE plants to 1/2× MS medium supplemented with 0 mM or 125 mM NaCl for further growth and photographed them 5 days after transfer. In contrast, our study used 4-day-old seedlings, which were transferred to 1/2× MS supplemented with 0, 75, or 125 mM salt for additional growth (9 days). Rather than measuring root growth only at the end, we scanned the roots every other day, up to five times, to assess root growth rates. Essentially, the precision of our method is higher, as we captured growth changes throughout the developmental process, compared to the approach used in The Plant Cell. We do not underestimate the significance of the work conducted by other colleagues in the field, but we also recognize that each laboratory has its own approach and specific practices. This variation in experimental setup is intrinsic to biology, and we believe it is important to study biological phenomena in different ways, especially as the common or contrasting conclusions reached by different labs using different experimental setups shed more light on reproducibility and on gene contribution across different conditions, which is intrinsic to phenotypic plasticity and GxE interactions.

      The Plant Cell used a very high salt concentration, starting at 125 mM, while we were more cautious in our approach, as such a high concentration can inhibit and obscure more subtle phenotypic changes.

      To address the reviewer’s comment on "Lu et al. (2023) show that wrky75 accumulates higher levels of Na+, whereas the current study finds Na+ levels in wrky75 indistinguishable from WT," we would like to highlight the differences in the methodologies used in both studies. The Plant Cell measured Na+ accumulation in the wrky75 mutant using xylem sap (Supplemental Figure S10), which appears to be a convenient and practical approach in their laboratory. In their experiment, wild-type and wrky75 mutant plants were grown in soil for 3 weeks, watered with either a mock solution or 100 mM NaCl solution for 1 day, and then xylem sap was collected for Na+ content analysis. In contrast, our study employed a different method to measure Na+ and K+ ion content, using Inductively Coupled Plasma Atomic Emission Spectroscopy (ICP-AES) for root and shoot Na+ and K+ measurements. Additionally, we collected samples after two weeks on treatment plates and focused on the Na+/K+ ratio, which we consider more relevant than net Na+ or K+ levels, as the ratio of these ions is a critical determinant of plant salt tolerance. With this in mind, we observed a considerable, though non-significant, increase in the Na+/K+ ratio in the shoots of the wrky75-3 mutant (assigned Tukey’s letter c) compared to the Col-0 WT (assigned Tukey’s letters abc) under 125 mM salt, suggesting that this mutant is salt-sensitive. Importantly, the Na+/K+ ratio in the double wrky75/sr3g mutants was reduced to the WT level under the same salt conditions, further indicating that the salt sensitivity of wrky75 is mitigated by the sr3g mutation.

      Based on the reasons mentioned above, we believe that conducting additional experiments, such as Western blot analysis, is unnecessary and would not contribute new insights or alter the context of our findings.

      Reviewer #2 (Public review):

      Summary:

      Salt stress is a significant and growing concern for agriculture in some parts of the world. While the effects of sodium excess have been studied in Arabidopsis and (many) crop species, most studies have focused on Na uptake, toxicity and overall effects on yield, rather than on developmental responses to excess Na, per se. The work by Ishka and colleagues aims to fill this gap.

      Working from an existing dataset that exposed a diverse panel of A. thaliana accessions to control, moderate, and severe salt stress, the authors identify candidate loci associated with altering the root:shoot ratio under salt stress. Following a series of molecular assays, they characterize a DUF247 protein which they dub SR3G, which appears to be a negative regulator of root growth under salt stress.

      Overall, this is a well-executed study which demonstrates the functional role played by a single gene in plant response to salt stress in Arabidopsis.

      Review of revised manuscript:

      The authors have addressed my point-by-point comments to my satisfaction. In the cases where they have changed their manuscript language, clarified figures, or added analyses I have no further comment. In some cases, there is a fruitful back-and-forth discussion of methodology which I think will be of interest to readers.

      I have nothing to add during this round of review. I think that the paper and associated discussion will make a nice contribution to the field.

      We sincerely appreciate the reviewer’s recognition of the significance of our work to the field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Lines 518-519: The statement that other DUF247s exhibit similar expression patterns to SR3G, suggesting their responsiveness to salt stress, is not fully supported by Fig. S14. Please clarify the specific similarities (and differences) in the expression patterns of the DUF247s shown in Fig. S14, as their expression appears to be spatially and temporally diverse. Additionally, the scale is missing in Fig. S14.

      We thank the reviewer. We fixed the text and added expression scales to Figure S14.

      Line 684, Fig. 6A should be 7A.

      Thanks. It is fixed.

      Line 686, Fig. 7A should be 7B.

      Thanks. It is fixed.

      Lines 721-723: The signal quantification in Fig. 8B does not support the claim that "in section one,..., sr3g-5 showed more suberization compared to Col-0." Given the variability and noise often associated with histological dyes such as Fluorol Yellow staining, conclusions should be cautiously grounded in robust signal quantification. Additionally, please specify the number of biological replicates used in both Fig. 8B and C.

      We thank the reviewer for their comments. We believe the statement in the text accurately reflects our results presented in Figure 8B, where we stated “non-significant, but substantially higher levels of root suberization in sr3g-5 compared to Col-0 in sections one to three of the root under control condition (Fig. 8B).” Therefore, we kept the statement and have included the number of biological replicates in the figure legend.

      Lines 731-732: Please provide a more detailed explanation of how the significant changes in suberin monomer levels align with the Fluorol Yellow staining results, and clarify how these findings support the proposed negative role of SR3G in root suberization.

      Fluorol Yellow is a lipophilic dye widely used to label suberin in plant tissues, specifically in roots in this study. Given the inherent variability in histological assays, we confirmed the increase in suberization using an alternative method, Gas Chromatography–Mass Spectrometry (GC-MS). Both approaches revealed elevated suberin levels in the sr3g mutant compared to Col-0. Since the overall suberin content was higher in the mutant under both control and salt stress conditions, we proposed that SR3G acts as a negative regulator of root suberization.

      Lines 686-688 and Figure S24: The authors calculated water mass as FW-DW. A more standard approach for calculating water content is (FW-DW)/FW x 100. Please update the text or adjust the calculation accordingly. Additionally, if the goal is to test differences between WT and the mutant within each condition, a t-test would be a more appropriate statistical method.

      We thank the reviewer. We added water content (%) to Figure S24. We kept the statistical test unchanged because we wanted to be able to observe changes across both conditions and genotypes.
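
      For reference, the two quantities discussed in this exchange can be computed as in the short sketch below: water mass (FW - DW) and water content as a percentage of fresh weight, (FW - DW)/FW x 100. The function name and the example weights are hypothetical and are not data from the study.

```python
def water_content_percent(fresh_weight, dry_weight):
    """Water content as a percentage of fresh weight: (FW - DW) / FW * 100."""
    if fresh_weight <= 0:
        raise ValueError("fresh weight must be positive")
    if dry_weight < 0 or dry_weight > fresh_weight:
        raise ValueError("dry weight must lie between 0 and fresh weight")
    return (fresh_weight - dry_weight) / fresh_weight * 100.0


# Hypothetical rosette weights in mg (illustrative only)
fw, dw = 200.0, 20.0
water_mass = fw - dw                     # absolute water mass, in mg
water_pct = water_content_percent(fw, dw)  # relative water content, in %
```

      Unlike water mass, the percentage is normalized to plant size, so it is less affected by size differences between genotypes.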

      Lines 633-635 states that "No significant difference was observed between sr3g-4 and Col-0 (Fig. S18), except for the Stress Tolerance Index (STI) calculated using growth rates of lateral root length and number." However, based on the Figure S18 legend and statistical analysis (i.e., ns), it appears that the sr3g-4 mutant shows no alterations in root system architecture compared to Col-0. Please revise the text to accurately reflect the results of the statistical analysis.

      We thank the reviewer. We have now revised the text to reflect the result.

      Lines 698-707: The statistical analysis does not support the reported differences in the Na+/K+ ratio for the single and double mutants of sr3g-5 and wrky75-3 (Fig. 10D, where levels connected by the same letters indicate they are not significantly different). Furthermore, the conclusion that "the SR3G mutation indeed compensated for the increased Na+ accumulation observed in the wrky75 mutant under salt stress" is also based on non-significant differences (Fig. S25B). Please revise the text to accurately reflect the results of the statistical analysis. Additionally, since each mutant is compared to the WT, I recommend using Dunnett's test for statistical analysis.

      We thank the reviewer for their feedback. We have carefully revised the text to better support our findings. As previously mentioned, variations among samples are evident and are well-reflected across all our datasets. We have presented all data and focused on identifying trends within our samples to guide interpretation.

      We observed that the SR3G mutation effectively compensated for the increased Na+ accumulation observed in the wrky75 mutant under salt stress. A closer examination of the shoot Na+/K+ ratio under 125 mM salt shows that the wrky75 single mutant has a higher Na+/K+ ratio (indicated by the letter "c") compared to Col-0 (indicated by "abc") and the two double mutants (also indicated by "abc"). Therefore, we have retained the statistical analysis as originally conducted, and maintain our conclusions as is.
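
      The letter convention quoted in the reviewer's comment (levels connected by the same letter are not significantly different) can be stated as a simple rule: two groups differ significantly only if their letter sets share no letter. The sketch below encodes that rule; the function name is hypothetical, and the letter assignments are the ones quoted above for the shoot Na+/K+ ratio under 125 mM salt.

```python
def significantly_different(letters_a, letters_b):
    """Compact letter display rule: groups differ only if they share no letter."""
    return set(letters_a).isdisjoint(letters_b)


# Letter assignments as quoted in the response above
wrky75_letters = "c"
col0_letters = "abc"
differs = significantly_different(wrky75_letters, col0_letters)
```

      Under this rule, a group labeled "c" and a group labeled "abc" share the letter c, so the comparison is read as a trend rather than a statistically significant difference.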

      Figure 6: data in panel C present the Na/K ratio, not Na+ content. Based on the statistical analysis of root Na+ levels presented in Fig. S17C, there is no significant difference between sr3g-5 and WT. Please update the title of Fig. 6. In addition, in panel A, the title of the Y-axis and figure legend should be "Lateral root growth rate" without the word length, and in panel C, the statistical analysis is missing.

      We thank the reviewer. We updated Fig. 6 title and fixed the Y-axis in panel A, and added statistical letters to panel C. Legend was updated to reflect the changes.

      Figure 7: Please clearly label the time points where significant differences between genotypes are observed for both early and late salt treatments. Was there a significant difference recorded between WT and sr3g-5 on day 0 under early salt stress? Such differences may arise from initial variations in plant size within this experiment, as indicated by Fig. 7B, where significant differences in rosette area are evident starting from day 0. Additionally, please indicate the statistical analysis in panel E.

      We thank the reviewer for this suggestion. We updated the figure by adding a statistical test to panel E. Although the difference in growth rate between the sr3g mutant and Col-0 is indeed significant at day 0, we would like to draw the reviewer's attention to the fact that this growth rate was calculated over the 24 hours after salt stress was applied. Therefore, this difference in growth rate is related to exposure to salt stress. Moreover, the growth rate of Col-0 and the sr3g mutant does not differ in the two other treatments (Control and Late Salt Stress), further supporting the conclusion that sr3g affects rosette size and growth rate only under early salt stress conditions.

      We have also added the Salt Tolerance Index calculation to Figure S24 as additional evidence, controlling for potential differences in size between Col-0 and sr3g mutant.

      Figure S17: statistical analysis is not indicated in panels A, B, and D.

      We thank the reviewer for spotting that. We updated the figure with a statistical test.

      Figures S21-23: The quality of these figures is insufficient, hindering the ability to effectively interpret the authors' results and main message. Furthermore, a Dunnett's test, rather than a t-test, is the appropriate statistical method for this analysis.

      We thank the reviewer for this observation. We have now provided high-resolution versions of all supplemental figures, which should improve their legibility. As we are comparing each genotype to Col-0 one by one, the results of individual t-tests are sufficient for this analysis.

    1. eLife Assessment

      The study conducted by Fang et al. offers significant and fundamental insights, notably enhancing our understanding of angiogenesis. While some of the claims are supported by convincing experimental approaches, others lack sufficient validation. Additionally, there are instances where critical experimental controls appear to be absent.

    2. Reviewer #1 (Public review):

      Summary:

      In this study by Fang et al., the authors show how STAMBPL1 promotes TNBC angiogenesis via a feed-forward GRHL3/HIF1a/VEGFA axis. They demonstrate that STAMBPL1 interacts with FOXO1, define the required domains in each protein, and illustrate that this interaction facilitates FOXO1 transcriptional factor activity, which then activates GRHL3/HIF1a/VEGFA signaling. Lastly, they show that the combination of VEGFR and FOXO1 inhibitors can synergistically suppress STAMBPL1-overexpressing TNBC.

      Strengths:

      The manuscript is clearly written, and the results are well explained. The observation that STAMBPL1 mediates GRHL3 transcription through its interaction with FOXO1 is novel. The findings also have important translational potential.

    3. Reviewer #2 (Public review):

      Summary:

      In their manuscript, Fang and colleagues make a notable contribution to the field of oncology, particularly in advancing our understanding of triple-negative breast cancer (TNBC). The research delineates the role of STAMBPL1 in promoting angiogenesis in TNBC through its interaction with FOXO1 and the subsequent activation of the GRHL3/HIF1A/VEGFA axis. The evidence presented is robust, with a combination of in vitro experiments, RNA sequencing, and in vivo studies providing a comprehensive view of the molecular mechanisms at play. The strength of the evidence is anchored in the systematic approach and the utilization of multiple methodologies to substantiate the findings.

      Strengths:

      The manuscript presents a methodologically robust framework, incorporating RNA-sequencing, chromatin immunoprecipitation (ChIP) assays, and a suite of in vitro and in vivo model systems, which collectively substantiate the claims regarding the pro-angiogenic role of STAMBPL1 in TNBC. The employment of multiple cellular models, conditioned media to assess HUVEC functional responses, and xenograft tumor models in murine hosts offers a comprehensive evaluation of STAMBPL1's impact on angiogenic processes. A salient strength of this work is the identification of GRHL3 as a transcriptional target of STAMBPL1 and the demonstration of a physical interaction between STAMBPL1 and FOXO1, which modulates GRHL3-driven HIF1A transcription. The study further suggests a potential therapeutic strategy by revealing the synergistic inhibitory effects of combined VEGFR and FOXO1 inhibitor treatment on TNBC tumor growth.

      Weaknesses:

      A potential limitation of the study is the reliance on specific cellular and animal models, which may constrain the extrapolation of these findings to the broader spectrum of human TNBC biology. Furthermore, while the study provides evidence for a novel regulatory axis involving STAMBPL1, FOXO1, and GRHL3, the multifaceted nature of angiogenesis may implicate additional regulatory factors not exhaustively addressed in this research.

      Appraisal of Achievement and Conclusion Support:

      The authors have successfully demonstrated that STAMBPL1 promotes HIF1A transcription and activates the HIF1α/VEGFA axis in a non-enzymatic manner, leading to increased angiogenesis in TNBC. The results are generally supportive of their conclusions, with clear evidence that STAMBPL1 upregulates HIF1α expression and enhances the activity of HUVECs. The study also shows that STAMBPL1 interacts with FOXO1 to promote GRHL3 transcription, which in turn activates HIF1A.

      Impact on the Field and Utility:

      This research is poised to exert a substantial impact on the oncological research community by uncovering the role of STAMBPL1 in TNBC angiogenesis and by identifying the STAMBPL1/FOXO1/GRHL3/HIF1α/VEGFA axis as a potential therapeutic target. The findings could pave the way for the development of novel therapeutic strategies for TNBC, a subtype characterized by a paucity of effective treatment options. The methodologies utilized in this study are likely to be valuable to the research community, offering a paradigm for investigating the role of deubiquitinating enzymes in oncogenic processes.

      Additional Context:

      It would be beneficial for readers to understand the broader context of TNBC research and the current challenges in treating this aggressive cancer subtype. The significance of this work is heightened by the lack of effective treatments for TNBC, making the identification of new therapeutic targets particularly important. Furthermore, understanding the specific mechanisms by which STAMBPL1 regulates HIF1α expression could provide insights into hypoxia signaling in other cancer types as well.

    4. Reviewer #3 (Public review):

      In this manuscript, Fang et al. describe a new oncogenic function of the STAMBPL1 protein in triple-negative breast cancer (TNBC). STAMBPL1 is a deubiquitinase that has been poorly studied in cancer. Previous reports identify it as a promoter of epithelial to mesenchymal transition or an inhibitor of cisplatin-induced cell death, but its participation in other cancer phenotypes has not been investigated. Fang et al. find that in cell line models of TNBC, STAMBPL1 promotes expression of the transcription factor HIF-1a and its downstream target VEGF, with the consequence of stimulating neo-angiogenesis in vitro and in vivo. Mechanistically, the authors find that this occurs via a non-enzymatic and indirect mechanism, that is, by promoting the expression of GRHL3, a transcription factor that in turn binds to the HIF-1a promoter to stimulate its transcription. Interestingly, STAMBPL1 promotes GRHL3 expression by facilitating the transcriptional activity of FOXO1, a known regulator of GRHL3. Because the authors find that STAMBPL1 and FOXO1 interact, they suggest that STAMBPL1 may promote the formation of an active transcriptional complex containing FOXO1, perhaps by facilitating the recruitment of transcriptional coactivators.

      In conclusion, these data position for the first time the STAMBPL1 deubiquitinase in a FOXO-GRHL3 regulatory axis for the control of VEGF expression and tumor angiogenesis.

      The main weaknesses of this work are that the relevance of this molecular axis to the pathogenesis of TNBC is not clear, and it is not clearly established whether this is a regulatory pathway that occurs in hypoxic conditions or independently of oxygen levels.

      Major criticisms:

      (1) Both FOXO1 and GRHL3 have been previously described as tumor suppressors, with reports of FOXO1 inhibiting tumor angiogenesis. Therefore, this work describes an apparently contradictory function of these proteins in TNBC. While it is not surprising that the same genes perform divergent functions in different tumor contexts, stronger evidence in support of the oncogenic function of these two genes should be provided to make the data more convincing.

      To strengthen the notion that STAMBPL1, FOXO and GRHL3 are overexpressed in TNBC, the authors have utilized the BCIP tool to analyze their expression in the Metabric database. According to this analysis, the levels of STAMBPL1 and GRHL3 are not higher in breast cancer than in adjacent tissues, and the levels of FOXO1 are lower. Nonetheless, the authors observe that their expression levels are significantly (yet not dramatically) higher in TNBC compared to non-TNBC (Fig.S6A-C). However, these new data do not provide convincing evidence of the relevant tumor suppressive function of these genes in TNBC, as neither is more expressed in tumors compared to adjacent normal tissues.

      (2) Because STAMBPL1 overexpression in normoxic conditions is sufficient to cause HIF-1a protein accumulation, it is not clear why the authors then use hypoxic conditions to analyze the effect of STAMBPL1 on HIF-1a transcription. Avoiding HIF1-a protein degradation should not have any effect on its transcription. At the same time, it is neither clear nor explained why different hypoxic conditions are sometimes used, resulting in different mRNA levels of HIF-1a and its downstream targets and quite significant fluctuations within the same cell line from one experimental setting to the next. In conclusion, it is not clear what the relevance of the new HIF-1a regulatory axis described in this paper is in normoxic or hypoxic conditions.

      (3) Another critical point is that necessary experimental controls are sometimes missing, and this reduces the strength of some of the conclusions drawn by the authors. As an example, experiments where overexpression of STAMBPL1 is coupled to silencing of FOXO1 to demonstrate dependency lack FOXO1 silencing in the absence of STAMBPL1 overexpression. Because diminishing FOXO1 expression affects HIF-1a/VEGF transcription even in the absence of STAMBPL1 (shown in Figure 7C, D), it is not clear if the data presented in Figure 7G are significant. The difference between HIF-1a expression upon FOXO1 silencing should be compared in the presence or absence of STAMBPL1 overexpression to understand if FOXO1 impacts HIF-1a transcription dependently or independently of STAMBPL1.

      In addition, some minor comments to improve the quality of this manuscript are provided.

      (1) In Figures 2A and D, where endogenous versus STAMBPL1 expression is shown, it is not clear what is the molecular weight of these proteins as they both appear to be of 55 KDa, even though according to the authors the exogenous protein is bigger than the endogenous and the lower band in Figure 2D is reported to be the endogenous STAMBPL1.

      (2) In Figure 2, the effect of STAMBPL1 overexpression on HIF-1a mRNA is minor. At the same time, it seems that the protein levels of HIF-1a are quite high (or at least visible by WB) in normoxic cells even in the absence of STAMBPL1 overexpression. This raises questions about the type of regulation that HIF-1a is subjected to in these cells.

      In general, because only two cell lines are used in this study and the data in patients do not appear to strongly support an oncogenic function of STAMBPL1 in TNBC (via its overexpression), data should be more solid and additional experiments should be provided to substantiate the oncogenic function of this pathway in TNBC.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The mechanism by which STAMBPL1 mediates GRHL3 transcription through its interaction with FOXO1 is not sufficiently discussed, especially in relation to how STAMBPL1 regulates FOXO1. Some reported effects are modest.

      We appreciate the reviewer’s comments. In response, we have added a discussion of the potential mechanisms by which STAMBPL1 regulates FOXO1 transcriptional activity to the Discussion, highlighted in red on page 18, lines 342 to 352. The specific reply content is as follows: “The transcriptional activity of FOXO1 is primarily regulated by its nucleocytoplasmic shuttling process (Van Der Heide, Hoekman et al. 2004). The PI3K/AKT pathway promotes the phosphorylation of FOXO1, resulting in the formation of a complex with members of the 14-3-3 family (including 14-3-3σ, 14-3-3ε, and 14-3-3ζ), which facilitates its export from the nucleus and inhibits its transcriptional activity (Huang and Tindall 2007, Tzivion, Dobson et al. 2011). It’s reported that TDAG51 prevents the binding of 14-3-3ζ to FOXO1 in the nucleus by interacting with FOXO1, thereby enhancing its transcriptional activity through increased accumulation within the nucleus (Park, Jeon et al. 2023). Our results indicate that the overexpression of STAMBPL1 and STAMBPL1-E292A did not affect the protein levels of FOXO1 (Fig.7E and Fig.S5E), but STAMBPL1 co-localizes with FOXO1 in the nucleus (Fig.7M) and interacts with it (Fig.7N and Fig.S5I-J). This suggests that STAMBPL1 enhances the transcriptional activity of FOXO1 on GRHL3 by interacting with nuclear FOXO1.” The result was added to Supplementary Figure 5 as Fig.S5E.

      Reviewer #2 (Public review):

      (1) A potential limitation of the study is the reliance on specific cellular and animal models, which may constrain the extrapolation of these findings to the broader spectrum of human TNBC biology. Furthermore, while the study provides evidence for a novel regulatory axis involving STAMBPL1, FOXO1, and GRHL3, the multifaceted nature of angiogenesis may implicate additional regulatory factors not exhaustively addressed in this research.

      We appreciate the valuable suggestions provided by the reviewer. In the Discussion, we have added an in-depth discussion of the limitations of the study, as well as an analysis of the regulatory factors related to tumor angiogenesis, which are highlighted in red on pages 20 to 21, lines 396 to 412. The relevant content added is as follows: “In this study, we utilized two triple-negative breast cancer cell lines, HCC1806 and HCC1937, along with human primary umbilical vein endothelial cells (HUVECs) and a nude mouse breast orthotopic transplantation tumor model to investigate the regulatory mechanism by which STAMBPL1 activates the GRHL3/HIF1α/VEGFA signaling pathway through its interaction with FOXO1, thereby promoting angiogenesis in TNBC. The results of this study have certain limitations regarding their applicability to human TNBC biology. Furthermore, in addition to the HIF1α/VEGFA signaling pathway emphasized in this study, tumor cells can continuously release or upregulate various pro-angiogenic factors, such as Angiopoietin and FGF, which activate endothelial cells, pericytes (PCs), cancer-associated fibroblasts (CAFs), endothelial progenitor cells (EPCs), and immune cells (ICs). This leads to capillary dilation, basement membrane disruption, extracellular matrix remodeling, pericyte detachment, and endothelial cell differentiation, thereby sustaining a highly active state of angiogenesis (Liu, Chen et al. 2023). It is important to collect clinical TNBC tissue samples in the future to analyze the expression of the STAMBPL1/FOXO1/GRHL3/HIF1α/VEGFA signaling axis. Furthermore, patient-derived organoid and xenograft models are useful to elucidate the regulatory relationship of this axis in TNBC angiogenesis.”

      Reviewer #3 (Public review):

      The main weaknesses of this work are that the relevance of this molecular axis to the pathogenesis of TNBC is not clear, and it is not clearly established whether this is a regulatory pathway that occurs in hypoxic conditions or independently of oxygen levels.

      (1) With respect to the first point, both FOXO1 and GRHL3 have been previously described as tumor suppressors, with reports of FOXO1 inhibiting tumor angiogenesis. Therefore, this work describes an apparently contradictory function of these proteins in TNBC. While it is not surprising that the same genes perform divergent functions in different tumor contexts, stronger evidence in support of the oncogenic function of these two genes should be provided to make the data more convincing. As an example, the data in support of high STAMBPL1, FOXO and GRHL3 gene expression in TNBC TCGA specimens provided in Figure 8 are not very strong, and it is not clear what the non-TNBC specimens are (whether other breast cancers or other tumors, perhaps those tumors where these genes perform tumor suppressive functions). To strengthen the notion that STAMBPL1, FOXO and GRHL3 are overexpressed in TNBC, the authors could provide a comparison with normal tissue, as well as the analysis of other publicly available datasets (like the NCI Clinical Proteomic Tumor Analysis Consortium as an example). Finally, it is not clear what the basal protein expression levels of STAMBPL1 are in the cell lines used in this study, as based on the data presented in Figures 2D and F it appears that the protein is not expressed if not exogenously overexpressed. It would be helpful if the authors addressed this issue and provided further evidence of STAMBPL1 expression in TNBC cell lines.

      We appreciate the suggestions. In this study, we utilized the BCIP online tool to analyze the Metabric database, incorporating adjacent normal tissues as controls. Although the expression levels of STAMBPL1, FOXO1, and GRHL3 in breast cancer tissues are not uniformly higher than those in adjacent tissues, their expression levels in triple-negative breast cancer (TNBC) are significantly elevated compared to non-TNBC. The results of this re-analysis have been added in Supplementary Figure 6 as Fig.S6A-C.

      Regarding the basal protein expression levels of STAMBPL1 in the cell lines used in this study, Fig. 2A shows the endogenous level of STAMBPL1 in HCC1806 and HCC1937. For Fig. 2D and 2F, the overexpressed STAMBPL1 was fused with a 3xFlag tag, resulting in a higher molecular weight compared to the endogenous STAMBPL1. In the revised Figure 2, we have indicated the positions of the endogenous (Endo.) and exogenous (OE.) STAMBPL1 bands with arrows.

      (2) Linked to these considerations is the second major criticism, namely that it is not made clear if this new regulatory axis is proposed to act in normoxic or hypoxic conditions. The experiments presented in this paper are performed in both conditions but a clear explanation as to why cells are exposed to hypoxia is not given and would be necessary being that HIF-1a transcription and not protein stability is being analyzed. Also, different hypoxic conditions are sometimes used, resulting in different mRNA levels of HIF-1a and its downstream targets and quite significant fluctuations within the same cell line from one experimental setting to the next. The authors should provide an explanation as to why experimental conditions are changed and, more importantly, the experiments presented in Figure 2 should be performed also in normoxia.

      Thanks for the comments. Under normoxic conditions, HIF1α is recognized by pVHL due to hydroxylation and is rapidly degraded via the proteasomal pathway. In contrast, under hypoxic conditions, HIF1α protein is accumulated. To investigate the effect of STAMBPL1 knockdown on HIF1A gene transcription levels, we conducted experiments under hypoxic conditions to avoid interference from the rapid degradation of HIF1α at the protein level, as shown in Figures 2B-C. Furthermore, under normoxic conditions, the overexpression of STAMBPL1 had been demonstrated to significantly enhance the protein levels of HIF1α and upregulate the transcription of VEGFA through HIF1α. To avoid the potential impact of excessive accumulation of HIF1α protein under hypoxic conditions on its protein level detection and the transcription of downstream VEGFA, the related experiments shown in Figure 2D-G were performed under normoxic conditions. We have explained the corresponding experimental conditions in the “Result” and “Figure legends” according to the reviewer's comments, highlighted in red.

      (3) Another critical point is that necessary experimental controls are sometimes missing, and this reduces the strength of some of the conclusions drawn by the authors. As an example, experiments where overexpression of STAMBPL1 is coupled to silencing of FOXO1 to demonstrate dependency lack FOXO1 silencing in the absence of STAMBPL1 overexpression. Because diminishing FOXO1 expression affects HIF-1a/VEGF transcription even in the absence of STAMBPL1 (shown in Figure 7C, D), it is not clear if the data presented in Figure 7G are significant. The difference between HIF-1a expression upon FOXO1 silencing should be compared in the presence or absence of STAMBPL1 overexpression to understand if FOXO1 impacts HIF-1a transcription dependently or independently of STAMBPL1.

      Thank you for this comment. For Fig.7G-H, our experimental objective was to determine whether the activation of HIF1A/VEGFA transcription by STAMBPL1 is mediated by FOXO1. Therefore, under STAMBPL1 overexpression, we knocked down FOXO1 to investigate whether FOXO1 silencing could reverse the upregulation of HIF1A/VEGFA transcription induced by STAMBPL1 overexpression.

      (4) In addition, some minor comments to improve the quality of this manuscript are provided.

      (4.1) As a general statement, the manuscript is extremely synthetic. While this is not necessarily a negative feature, sometimes results are discussed in the figure legends and not in the main text (as an example, western blots showing HIF-1a expression), and this makes it hard to read through the data in an easy and enjoyable manner.

      Thank you for this suggestion. We have revised the figure legends to make them clearer and more concise, highlighted in red.

      (4.2) The effect of STAMBPL1 overexpression on HIF-1a transcription is minor (Figure 2). The authors should explain why they think this is the case and whether hypoxia may provide a molecular environment that is more permissive to this type of regulation.

      Thank you for the comment. Under normoxic conditions, we conducted WB to examine the protein expression of HIF1α after the overexpression of STAMBPL1 and the knockdown of HIF1α. To visually illustrate the impact of STAMBPL1 overexpression on HIF1A protein levels, as well as the effectiveness of HIF1α knockdown, we annotated the grayscale analysis results of the bands in Figures 2D and 2F. As the reviewer pointed out, under normoxic conditions, HIF1α is rapidly degraded, which may explain why the upregulation of HIF1α protein levels by STAMBPL1 overexpression is not very pronounced.

      (4.3) HIF-1a does not appear upregulated at the protein level by STAMBPL1 or GRHL3 overexpression, even though this is stated in the legends of Figures 2 and 6. The authors should show unsaturated western blot images and provide quantitative data from independent experiments to make this point.

      Thank you for this comment. We have added the unsaturated image of HIF1α into Fig.2D, and performed a grayscale analysis of the HIF1α bands in Fig.2D and Fig.6A to indicate the relative protein level of HIF1α.

      Reviewer #1 (Recommendations for the authors):

      (1) The authors previously reported that STAMBPL1 stabilizes MKP1 in TNBC. However, in this study, they focus on HIF1a. Given that STAMBPL1 affects HIF1a expression, it would be valuable to examine the levels of ROS in TNBC cells with or without STAMBPL1, as ROS is known to influence HIF1a stability.

      Thank you for your comments. It’s known that STAMBPL1 functions as a deubiquitinating enzyme. However, our study reveals that the upregulation of HIF1α by STAMBPL1 is independent of its deubiquitinating activity. This conclusion is supported by the observation that overexpression of the deubiquitinase active site mutant, STAMBPL1-E292A, also upregulated HIF1α expression (Figure 1F). Moreover, STAMBPL1 overexpression enhanced HIF1α transcription (Figures 4E and S3E), while STAMBPL1 knockdown was able to inhibit the transcription of HIF1α (Figures 2B-C). These results indicate that STAMBPL1 mediates the transcription of HIF1α but does not affect the stability of HIF1α. For these reasons, we think that it is unnecessary to examine the ROS levels.

      (2) Figure 1A: The regulation of HIF1a mRNA by STAMBPL1, but not its protein levels, could be better addressed by using MG132 to rule out the impact of protein degradation.

      Thanks for this comment. Under normoxic conditions, the oxygen-sensitive prolyl hydroxylases PHD1-3 act on HIF1α, specifically inducing hydroxylation at the proline 402 and 564 residues. These hydroxylated residues are recognized by the pVHL/E3 ubiquitin ligase complex, leading to ubiquitination and subsequent degradation via the proteasome pathway. Conversely, under hypoxic conditions, PHD1-3 are inactivated, and non-hydroxylated HIF1α is not recognized by the pVHL/E3 ubiquitin ligase complex, thereby avoiding ubiquitination and proteasomal degradation (DOI: 10.1073/pnas.95.14.7987, DOI: 10.1515/BC.2004.016, and DOI: 10.1042/BJ20040620). The mechanism of HIF1α accumulation under hypoxia is analogous to the action of the proteasome inhibitor MG132. When we treated cells with hypoxia, the ubiquitination and proteasomal degradation pathway of HIF1α was blocked. At this time, STAMBPL1 knockdown could downregulate the expression of HIF1α (Fig.1A). Meanwhile, since the knockdown of STAMBPL1 significantly downregulated the mRNA level of HIF1α under hypoxia (Fig.2B-C), we concluded that STAMBPL1 affects the expression of HIF1α by mediating its transcription. In addition, MG132 will block all proteasomal substrate degradation and may affect HIF1α mRNA levels indirectly.

      (3) Figure 2D and 2F: The effect of STAMBPL1 in promoting HIF1a expression is quite mild, and the effect of HIF1a knockdown is also modest. Given the high levels of STAMBPL1 in TNBC cell lines (Figure 2A), it would be better to repeat these experiments in a STAMBPL1-knockdown setting for clearer insights.

      We appreciate this insightful suggestion. Considering that the regulation of HIF1α expression by STAMBPL1 occurs at the transcriptional level, and to prevent excessive accumulation of HIF1a during hypoxia that could confound the effect of STAMBPL1 overexpression on HIF1α regulation, we opted to overexpress STAMBPL1 under normoxic conditions and subsequently knock down HIF1α, as shown in Fig.2D and Fig.2F. This approach allowed us to observe that STAMBPL1 overexpression can upregulate HIF1a expression to some extent. Additionally, in response to the reviewer's suggestion to knock down STAMBPL1, we have conducted the corresponding experiments, with results presented in Fig.1A-E and Fig.2B-C.

      (4) Figure 4A: Why does the RNA-seq pattern differ significantly between the two siRNAs? Additionally, the authors should clarify why they focus primarily on transcription factors, as other mechanisms, such as mRNA stability and RNA modification, could also influence gene transcription.

      Thank you for this comment. Two siRNAs for STAMBPL1 were designed and synthesized by a biotechnology company. Although both siRNAs target STAMBPL1, they target different sequences. While both siRNAs effectively knocked down STAMBPL1 (Fig. 1A and Fig. 2A), the possibility of off-target effects cannot be completely ruled out. Therefore, we needed to use two siRNAs in parallel for RNA-seq, ensuring that the gene expression changes observed are due to the knockdown of STAMBPL1 by focusing on genes downregulated by both siRNAs. Additionally, among the 27 genes downregulated by both siRNAs, only 18 genes were annotated. Of these 18 genes, except for GRHL3, which is a transcription factor reported to be involved in gene transcription regulation, the remaining 17 genes have no documented association with RNA transcription, stability, or modification. Therefore, we focused on the GRHL3 gene.

      (5) Figure 5G: To investigate whether STAMBPL1 and GRHL3 function epistatically in the pathway, a double knockdown of STAMBPL1 and GRHL3 should be examined. Additionally, a double knockdown of STAMBPL1 and FOXO1 should be assessed.

      Thank you for your comment. In Figure 5G, we aimed to assess the knockdown efficiency of GRHL3 using siRNAs. To determine whether STAMBPL1 upregulates the HIF1a/VEGFA axis via GRHL3, we overexpressed STAMBPL1 and subsequently knocked down GRHL3. Our findings indicated that STAMBPL1 overexpression indeed enhanced the HIF1a/VEGFA axis, which was rescued by the knockdown of GRHL3, as shown in Figures 4E-F and S3E-F. Similarly, upon overexpressing STAMBPL1 and knocking down FOXO1, we observed that STAMBPL1 overexpression increased the GRHL3/HIF1a/VEGFA axis, which could also be rescued by knocking down FOXO1, as shown in Figures 7F-H. These results suggest that STAMBPL1 upregulates the GRHL3/HIF1a/VEGFA axis through FOXO1. We do not think that a double knockdown of STAMBPL1 together with FOXO1 or GRHL3 is the appropriate approach.

      (6) Figure 7: It remains unclear how STAMBPL1 regulates FOXO1. The authors show that STAMBPL1 increases the transcriptional activation of FOXO1 at the GRHL3 promoter, but it is not clear if STAMBPL1 is required for FOXO1 binding to the GRHL3 promoter. To address this, STAMBPL1-knockdown should be included to examine its effect on FOXO1 binding to the GRHL3 promoter. Furthermore, it would be important to determine whether the STAMBPL1-FOXO1 interaction is essential for GRHL3 transcription. Since the interaction sites of STAMBPL1-FOXO1 have been mapped, a mutant disrupting the interaction would provide better insight into how STAMBPL1 promotes GRHL3 transcription by interacting with FOXO1.

      Thank you for this comment. It has been reported that FOXO1 promotes the transcription of the GRHL3 gene by interacting with its promoter (DOI: 10.1093/nar/gkw1276). We also verified through a ChIP assay that FOXO1 can bind to the promoter of the GRHL3 gene (Fig.7I) and mediate its transcription. Specifically, knocking down FOXO1 significantly down-regulated the mRNA level of GRHL3 (Fig.7B), and the GRHL3 promoter lacking the FOXO1 binding site almost completely lost transcriptional activity (Fig.7J), indicating that FOXO1 is crucial for the transcriptional activity of the GRHL3 promoter. Overexpression of STAMBPL1 enhances the activating effect of FOXO1 on the transcriptional activity of the GRHL3 promoter (Fig.7K). However, the up-regulation of GRHL3 transcription by overexpression of STAMBPL1 is completely blocked by FOXO1 knockdown (Fig.7F), and the knockdown of FOXO1 essentially blocks the binding of STAMBPL1 to the GRHL3 promoter (Fig.7L), suggesting that STAMBPL1 affects the transcriptional expression of GRHL3 in a FOXO1-dependent manner. As we added in the Discussion, the transcription factor activity of FOXO1 is mainly regulated by its nucleocytoplasmic shuttling, and the accumulation of FOXO1 in the nucleus can enhance its transcription factor activity (DOI: 10.1042/BJ20040167; DOI: 10.15252/embj.2022111867). In our research, neither STAMBPL1 nor its deubiquitinase active-site mutant affected the expression of FOXO1 (Fig.S5E), but STAMBPL1 and FOXO1 co-localized in the nucleus (Fig.7M), and they interacted with each other (Fig.7N, Fig.S5I-J). Therefore, we speculate that STAMBPL1 interacts with FOXO1 in the nucleus, obstructs the binding of FOXO1 to members of the 14-3-3 family, and inhibits the nuclear export of FOXO1, thereby enhancing its transcriptional activity. This interaction between STAMBPL1 and FOXO1 does not necessarily affect the binding of FOXO1 to DNA, including the GRHL3 promoter.

      (7) Figure 8 A-C: What is the correlation among the expressions of STAMBPL1, FOXO1, and GRHL3 in TNBC tumors compared to non-TNBC tumors?

      Thank you for your comment. In Figure 8A-C, we analyzed the expression levels of STAMBPL1, FOXO1, and GRHL3 in both TNBC and non-TNBC samples using the BCIP. The results indicate that the expression levels of these three genes are significantly higher in TNBC compared to non-TNBC samples. To investigate the correlation among the expressions of STAMBPL1, FOXO1, and GRHL3 in TNBC versus non-TNBC, we further utilized the Metabric data. Besides the positive correlation trend between STAMBPL1 and GRHL3 expression in TNBC clinical samples (Pearson R = 0.27), no significant correlation was observed in the expression levels of STAMBPL1, FOXO1, and GRHL3 in TNBC and non-TNBC clinical samples (as shown in Author response image 1 below). Since STAMBPL1 and FOXO1 are involved as protein molecules in the transcriptional regulation of GRHL3 gene, and the data obtained from the Metabric database are the transcriptional levels of these three genes, this might be the reason why the correlation between their expressions was not observed.

      Author response image 1.

      Reviewer #2 (Recommendations for the authors):

      The authors have thoroughly elucidated the role of STAMBPL1 in TNBC. However, it would be beneficial to discuss the potential clinical implications of these findings, such as how targeting STAMBPL1 or FOXO1 might impact current treatment strategies for TNBC. Several further issues also need to be addressed.

      Major:

      (1) While the study provides an exhaustive analysis of the molecular mechanisms, a comparison with other subtypes of breast cancer could enhance our understanding of the specificity of the STAMBPL1/FOXO1/GRHL3/HIF1α/VEGFA axis in TNBC.

      Thank you for your comment. According to a previous report, STAMBPL1 is significantly associated with the mesenchymal characteristics of breast cancer (DOI: 10.1038/s41416-020-0972-x). We utilized cBioPortal (http://www.cbioportal.org/) to analyze the expression of STAMBPL1 across various clinical subtypes of breast cancer. The results indicated that STAMBPL1 is highly expressed in invasive breast cancer, which has been added to Supplementary Figure 6 as Fig.S6D. Given that TNBC is an aggressive type of invasive breast cancer, we further examined the expression of STAMBPL1 in TNBC compared to non-TNBC using BCIP (http://omicsnet.org/bcancer/database). Our findings revealed that the expression level of STAMBPL1 in TNBC was elevated relative to its levels in non-TNBC (Fig.8A). Additionally, since tumor angiogenesis is a critical factor influencing the metastasis of cancer cells, our study focused specifically on the pro-angiogenic effects of STAMBPL1 in TNBC.

      (2) The authors might consider discussing any potential off-target effects of the siRNA and shRNA used in the study to bolster the conclusions drawn from the knockdown experiments.

      We appreciate the reviewer's suggestion. It is well known that siRNAs and shRNAs can have off-target effects. To address this concern, we employed two siRNAs for each gene knockdown in our study. Specifically, we knocked down genes such as STAMBPL1, FOXO1, GRHL3, and HIF1A in two TNBC cell lines, HCC1806 and HCC1937, using two siRNAs. Except for siRNA#1 targeting HIF1A, which did not show a significant knockdown effect in HCC1806 cells (Fig.2D and Fig.6A), the other siRNAs knocked down their respective genes effectively, and the resulting phenotypes were consistent. As shown in Fig.2F and Fig.S4H, siRNA#1 targeting HIF1A had a significant knockdown effect in HCC1937 cells. The lower knockdown efficiency of this siRNA in the HCC1806 cell line might be attributed to cell-specific factors.

      (3) It would be advantageous if the authors could provide further details on the patient demographics and tumor characteristics in the TCGA database analysis to better comprehend the clinical relevance of their findings.

      Thanks for the reviewer's suggestions. We have now indicated the number of clinical samples in each group in the legend of Fig.8A-C. Since we utilized the BCIP online database to analyze and compare the expression levels of the three genes STAMBPL1, FOXO1, and GRHL3 in TNBC and non-TNBC, we are unable to obtain more specific information regarding the tumor characteristics of each sample. However, our analysis clearly shows that the expression levels of these three genes are significantly higher in TNBC compared to non-TNBC.

      (4) The authors should consider discussing any limitations regarding the generalizability of their findings, such as potential variations among different TNBC subtypes or the specificity of their observations to certain stages of the disease.

      We appreciate the reviewer's comment. Accordingly, we have added a discussion on the limitation of this study in Discussion, highlighted in red font on pages 20 to 21, lines 396 to 412. In addition, we utilized the bc-GenExMiner online database to conduct a comparative analysis of STAMBPL1 expression in different subtypes of non-TNBC and TNBC. The result indicates that STAMBPL1 is highly expressed in mesenchymal-like and basal-like TNBC, which has been added into Supplementary Figure 6 as Fig.S6E. Since these two subtypes of TNBC are highly invasive and metastatic, it suggests that targeting the signaling pathway of STAMBPL1/FOXO1/GRHL3/HIF1α/VEGFA may offer clinical benefits for patients with invasive TNBC.

      Minor:

      The paper is generally well-written, but it's crucial to maintain vigilance for subject-verb agreement, proper use of tense, and consistent terminology.

      Thank you for this suggestion. We have thoroughly revised the article for issues such as grammar, including tense, subject-verb agreement, and terminology.

    1. eLife Assessment

      This important study is an advancement towards the understanding of animal nervous system organization and evolution by providing an exceptional, high-quality and detailed description of the entire connectome of the 3-day larva of the marine annelid Platynereis dumerilii. It provides a wealth of data on cell type diversity and the modules that interconnect them. Its strength is the massive amount of high-quality data, although this is also partly a weakness as it can make the work difficult to read and digest scientifically. This work lays the foundations for studies on cell type diversity, segmental vs. intersegmental connectivity, and mushroom bodies, but will certainly also be of use to scientists interested in other parts of the nervous system, their functions, and evolution.

    2. Reviewer #1 (Public review):

      Summary:

      This paper provides a resource for researchers studying the marine annelid Platynereis dumerilii. It is only the third whole-body connectome to be assembled and thus provides a comparison with those of less complex animals: the nematode Caenorhabditis elegans and the tunicate Ciona intestinalis. The paper catalogs all cells in the body, not just neurons, and details how sensory neurons, interneurons, motor neurons, and effector organs are connected. From this, the authors are able to extract information about the organization of different aspects of the nervous system. These include the extent of recurrent connectivity, unimodal and multimodal sensory processing, and long-range and short-range connectivity.

      Several interesting conclusions are drawn, including the concept that circuit evolution might have proceeded by duplication and divergence of cell types, much as has been posited for gene evolution. It also informs the understanding of the evolution of segmental body plans in annelids by mapping and comparing cells in each segment.

      Strengths:

      This paper contains a wealth of data. The raw dataset is available. The codes and scripts are provided to allow interested readers to utilize this dataset.

      The analysis is painstakingly meticulous. The diagrams are organized to orient the reader to the complexities of this overwhelming analysis.

      Weaknesses:

      The strength of the paper is also its weakness. It contains so much data and analysis that it is burdensome to read and understand. There are 16 multi-panel data figures in the main text and another 38 supplemental figures and 5 videos.

      The impact of the paper is diminished by its size and depth. The paper could be broken up into smaller thematic papers that would be more accessible to researchers interested in particular topics. For example, there could be a single paper on the mushroom body and another paper on the segmental organization.

      Comments on revisions:

      The authors have addressed all of my concerns.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewing Editor Note:

      The two reviewers have provided thoughtful and constructive feedback that we hope will be of use to the authors to improve their manuscript.

      Reviewer #1 (Recommendations For The Authors):

      The section on "Circuit evolution by duplication and divergence" (starting on line 622) should cite:

      Chakraborty, Mukta, and Erich D. Jarvis. "Brain evolution by brain pathway duplication." Philosophical Transactions of the Royal Society B: Biological Sciences 370, no. 1684 (2015): 20150056.

      and

      Roberts, Ruairí JV, Sinziana Pop, and Lucia L. Prieto-Godino. "Evolution of central neural circuits: state of the art and perspectives." Nature Reviews Neuroscience 23, no. 12 (2022): 725-743.

      It should also reference that the concept originated from genetics:

      Ohno, Susumu. Evolution by gene duplication. Springer Science & Business Media, 1970

      These papers have now been cited: “Duplication and divergence of circuits was also proposed as a possible mechanism for the evolution of brain pathways for vocal learning in song-learning birds, spoken language in humans [@chakraborty2015brain] and other circuits [@roberts2022evolution].”

      and: “Our reconstructions identified a potential case for circuit evolution by duplication and divergence [@tosches2017developmental; @roberts2022evolution], a concept that originated from genetics [@ohno1970evolution].”

      The terms outgoing and incoming synapses were confusing. The more common terminology is pre and postsynaptic elements. For example, in Fig 1, the label Sensory neuron outgoing and incoming was confusing because I mistakenly thought it was referring to the neurons and I could not figure out what an outgoing sensory neuron was.

      We have now changed ‘incoming’ to ‘postsynaptic’ and ‘outgoing’ to ‘presynaptic’.

      In L-O, there should be an indicator on the figures that they refer to the locations of synaptic sites, as it does in F.

      We have now replaced the labels ‘incoming’ and ‘outgoing’ with ‘presyn’ and ‘postsyn’ for Figure 1 panels L-O to make it clear that these are synaptic sites.

      Figure 2. - last panel of muscle motor - it would be helpful to have names of muscles instead of just having 5 'muscle motor' of different colors

      Each muscle-motor module contains a large number of muscles and motor neurons of many different types. Labelling them by the name of individual muscle types is therefore not practical at this resolution. The three-day-old Platynereis larva has 53 different muscle cell types. Their anatomy and classification, together with the details of motoneuron innervation, have been described in detail elsewhere (Jasek et al 2022 https://doi.org/10.7554/eLife.71231).

      Figure 3. D and E are hard to understand from the figure; The shading is the number of neurons; that scale should be shown somewhere.

      We are not sure we understand the comment. These plots are histograms that show the distribution of the number of cells across categories. The y axis is the number of neuronal or non-neuronal cell types in each bin.

      PageRank is an algorithm that Google uses. In Figure 4, it seems to be used to indicate centrality. A brief explanation in the text would be useful.

      We have now added an explanation of the centrality measures used. “PageRank is an algorithm used by Google to rank webpages and scores the number and quality of the incoming links of a node [@page1999pagerank], betweenness centrality measures the number of shortest paths that pass through a node in a graph [@freeman1977set], and authority measures the extent of inputs to a node by hubs in a network [@kleinberg1999authoritative].”
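
      To make the PageRank measure quoted above concrete, the following is a minimal sketch of the standard power-iteration formulation. It is an illustration only and not the code used for the manuscript: the function name, the damping factor of 0.85, and the four-cell toy circuit (SN1, SN2, IN1, MN1) are assumptions introduced here.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs.

    Repeated pairs act as edge weights (for example, several synapses
    between the same two cells).
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            if out_links[src]:
                # Each node passes its damped rank on in equal shares per outgoing link.
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        converged = sum(abs(new_rank[node] - rank[node]) for node in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical toy circuit: two sensory neurons converge on an interneuron
# that drives a motor neuron.
toy_edges = [("SN1", "IN1"), ("SN2", "IN1"), ("IN1", "MN1")]
scores = pagerank(toy_edges)
for cell, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{cell}: {score:.3f}")
```

      In this toy graph the cells that collect input, directly or through well-connected partners, score higher than the purely presynaptic sensory neurons, which is the sense in which PageRank reflects both the number and the quality of incoming links.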

      Figure 5. The labels on some images are not clear. They overlap each other and other elements of the figure.

      We have now moved the position of the labels to minimise overlap. We have also added an interactive html file with the network shown in Figure 5 panel A to help the exploration of the network. Added: “Figure 5—source data 1. Interactive html file with the network shown in panel A.”

      There are differences in line thickness in several figures, such as Figure 9 (A and B) and Figure 12 (D, I, and N), that presumably indicate numbers of synaptic contacts. It would be useful to know what the scale is.

      We have now added labels of line thickness to the networks in Figure 4, Figure 5 – figure supplement 2, Figure 9, Figure 12, Figure 7 – figure supplement 1, Figure 15 and Figure 16.

      Reviewer #2 (Recommendations For The Authors):

      (1) Suggestions for improved or additional experiments, data, or analyses.

      (2) Recommendations for improving the writing and presentation.

      Perhaps we require a comprehensive inventory detailing all the innovations compared to previous, more limited publications, particularly in relation to the 2017 publication and 2020 preprint.

      We have provided this detail in Supplementary table 1 that lists all cell types. We included the reference for previously published cell types in the ‘reference’ column except for those that were also described in the 2020 preprint. The current manuscript is a greatly revised and extended version of the original 2020 preprint. In addition, in the online connectome database (https://catmaid.jekelylab.ex.ac.uk), all cell types that were previously published are annotated with the notation ‘FirstAuthor_et_al_year’.

      It is a bit frustrating that, given the huge number of graphs, analyses, tables, and networks presented in the manuscript, we do not see much of the original EM pictures except for a few examples of cell type blow-ups. It would be useful for future workers in the field to eventually have a sort of compendium of how the authors actually recognized each cell type, without having to connect to the original CATMAID annotation.

      Most neuronal cell types (with the exception of some characteristic sensory neurons such as photoreceptor cells and mechanosensory cells) were not classified based on ultrastructural features, but on features of neurite morphology, body position and synaptic connectivity. It would therefore not be possible to represent most of the cell types with a single layer of an original EM picture. However, in order to make the morphological skeleton characteristics more accessible to the reader, we have now added a comprehensive website (https://jekelylab.github.io/Platynereis_connectome/) including all cell types together with their interactive 3D rendering.

      “Interactive 3D morphological renderings of each cell type together with their main annotations can also be explored on a webpage (https://jekelylab.github.io/Platynereis_celltype_compendium.html).”

      The Platynereis 3-day larva is obviously only one transient stage in the developmental cycle of the animal, and it is a very specialized stage (called metatrochophore in annelid jargon), during which the animal does not yet feed, relying instead on its copious yolk. Moreover, it is a stage whose purpose is limited to dispersion, with no complex behavior or social interaction that later stages are going to display. While this work represents a substantial leap forward in understanding neural integration in a whole animal, it must be kept in mind that compared to an adult or growing juvenile, there are likely a considerable number of cells, cell types, and neural modules missing in this larva. This is clearly not a weakness of this study per se, but readers may find it interesting to be presented with this perspective and therefore more biological details about the Platynereis life cycle and associated behaviors.

      Obviously, understanding how the constantly developing nervous system of a worm-like Platynereis gets reshuffled in time will be a great subject to investigate. The authors mention that the 3-day larva displays more than 4000 neuronal cells not yet differentiated. Readers may be interested in their location. Are there niches of neural stem cells? A description of what may be missing from the larva in terms of cell types compared to the adult may be useful.

      We have now added further explanation into the Introduction about the early nectochaete larval stage: “The early nectochaete larva represents a transient dispersing stage in the life cycle of Platynereis. During this stage the larvae do not feed yet but rely on maternally provided yolk. Compared to the juvenile and adult stages it is expected that a considerable number of cell types will be only developing or completely missing at this stage. Three-day-old larvae do not yet have sensory palps and other sensory appendages (cirri), they do not crawl or feed and lack visceral muscles and an enteric nervous system.”

      The location of developing neurons is shown in Figure 3—figure supplement 1 panel I.

      Juvenile or adult cell types have not yet been described in any detail that is close to the level of detail we now provide for the nectochaete larva, therefore a meaningful comparison of cell-type complements across stages is not yet feasible.

      (3) Minor corrections to the text and figures.

      Figure 1: "outgoing" not "outgoung" in panels M, O, Q.

      Corrected

      Line 128: We may need a precise definition of "cable length".

      We have included a definition of cable length in the Methods section under a new subheading ‘Quantitative analysis of neuron morphologies’.
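
      The Methods definition is not reproduced in this response. As a rough illustration, cable length for a traced skeleton is commonly computed as the summed Euclidean distance between each skeleton node and its parent node. The sketch below assumes that definition; the data layout and the toy coordinates are hypothetical.

```python
import math


def cable_length(nodes):
    """Sum the Euclidean distances between every skeleton node and its parent.

    `nodes` maps node_id -> (parent_id or None, (x, y, z)); the root node
    has parent_id None and contributes no segment.
    """
    total = 0.0
    for parent_id, xyz in nodes.values():
        if parent_id is not None:
            total += math.dist(xyz, nodes[parent_id][1])
    return total


# Hypothetical branched skeleton (coordinates in arbitrary units).
toy_skeleton = {
    1: (None, (0.0, 0.0, 0.0)),   # root, e.g. the soma
    2: (1, (3.0, 4.0, 0.0)),      # 5 units from node 1
    3: (2, (3.0, 4.0, 12.0)),     # 12 units from node 2
    4: (2, (3.0, 0.0, 0.0)),      # side branch, 4 units from node 2
}
print(cable_length(toy_skeleton))  # 5 + 12 + 4 = 21.0
```

      Because every node except the root has exactly one parent, each segment of the branched tree is counted once, so the result does not depend on the order in which the nodes are stored.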

      In all Figures: information on the orientation of the worm's view is sometimes missing in figures, which could make interpretation difficult for the reader, especially for anterior views with no D/V indication. The authors should indicate the orientation for each panel or provide a general orientation in the figure if all panels are oriented the same.

      We have now added D/V or A/P indication to all figures.

      Figure 23: "right view, left side" is confusing.

      We have changed this to “Each panel shows a ventral (left panel) and a left-side view (right panel).”

      Line 406 : the first mention of the Platynereis cryptic segment, as far as I know, is Saudemont et al, 2008.

      Thank you for pointing this out. We added the citation.

      Figure 45: descending and decussating, 2nd and 3rd line of the legend.

      Corrected

      The format of data source tables is not homogenized, with some files in Excel format and others in plain comma format.

      We have homogenized the file formats of the supplements and source data. We have .csv files or .rds (R data format) files for the more complex data, such as tibble graphs that cannot be represented in a simple .csv format.

    1. eLife Assessment

      This study presents an important methodological advance to improve the sensitivity of PCR for detecting Trypanosoma cruzi in blood, combining DNA fragmentation, deep sampling, and blood cell pellet analysis. The findings offer solid evidence of enhanced detection sensitivity and shed light on parasite load dynamics during chronic infection in mammalian reservoirs. The evidence is sound for macaques and the method shows promise in expanding detection limits, but there is some variability in the limits of detection and small sample size of human samples. This work will be of interest to parasitologists, epidemiologists, and clinicians using molecular diagnostics to monitor responses to etiological treatments for Chagas disease.

    2. Reviewer #1 (Public review):

      This study presents a refined approach to enhance the sensitivity of PCR for detecting Trypanosoma cruzi in blood by employing DNA fragmentation and deep sampling, involving multiple replicate PCR reactions. Combined with serial blood sampling, these methods enabled consistent detection of the parasite in infected humans, non-human primates, and dogs, including hosts with very low parasitemia levels.

      Inspired by earlier methods that cleaved kinetoplast DNA (kDNA) to improve target distribution, this study targets nuclear satellite DNA repeats, which are tandemly arranged in T. cruzi chromosomes. By fragmenting DNA prior to PCR, the authors reduced subsampling errors, breaking large fragments into smaller, evenly distributed units. This improved the frequency of positive reactions and reduced variability among replicate Cq values.

      Using contrived blood samples, the study demonstrated that this approach significantly enhances PCR positivity. Moreover, the findings suggest that cell pellets from blood yield higher concentrations of parasite DNA compared to whole blood, prompting a reevaluation of current diagnostic practices, which predominantly use whole blood lysates.

      The study also highlights the importance of deep sampling. Serial testing across multiple blood samples mitigated the variability in parasitemia, addressing challenges first noted in early xenodiagnosis studies (Cerisola et al., 1977).

      The proposed DNA extraction and amplification procedures effectively captured parasitemia dynamics, achieving detection sensitivities with quantification limits as low as ~0.00025 parasite equivalents/mL, approaching the detection of a single target copy per reaction.

      This work underscores the utility of deep-sampling PCR in monitoring parasitemia dynamics and guiding treatment strategies, especially in chronic infections. It also stresses the importance of treating individuals with low parasitic loads, as immune control may change over time.

      Strengths:

      The strategies used for increasing PCR sensitivity offer the potential for enhancing treatment monitoring and understanding the dynamics of parasite-host interactions in chronic Chagas disease.

      Weaknesses:

      While the study offers valuable insights for research into T. cruzi infection dynamics and for monitoring the efficacy of trypanocidal drugs, its broader adoption depends on the development of cost-effective and scalable alternatives to labor-intensive techniques such as sonication, currently required for DNA fragmentation. Additionally, the reliance on blood cell pellets and the DNA fragmentation protocol introduces extra processing steps, which may not be feasible for many clinical laboratories, particularly in resource-limited endemic areas that require simpler and more streamlined procedures.

    3. Reviewer #2 (Public review):

      Summary:

      This study introduces a valuable methodological innovation for detecting Trypanosoma cruzi, the causative agent of Chagas disease, using "deep-sampling PCR" which combines DNA fragmentation with multiple qPCR replications (>300 in some cases) on each sample. The authors aim to overcome the limitations of current qPCR methods by increasing the sensitivity of detection, which is fundamental for evaluating treatment responses in chronic Chagas disease patients. The work also evaluates the approach in multiple host species (macaques, humans, and dogs), at different times and across different sample types, including whole blood, blood cell pellets, plasma, and tissues.

      Strengths:

      The primary strength of this study lies in its methodological novelty, particularly the combination of multiple parallel PCR reactions and DNA fragmentation to enhance sensitivity. It is a sort of brute-force method for detecting the parasite. This approach promises the detection of parasitic DNA at levels significantly lower than those achievable with standard qPCR methods. Additionally, the authors demonstrate the utility of this method in tracking parasitemia dynamics and post-treatment responses in macaques and dogs, providing valuable insights for both research and clinical applications.

      Weaknesses:

      (1) Methodological Concerns on detection and quantification limits

      Some methodological inconsistencies and limitations were observed that merit consideration. In Figure 1, there is a clear lack of consistency with theoretical expectations and with the trends observed in Figure 4A. Based on approximate calculations, having 10^-7 parasite equivalents with 100,000 target copies per parasite implies an average of 0.01 target copies per reaction. This would suggest an amplification rate of approximately 1 in 100 reactions, yet the observed 30% amplification appears disproportionately high. In addition, Figure 4A (not fragmented) shows lower values of positivity than Figure 1 for 10^-5 and 10^-6 dilutions showing this inconsistency among experiments. Some possible explanations could account for this inconsistency: (1) an inaccurate quantification of the starting number of parasites used for serial dilutions, or (2) random contamination not detected by negative controls, potentially due to a low number of template molecules.

      Similarly, Figure 5B presents another inconsistency in theoretical expectations for amplification. The authors report detecting amplification in reactions containing 10^-9 parasites after DNA fragmentation. Based on the figure, at least 3 positives (as far as I can see, because raw data are not available) out of 388 PCRs are observed at this dilution. Assuming 100,000 copies of satellite DNA per parasite, the probability of a single copy being present in a 10^-9 dilution is approximately 1/10,000. If we assume this as the probability of amplification of a PCR (an approximation), by using a simple binomial calculation, the probability of at least 3 positive reactions out of 388 is approximately 9.39 x 10^-6 (in ideal conditions, likely lower in real-world scenarios). This translates to a probability of about 1 in 100,000 to observe such a frequency of positives, which is highly improbable and suggests either inaccuracies in the initial parasite quantification or issues with contamination. In addition, at 10^-6 PE/reaction (the proposed limit of quantification) it is observed that 40% of repetitions are amplified. The number of repetitions is not specified but is probably more than 50 according to the graph. Such a dilution implies 0.1 targets per reaction (assuming 100,000 copies divided by 10^6), which means a total of 5 target molecules to distribute among the reactions (0.1 targets multiplied by 50 reactions). It seems highly improbable that 40% of the reactions (20/50) would amplify under the described conditions. Even considering 200,000 target copies per parasite implies 0.2 targets per reaction and an average of 10 molecules to distribute among 50 reactions. The approximate probability of observing at least 20/50 positives can be calculated by determining the probability that a reaction receives targets, assuming a random distribution of the targets among the tubes, p = 1 - (1 - 1/50)^10, and then by using a binomial distribution to determine the probability that at least 20 reactions receive at least one target copy. The probability of at least 20/50 positive reactions in a dilution of 10^-6 parasites (200,000 target copies per parasite) is 0.00028. Consequently, the observed result is highly unlikely.
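
      The two binomial estimates in this paragraph can be checked with a few lines of Python. This is only a sketch of the reviewer's stated assumptions (independent reactions, a per-reaction success probability of 1/10,000 at the 10^-9 dilution, and 10 target copies spread at random over 50 tubes at the 10^-6 dilution); the function and variable names are illustrative and do not come from the manuscript.

```python
from math import comb


def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), as 1 minus the lower tail."""
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower


# 10^-9 dilution: at least 3 positives out of 388 reactions,
# each positive with probability 1/10,000 (reviewer's approximation).
p_three_of_388 = binom_tail(388, 3, 1 / 10_000)

# 10^-6 dilution: 10 copies distributed at random among 50 tubes, so a
# given tube receives at least one copy with probability 1 - (1 - 1/50)^10.
p_tube_hit = 1 - (1 - 1 / 50) ** 10
p_twenty_of_50 = binom_tail(50, 20, p_tube_hit)

print(f"P(>=3 of 388)  = {p_three_of_388:.3g}")
print(f"P(>=20 of 50) = {p_twenty_of_50:.3g}")
```

      Under these assumptions the two values should land close to the 9.39 x 10^-6 and 0.00028 quoted above; both depend entirely on the assumption that target copies are distributed independently.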

      (2) Lack of details on contamination detection

      Additionally, the manuscript does not provide enough details on how cross-contamination was detected or managed. It is unclear how the negative controls (NTCs) and no-template controls were distributed across plates, in terms of both quantity and placement. This omission is critical, as the low detection thresholds targeted in this study increase the risk of false positives by contamination. To ensure reliability and reproducibility, future uses of the technique would benefit from more standardized and clearly documented protocols for control placement and handling.

      (3) Unclear relevance for treatment monitoring in Humans

      In Figure 7A, the results suggest that the deep-sampling PCR method does not provide a clearly significant improvement over conventional qPCR in humans. Of the 9 samples tested, 6 (56%) were consistently amplified in all or nearly all reactions, indicating these samples could also be reliably detected with standard PCR protocols. Two additional samples were detected only with the deep-sampling approach, increasing sensitivity to 78%; however, these detections might be attributable to random chance given the limited sample size. While the authors acknowledge the small sample size in the discussion, they do not address the fact that a similar increase in sensitivity was reported in citation 5, where only 3 samples were tested with 3 replicates each. This raises an important question: how many PCR reactions are needed in human samples to reach a plateau in detection rates? This issue should be further discussed to contextualize the results and their implications.

      Despite these limitations, this work represents a promising step forward in the development of highly sensitive diagnostic tools for T. cruzi. It offers a novel foundation for advancing the detection and monitoring of parasitemia, which could significantly benefit the Chagas disease research community and clinicians focused on neglected tropical diseases. While addressing the methodological inconsistencies and improving robustness will be critical, this study provides valuable insights and data that could lead to future innovations in parasitological research and diagnostics.

    4. Author Response:

      Reviewer #1 (Public review):

      […] Strengths:

      The strategies used for increasing PCR sensitivity offer the potential for enhancing treatment monitoring and understanding the dynamics of parasite-host interactions in chronic Chagas disease.

      Weaknesses:

      While the study offers valuable insights for research into T. cruzi infection dynamics and for monitoring the efficacy of trypanocidal drugs, its broader adoption depends on the development of cost-effective and scalable alternatives to labor-intensive techniques such as sonication, currently required for DNA fragmentation. Additionally, the reliance on blood cell pellets and the DNA fragmentation protocol introduces extra processing steps, which may not be feasible for many clinical laboratories, particularly in resource-limited endemic areas that require simpler and more streamlined procedures.

      We agree that this methodology is likely to be used primarily as a research tool and selectively in the field (e.g. in drug trials) and is unlikely to become standard in many clinical labs, irrespective of resources. We note that the protocol does not require cell pellets (although that fraction provides the highest sensitivity) and that the fragmentation step is not at all labor-intensive. But to achieve consistent detection across the range of parasite burden known to occur in chronic T. cruzi infection, appropriately processed DNA from higher volumes of blood than are now routinely used for detection of T. cruzi will be required.

      Reviewer #2 (Public review):

      […] Strengths:

      The primary strength of this study lies in its methodological novelty, particularly the combination of multiple parallel PCR reactions and DNA fragmentation to enhance sensitivity. It is a sort of brute-force method for detecting the parasite. This approach promises the detection of parasitic DNA at levels significantly lower than those achievable with standard qPCR methods. Additionally, the authors demonstrate the utility of this method in tracking parasitemia dynamics and post-treatment responses in macaques and dogs, providing valuable insights for both research and clinical applications.

      Weaknesses:

      (1) Methodological Concerns on detection and quantification limits

      Some methodological inconsistencies and limitations were observed that merit consideration. In Figure 1, there is a clear lack of consistency with theoretical expectations and with the trends observed in Figure 4A. Based on approximate calculations, having 10^-7 parasite equivalents with 100,000 target copies per parasite implies an average of 0.01 target copies per reaction. This would suggest an amplification rate of approximately 1 in 100 reactions, yet the observed 30% amplification appears disproportionately high. In addition, Figure 4A (not fragmented) shows lower values of positivity than Figure 1 for 10^-5 and 10^-6 dilutions showing this inconsistency among experiments. Some possible explanations could account for this inconsistency: (1) an inaccurate quantification of the starting number of parasites used for serial dilutions, or (2) random contamination not detected by negative controls, potentially due to a low number of template molecules.

      Similarly, Figure 5B presents another inconsistency in theoretical expectations for amplification. The authors report detecting amplification in reactions containing 10^-9 parasites after DNA fragmentation. Based on the figure, at least 3 positives (as far as I can see, because raw data are not available) out of 388 PCRs are observed at this dilution. Assuming 100,000 copies of satellite DNA per parasite, the probability of a single copy being present in a 10^-9 dilution is approximately 1/10,000. If we assume this as the probability of amplification of a PCR (an approximation), by using a simple binomial calculation, the probability of at least 3 positive reactions out of 388 is approximately 9.39 x 10^-6 (in ideal conditions, likely lower in real-world scenarios). This translates to a probability of about 1 in 100,000 to observe such a frequency of positives, which is highly improbable and suggests either inaccuracies in the initial parasite quantification or issues with contamination. In addition, at 10^-6 PE/reaction (the proposed limit of quantification) it is observed that 40% of repetitions are amplified. The number of repetitions is not specified but is probably more than 50 according to the graph. Such a dilution implies 0.1 targets per reaction (assuming 100,000 copies divided by 10^6), which means a total of 5 target molecules to distribute among the reactions (0.1 targets multiplied by 50 reactions). It seems highly improbable that 40% of the reactions (20/50) would amplify under the described conditions. Even considering 200,000 target copies per parasite implies 0.2 targets per reaction and an average of 10 molecules to distribute among 50 reactions. The approximate probability of observing at least 20/50 positives can be calculated by determining the probability that a reaction receives targets, assuming a random distribution of the targets among the tubes, p = 1 - (1 - 1/50)^10, and then by using a binomial distribution to determine the probability that at least 20 reactions receive at least one target copy. The probability of at least 20/50 positive reactions in a dilution of 10^-6 parasites (200,000 target copies per parasite) is 0.00028. Consequently, the observed result is highly unlikely.

      We disagree with the reviewer on both of these points. 

      First, the mean (S.D.) Cq values of the 10^-3 PE unfragmented dataset in Figure 1 (40 replicates) and Figure 4A (88 replicates) are nearly identical at 30.02 (0.5813) and 30.21 (1.071), respectively, demonstrating a highly accurate initial quantification of the parasites used to make these 2 separate dilution series (reviewer’s point 1.1). At this concentration of parasites in blood, and with unfragmented DNA, each aliquot for PCR has an equal chance of receiving some parasite DNA (hence all reactions are positive) and a reasonably good chance of receiving similar amounts of parasite DNA (the Cq values cluster with relatively low S.D.). However, further dilutions from this parasite input result in some aliquots that receive no parasite DNA and a much wider variation in the amount of parasite DNA per aliquot in samples that are positive (Cq mean (S.D.) of 34.47 (2.732) for 10^-4 in Figure 1). This result demonstrates that these dilution series do not follow a binomial distribution as suggested by the reviewer. This is likely because the templates for amplification are not independently distributed. Instead, they are known to be clustered (on individual chromosomes or chromosome fragments) in the DNA. Indeed, this observation of widely varying Cq values in dilutions below 10^-3 strongly suggested this clustering and was the impetus for fragmenting the DNA (see manuscript line 209). The impact of the declustering achieved by DNA fragmentation supports this conclusion (when the DNA is fragmented, 100% of aliquots are positive at 10^-4 PE, 10X lower than in unfragmented samples, and the Cq values are tightly grouped (mean 33.47, S.D. 0.3358)), indicating that the unequal distribution of targets upon dilution, rather than counting errors, pipetting errors or contamination, is responsible for the lack of a binomial distribution of targets with increasing dilution. Thus, when entities are clustered and can’t be fully declustered, a simple binomial (or Poisson) distribution of counts cannot be assumed in the serial dilutions. Clustering results in more complicated distribution patterns, and it becomes difficult to predict precisely how these clusters will distribute from one dilution to the next (hence the differences in the proportions of positives between dilution series observed herein).

      This clustering and unequal distribution of amplification targets also addresses the reviewer’s second comment with respect to the unlikelihood of detecting at least one positive at a high dilution. If we accept the reviewer’s estimate of 100,000 copies of target per parasite, then at 10^-4 PE/aliquot, a dilution at which all aliquots are PCR positive in the fragmented samples (Figures 4A and 5B), each aliquot would be expected to have on average 10 target sequences, and the chances of detecting at least one positive reaction from 400 aliquots would be 98% for the 10^-7 dilution, 33% for 10^-8 and 4% for 10^-9 PE per aliquot, respectively. These percentages would change (increase) with a higher copy number of targets per genome, and if the targets are still clustered to some degree (which we would expect them to be even in the fragmented DNA). Thus, the chances of detecting positive PCRs at 10^-9 PE are low, but not “highly improbable”.
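
      The three percentages quoted above can be reproduced with a short calculation. The sketch below assumes (the response does not say so explicitly) that the number of target copies per aliquot is Poisson-distributed around its mean and that the 400 aliquots are independent; the names are illustrative only.

```python
from math import exp

COPIES_PER_PARASITE = 100_000  # reviewer's estimate, adopted for the argument
N_ALIQUOTS = 400


def p_any_positive(parasite_equivalents: float, n: int = N_ALIQUOTS) -> float:
    """Chance that at least one of n aliquots contains a target copy."""
    mean_copies = parasite_equivalents * COPIES_PER_PARASITE
    # Assumed Poisson model: an aliquot is positive if it holds >= 1 copy.
    p_aliquot_positive = 1 - exp(-mean_copies)
    return 1 - (1 - p_aliquot_positive) ** n


chances = {pe: p_any_positive(pe) for pe in (1e-7, 1e-8, 1e-9)}
for pe, chance in chances.items():
    print(f"{pe:.0e} PE/aliquot: {chance:.1%}")
```

      With these inputs the results round to 98%, 33% and 4%, matching the figures in the response.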

      Taking the reviewer’s second example of the frequency of positive reactions at 10^-6 PE and the assumption of 200,000 target copies per genome (referring to Figure 5B, we believe), the mean template copies per aliquot would be 0.2 at this dilution. Assuming a negative binomial distribution of the still-clustered templates (although mechanically fragmented, it would be highly unlikely that they would be completely declustered), the probability of an aliquot being positive at the 10^-6 PE dilution would be 16.7%. Our results in Figure 4A (26%) and Figure 5B (37.5%) are slightly higher but not “highly unlikely” as suggested.
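
      The response does not state which negative binomial parameters give 16.7%. As a sketch, one parameterisation that reproduces the figure is a mean of 0.2 copies per aliquot with a dispersion (size) parameter k = 1; that value of k is an assumption made here for illustration, not something taken from the manuscript.

```python
from math import exp


def p_positive_negbin(mean_copies: float, k: float) -> float:
    """P(at least one copy) when counts are negative binomial.

    Uses P(0) = (1 + mean/k) ** -k, with k the dispersion parameter.
    """
    return 1 - (1 + mean_copies / k) ** (-k)


def p_positive_poisson(mean_copies: float) -> float:
    """P(at least one copy) when counts are Poisson (no clustering)."""
    return 1 - exp(-mean_copies)


MEAN_COPIES = 0.2   # 10^-6 PE x 200,000 copies per parasite
ASSUMED_K = 1.0     # hypothetical dispersion; not stated in the response

p_nb = p_positive_negbin(MEAN_COPIES, ASSUMED_K)
p_pois = p_positive_poisson(MEAN_COPIES)
print(f"negative binomial (k=1): {p_nb:.1%}")
print(f"Poisson:                 {p_pois:.1%}")
```

      With k = 1 the negative binomial gives 16.7%, against about 18.1% for the Poisson case; other values of k give other percentages, so the result is sensitive to a parameter that is not reported.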

      We do not know the target copy number in the parasites used to make the serial dilution profiles herein, but it is certainly different from the copy number in the parasites infecting each of the hosts from which we have analyzed blood. Thus, we do not propose that this assay can quantify the absolute parasite burden in a host, nor do we see a benefit in trying to do so (see paragraph beginning line 384). Such quantification requires assumptions not only about the target copy number in the parasites in a host, but also that fragmentation is 100% efficient and, particularly, that a single or multiple blood samples accurately reflect the whole-host parasite burden (clearly shown not to be the case with the data from serial bleeds presented in Figures 3 and 5). But we stand by the conclusion that deep-sampling PCR, when employed as presented herein, gives an accurate assessment of the presence of infection and of relative differences in parasite burden between hosts, and in the same hosts over time or under treatment, and that the results presented are not compromised by inaccuracies in quantifying parasites for spiked samples or by sample contamination.

      (2) Lack of details on contamination detection

      Additionally, the manuscript does not provide enough details on how cross-contamination was detected or managed. It is unclear how the negative controls (NTCs) and no-template controls were distributed across plates, in terms of both quantity and placement. This omission is critical, as the low detection thresholds targeted in this study increase the risk of false positives by contamination. To ensure reliability and reproducibility, future uses of the technique would benefit from more standardized and clearly documented protocols for control placement and handling.

      We present a section in the Materials and Methods on preventing contamination, along with a case example in which these precautions failed when preparing the dilution standards containing very high numbers of parasites. Directly responding to the reviewer, sixteen no-template controls were included in every 384-well assay plate and we never obtained amplification products from those reactions. Additionally, as noted in the manuscript, uninfected macaques were negative on a collective >15,000 PCR reactions.

      We understand the concern about contamination, but we believe that we have taken the appropriate precautions and our data fully support that the positives we detect are real positives, not contamination. It would be reckless to depend on a single positive PCR reaction out of hundreds to conclude that a host is infected; multiple samples must be obtained and analyzed to be certain in such cases, as we show exhaustively with the NHP samples here.

      Rather than adding additional technical protocols such as plate layouts to this manuscript, we believe publishing a STAR Protocol or a similar detailed, step-by-step method paper would be more useful and that is our plan.

      (3) Unclear relevance for treatment monitoring in Humans

      In Figure 7A, the results suggest that the deep-sampling PCR method does not provide a clearly significant improvement over conventional qPCR in humans. Of the 9 samples tested, 6 (56%) were consistently amplified in all or nearly all reactions, indicating these samples could also be reliably detected with standard PCR protocols. Two additional samples were detected only with the deep-sampling approach, increasing sensitivity to 78%; however, these detections might be attributable to random chance given the limited sample size. While the authors acknowledge the small sample size in the discussion, they do not address the fact that a similar increase in sensitivity was reported in citation 5, where only 3 samples were tested with 3 replicates each. This raises an important question: how many PCR reactions are needed in human samples to reach a plateau in detection rates? This issue should be further discussed to contextualize the results and their implications.

      We disagree with the reviewer’s conclusion here. First, it is not known how “conventional” PCR would have performed on the human samples used herein, as this was not done. However, it is very likely that it would have performed significantly worse, for the following reasons. “Conventional” PCR for T. cruzi has a number of variations, but the most common approach is to mix whole blood 1:1 with a guanidine:EDTA solution and then extract DNA for PCR from 100-300 µl of this mix. Thus, at best, one has the equivalent of 150 µl of blood that is being analyzed for the presence of T. cruzi DNA. In contrast, in the protocol described herein, we extract DNA from ~5 ml of blood and use aliquots from that DNA for PCR. Thus, even before fragmenting or deep-sampling, the approach described herein is sampling 33X more blood than the conventional protocol, thus likely increasing by over 30-fold the chances of detecting parasite DNA in blood from an infected subject. The volume of blood sampled and the number of samples obtained both greatly affect the ability to detect T. cruzi infection in some hosts. This is clearly demonstrated in the extensive screening done in NHPs in this study, and there is no reason to believe that the situation will be different in humans and dogs. So the relevance of these enhancements is clear for any host with T. cruzi infection; humans are not unique in this regard.

      We don’t believe there will be a “plateau in detection rates”; individuals are either infected or not, and the ability to detect that infection (whether with T. cruzi or any other pathogen) depends on the sensitivity of the test and the quantity of the sample available to be screened. Perhaps what is being asked is ‘how many PCR reactions have to be performed to be sure that someone is NOT infected?’. There is not a discrete answer to this and related questions, but by making some assumptions, one can make some estimates. The approach described herein is approaching single-copy target detection, and if this is true then one would need to PCR amplify ALL of the DNA from a blood sample to assure detection of that single template copy (so for the 200 µg of DNA one might obtain from 5-10 ml of blood, 1600 PCR reactions of 125 ng each; 95% and 99% confidence could be obtained with 1520 and 1584 PCRs, respectively). But any conclusion from this testing applies only to that individual blood sample, and we show clearly in the NHP studies that multiple samples have to be analyzed to detect parasite DNA in hosts with very low parasite burden: some samples contain parasite DNA and others do not. Thus hundreds of negative PCRs from a single or even multiple samples are unfortunately not definitive.
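
      The reaction counts in the parenthesis above follow from simple arithmetic, sketched below. The sketch assumes a single template copy that is equally likely to sit in any one of the 125 ng aliquots, so that the confidence of having tested it equals the fraction of aliquots tested; the names are illustrative.

```python
TOTAL_DNA_NG = 200_000      # 200 µg of DNA from one blood sample
DNA_PER_REACTION_NG = 125   # DNA input per PCR reaction

# Number of reactions needed to amplify all of the DNA in the sample.
total_reactions = TOTAL_DNA_NG // DNA_PER_REACTION_NG


def reactions_needed(confidence_percent: int) -> int:
    """Reactions required so that a single copy, sitting in one aliquot
    chosen uniformly at random, has been tested with the given confidence."""
    # Integer ceiling of confidence_percent / 100 * total_reactions.
    return (confidence_percent * total_reactions + 99) // 100


print(total_reactions, reactions_needed(95), reactions_needed(99))
```

      This gives 1600 reactions in total, and 1520 and 1584 reactions for 95% and 99% confidence, in line with the numbers in the response.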

      Such limitations exist for detection of any pathogen. A more important question for the future may be ‘is there a level of infection below which the risk of disease development is sufficiently low as to not be of concern clinically?’. Such is the standard in drug-controlled HIV infections, for example. The improvements we document in this work provide the means to answer such questions, and additional improvements may be possible as well. But to be absolutely certain that a host is not infected by T. cruzi, one would have to sample some subjects (likely a small minority of the entire pool) multiple times and perform thousands of PCR reactions, as we have done for the most difficult-to-detect macaques in this study.

      Despite these limitations, this work represents a promising step forward in the development of highly sensitive diagnostic tools for T. cruzi. It offers a novel foundation for advancing the detection and monitoring of parasitemia, which could significantly benefit the Chagas disease research community and clinicians focused on neglected tropical diseases. While addressing the methodological inconsistencies and improving robustness will be critical, this study provides valuable insights and data that could lead to future innovations in parasitological research and diagnostics.

      As discussed in detail above, we do not agree that this study has any methodological inconsistencies nor that it lacks robustness.

    1. eLife Assessment

      This important study suggests that the composition of the extracellular matrix in a mouse model of liver fibrosis changes depending on the cause of liver fibrosis. The data could be used as a foundation for future antifibrotic therapies. The strength of evidence is convincing with respect to the use of animal models and proteomic analysis. The study provides a helpful inventory of proteins up or down-regulated.

    2. Reviewer #1 (Public review):

      Summary:

      Jirouskova and colleagues have carried out an in-depth proteomic characterization of the dynamics of the liver fibrotic response and its subsequent resolution in two distinct models of liver injury: the CCl4-induced model of hepatotoxicity and pericentral/bridging liver fibrosis, and the DDC feeding model of obstructive cholestasis and periportal fibrosis. They focussed on both the insoluble extracellular matrix (ECM) components and the soluble secreted factors produced by hepatic stellate cells (HSCs) and/or portal fibroblasts (PFs). They identified compartment- and time-resolved proteomic signatures in the two models, with disease-specific factors or matrisomes. Their study also identified phenotypic differences between the models: whereas the CCl4 model induced profound hepatotoxicity followed by resolution, the DDC model induced more lasting liver damage and proteomic changes that resembled advanced human liver fibrosis, favouring hepatocarcinogenesis.

      Overall, this comprehensive and very well conducted study is rigorous and well planned. The conclusions are supported by compelling studies and analyses. One caveat is the lack of mechanistic experiments to prove causality, but this can be carried out in follow-up studies.

      Strengths:

      • A major strength in the study is that the experiments are rigorous and very well conducted. For instance, the authors utilized two models of liver fibrosis to study different aspects of the pathology - hepatotoxicity vs cholestasis. In addition, 4 time points for each model were investigated - 2 for fibrosis development and 2 for fibrosis resolution. They have taken 3 components for proteomic analyses - total lysates, insoluble ECM components as well as the soluble secreted factors. Thus, the authors provide a comprehensive overview of the fibrosis and resolution process in these models.

      • Another great strength of the study is that the methodology was able to dissect pathways unique to each model as well as common targets. For example, the authors identified known pathways such as mTOR signalling as differentially regulated in the CCl4 vs DDC model. mTOR signalling was increased in the DDC model, which is associated with hyperproliferation. Showing that the approach is specific enough to distinguish between two similar (both induce fibrosis) but distinct mechanisms (hepatotoxicity vs cholestasis) is thus a strong point of the study.

      Weaknesses:

      • A caveat of the study is that the authors have not conducted mechanistic (gain of function/loss of function) studies from any of their identified targets to truly prove causality. This remains one of the limitations of this study. Thus, future studies should investigate this point in detail. For instance, it would have been intriguing to dissect if knocking out specific genes involved in one specific model or genes common to both would yield distinct phenotypic outcomes.

    3. Reviewer #2 (Public review):

      Summary:

      The authors suggest that ECM abundance and composition change depending on the aetiology of liver fibrosis. To understand this they have investigated the proteome in two models of animal fibrosis and resolution. They suggest their findings could provide a foundation for future anti-fibrotic therapies.

      The revised version has been improved. Although some areas remain (described below), it is perhaps the dataset that will be most valuable.

      Strengths:

      The dataset appears well supported and will be valuable.

      Weaknesses:

      The manuscript is still fairly descriptive but on balance this is a useful dataset and appears to have broad support in that regard.

      There are no conclusions that can be drawn from their rebuttal regarding the human data they included as it is one patient per group and will most likely change dramatically with more patients. As such this area is still an issue but they have improved some of the data elsewhere.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Weaknesses:

      (1) The authors themselves propose in their Introduction that the "ECM-associated changes are increasingly perceived as causative, rather than consequential"; however, they have not conducted mechanistic (gain of function/loss of function) studies either in vitro or in vivo from any of their identified targets to truly prove causality. This remains one of the limitations of this study. Thus, future studies should investigate this point in detail. For instance, it would have been intriguing to dissect if knocking out specific genes involved in one specific model or genes common to both would yield distinct phenotypic outcomes.

      We agree with the reviewer that our study does not provide mechanistic verification of the function of the identified targets with a suggested role in the development and/or resolution of fibrosis. The current study was conducted primarily to identify these possible targets, with a focus on differences in the extracellular matrix deposited in two selected models of liver fibrosis with different modes of action. Conducting further studies with knock-out/knock-in models to verify the causality of the proposed targets was at this point well beyond our intention. However, we are fully aware of the potential of the identified molecules, and further studies to dissect their roles in liver diseases are part of our future plans.

      (2) The majority of the conclusions are derived primarily from the proteomic analyses. Although well conducted, it would strengthen the study to corroborate some of the major findings by other means such as IHC/IF with the corresponding quantifications and not only representative images.

      In accordance with the Reviewer’s suggestions, we have now provided additional IF images and their quantifications for our major MS findings to strengthen the significance of the MS data (see detailed answer below).

      Reviewer #2:

      Weaknesses:

      (1) As it currently stands, the data, whilst extensive, is primarily focussed on the proteomic data which is fairly descriptive and I am not clear on the additional insight gained in their approach that is not already detailed from the extensive transcriptomic studies. The manuscript overall would benefit from some mechanistic functional insight to provide new additional modes of action relevant to fibrosis progression.  

      We agree with the reviewer that our study could initially appear descriptive. However, this characteristic is inherent to most omics studies, which tend to provide hypothesis-free testing of a large number of analytes in order to find a multitude of candidate biomarkers(1). Importantly, we believe our study provides insights that go beyond the scope of previously published transcriptomic analyses.

      Specifically, our work focuses on compartment-specific changes in the liver proteome, with an emphasis on the extracellular matrix (ECM) composition and alterations in protein solubility—features that cannot be captured by transcriptomic studies. The matrisome is more than a structural scaffold; it functions as a reservoir for secreted factors, including growth factors and cytokines, which modulate the local cellular microenvironment. Transition dynamics between the insoluble matrisome and soluble protein pools influence the signaling capabilities and bioavailability of these factors. Moreover, fibrous ECM assemblies directly impact tissue mechanics, providing cells embedded within the matrix with spatially distinct biochemical and biomechanical contexts. The current understanding of matrisome composition in the context of specific liver disease etiologies is limited. Dr. Friedman, in his 2022 review on hepatic fibrosis, highlights the unmet need to elucidate etiology-specific protein signatures of the cirrhotic liver matrisome, which could serve as disease staging or prognostic biomarkers(2). Our study addresses this gap by characterizing the distinct matrisome profiles associated with hepatotoxic- versus cholestasis-driven liver injury. We believe our findings lay the groundwork for identifying etiology-specific biomarkers and potential therapeutic targets for antifibrotic interventions, offering a novel layer of insight beyond what transcriptomic data alone can provide.

      (2) Whilst there is some human data presented it is a minimal analysis without quantification that would imply relevance to disease state. Although studying disease progression in animals is a fundamental aspect of understanding the full physiological response of fibrotic disease, without more human insight makes any analysis difficult to fulfil their suggestion that these targets identified will be of use to treat human disease.

      We thank the reviewer for this comment. Our study primarily focuses on utilizing animal models to explore the fundamental physiological processes underlying the development and resolution of fibrotic liver disease. To address the translational relevance of our findings, we concentrated on clusterin, one of the key target proteins identified during our analysis of the insoluble proteome. Specifically, we investigated its localization in human liver samples, focusing on its association with collagen deposits (Figure 6F). To this end, we analyzed human liver samples of diverse etiologies and varying degrees of fibrotic damage, including samples representing four distinct stages of HCV-induced fibrosis (Figure 6F, lower panel). While this analysis highlights the presence and localization of clusterin in fibrotic deposits, we acknowledge that our study does not include extensive quantification or mechanistic insight into clusterin's role in human liver fibrosis. We believe that the data presented in this manuscript provide a valuable foundation for future investigations into clusterin’s involvement in liver fibrosis across different etiologies. Recognizing the translational importance of this work, we have already initiated a prospective study involving human patients, which aims to conduct a more comprehensive analysis of clusterin's function and its potential as a therapeutic target.

      To further support our findings on clusterin's role in fibrosis development and resolution and to address the reviewer's concern, we quantified clusterin deposits in the available human samples representing four distinct stages of HCV-induced fibrotic disease. Using immunofluorescence (IF) images at a 20x field of view, we measured both clusterin and collagen deposits to illustrate changes in clusterin abundance during fibrosis progression (stages F1–F4) in relation to collagen deposition dynamics. The quantified data have been included for the reviewer's consideration (Figure 1). However, it is important to emphasize that this quantification was conducted on a single human sample per fibrotic stage, which limits the statistical robustness of the analysis. A more comprehensive evaluation involving additional patient samples would be necessary for a more definitive conclusion. For this reason, we propose to include these results solely in our rebuttal letter and to incorporate a more extensive analysis in our intended follow-up study, where larger cohorts will allow for a thorough investigation of clusterin's role in human liver fibrosis.

      Author response image 1.

      Dynamics of clusterin abundance with the development of HCV-induced fibrotic disease in comparison to the changes in collagen deposits. IF images of human liver sections from different stages of chronic HCV infection were immunolabeled for clusterin and collagen 1. Clusterin- and collagen-positive (+) areas (as %) from three to eight fields of view (20x objective) were evaluated for each fibrosis stage (F1-F4).

      (3) Some of the terminology is incorrect while discussing these models of injury used and care should be taken. For example - both models are toxin-induced and I do not think these data have any support that the DDC model has a higher carcinogenic risk. An investigation into the tumour-induced risk would require significant additional models. These types of statements are incorrect and not supported by this study.

      We are grateful to the reviewer for drawing our attention to the incorrect use of the term "toxin-induced". In two instances, where the wording was incorrect, we have corrected the term to hepatotoxin-induced as it was originally intended. While we believe that our proteomic signature data and identified signaling pathways suggest a potential carcinogenic risk associated with the cholestatic, but not the hepatotoxic model, we have toned down the statements on this issue in the article to respect the reviewer's perspective. These changes, which are highlighted in the track changes mode of the article, aim to make the conclusions of the study more precise and thus improve the clarity of our conclusions.

      Reviewer #1 (Recommendations for the authors): 

      (1) In the Discussion, the authors could consider pointing out that one limitation of the study is a lack of mechanistic (gain of function/loss of function) studies either in vitro or in vivo from any of their identified targets to truly prove causality. 

      As noted earlier, we fully agree with both reviewers that a limitation of this study is its descriptive nature, which is an inherent characteristic of omics-based research. In our manuscript, we aimed to "determine compartment-specific proteomic landscapes of liver fibrosis and delineate etiology-specific ECM components," with the overarching goal of providing a foundation for future antifibrotic therapies.

      The insights gained from our study will indeed serve as a critical basis for subsequent research, where we will prioritize mechanistic investigations to elucidate the roles of the identified targets. While we acknowledge the importance of gain- or loss-of-function studies to establish causality, we believe this falls outside the primary scope of the current manuscript. Instead, we envision these mechanistic approaches as key elements of our future research efforts. For this reason, we feel it is not necessary to further expand on this limitation in the current discussion.

      (2) The majority of the conclusions are derived primarily from the proteomic analyses. Although well conducted, it would strengthen the study to corroborate some of the major findings by other means such as IHC/IF with the corresponding quantifications and not only representative images. For example, the IF stainings for ECM1 should also be quantified - ECM1. 

      To strengthen our MS findings on ECM1 expression and to address the reviewer's concern, we have now included quantification of ECM1 using IF staining at selected time points in Figure S7E and we refer to these data in the Results section (p. 12 of the current manuscript). The IF quantification data correspond well to the MS data showing increase in ECM1 expression with fibrosis development and decline with partial fibrosis resolution.

      (3) S1 - it would be important to show Sirius Red images over the time course, especially for CCl4 T4 where fibrosis resolution is occurring. Proteomics data also show this group clusters more closely with control mice and seeing a representative image would add further credibility to this point. 

      The requested Sirius Red images are now part of Figure S1B, documenting partial fibrosis resolution and overall parenchymal healing at T4 in both models.

      (4) How comparable are the periods of the two models? 2 weeks in one model may not be the same as 2 weeks in the other depending on the severity of the pathogenesis. 

      We appreciate the reviewer’s comment regarding the comparability of time points between the two models. Indeed, the temporal dynamics of fibrosis development differ between the models employed in our study, and we have carefully considered this aspect to ensure the validity of our comparative analysis. To address this, we started our comparisons at a stage corresponding to the onset of fibrosis in each model. Specifically, quantification of Sirius Red-positive areas, indicative of collagen deposition (Figure S1B), revealed that 2 weeks of DDC treatment produced a comparable extent of fibrosis to that observed after 3 weeks of CCl₄ treatment. This point was designated as the initial fibrosis time point (T1, Figure S1B), from which further treatment was applied to induce more advanced fibrosis. This approach allowed us to standardize the comparison of fibrosis progression between the two models.

      (5) Figure 4A-D - cell-type-specific signatures should be corroborated by actual IHC or IF stainings if possible. HNF4a (hepatocytes), CK19 (cholangiocytes), aSMA (activated fibrogenic HSCs), immune cells (B220, F4/80, Cd11b, CD11c etc).

      We thank the reviewer for this valuable suggestion. To strengthen our analysis, we have now complemented the box plots of cell type-specific signatures derived from the MS data (Figure 4A-D) with immunofluorescence (IF) staining, which has been included in the Supplemental Data (Figure S6). Specifically, we provide representative IF images from control and T1-T4 time points for each model, documenting the changes in abundance with treatment in:

      A) Hepatocytes (HNF4α), activated hepatic stellate cells (αSMA), and cholangiocytes (CK19).

      B) Immune cell populations, including B cells (B220) and macrophages/monocytes/Kupffer cells (F4/80), as these immune cell groups were not only identified in our MS analysis but also have established roles in the selected models(3, 4, 5). 

      The representative images shown in Figure S6 show the dynamics of the cellular populations in each of the models, which correspond well with the MS data (compare Figures 4A-D and S5). These additional data further validate our findings and enhance the robustness of our conclusions.

      References:

      (1) Thiele M, Villesen IF, Niu L, et al. Opportunities and barriers in omics-based biomarker discovery for steatotic liver diseases. J Hepatol 2024;81:345-359.

      (2) Friedman SL, Pinzani M. Hepatic fibrosis 2022: Unmet needs and a blueprint for the future. Hepatology 2022;75:473-488.

      (3) Best J, Verhulst S, Syn WK, et al. Macrophage Depletion Attenuates Extracellular Matrix Deposition and Ductular Reaction in a Mouse Model of Chronic Cholangiopathies. PLoS One 2016;11:e0162286.

      (4) Aoyama T, Inokuchi S, Brenner DA, et al. CX3CL1-CX3CR1 interaction prevents carbon tetrachloride-induced liver inflammation and fibrosis in mice. Hepatology 2010;52:1390-400.

      (5) Yang W, Chen L, Zhang J, et al. In-Depth Proteomic Analysis Reveals Phenotypic Diversity of Macrophages in Liver Fibrosis. J Proteome Res 2024;23:5166-5176.

    1. eLife Assessment

      This valuable study suggests that capsaicin nanoparticle administration in rats activates the transcription factor Nrf2 by directly binding to its repressor, KEAP1, leading to the induction of cytoprotective genes and preventing alcohol-induced gastric damage, offering a potential avenue for treating alcoholism-related gastric disorders. Although improvements were made following the first revision, the evidence supporting capsaicin as an Nrf2 activator remains incomplete, as some methodological aspects still require revision and the interpretation of key data needs further clarification.

    2. Reviewer #1 (Public review):

      The paper by Gao et al. describes the effect of capsaicin on the NRF2/KEAP1 pathway. The authors carried out a set of in vitro and in vivo experiments that addressed the mechanisms of the protective effect of capsaicin on ethanol-induced cytotoxicity.

      The authors conclude that capsaicin activates NRF2, which leads to the induction of cytoprotective genes, preventing oxidative damage. The paper shows that capsaicin may directly bind to KEAP1 and that it is a noncovalent modification of the Kelch domain.

      The authors also designed new albumin-coated capsaicin nanoparticles, which were tested for the therapeutic effect in vivo.

      I appreciate the authors' experimental efforts to strengthen the study's conclusions. However, in my opinion, the paper is still not fully technically sound, which weakens the strength of the evidence.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper the authors wanted to show that capsaicin can disrupt the interaction between Keap1 and Nrf2 by directly binding to Keap1 at an allosteric site. The resulting stabilization of Nrf2 would protect CAP-treated gastric cells from alcohol-induced redox stress and damage as well as inflammation (both in vitro and in vivo).

      Strengths:

      One major strength of the study is the use of multiple methods (CoIP, SPR, BLI, deuterium exchange MS, CETSA, MS simulations, target gene expression) that consistently show for the first time that capsaicin can disrupt the Nrf2/Keap1 interaction at an allosteric site and lead to stabilization and nuclear translocation of Nrf2.

      Moreover, the efforts to show causal involvement of the Keap1/Nrf2 axis in the observed cellular effects, as well as to address potential off-target effects of the polypharmacological CAP, are appreciated.

      One point that still somewhat hampers full appreciation of the capsaicin effect in cells is that capsaicin is not investigated alone, but mostly in combination with alcohol only.

      Moreover, the true add-on value of the developed nanoparticles remains obscure.

      The partly relatively high levels of NRF2 in putatively unstressed cells question the validity of the models used.

      The rationale for switching between different CAP concentrations is unclear /not entirely convincing.

      The language and introduction could be improved.

      Overall, the authors are convinced that capsaicin (although weakly) can bind to Keap1 and releases Nrf2 from degradation, with relevance for biological settings. With this, the authors provide a significant finding with marked relevance for the redox/Nrf2 as well as natural products /hit discovery communities.

      - Figure 2C: It is still not clear why naïve (unstressed/untreated) cells already show rather high nuclear abundance of Nrf2 (shouldn't Nrf2 be continuously tagged for degradation by Keap1?).

      - Figure 2G-H: Why switch to rather high concentrations?

      - Figure 2I: In the images of mitochondria, the control mitochondria look far more punctate (likely fissioned) than the ones treated with EtOH or EtOH + CAP. Wouldn't one expect that EtOH leads to mitochondrial fission and CAP can prevent it?

      - Figure 3H: High basal Nrf2 levels in unstressed/untreated HEK WT cells, why?

      - Figure 4a: Inclusion of an additional Keap1 binding protein (one with an ETGE motif) would have been desirable (to get information on specificity/risks of off-target (unwanted) effects of CAP).

      - Figure 4D: Why is there no stabilization of Nrf2 by CAP in lane 2?

      - Figure 4f: 5% DMSO is a rather high solvent concentration; why so high? (The solvent alone seems to have quite marked effects!)

      - Figure 6/7: I am not expert enough to judge formulations and histology scores. However, the benefit of the encapsulated capsaicin does not become entirely clear to me, as CAP and IRHSA@CAP mostly do not significantly differ in their elicited response.

      - Figure 7: Rebamipide was introduced as a positive control in the text with an activating effect on Nrf2, but there is no induction of hmox and nqo in Figure 7f. Why? It does not look as though the positive control was wisely chosen.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major concerns:

      For studies investigating capsaicin binding to KEAP1, the authors used capsaicin concentrations that are toxic to cells (Figures S1D and 4F, G). In vivo studies were performed only in 3 rats per group. The T-test was used for the comparison of more than two groups. Given the well-known issues with the specificity of the NRF2 antibody, the authors should provide appropriate controls, especially for IF and IHC staining.

      We sincerely appreciate your valuable comments. We repeated the CCK8 (Figure S1d) and pull-down (Figure 4g) experiments and updated the results. In September 2022, GES-1 cells were more sensitive to capsaicin (CAP) because Gibco serum from North America was used. Later, in 2024, we switched to serum from Australia (Gibco: 10099-141) and found that the GES-1 cells grew better, so we re-ran the test. The IC50 was 304.8 μM, so the concentrations used in this paper have no obvious toxicity to cells. In addition, we repeated the pull-down experiment with more reasonable concentrations of 32 μM and 100 μM, and the results were still in line with expectations. In summary, we concluded that the effect of CAP on GES-1 cells is closely related to the cell state, and that treatment with CAP at 32 to 100 μM can hinder the interaction between NRF2 and the Kelch domain of KEAP1. Furthermore, at the cellular level, the experimental concentration of CAP did not exceed 32 μM, which is a relatively safe concentration for cells.

      Thank you very much for your comments. We agree that more replicates increase the reliability of the results in animal experiments. Therefore, we recently added an experiment in Nfe2l2-knockout mice in Figure 9 (6 mice per group). Additionally, thank you very much for your comments on the use of the t-test; we reviewed the statistics and replaced the t-tests with one-way ANOVA.

      Finally, regarding your concern about the specificity of the NRF2 antibody, we used a commercial NRF2 antibody that has been KO/KD validated (Cat No. 16396-1-AP, Proteintech) and can be used for IF and IHC staining. Each of our fluorescence results was accompanied by Western blotting of the active form at 105-110 kDa for statistical analysis. The trend was consistent with the IF and IHC results, which supports the correctness of the results presented (Figure 2c and Figure S8j).

      Reviewer #2 (Public Review):

      Weaknesses:

      One major weakness of the study is that plausibility is taken as proof for causality. The finding that capsaicin directly binds to Keap1 and releases Nrf2 from its fate of degradation (in vitro) is taken for granted as the sole explanation for the observed improved gastric health upon alcohol exposure (in vivo). There is no consideration or exclusion of any potential unrelated off-target effect of capsaicin, or proteins other than Nrf2 that are also controlled by Keap1. 

      Another point that hampers full appreciation of the capsaicin effect in cells is that capsaicin is not investigated alone, but mostly in combination with alcohol only.

      Thank you very much for this comment. In the introduction, we clarified as follows: “Currently, experiments conducted in rats have demonstrated that red pepper/capsaicin (CAP) had significant protective effects on ethanol-induced gastric mucosal damage, and the mechanism may be related to the promotion of vasodilation(6,7), increased mucus secretion(8) and the release of calcitonin gene-related peptide (CGRP)(9,10). However, it is noteworthy that whether the antioxidant activity of CAP works has not been fully investigated.” Therefore, we also recognize that CAP does not exert its effects through the KEAP1-NRF2 pathway alone. Your advice is very useful. We further examined TRPV1 and DPP3 to assess the potential off-target effects of CAP. Capsazepine (CAPZ), a TRPV1 receptor antagonist, did not affect the protection of GES-1 cells by CAP (Fig S4f and S4g), which may indicate that CAP activation of NRF2 does not depend on TRPV1. The binding of CAP to DPP3, which contains an ETGE motif and can bind to KEAP1, was measured by BLI. We found that the K_D between CAP and DPP3 was 1.653 mM (>100 μM), whereas CAP bound KEAP1 considerably more strongly, with a K_D of about 31.45 μM, which may indicate that the potential off-target effect of CAP is low (Fig S4h and S4i).
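      As a rough illustration of the comparison above, the ratio of the two reported dissociation constants can be computed directly. This is only a sketch: the function name `fold_selectivity` is hypothetical, and a K_D ratio from in vitro binding assays is a crude indicator that does not by itself establish selectivity in cells.

```python
def fold_selectivity(kd_off_target_um: float, kd_target_um: float) -> float:
    """Return how many times weaker the off-target binding is (K_D ratio)."""
    if kd_target_um <= 0 or kd_off_target_um <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_off_target_um / kd_target_um


# Values as reported in the response, converted to micromolar.
KD_CAP_DPP3_UM = 1653.0   # 1.653 mM, CAP binding to DPP3 (BLI)
KD_CAP_KEAP1_UM = 31.45   # CAP binding to KEAP1

ratio = fold_selectivity(KD_CAP_DPP3_UM, KD_CAP_KEAP1_UM)
print(f"CAP binds DPP3 roughly {ratio:.0f}-fold more weakly than KEAP1")
```

      A larger ratio means weaker off-target binding relative to the intended target; it says nothing about proteins that were not measured.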

      Thank you very much for raising the second point. Multiple experiments have shown that CAP significantly up-regulates NRF2 in the presence of additional stimuli such as EtOH (Figure 1i), H₂O₂ (Figure 1l), PS-341 (Figure 2e) and DTT (Figure 4d). This pattern is as expected and consistent with our understanding of allosteric regulation. In the PS-341 and DTT experiments in particular, we included a group treated with CAP alone, and CAP alone did not significantly up-regulate NRF2, which is completely different from traditional NRF2 activators (especially artificially designed covalent binding peptides, which have serious side effects).

      Reviewer #3 (Public Review):

      Weaknesses:

      While the study provides valuable insights into the molecular mechanisms and in vivo effects of CAP, further clinical studies are needed to validate its efficacy and safety in human subjects. The study primarily focuses on the acute effects of CAP on ethanol-induced gastric mucosa damage. Long-term studies are necessary to assess the sustained therapeutic effects and potential side effects of CAP treatment.

      Furthermore, the study primarily focuses on the interaction between CAP and the KEAP1-NRF2 axis in the context of ethanol-induced gastric mucosa damage. It may be beneficial to explore the broader effects of CAP on other pathways or conditions related to oxidative stress. CAP has been known for its interaction with the Transient Receptor Potential Vanilloid type 1 (TRPV1) channel and subsequent NRF2 signaling pathway activation. Those receptors are also expressed within the gastric mucosa and could potentially cross-react with CAP leading to the observed outcome. Including experiments to investigate this route of activation could strengthen the present study.

      While the design of CAP nanoparticles is innovative, further research is needed to optimize the nanoparticle formulation for enhanced efficacy and targeted delivery to specific tissues.

      Addressing these weaknesses through additional research and clinical trials can strengthen the validity and applicability of CAP as a therapeutic agent for oxidative stress-related conditions.

      Thank you very much for these suggestions. We also believe that CAP is very valuable and promising for protecting against EtOH-induced gastric mucosal injury. We are actively pursuing patent applications, and, if conditions permit, longer-term studies of drug safety will be essential. Because the binding of CAP to KEAP1 is a new finding, and because NRF2 plays an important role in various oxidative stress-related diseases, we used human umbilical cord mesenchymal stem cells (HUC-MSCs) and H₂O₂ to explore the potential broader effects of CAP related to oxidative stress in cells (Figure 1l and 1m). We also performed TRPV1-related experiments and were surprised to find that inhibiting TRPV1 did not affect the effect of CAP (Supplementary Figure 4f and 4g). We hope that more people will read this article and carry out further interesting research.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      Although this study has been conducted in rats, a direct proof that albumin-coated capsaicin nanoparticles act through activation of Nrf2 in protecting gastric mucosa against alcohol toxicity could be well conducted in commercially available Nrf2-deficient mice.

      Thank you very much for your suggestion, which is very constructive for improving this paper. We purchased Nrf2-deficient mice (Cat. NO. NM-KO-190433) and performed the experiments. The results showed that Nrf2-knockout mice were more sensitive to EtOH and that the effects of CAP were partially eliminated (Figure 9), which further validated the role of the Nrf2-related signaling pathway in EtOH-induced gastric mucosal injury and in the therapeutic effect of CAP.

      Reviewer #1 (Recommendations For The Authors):

      Minor concerns include proofreading the paper. Actinomycin is not an inhibitor of translation.

      Thank you for your comment. We have revised “Actinomycin” to “Cycloheximide”.

      Reviewer #2 (Recommendations For The Authors):

      - Please have a careful look at your conclusions: just because two effects happen at the same time and may be plausible explanations for each other, it does not mean that they are really in a causative relationship in your given test system (unless unambiguously proven by additional experiments).

      Your suggestions are very constructive for us to improve this paper.

      We further examined the role of capsaicin using TRPV1, DPP3 and Nrf2-deficient mice, hoping to make our conclusions more credible.

      - You may want to frankly discuss other targets of capsaicin (e.g. the TrpV1 receptor) that possibly could also account for your observations, and that binding to Keap1 not only releases Nrf2 from proteasomal degradation.

      Thank you for your comment. In response, we further examined TRPV1 and DPP3 to assess the potential off-target effects of CAP. Capsazepine (CAPZ), a TRPV1 receptor antagonist, did not affect the protection of GES-1 cells by CAP (Fig S4f and S4g). Binding of CAP to DPP3, which contains an ETGE motif, was measured by BLI, and we found that the K_D between CAP and DPP3 was 1.653 mM, which may indicate that the potential off-target effect of CAP is low (Fig S4h and S4i). At the same time, the activation of NRF2 by non-classical pathways, such as CAP regulation of DPP3 or other proteins, also deserves more discussion and experimental verification.

      - For Figure 1G it does not become entirely clear what has been done (and thus deduction of conclusions is hampered).

      Thank you for your comment. Network target analysis (Figure 1g) was performed to identify the potential mechanism underlying the effects of CAP on ROS. The biological effect profile of CAP was predicted using our previous network-based algorithm, drugCIPHER. Enrichment analysis was conducted with the R package clusterProfiler v4.9.1, and pathways or biological processes enriched with an adjusted P value below 0.05 (Benjamini-Hochberg adjustment) were retained for further study. Significantly enriched pathways or biological processes related to ROS were then filtered and classified into three modules: ROS, inflammation and immune expression. The network targets of CAP against ROS were constructed from these analyses, and finally we combined them with proteomics to determine the research direction of this paper.
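      For readers unfamiliar with the adjustment mentioned above, the following is a minimal sketch of the Benjamini-Hochberg procedure and the 0.05 filtering step. It is not the authors' pipeline (they report using the R package for enrichment analysis); the function names and the example term names are hypothetical and serve only as illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


def significant_terms(term_p_values, alpha=0.05):
    """Keep terms whose adjusted p-value is below alpha."""
    names = list(term_p_values)
    adjusted = benjamini_hochberg([term_p_values[n] for n in names])
    return {n: q for n, q in zip(names, adjusted) if q < alpha}


# Hypothetical raw enrichment p-values, for illustration only.
example = {"term_a": 0.001, "term_b": 0.2, "term_c": 0.04}
kept = significant_terms(example)
print(sorted(kept))
```

      In this toy input, a term with a raw p-value of 0.04 is dropped after adjustment, which shows why the adjusted rather than the raw value should be used for filtering.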

      -  Figure 1L: is there a reason/explanation why UC.MSC needs a comparably very high concentration of capsaicin.

      Thank you for your comment. We used 8 μM and 32 μM because the experimental results at these concentrations were more stable in this cell type, and the activation of NRF2 downstream targets was more pronounced.

      -  Figure 2C: it is surprising that naïve (unstressed/untreated) cells already show a rather high nuclear abundance of Nrf2 (shouldn't Nrf2 be continuously tagged for degradation by Keap1?).

      Thank you for your comment. This is a real experimental result, and we have found in many experiments that the untreated group can also show detectable NRF2 on immunoblots. We think that this phenomenon may be related to the state of the cells at the time.

      -  Figure 2E: the claim of synergy between CAP and the proteasome inhibitor is not justified with this single figure.

      Thank you for your comment. Multiple experiments have shown that CAP significantly up-regulates NRF2 in the presence of additional stimuli such as EtOH (Figure 1i), H₂O₂ (Figure 1l), PS-341 (Figure 2e) and DTT (Figure 4d). This pattern is as expected and consistent with our understanding of allosteric regulation. However, this synergy does warrant more research.

      -  CHX is cycloheximide (in the main text it is referred to as actinomycin).

      Thank you very much for your comment. We have revised “Actinomycin” to “Cycloheximide”.

      -  Figures 2G-H: why switch to rather high concentrations? Is it due to the overexpression of Keap1?

      Thank you for your comment. At the time of this part of the experiment, we had obtained in vitro data on the interaction of CAP and the Kelch domain of KEAP1 (about 32 μM). To keep the results uniform and valid, we chose a relatively higher concentration.

      -  Figure 2I: in the images of mitochondria, the control mitochondria look far more punctate (likely fissioned) than the ones treated with EtOH or EtOH + CAP. Wouldn't one expect that EtOH leads to mitochondrial fission and CAP can prevent it?

      Thank you for your comment. MitoTracker® Red CMXRos (M9940, Solarbio, China) is a cell-permeable X-rosamine derivative containing weakly sulfhydryl-reactive chloromethyl functional groups that label mitochondria. This product is an oxidized red fluorescent stain (Ex=579 nm, Em=599 nm) that requires only incubation with cells; it is passively transported across the cell membrane and accumulates directly in active mitochondria. Therefore, red does not represent broken mitochondria, but active mitochondria. The mean branch length of mitochondria was quantified using the MiNA software (https://github.com/ScienceToolkit/MiNA) developed for ImageJ.

      -  Figure 3C: figure legend is somewhat poor.

      Thank you for your comment. We have revised: “KEAP1-NRF2 interaction was detected with Surface plasmon resonance (SPR) in vitro.”

      -  Figure 3E: given that CAP disrupts Nrf2/Keap1- PPI, why is there no Nrf2 stabilization seen in the fourth lane (input/lysate)?

      Thank you for your comment. In the fourth lane, overexpression of KEAP1 may promote the degradation of NRF2.

      -  Figure 3H: high basal Nrf2 levels in unstressed/untreated HEK WT cells, why?

      Thank you for your comment. This is a real experimental result, and we have found in many experiments that the untreated group can also show NRF2 when immunoblotting in 293T cells. We think that this phenomenon may be related to the cell state at that time.

      -  Figure 3G/I: this data suggests to me that the alcohol-mediated toxicity is Keap1-dependent (rather than the protection by CAP), doesn´t it?

      Thank you for your comment. We can see that KEAP1-KO cells had a high expression of NRF2, which was also in line with our expectations, and EtOH-induced GES-1 damage may be closely related to oxidative stress.

      -  Figure 4a: the inclusion of an additional Keap1 binding protein (one with an ETGE motif) would have been desirable (to get information on specificity/risks of off-target (unwanted) effects of CAP). 

      Thank you for your comment. DPP3 with an ETGE motif was detected by BLI, and we found that the K<sub>D</sub> between CAP and DPP3 was 1.653 mM, which may indicate the potential off-target effect of CAP is low (Fig S4h and S4i).

      -  Figure 4D: why is there no stabilization of Nrf2 by CAP in lane 2 ? How can the DTT-mediated boost on Nrf2 levels be explained?

      Thank you for your comment. Multiple experiments have shown that CAP significantly up-regulates NRF2 in the presence of additional stimuli such as EtOH (Figure 1i), H<sub>2</sub>O<sub>2</sub> (Figure 1l), PS-341 (Figure 2e) and DTT (Figure 4d). This pattern is as expected and consistent with our understanding of allosteric regulation. However, this synergy does warrant more research.

      -  Figure 4f: 5% DMSO is a rather high solvent concentration, why so high (the solvent alone seems to have quite marked effects).

      Thank you for your comment. The DMSO concentration was high because our maximum concentration was set relatively high. We recognized this problem and repeated the more critical pull-down experiment (Figure 4g). The current DMSO concentration of 0.2% had no effect on the experimental results.

      -  Figure 5: it should be described in the figure legend which mutant is used. Based on the previous data, I would expect an investigation of mutants carrying amino acid exchanges at the newly identified allosteric site.

      Thank you for your comment. The mutant carried the substitutions Y334A, R380A, N382A, N414A, R415A, Y572A, and S602A (the orthosteric site), at residues reported to engage NRF2 and classic Keap1 inhibitors. The exploration of newly discovered allosteric sites is worthy of further study.

      -  Figure 6/7: I am not expert enough to judge formulations and histology scores. However, the benefit of the encapsulated capsaicin does not become entirely clear to me, as CAP and IRHSA@CAP mostly do not significantly differ in their elicited response.

      Thank you for your comment. On the one hand, the nanomedicine improves the safety of administration: it helps to reduce the intense spicy irritation caused by CAP itself when administered in the stomach. On the other hand, the drug dosage is reduced to a certain extent while achieving a better therapeutic effect.

      -  Figure 7: rebamipide was introduced as positive control in the text with an activating effect on Nrf2, but there is no induction of hmox and nqo in Figure 7f, why?

      Thank you for your comment. The effect of addition of positive control drug (Rebamipide) on NRF2 activation is not the focus of this paper. We speculate that the transcription and translation of related genes may not be completely synchronized when Rebamipide was taken at the same time.

      -  Figure 8: the CAP effect on inflammation is visible, however, a clear causal connection between ROS/Nrf2/KEap1 is not given in the presented experiments.

      Thank you for your comment. The basic mechanism proposed in this paper is illustrated in the graphic diagram. Many articles have reported that activation of NRF2 exerts both anti-inflammatory and antioxidant functions, but the causal relationship is still open to exploration.

      Points related to presentation:  

      -  The data with the encapsulated CAP appear a little as a sidearm that does not bolster your main message (maybe take out and elaborate on this topic more extensively in another manuscript).

      -  Revise the introduction on the Nrf2 signaling pathway as it is written at the moment, someone outside the Nrf2 field might have trouble understanding it.

      -  The use of language requires proofreading and revision.

      Thank you for your comment. We rearranged and proofread it.

      Reviewer #3 (Recommendations For The Authors):

      Overall, the manuscript is well-written and the results are presented in a concise and comprehensible manner.

      Some recommendations on the experimental evidence and further suggestions:

      • The authors should state how they assessed the distribution of the data. Description of data with mean and standard deviation as well as comparisons between different groups with t-test assumes that the underlying data is normally distributed.

      Your suggestions are very constructive and help us improve the paper. The differences in the mean values between two groups were analyzed using Student's t-test, while the differences among multiple groups were analyzed using one-way ANOVA in GraphPad Prism.

      Therefore, we checked and proofread the statistical analysis.

      • Additional experiments further characterising and validating the activation of CAP via direct KELCH1-binding could include parallel experiments with similar agonists like dimethyl fumarate. It would be interesting to know how CAP activation compares to DMF activation.

      Thank you very much for your comment. We believe that the activation of NRF2 by DMF has been widely reported and well-studied, so we did not purchase this drug for comparative study here. If it can be promoted clinically in the future, we may consider comparing with DMF.

      • Also, the knock-down of NRF2 would be a suggested experiment to do because it rules out that the benefit of CAP is independent of KEAP1-NRF2 binding and activation.

      Thank you very much for your suggestions. We purchased Nrf2-deficient mice and performed experiments. The results showed that Nrf2-knockout mice were more sensitive to ethanol and that the effects of CAP were partially eliminated (Figure 9), which further validated the role of the Nrf2-related signaling pathway in alcohol-induced gastric mucosal injury and in the therapeutic effect of CAP.

      Some corrections on text and figures:

      • Figure 1b: incorrect spelling of DNA stain. Should be Hoechst33324.

      Thank you very much for your comment. We have revised.

      • Figure 1c: don't put the label inside the plot.

      Thank you very much for your comment. We have revised.

      • Figure 1d: choose less verbose axes titles (this also applies to other figures).

      Thank you very much for your comment. We have revised.

      • Figures 1e and 1f: please state the units.

      Thank you very much for your comment. The enzyme activity of SOD and the content of MDA were compared with that of the control group.

      • Heading 2.2: NRF2-ARE instead of NRF-ARE.

      Thank you very much for your comment. We have revised.

      • Line 118: missing expression after immune.

      Thank you very much for your comment. We have revised.

      • Figure 1g: names of proteins are not readable.

      Thank you very much for your comment. We have revised.

      • Line 120: You performed transcriptomic analyses to identify differentially expressed GENES not proteomic.

      Thank you very much for your comment. This part of the work we do is proteomics.

      • Line 122: Fold change should be stated in both directions, i.e. absolute FC like |FC| > 1. Or did you select only upregulated DEGs? Is it not log2 FC?

      Thank you very much for your comment. We have revised.
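
      The two-sided criterion raised in this point can be sketched as follows. This is a minimal illustration only: the cutoff of 1 on the log2 scale, the function name `is_differential`, and the abundance values are all assumptions, not the thresholds or data used in the manuscript.

```python
import math

def is_differential(mean_treated, mean_control, log2_cutoff=1.0):
    """Return True when |log2(treated / control)| reaches the cutoff.

    Taking the absolute value keeps both up- and down-regulated entries.
    The default cutoff of 1.0 (a two-fold change) is an assumed example value.
    """
    log2_fc = math.log2(mean_treated / mean_control)
    return abs(log2_fc) >= log2_cutoff

# Hypothetical mean abundances: (treated, control)
proteins = {
    "P_up": (8.0, 2.0),    # log2 FC = +2
    "P_down": (1.0, 4.0),  # log2 FC = -2
    "P_flat": (3.0, 2.5),  # log2 FC is about +0.26
}

selected = sorted(
    name for name, (treated, control) in proteins.items()
    if is_differential(treated, control)
)
print(selected)
```

      Filtering on the raw ratio with a one-sided rule such as FC > 1 would instead keep every up-regulated entry and drop all down-regulated ones, which is the ambiguity the reviewer points out.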

      • Figure 1h (and Supplementary Figure 1a): Missing heatmap legend for FC.

      What do the colors show? Sample (column) description missing.

      Thank you very much for your comment. We used red to indicate up-regulation and blue to indicate down-regulation. The labels on the right-hand vertical axis are antioxidant genes such as GSS and SOD1, and the ratio between the treatment group and the model group (CAP + EtOH/EtOH) has been calculated and labeled.

      • Line 145: A Western blot is not a proteomic analysis.

      Thank you very much for your comment. We have revised: “Concurrently, the elevated expression levels of GSS and Trx proteins, which were also downstream targets of NRF2, further validated by western blotting (Figure 1j).”

      • Supplementary Figure 2e-j: expression fold change is not the right quantity. The signal of the actual protein was quantified. And what are you comparing to with the statistics? The stars on one bar are not clear.

      Thank you very much for your comment. The expression level of this part was normalized compared with that of the control group. The significance differentiation analysis is compared with the model group.

      • What was the concentration of  H<sub>2</sub>O<sub>2</sub> used?

      Thank you very much for your comment. 200 μM  H<sub>2</sub>O<sub>2</sub> was used.

      • Figure 2d: use a more precise y-axis label.

      Thank you very much for your comment. We do want to compare the amount of NRF2 entering the nucleus, so the relative expression is given with respect to the internal reference.

      • Figure 2g: missing molecular weight markers.

      Thank you very much for your comment. Because the ubiquitination signal spans the whole membrane, marking only the sizes of HA and GAPDH would not look clean here.

      • Line 221: lactate is the endproduct of the anaerobic glycolytic pathway.

      Thank you very much for your comment. We have revised.

      • Supplementary Figure 3d: should it be PKM2 (instead of PKM) and LDHA (instead of LDH). Should fit with the text in the manuscript.

      Thank you very much for your comment. We have revised.

      • Supplementary Figures 3 e-f: brackets in y-axis labels are too bold.

      Thank you very much for your comment. We have revised.

      • Figures 3a and b. Brackets should only be used if two conditions are being compared statistically. Remove the one line with ns as it could imply that you have compared the first with the last condition only.

      Thank you very much for your comment. We have revised.

      • Consistent labeling of kDa in figures (no capital K in KDa).

      Thank you very much for your comment. We have revised.

      • Figure 4a. Move kDa on top of 70.

      Thank you very much for your comment. We have revised.

      • Figure 3 g-h: Why 2% EtOH. Used 5% previously?

      Thank you very much for your comment. We switched to the 293T cell line here, and a 5% EtOH concentration is too high for these cells.

      • Supplementary Figure b-e: correct typo in y-axis label: expression.

      Thank you very much for your comment. We have revised.

      • Figure 4a: correct x-axis label for temperature unit. Too bold. Not readable.

      Add a clear label and unit for y-axis.

      Thank you very much for your comment. We have revised.

      • Figure 4 b-c: should have a legend explaining colors.

      Thank you very much for your comment. Our Figure legend already contains the meaning of colors: “(b) Computational docking of CAP molecule to KEAP1 surface pockets. The Keap1 protein is represented in gray, while the CAP molecule is shown in yellow. The seven key amino acids predicted to be crucial for the interaction are highlighted in blue. (c) Partial overlap of CAP-binding pocket with KEAP1-NRF2 interface. The KEAP1-NRF2 interaction interface is represented in purple.”

      • Supplementary Figure 5a. Add axis units.

      Thank you very much for your comment. We have revised.

      • Figure 4e: Missing b ions value for number 19.

      Thank you very much for your comment. This part is not missing, but corresponds to 19 of y ions.

      • Figure 7f: adjust brackets - they are too bold.

      Thank you very much for your comment. We have revised.

      • Supplementary Figure 8b-i: labels not readable. c should be spleen.

      Thank you very much for your comment. We have revised.

      • Line 787: specify BH adjustment to Benjamini-Hochberg.

      Thank you very much for your comment. We have revised.
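
      For readers unfamiliar with the abbreviation, the Benjamini-Hochberg adjustment named in this point can be sketched in a few lines. This is a generic illustration of the standard procedure with made-up p-values; it is not the code used in the manuscript, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up raw p-values for illustration
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```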

      • Check spelling of µl throughout the Methods section e.g. line 854 - shouldn't be "ul".

      Thank you very much for your comment. We have revised.

      • Line 974: correct spelling of species names: E. coli should be in italics.

      Thank you very much for your comment. We have made all of these corrections to the text and figures. We will be more rigorous and careful in our writing in the future.

    1. eLife Assessment

      This fundamental study reports the effects of the psychedelic drug psilocin on iPSC-derived human cortical neurons, analyzing different aspects of structural and functional neuronal plasticity. The evidence is convincing, integrating a comprehensive characterization of 5-HT2A expression and its subcellular distribution upon treatment with psilocin at different time points. The study supports the value of using iPSC-derived human cortical neurons for testing the potentially translational effects of psilocin and other psychedelic-related compounds.

    2. Reviewer #1 (Public review):

      Summary:

      This study reports the effects of psilocin on iPSC-derived human cortical neurons.

      Strengths:

      The characterization was comprehensive, involving immunohistochemistry of various markers, 5-HT2A receptors, BDNF, and TrkB, transcriptomics analyses, morphological determination, electrophysiology, and finally synaptic protein measurements. The results are in close agreement with prior work (PMID 29898390) on rat-cultured cortical neurons. Nevertheless, there is value in confirming those earlier findings and furthermore demonstrating the effects in human neurons, which are important for translation. The genetic, proteomics, and cell structure analyses used in this paper are its major strengths. The study supports the value of using iPSC-derived human cortical neurons for drug development involving psychedelics-related compounds.

      Weaknesses:

      (1) Line 140: 5-HT2A receptor expression was found via immunocytochemistry to reside in the somatodendritic and axonal compartments. However, prior work from ex vivo tissue using electron microscopy has found predominantly 5-HT2A receptor expression in the somatodendritic compartment (PMID: 12535944). Was this antibody validated to be 5-HT2A receptor-specific? Can the authors reason why the discrepancy may arise, and if the axonal expression is specific to the cultured neurons?

      (2) Line 143: It would be helpful to specify the dose of psilocin tested, and describe how this dose was chosen.

      (3) Figure 1: The interpretation is that the differential internalization in the axonal and somatodendritic compartments is time-dependent. However, given that only one dose is tested, it is also possible that this reflects dose dependence, with the longer time exposure leading to higher dose exposure, so these variables are related. That is, if a higher dose is given, internalization may also be observed after 10 minutes in the dendritic compartment.

      (4) Figure 3 & 4: What is the 'control' here? A more appropriate control for the 24 hours after psilocin application would be 24 hours after vehicle application. Here the authors are looking at before and after, but the factor of time elapsed and perturbation via application is not controlled for.

      (5) The sample size was not clearly described. In the figure legend, N = the number of neurites is provided, but it is unclear how many cells have been analyzed, and then how many of those cells belong to the same culture. These are important sample size information that should be provided. Relatedly, statistical analyses should consider that the neurites from the same cells are not independent. If the neurites indeed come from the same cells, then the sample size is much smaller and a statistical analysis considering the nested nature of the data should be used.
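
      The nesting concern in point (5) can be made concrete with a small sketch. One simple option, assumed here purely for illustration, is to collapse neurite-level measurements to one mean per cell before any group comparison, so that the effective sample size is the number of cells rather than the number of neurites; a mixed-effects model is the fuller alternative. The identifiers and values below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def per_cell_means(measurements):
    """Collapse neurite-level values to one mean per (culture, cell).

    measurements: iterable of (culture_id, cell_id, value) tuples.
    """
    by_cell = defaultdict(list)
    for culture_id, cell_id, value in measurements:
        by_cell[(culture_id, cell_id)].append(value)
    return {key: mean(values) for key, values in by_cell.items()}

# Hypothetical neurite measurements: six neurites from three cells
neurites = [
    ("culture1", "cellA", 10.0),
    ("culture1", "cellA", 14.0),
    ("culture1", "cellA", 12.0),
    ("culture1", "cellB", 20.0),
    ("culture2", "cellC", 8.0),
    ("culture2", "cellC", 10.0),
]

cell_means = per_cell_means(neurites)
print(len(neurites), "neurites ->", len(cell_means), "independent cells")
```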

    3. Reviewer #2 (Public review):

      In this article, Schmidt et al use iPSC-derived human cortical neurons to test the effects of the psychedelic psilocin in different models of neuroplasticity.

      Using human iPSC-derived cortical neurons, the authors test the expression of 5-HT2A and its subcellular distribution, as well as the effect of different times of exposure to psilocin on 5-HT2A expression. The authors evaluated the effect of the 5-HT2 antagonist ketanserin, as well as the inhibition of dynamin-dependent endocytic pathways with dynasore. Gene expression and plasticity (structural and functional) were also evaluated after different times of exposure to psilocin.

      In general, results are interesting since they use the iPSC to evaluate the potentially translationally relevant effects of psilocin (the active metabolite of the psychedelic psilocybin). However, there are a few concerns that need to be addressed:

      (1) My main critique is the lack of experimental validation of selectivity and/or specificity of the anti-5-HT2A antibody targeting the extracellular loop of the 5-HT2A receptor (Alomone labs, cat # ASR-033). Most of the primary antibodies targeting class A GPCRs (including the 5-HT2A receptor) have very limited selectivity. Without validation (using for example knockdown techniques to decrease expression of 5-HT2A in their iPSC-derived human cortical neurons), the experiments using this antibody should be excluded from the manuscript.

      (2) Did the author evaluate whether 5-HT is present in the cell media? If it is, this may affect the functional outcomes evaluated throughout, since as the endogenous ligand it would in principle activate the 5-HT2A receptor.

      (3) Some of the datasets are not statistically analyzed (or quantified), such as Figure S1F.

      (4) Another important concern is the experimental design used to evaluate the effect of psilocin at different time points (24h, 4 days and 10 days). One of the unique and translationally interesting effects of psychedelics including psilocybin is that the in vivo plasticity-related effects (increased structural or synaptic plasticity for example) are observed post-acutely, or once the active compound psilocin is fully metabolized, or not present in the CNS directly targeting the 5-HT2A. Using the iPSC, it seems that the authors continuously exposed cells to psilocin (for hours or even days) at least for some of the experimental techniques. Since this is not the model of what occurs using an in vivo model (such as a single dose of psilocybin to mice, collecting frontal cortex samples 24-h after drug administration, once the active compound is fully metabolized), the authors' findings lack translational validity. Can the authors comment on this?

      (5) In Figure 2E, it seems that ketanserin by itself is reducing BDNF density. How, then, do the authors conclude that ketanserin blocks psi-induced effects? Using a more selective 5-HT2A antagonist such as M100907 could also improve the outcome (in terms of selectivity) of this experiment.

      (6) To evaluate neurite complexity, the authors used the AAV-CamKII-mCherry viral vector, but mCherry (Fig 4A) seems to be retained in the nucleus.

      (7) Minor: Reference 36- this is a review article that does not mention the psychedelic psilocin

    4. Author response:

      We sincerely thank the reviewers for their thorough and constructive evaluation of our manuscript. We particularly appreciate their recognition of our comprehensive characterization approach, which integrates immunohistochemistry, transcriptomics, morphological assessments, and electrophysiology to understand psilocin's effects on human neurons. The reviewers highlighted that our findings closely align with and validate prior work on rat cortical neurons, while importantly extending these insights to human cells. We are encouraged by their acknowledgment that our study demonstrates the value of using iPSC-derived human cortical neurons for testing potentially translatable effects of psychedelic compounds. Their positive assessment of our work's implications for psychedelic drug development is particularly valuable, as it supports our goal of advancing the understanding of these compounds' therapeutic potential and their possible application in treating neuropsychiatric disorders.

      We are also very grateful for the reviewers' constructive criticism which will help strengthen our manuscript significantly. Based on their detailed feedback, we plan to perform several additional experiments for inclusion in the revised manuscript.

      The most important concern raised by both reviewers is about the specificity of the antibody used to detect the expression pattern and abundance of 5-HT2A receptors at the cells' surface. We acknowledge that GPCR antibodies, including those targeting 5-HT2A receptors, can be challenging in terms of specificity and reliability, particularly given the structural similarities within this receptor family. To address these concerns comprehensively, we propose the following systematic validation strategy:

      (1) Cell-Type Specific Expression Analysis: We will systematically evaluate the antibody across different developmental stages and cell lines. The results from the stainings will be correlated with RNA sequencing data to provide quantitative validation of expression patterns. Cell types to be included will be:

      · iPSCs (expected negative)

      · Neural progenitors (expected positive)

      · Mature neurons (expected positive)

      · HEK cells (expected negative)

      This multi-stage analysis will allow us to track receptor expression through development and verify antibody specificity across distinct cellular contexts.

      (2) Peptide Competition Study: We will perform blocking experiments using the specific peptide sequence against which the antibody was raised. By pre-incubating the antibody with its cognate peptide at the established working concentration, and then documenting in detail the signal reduction in the peptide-blocked condition versus standard staining, we can demonstrate binding specificity. This approach will provide direct evidence of antibody selectivity for its intended target.

      (3) Sequence Analysis and Specificity: We will perform a comprehensive protein BLAST analysis of the antigenic peptide sequence, assess potential cross-reactivity with related receptors, and evaluate species conservation and specificity. This in silico approach will complement our experimental validation and help identify any potential off-target binding sites.

      (4) Additional Validation: While technically challenging, we will attempt knockdown studies using siRNA/shRNA approaches to provide additional validation of antibody specificity. This molecular intervention will offer another layer of validation through targeted reduction of the receptor.

      We plan to present these results in a new supplementary figure that will provide a comprehensive overview of our validation efforts. Should we not be able to convincingly demonstrate the specificity of the antibody, we will discuss with the editors and reviewers to modify Figure 1 and exclude critical parts from the manuscript. While we find the results interesting and important to communicate, an omission would not critically impact the key message of the manuscript, which is the structural and molecular changes elicited by psilocin on human neurons. The strength of our multi-modal approach means that our core findings are supported by several independent lines of evidence beyond antibody-based detection.

    1. eLife Assessment

      The authors aimed to quantify feral pig interactions in eastern Australia to inform disease transmission networks. They used GPS tracking data from 146 feral pigs across multiple locations to construct proximity-based social networks and analyze contact rates within and between pig social units. This fundamental study shows that targeting adult males in feral pig control programs could help global efforts to contain disease. The methods are compelling and the paper should be of interest to the fields of veterinary medicine, public health, and epidemiology.

    2. Reviewer #2 (Public review):

      Summary:

      The paper attempts to elucidate how feral (wild) pigs cause distortion of the environment in over 54 countries of the world, particularly Australia.

      The paper displays proof that over $120 billion worth of facilities were destroyed annually in the United States of America.

      The authors have tried to infer that the findings of their work were fundamental and possessing a compelling strength of evidence.

      Strengths:

      (1) Clearly stating feral (wild) pigs as a problem in the environment.

      (2) Stating how 54 countries were affected by the feral pigs.

      (3) Mentioning how $120 billion was lost in the US, annually, as a result of the activities of the feral pigs.

      (4) Amplifying the fact that 14 species of animals were being driven into extinction by the feral pigs.

      (5) Feral pigs possessing zoonotic abilities.

      (6) Feral pigs acting as reservoirs for endemic diseases like brucellosis and leptospirosis.

      (7) Understanding disease patterns by the social dynamics of feral pig interactions.

      (8) The use of 146 GPS-monitored feral pigs to establish their social interaction among themselves.

      Weaknesses:

      None, as the weaknesses had been already addressed.

    3. Reviewer #3 (Public review):

      Summary:

      The authors sought to understand social interactions both within and between groups of feral pigs, with the intent of applying their findings to models of disease transmission. The authors analyzed GPS tracking data from across various populations to determine patterns of contact that could support the transmission of a range of zoonotic and livestock diseases. The analysis then focused on the effects of sex, group dynamics, and seasonal changes on contact rates that could be used to base targeted disease control strategies which would prioritize the removal of adult males for reducing intergroup disease transmission.

      Strengths:

      It utilized GPS tracking data from 146 feral pigs over several years, effectively capturing seasonal and spatial variation in the social behaviors of interest. Using proximity-based social network analysis, this work provides a highly resolved snapshot of contact rates and interactions both within and between groups, substantially improving research in wildlife disease transmission. Results were highly useful and provided practical guidance for disease management, showing that control targeted at adult males could reduce intergroup disease transmission, hence providing an approach for the control of zoonotic and livestock diseases.

      Weaknesses:

      None, as the authors have already addressed the identified weaknesses.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors aimed to quantify feral pig interactions in eastern Australia to inform disease transmission networks. They used GPS tracking data from 146 feral pigs across multiple locations to construct proximity-based social networks and analyse contact rates within and between pig social units.

      Strengths:

      (1) Addresses a critical knowledge gap in feral pig social dynamics in Australia.

      (2) Uses robust methodology combining GPS tracking and network analysis.

      (3) Provides valuable insights into sex-based and seasonal variations in contact rates.

      (4) Effectively contextualizes findings for disease transmission modeling and management.

      (5) Includes comprehensive ethical approval for animal research.

      (6) Utilizes data from multiple locations across eastern Australia, enhancing generalizability.

      Weaknesses:

      (1) Limited discussion of potential biases from varying sample sizes across populations

      This is a really good comment, and we will address it in the discussion as one of the limitations of the study.

      (2) Some key figures are in supplementary materials rather than the main text.

      We will move some of our supplementary material to the main text as suggested.

      (3) Economic impact figures are from the US rather than Australia-specific data.

      We included the impact figures that are available for Australia (for FMD), and we will include the estimated impact of ASF in Australia in the introduction.

      (4) Rationale for spatial and temporal thresholds for defining contacts could be clearer.

      We will improve the explanation of why we chose the spatial and temporal thresholds based on literature, the size of animals and GPS errors.

      (5) Limited discussion of ethical considerations beyond basic animal ethics approval.

      This research was conducted under an ethics committee's approval for collaring the feral pigs. This research is part of an ongoing pest management activity, and all the ethics approvals have been highlighted in the main manuscript.
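
      To illustrate the role of the spatial and temporal thresholds discussed in point (4), here is a minimal sketch of a proximity-based contact definition: two pigs are counted as in contact when a pair of their GPS fixes is close in both space and time. The threshold values (5 m, 300 s), the function name `contact_pairs`, and the fixes are hypothetical placeholders, not the values or code used in the study.

```python
import math
from itertools import combinations

def contact_pairs(fixes, max_dist_m, max_dt_s):
    """Return the set of animal pairs with at least one qualifying contact.

    fixes: list of (animal_id, time_s, x_m, y_m) with coordinates in a
    projected (metre-based) system. Both thresholds are caller-supplied.
    """
    edges = set()
    for a, b in combinations(fixes, 2):
        if a[0] == b[0]:
            continue  # skip pairs of fixes from the same animal
        close_in_time = abs(a[1] - b[1]) <= max_dt_s
        close_in_space = math.hypot(a[2] - b[2], a[3] - b[3]) <= max_dist_m
        if close_in_time and close_in_space:
            edges.add(frozenset((a[0], b[0])))
    return edges

# Hypothetical fixes: (animal_id, time_s, x_m, y_m)
fixes = [
    ("A", 0, 0.0, 0.0),
    ("B", 60, 3.0, 4.0),      # 5 m from A's first fix, 60 s later
    ("C", 0, 100.0, 0.0),     # far from A and B at this time
    ("A", 3600, 100.0, 0.0),  # same place as C, but an hour later
]

edges = contact_pairs(fixes, max_dist_m=5.0, max_dt_s=300)
print(edges)
```

      Loosening either threshold adds edges (for example, a one-hour time window would link A and C), which is why the rationale for the chosen values matters for the resulting network.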

      The authors largely achieved their aims, with the results supporting their conclusions about the importance of sex and seasonality in feral pig contact networks. This work is likely to have a significant impact on feral pig management and disease control strategies in Australia, providing crucial data for refining disease transmission models.

      Reviewer #2 (Public review):

      Summary:

      The paper attempts to elucidate how feral (wild) pigs cause distortion of the environment in over 54 countries of the world, particularly Australia.

      The paper displays proof that over $120 billion worth of facilities were destroyed annually in the United States of America.

      The authors have tried to infer that the findings of their work were important and possess a convincing strength of evidence.

      Strengths:

      (1) Clearly stating feral (wild) pigs as a problem in the environment.

      (2) Stating how 54 countries were affected by the feral pigs.

      (3) Mentioning how $120 billion was lost in the US, annually, as a result of the activities of the feral pigs.

      (4) Amplifying the fact that 14 species of animals were being driven into extinction by the feral pigs.

      (5) Feral pigs possessing zoonotic abilities.

      (6) Feral pigs acting as reservoirs for endemic diseases like brucellosis and leptospirosis.

      (7) Understanding disease patterns by the social dynamics of feral pig interactions.

      (8) The use of 146 GPS-monitored feral pigs to establish their social interaction among themselves.

      Weaknesses:

      (1) Unclear explanation of the association of either the female or male feral pigs with each other, seasonally.

      This will be better explained in the methods.

      (2) The "abstract paragraph" was not justified.

      We have justified the abstract paragraph as requested by the reviewer.

      (3) Typographical errors in the abstract.

      Typographical errors have been corrected in the Abstract.

      Reviewer #3 (Public review):

      Summary:

      The authors sought to understand social interactions both within and between groups of feral pigs, with the intent of applying their findings to models of disease transmission. The authors analyzed GPS tracking data from across various populations to determine patterns of contact that could support the transmission of a range of zoonotic and livestock diseases. The analysis then focused on the effects of sex, group dynamics, and seasonal changes on contact rates that could be used to base targeted disease control strategies that would prioritize the removal of adult males for reducing intergroup disease transmission.

      Strengths:

      It utilized GPS tracking data from 146 feral pigs over several years, effectively capturing seasonal and spatial variation in the social behaviors of interest. Using proximity-based social network analysis, this work provides a highly resolved snapshot of contact rates and interactions both within and between groups, substantially improving research in wildlife disease transmission. Results were highly useful and provided practical guidance for disease management, showing that control targeted at adult males could reduce intergroup disease transmission, hence providing an approach for the control of zoonotic and livestock diseases.

      Weaknesses:

      Despite their reliability, populations can be skewed by small sample sizes and limited generalizability due to specific environmental and demographic characteristics. Further validation is needed to account for additional environmental factors influencing social dynamics and contact rates.

      This is a really good point, and we thank the reviewer for pointing out this issue. We will discuss the potential biases due to sample size in our discussion. We agree that environmental factors need to be incorporated and tested for their influence on social dynamics, and this will be added to the discussion, as we plan to expand this research and conduct the analysis to determine whether environmental factors influence social dynamics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Consider moving some key figures from supplementary materials to the main text to strengthen the presentation of results.

      We included a new figure to strengthen the presentation of results (Figure 3a-b), which shows the node level measures by sex and for direct and indirect networks.

      (2) Expand discussion of limitations, particularly addressing potential biases from varying sample sizes across populations.

      We added more detail and clarity about this potential bias to the limitations section within the discussion: “Different populations in our study had varying numbers of collared individuals, with some populations having only two individuals at certain times. This variability in sample size across populations is a limitation when interpreting the results. Small populations are often the result of a few individuals being trapped and collared, and this does not necessarily reflect the actual number of individuals in those groups.” Moreover, while reviewing the effect of the potential bias, we found that a General Linear Mixed Effect Model (Table 1) was not optimal for analysing the effect of sex on the network measures, and therefore this analysis has been repeated using a non-parametric test (Wilcoxon rank-sum test) for direct and indirect networks based on a 5-metre threshold (Table 1).
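
      As an illustration only (this is not the authors' R code), the rank-sum comparison of a node-level measure between sexes could be sketched as below. The degree values and the function name rank_sum_test are hypothetical, and the p-value uses the large-sample normal approximation without tie or continuity correction; for very small groups an exact test would be preferable.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (W, z, p), where W is the sum of the midranks of group x.
    No tie or continuity correction is applied in this sketch.
    """
    # Pool both groups, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n = len(pooled)
    w = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        w += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value
    return w, z, p


# Hypothetical node degrees from a contact network, split by sex.
female_degree = [2, 3, 3, 4, 5]
male_degree = [4, 6, 7, 8, 9]

w, z, p = rank_sum_test(female_degree, male_degree)
print(f"W = {w}, z = {z:.2f}, p = {p:.3f}")
```

      A rank-based test of this kind makes no normality assumption about the network measures, which is one plausible reason for preferring it when group sizes are small and uneven.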

      (3) If available, include Australia-specific economic impact data in the introduction.

      We included the impact figures that are available for Australia (for FDM) in the introduction.

      (4) Clarify the rationale for chosen spatial and temporal thresholds for defining contacts.

      This has been added in the methodology: “Direct contact was defined when two individuals interacted either at 2, 5, or 350-metre buffers within a five-minute interval [36]. A previous study used 350 metres as a spatial threshold [16], while others use the approximate average body length of an individual [36]”
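
      A minimal sketch of how such a contact definition could be applied to GPS fixes is given below. It is illustrative only and is not taken from the authors' code: the fix records, the function name direct_contacts, and the assumption that coordinates are already projected to metres are all hypothetical.

```python
from itertools import combinations
from math import hypot

# Hypothetical GPS fixes: (pig_id, time_in_seconds, x_metres, y_metres).
# Coordinates are assumed to be projected onto a metric grid already.
fixes = [
    ("A", 0, 0.0, 0.0),
    ("B", 120, 3.0, 4.0),    # 5 m from A's fix, 2 minutes later
    ("C", 200, 300.0, 0.0),  # 300 m from A's fix
    ("B", 1000, 0.0, 0.0),   # same spot as A's fix, but over 5 minutes later
]


def direct_contacts(fixes, max_dist_m, max_dt_s=300):
    """Return sorted pairs of different individuals that have at least one
    pair of fixes within max_dist_m metres and max_dt_s seconds."""
    pairs = set()
    for (id1, t1, x1, y1), (id2, t2, x2, y2) in combinations(fixes, 2):
        if id1 == id2:
            continue  # an animal cannot be in contact with itself
        close_in_time = abs(t1 - t2) <= max_dt_s
        close_in_space = hypot(x1 - x2, y1 - y2) <= max_dist_m
        if close_in_time and close_in_space:
            pairs.add(tuple(sorted((id1, id2))))
    return sorted(pairs)


# Compare the three spatial thresholds named in the methods text.
for threshold in (2, 5, 350):
    print(threshold, direct_contacts(fixes, threshold))
```

      Running the same data through several thresholds, as here, shows how strongly the resulting network depends on the chosen buffer.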

      (5) Consider adding a brief discussion of ethical considerations beyond basic animal ethics approval, addressing aspects like animal welfare during collaring and potential environmental impacts.

      Feral pigs are an invasive species in Australia, and managing their population is crucial to protecting native ecosystems. The trapping and collaring of these animals have been conducted following the stringent animal welfare requirements necessary to obtain animal ethics approval in Australia. However, it is important to consider the broader ethical implications. Animal welfare during collaring is a critical aspect and involves minimising stress and physical harm to the animals. The collars used are lightweight and are properly fitted only on adults, because collaring juveniles raises welfare issues.

      (6) Add a statement about data availability/accessibility.

      The GPS data cannot be shared; however, the R codes will be deposited in GitHub (https://github.com/Tatianaproboste/Feral-Pig-Interactions) and the link has been added in the final version.

      (7) Expand on the implications of seasonal variation in contact rates for disease management strategies in the discussion.

      We have added this information in the discussion: “For example, controlling an outbreak during summer would potentially require more resources than an outbreak in other seasons due to the higher number of contact between individuals during summer.”

      Reviewer #2 (Recommendations for the authors):

      The typographical errors in the abstract to be corrected are:

      (1) Line 22: Remove the "are" before "threaten".

      This has been corrected.

      (2) Line 24: Replace the "to" before "extinction" with "into".

      This has been corrected.

      (3) Line 28: Rephrase the sentence.

      ‘Yet social dynamics are known to vary enormously from place to place, so knowledge generated for example in USA and Europe might not easily transfer to locations such as Australia.’

      (3) Line 29: Insert a "comma" after "Here".

      This has been corrected.

      (4) Lines 33 -34: Explain, clearly, the contact rates; is it between females to females or females to males?

      We have improved this phrase and now it reads: “…. with females demonstrating higher group cohesion (female-female) and males acting as crucial connectors between independent groups.”

      (5) Line 36: Make yourselves clear about what you mean by "targeting adult male".

      We believe “targeting adult males” is correct in this context.

      Reviewer #3 (Recommendations for the authors):

      (1) Lines 22 and 44: I think the "are" in "are threaten" should be removed for better clarity.

      This has been corrected.

      (2) Line 71, the source and not "force" of infection.

      The force of infection is correct here.

      (3) Line 72, population "of".

      This has been corrected.

      (4) Under statistical analysis, the software version should be included.

      R has gone through multiple versions since we started this analysis.

      (5) Terminological consistency: as far as possible try to be consistent with the terms used in the text, such as using "contact rate" instead of "interaction rate" in order not to puzzle the readers.

      We have changed most of the “interactions” to “contact” instead as suggested.

      (6) Correct Typos: Identify typos and grammatical inconsistencies of any kind, especially in those complex sentences that may be hard to follow.

      The typos have been checked.

      (7) Under the methodology, briefly describe why specific thresholds were chosen and any limitations.

      We added the following into the method: “Direct contact was defined when two individuals interacted either at 2, 5, or 350-metre buffers within a five-minute interval [36]. A previous study used 350 metres as a spatial threshold [16], while others use the approximate average body length of an individual [36]”

      (8) The discussion should be strengthened by drawing clear links between the findings and actionable management strategies.

      We have strengthened the discussion by adding more specific actionable management strategies. For example, controlling an outbreak during summer would potentially require more resources than an outbreak in other seasons due to the higher number of contacts between individuals during summer.

      (9) Did you consider additional environmental factors, such as rainfall, food availability, or habitat features, to better understand how these influence seasonal variations in pig interactions and contact rates?

      This is something that we have in mind and will explore in future research. This has been partially explored but is based on how environmental factors and seasons affect the home range (Wilson et al 2023).

      (10) Figure Legends: Add more detailed descriptions in figure legends, especially for those figures showing network metrics or contact rates.

      More information has been added to the figure legends.

      (11) The paper includes too many figures, and thus, it is recommended to simplify or merge some figures where appropriate. In particular, this is recommended for those figures that plot more network measures across thresholds. Adding clear, summarized captions with interpretation on threshold and measure significance would be a great help in interpreting complicated visualizations.

      The figure that shows the comparison between global network measures, including average local transitivity, edge density, global transitivity, mean distance and number of edges for direct and indirect networks has been moved to supplementary material (Figure S3). We also included direct and indirect model-level measures by sex as in Figure 3 and improved the captions of the figures presented in the main document.

    1. eLife Assessment

      This is an important study demonstrating that anosmia in Parkinson's disease patients is due to dysfunction in cholinergic neurons. This study provides compelling evidence, using scRNA sequencing, that cholinergic olfactory projection neurons (OPN) are consistently affected in five different fruit fly models of Parkinson's disease, exhibiting synaptic dysfunction before the onset of motor deficits. Comparisons with scRNA sequencing of patients' human brain samples reveal similar synaptic gene deregulation in cholinergic neurons of patients. This study points to the possibility that targeting cholinergic neurons could be a potential avenue for early diagnosis and intervention in PD.

    2. Reviewer #1 (Public review):

      In Pech et al. the authors take advantage of a genetic model organism to investigate the convergent impact of multiple mutations linked to Parkinson's Disease (PD). To investigate this question they leverage Drosophila genetics to create wild type and mutant alleles for five different mutations linked to PD. An additional novel focus of this work is an examination of the animals in an early phase before apparent dopaminergic degeneration. Having generated this resource, the authors apply an impressive array of experiments including behavioural assays, calcium imaging and single-cell profiling. They also cross-validate their findings in human PD brains. Strikingly, the authors discover common dysregulated genes between fly and human that converge on synaptic dysregulation. Finally, they demonstrate that even at early timepoints, there is extensive dysfunction of olfactory projection neuron calcium.

      This is a fantastic, comprehensive, timely and landmark pan-species work that demonstrates the convergence of multiple familial PD mutations onto a synaptic program. It is extremely well written and the authors have addressed all my comments in this review. I recommend this work be published as soon as possible.

    3. Reviewer #3 (Public review):

      Summary:

      This study investigates the cellular and molecular events leading to hyposmia, an early dysfunction in Parkinson's disease (PD), which develops up to 10 years prior to motor symptoms. The authors use five Drosophila knock-in models of familial PD genes (LRRK2, RAB39B, PINK1, DNAJC6 (Aux), and SYNJ1 (Synj)), three expressing human genes and two Drosophila genes with equivalent mutations.

      The authors carry out single-cell RNA sequencing of young fly brains and single-nucleus RNA sequencing of human brain samples. The authors found that cholinergic olfactory projection neurons (OPN) were consistently affected across the fly models, showing synaptic dysfunction before the onset of motor deficits, known to be associated with dopaminergic neuron (DAN) dysfunction.

      Single-cell RNA sequencing revealed significant transcriptional deregulation of synaptic genes in OPNs across all five fly PD models. This synaptic dysfunction was confirmed by impaired calcium signalling and morphological changes in synaptic OPN terminals. Furthermore, these young PD flies exhibited olfactory behavioural deficits that were rescued by selective expression of wild-type genes in OPNs.

      Single-nucleus RNA sequencing of post-mortem brain samples from PD patients with LRRK2 risk mutations revealed similar synaptic gene deregulation in cholinergic neurons, particularly in the nucleus basalis of Meynert (NBM). Gene ontology analysis highlighted enrichment for processes related to presynaptic function, protein homeostasis, RNA regulation, and mitochondrial function.

      This study provides compelling evidence for the early and primary involvement of cholinergic dysfunction in PD pathogenesis, preceding the canonical DAN degeneration. The convergence of familial PD mutations on synaptic dysfunction in cholinergic projection neurons suggests a common mechanism contributing to early non-motor symptoms like hyposmia. The authors also emphasise the potential of targeting cholinergic neurons for early diagnosis and intervention in PD.

      Strengths:

      This study presents a novel approach, combining multiple mutants to identify salient disease mechanisms. The quality of the data and analysis is of a high standard, providing compelling evidence for the role of OPN neurons in olfactory dysfunction in PD. The authors also provide evidence to show that early olfactory defects lead to later dopaminergic neuron dysfunction. The comprehensive single-cell RNA sequencing data from both flies and humans is a valuable resource for the research community. The identification of consistent impairments in cholinergic olfactory neurons, at early disease stages, is a powerful finding that highlights the convergent nature of PD progression. The comparison between fly models and human patients' brains provides strong evidence of the conservation of molecular mechanisms of disease, which can be built upon in further studies using flies to prove causal relationships between the defects described here and neurodegeneration.

      The identification of specific neurons involved in olfactory dysfunction opens up potential avenues for diagnostic and therapeutic interventions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their comments and provide answers/clarifications and new data. There were 3 important recurrent points that we address here:

      (a) The reviewers questioned whether the observed motor defects (measured by startle-induced negative geotaxis, “SING”) were a reasonable behavioral measure of DAN function.

      Previously, Riemensperger et al., 2013 (PMID: 24239353) already linked synaptic loss of the dopaminergic PAM neurons to SING impairments. Furthermore, in a separate paper that we recently posted on BioRxiv, we show that the SING defects in PD mutants are rescued when the flies are fed L-DOPA (Kaempf et al 2024; BioRxiv). In this same paper we also show a very strong correlation between SING defects and defects in dopaminergic synaptic innervation of PAM DAN onto Mushroom body neurons. Both experiments suggest that the motor defects are the result of defects in dopamine release. Altogether, these data suggest that the combination of the SING assay and a quantification of the synaptic region of PAM DAN onto Mushroom body neurons is a suitable measure for DAN function.

      (b) The reviewers asked if the OPN dysfunction in young animals is connected to dopaminergic neuron (DAN) dysfunction in later life; 

      We have conducted additional experiments and have included the results (new Figure 6): Our young PD mutants (we included Aux<sup>R927G</sup>, Synj<sup>R258Q</sup> and LRRK2<sup>G2019S</sup>) show olfactory defects, but normal DAN function (measured by assessing the TH-labeled synaptic area onto the Mushroom body neurons and by SING). Aged PD mutants show both olfactory defects and DAN dysfunction. When we express the wildtype PD gene in OPN (among other cells) of PD mutants using GH146-Gal4 (which does not drive expression in DAN), we are able to rescue the DAN defects (synaptic area and SING) that occur later in life. This indeed suggests there is a cell non-autonomous positive effect on DAN dysfunction that occurs at later stages in the life of our PD mutants (new Figure 6a).

      In a set of independent experiments, we also fed one of our mutants (LRRK2<sup>G2019S</sup>) nicotine, activating Nicotinic acetylcholine receptors (that are also activated by the release of acetylcholine from cholinergic neurons such as OPN). While nicotine does not rescue the olfactory preference defect, the OPN synapse morphology defect or the OPN-associated defects in Ca<sup>2+</sup>-imaging in LRRK2<sup>G2019S</sup> mutants (Figure 6b), it does rescue the DAN-associated defects, including SING, synapse loss and defects in Ca<sup>2+</sup>-imaging (Figure 6c).

      Finally, we generated human induced dopaminergic neurons derived from iPSC with a LRRK2<sup>G2019S</sup> mutation and incubated these neurons with nicotine. Again, this induced a rescue of a LRRK2-mutant-induced defect in neuronal activity measured by Ca<sup>2+</sup>-imaging. This is specific to nicotine since the rescue was absent when cells were also incubated with mecamylamine, a non-competitive antagonist of nicotinic acetylcholine receptors, trumping the effects of nicotine (Figure 6d-e").

      (c) The reviewers indicated that the GH146-Gal4 driver is expressed in cells other than OPN and thus noted that the defects we observe may not only be the result of OPN dysfunction.

      It is correct that GH146-dependent Gal4 expression includes OPNs (that are cholinergic) and one pair of inhibitory APL neurons (that are GABAergic) (Li et al., 2017 (PMID: 29149607), Lui et al., 2009 (PMID: 19043409)). We have adapted the text to explicitly state this. There are only 2 APL per fly brain and our single cell sequencing experiment does not have the resolution to allow us to test if these neurons had a significant number of DEG. However, as indicated above (in (b)), we are able to rescue DAN dysfunction by mimicking cholinergic output (application of nicotine). These data do not exclude that APL-neuron problems contribute to the defects we observe in our PD mutants, but they do suggest that cholinergic output is critical to maintain normal DAN function.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      This is a fantastic, comprehensive, timely, and landmark pan-species work that demonstrates the convergence of multiple familial PD mutations onto a synaptic program. It is extremely well written and I have only a few comments that do not require additional data collection. 

      Thank you for this enthusiastic endorsement.

      Major Comments:  

      (1) In the functional experiments performing calcium imaging on projection neurons I could not find a count of cell bodies across conditions. Since the loss of OPNs could explain the reduced calcium signal, this is a critical control to perform. A differential abundance test on the single-cell data would also suffice here and be easy for the authors to perform with their existing data. 

      This is indeed an important number, and we had included this in the Supplemental figure 2a.

      Also, the number of DAN and Visual projection neurons were not significantly different between the genotypes (Supplemental Figure 2a in the manuscript). 

      (2) One of the authors' conclusions is that cholinergic neurons and the olfactory system are acutely impacted by these PD mutations. However, I wonder if this is the case:  

      a. Most Drosophila excitatory neurons are cholinergic and only a subpopulation appear to be dysregulated by these mutations. The authors point out that visual neurons also have many DEGs, couldn't the visual system also be dysregulated in these flies? Is there something special about these cholinergic neurons versus other cholinergic neurons in the fly brain? I wonder if they can leverage their nice dataset to say something about vulnerability. 

      Yes, the reviewer is right, and we have changed our wording to be more specific. The reviewer also noted correctly that neurons in the visual system rank high in terms of number of DEGs, but we did not conduct elaborate experiments to assess if these visual system neurons are functional. Of note, several of our mutants show (subtle) electroretinogram defects, that are a measure of visual system integrity, but further work is needed to determine the origin of these defects. 

      The question about the nature of the underlying vulnerability pathways is interesting. In preliminary work we have selected a number of DEGs common to vulnerable cells in several PD mutants, and conducted a screen where we manipulated the expression of these DEGs and looked for rescue of the olfactory preference defects in our PD mutants. The strongest genetic interaction was with genes encoding proteins involved in proteostasis (Atg8/LC3, Lamp1 and Hsc70-4) (Reviewer Figure 3). While interesting, these results require further work to understand the underlying molecular mechanisms. We present these preliminary data here but have not included them in the main manuscript. 

      b. As far as I can tell, the cross-species analysis of DEGs (Figure 3) is agnostic to neuronal cell type, although the conclusion seems to suggest only cholinergic neurons were contrasted. Is this correct? Could you please clarify this in the text as it's an important detail. If not, have the authors tried comparing only cholinergic neuron DEGs across species? That would lend strength to their specificity argument. The results for the NBM are impressive. Could the authors add more detail to the main text here about other regions? 

      The reviewer is correct that we compiled the DEG of all affected cells, the majority of which are cholinergic neurons. 

      For the human data we focused on the NBM samples, because it contained the highest fraction of cholinergic neurons (as compared to the other 2 regions), but even so, it was not possible to analyze the cholinergic neurons alone because the fraction of cholinergic neurons in the human material was too low to be statistically analyzed independently. Note that both wildtype and PD samples contained a low number of cholinergic neurons (i.e. the DEG differences we detected were not the result of sequencing different types of cells - see also Supplemental Figure 3b and d). We have indicated this more clearly in the text.

      c. Uniquely within the human data, are cholinergic neurons more dysregulated than others? I understand this is not an early timepoint but would still be useful to discuss. 

      As indicated in the previous point, unfortunately the fraction of cholinergic neurons in the human material was low and we were not able to analyze these cells on their own. 

      Author response image 1.

      Upregulation of protein homeostasis rescues hyposmia across familial models of PD. Results of a behavioral screen for cell-specific rescue of olfactory preference defects of young PD fly models using up- and downregulation of deregulated genes in affected cell types. Genes implicated in the indicated pathways are overexpressed or knocked down using GH146-Gal4 (OPN>) and UAS-constructs (overexpression or RNAi). UAS-only (-) and OPN>UAS (+) were scored in parallel and are compared to each other. n.d.: not determined; bars represent mean ± s.e.m.; grey zone indicates the variance of controls; n≥5 independent experiments per genotype, with ~50 flies each; red bars: p<0.05 in ANOVA and Bonferroni-corrected comparison to UAS-only control.

      d. In the discussion, the authors say that olfactory neurons are uniquely poised to be dysregulated as they are large and have high activity. Is this really true compared to other circuits? I didn't find the references convincing and I am not sure this has been borne out in electron microscopy reconstructions for anatomy.  

      We agree and have toned down this statement.

      Reviewer #2 (Public Review):  

      Summary:  

      Pech et al selected 5 Parkinson's disease-causing genes, and generated multiple Drosophila lines by replacing the Drosophila lrrk, rab39, auxilin (aux), synaptojanin (synj), and Pink1 genes with wild-type and pathogenic mutant human or Drosophila cDNA sequences. First, the authors performed a panel of assays to characterize the phenotypes of the models mentioned above. Next, by using single-cell RNA-seq and comparing fly data with human postmortem tissue data, the authors identified multiple cell clusters being commonly dysregulated in these models, highlighting the olfactory projection neurons. Next, by using selective expression of Ca<sup>2+</sup>-sensor GCaMP3 in the OPN, the authors confirmed the synaptic impairment in these models, which was further strengthened by olfactory performance defects.  

      Strengths:  

      The authors overall investigated the functionality of PD-related mutations at endogenous levels and found a very interesting shared pathway through single-cell analysis; more importantly, they performed nice follow-up work using multiple assays.  

      Weaknesses:  

      While the authors state this is a new collection of five familial PD knock-in models, the Aux<sup>R927G</sup> model has been published and carefully characterized in Jacquemyn et al., 2023. ERG has been performed for Aux R927G in Jacquemyn et al., 2023, but the findings are different from what's shown in Figure 1b and Supplementary Figure 1d, which the authors should try to explain. 

      We should have explained this better: the ERG assay in Jacquemyn et al., and here, in Pech et al., are different. While the ERGs in our previous publication were recorded under normal endogenous conditions, the flies in our current study were exposed to constant light for 7 days. This is often done to accelerate the degeneration phenotype. We have now indicated this in the text (and also refer to the different experimental set up compared to Jacquemyn et al).

      Moreover, according to the authors, the hPINK1 control was the expression of human PINK1 with UAS-hPINK1 and nsyb-Gal4 due to technical obstacles. Having the PINK1 WT be an overexpression model makes it difficult to explain PINK1 mutant phenotypes. The work would be strengthened if the authors used UAS-hPINK1 and nsyb-Gal4 (or maybe ubiquitous Gal4) to rescue hPink1L347P and hPink1P399L phenotypes.

      The UAS-hPink1 was originally created by the Lu lab (Yang et al., 2003, PMID: 12670421) and has been amply used before in Pink1 loss-of-function backgrounds (e.g. in Yang et al., 2006, PMID: 16818890). In our work, the control we refer to was UAS-hPink1 expression (driven by nSyb-gal4) in a Pink1 knock-out background. For unknown reasons we were unable to replace the fly Pink1 with a human pink1 cDNA; we explained this in the methods section and added a remark in the new manuscript.

      In addition, although the authors picked these models targeting different biology/pathways, Aux and Synj both act in related steps of clathrin-mediated endocytosis, with LRRK2 being their accessory regulatory protein. Therefore, is the data set more favorable in identifying synaptic-related defects? 

      We picked these particular mutants, as they were the first we created in the context of a much larger collection of “PD flies” (see also Kaempf et al 2024, BioRxiv). We have made adaptations to the text to tone down the statement on the broad selection of mutants. 

      GH146-GAL4+ PNs are derived from three neuroblast lineages, producing both cholinergic and GABAergic inhibitory PNs (Li et al, 2017). Therefore, OPN neurons include more than "cholinergic projection neurons". How do we know from single-cell data that cholinergic neurons were more vulnerable across 5 models? 

      The reviewer is correct that GH146 drives expression in other cells than OPN and we now clearly state this in the text. We do present additional arguments that substantiate our conclusion that cholinergic neurons are affected: (1) our single cell sequencing identifies the most DEGs in cholinergic neurons. (2) nicotine (a compound activating cholinergic receptors) rescues dopamine-related problems in old PD-mutant flies. (3) Likewise, nicotine also alleviates problems we observed in LRRK2 mutant human induced dopaminergic neurons and this is blocked by mecamylamine, a non-competitive antagonist of nicotinic acetylcholine receptors.

      In Figure 1b, the authors assumed that locomotion defects were caused by dopaminergic neuron dysfunction. However, to better support it, the author should perform rescue experiments using dopaminergic neuron-specific Gal4 drivers. Otherwise, the authors may consider staining DA neurons and performing cell counting. Furthermore, the authors stated in the discussion, that "We now place cholinergic failure firmly ahead of dopaminergic system failure in flies", which feels rushed and insufficient to draw such a conclusion, especially given no experimental evidence was provided, particularly related to DA neuron dysfunction, in this manuscript. 

      Previously, Riemensperger et al., 2013 (PMID: 24239353) already linked synaptic loss of the dopaminergic PAM neurons to locomotion impairments (measured by SING). Furthermore, in a separate paper we show that the motor defects (SING) observed in PD mutants are rescued when the flies are fed L-DOPA, but not D-DOPA (Kaempf et al 2024; BioRxiv). In this same paper, we also show a significant correlation between SING defects and defects in dopaminergic synaptic innervation of PAM DAN onto Mushroom body neurons. We have referred to both articles in the revised manuscript.

      The statement on cholinergic failure ahead of dopaminergic failure was made in the context of the sequence of events: young flies did not show DAN defects, but they did display olfactory defects. The statement was indeed not meant to imply causality. However, we have now conducted new experiments where we express wild type PD genes using GH146-Gal4 (that does not express in DAN) in the PD mutants and assess dopaminergic-relevant phenotypes later in life (see also new Figure 6 in the manuscript). This shows that GH146-Gal4-specific rescue is sufficient to alleviate the DAN-dependent SING defects in old flies. Likewise, as indicated above, application of nicotine is also sufficient to rescue the DAN-associated defects (in PD mutant flies and human induced mutant dopaminergic neurons).  

      It is interesting to see that different familial PD mutations converge onto synapses. The authors have suggested that different mechanisms may be involved directly through regulating synaptic functions, or indirectly through mitochondria or transport. It will be improved if the authors extend their analysis on Figure 3, and better utilize their single-cell data to dissect the mechanisms. For example, for all the candidates listed in Figure 3C, are they all altered in the same direction across 5 models?  

      This is indeed the case: the criteria for "commonly deregulated" included that the DEGs are changed in the same direction across several mutants. We ranked genes according to their mean gene expression across the mutants as compared to the wildtype control: i.e. only if the DEGs are all up- or all down-regulated do they end up at the top or bottom of our list. We added a remark in the revised manuscript. In preliminary work we also selected a number of the DEGs and conducted a screen where we manipulated the expression of these genes looking for rescue of the olfactory preference defects in our PD mutants. The strongest genetic interaction was with genes encoding proteins involved in proteostasis (Atg8/LC3, Lamp1 and Hsc70-4; and we also show a genetic interaction between EndoA and Lrrk in this work and in Matta et al., 2012) (Author response image 1 above). While interesting, these results require further work to understand the underlying molecular mechanisms. We present these preliminary data here, but have not included them in the main manuscript.
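
      As an illustration only (this is not the analysis code used for the manuscript), the ranking logic described above can be sketched as follows. The gene names, the fold-change values, and the helper names `rank_by_mean_change` and `same_direction` are hypothetical; the sketch assumes one log2 fold change per gene and mutant, each relative to the wildtype control.

```python
from statistics import mean

# Hypothetical log2 fold changes (mutant vs. wildtype control), one value per
# mutant. Gene names and numbers are invented for illustration.
log2fc = {
    "geneA": [0.9, 1.1, 0.8, 1.0, 1.2],       # up in all five mutants
    "geneB": [-1.0, -0.7, -0.9, -1.1, -0.8],  # down in all five mutants
    "geneC": [1.0, -1.0, 0.9, -0.9, 0.0],     # mixed direction, averages out
}


def rank_by_mean_change(table):
    """Sort gene names from most up- to most down-regulated on average."""
    return sorted(table, key=lambda gene: mean(table[gene]), reverse=True)


def same_direction(values):
    """True if every value is positive or every value is negative."""
    return all(v > 0 for v in values) or all(v < 0 for v in values)


ranking = rank_by_mean_change(log2fc)
consistent = {gene: same_direction(values) for gene, values in log2fc.items()}
```

      In this toy table, the consistently changed genes sit at the two ends of the ranking, while the gene with mixed directions falls in the middle because its changes cancel in the mean.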

      While this approach is carefully performed, the authors should state in the discussions the strengths and the caveats of the current strategy. For example, what kind of knowledge have we gained by introducing these mutations at an endogenous locus? Are there any caveats of having scRNAseq at day 5 only but being compared with postmortem human disease tissue?  

      We have included a “strengths and caveats section” in the discussion addressing these points.

      Reviewer #3 (Public Review):  

      Summary:  

      This study investigates the cellular and molecular events leading to hyposmia, an early dysfunction in Parkinson's disease (PD), which develops up to 10 years prior to motor symptoms. The authors use five Drosophila knock-in models of familial PD genes (LRRK2, RAB39B, PINK1, DNAJC6 (Aux), and SYNJ1 (Synj)), three expressing human genes and two Drosophila genes with equivalent mutations.  

      The authors carry out single-cell RNA sequencing of young fly brains and single-nucleus RNA sequencing of human brain samples. The authors found that cholinergic olfactory projection neurons (OPN) were consistently affected across the fly models, showing synaptic dysfunction before the onset of motor deficits, known to be associated with dopaminergic neuron (DAN) dysfunction.

      Single-cell RNA sequencing revealed significant transcriptional deregulation of synaptic genes in OPNs across all five fly PD models. This synaptic dysfunction was confirmed by impaired calcium signalling and morphological changes in synaptic OPN terminals. Furthermore, these young PD flies exhibited olfactory behavioural deficits that were rescued by selective expression of wild-type genes in OPNs.  

      Single-nucleus RNA sequencing of post-mortem brain samples from PD patients with LRRK2 risk mutations revealed similar synaptic gene deregulation in cholinergic neurons, particularly in the nucleus basalis of Meynert (NBM). Gene ontology analysis highlighted enrichment for processes related to presynaptic function, protein homeostasis, RNA regulation, and mitochondrial function.  

      This study provides compelling evidence for the early and primary involvement of cholinergic dysfunction in PD pathogenesis, preceding the canonical DAN degeneration. The convergence of familial PD mutations on synaptic dysfunction in cholinergic projection neurons suggests a common mechanism contributing to early non-motor symptoms like hyposmia. The authors also emphasise the potential of targeting cholinergic neurons for early diagnosis and intervention in PD.  

      Strengths:  

      This study presents a novel approach, combining multiple mutants to identify salient disease mechanisms. The quality of the data and analysis is of a high standard, providing compelling evidence for the role of OPN neurons in olfactory dysfunction in PD. The comprehensive single-cell RNA sequencing data from both flies and humans is a valuable resource for the research community. The identification of consistent impairments in cholinergic olfactory neurons, at early disease stages, is a powerful finding that highlights the convergent nature of PD progression. The comparison between fly models and human patients' brains provides strong evidence of the conservation of molecular mechanisms of disease, which can be built upon in further studies using flies to prove causal relationships between the defects described here and neurodegeneration.  

      The identification of specific neurons involved in olfactory dysfunction opens up potential avenues for diagnostic and therapeutic interventions.  

      Weaknesses:  

      The causal relationship between early olfactory dysfunction and later motor symptoms in PD remains unclear. It is also uncertain whether this early defect contributes to neurodegeneration or is simply a reflection of the sensitivity of olfactory neurons to cellular impairments. The study does not investigate whether the observed early olfactory impairment in flies leads to later DAN deficits. Additionally, the single-cell RNA sequencing analysis reveals several affected neuronal populations that are not further explored. The main weakness of the paper is the lack of conclusive evidence linking early olfactory dysfunction to later disease progression.

      We agree that this is an interesting avenue to pursue and, as indicated above in Figure 6 and in the reworked manuscript, we have now included data that strengthen the connection between early OPN defects and the later DAN-dependent problems. Additional future work will be needed to elucidate the mechanisms of this cell-non-autonomous effect.

      The rationale behind the selection of specific mutants and neuronal populations for further analysis could be better qualified. 

      We have added further explanation in the reworked text.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      Minor Comments:  

      (1) Questions about the sequencing methods and analysis approaches. From reading the methods and main text, I was confused about aspects of the Drosophila single-cell profiling. Firstly, did the authors multiplex their fly samples? 

      No, we did not. Genotypes were separately prepared and sequenced, but they were all processed in parallel to avoid batch effects. 

      Secondly, it seems like there are two rounds of dataset integration performed, Harmony and Seurat's CCA-based method. This seems unorthodox. Could the authors comment on why they perform two integrations? 

      Thanks for pointing this out. This was a mistake in the methods section (copied from a much older version of the manuscript). In this manuscript, we only used Harmony for dataset integration, and we have removed the methods on Seurat-CCA.

      Finally, for all dataset integrations please state in the main text how datasets were integrated (by age, genotype, etc). 

      Datasets were integrated by sample id, corresponding to individual libraries.

      (2) The authors focus on OPNs with a really nice set of experiments. I noticed however that Kenyon cells were also dysregulated. What about Olfactory sensory neurons? Could the authors provide comments on this? 

      Olfactory sensory neurons are located in the antennae of the fly and were not captured by our analysis. However, the GH146-Gal4-specific rescue experiments indicate these sensory neurons are likely not severely functionally impaired. Kenyon cells are an interesting affected cell type to look at in future experiments, as they are directly connected to DANs.

      (3) There are several citations of Jenett et al 2012 that seem wrong (related to single-cell datasets).

      We are sorry for this and have corrected this in the text.  

      Reviewer #2 (Recommendations For The Authors):  

      (1) In the key resources table, a line called CG5010k.o. (chchd2k.o.) was mentioned, but was not used in the paper. The authors should remove it. 

      Sorry, this was from a previous older version of the manuscript. We fixed this.

      (2) Why did the authors use human CDS for LRRK2, Rab39B, and PINK1, but fly CDS for Aux and Synj1? Is it based on the conservation of amino acid residues? Although the authors cited a review (Kalia & Lang, 2015) to justify the selection of the mutations, for the interest of a broad audience, it is recommended that the authors expand their introduction for the rationale of their selection, including the pathogenicity of each selected mutation, original human genetics evidence, conservation between fly and human. 

      (a) We used Drosophila cDNA for rescue experiments with aux and synj since knock-in of the human homologues at the locus of these genes did not rescue its loss-of-function (lethality).

      (b) We expanded the introduction to provide further explanation on the selection of the mutants we analyzed in this work. We picked these particular mutants, as they were the first we created in the context of a much larger collection of “PD flies” (see also Kaempf et al 2024, BioRxiv). We have made adaptations to the text to tone down the statement on the broad selection of mutants.

      (3) Supplemental Figure 1a, is mRNA level normalized to an internal control? If not, it is not appropriate to compare the results directly from two primer sets, since each primer set may have different amplification efficiency. 

      We are sorry for the lack of information. Indeed, mRNA levels were determined using the Δ-Δ-CT method, where Ct values were first normalized to the housekeeping gene Rp49, and next expressed as a percent of endogenous Drosophila gene expression. We expanded the methods section and now also list the primers for Rp49 along with the other qPCR primers in Supplemental File 1.
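
      For readers less familiar with this normalization, a minimal sketch of the standard Δ-Δ-CT calculation is given below. It is illustrative rather than the exact procedure used for Supplemental Figure 1a: the function name `percent_of_calibrator` and the Ct values are hypothetical, and the sketch assumes that every primer set amplifies with roughly 100% efficiency (one doubling per cycle), which is the assumption the reviewer's question touches on.

```python
def percent_of_calibrator(ct_target_sample, ct_ref_sample,
                          ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the delta-delta-Ct method, as a percentage.

    Assumes ~100% amplification efficiency for every primer set, so that one
    cycle of difference corresponds to a two-fold difference in template.
    """
    # Step 1: normalize each target Ct to the reference (housekeeping) gene.
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Step 2: compare the sample to the calibrator condition.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    # Step 3: convert the cycle difference to a fold change, then to percent.
    return 100.0 * 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold one cycle later in the
# sample than in the calibrator, with identical reference-gene Ct values.
example_percent = percent_of_calibrator(22.0, 18.0, 21.0, 18.0)
```

      Here the reference gene would be Rp49 and the calibrator the endogenous Drosophila gene, as stated above; a one-cycle delay in the sample corresponds to half the calibrator level.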

      (4) For Figure 2, it may be helpful to have a supplemental table or figure showcasing the clusters with significant changes (based on cell number-adjusted DEGs) for each model, i.e., what are those black cell clusters in Figure 2? "Thus, cellular identity and cellular composition are preserved in young PD fly models." In Figure S2A, the authors only show cell composition percentages for 3 cell clusters, are the bars 95% standard error? 

      The error bars in Supplemental Figure 2a represent the 95 % CI. We have included a new supplemental table with the number of cells per cell cluster for each mutant (Supplemental File 3).

      What about the remaining 183 cell clusters? Are there any KI-model cell clusters that are statistically different than controls? What about the annotated cell types (e.g., the 81 with cell identities)? Please consider at least providing or pointing to a table to state how many have significant differences, or if there are truly none. 

      As mentioned above, we have included a new supplemental table with the number of cells per cell cluster for each mutant (Supplemental File 3).

      (5) What are the rows in the sunburst plot in Figure 3a? Please be more descriptive in the figure legend or label the figure. 

      We have expanded on this in the figure legend and now also include a summary of the SynGO analysis in Supplemental File 7. In Figure 3a, a summary sunburst plot is presented, reflecting the GO terms (inner rings, indicated in a) with their subdivided levels (the complete list is provided in Supplemental File 7). In Figure 3a’ and a” the DEG data acquired from the different datasets (human vs fly) are applied to the sunburst plot where rings are color-coded according to enrichment Q-value.

      (6) In Table S4, which clusters (in the table) have normalized residuals that are outside of the 95% confidence interval of the regression model displayed in Figure S2e? They use this analysis to adjust for cell number bias and point out the "most significant cell clusters" affected in each model. This may be helpful for readers who want to grab a full list of responsive clusters. 

      We have included this information in Supplemental File 5 (Tab “Cell types outside of CIs”) in the supplemental data of the manuscript.
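
      The idea of flagging clusters by their residuals can be sketched as follows. This is a simplified illustration under stated assumptions, not the model shown in Figure S2e: it assumes an ordinary least-squares line relating DEG count to cell number and flags clusters whose standardized residual exceeds ±1.96. The function name `flag_outlier_clusters`, the cluster names, and all counts are hypothetical.

```python
from statistics import mean, stdev


def flag_outlier_clusters(cell_counts, deg_counts, z_cutoff=1.96):
    """Fit deg_count ~ cell_count by least squares and return the clusters
    whose standardized residual exceeds z_cutoff in absolute value."""
    names = sorted(cell_counts)
    x = [cell_counts[name] for name in names]
    y = [deg_counts[name] for name in names]
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    spread = stdev(residuals)
    if spread == 0:  # every cluster lies exactly on the fitted line
        return {}
    return {name: r / spread
            for name, r in zip(names, residuals)
            if abs(r / spread) > z_cutoff}


# Hypothetical data: DEG counts scale with cell number, except for one
# cluster that has far more DEGs than its size alone would predict.
cell_counts = {f"cluster_{i:02d}": 100 * i for i in range(1, 11)}
deg_counts = {name: n // 10 for name, n in cell_counts.items()}
deg_counts["cluster_05"] += 50

flagged = flag_outlier_clusters(cell_counts, deg_counts)
```

      With these invented counts only the cluster with the excess DEGs is returned; clusters that merely have many DEGs because they contain many cells are not flagged.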

      (7) The human samples used all have different LRRK2 variants: for the cross-species comparisons, do Lrrk flies have greater similarity to the human PD cases compared to the other fly models?

      No, comparing the vulnerable gene signatures from each of the fly mutants to the DEGs from the human samples does not show any greater similarity between the LRRK mutants compared to the other mutants.

      Reviewer #3 (Recommendations For The Authors):  

      Clarifications required:  

      Some of the mutations used are not common PD-associated genes, the authors should explain the rationale behind using these particular mutants, and not using well-established fly models of PD (like for example GBA flies) or SNCA overexpression.

      We opted to use knock-ins of mutations that are causal to Parkinsonism. Given flies do not express an alpha-synuclein homologue we were not able to add this ‘as such’ to our collection. Future work can indeed also include expression models or risk factor models (like GBA). As also requested by another reviewer, we did add further rationale and explanation to the genes we chose to analyze in this work.

      Why starvation rather than lifespan for PD models? For the lifespan data shown there are no error bars, if the stats test is a log-rank or Cox proportional hazards (usually used in survival analysis, this should be stated), it would also be good to have the survival plots for all the survival during starvation, not just PINK1. 

      While starvation assays can provide valuable insights into acute metabolic and physiological stress responses, we acknowledge that lifespan is a critical parameter and would provide a more comprehensive understanding of the PD models in our study. Based on this consideration and the reviewer’s feedback we have removed the starvation data from the manuscript. Unfortunately, we did not perform lifespan experiments, which is why these data were not included in the manuscript. However, based on our observations (though not detailed analysis), all genotypes tested—except for the PINK1 mutants—appeared to have a normal lifespan. For PINK1 mutants, most flies died by 25 days of age. Therefore, we conducted our assays using 15-day-old PINK1 mutant flies.

      Do the fly models used have different lifespans, and how close to death was the SING assay performed? Different mutations show different effects, most phenotypes are really mild (hRab39BG192R has no phenotype), and PINK1 has the strongest, are these simply reflections of how strong the model is?  

      The ages of flies we analyzed are indicated in the legend. As mentioned before, all mutants except PINK1 had a normal life span: i.e. we did not detect an abnormally low number of flies or premature death at 50 days of age, except for the PINK1 mutants tested in this manuscript, where most flies died by 25 days of age. Therefore, we conducted our assays using 15-day-old PINK1 mutant flies.

      Rab39G192R has no phenotype in the tests presented, suggesting no degeneration, why use RabG192R for scRNA seq? Seems an odd choice, the authors should explain. 

      Single-cell sequencing was initiated before the full phenotypic characterization of all mutants was completed. Although basic characterization of the Rab39<sup>G192R</sup> mutant PD flies revealed either no significant phenotypes or only mild effects in the assays performed (Figure 1), the sequencing data provided additional insights into potential cellular and molecular alterations. Furthermore, all PD-mutant knock-ins, including Rab39<sup>G192R</sup> mutant PD flies, show dysfunctional synaptic terminals of their OPN neurons as they had significantly weaker Ca<sup>2+</sup>-responses, even though their synaptic area was increased (Figure 4 g-h). Furthermore, all mutants also had olfactory behavior defects (Figure 5 a). 

      When the authors state that “For example, in the NBM, an area associated with PD (Arendt et al., 1983), 20% of the DEG that has an orthologous gene in the fly are also found among the most deregulated genes across PD fly models" a test should be performed to confirm this is a significant overlap (such as a hypergeometric test). 

      We have performed this test. Of the 2486 significantly differential human genes, 1149 have a fly orthologue, and of these, 28.46 % overlap with the deregulated fly genes (top and bottom 5 % of genes, as shown in Supplemental Table 7). Performing a hypergeometric test confirms that this overlap is significant, with a p-value of 9.06 × 10<sup>-76</sup>. We have included this in the text.
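
      For illustration, the upper tail of a hypergeometric distribution can be computed exactly as sketched below. This is not the code used for the reported p-value. Only the 1149 orthologous genes and the 28.46 % overlap (about 327 genes) come from the numbers above; the background size of 8000 genes and the 800 deregulated fly genes are hypothetical placeholders, so the resulting value is not expected to reproduce the reported one.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(overlap, population, successes, draws):
    """P(X >= overlap) when `draws` genes are drawn without replacement from
    `population` genes, of which `successes` belong to the set of interest."""
    lowest = max(overlap, 0)
    highest = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(lowest, highest + 1)
    )
    # Exact integer arithmetic avoids underflow until the final conversion.
    return float(Fraction(favourable, comb(population, draws)))


# Small check that can be verified by hand: 10 genes, 4 in the set, 3 drawn.
# P(X >= 2) = (C(4,2)*C(6,1) + C(4,3)*C(6,0)) / C(10,3) = 40/120.
p_small = hypergeom_upper_tail(2, 10, 4, 3)

# Hypothetical background (8000) and fly gene set size (800); 1149 and 327
# follow from the numbers given in the response.
p_example = hypergeom_upper_tail(327, 8000, 800, 1149)
```

      The choice of background (here, all genes with an orthologue in both species) strongly affects the result, so it should be stated alongside the p-value.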

      The authors speak of deregulation when speaking of the overlap between human and fly DE genes, but do the over-expressed genes in flies overlap with overexpressed genes in humans, or is the direction of transcription deregulation not concordant? If it is mostly not concordant, can the authors please comment as to why they might think that is the case? 

      In our fly experiments, we identified DEGs in affected cell types and then defined common DEGs by looking at the average change across the fly mutants. Genes that show a consistent change (all or mostly up, or all or mostly down) in the different mutants will end up at the top of our list, while genes that are up in some mutants and downregulated in others will average out and not end up in our commonly deregulated gene list. For comparison to the human data, we only looked for the presence of the human homologue, but did not assess if the change occurred in the same direction. More work will be needed to define the most relevant changes, but in a mini-screen we did select a number of DEGs present in fly and human datasets from different functional categories and tested if they genetically interact with our PD mutants. As shown in Reviewer Figure 3, we find that modulating proteostasis pathway-encoding genes rescues the olfactory preference defect across many PD mutants.

      Can the authors explain why only the NBM region was used for comparison with the fly data?

      We used the NBM because this region has the highest number of cholinergic neurons, allowing us to compare the deregulation in those neurons to the deregulation in the cholinergic OPN of mutant PD flies.

      In Figure 4, can the genotypes please be stated in full and why is the hPINK1 fly giving no detectable signal? 

      Despite several attempts, we failed to knock-in wild type hPink1 in the fly pink1 locus. Therefore, the hPink1 control used throughout the manuscript was the nSybGal4>UAS-hPink1 in Pink1 knock-out background, except for Figure 4. Particularly, for experiments in this figure, we could not use UAS-hPink1 with nSyb-Gal4, since we needed OPN-specific expression of Gal4 to drive UAS-GCamP expression.

      Therefore, this was labeled as “not determined” (“n.d.”), as indicated in the figure and the legend. We explained this better in the methods section, added a remark in the new manuscript and expanded the legend of Figure 4.

      The paper states that "These findings imply that factors affecting the function of cholinergic neurons might, by the absence of insufficient innervation, lead to DAN problems and degeneration, warranting further exploration of the underlying molecular mechanisms", this should be less strong, the paper never looks at DAN, only at OPN neurons. Fly neurons are mostly cholinergic, and human neurons are mostly glutamatergic, so jumping from one system to the other might not be as straightforward, the authors should comment on this.

      We have now included a new, exciting experiment where we assessed DAN function in aged PD mutants in which the wildtype gene was expressed in OPN using GH146-Gal4. We find this manipulation rescued DAN defects (measured by SING) in older flies. We further corroborated our observation by “replacing” cholinergic innervation with nicotine feeding in PD mutants. This also rescues the SING defect as well as the defects in neuronal activity in PAM DAN (based on live synaptic calcium imaging). Finally, we also show that incubating LRRK2<sup>G2019S</sup> mutant human induced dopaminergic neurons with nicotine is sufficient to rescue functional defects in these neurons (measured using calcium imaging). We included these data in the new manuscript and show them also in Figure 6 above (new Figure 6 in the revised manuscript).

      Experiments that would improve the manuscript:  

      Does rescue of OPN function also rescue later progressive symptoms (geotaxis response)?  

      It does, as indicated in the previous point and shown in Figure 6.

      Do the fly PD models used show DAN degeneration? This could be assessed by stains with anti-TH stains. 

      We quantified DAN cell bodies using anti-TH, but see very little or no loss. There is, however, loss of synaptic innervation of the PAM onto the mushroom bodies. We included the data in a new Figure 6 (see also Figure 6). Furthermore, we have quantified this across the genetic space of familial Parkinsonism in Kaempf et al., 2024, BioRxiv. Note that this phenotype is also rescued by expressing wildtype CDS in their OPN using GH146-Gal4.

      Minor issues: 

      The final sentence on page 5 is repetitive with the introduction. 

      Indeed, we removed the redundant sentence.

      First line of the new section on page 6, the authors probably mean cholinergic olfactory projection neurons, not just cholinergic neurons. 

      Yes, and corrected.

      At the top of page 7 the authors state: "Additionally, we also found enrichment of genes involved in RNA regulation and mitochondrial function that are also important for the functioning of synaptic terminals", where is the data showing this? The authors should point to the supplemental file showing this.  

      We now included a reference to Supplemental File 7 that includes a summary of those data. Additionally, we also included references to back this claim.

      Just before the discussion, Rab39BG193R should be Rab39BG192R.  

      Sorry for this, it is now corrected.

      Stating "fifth row" in Fig 5c and d is confusing, can the figure be labelled more clearly?  

      We modified the figure (including extra marks and colors) and expanded the legend and the main text to differentiate better between expression of the rescues in OPN versus T1 neurons revealing that only expression in OPN neurons rescues the olfactory defects while expression in T1 neurons does not.

      In the methods, the authors describe clustering done both in Scanpy and Seurat, why were both run? Which clustering was used for further analysis?

      We only used Scanpy with Harmony and removed the methods on Seurat-CCA. Thanks for pointing this out, this was a mistake in the methods section (copied from a previous version of the manuscript).

    1. eLife Assessment

      This study provides important findings on the nature of eye movement choices by human subjects. The study uses a novel approach and provides relatively clear and convincing results of the relationship between pupil size and saccade production. The results should be of interest to a broad audience interested in sensorimotor integration and sensory-guided decision-making.

    2. Reviewer #3 (Public review):

      Summary:

      This manuscript extends previous research by this group by relating variation in pupil size to the endpoints of saccades produced by human participants under various conditions including trial-based choices between pairs of spots and search for small items in natural scenes. Based on the premise that pupil size is a reliable proxy of "effort", the authors conclude that less costly saccade targets are preferred. Finding that this preference was influenced by the performance of a non-visual, attention-demanding task, the authors conclude that a common source of effort animates gaze behavior and other cognitive tasks.

      Strengths:

      Strengths of the manuscript include the novelty of the approach, the clarity of the findings, and the community interest in the problem.

      Weaknesses:

      Enthusiasm for this manuscript is reduced by the following weaknesses:

      (1) A relationship between pupil size and saccade production seems clear based on the authors' previous and current work. What is at issue is the interpretation. The authors test one, preferred hypothesis, and the narrative of the manuscript treats the hypothesis that pupil size is a proxy of effort as beyond dispute or question. The stated elements of their argument seem to go like this:
      PROPOSITION 1: Pupil size varies systematically across task conditions, being larger when tasks are more demanding.
      PROPOSITION 2: Pupil size is related to the locus coeruleus.
      PROPOSITION 3: The locus coeruleus NE system modulates neural activity and interactions.
      CONCLUSION: Therefore, pupil size indexes the resource demand or "effort" associated with task conditions.
      How the conclusion follows from the propositions is not self-evident. Proposition 3, in particular, fails to establish the link that is supposed to lead to the conclusion.

      (2) The authors test one, preferred hypothesis and do not consider plausible alternatives. Is "cost" the only conceivable hypothesis? The hypothesis is framed in very narrow terms. For example, the cholinergic and dopamine systems that have been featured in other researchers' consideration of pupil size modulation are missing here. Thus, because the authors do not rule out plausible alternative hypotheses, the logical structure of this manuscript can be criticized as committing the fallacy of affirming the consequent.

      (3) The authors cite particular publications in support of the claim that saccade selection is influenced by an assessment of effort. Given the extensive work by others on this general topic, the skeptic could regard the theoretical perspective of this manuscript as too impoverished. Their work may be enhanced by consideration of other work on this general topic, e.g, (i) Shenhav A, Botvinick MM, Cohen JD. (2013) The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron. 2013 Jul 24;79(2):217-40. (ii) Müller T, Husain M, Apps MAJ. (2022) Preferences for seeking effort or reward information bias the willingness to work. Sci Rep. 2022 Nov 14;12(1):19486. (iii) Bustamante LA, Oshinowo T, Lee JR, Tong E, Burton AR, Shenhav A, Cohen JD, Daw ND. (2023) Effort Foraging Task reveals a positive correlation between individual differences in the cost of cognitive and physical effort in humans. Proc Natl Acad Sci U S A. 2023 Dec 12;120(50):e2221510120.

      (4) What is the source of cost in saccade production? What is the currency of that cost? The authors state (page 13), "... oblique saccades require more complex oculomotor programs than horizontal eye movements because more neuronal populations in the superior colliculus (SC) and frontal eye fields (FEF) [76-79], and more muscles are necessary to plan and execute the saccade [76, 80, 81]." This statement raises questions and concerns. First, the basis of the claim that more neurons in FEF and SC are needed for oblique versus cardinal saccades is not established in any of the publications cited. Second, the authors may be referring to the fact that oblique saccades require coordination between pontine and midbrain circuits. This must be clarified. Third, the cost is unlikely to originate in extraocular muscle fatigue because the muscle fibers are so different from skeletal muscles, being fundamentally less fatigable. Fourth, if net muscle contraction is the cost, then why are upward saccades, which require the eyelid, not more expensive than downward? Thus, just how some saccades are more effortful than others is not clear.

      (5) The authors do not consider observations about variation in pupil size that seem to be incompatible with the preferred hypothesis. For example, at least two studies have described systematically larger pupil dilation associated with faster relative to accurate performance in manual and saccade tasks (e.g., Naber M, Murphy P. Pupillometric investigation into the speed-accuracy trade-off in a visuo-motor aiming task. Psychophysiology. 2020 Mar;57(3):e13499; Reppert TR, Heitz RP, Schall JD. Neural mechanisms for executive control of speed-accuracy trade-off. Cell Rep. 2023 Nov 28;42(11):113422). Is the fast relative to the accurate option necessarily more costly?

      (6) The authors draw conclusions based on trends across participants, but they should be more transparent about variation that contradicts these trends. In Figures 3 and 4 we see many participants producing behavior unlike most others. Who are they? Why do they look so different? Is it just noise, or do different participants adopt different policies?

      Comments on revisions:

      The authors have addressed the concerns and questions raised in the original review.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Vision is a highly active process. Humans move their eyes 3-4 times per second to sample information with high visual acuity from our environment, and where eye movements are directed is critical to our understanding of active vision. Here, the authors propose that the cost of making a saccade contributes critically to saccade selection (i.e., whether and where to move the eyes). The authors build on their own recent work that the effort (as measured by pupil size) that comes with planning and generating an eye movement varies with saccade direction. To do this, the authors first measured pupil size for different saccade directions for each participant. They then correlated the variations in pupil size obtained in the mapping task with the saccade decision in a free-choice task. The authors observed a striking correlation: pupil size in the mapping task predicted the decision of where to move the eyes in the free choice task. In this study, the authors provide a number of additional insightful analyses (e.g., based on saccade curvature, and saccade latency) and experiments that further support their claim that the decision to move the eyes is influenced by the effort to move the eyes in a particular direction. One experiment showed that the same influence of assumed saccade costs on saccade selection is observed during visual search in natural scenes. Moreover, increasing the cognitive load by adding an auditory counting task reduced the number of saccades, and in particular reduced the costly saccades. In sum, these experiments form a nice package that convincingly establishes the association between pupil size and saccade selection.

      We thank the reviewer for highlighting the novelty and cogency of our findings.

      In my opinion, the causal structure underlying the observed results is not so clear. While the relationship between pupil size and saccade selection is compelling, it is not clear that saccade-related effort (i.e., the cost of a saccade) really drives saccade selection. Given the correlational nature of this relationship, there are other alternatives that could explain the finding. For example, saccade latency and the variance in landing positions also vary across saccade directions. This could be interpreted, for instance, as variation in oculomotor noise across saccade directions, which the oculomotor system may seek to minimize in a free-choice task. In fact, given such a correlational result, many other alternative mechanisms are possible. While I think the authors' approach of systematically exploring what we can learn about saccade selection using pupil size is interesting, it would be important to know what exactly pupil size can add that was not previously known by simply analyzing saccade latency. For example, saccade latency anisotropies across saccade directions are well known, and the authors also show here that saccade costs are related to saccade latency. An important question would be to compare how pupil size and saccade latency uniquely contribute to saccade selection. That is, the authors could apply the exact same logic to their analysis by first determining how saccade latencies (or variations in saccade landing positions; see Greenwood et al., 2017 PNAS) vary across saccade directions and how this saccade latency map explains saccade selection in subsequent tasks. Is it more advantageous to use one or the other saccade metric, and how well does a saccade latency map correlate with a pupil size map?

      We thank the reviewer for the detailed comment. 1) The reviewer first points out the correlational nature of many of our results. 2) The reviewer then asks whether saccade latencies and landing precision also predict saccade selection, and whether these potential predictors could be considered alternative explanations to the idea of effort driving saccade selection. Moreover, what can pupil size add to what can be learned from saccade latency?

      In brief, although we report a combination of correlational and causal findings, we do not know of a more parsimonious explanation for our findings than “effort drives saccade selection”. Moreover, we demonstrate that oculomotor noise cannot be construed as an alternative explanation for our findings.

      (1) Correlational nature of many findings.

      We acknowledge that many of our findings are predominantly correlational in nature. In our first tasks, we correlated pupil size during saccade planning to saccade preferences in a subsequent task. Although the link across tasks was correlational, the observed relationship clearly followed our previously specified directed hypothesis. Moreover, experiments 1 and 2 of the visual search data replicated and extended this relationship. We also directly manipulated cognitive demand in the second visual search experiment. In line with the hypothesis that effort affects saccade selection, participants executed fewer saccades overall when performing a (primary) auditory dual task, and cut the costly saccades most, which constitutes causal evidence for our hypothesis. A minimal oculomotor noise account would not directly predict a reduction in saccade rate under higher cognitive demand. To summarize, we have a combination of correlational and causal findings, although mediators cannot be ruled out fully for the latter. That said, we do not know of a more fitting and parsimonious explanation for our findings than effort predicting saccade selection (see following points for saccade latencies). We now address causality in the discussion for transparency and point more explicitly to the second visual search experiment for causal evidence.

      “We report a combination of correlational and causal findings. Despite the correlational nature of some of our results, they consistently support the hypothesis that saccade costs predict saccade selection [which we predicted previously, 33]. Causal evidence was provided by the dual-task experiment, as saccade frequencies, and especially costly saccades, were reduced under additional cognitive demand. Only a cost account predicts 1) a link between pupil size and saccade preferences, 2) a cardinal saccade bias, 3) reduced saccade frequency under additional cognitive demand, and 4) disproportional cutting of especially those directions associated with more pupil dilation. Together, our findings converge upon the conclusion that effort drives saccade selection.”

      (2) Do anisotropies in saccade latencies constitute an alternative explanation?

      First of all, we would like to stress that differences in saccade latencies are indeed thought to reflect oculomotor effort (Shadmehr et al., 2019; TINS). For example, saccades with larger amplitudes and saccades where distractors need to be ignored are associated with longer latencies. Therefore, even if saccade latencies were to predict saccade selection, this would not contradict the idea that effort drives saccade selection. Instead, this would provide convergent evidence for our main novel conclusion: effort drives saccade selection. There are several reasons why pupil size can be used as a more general marker of effort (see responses to R2), but ultimately, our conclusions do not hinge on the employed measure of effort per se. As stressed above in 1), we see no equally parsimonious explanation besides the cost account. Moreover, we predicted this relationship in our previous publication before running the currently reported experiments and analyses (Koevoet et al., 2023). That said, we are open to discussing further alternatives and look forward to testing these accounts against each other in future work; we welcome suggestions from the reviewers and from readers.

      We now discuss this in the manuscript as follows:

      “We here measured cost as the degree of effort-linked pupil dilation. In addition to pupil size, other markers may also indicate saccade costs. For example, saccade latency has been proposed to index oculomotor effort [100], whereby saccades with longer latencies are associated with more oculomotor effort. This makes saccade latency a possible complementary marker of saccade costs (also see Supplementary Materials). Although relatively sluggish, pupil size is a valuable measure of attentional costs for (at least) two reasons. First, pupil size is highly established as a marker of effort, and is sensitive to effort more broadly than only in the context of saccades [36–45, 48]. Pupil size therefore allows capturing not only the costs of saccades, but also of covert attentional shifts [33], or shifts with other effectors such as head or arm movements [54, 101]. Second, as we have demonstrated, pupil size can measure saccade costs even when searching in natural scenes (Figure 4). During natural viewing, it is difficult to disentangle fixation duration from saccade latencies, complicating the use of saccade latency as a measure of saccade cost.

      Together, pupil size, saccade latency, and potential other markers of saccade cost could fulfill complementary roles in studying the role of cost in saccade selection.”

      Second, we followed the reviewer’s recommendation in testing whether other oculomotor metrics would predict saccade selection. To this end, we conducted a linear regression across directions. We calculated pupil size, saccade latency, landing precision and peak velocity maps from the saccade planning task. We then used AIC-based backward model selection to determine which factors would best predict saccade selection. The best model included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). Pupil size (b = -42.853, t = 4.791, p < .001) and saccade latency (b = -.377, t = 2.106, p = .043; see Author response image 1) predicted saccade preferences significantly. In contrast, landing precision did not reach significance (b = 23.631, t = 1.675, p = .104). This analysis shows that although saccade latency also predicts saccade preferences, pupil size remains a robust predictor of saccade selection. These findings demonstrate that minimizing oculomotor noise cannot fully explain the pattern of results.

      Author response image 1.

      The relationship between saccade latency (from the saccade planning task) and saccade preferences averaged across participants. Individual points reflect directions and shading represents bootstrapped 95% confidence intervals.

      We have added this argument into the manuscript, and discuss the analysis in the discussion. Details of the analysis have been added to the Supporting Information for transparency and further detail.

      “A control analysis ruled out that the correlation between pupil size and saccade preferences was driven by other oculomotor metrics such as saccade latency and landing precision (see Supporting Information).”

      “To ascertain whether pupil size or other oculomotor metrics predict saccade preferences, we conducted a multiple regression analysis. We calculated average pupil size, saccade latency, landing precision and peak velocity maps across all 36 directions. The model, determined using AIC-based backward selection, included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). The analysis revealed that pupil size (β = -42.853, t = 4.791, p < .001) and saccade latency (β = -.377, t = 2.106, p = .043) predicted saccade preferences. Landing precision did not reach significance (β = 23.631, t = 1.675, p = .104). Together, this demonstrates that although other oculomotor metrics such as saccade latency contribute to saccade selection, pupil size remains a robust marker of saccade selection.”
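
      The selection procedure described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the analysis code used for the manuscript: the data are synthetic, the variable names (pupil_size, latency, landing_precision, peak_velocity) and the helper functions are hypothetical, and the AIC is the Gaussian AIC up to an additive constant.

```python
import math
import random

def ols_rss(X, y):
    """Fit ordinary least squares via the normal equations; return the residual sum of squares."""
    n, k = len(X), len(X[0])
    # Augmented normal-equation system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))

def aic(data, y, names):
    """Gaussian AIC up to a constant: n * ln(RSS / n) + 2 * (number of coefficients)."""
    n = len(y)
    X = [[1.0] + [data[name][r] for name in names] for r in range(n)]
    return n * math.log(ols_rss(X, y) / n) + 2 * (len(names) + 1)

def backward_select(data, y):
    """Start from the full model and drop one predictor at a time while this lowers the AIC."""
    kept = list(data)
    best = aic(data, y, kept)
    while kept:
        scores = {name: aic(data, y, [other for other in kept if other != name])
                  for name in kept}
        drop = min(scores, key=scores.get)
        if scores[drop] >= best:
            break  # no single removal improves the AIC
        best = scores[drop]
        kept.remove(drop)
    return kept, best

# Synthetic per-direction maps (36 directions); values are made up for illustration.
random.seed(1)
n_directions = 36
data = {name: [random.gauss(0, 1) for _ in range(n_directions)]
        for name in ("pupil_size", "latency", "landing_precision", "peak_velocity")}
preference = [-2.0 * p - 0.5 * l + random.gauss(0, 0.5)
              for p, l in zip(data["pupil_size"], data["latency"])]

kept, score = backward_select(data, preference)
print("retained predictors:", kept)
```

      Backward selection is greedy, so the retained set can depend on the order and size of the candidate pool; it is shown here only to make the logic of "start full, drop while AIC improves" concrete.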

      In addition to eye-movement-related anisotropies across the visual field, there are of course many studies reporting visual field anisotropies (see Himmelberg, Winawer & Carrasco, 2023, Trends in Neuroscience for a review). It would be interesting to understand how the authors think about visual field anisotropies in the context of their own study. Do they think that their results are (in)dependent on such visual field variations (see Greenwood et al., 2017, PNAS; Ohl, Kroell, & Rolfs, 2024, JEP:Gen for a similar discussion)?

      We agree that established visual field anisotropies are fascinating to discuss in the context of our own results. At the reviewer’s suggestion, we have now expanded this discussion.

      The observed anisotropies in terms of saccade costs are likely related to established anisotropies in perception and early visual cortex. However, the exact way that these anisotropies may be linked remains elusive (i.e. what is cause, what is effect, are links causal?), and more research is necessary to understand how these are related.

      “The observed differences in saccade costs across directions could be linked to established anisotropies in perception [80–86], attention [87–92], saccade characteristics [87, 88, 92, 93], and (early) visual cortex [94–98] [also see 99]. For example, downward saccades are more costly than upward saccades, which mimics a similar asymmetry in early visual areas wherein the upper visual field is relatively underrepresented [94–98]; similarly stronger presaccadic benefits are found for down- compared with upward saccades [87, 88]. Moreover, upward saccades are more precise than downward saccades [93]. Future work should elucidate where saccade cost or the aforementioned anisotropies originate from and how they are related - something that pupil size alone cannot address.”

      We also added that the finding that more precise saccades are coupled with worse performance in a crowding task might be attributed to the increased effort associated with more precise saccades (Greenwood et al., 2017).

      “Adaptive resource allocation from, and to the oculomotor system parsimoniously explains a number of empirical observations. For example, higher cognitive demand is accompanied by smooth pursuits deviating more from to-be tracked targets [137], reduced (micro)saccade frequencies [Figure 4; 63, 64, 138, 139], and slower peak saccade velocities [140–142]. Relatedly, more precise saccades are accompanied with worse performance in a crowding task [93].”

      Finally, the authors conclude that their results "suggests that the eye-movement system and other cognitive operations consume similar resources that are flexibly allocated among each other as cognitive demand changes". The authors should speculate on what these similar resources could be. What are the specific operations of the auditory task that overlap in terms of resources with the eye movement system?

      We agree that the nature of joint resources is an interesting question. Our previous discussion was likely too simplistic here (see also responses to R3). We here specifically refer to the cognitive resources that one can flexibly distribute between tasks.

      Our data do not directly speak to the question of what the shared resources between the auditory and oculomotor tasks are. Nevertheless, both tasks charge working memory, as saccade targets are mandatorily encoded into working memory prior to saccade onset (Van der Stigchel & Hollingworth, 2018), and the counting task clearly engages working memory. This may indicate some domain-generality between visual and auditory working memory during natural viewing (see Nozari & Martin, 2024 for a recent review), but this remains speculative. Another possibility is that it is not the working memory encoding associated with saccades per se, but the execution of overt motor actions itself, that requires cognitive processing, as suggested by Beatty (1982): “the organization of an overt motor act places additional demands on information-processing resources that are reflected in the task-evoked pupillary response”.

      We have elaborated on this in the results and discussion sections.

      “Besides the costs of increased neural activity when exerting more effort, effort should be considered costly for a second reason: Cognitive resources are limited. Therefore, any unnecessary resource expenditure reduces cognitive and behavioral flexibility [22, 31, 36, 116]. As a result, the brain needs to distribute resources between cognitive operations and the oculomotor system. We found evidence for the idea that such resource distribution is adaptive to the general level of cognitive demand and available resources: Increasing cognitive demand through an additional primary auditory dual task led to a lower saccade frequency, and especially costly saccades were cut. In this case, it is important to consider that the auditory task was the primary task, which should cause participants to distribute resources from the oculomotor system to the counting task. In other situations, more resources could be distributed to the oculomotor system instead, for example to discover new sources of reward [22, 136]. Adaptive resource allocation from, and to the oculomotor system parsimoniously explains a number of empirical observations. For example, higher cognitive demand is accompanied by smooth pursuits deviating more from to-be tracked targets [137], reduced (micro)saccade frequencies [Figure 4; 63, 64, 138, 139], and slower peak saccade velocities [140–142]. Relatedly, more precise saccades are accompanied with worse performance in a crowding task [93]. Furthermore, it has been proposed that saccade costs are weighed against other cognitive operations such as using working memory [33, 143–146]. How would the resources between the oculomotor system and cognitive tasks (like the auditory counting task) be related? One possibility is that both consume from limited working memory resources [147, 148]. Saccades are thought to encode target objects in a mandatory fashion into (visual) working memory [79], and the counting task requires participants to keep track of the auditory stream and maintain count of the instructed digit in working memory. However, the exact nature of which resources overlap between tasks remains open for future investigation [also see 149]. Together, we propose that cognitive resources are flexibly (dis)allocated to and from the oculomotor system based on the current demands to establish an optimal balance between performance and cost minimization.”

      Reviewer #2 (Public Review):

      The authors attempt to establish presaccadic pupil size as an index of 'saccade effort' and propose this index as one new predictor of saccade target selection. They only partially achieved their aim: When choosing between two saccade directions, the less costly direction, according to preceding pupil size, is preferred. However, the claim that with increased cognitive demand participants would especially cut costly directions is not supported by the data. I would have expected to see a negative correlation between saccade effort and saccade direction 'change' under increased load. Yet participants mostly cut upwards saccades, but not other directions that, according to pupil size, are equally or even more costly (e.g. oblique saccades).

      Strengths:

      The paper is well-written, easy to understand, and nicely illustrated.

      The sample size seems appropriate, and the data were collected and analyzed using solid and validated methodology.

      Overall, I find the topic of investigating factors that drive saccade choices highly interesting and relevant.

      We thank the reviewer for pointing out the strengths of our paper.

      Weaknesses:

      The authors obtain pupil size and saccade preference measures in two separate tasks. Relating these two measures is problematic because the computations that underlie saccade preparation differ. In Experiment 1, the saccade is cued centrally, and has to be delayed until a "go-signal" is presented; in Experiment 2, an immediate saccade is executed to an exogenously cued peripheral target. The 'costs' in Experiment 1 (computing the saccade target location from a central cue; withholding the saccade) do not relate to Experiment 2. It is unfortunate that measuring presaccadic pupil size directly in the comparatively more 'natural' Experiment 2 (where saccades did not have to be artificially withheld) does not seem to be possible. This questions the practical application of pupil size as an index of saccade effort.

      This is an important point raised by the reviewer and we agree that a discussion on these points improves the manuscript. We reply in two parts: 1) Although the underlying computations during saccade preparation might differ, and are therefore unlikely to be fully similar (we agree), we can still predict saccade selection between (Saccade planning to Saccade preference) and within tasks (Visual search). 2) Pupil size is a sluggish physiological signal, but this is outweighed by the advantages of using pupil size as a general marker of effort, also in the context of visual selection compared with saccade latencies.

      (1) Are delayed saccades (cost task) and the much faster saccades (preference task) linked?

      As the reviewer notes, the underlying ‘type’ of oculomotor program may differ between voluntarily delayed saccades and those in the saccade preference task. There are, however, also considerable overlaps between the oculomotor programs, as the directions and amplitudes are identical. Moreover, the different types of saccades have considerable overlap in their underlying neural circuitry. Nevertheless, the underlying oculomotor programs likely still differ in some regard. Despite these differences, we were able to measure differences across directions in both tasks, and costs and preferences were negatively and highly correlated between tasks. The finding itself therefore indicates that the costs of saccades measured during the saccade planning task generalize to those in the saccade preference task. Note also that we predicted this finding and idea already in a previous publication before starting the present study (Koevoet et al., 2023).
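
      To make concrete what a negative across-task correlation between a cost map and a preference map means, the following minimal sketch computes a Pearson correlation across directions. The maps are toy values and not our data; the 10-degree spacing of the 36 directions, the shape of the maps, and the function name pearson_r are assumptions for illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical maps over 36 directions (assumed 10-degree steps).
angles = [math.radians(10 * i) for i in range(36)]
# Toy cost map: higher for oblique than for cardinal directions.
cost = [abs(math.sin(2 * a)) for a in angles]
# Toy preference map: decreases with cost, plus an unrelated directional component.
preference = [1.0 - 0.8 * c + 0.1 * math.cos(a) for c, a in zip(cost, angles)]

r = pearson_r(cost, preference)
print(f"r = {r:.3f}")  # strongly negative: costly directions are chosen less often
```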

      We now address this interesting point in the discussion as follows:

      “We observed that affordable saccades were preferred over costly ones. This is especially remarkable given that the delayed saccades in the planning task likely differ in their oculomotor program from the immediate saccades in the preference task in some regard.”

      (2) Is pupil size a sensible measure of saccade effort?

      As the reviewer points out, the pupillary signal is indeed relatively sluggish, and therefore relatively slow and more artificial tasks are preferred to quantify saccade costs. This does not preclude pupil size from being applied in more natural settings, as we demonstrate in the search experiments, but a lot of care has to be taken to control for many possible confounding factors, and many trials will be needed.

      That said, as saccade latencies may also capture differences in oculomotor effort (Shadmehr et al., 2019) they are a possible alternative option to assess effort in some oculomotor tasks (see below on why saccade latencies do not provide evidence for an alternative to effort driving saccade selection, but converging evidence). Whilst we do maintain that pupil size is an established and versatile physiological marker of effort, saccade latencies provide converging evidence for our conclusion that effort drives saccade selection.

      As for the saccade preference task, we are not able to analyze the data in a similar manner as in the visual search task for two reasons. First, the number of saccades is much lower than in the natural search experiments. Second, in the saccade preference task, there were always two possible saccade targets. Therefore, even if we were able to isolate an effort signal, this signal could index a multitude of factors such as deciding between two possible saccade targets. Even simple binary decisions go hand in hand with reliable pupil dilations as they require effort (e.g. de Gee et al., 2014).

      There are three major reasons why pupil size is a more versatile marker of saccade costs than saccade latencies (although as mentioned, latencies may constitute another valuable tool to study oculomotor effort). First, pupil size is able to quantify the cost of attentional shifts more generally, including covert attention as well as other effector systems such as head and hand movements. This circumvents the issue of different latencies of different effector systems and also allows studying attentional processes that are not associated with overt motor movements. Second, saccade latencies are difficult to interpret in natural viewing data, as fixation duration and saccade latencies are inherently confounded by one another. This makes it very difficult to separate oculomotor processes and the extraction of perceptual information from a fixated target. Thus, pupil size is a versatile marker of attentional costs in a variety of settings, and can measure costs that saccade latencies cannot (i.e. covert attention). Lastly, pupil size is highly established as a marker of effort, which has been demonstrated across a wide range of cognitive tasks and is therefore not bound to eye movements alone (Bumke, 1911; Koevoet et al., 2024; Laeng et al., 2012; Loewenfeld, 1958; Mathôt, 2018; Robison & Unsworth, 2019; Sirois & Brisson, 2014; Strauch et al., 2022; van der Wel & van Steenbergen, 2018).

      We now discuss this as follows:

      “We here measured cost as the degree of effort-linked pupil dilation. In addition to pupil size, other markers may also indicate saccade costs. For example, saccade latency has been proposed to index oculomotor effort [100], whereby saccades with longer latencies are associated with more oculomotor effort. This makes saccade latency a possible complementary marker of saccade costs (also see Supplementary Materials). Although relatively sluggish, pupil size is a valuable measure of attentional costs for (at least) two reasons. First, pupil size is highly established as a marker of effort, and is sensitive to effort more broadly than only in the context of saccades [36–45, 48]. Pupil size therefore allows capturing not only the costs of saccades, but also of covert attentional shifts [33], or shifts with other effectors such as head or arm movements [54, 101]. Second, as we have demonstrated, pupil size can measure saccade costs even when searching in natural scenes (Figure 4). During natural viewing, it is difficult to disentangle fixation duration from saccade latencies, complicating the use of saccade latency as a measure of saccade cost. Together, pupil size, saccade latency, and potential other markers of saccade cost could fulfill complementary roles in studying the role of cost in saccade selection.”

      The authors claim that the observed direction-specific 'saccade costs' obtained in Experiment 1 "were not mediated by differences in saccade properties, such as duration, amplitude, peak velocity, and landing precision (Figure 1e,f)". Saccade latency, however, was not taken into account here but is discussed for Experiment 2.

      The final model that was used to test for the observed anisotropies in pupil size across directions indeed did not include saccade latencies as a predictor. However, we did originally consider saccade latency as a potential predictor. Because we performed AIC-based backward model selection, this predictor was removed due to its marginal predictive contribution beyond the other predictors explaining pupil size.

      For completeness, we here report the outcome of a linear mixed-effects model that does include saccade latency as a predictor. Here, saccade latencies did not predict pupil size (b = 1.859e-03, t = .138, p = .889). The asymmetry effects remained qualitatively unchanged: preparing oblique compared with cardinal saccades resulted in a larger pupil size (b = 7.635, t = 3.969, p < .001), and preparing downward compared with upward saccades also led to a larger pupil size (b = 3.344, t = 3.334, p = .003).

      The apparent similarity of saccade latencies and pupil size, however, is striking. Previous work shows shorter latencies for cardinal than oblique saccades, and shorter latencies for horizontal and upward saccades than downward saccades - directly reflecting the pupil sizes obtained in Experiment 1 as well as in the authors' previous study (Koevoet et al., 2023, PsychScience).

      As the reviewer notes, there are substantial asymmetries across the visual field in saccade latencies. These asymmetries in saccade latency could also predict saccade preferences. We reply to this in three points: 1) even if saccade latency is a predictor of saccade preferences, this would not constitute an alternative explanation to the conclusion of effort driving saccade selection, 2) saccade latencies show an up-down asymmetry but oblique-cardinal effects in latency may not be generalizable across saccade tasks, 3) pupil size remains a robust predictor of saccade preferences even when saccade latencies are considered as a predictor of saccade preferences.

      (1) We want to first stress that saccade latencies are thought to reflect oculomotor effort (Shadmehr et al., 2019). For example, saccades with larger amplitudes and saccades where distractors need to be ignored are associated with longer latencies. Therefore, even if saccade latencies predict saccade selection, this would not contrast the idea that effort drives saccade selection. Instead, this would provide convergent evidence for our main conclusion – effort predicting saccade selection (rather than pupil size predicting saccade selection per se).

      “We here measured cost as the degree of effort-linked pupil dilation. In addition to pupil size, other markers may also indicate saccade costs. For example, saccade latency has been proposed to index oculomotor effort [100], whereby saccades with longer latencies are associated with more oculomotor effort. This makes saccade latency a possible complementary marker of saccade costs (also see Supplementary Materials). Although relatively sluggish, pupil size is a valuable measure of attentional costs for (at least) two reasons. First, pupil size is highly established as a marker of effort, and is sensitive to effort more broadly than only in the context of saccades [36–45, 48]. Pupil size therefore allows capturing not only the costs of saccades, but also of covert attentional shifts [33], or shifts with other effectors such as head or arm movements [54, 101]. Second, as we have demonstrated, pupil size can measure saccade costs even when searching in natural scenes (Figure 4). During natural viewing, it is difficult to disentangle fixation duration from saccade latencies, complicating the use of saccade latency as a measure of saccade cost. Together, pupil size, saccade latency, and potential other markers of saccade cost could fulfill complementary roles in studying the role of cost in saccade selection.”

      (2) We first tested anisotropies in saccade latency in the saccade planning task (Wilkinson notation: latency ~ obliqueness + updownness + leftrightness + saccade duration + saccade amplitude + saccade velocity + landing error + (1 + obliqueness + updownness | participant)). We found upward latencies to be shorter than downward saccade latencies (b = -.535, t = 3.421, p = .003). In addition, oblique saccades showed shorter latencies than cardinal saccades (b = -1.083, t = 3.096, p = .002), the opposite of what previous work has demonstrated.
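
      For readers less familiar with Wilkinson notation, the formula above can be written out as a linear mixed-effects model. The expansion below follows the standard reading of such formulas (fixed effects for all predictors, plus a by-participant random intercept and random slopes for obliqueness and updownness); the subscripts (i for trial, j for participant) and symbol names are our own labels and are not taken from the manuscript.

```latex
\begin{aligned}
\text{latency}_{ij} ={}& (\beta_0 + u_{0j})
  + (\beta_1 + u_{1j})\,\text{obliqueness}_{ij}
  + (\beta_2 + u_{2j})\,\text{updownness}_{ij} \\
 &+ \beta_3\,\text{leftrightness}_{ij}
  + \beta_4\,\text{duration}_{ij}
  + \beta_5\,\text{amplitude}_{ij}
  + \beta_6\,\text{velocity}_{ij}
  + \beta_7\,\text{landing error}_{ij}
  + \varepsilon_{ij}, \\
(u_{0j}, u_{1j}, u_{2j})^{\top} \sim{}& \mathcal{N}(\mathbf{0}, \Sigma),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

      In this form, the reported b values correspond to the fixed-effect coefficients (for example, the obliqueness effect is the estimate of \beta_1), while the u terms absorb participant-specific deviations.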

      We then also tested these latency anisotropies in another dataset wherein participants (n = 20) saccaded toward a single peripheral target as fast as possible (Koevoet et al., submitted; same amplitude and eccentricity as in the present manuscript). There we did not find a difference in saccade latency between cardinal and oblique targets, but we did observe shorter latencies for up- compared with downward saccades. We are therefore not sure in which situations oblique saccades do, or do not differ from cardinal saccades in terms of latency, and even in which direction the effect occurs.

      In contrast, we have now demonstrated a larger pupil size prior to oblique compared with cardinal saccades in two experiments. This indicates that pupil size may be a more reliable and generalizable marker of saccade costs than saccade latency. However, this remains to be investigated further.

      (3) To gain further insights into which oculomotor metrics would predict saccade selection, we conducted a linear regression across directions. We created pupil size, saccade latency, landing precision and peak velocity maps from the saccade planning task. We then used AIC-based model selection to determine which factors would best predict saccade selection. The selected model included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). Pupil size (b = -42.853, t = 4.791, p < .001) and saccade latency (b = -.377, t = 2.106, p = .043) predicted saccade preferences significantly. In contrast, landing precision did not reach significance (b = 23.631, t = 1.675, p = .104). This analysis shows that although saccade latency predicts saccade preferences, pupil size remains a robust predictor of saccade selection.

      “To ascertain whether pupil size or other oculomotor metrics predict saccade preferences, we conducted a multiple regression analysis. We calculated average pupil size, saccade latency, landing precision and peak velocity maps across all 36 directions. The model, determined using AIC-based backward selection, included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). The analysis revealed that pupil size (β = -42.853, t = 4.791, p < .001) and saccade latency (β = -.377, t = 2.106, p = .043) predicted saccade preferences. Landing precision did not reach significance (β = 23.631, t = 1.675, p = .104). Together, this demonstrates that although other oculomotor metrics such as saccade latency contribute to saccade selection, pupil size remains a robust marker of saccade selection.”

      The authors state that "from a costs-perspective, it should be efficient to not only adjust the number of saccades (non-specific), but also by cutting especially expensive directions the most (specific)". However, saccade targets should be selected based on the maximum expected information gain. If cognitive load increases (due to an additional task) an effective strategy seems to be to perform fewer, but still meaningful, saccades. How would it help natural orienting to selectively cut saccades in certain (effortful) directions? Choosing saccade targets based on comfort, over information gain, would result in overall more saccades to be made, which is non-optimal, also from a cost perspective.

      We thank the reviewer for this comment. Although we do not fully agree, the logic is quite close to our rationale and it is worth adding a point of discussion here. A vital part of the current interpretation is the instruction given to participants. In our second natural visual search task, participants performed a dual task in which the auditory task was primary and the search task was secondary. Participants are therefore likely to adjust their resources to optimize performance on the primary task at the expense of the secondary task. As a result, fewer resources are available for searching in the dual than in the single task, because these resources are needed for the auditory task. Cutting expensive directions does not help search performance, but it does reduce the cost of search, so that more resources are available for the prioritized auditory task. Also note that the search task was rather difficult (participants completed it, but it was demanding; see the original description of the dataset for more details), which provides another reason to fully prioritize the auditory task at the expense of the visual task. This, however, opens up a nice point of discussion: if one were to emphasize the importance of search (for example through punishment or reward), we would indeed expect participants to perform whichever eye movements get them to their goal fastest, thus reducing the relative influence of costs on saccade behavior. This remains to be tested, however; we are working on this and look forward to discussing such findings in the future.

      Together, we propose that there is a trade-off between distributing resources either towards cognitive tasks or the oculomotor system (also see Ballard et al., 1995; Van der Stigchel, 2020). How these resources are distributed depends highly on the current task demands (also see Sahakian et al., 2023). This allows for adaptive behavior in a wide range of contexts.

      We now added these considerations to the manuscript as follows (also see our previous replies):

      “Do cognitive operations and eye movements consume from a similar pool of resources [44]? If so, increasing cognitive demand for non-oculomotor processes should result in decreasing available resources for the oculomotor system. In line with this idea, previous work indeed shows altered eye-movement behavior under effort as induced by dual tasks, for example by making less saccades under increased cognitive demand [62–64]. We therefore investigated whether less saccades were made as soon as participants had to count the occurrence of a specific digit in the auditory number stream in comparison to ignoring the stream (in Exp. 2; Figure 4a). Participants were instructed to prioritize the auditory digit-counting task over finding the visual search target. Therefore, resources should be shifted from the oculomotor system to the primary auditory counting task. The additional cognitive demand of the dual task indeed led to a decreased saccade frequency (t(24) = 7.224, p < .001, Cohen’s d = 1.445; Figure 4h).”
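
      As a quick consistency check on the statistics quoted above, the reported effect size can be related to the t value. The sketch below assumes that the comparison is a paired-samples t-test across n = df + 1 = 25 participants and that the reported Cohen's d is the paired-samples variant d_z = t / sqrt(n); neither is stated explicitly in this excerpt, and the function name is hypothetical.

```python
import math

def cohens_dz_from_t(t, df):
    """Cohen's d_z for a paired-samples t-test (assumed design).

    Uses d_z = t / sqrt(n), with n = df + 1 participants.
    """
    n = df + 1
    return t / math.sqrt(n)

# Values quoted in the response: t(24) = 7.224
d_z = cohens_dz_from_t(7.224, 24)
print(round(d_z, 3))  # 7.224 / 5 = 1.4448, i.e. 1.445 to three decimals
```

      Under these assumptions the computed value agrees with the reported Cohen's d of 1.445.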

      I would have expected to see a negative correlation between saccade effort and saccade direction 'change' under increased load. Yet participants mostly cut upwards saccades, but not other directions that, according to pupil size, are equally or even more costly (e.g. oblique saccades).

      The reviewer’s point follows from the initial comment, and we address it here. First, we’d like to point out that it is not established that saccade costs in different directions are always the same. Instead, it is possible that saccade costs differ in natural viewing compared with our delayed-saccade task. Therefore, we used pupil size during natural viewing for the search experiments. Second, the reviewer correctly notes that oblique saccades are hardly cut under additional cognitive demand. However, participants already hardly execute oblique saccades when not confronted with the additional auditory task (Figure 4b, d), making it difficult to reduce those further (i.e. floor effect). Participants chose to cut vertical saccades, possibly because these are more costly than horizontal saccades.

      We incorporated these points in our manuscript as follows:

      “To test this, we analyzed data from two existing datasets [63] wherein participants (total n = 41) searched for small targets (’Z’ or ’H’) in natural scenes (Figure 4a; [64]). Again, we tested whether pupil size prior to saccades negatively linked with saccade preferences across directions. Because saccade costs and preferences across directions could differ for different situations (i.e. natural viewing vs. saccade preference task), but should always be negatively linked, we established both cost and preferences independently in each dataset.”

      “We calculated a saccade-adjustment map (Figure 4g) by subtracting the saccade preference map in the single task (Figure 4f) from the dual task map (Figure 4d). Participants seemingly cut vertical saccades in particular, and made more saccades to the top right direction. This pattern may have emerged as vertical saccades are more costly than horizontal saccades (also see Figure 1d). Oblique saccades may not have been cut because there were very little oblique saccades in the single condition to begin with (Figure 4d), making it difficult to observe a further reduction of such saccades under additional cognitive demand (i.e. a floor effect).”
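
      The subtraction described in the quoted passage can be sketched as follows. This is a minimal illustration only: the eight direction labels, the preference values, and the function name are hypothetical and are not taken from the study. It assumes each preference map holds the proportion of saccades made in each direction.

```python
# Hypothetical direction bins; the study's actual binning may differ.
DIRECTIONS = ["right", "up-right", "up", "up-left",
              "left", "down-left", "down", "down-right"]

def adjustment_map(single, dual):
    """Per-direction change in saccade preference: dual task minus single task.

    Negative values mark directions that were cut under additional demand.
    """
    return {d: dual[d] - single[d] for d in DIRECTIONS}

# Illustrative proportions only (each map sums to 1); not real data.
single = {"right": 0.30, "up-right": 0.03, "up": 0.15, "up-left": 0.02,
          "left": 0.30, "down-left": 0.03, "down": 0.15, "down-right": 0.02}
dual = {"right": 0.33, "up-right": 0.04, "up": 0.11, "up-left": 0.02,
        "left": 0.33, "down-left": 0.03, "down": 0.12, "down-right": 0.02}

adjustment = adjustment_map(single, dual)
# Direction with the largest reduction in preference
most_cut = min(adjustment, key=adjustment.get)
```

      With rare directions such as the oblique bins in this toy example, there is little room for a further decrease, which is the floor effect the authors describe.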

      Overall, I am not sure what practical relevance the relation between pupil size (measured in a separate experiment) and saccade decisions has for eye movement research/vision science. Pupil size does not seem to be a straightforward measure of saccade effort. Saccade latency, instead, can be easily extracted in any eye movement experiment (no need to conduct a separate, delayed saccade task to measure pupil dilation), and seems to be an equally good index.

      There are two points here.

      (1) What is the practical relevance of a link between effort and saccade selection for eye-movement research and vision science?

      We see plenty – think of changing eye movement patterns under effort (be it smooth pursuits, saccade rates, distributions of gaze positions to images etc.) which have substantial implications for human factors research, but also neuropsychology. With a cost account, one may predict (rather than just observe) how eye movement changes as soon as resources are reduced/ non-visual demand increases. With a cost account, we can explain such effects (e.g. lower saccade rates under effort, cardinal bias, perhaps also central bias) parsimoniously that cannot be explained by what is so far referred to as the three core drivers of eye movement behavior (saliency, selection history, goals, e.g., Awh et al., 2012). Conversely, one must wonder why eye-movement research/vision science simply accepts/dismisses these phenomena as such, without seeking overarching explanations.

      (2) What is the usefulness of using pupil size to measure effort?

      We hope that our replies to the comments above illustrate why pupil size is a sensible, robust and versatile marker of attentional costs. We briefly summarize our most important points here.

      - Pupil size is an established measure of effort irrespective of context, as demonstrated by hundreds of original works (e.g. working memory load, multiple object tracking, individual differences in cognitive ability). This allows pupil size to be a versatile marker of the effort, and therefore costs, of non-saccadic attentional shifts such as covert attention or those realized by other effector systems (i.e. head or hand movements).

      - Our new analysis indicates that pupil size remains a strong and robust predictor of saccade preference, even when considering saccade latency.

      - Pupil size allows saccade costs to be studied in natural viewing. In contrast, saccade latencies are difficult to assess in natural viewing as fixation durations and saccade latencies are intrinsically linked and very difficult to disentangle.

      - Note, however, that we think it is interesting and useful to study effects of effort/cost on eye movement behavior. Whichever index is used to do so, we see plenty of potential in this line of research, and this paper is a starting point.

      Reviewer #3 (Public Review):

      This manuscript extends previous research by this group by relating variation in pupil size to the endpoints of saccades produced by human participants under various conditions including trial-based choices between pairs of spots and search for small items in natural scenes. Based on the premise that pupil size is a reliable proxy of "effort", the authors conclude that less costly saccade targets are preferred. Finding that this preference was influenced by the performance of a non-visual, attention-demanding task, the authors conclude that a common source of effort animates gaze behavior and other cognitive tasks.

      Strengths:

      Strengths of the manuscript include the novelty of the approach, the clarity of the findings, and the community interest in the problem.

      We thank the reviewer for pointing out the strengths of our paper.

      Weaknesses:

      Enthusiasm for this manuscript is reduced by the following weaknesses:

      (1) A relationship between pupil size and saccade production seems clear based on the authors' previous and current work. What is at issue is the interpretation. The authors test one, preferred hypothesis, and the narrative of the manuscript treats the hypothesis that pupil size is a proxy of effort as beyond dispute or question. The stated elements of their argument seem to go like this:

      PROPOSITION 1: Pupil size varies systematically across task conditions, being larger when tasks are more demanding.

      PROPOSITION 2: Pupil size is related to the locus coeruleus.

      PROPOSITION 3: The locus coeruleus NE system modulates neural activity and interactions.

      CONCLUSION: Therefore, pupil size indexes the resource demand or "effort" associated with task conditions.

      How the conclusion follows from the propositions is not self-evident. Proposition 3, in particular, fails to establish the link that is supposed to lead to the conclusion.

      We inadvertently laid out this rationale as described above, and we thank the reviewer for pointing out this initial suboptimal structure of argumentation. The notion that the link between pupil size and effort is established in the literature because of its neural underpinnings is inaccurate. Instead, the tight link between effort and pupil size is established based on covariations of pupil diameter and cognition across a wide variety of tasks and domains. In line with this, we now introduce this tight link predominantly based on the relationships between pupil size and cognition instead of focusing on putative neural correlates of this relationship.

      As reviewed previously (Beatty, 1982; Bumke, 1911; Kahneman, 1973; Kahneman & Beatty, 1966; Koevoet et al., 2024; Laeng et al., 2012; Mathôt, 2018; Sirois & Brisson, 2014; Strauch et al., 2022; van der Wel & van Steenbergen, 2018), any increase in effort is consistently associated with an increase in pupil size. For instance, the pupil dilates when increasing load in working memory or multiple object tracking tasks, and such pupillary effects robustly explain individual differences in cognitive ability and fluctuations in performance across trials (Alnæs et al., 2014; Koevoet et al., 2024; Robison & Brewer, 2020; Robison & Unsworth, 2019; Unsworth & Miller, 2021). This extends to the planning of movements as pupil dilations are observed prior to the execution of (eye) movements (Koevoet et al., 2023; Richer & Beatty, 1985). The link between pupil size and effort has thus been firmly established for a long time, irrespective of the neural correlates of these effort-linked pupil size changes.

      We again thank the reviewer for spotting this logical mistake, and now revised the paragraph where we introduce pupil size as an established marker of effort as follows:

      “We recently demonstrated that the effort of saccade planning can be measured with pupil size, which allows for a physiological quantification of saccade costs as long as low-level visual factors are controlled for [33]. Pupil size is an established marker of effort [36–44]. For instance, loading more in working memory or tracking more objects results in stronger pupil dilation [44–52]. Pupil size not only reflects cognitive (or mental) effort but also the effort of planning and executing movements [37, 53, 54]. We leveraged this to demonstrate that saccade costs can be captured with pupil size, and are higher for oblique compared with cardinal directions [33]. Here, we addressed whether saccade costs predict where to saccade.”

      We now mention the neural correlates of pupil size only in the discussion, where we took care to also mention roles for other neurotransmitter systems:

      “Throughout this paper, we have used cost in the limited context of saccades. However, cost-based decision-making may be a more general property of the brain [31, 36, 114–116]. Every action, be it physical or cognitive, is associated with an intrinsic cost, and pupil size is likely a general marker of this [44]. Note, however, that pupil dilation does not always reflect cost, as the pupil dilates in response to many sensory and cognitive factors which should be controlled for, or at least considered, when interpreting pupillometric data [e.g., see 39, 40, 42, 117]. Effort-linked pupil dilations are thought to be, at least in part, driven by activity in the brainstem locus coeruleus (LC) [40, 118–120] [but other neurotransmitters also affect pupil size, e.g. 121, 122]. Activity in LC with its widespread connections throughout the brain [120, 123–127] is considered to be crucial for the communication within and between neural populations and modulates global neural gain [128–132]. Neural firing is costly [22, 133], and therefore LC activity and pupil size are (neuro)physiologically plausible markers of cost [40]. Tentative evidence even suggests that continued exertion of effort (accompanied by altered pupil dilation) is linked to the accumulation of glutamate in the lateral prefrontal cortex [134], which may be a metabolic marker of cost [also see 116, 134, 135].”

      (2) The authors test one, preferred hypothesis and do not consider plausible alternatives. Is "cost" the only conceivable hypothesis? The hypothesis is framed in very narrow terms. For example, the cholinergic and dopamine systems that have been featured in other researchers' consideration of pupil size modulation are missing here. Thus, because the authors do not rule out plausible alternative hypotheses, the logical structure of this manuscript can be criticized as committing the fallacy of affirming the consequent.

      As we have noted in the response to the reviewer’s first point, we did not motivate our use of pupil size as an index of effort clearly enough. For the current purpose, the neural correlates of pupil size are less relevant than the cognitive correlates (see previous point). We reiterate that the neuromodulatory underpinnings of the observed pupil size effects (which indeed possibly include effects of the cholinergic, dopaminergic and serotonergic systems), while interesting for the discussion on the neural origin of effects, are not crucial to our conclusion. We hope the new rationale (without focusing too much on the (irrelevant) exact neural underpinnings) convinces the reviewer and reader.

      Our changes to the manuscript are shown in our reply to the previous comment.

      The reviewer notes that other plausible alternative hypotheses could explain the currently reported results. However, we did not find a more parsimonious explanation for our data than ‘Effort Drives Saccade Selection’. Effort explains why participants prefer saccading toward specific directions in (1) highly controlled and (2) more natural settings. Note that we also predicted this effect previously (Koevoet et al., 2023). Moreover, this account explains (3) why participants make fewer saccades under additional cognitive demand, and (4) why especially costly saccades are reduced under additional cognitive demand. We are very open to the reviewer presenting other possible interpretations of our data so that these can be discussed and put to the test in future work.

      (3) The authors cite particular publications in support of the claim that saccade selection is influenced by an assessment of effort. Given the extensive work by others on this general topic, the skeptic could regard the theoretical perspective of this manuscript as too impoverished. Their work may be enhanced by consideration of other work on this general topic, e.g, (i) Shenhav A, Botvinick MM, Cohen JD. (2013) The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron. 2013 Jul 24;79(2):217-40. (ii) Müller T, Husain M, Apps MAJ. (2022) Preferences for seeking effort or reward information bias the willingness to work. Sci Rep. 2022 Nov 14;12(1):19486. (iii) Bustamante LA, Oshinowo T, Lee JR, Tong E, Burton AR, Shenhav A, Cohen JD, Daw ND. (2023) Effort Foraging Task reveals a positive correlation between individual differences in the cost of cognitive and physical effort in humans. Proc Natl Acad Sci U S A. 2023 Dec 12;120(50):e2221510120.

      We thank the reviewer for pointing us toward this literature. These papers are indeed relevant for our manuscript, and we have now incorporated them. Specifically, we now discuss how the costs of effort are weighed in relation to possible rewards during decision-making. We have also incorporated work that has investigated how the biomechanical costs of arm movements contribute to action selection.

      “Our findings are in line with established effort-based models that assume costs to be weighed against rewards during decision-making [102–107]. In such studies, reward and cognitive/physical effort are often parametrically manipulated to assess how much effort participants are willing to exert to acquire a given (monetary) reward [e.g. 108, 109]. Whereas this line of work manipulated the extrinsic costs and/or rewards of decision options (e.g. perceptual consequences of saccades [110, 111] or consequences associated with decision options), we here focus on the intrinsic costs of the movement itself (in terms of cognitive and physical effort). Relatedly, the intrinsic costs of arm movements are also considered during decision-making: biomechanically affordable movements are generally preferred over more costly ones [26–28]. We here extend these findings in two important ways. First, until now, the intrinsic costs of saccades and other movements have been inferred from gaze behavior itself or by using computational modelling [23, 25–28, 34, 35, 112]. In contrast, we directly measured cost physiologically using pupil size. Secondly, we show that physiologically measured saccade costs predict where saccades are directed in a controlled binary preference task, and even during natural viewing. Our findings could unite state-of-the-art computational models [e.g. 23, 25, 34, 35, 113] with physiological data, to directly test the role of saccade costs and ultimately further our understanding of saccade selection.”

      (4) What is the source of cost in saccade production? What is the currency of that cost? The authors state (page 13), "... oblique saccades require more complex oculomotor programs than horizontal eye movements because more neuronal populations in the superior colliculus (SC) and frontal eye fields (FEF) [76-79], and more muscles are necessary to plan and execute the saccade [76, 80, 81]." This statement raises questions and concerns. First, the basis of the claim that more neurons in FEF and SC are needed for oblique versus cardinal saccades is not established in any of the publications cited. Second, the authors may be referring to the fact that oblique saccades require coordination between pontine and midbrain circuits. This must be clarified. Second, the cost is unlikely to originate in extraocular muscle fatigue because the muscle fibers are so different from skeletal muscles, being fundamentally less fatigable. Third, if net muscle contraction is the cost, then why are upward saccades, which require the eyelid, not more expensive than downward? Thus, just how some saccades are more effortful than others is not clear.

      Unfortunately, our current data do not allow for the specification of what the source is of differences in saccade production, nor what the currency is. We want to explicitly state that while pupil size is a sensitive measure of saccade costs, pupil size cannot directly inform what underlying mechanisms are causing differences in saccade costs across conditions (e.g. directions). Nevertheless, we do speculate about these issues because they are important to consider. We thank the reviewer for pointing out the shortcomings in our initial speculations.

      Broadly, we agree with the reviewer that a neural source of differences in costs between different types of saccades is more likely than a purely muscular account (also see Koevoet et al., 2023). Furthermore, we think that the observed differences in saccade costs for oblique vs. cardinal and up vs. down could be due to different underlying mechanisms. While we caution against overinterpreting single directions, tentative evidence for this may also be drawn from the different time courses of the effects for up/down versus cardinal/oblique (Figure 1c).

      Below we speculate about why some specific saccade directions may be more costly than others:

      Why would oblique saccades be more costly than cardinal saccades? We thank the reviewer for pointing out that oblique saccades additionally require coordination between pontine and midbrain circuits (Curthoys et al., 1984; King & Fuchs, 1979; Sparks, 2002). This point warrants a more thorough discussion than in our initial version. We have incorporated this as follows:

      “The complexity of an oculomotor program is arguably shaped by its neural underpinnings. For example, oblique but not cardinal saccades require communication between pontine and midbrain circuits [73–75]. Such differences in neural complexity may underlie the additional costs of oblique compared with cardinal saccades. Besides saccade direction, other properties of the ensuing saccade such as its speed, distance, curvature, and accuracy may contribute to a saccade’s total cost [22, 33, 53, 76, 77] but this remains to be investigated directly.”

      Why would downward saccades be more costly than upward saccades? As the reviewer points out, a net muscular contraction account of cost would predict the opposite pattern due to the movement of the eyelid. Instead, we speculate that our findings may be associated with the well-established anisotropy in early visual cortex along the vertical meridian. Specifically, the upper vertical meridian is represented in substantially less detail than the lower vertical meridian (Himmelberg et al., 2023; Silva et al., 2018). Prior to a saccade, attention is deployed towards the intended saccadic endpoint (Deubel & Schneider, 1996; Kowler et al., 1995). Attention tunes neurons to preferentially process the attended location over non-attended locations. Because the lower visual field is represented in higher detail than the upper visual field, attention may tune neuronal responses differently when preparing up- compared with downward saccades (Hanning et al., 2024; Himmelberg et al., 2023). Thus, it may be more costly to prepare down- compared with upward saccades. This proposition, however, does not account for the lower costs associated with horizontal compared with up- and downward saccades, as the horizontal meridian is represented at a higher acuity than the vertical meridian. This makes it unlikely that this explains the pattern of results completely. Again, at this point we can only speculate why costs differ, yet we demonstrate that these differences in cost are decisive for oculomotor behavior. We now explicitly state the speculative nature of these ideas, all of which would need to be tested directly.

      We have updated our discussion of this issue as follows:

      “The observed differences in saccade costs across directions could be linked to established anisotropies in perception [80–86], attention [87–92], saccade characteristics [87, 88, 92, 93], and (early) visual cortex [94–98] [also see 99]. For example, downward saccades are more costly than upward saccades, which mimics a similar asymmetry in early visual areas wherein the upper visual field is relatively underrepresented [94–98]; similarly stronger presaccadic benefits are found for down- compared with upward saccades [87, 88]. Moreover, upward saccades are more precise than downward saccades [93]. Future work should elucidate where saccade cost or the aforementioned anisotropies originate from and how they are related - something that pupil size alone cannot address.”

      (5) The authors do not consider observations about variation in pupil size that seem to be incompatible with the preferred hypothesis. For example, at least two studies have described systematically larger pupil dilation associated with faster relative to accurate performance in manual and saccade tasks (e.g., Naber M, Murphy P. Pupillometric investigation into the speed-accuracy trade-off in a visuo-motor aiming task. Psychophysiology. 2020 Mar;57(3):e13499; Reppert TR, Heitz RP, Schall JD. Neural mechanisms for executive control of speed-accuracy trade-off. Cell Rep. 2023 Nov 28;42(11):113422). Is the fast relative to the accurate option necessarily more costly?

      We thank the reviewer for this interesting point that we will answer in two ways. First, we discuss the main point: the link between pupil size, effort, and cost. Second, we discuss the findings described specifically in these two papers and how we interpret these from a pupillometric account.

      First, one may generally ask 1) whether any effort results in pupil dilation, 2) whether any effort is costly, and 3) whether this means that pupil dilation always reflects effort and cost, respectively. Indeed, it has been argued repeatedly, prominently, and independently (e.g., Bumke, 1911; Mathôt, 2018) that any change in effort (no matter the specific origin) is associated with an evoked pupil dilation. Effort, in turn, is consistently and widely experienced as aversive, both across tasks and cultures (David et al., 2024). Effort minimization may therefore be seen as a universal law of human cognition and behavior with effort as a to-be-minimized cost (Shadmehr et al., 2019; Hull, 1943; Tsai, 1932). However, this does not imply that any pupil dilation necessarily reflects effort or that, as a consequence thereof, any pupil dilation is always signaling cost. For instance, the pupil dark response, the pupil far response and changes in baseline pupil size are not associated with effort. Baseline and task-evoked pupil dilation responses have to be interpreted differently (see below). Moreover, the pupil also changes (and dilates) due to other factors (see Strauch et al., 2022; Mathôt, 2018; Bumke, 1911; Loewenfeld, 1999 for reviews).

      Second, as for Naber & Murphy (2020) and Reppert et al. (2023) specifically: both Reppert et al. (2023) and Naber & Murphy (2020) indeed demonstrate a larger baseline pupil size when participants made faster, less accurate responses. However, baseline pupil size is not an index of effort per se, but task-evoked pupil dilation responses are (as studied in the present manuscript) (Strauch et al., 2022). For work on differences between baseline pupil diameter and task-evoked pupil responses, and their respective links with exploration and exploitation, please see Jepma & Nieuwenhuis (2011). Indeed, the link between effort and larger pupil size holds for task-evoked responses, but not baseline pupil size per se (also see Koevoet et al., 2023).

      Still, Naber (third author of the current paper) & Murphy (2020) also demonstrated larger task-evoked pupil dilation responses when participants were instructed to make faster, less accurate responses compared with making accurate and relatively slow responses. However, this difference in task-evoked response gains significance only after the onset of the movement itself, and peaks substantially later than response offset. Whilst pupil dilation may be sluggish, it isn’t extremely sluggish either. As feedback to the performance of the participant was displayed 1.25s after performing the movement and clicking (taking about 630ms), we deem it possible that this effect may in part result from appraising the feedback to the participant rather than the speed of the response itself (in fact, Naber and Murphy also discuss this option). In addition to not measuring saccades but mouse movements, it is therefore possible that the observed evoked pupil effects in Naber & Murphy (2020) are not purely linked to motor preparation and execution per se. Therefore, future work that aims to investigate the costs of movements should isolate the effects of feedback and other potential factors that may drive changes in pupil size. This will help clarify whether fast or more accurate movements could be linked to the underlying costs of the movements.

      Relatedly, we do not find evidence that pupil size during saccade planning predicts the onset latency of the ensuing saccade (please refer to our second response to Reviewer 2 for a detailed discussion).

      Together, we therefore do not see the results from Reppert et al. (2023) and Naber & Murphy (2020) to be at odds with our interpretation of evoked pupil size reflecting effort and cost in the context of planning saccades.

      We think that these are considerations important to the reader, which is why we now added them to the discussion as follows:

      “Throughout this paper, we have used cost in the limited context of saccades. However, cost-based decision-making may be a more general property of the brain [31, 36, 114–116]. Every action, be it physical or cognitive, is associated with an intrinsic cost, and pupil size is likely a general marker of this [44]. Note, however, that pupil dilation does not always reflect cost, as the pupil dilates in response to many sensory and cognitive factors which should be controlled for, or at least considered, when interpreting pupillometric data [e.g., see 39, 40, 42, 117].”

      (6) The authors draw conclusions based on trends across participants, but they should be more transparent about variation that contradicts these trends. In Figures 3 and 4 we see many participants producing behavior unlike most others. Who are they? Why do they look so different? Is it just noise, or do different participants adopt different policies?

      We disagree with the transparency point of the reviewer. Note that we deviated from the norm here by being more transparent than common: we added individual data points and relationships rather than showing pooled effects across participants with error bars alone (see Figures 2c, 3b,c, 4c,e,f).

      Moreover, our effects are consistent and stable across participants and are highly significant. To illustrate, for the classification analysis based on cost (Figure 2e) 16/20 participants showed an effect. As for the natural viewing experiments (total > 250,000 fixations), we also find that a majority of participants show the observed effects: Experiment 1: 15/16 participants; Experiment 2: 16/25 participants; Experiment 2 – adjustment: 22/25 participants.

      We fully agree that it’s interesting to understand where interindividual variation may originate from. We currently have too little data to allow robust analyses across individuals or to zoom in on individual differences in cost maps, preference maps, or potential personalized strategies of saccade selection. That said, future work could study this further. For such work, we would recommend reducing the number of directions to obtain more pupil size data per direction and therefore cleaner signals that may be more informative at the individual level. With such stronger signals, studying (differences in) links at the individual level may be feasible and would be interesting to consider, and this will be a future direction in our own work too. Nonetheless, we again stress that the reported effects are robust and consistent across participants, and that interindividual differences are therefore not extensive. Moreover, our results from four experiments consistently support our conclusion that effort drives saccade selection.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations For The Authors):

      - Based on the public review, I would recommend that the authors carefully review and correct the manuscript with regard to the causal conclusions. The study is largely correlational (i.e. the pupil was only observed, not manipulated) and therefore does not allow causal conclusions to be drawn about the relationship between pupil size and saccade selection. These causal conclusions become even more confusing when pupil size is equated with effort and saccade cost. As a consequence, an actual correlation between pupil size and saccade selection has led to the title that effort drives saccade selection. It would also be helpful for the reader to summarize in an additional section of the discussion what they consider to be a causal or correlational link based on their results.

      We agree with the reviewer, and we now state in more detail which findings are correlational and which are causal. As outlined before, we do not see a more parsimonious explanation for our findings than our title, but we fully agree that the paper benefits from making the correlational/causal nature of the evidence for this idea explicit.

      “We report a combination of correlational and causal findings. Despite the correlational nature of some of our results, they consistently support the hypothesis that saccade costs predict saccade selection [which we predicted previously, 33]. Causal evidence was provided by the dual-task experiment as saccade frequencies - and especially costly saccades - were reduced under additional cognitive demand. Only a cost account predicts 1) a link between pupil size and saccade preferences, 2) a cardinal saccade bias, 3) reduced saccade frequency under additional cognitive demand, and 4) disproportional cutting of especially those directions associated with more pupil dilation. Together, our findings converge upon the conclusion that effort drives saccade selection.”

      - Can the authors please elaborate in more detail on how they transformed the predictors of their linear mixed model for the visualization in Figure 1f? It is difficult to see how the coefficients in the table and the figure match.

      We used the ‘effectsize’ package to provide effect sizes for each predictor of the linear mixed-effects model (https://cran.r-project.org/web/packages/effectsize/index.html). We report absolute effect sizes to make it visually easier to compare different predictors. These details have now been included in the Methods section to be more transparent about how these effect sizes were computed.

      “Absolute effect sizes (i.e. r) and their corresponding 95% confidence intervals for the linear mixed-effects models were calculated using t and df values with the ’effectsize’ package (v0.8.8) in R.”
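      The conversion described in this Methods sentence can be illustrated with a minimal sketch. It assumes the standard relation r = t / sqrt(t² + df) for turning a t statistic into a correlation-type effect size; the function names are hypothetical, the code is not taken from the 'effectsize' package, and the confidence intervals are not reproduced here.

```python
import math


def t_to_r(t: float, df: float) -> float:
    """Convert a t statistic and its degrees of freedom to r.

    Assumes the standard conversion r = t / sqrt(t^2 + df).
    """
    return t / math.sqrt(t * t + df)


def absolute_effect_size(t: float, df: float) -> float:
    """Absolute value of r, so predictors with opposite signs can be compared."""
    return abs(t_to_r(t, df))


# Illustrative values only (not taken from the manuscript):
# t = 3 with df = 16 gives r = 3 / sqrt(9 + 16) = 0.6.
example_r = absolute_effect_size(-3.0, 16.0)
```

      Taking the absolute value discards the direction of each effect, which is why the plotted effect sizes need not match the signs of the coefficients in the table.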

      - Could the authors please explain in more detail why they think that a trial-by-trial analysis in the free choice task adds something new to their conclusions? In fact, a trial-by-trial analysis somehow suggests that the pupil size data would enter the analysis at a single trial level. If I understand correctly, the pupil size data come from their initial mapping task. So there is only one mean pupil size for a given participant and direction that goes into their analysis to predict free choice in a single trial. If this is the case, I don't see the point of doing this additional analysis given the results shown in Figure 2c.

      The reviewer understands correctly that pupil size data is taken from the initial mapping task. We then used these mean values to predict which saccade target would be selected on a trial-by-trial basis. While showing the same conceptual result as the correlation analysis, we opted to include this analysis to show the robustness of the results across individuals. Therefore we have chosen to keep the analysis in the manuscript but now write more clearly that this shows the same conceptual finding as the correlation analysis.

      “As another test of the robustness of the effect, we analyzed whether saccade costs predicted saccade selection on a trial-by-trial basis. To this end, we first determined the more affordable option for each trial using the established saccade cost map (Figure 1d). We predicted that participants would select the more affordable option. Complementing the above analyses, the more affordable option was chosen above chance level across participants (M = 56.64%, 95%-CI = [52.75%-60.52%], one-sample t-test against 50%: t(19) = 3.26, p = .004, Cohen’s d = .729; Figure 2e). Together, these analyses established that saccade costs robustly predict saccade preferences.”
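      The relation between the reported t value and Cohen's d can be checked with a short sketch. It assumes the usual one-sample definitions, d = (mean - chance) / SD and t = d * sqrt(n); the function names and the example percentages are hypothetical and are not the participants' data.

```python
import math
import statistics


def one_sample_t(values, mu):
    """One-sample t-test of `values` against `mu`.

    Returns (t, df, d), where d is Cohen's d for a one-sample design.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    d = (mean - mu) / sd
    t = d * math.sqrt(n)
    return t, n - 1, d


def cohens_d_from_t(t, n):
    """Recover the one-sample Cohen's d from t and the sample size."""
    return t / math.sqrt(n)


# Hypothetical per-participant percentages, for illustration only.
t, df, d = one_sample_t([52.0, 58.0, 60.0, 54.0, 56.0], 50.0)

# Consistency check on the reported statistics: t(19) = 3.26 implies n = 20,
# so d = 3.26 / sqrt(20), which is about 0.729.
reported_d = cohens_d_from_t(3.26, 20)
```

      Under these assumptions, the reported t(19) = 3.26 and Cohen's d = .729 are mutually consistent.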

      Reviewer #2 (Recommendations For The Authors):

      The authors report that "Whenever the difference in pupil size between the two options was larger, saccades curved away more from the non-selected option (β = .004, SE = .001, t = 4.448, p < .001; Figure 3b), and their latencies slowed (β = .050, SE = .013, t = 4.323, p < .001; Figure 3c)". I suspect this effect might not be driven by the difference but by a correlation between pupil size and latency.

      The authors correlate differences in pupil size (Exp1) with saccade latencies (Exp2), I recommend correlating pupil size with the latency directly, in either task. This would show if it is actually the difference between choices or simply the pupil size of the respective individual option that is linked to latency/effort. Same for curvature.

      The reviewer raises a good point. Please see the previous analyses concerning the possible correlations between pupil size and saccade latency, and how they jointly predict saccade selection.

      Our data show that saccade curvature and latencies are linked with the difference in pupil size between the selected and non-selected options. Are these effects driven by a difference in pupil size or by the pupil size associated with the chosen option?

      To assess this, we fitted two linear mixed-effects models. We predicted saccade curvature and latency using pupil size (from the planning task) of the selected and non-selected options while controlling for the chosen direction (Wilkinson notation: saccade curvature/latency ~ selected pupil size + non-selected pupil size + obliqueness + vertical + horizontal + (1 + selected pupil size + non-selected pupil size | participant)). We found that saccades curved away more from costlier non-selected targets (β = 1.534, t = 8.151, p < .001), and saccades curved away from the non-selected target less when the selected target was cheaper (β = -2.571, t = -6.602, p < .001). As the costs of the selected and non-selected options show opposite effects on saccade curvature, this indicates that the difference between the two options drives oculomotor conflict.

      As for saccade latencies, we found saccade onsets to slow when the cost of the selected target was higher (b = .068, t = 2.844, p = .004). In contrast, saccade latencies were not significantly affected by the cost of the non-selected target (β = -.018, t = 1.457, p = .145), although numerically the effect was in the opposite direction. This shows that latencies were primarily driven by the cost of the selected target, but a difference account cannot be fully ruled out.

      Together, these analyses demonstrate that the difference in costs between two alternatives reliably affects oculomotor conflict as indicated by the curvature analysis. However, saccade latencies are predominantly affected by the cost of the selected target – even when controlling for the obliqueness, updownness and leftrightness of the ensuing saccade. We have added these analyses here for completeness, but because the findings seem inconclusive for saccade latency we have chosen to not include these analyses in the current paper. We are open to including these analyses in the supplementary materials if the reviewer and/or editor would like us to, but have chosen not to do so due to conciseness and to keep the paper focused.

      I was wondering why the authors haven't analyzed the pupil size in Experiment 2. If the pupil size can be assessed during a free viewing task (Experiment 3), shouldn't it be possible to also evaluate it in the saccade choice task?

      We did not analyze the pupil size data from the saccade preference task for two reasons. First, the number of saccades is much lower than in the natural search experiments (~14,000 vs. ~250,000). Second, in the saccade preference task, there were always two possible saccade targets. Therefore, even if we were able to isolate an effort signal, this signal could index a multitude of factors, such as deciding between two possible saccade targets (de Gee et al., 2014), and could reflect two oculomotor programs being realized instead of only a single one (Van der Stigchel, 2010).

      Discussion: "due to stronger presaccadic benefits for upward compared with downward saccades [93,94]". I think this should be the other way around.

      We thank the reviewer for pointing this out. We have corrected our mistake in the revised manuscript.

      Saccade latencies differ around the visual field; to account for that, results / pupil size should be (additionally) evaluated relative to saccade onset (rather than cue offset). It is interesting that latencies were not accounted for here (Exp1), since they are considered for Exp2 (where they correlate with a pupil size difference). I suspect that latencies not only correlate with the difference in pupil size, but directly with pupil size itself.

      We agree with the reviewer that locking the pupil size signal to saccade onset instead of cue offset may be informative. We included an analysis in the supporting information that investigates this (see Figure S1). The results of the analysis were conceptually identical.

      The reviewer writes that latencies were not accounted for in Experiment 1. Although saccade latency was not included in the final model reported in the paper, it was considered during AIC-based backward model selection. As saccade latency did not predict meaningful variance in pupil size, it was ultimately not included in the analysis as a predictor. For completeness, we here report the outcome of a linear mixed-effects model that does include saccade latency as a predictor. Here, saccade latencies did not predict pupil size (β = 1.859e-03, t = .138, p = .889). The asymmetry effects remained qualitatively unchanged: preparing oblique compared with cardinal saccades resulted in a larger pupil size (β = 7.635, t = 3.969, p < .001), and preparing downward compared with upward saccades also led to a larger pupil size (β = 3.344, t = 3.334, p = .003).

      In addition, we have included a new analysis in the supporting information that directly addresses this issue. We will reiterate the main results here:

      “To ascertain whether pupil size or other oculomotor metrics predict saccade preferences, we conducted a multiple regression analysis. We calculated average pupil size, saccade latency, landing precision and peak velocity maps across all 36 directions. The model, determined using AIC-based backward selection, included pupil size, latency and landing precision as predictors (Wilkinson notation: saccade preferences ~ pupil size + saccade latency + landing precision). The analysis revealed that pupil size (β = -42.853, t = 4.791, p < .001) and saccade latency (β = -.377, t = 2.106, p = .043) predicted saccade preferences. Landing precision did not reach significance (β = 23.631, t = 1.675, p = .104). Together, this demonstrates that although other oculomotor metrics such as saccade latency contribute to saccade selection, pupil size remains a robust marker of saccade selection.”
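      As a rough illustration of AIC-based backward selection, the sketch below starts from a full linear model and repeatedly drops the predictor whose removal lowers the AIC most, stopping when no removal helps. It is a pure-Python sketch under stated assumptions, not the analysis code used for the manuscript: the Gaussian AIC is computed up to an additive constant, all function names are hypothetical, and the data are synthetic and constructed so that `x2` carries no information about `y`.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols_aic(y, columns):
    """Fit y ~ 1 + columns by least squares; return the Gaussian AIC
    up to an additive constant: n * ln(RSS / n) + 2 * k."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(XtX, Xty)
    rss = sum((y[i] - sum(X[i][a] * beta[a] for a in range(p))) ** 2
              for i in range(n))
    return n * math.log(rss / n) + 2 * (p + 1)  # +1 for the error variance


def backward_select(y, predictors):
    """Drop predictors one at a time while doing so lowers the AIC."""
    kept = dict(predictors)
    best = ols_aic(y, list(kept.values()))
    while kept:
        trials = {name: ols_aic(y, [v for k, v in kept.items() if k != name])
                  for name in kept}
        drop = min(trials, key=trials.get)
        if trials[drop] >= best:
            break
        best = trials[drop]
        del kept[drop]
    return sorted(kept), best


# Synthetic data: y depends on x1 only; x2 and the noise term e are
# orthogonal to each other, to x1, and to the intercept.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0]
e = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
y = [2.0 * a + 0.5 * b for a, b in zip(x1, e)]

selected, aic = backward_select(y, {"x1": x1, "x2": x2})
```

      In this synthetic example the procedure removes `x2` and retains `x1`. Dedicated statistical software would normally be used for real data; the sketch only makes the selection logic explicit.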

      We have also added this point in our discussion:

      “We here measured cost as the degree of effort-linked pupil dilation. In addition to pupil size, other markers may also indicate saccade costs. For example, saccade latency has been proposed to index oculomotor effort [100], whereby saccades with longer latencies are associated with more oculomotor effort. This makes saccade latency a possible complementary marker of saccade costs (also see Supplementary Materials). Although relatively sluggish, pupil size is a valuable measure of attentional costs for (at least) two reasons. First, pupil size is a highly established marker of effort, and is sensitive to effort more broadly than only in the context of saccades [36–45, 48]. Pupil size therefore makes it possible to capture not only the costs of saccades, but also of covert attentional shifts [33], or shifts with other effectors such as head or arm movements [54, 101]. Second, as we have demonstrated, pupil size can measure saccade costs even when searching in natural scenes (Figure 4). During natural viewing, it is difficult to disentangle fixation duration from saccade latencies, complicating the use of saccade latency as a measure of saccade cost. Together, pupil size, saccade latency, and potential other markers of saccade cost could fulfill complementary roles in studying the role of cost in saccade selection.”

      References

      Alnæs, D., Sneve, M. H., Espeseth, T., Endestad, T., van de Pavert, S. H. P., & Laeng, B. (2014). Pupil size signals mental effort deployed during multiple object tracking and predicts brain activity in the dorsal attention network and the locus coeruleus. Journal of Vision, 14(4), 1. https://doi.org/10.1167/14.4.1

      Awh, E., Belopolsky, A. V., & Theeuwes, J. (2012). Top-down versus bottom-up attentional control: A failed theoretical dichotomy. Trends in Cognitive Sciences, 16(8), 437–443. https://doi.org/10.1016/j.tics.2012.06.010

      Ballard, D. H., Hayhoe, M. M., & Pelz, J. B. (1995). Memory Representations in Natural Tasks. Journal of Cognitive Neuroscience, 7(1), 66–80. https://doi.org/10.1162/jocn.1995.7.1.66

      Beatty, J. (1982). Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychological Bulletin, 91(2), 276–292. https://doi.org/10.1037/0033-2909.91.2.276

      Bumke, O. (1911). Die Pupillenstörungen bei Geistes-und Nervenkrankheiten (2nd ed.). Fischer.

      Curthoys, I. S., Markham, C. H., & Furuya, N. (1984). Direct projection of pause neurons to nystagmus-related excitatory burst neurons in the cat pontine reticular formation. Experimental Neurology, 83(2), 414–422. https://doi.org/10.1016/S0014-4886(84)90109-2

      David, L., Vassena, E., & Bijleveld, E. (2024). The unpleasantness of thinking: A meta-analytic review of the association between mental effort and negative affect. Psychological Bulletin, 150(9), 1070–1093. https://doi.org/10.1037/bul0000443

      de Gee, J. W., Knapen, T., & Donner, T. H. (2014). Decision-related pupil dilation reflects upcoming choice and individual bias. Proceedings of the National Academy of Sciences, 111(5), E618–E625. https://doi.org/10.1073/pnas.1317557111

      Deubel, H., & Schneider, W. X. (1996). Saccade target selection and object recognition: Evidence for a common attentional mechanism. Vision Research, 36(12), 1827–1837. https://doi.org/10.1016/0042-6989(95)00294-4

      Greenwood, J. A., Szinte, M., Sayim, B., & Cavanagh, P. (2017). Variations in crowding, saccadic precision, and spatial localization reveal the shared topology of spatial vision. Proceedings of the National Academy of Sciences, 114(17), E3573–E3582. https://doi.org/10.1073/pnas.1615504114

      Hanning, N. M., Himmelberg, M. M., & Carrasco, M. (2024). Presaccadic Attention Depends on Eye Movement Direction and Is Related to V1 Cortical Magnification. Journal of Neuroscience, 44(12). https://doi.org/10.1523/JNEUROSCI.1023-23.2023

      Himmelberg, M. M., Winawer, J., & Carrasco, M. (2023). Polar angle asymmetries in visual perception and neural architecture. Trends in Neurosciences, 46(6), 445–458. https://doi.org/10.1016/j.tins.2023.03.006

      Jepma, M., & Nieuwenhuis, S. (2011). Pupil Diameter Predicts Changes in the Exploration–Exploitation Trade-off: Evidence for the Adaptive Gain Theory. Journal of Cognitive Neuroscience, 23(7), 1587–1596. https://doi.org/10.1162/jocn.2010.21548

      Kahneman, D. (1973). Attention and Effort. Prentice-Hall.

      Kahneman, D., & Beatty, J. (1966). Pupil diameter and load on memory. Science (New York, N.Y.), 154(3756), 1583–1585. https://doi.org/10.1126/science.154.3756.1583

      King, W. M., & Fuchs, A. F. (1979). Reticular control of vertical saccadic eye movements by mesencephalic burst neurons. Journal of Neurophysiology, 42(3), 861–876. https://doi.org/10.1152/jn.1979.42.3.861

      Koevoet, D., Strauch, C., Naber, M., & Van der Stigchel, S. (2023). The Costs of Paying Overt and Covert Attention Assessed With Pupillometry. Psychological Science, 34(8), 887–898. https://doi.org/10.1177/09567976231179378

      Koevoet, D., Strauch, C., Van der Stigchel, S., Mathôt, S., & Naber, M. (2024). Revealing visual working memory operations with pupillometry: Encoding, maintenance, and prioritization. WIREs Cognitive Science, e1668. https://doi.org/10.1002/wcs.1668

      Kowler, E., Anderson, E., Dosher, B., & Blaser, E. (1995). The role of attention in the programming of saccades. Vision Research, 35(13), 1897–1916. https://doi.org/10.1016/0042-6989(94)00279-U

      Laeng, B., Sirois, S., & Gredebäck, G. (2012). Pupillometry: A Window to the Preconscious? Perspectives on Psychological Science, 7(1), 18–27. https://doi.org/10.1177/1745691611427305

      Loewenfeld, I. E. (1958). Mechanisms of reflex dilatation of the pupil. Documenta Ophthalmologica, 12(1), 185–448. https://doi.org/10.1007/BF00913471

      Mathôt, S. (2018). Pupillometry: Psychology, Physiology, and Function. Journal of Cognition, 1(1), 16. https://doi.org/10.5334/joc.18

      Naber, M., & Murphy, P. (2020). Pupillometric investigation into the speed-accuracy trade-off in a visuomotor aiming task. Psychophysiology, 57(3), e13499. https://doi.org/10.1111/psyp.13499

      Nozari, N., & Martin, R. C. (2024). Is working memory domain-general or domain-specific? Trends in Cognitive Sciences, 0(0). https://doi.org/10.1016/j.tics.2024.06.006

      Reppert, T. R., Heitz, R. P., & Schall, J. D. (2023). Neural mechanisms for executive control of speed-accuracy trade-off. Cell Reports, 42(11). https://doi.org/10.1016/j.celrep.2023.113422

      Richer, F., & Beatty, J. (1985). Pupillary Dilations in Movement Preparation and Execution. Psychophysiology, 22(2), 204–207. https://doi.org/10.1111/j.1469-8986.1985.tb01587.x

      Robison, M. K., & Brewer, G. A. (2020). Individual differences in working memory capacity and the regulation of arousal. Attention, Perception, & Psychophysics, 82(7), 3273–3290. https://doi.org/10.3758/s13414-020-02077-0

      Robison, M. K., & Unsworth, N. (2019). Pupillometry tracks fluctuations in working memory performance. Attention, Perception, & Psychophysics, 81(2), 407–419. https://doi.org/10.3758/s13414-018-1618-4

      Sahakian, A., Gayet, S., Paffen, C. L. E., & Van der Stigchel, S. (2023). Mountains of memory in a sea of uncertainty: Sampling the external world despite useful information in visual working memory. Cognition, 234, 105381. https://doi.org/10.1016/j.cognition.2023.105381

      Shadmehr, R., Reppert, T. R., Summerside, E. M., Yoon, T., & Ahmed, A. A. (2019). Movement Vigor as a Reflection of Subjective Economic Utility. Trends in Neurosciences, 42(5), 323–336. https://doi.org/10.1016/j.tins.2019.02.003

      Silva, M. F., Brascamp, J. W., Ferreira, S., Castelo-Branco, M., Dumoulin, S. O., & Harvey, B. M. (2018). Radial asymmetries in population receptive field size and cortical magnification factor in early visual cortex. NeuroImage, 167, 41–52. https://doi.org/10.1016/j.neuroimage.2017.11.021

      Sirois, S., & Brisson, J. (2014). Pupillometry. WIREs Cognitive Science, 5(6), 679–692. https://doi.org/10.1002/wcs.1323

      Sparks, D. L. (2002). The brainstem control of saccadic eye movements. Nature Reviews Neuroscience, 3(12), Article 12. https://doi.org/10.1038/nrn986

      Strauch, C., Wang, C.-A., Einhäuser, W., Van der Stigchel, S., & Naber, M. (2022). Pupillometry as an integrated readout of distinct attentional networks. Trends in Neurosciences, 45(8), 635–647. https://doi.org/10.1016/j.tins.2022.05.003

      Unsworth, N., & Miller, A. L. (2021). Individual Differences in the Intensity and Consistency of Attention. Current Directions in Psychological Science, 30(5), 391–400. https://doi.org/10.1177/09637214211030266

      Van der Stigchel, S. (2010). Recent advances in the study of saccade trajectory deviations. Vision Research, 50(17), 1619–1627. https://doi.org/10.1016/j.visres.2010.05.028

      Van der Stigchel, S. (2020). An embodied account of visual working memory. Visual Cognition, 28(5–8), 414–419. https://doi.org/10.1080/13506285.2020.1742827

      Van der Stigchel, S., & Hollingworth, A. (2018). Visuospatial Working Memory as a Fundamental Component of the Eye Movement System. Current Directions in Psychological Science, 27(2), 136–143. https://doi.org/10.1177/0963721417741710

      van der Wel, P., & van Steenbergen, H. (2018). Pupil dilation as an index of effort in cognitive control tasks: A review. Psychonomic Bulletin & Review, 25(6), 2005–2015. https://doi.org/10.3758/s13423-018-1432-y

    1. eLife Assessment

      This manuscript presents valuable findings showing that rapamycin directly activates the cool-sensing ion channel, TRPM8, acting through a different binding site than other small-molecule cooling agents such as menthol. The use of Ca2+-imaging, electrophysiology, and computational biology provides solid evidence to support the finding. The authors also present a novel NMR-based method to help identify details of the binding site interactions. In this revised version, some analysis and the presentation have been corrected and improved. Their findings provide insights into TRP channel pharmacology and may indicate previously unknown physiological effects or therapeutic mechanisms of the immunosuppressant, rapamycin.

    2. Reviewer #1 (Public review):

      Summary:

      In this valuable study, the authors found that the macrolide drug rapamycin, which is an important pharmacological tool in the clinic and the research lab, is less specific than previously thought. They provide solid functional evidence that rapamycin activates TRPM8 and begin to develop an NMR method to measure the specific binding of a ligand to a membrane protein.

      Strengths:

      The authors use a variety of complementary experimental techniques in several different systems, and their results support the conclusions drawn.

      Weaknesses:

      The proposed location of the rapamycin binding pocket within the membrane means that molecular docking approaches designed for soluble proteins alone do not provide solid evidence for a rapamycin binding pocket location in TRPM8, but the authors are appropriately careful in stating that the model is consistent with their functional experiments. The novel STTD method is intriguing and supportive of the functional results and docking predictions, but further validation of this method is needed.

      Impact:

      This work provides still more evidence for the polymodality of TRP channels, reminding both TRP channel researchers and those who use rapamycin in other contexts that the adjective "specific" is only meaningful in the context of what else has been explicitly tested.

      Comments on revisions:

      The authors have addressed my major concerns from the previous round of revision, and I agree that those things that remain un-done are outside the scope of this manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      Tóth and Bazeli et al. find rapamycin activates heterologously-expressed TRPM8 and dissociated sensory neurons in a TRPM8-dependent way with Ca2+-imaging. With electrophysiology and STTD-NMR, they confirmed the activation is through direct interaction with TRPM8. Using mutants and computational modeling, the authors localized the binding site to the groove between S4 and S5, different than the binding pocket of cooling agents such as menthol. The hydroxyl group on carbon 40 within the cyclohexane ring in rapamycin is indispensable for activation, while other rapalogs with its replacement, such as everolimus, still bind but cannot activate TRPM8. Overall, the findings provide new insights into TRPM8 functions and may indicate previously-unknown physiological effects or therapeutic mechanisms of rapamycin.

      Strengths:

      The authors spent extensive effort on demonstrating that the interaction between TRPM8 and rapamycin is direct. The evidence is solid. In probing the binding site and the structural-function relationship, the authors combined computational simulation and functional experiments. It is very impressive to see that "within" a rapamycin molecule, the portion shared with everolimus is for "binding", while the hydroxyl group in the cyclohexane ring is for activation. Such detailed dissection represents a successful trial in computational biology-facilitated, functional experiment-validated study of TRP channel structural-activity relationship. The research draws the attention of scientists, including those outside the TRP channel field, to previously-neglected effects of rapamycin, and therefore the manuscript deserves broad readership.

      Weaknesses:

      The significance of the research could be improved by showing or discussing whether a similar binding pocket is present in other TRP channels, and hence rapalogs might bind to or activate these TRP channels. Additionally, while the finding on TRPM8 is novel, it is worthwhile to perform more comprehensive pharmacological characterization, including single-channel recording and a few more mutant studies to offer further insight into the mechanism of rapamycin binding to S4~S5 pocket driving channel opening. It is also necessary to know if rapalogs have independent or synergistic effects on top of other activators, including cooling agents and lower temperature, and its dependence on regulators such as PIP2.

      Additional discussion that might be helpful:

      The authors did confirm that rapamycin does not activate TRPV1, TRPA1 and TRPM3. But other TRP channels, particularly other structurally-similar TRPM channels, should be discussed or tested. Alignment of the amino acid sequences or structures at the predicted binding pocket might predict some possible outcomes. In particular, rapamycin is known to activate TRPML1 in a PI(3,5)P2-dependent manner, which should be highlighted in comparison among TRP channels (PMID: 35131932, 31112550).

      After revision:

      I acknowledge that the authors have addressed some of the questions in their revised version. They have explained that additional experiments might be beyond the scope of the current study. I appreciate their effort in doing their best to improve the manuscript and to leave the rest in discussion.

    4. Reviewer #3 (Public review):

      Summary:

      Rapamycin is a macrolide of immunologic therapeutic importance, proposed as a ligand of mTOR. It is also employed in assays to probe protein-protein interactions. The authors serendipitously found that the drug rapamycin, and some related compounds, potently activate the cationic channel TRPM8, which is the main mediator of cold sensation in mammals. The authors show that rapamycin might bind to a novel binding site that is different from the binding site for menthol, the prototypical activator of TRPM8. These convincing results are important to a wide audience, since rapamycin is a widely used drug and is also employed in assays to probe protein-protein interactions, which could be affected by potential specific interactions of rapamycin with other membrane proteins, as illustrated herein.

      Strengths:

      The authors employ several experimental approaches to convincingly show that rapamycin activates directly the TRPM8 cation channel and not an accessory protein or the surrounding membrane. In general, the electrophysiological, mutational and fluorescence imaging experiments are adequately carried out and cautiously interpreted, presenting a clear picture of the direct interaction with TRPM8. In particular, the authors convincingly show that the interactions of rapamycin with TRPM8 are distinct from interactions of menthol with the same ion channel.

      Weaknesses:

      The main weakness of the manuscript was the NMR method employed to show that rapamycin binds to TRPM8. The authors developed and deployed a novel signal processing approach based on subtraction of several independent NMR spectra to show that rapamycin binds to the TRPM8 protein and not to the surrounding membrane or other proteins. In this revised version the authors have strengthened the evidence that the method gives solid results and have improved the clarity of the presentation.

      Comments on revisions:

      The authors have greatly improved the quality of the presentation of the NMR data and have answered my concerns regarding the new methodology. The manuscript is improved and represents an important contribution.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this valuable study, the authors found that the macrolide drug rapamycin, which is an important pharmacological tool in the clinic and the research lab, is less specific than previously thought. They provide solid functional evidence that rapamycin activates TRPM8 and develop an NMR method to measure the specific binding of a ligand to a membrane protein.

      Strengths:

      The authors use a variety of complementary experimental techniques in several different systems, and their results support the conclusions drawn.

      Weaknesses:

      Controls are not shown in all cases, and a lack of unity across the figures makes the flow of the paper disjointed. The proposed location of the rapamycin binding pocket within the membrane means that molecular docking approaches designed for soluble proteins alone do not provide solid evidence for a rapamycin binding pocket location in TRPM8, but the authors are appropriately careful in stating that the model is consistent with their functional experiments.

      Impact:

      This work provides still more evidence for the polymodality of TRP channels, reminding both TRP channel researchers and those who use rapamycin in other contexts that the adjective "specific" is only meaningful in the context of what else has been explicitly tested.

      Reviewer #2 (Public Review):

      Summary:

      Tóth and Bazeli et al. find rapamycin activates heterologously-expressed TRPM8 and dissociated sensory neurons in a TRPM8-dependent way with Ca2+-imaging. With electrophysiology and STTD-NMR, they confirmed the activation is through direct interaction with TRPM8. Using mutants and computational modeling, the authors localized the binding site to the groove between S4 and S5, different than the binding pocket of cooling agents such as menthol. The hydroxyl group on carbon 40 within the cyclohexane ring in rapamycin is indispensable for activation, while other rapalogs with its replacement, such as everolimus, still bind but cannot activate TRPM8. Overall, the findings provide new insights into TRPM8 functions and may indicate previously unknown physiological effects or therapeutic mechanisms of rapamycin.

      Strengths:

      The authors spent extensive effort on demonstrating that the interaction between TRPM8 and rapamycin is direct. The evidence is solid. In probing the binding site and the structural-function relationship, the authors combined computational simulation and functional experiments. It is very impressive to see that "within" a rapamycin molecule, the portion shared with everolimus is for "binding", while the hydroxyl group in the cyclohexane ring is for activation. Such detailed dissection represents a successful trial in the computational biology-facilitated, functional experiment-validated study of TRP channel structural-activity relationship. The research draws the attention of scientists, including those outside the TRP channel field, to previously neglected effects of rapamycin, and therefore the manuscript deserves broad readership.

      Weaknesses:

      The significance of the research could be improved by showing or discussing whether a similar binding pocket is present in other TRP channels, and hence rapalogs might bind to or activate these TRP channels. Additionally, while the finding on TRPM8 is novel, it is worthwhile to perform more comprehensive pharmacological characterization, including single-channel recording and a few more mutant studies to offer further insight into the mechanism of rapamycin binding to S4~S5 pocket driving channel opening. It is also necessary to know if rapalogs have independent or synergistic effects on top of other activators, including cooling agents and lower temperature, and their dependence on regulators such as PIP2.

      Additional discussion that might be helpful:

      The authors did confirm that rapamycin does not activate TRPV1, TRPA1 and TRPM3. But other TRP channels, particularly other structurally similar TRPM channels, should be discussed or tested. Alignment of the amino acid sequences or structures at the predicted binding pocket might predict some possible outcomes. In particular, rapamycin is known to activate TRPML1 in a PI(3,5)P2-dependent manner, which should be highlighted in comparison among TRP channels (PMID: 35131932, 31112550).

      Reviewer #3 (Public Review):

      Summary:

      Rapamycin is a macrolide of immunologic therapeutic importance, proposed as a ligand of mTOR. It is also employed in assays to probe protein-protein interactions.

      The authors serendipitously found that the drug rapamycin and some related compounds potently activate the cationic channel TRPM8, which is the main mediator of cold sensation in mammals. The authors show that rapamycin might bind to a novel binding site that is different from the binding site for menthol, the prototypical activator of TRPM8. These solid results are important to a wide audience since rapamycin is a widely used drug and is also employed in assays to probe protein-protein interactions, which could be affected by potential specific interactions of rapamycin with other membrane proteins, as illustrated herein.

      Strengths:

      The authors employ several experimental approaches to convincingly show that rapamycin directly activates the TRPM8 cation channel and not an accessory protein or the surrounding membrane. In general, the electrophysiological, mutational and fluorescence imaging experiments are adequately carried out and cautiously interpreted, presenting a clear picture of the direct interaction with TRPM8. In particular, the authors convincingly show that the interactions of rapamycin with TRPM8 are distinct from interactions of menthol with the same ion channel.

      Weaknesses:

      The main weakness of the manuscript is the NMR method employed to show that rapamycin binds to TRPM8. The authors developed and deployed a novel signal processing approach based on subtraction of several independent NMR spectra to show that rapamycin binds to the TRPM8 protein and not to the surrounding membrane or other proteins. While interesting and potentially useful, the method is not well developed (several positive controls are missing) and is not presented in a clear manner, such that the quality of data can be assessed and the reliability and pertinence of the subtraction procedure evaluated.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points

      (1) Given the novelty of the STTD NMR approach, please provide more details and data for the reader.

      • I would like to see all of the collected spectra so that readers can see and judge the effect sizes for themselves, perhaps as an additional supplementary figure.

      We agree with the reviewer that the data transparency of the NMR measurements should be improved. We changed panel C of Figure 2 in the main text and provided all the STD and the computed STDD and STTD spectra recorded on one set of experiments. We carried out additional experimental replicas on new samples and addressed the variability of cell samples by rescaling the STD effects based on reference <sup>1</sup>H measurements. We provided supplementary spectra of the reference experiments without saturation (Figure S5) and the obtained STTD spectra from the three parallel NMR sessions (Figure S6).

      • I appreciate the labels for STDD-1, STDD-2, and STTD on the lower two spectra of Figure 2C. Is the top spectrum from STD-1 or is it prior to saturation? In Figure 2C, what do the x1 and x2 notations on the right-hand side of the spectra indicate?

      We showed the top spectrum as an overview and a demonstration of the spectral complexity of the samples. <sup>1</sup>H experiments were run before the STD measurements to assess the sample quality and stability. The demonstrated spectrum on sample 1 (TRPM8 with rapamycin in HEK cells) was recorded with more transients than the corresponding STDs, thus it is only visually comparable with the difference spectra after scaling (2x). Figure 2 was changed and all the spectra were replaced as mentioned before. All the recorded <sup>1</sup>H-experiments without saturation including the one removed are now available in the supplementary information (Figure S5).

      • The STTD NMR results with WT TRPM8 are consistent with rapamycin binding directly to the channel. Testing whether rapamycin binding observed with STTD NMR is disrupted by one of the most compelling mutations (D796A, D802A, G805A, or Q861A) would be a further test of this direct interaction.

      We thank the reviewer for the suggestion and agree that testing the most compelling mutants would be a promising next step. These mutations were generated in plasmid vectors and only transiently transfected into HEK cells. For NMR analysis we would need a large number of cells stably overexpressing the mutant channels, and such cells were not available for experimentation.

      • Given that this is not a methods paper, it is probably outside the scope to further validate the STTD NMR measurements by performing parallel ITC, SPR, MST, or radiolabeled ligand experiments. Nevertheless, I would be excited to see such a comparison since STTD NMR appears to have promise as an experimental technique for assessing ligand binding to membrane proteins that does not require large amounts of purified protein or radioactive isotopes.

      We agree with the reviewer that additional independent biophysical measurements on the interactions are necessary to further validate the STTD methodology. This paper is a preliminary demonstration of the STTD concept and our group is currently working on the challenges of on-cell NMR (e.g., sample and spectral complexity) and the standardization of the proposed workflow.     

      (2) Please clarify the methods used to model rapamycin binding. Docking can be imprecise in TRP channels, even with a sophisticated docking scheme (Hughes et al., 2019, doi: https://doi.org/10.7554/eLife.49572.001).

      Thank you for mentioning this point and providing the reference. We have further clarified our methods and included the reference in our discussion, indicating the limitations of our approach.

      • As a positive control, does the docking strategy accurately predict binding of known compounds (menthol, icilin, etc.) to TRPM8 consistent with cryo-EM structures?  

      Yes, the binding site for menthol, based on a similar docking strategy as for rapamycin, is also presented, and matches with predictions from other publications. This is now clarified in the revised manuscript.

      • Why was homology modeling to the human sequence used with the mouse structure but not the avian structure?  

      At the onset of the project, only the avian structure was available, and it was used in the primary docking. Later, to obtain more precise docking relevant to human TRPM8 pharmacology, we switched to the then-available structure of the mouse ortholog.

      • How many rapamycin structural clusters were built, and how many structures were there in each cluster? How many were used? "most populated" is unspecific.  

      Thank you for your comment. We have added the following highlighted information to the methods section to address your comment:

      “Representative conformations of rapamycin were identified by clustering of the 1000-membered pools, having the macrocycle backbone atoms compared with 1.0 Å RMSD cut-off. Middle structures of the ten most populated clusters, accounting for more than 90% of the total conformational ensemble generated by simulated annealing, were used for further docking studies. To refine initial docking results and to identify plausible binding sites, the above selected rapamycin structures were docked again, following the same protocol as above, except for the grid spacing which was set to 0.375 Å in the second pass. The resultant rapamycin-TRPM8 complexes were, again, clustered and ranked according to the corresponding binding free energies. Selected binding poses were subjected to further refinement. The three most populated and plausible binding poses were further refined by a third pass of docking, where amino acid side chains of TRPM8, identified in the previous pass to be in close contact with rapamycin (< 4 Å), were kept flexible. Grid volumes were reduced to these putative binding sites including all flexible amino acid side chains (21.0-26.2 Å x 26.2-31.5 Å x 24.8-29.2 Å).”

      However, it is important to clarify that the clusters are not built and their number is not specified by the user. The number of clusters found depends on how similar the structures are in the structural ensemble analyzed by clustering. A high number of clusters indicates a diverse, whereas a low number suggests a uniform structural ensemble. Furthermore, it is arbitrarily controlled by the similarity cutoff specified by the user. If the cutoff is selected well, then the number of structures is different in each cluster. There are some highly populated clusters and a few which only have one structure. The selection of how many cluster representatives are used is usually based on the decision of whether or not the sum of the population of selected clusters sufficiently covers the mapped conformational space.
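
      The selection rule described above (keep the most populated clusters until their combined population covers a chosen share of the ensemble) can be sketched as follows. This is only an illustration: the function name, the 0.90 threshold argument, and the cluster sizes are hypothetical and are not taken from the authors' workflow or data.

```python
def select_clusters(populations, coverage=0.90):
    """Pick the most populated clusters until their combined population
    reaches the requested fraction of the whole ensemble.

    populations: number of structures in each cluster (hypothetical input).
    Returns (indices of chosen clusters, fraction of ensemble covered).
    """
    total = sum(populations)
    if total <= 0:
        raise ValueError("ensemble must contain at least one structure")
    # Rank cluster indices from most to least populated.
    order = sorted(range(len(populations)),
                   key=lambda i: populations[i], reverse=True)
    chosen, running = [], 0
    for idx in order:
        chosen.append(idx)
        running += populations[idx]
        if running / total >= coverage:
            break
    return chosen, running / total


# Made-up cluster sizes for a 1000-member pool (illustrative only).
pops = [400, 250, 150, 80, 50, 30, 20, 10, 5, 3, 1, 1]
chosen, covered = select_clusters(pops, coverage=0.90)
print(chosen, round(covered, 2))
```

      With these made-up sizes the first four clusters cover 88% of the pool, so a fifth is needed to pass the threshold. How many representatives are retained therefore depends on both the RMSD cutoff (which sets the cluster sizes) and the coverage target.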

      • Additionally, the rapamycin poses were generated using a continuum solvent model that is unlikely to replicate the conditions existing in the lipid bilayer or in a lipid-exposed binding pocket as is predicted here. It is therefore possible that the rapamycin poses chosen for docking do not represent the physiological rapamycin binding pose, hampering the ability of the docking algorithm to find an appropriate docking pocket.  

      • Furthermore, accurately docking ligands that may bind to membrane-exposed pockets is a challenging problem, particularly because many scoring algorithms, including those employed by Autodock, do not distinguish between solvent-exposed and membrane-exposed faces of the protein. This affects the predicted binding energies.

      We appreciate the reviewer's insightful comments. We have added a note to the Discussion mentioning these important limitations.

      • In Figure 4, it appears that the proposed rapamycin binding pocket is located at the interface between two subunits, but only one is shown. Is there any contact with residues in the neighboring subunit? Based on Figure S4, I assume not, but am unsure.

      Based on the estimated distances, we do not think that there are any relevant interactions with residues from neighboring subunits. This is now indicated in the results section.

      • Consider uploading the rapamycin-docked model to a public repository such as Zenodo for readers to examine and manipulate themselves  

      As suggested, the model will be uploaded in a public repository. A link to the file on Zenodo is now included.

      (3) Please discuss the spatial location of the proposed rapamycin binding pocket relative to the vanilloid binding pocket in TRPV1.

      • The mutagenesis indicates that D745, D802, G805, and Q861 are most important for rapamycin sensitivity in TRPM8. Interestingly, the proposed rapamycin binding pocket appears to overlap spatially with the vanilloid binding pocket in TRPV1. Consistent with this, Q861 aligns with E570 in TRPV1, which is a critical residue for resiniferatoxin sensitivity. Indeed, similar to Q861's modeled proximity to the cyclohexyl ring, the hydroxyl group of the vanillyl moiety of capsaicin (4DY in 7LR0, for example) is in proximity to E570 in TRPV1. Additionally, searching PubChem by structural similarity suggests that the vanillyl head groups of the TRP channel modulators capsaicin and eugenol are structurally similar to the trans-2-methoxycyclohexan-1-ol ring. Without overlaying the two structures myself, it is difficult to say more than that, but I encourage the authors to comment on any similarities and differences they observe.

      • If the proposed rapamycin pocket is indeed similar to the location of the vanilloid binding site, the authors may wish to discuss other TRPM channel structures that show ligands and lipids bound to this pocket because this provides evidence that this pocket influences TRPM channel function. For example, how does the proposed rapamycin binding pocket compare to TRPM8 bound to agonist AITC (PDBID 8e4l), TRPM5 bound to inhibitor NDNA (7mbv), and TRPM2 bound to phosphatidylcholine (6co7)?

      • Other TRP channel structures with ligands or lipids modeled in this region include TRPV1 bound to resiniferatoxin, capsaicin, or phosphatidylinositol (7l2j, 7l24, 7l2s, 7l2t, 7l2u, 7lp9, 7lpc, 7lqy, 7mz6, 7mz9, 7mza); TRPV3 bound to phosphatidylcholine (7mij, 7mik, 7mim, 7min, 7ugg); TRPV5 bound to econazole (6b5v) or ZINC9155 (6pbf); TRPV6 bound to piperazine (7d2k, 7k4b, 7k4c, 7k4d, 7k4e, 7k4f) or cholesterol hemisuccinate (7s8c); TRPC6 bound to BTDM (7dxf) or phosphatidylcholine (6uza); and TRP1 bound to PIP2 (6pw5).

      We thank the reviewer for these valuable insights. We have included some additional discussion highlighting the similarities between the proposed rapamycin binding site and some of the other ligand-channel interactions in the TRP superfamily, in particular the well-known vanilloid binding site in TRPV1. However, to keep the discussion focused, we have not fully discussed all the indicated interactions, to best serve the clarity and scope of the manuscript.

      (4) I would like to see negative control calcium imaging and electrophysiology data with untransfected HEK cells to confirm that the observed activation is mediated by TRPM8 to parallel the TRPM8 KO sensory neuron experiments.  

      This important information is now included in the revised manuscript (Figure S2).

      (5) The DM-nitrophen Ca uncaging experiments are an interesting method to test Ca sensitivity of rapamycin, but the results make these experiments more complex to interpret. Ca has been shown to be an obligate cofactor for icilin sensitivity in TRPM8 under conditions where both the internal and external Ca concentrations are tightly controlled (Kuhn et al., 2009, doi: https://doi.org/10.1074/jbc.M806651200), which is necessary because TRPM8 allows Ca permeation through the pore when open. The large icilin-evoked currents in Figure 5A and 5B indicate that the effective intracellular calcium concentration is not zero prior to calcium uncaging, which may be high enough to mask any Ca-dependence of rapamycin that occurs at low Ca concentrations. Given this ambiguity, the inside-out patch clamp configuration would provide more control over the internal and external Ca concentration than is achieved in the Ca uncaging experiments. Because the authors have already demonstrated their ability to perform such experiments (Figure 2 panel B), it would be nice to see tests of Ca dependence using inside-out patch clamp.

      As was already shown in Figure 2, rapamycin activates TRPM8 in inside-out patches, and these experiments were performed using calcium-free cytosolic and extracellular solutions. Note that earlier studies have already shown that icilin activates outward TRPM8 currents in the full absence of calcium (see e.g. Janssens et al., eLife, 2016; Chuang et al., 2004). In the case of icilin, increased calcium further potentiates the current, which is more prominent for the inward current.

      In the Ca uncaging experiments, considering the Kd of DM-nitrophen of 5 nM, we expect that the intracellular calcium concentration before the UV flash would be approximately 15 nM. Taken together, both the inside-out experiments and the flash uncaging experiments confirm that rapamycin responses are not directly regulated by intracellular calcium, contrary to icilin.
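
      The link between the quoted Kd (5 nM) and the estimated resting concentration (approximately 15 nM) can be illustrated with a simple 1:1 buffer equilibrium, where free Ca = Kd × (bound chelator / free chelator). The sketch below is an assumption-laden illustration: the 75% loading fraction is not stated in the response and is chosen here only because it reproduces the quoted 15 nM, and the function name is hypothetical.

```python
def free_calcium_nM(kd_nM, fraction_loaded):
    """Free Ca2+ (nM) in equilibrium with a 1:1 chelator.

    kd_nM: dissociation constant of the chelator for Ca2+.
    fraction_loaded: share of chelator molecules carrying Ca2+ (0 <= f < 1).
    From Kd = [Ca][B]/[CaB] it follows that [Ca] = Kd * f / (1 - f).
    """
    if not 0.0 <= fraction_loaded < 1.0:
        raise ValueError("fraction_loaded must be in [0, 1)")
    return kd_nM * fraction_loaded / (1.0 - fraction_loaded)


# Kd from the text; the 75% loading is an assumed, illustrative value.
resting_ca = free_calcium_nM(kd_nM=5.0, fraction_loaded=0.75)
print(f"{resting_ca:.1f} nM")
```

      Under this simplified model the free concentration scales with the bound-to-free chelator ratio, so a cage loaded to about three quarters sits at roughly three times its Kd.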

      (6) Sequence conservation within TRPM channels could be used in combination with the binding pocket model and mutagenesis to predict rapamycin selectivity for TRPM8 over other TRPMs. For example, some important residues, specifically G805 and Q861, are not conserved in TRPM3, which agrees with the lack of rapamycin sensitivity observed in TRPM3 (Figure S1). Further sequence comparison would provide testable hypotheses for future exploration of rapamycin sensitivity in other TRPMs that could validate the proposed binding pocket.

      Thank you for the suggestion. We now indicate in the discussion that only some of the key residues are conserved and make suggestions for future studies.  

      (7) Please unify the color scheme across the figures to improve clarity.

      • The authors frequently use the colors blue, red, and green to represent menthol and rapamycin in the figures, but they are inconsistent in which one represents menthol and which represents rapamycin. It would be clearer for the audience if, for example, rapamycin is always represented with red and menthol is always represented with blue.  

      Thank you for pointing this out. We have made the coloring schemes more uniform.

      • In Figure 1, panel E, the coloring for Menthol and Pregnenolone Sulfate changes between the TRPM8+/+ and TRPM8-/- panels.  

      Thank you for pointing this out. We have updated the coloring schemes to ensure consistency between the TRPM8+/+ and TRPM8-/- panels.

      • Figure 3 B and E, perhaps color the plot background as a 3-color gradient (blue to white to red) rather than yellow and aqua. Center the white at the WT ratio, keeping the dashed line, with diverging gradients to, for example, blue for mutations that selectively affect menthol sensitivity and red for rapamycin.

      Thank you for the suggestion – we have changed the figure accordingly.  

      • Figure 4 panels A and B use the same color (green) to show two different things (menthol molecule and mutated residues that affect rapamycin sensitivity). It would be clearer for readers to change these colors to agree with a unified color scheme such that, for example, the menthol molecule is colored blue and the rapamycin-neighboring residues are colored red.

      Thank you for the suggestion. We have updated the figure to use a unified color scheme, with the menthol molecule now colored green and the rapamycin-neighboring residues colored cyan, to enhance clarity for readers.

      • I recommend adding a figure or panel that shows side chains for all mutations, colored by menthol/rapamycin selectivity, as indicated by the functional data in Figure 3B and 3E. This will highlight spatial patterns of the selective residues that are discussed in the text.

      Thank you for your suggestion, we added all the side residues in Figure S10.

      Minor points

      (1) It would be nice to have one more concentration data point in the middle of the dose response curve shown in Figure 1 panel B. The response is not saturating at the top or foot of the curve in Figure 1 panel D, precluding a confident fit to a two-state Boltzmann function.

      Instead of adding a single data point to this figure, we performed independent measurements on a plate reader system, comparing concentration responses at room temperature and 37 degrees. These data are now included as Figure S1.   

      (2) The cartoon in Figure 2 panel B should be made more accurate. For example, only the transmembrane helices should be depicted embedded in the membrane, not the whole protein including the intracellular domain. Because the experiment was performed with cells, change the orientation of TRPM8 in the cartoon to show the intracellular domain of the protein facing away from the extracellular side of the membrane where the rapamycin is applied.

      Thank you for this comment. We have corrected the cartoon accordingly.

      (3) Perhaps put the yellow circles under or around the carbon atoms to which the identified hydrogen atoms belong in Figure 2 panel E and Figure 4 panel C. I found it difficult to visualize and compare the STTD NMR results with the predicted binding pocket.

      Thank you for the feedback. We have added yellow circles around the carbon atoms corresponding to the identified hydrogen atoms in Figure S9.  

      (4) Regarding the sentence on p. 12 beginning "In agreement with this notion..."

      • Include icilin, Cooling Agent-10, and WS-3 as other cooling agents whose sensitivity has been modulated by mutation of Y745

      • Cryosim-3 responses were not tested in either of the two papers cited; please add citation to Yin et al., 2022, doi: https://doi.org/10.1126/science.add1268 .

      • Other relevant papers include:

      – Malkia et al., 2009, doi: https://doi.org/10.1186/1744-8069-5-62 which includes molecular docking showing the hydroxyl group of menthol interacting with Y745

      – Beccari et al., 2017, doi: https://doi.org/10.1038/s41598-017-11194-0 Figure 5 shows disruption of icilin and Cooling Agent-10 sensitivity by Y745A

      – Palchevskyi et al., 2023, doi: https://doi.org/10.1038/s42003-023-05425-6 Figure 3 shows disruption of icilin, cooling agent-10, WS-3, and menthol sensitivity by Y745A

      – Plaza-Cayon et al., 2022, https://doi.org/10.1002%2Fmed.21920 Review of TRPM8 mutations

      • typo: Y754H should be Y745H

      Thank you for these suggestions. We have added the above references to the text and corrected the typo.

      (5) The authors use the competitive action of everolimus on rapamycin activation as evidence that the different macrolides are binding to the same binding pocket. In addition, prior work showed that Y745H and N799A mutations (which render TRPM8 insensitive to menthol and icilin, respectively) do not affect TRPM8 sensitivity to the structurally-related compound tacrolimus (Arcas et al., 2019). This is consistent with the docking and mutagenesis results presented here.

      Thank you for this valuable suggestion. We discuss these data in the revised version.

      (6) Rapamycin sensitivity has also been observed in TRPML1 (Zhang et al. 2019, doi: https://doi.org/10.1371/journal.pbio.3000252).

      We added a short reference to this interesting finding in the discussion.

      (7) The whole-cell currents are very large in several of the electrophysiology experiments (for example Figure 3 panel D and Figure S1), which could lead to artifacts of voltage errors as well as ion accumulation/depletion. However, because this paper is not relying on reversal potential measurements or trying to quantify V1/2, these errors are unlikely to affect the qualitative conclusions drawn.

      This is a fair point, but indeed unlikely to affect our main conclusions. Note that we compensated between 70 and 90% of the series resistance, so we don’t expect voltage errors exceeding ~10 mV.
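
      The reasoning behind this estimate is that the residual voltage error equals the current multiplied by the uncompensated part of the series resistance. A minimal sketch follows; the current amplitude (8 nA) and series resistance (4 MΩ) are hypothetical values chosen for illustration and are not reported in the response.

```python
def voltage_error_mV(current_nA, series_resistance_MOhm, compensation):
    """Residual voltage error after series-resistance compensation.

    current_nA * series_resistance_MOhm gives millivolts directly
    (1 nA * 1 MOhm = 1 mV); compensation is the compensated fraction (0..1).
    """
    if not 0.0 <= compensation <= 1.0:
        raise ValueError("compensation must be between 0 and 1")
    return current_nA * series_resistance_MOhm * (1.0 - compensation)


# Hypothetical recording: 8 nA whole-cell current, 4 MOhm series resistance.
for comp in (0.0, 0.7, 0.9):
    err = voltage_error_mV(8.0, 4.0, comp)
    print(f"{comp:.0%} compensation -> {err:.1f} mV")
```

      For these assumed values the uncompensated error would be 32 mV, falling to roughly 10 mV at 70% compensation and about 3 mV at 90%. Larger currents or higher series resistance would increase the error proportionally.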

      (8) Ligand sensitivity is frequently species-dependent in TRP channels, so it is interesting that multiple species were used here and that both human and mouse isoforms exhibit rapamycin sensitivity. It should be emphasized that human TRPM8 was used in the calcium imaging and electrophysiology experiments, as well as some docking models, while the mouse isoform was used in the sensory neuron experiments and a mutated avian isoform was used for some docking models.

      This information is available in the Methods and we believe it is clear for the readers.

      (9) Perhaps discuss the unclear mechanism of G805A action in icilin (but not menthol, cold, or praziquantel) sensitivity because it is not in direct contact with the ligand. For example, Yin et al., 2019 propose flexibility allowing Ca binding site and larger binding site for icilin.

      Yin et al. (2019) suggest that the G805A mutation impacts icilin sensitivity by influencing the flexibility of the binding site and possibly affecting calcium binding. In our study, we found that G805A significantly reduces rapamycin sensitivity, likely due to its direct role in the rapamycin binding pocket rather than affecting calcium binding. This is now briefly mentioned in the results section.

      (10) The Figure S1 legend indicates that n=5 for all panels, so please show normalized population IV curves rather than individual examples. Additionally, it would be interesting to see what happens when each agonist is co-applied with rapamycin. Does rapamycin potentiate or inhibit agonist activation in these channels and/or TRPM8?

      We believe that normalized population IVs are not ideal for representing whole-cell currents, considering the substantial variation in current densities. We therefore prefer to show example traces in Figure S3 of the revised version but include mean values of current densities for all tested cells in the text.

      While the effects of co-application of rapamycin with activating ligands could be of interest, we consider this somewhat outside the scope of the present manuscript. The combination of HEK293 cell experiments, along with results obtained in WT and TRPM8-deficient mice does, in our opinion, sufficiently describe the selectivity of rapamycin towards TRPM8 compared to other sensory TRP channels.

      (11) Figure S1 panel A does not contain units for Rapamycin or AITC concentrations.

      Thank you for pointing this out. The units were added to the figure.  

      (12) It would be nice if the authors characterized the different mutations as predicted to contribute to site 1 (D796, H845, Q861, based on Figure S4), site 2 (D796, M801, F847, and R851), and/or site 3 (F847, V849, and R851).

      The indicated mutants were all tested, as shown in Figure 3.

      (13) The numbering scheme in Figure S4 does not appear to match the residue numbers in the rest of the paper for certain residues (HIS-844 rather than H845, PHE-846 rather than F847, VAL-848 rather than V849, ARG-850 rather than R851, and GLN-860 rather than Q861), and labels are often overlapping and difficult to see. I also find the transparent spheres very difficult to distinguish from the transparent background, which makes it difficult to appreciate the STTD NMR data overlay.

      We apologize for the confusing numbering scheme. The lower numbers refer to the initial docking that was done using the avian TRPM8 ortholog. We have made a newer, clearer version of Figure S4 and inserted it as Figure S9.

      (14) Please superpose the Ligplots in Figure S5 panels E and F as described in the LigPlus manual (https://www.ebi.ac.uk/thornton-srv/software/LigPlus/manual/manual.html) to facilitate easier comparison.

      Thank you for the suggestion. We followed the suggestion to superpose the Ligplots as described but found that the result was visually cluttered and difficult to interpret. To avoid confusion, we instead decided to remove panels E and F from Figure S5, as we believe that the visualization in panels A-D is clear and informative.

      (15) Some n values are missing in figure legends.

      We checked all legends and added n numbers where they were missing.

      (16) There is an inconsistent specification of error bars as SEM in the figure legends, though it is specified in methods.

      A question for my own edification: Here, you have looked at ligand interactions with the protein by saturating the protein resonances and observing transfer to the ligand. Would it be possible to instead saturate lipid or solute resonances and observe transfer to a ligand? I am curious whether this would be one way to measure equilibrium partitioning of ligand into a membrane and/or determine the effective concentration of a ligand in the membrane. Additionally, could one determine whether the compound is fully partitioned into the center of the membrane or just sitting on the surface?

      The reviewer highlights an interesting aspect. The widely used WaterLOGSY NMR experiment (doi: 10.1023/a:1013302231549) saturates water molecules, after which the magnetization is transferred to the ligand of interest. Characteristic changes in ligand resonances are observed in the case of a binding event with proteins. On the other hand, the selective saturation of lipids is, while theoretically possible, technically challenging, mainly because of the inherently low signal dispersion of lipids and peak overlap with ligand resonances. Additionally, lipid systems are more dynamic than proteins, and ligand-lipid interactions could be weaker and less specific, significantly affecting the sensitivity of STD experiments.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      • Is it feasible to test rapamycin on TRPM8 with single-channel recording? This will allow us to better probe the mechanism of rapamycin activation and compare it with menthol, with parameters of single-channel conductance and maximal open probability.

      In our experience, it is very difficult to obtain single-channel recordings from TRPM8. The channel expresses at high densities, typically leading to patches that contain multiple channels, making a proper analysis of mean open and closed times very difficult. Therefore, we have decided not to include such measurements in the manuscript.

      • The authors classified rapamycin as a type I agonist, the type that stabilizes the open conformation, same as menthol but more prominent. Does that indicate that rapamycin works synergistically (rather than independently) with menthol, because co-application of them can allow them to add to each other in stabilizing the open conformation? I wonder if the authors agree that this could be tested with experiments as in Figure S3, by showing a much more prolonged deactivation with co-application of menthol and rapamycin than applying each alone.

      Thank you for the insightful suggestion. We conducted co-application experiments, and our results show that the deactivation time is indeed significantly prolonged when both compounds are applied together compared to each alone. In fact, very little deactivation is seen when both compounds are co-applied, which made it virtually impossible to perform reliable fits to the deactivation time course for the Menthol+Rapamycin condition. Instead, we have now included summary results showing the percentage of deactivation after 100 ms. We included these findings in Figure S8.
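
      A fit-free summary such as "percentage of deactivation after 100 ms" can be computed directly from two current readings. The sketch below illustrates the idea with synthetic single-exponential tail currents; the time constants (20 ms and 2000 ms) and the function names are hypothetical and do not correspond to measured values in the manuscript.

```python
import math


def percent_deactivation(i_start, i_at_t):
    """Percentage of tail current lost between the start of the
    repolarising step and a later time point (e.g. 100 ms)."""
    if i_start == 0:
        raise ValueError("starting current must be non-zero")
    return 100.0 * (i_start - i_at_t) / i_start


def exp_tail(i0, tau_ms, t_ms):
    """Synthetic single-exponential tail current (illustration only)."""
    return i0 * math.exp(-t_ms / tau_ms)


# Hypothetical time constants: a fast-deactivating and a very slow condition.
fast = percent_deactivation(1.0, exp_tail(1.0, tau_ms=20.0, t_ms=100.0))
slow = percent_deactivation(1.0, exp_tail(1.0, tau_ms=2000.0, t_ms=100.0))
print(f"fast: {fast:.1f}%  slow: {slow:.1f}%")
```

      The metric stays well defined even when the current barely decays within the recording window, which is the situation in which an exponential fit becomes unreliable.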

      • It could be tested whether rapamycin activation of TRPM8 requires or overrides the requirement of PIP2 with inside-out patch by briefly exposing the patch to poly-lysine to sequester PIP2.

      This is certainly a good suggestion for further follow-up studies. However, we considered that examination of the (potential) interaction between ligands and PIP2 was outside the scope of the current manuscript.

      • Figure 1C suggests that the authors test rapamycin when there is a relatively high baseline TRPM8 activation (prior to rapamycin). This raises the possibility that rapamycin is more a potentiator than an activator. I wonder if the following two experiments could address it: (1) perfuse rapamycin while holding at different membrane potentials, wash-off rapamycin in the solution and quickly (in a few seconds) test the activated current magnitude (before rapamycin dissociation), to compare whether a more depolarized membrane potential (high baseline open probability) allows rapamycin to potentiate more. (2) Perform the experiment at a higher temperature (low baseline open probability) and test whether rapamycin EC50 shifts to the right.

      Thank you for the thoughtful suggestion. Overall, we are not really in favor of making a distinction between a potentiator and an activator since it is not really feasible to create a situation where TRPM8 activity is zero. As suggested, we performed the dose response experiment at a higher temperature (37 °C) and observed that rapamycin’s EC<sub>50</sub> shifts to the right (Figure S2). This is similar to what has been observed for menthol on TRPM8 and for many other ligands on other temperature-sensitive TRP channels.

      Minor:

      (1) The authors should report the Hill coefficient together with the EC50 when showing dose-response curves.

      We have added Hill coefficients for all the fits.
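
      For context, a dose-response fit that reports both parameters typically uses the Hill equation. The sketch below is illustrative only: the function name, parameter names, and example concentrations are assumptions and are not taken from the manuscript.

```python
def hill_response(conc, ec50, hill_n, top=1.0, bottom=0.0):
    """Standard Hill equation: response at agonist concentration `conc`.

    ec50   -- concentration giving a half-maximal response (same units as conc)
    hill_n -- Hill coefficient; larger values give a steeper curve
    """
    if conc < 0 or ec50 <= 0:
        raise ValueError("conc must be >= 0 and ec50 must be > 0")
    occupancy = conc ** hill_n / (ec50 ** hill_n + conc ** hill_n)
    return bottom + (top - bottom) * occupancy


# Same EC50, different Hill coefficients (hypothetical values).
shallow = hill_response(20.0, ec50=10.0, hill_n=1.0)
steep = hill_response(20.0, ec50=10.0, hill_n=2.0)
```

      Two curves can share an EC50 yet differ in steepness, which is why the reviewer asks for both numbers to be reported.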

      (2) In Figure 1 (E, F), it might be clearer to use a Venn diagram to show whether there is overlap among rapamycin-, menthol-, and cinnamaldehyde-responsive neurons. According to the authors' explanation, we can predict that rapamycin-insensitive, menthol-sensitive neurons should predominantly be cinnamaldehyde-responsive.

      Thank you for your suggestion. In these experiments, we applied several agonists, and combining them would result in a visually crowded Venn diagram that is difficult to interpret. However, we agree with the reviewer’s suggestion and now discuss the percentage of cinnamaldehyde+ neurons in the rapa- menthol+ population in Trpm8<sup>-/-</sup> neurons.

      (3) In Figure 3(C), since F847 does not respond to either menthol or rapamycin, it should be excluded from (B). Otherwise it is misleading.

      Thank you for pointing this out. To clarify, we have included a calcium imaging trace for the F847 mutant, demonstrating a clear response to rapamycin in Figure S9. This additional data highlights that F847 does respond to rapamycin, albeit with a more modest response amplitude. This is now also clarified in the results section.

      (4) The word "potency" in pharmacology usually refers to a smaller EC50 number in dose-dependent experiments. In the "Effect of rapamycin analogs on TRPM8" section, the authors use "potency" to refer to the response to a single dose of different compounds. That experiment does not measure potency.

      Thank you for pointing out this mistake. We have corrected the text and replaced “potency” with “efficacy”.

      (5) "2-methoxyl-" is misspelled in the text body.

      We have corrected the typo.

      (6) It would be nice to include "vehicle" in Figure 6B, or alternatively normalize all individual traces to vehicle. In Figure 6C and D, everolimus has almost no effect compared to vehicle, and should not be shown as if it had ~8% in Figure 6B.

      We have added the vehicle values to Figure 6B from the same experiments.

      Reviewer #3 (Recommendations For The Authors):

      (1) The NMR method presented here as novel and employed to identify a proposed molecule bound to a membrane protein (TRPM8 in this case) is not well explained and presented. Since several spectra need to be subtracted, the authors should present the raw data and the results of the subtractions step by step. Also, it seems that the height of the peaks in each spectrum will be highly variable, and thus a reliable criterion should be employed to scale spectra before subtraction. None of these problems are discussed or described.

      The reviewer is right that data transparency should be improved and that, due to the high molecular complexity of the samples, the size of the STD effects should be carefully scaled. We carried out additional experimental replicates on new samples and addressed the inherent sample/peak height variability by rescaling the STD effects based on reference <sup>1</sup>H measurements. We provided supplementary spectra of the reference experiments without saturation (Figure S5) and the computed STTD spectra from three parallel NMR sessions (Figure S6). We changed panel C of Figure 2 in the main text and provided all the STD and the computed STDD and STTD spectra recorded on one set of NMR experiments. We added the following paragraph to the main text: “To address the effect of the inherent variability of cellular samples on peak heights, STD effects were normalized based on the comparison of independent <sup>1</sup>H experiments (Figure S5). Three STTD replicates were computed, unambiguously confirming direct binding to TRPM8 in two datasets (Figure S6 A,B)”.

      Importantly since this signal subtraction method is proposed as a new development, control experiments employing well-established pairs of ligand and membrane protein receptor should be performed to demonstrate the reliability of the method.

      We agree with the reviewer that the STTD experiment, as a new development, needs further validation; however, this paper is a preliminary demonstration of a new strategy building on the well-established STD and STDD NMR methodologies. Our group is actively engaged in studying additional biological samples to enhance our understanding of the applicability of STTD NMR. These efforts also aim to address challenges such as sample and spectral complexity by refining and standardizing the proposed workflow.

      (2) The tail currents shown in supplementary figure 3 are clearly not monoexponential. The fit to a single exponential can be seen to be inadequate and thus the comparison of kinetics of control, rapamycin and menthol is incorrect. At least two exponentials should be fitted and their values compared.

      We agree that the decay in the (combined) presence of agonists deviates from a simple monoexponential behavior. While we agree that fitting with two (or more) exponentials would provide a better fit, this also comes with greater variations/uncertainties in the fit parameters. This is particularly the case when inactivation is very slow and incomplete, or when the ratio between the slow and fast exponential time constants is <5, as seen with rapamycin and rapamycin + menthol. Therefore, we decided to provide monoexponential time constants as a proxy to describe the clear slowing down of activation and deactivation time courses in the presence of Type I agonists.
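
      The trade-off described here can be illustrated with a minimal sketch. It uses a log-linear least-squares fit, which is a simplification chosen so the example needs only the standard library; it is not the fitting procedure used in the manuscript, and the amplitudes and time constants are hypothetical.

```python
import math


def fit_monoexponential(times, currents):
    """Fit I(t) = A * exp(-t / tau) by a least-squares line through (t, ln I).

    Requires strictly positive currents. Returns (A, tau).
    """
    logs = [math.log(i) for i in currents]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -1.0 / slope


def biexponential(t, a_fast, tau_fast, a_slow, tau_slow):
    """Sum of a fast and a slow exponential component."""
    return a_fast * math.exp(-t / tau_fast) + a_slow * math.exp(-t / tau_slow)


times = [float(t) for t in range(0, 101, 5)]  # ms
mono_trace = [math.exp(-t / 20.0) for t in times]
# Time constants differ only 4-fold, i.e. below the ratio of 5 mentioned above.
bi_trace = [biexponential(t, 0.5, 10.0, 0.5, 40.0) for t in times]

_, tau_mono = fit_monoexponential(times, mono_trace)
_, tau_bi = fit_monoexponential(times, bi_trace)
```

      For the truly monoexponential trace the fit recovers the underlying time constant, whereas for the biexponential trace it returns a single effective value lying between the fast and slow components. That effective value is a usable proxy for overall slowing, but it does not describe either component, which is the substance of the reviewer's objection.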

      Also related to this aspect, recordings of TRPM8 currents cannot be leak subtracted with a p/n protocol, thus a large fraction of the initial tail current must be the capacitive transient. There is no indication in the methods of how this was dealt with when fitting the tail currents.

      As explained in the methods, capacitive transients and series resistance were maximally compensated. Therefore, we do not agree that a large fraction of the initial tail current must be capacitive. This can also be clearly seen in experiments such as Figure 1C, where the inward tail current is fully abolished in the presence of a TRPM8 antagonist. Likewise, very small and rapidly inactivating tail currents can be seen during voltage steps under control conditions (e.g. Figure S7 and S8 in the revised version).

      (3) The docking procedure employed, as the authors show, is not appropriate for membrane proteins since it does not include a lipid membrane. It is not clear in the methods section if the MD minimization described applies only to the rapamycin molecule or to rapamycin bound to TRPM8.  

      It is also not clear if the important residue Q861 (and other residues that are identified as interacting with rapamycin) were identified from dockings or proposed based on other evidence.

      (4) Identifying amino acid residues that diminish the response to a ligand, does not uniquely imply that they form a binding site or even interact with said ligand. It is entirely possible that they can be involved in the allosteric networks involved in the activating conformational change. This caveat should be clearly posited by the authors when discussing their results.

      In our study, we identified several residues that significantly reduce the response to rapamycin when mutated, while retaining robust responses to menthol, which indicates that these mutations do not affect crucial conformational changes leading to channel gating. While our cumulative data suggest that these residues may be involved in direct interaction with rapamycin, we recognize the alternative possibility that they allosterically affect rapamycin-induced channel gating. This is now clearly stated in the first paragraph of the discussion.

    1. eLife Assessment

      This important study by Liu et al. presents a comprehensive structure-function analysis of the presynaptic protein UNC-13, leading to new insights into how its distinct domains control neurotransmitter release. The methods, data, and analyses are convincing, and the genetic and electrophysiological approaches support many of their conclusions. The work will be of interest to neuroscientists studying synaptic transmission, as it provides a foundation for future mechanistic studies of Munc13/UNC-13 family proteins.

    2. Joint Public Review:

      Summary:

      In this manuscript, the authors investigate how different domains of the presynaptic protein UNC-13 regulate synaptic vesicle release in the nematode C. elegans. By generating numerous point mutations and domain deletions, they propose that two membrane-binding domains (C1 and C2B) can exhibit "mutual inhibition," enabling either domain to enhance or restrain transmission depending on its conformation. The authors also explore additional N-terminal regions, suggesting that these domains may modulate both miniature and evoked synaptic responses. From their electrophysiological data, they present a "functional switch" model in which UNC-13 potentially toggles between a basal state and a gain-of-function state, though the physiological basis for this switch remains partly speculative.

      Strengths:

      (1) The authors conduct a thorough exploration of how mutations in the C1, C2B, and other regulatory domains affect synaptic transmission. This includes single, double, and triple mutations, as well as domain truncations, yielding a large, informative dataset.

      (2) The study includes systematically measuring both spontaneous and evoked synaptic currents at neuromuscular junctions, under various experimental conditions (e.g., different Ca²⁺ levels), which strengthens the reliability of their functional conclusions.

      (3) Findings that different domain disruptions produce distinct effects on mEPSCs, mIPSCs, and evoked EPSCs suggest UNC-13 may adopt an elevated functional state to regulate synaptic transmission.

      Weaknesses:

      It remains unclear whether the various domain alterations truly converge on a single "gain-of-function" state or instead represent multiple pathways for enhancing UNC-13 activity. Different mutations selectively affect spontaneous or evoked release, suggesting that each variant may not share the same underlying mechanism. Moreover, many conclusions rely on combining domain deletions or point mutations, yet the electrophysiological data show distinct outcomes across EPSCs, IPSCs, mini, and evoked responses. This raises questions about whether these manipulations all act on the same pathway and whether their observed additivity or suppression genuinely reflects a single mechanistic process. A unifying model, or at least a clearer explanation of why the authors infer one mechanistic state across different domain manipulations, would strengthen the paper's conclusions.

      The manuscript proposes that UNC-13 toggles from a basal to a "gain-of-function" state under normal synaptic activity. However, it does not address when or how this switch might occur in vivo, since it is demonstrated principally via artificial mutations. Providing direct evidence or additional discussion of such switching under physiological conditions would be particularly informative.

      What is the physiological significance of the proposed gain-of-function state? The data suggest that certain mutants (e.g., HK+D1-5N) lacking the gain-of-function state can still support synaptic transmission at wild-type levels. How do the authors reconcile this with the idea that the gain-of-function state plays a critical role at the synapse?

      The authors determined the fluorescence intensity of mApple-tagged UNC-13 variants (Figure 1J-K and Figure 7J-K), finding no significant changes compared to the wild-type. However, a more detailed analysis of the density or distribution of fluorescent puncta in axons could clarify whether certain mutations alter the localization of UNC-13 at synapses. Demonstrating colocalization with wild-type UNC-13 (or another presynaptic marker) would help rule out mislocalization effects.

      The study mainly relies on extrachromosomal transgenes, which can show variable copy numbers and expression levels among individual worm strains. This variability might complicate interpretation, as differences in expression could mask or exaggerate certain phenotypes.

      Finally, the discussion is somewhat diffuse. Streamlining the text to focus on the most direct connections would help readers pinpoint the key conclusions and open questions.

    1. eLife Assessment

      This important study uses advanced computational methods to elucidate how environmental dielectric properties influence the interaction strengths of tyrosine and phenylalanine in biomolecular condensates. The evidence supporting the claims of the authors is solid, as the simulations are performed rigorously providing mechanistic insights into the origin of the differences between the two aromatic amino acids considered. This study will be of broad interest to researchers studying biomolecular phase separation.

    2. Reviewer #1 (Public review):

      This is an interesting and timely computational study using molecular dynamics simulation as well as quantum mechanical calculation to address why tyrosine (Y), as part of an intrinsically disordered protein (IDP) sequence, has been observed experimentally to be stronger than phenylalanine (F) as a promoter for biomolecular phase separation. Notably, the authors identified the aqueous nature of the condensate environment and the corresponding dielectric and hydrogen bonding effects as a key to understanding the experimentally observed difference. This principle is illustrated by the difference in computed transfer free energy of Y- and F-containing pentapeptides into a solvent with various degrees of polarity. The elucidation offered by this work is important. The computation appears to be carefully executed, the results are valuable, and the discussion is generally insightful. However, there is room for improvement in some parts of the presentation in terms of accuracy and clarity, including, e.g., the logic of the narrative should be clarified with additional information (and possibly additional computation), and the current effort should be better placed in the context of prior relevant theoretical and experimental works on cation-π interactions in biomolecules and dielectric properties of biomolecular condensates. Accordingly, this manuscript should be revised to address the following, with added discussion as well as inclusion of references mentioned below.

      (1) Page 2, line 61: "Coarse-grained simulation models have failed to account for the greater propensity of arginine to promote phase separation in Ddx4 variants with Arg to Lys mutations (Das et al., 2020)". As it stands, this statement is not accurate, because the cited reference to Das et al. showed that although some coarse-grained models, namely the HPS model of Dignon et al., 2018 PLoS Comput did not capture the Arg to Lys trend, the KH model described in the same Dignon et al. paper was demonstrated by Das et al. (2020) to be capable of mimicking the greater propensity of Arg to promote phase separation than Lys. Accordingly, a possible minimal change that would correct the inaccuracy of this statement in the manuscript would be to add the word "Some" in front of "coarse-grained simulation models ...", i.e., it should read "Some coarse-grained simulation models have failed ...". In fact, a subsequent work [Wessén et al., J Phys Chem B 126: 9222-9245 (2022)] that applied the Mpipi interaction parameters (Joseph et al., 2021, already cited in the manuscript) showed that Mpipi is capable of capturing the rank ordering of phase separation propensity of Ddx4 variants, including a charge scrambled variant as well as both the Arg to Lys and the Phe to Ala variants (see Figure 11a of the above-cited Wessén et al. 2022 reference). The authors may wish to qualify their statements in the introduction to take note of these prior results. For example, they may consider adding a note immediately after the next sentence in the manuscript "However, by replacing the hydrophobicity scales ... (Das et al., 2020)" to refer to these subsequent findings in 2021-2022.

      (2) Page 8, lines 285-290 (as well as the preceding discussion under the same subheading & Figure 4): "These findings suggest that ... is not primarily driven by differences in protein-protein interaction patterns ..." The authors' logic in terms of physical explanation is somewhat problematic here. In this regard, "Protein-protein interaction patterns" appear to be a straw man, so to speak. Indeed, who (reference?) has argued that the difference in the capability of Y and F in promoting phase separation should be reflected in the pairwise amino acid interaction pattern in a condensate that contains either only Y (and G, S) and only F (and G, S) but not both Y and F? Also, this paragraph in the manuscript seems to suggest that the authors' observation of similar contact patterns in the GSY and GSF condensates is "counterintuitive" given the difference in Y-Y and F-F potentials of mean force (Joseph et al., 2021); but there is nothing particularly counterintuitive about that. The two sets of observations are not mutually exclusive. For instance, consider two different homopolymers, one with a significantly stronger monomer-monomer attraction than the other. The condensates for the two different homopolymers will have essentially the same contact pattern but very different stabilities (different critical temperatures), and there is nothing surprising about it. In other words, phase separation propensity is not "driven" by contact pattern in general, it's driven by interaction (free) energy. The relevant issue here is total interaction energy or the critical point of the phase separation. If it is computationally feasible, the authors should attempt to determine the critical temperatures for the GSY condensate versus the GSF condensate to verify that the GSY condensate has a higher critical temperature than the GSF condensate. That would be the most relevant piece of information for the question at hand.

      (3) Page 9, lines 315-316: "...Our ε [relative permittivity] values ... are surprisingly close to that derived from experiment on Ddx4 condensates (45 ± 13) (Nott et al., 2015)". For accuracy, it should be noted here that the relative permittivity provided in the supplementary information of Nott et al. was not a direct experimental measurement but was based on a fit using Flory-Huggins (FH) theory, and FH is not the most appropriate theory for a polymer with long-spatial-range Coulomb interactions. To this reviewer's knowledge, no direct measurement of relative permittivity in biomolecular condensates has been made to date. Explicit-water simulation suggests that a Ddx4 condensate with protein volume fraction ≈ 0.4 can have a relative permittivity ≈ 35-50 (Das et al., PNAS 2020, Fig.7A), which happens to agree with the ε = 45 ± 13 estimate. This information should be useful to include in the authors' manuscript.

      (4) As for the dielectric environment within biomolecular condensates, coarse-grained simulation has suggested that whereas condensates formed by essentially electrically neutral polymers (as in the authors' model systems) have relative permittivities intermediate between that of bulk water and that of pure protein (ε = 2-4, or at most 15), condensates formed by highly charged polymers can have relative permittivity higher than that of bulk water [Wessén et al., J Phys Chem B 125:4337-4358 (2021), Fig.14 of this reference]. In view of the role of aromatic residues (mainly Y and F) in the phase separation of IDPs such as A1-LCD and LAF-1 that contain positively and negatively charged residues (Martin et al., 2020; Schuster et al., 2020, already cited in the manuscript), it should be useful to address briefly how the relationship between the relative phase-separation promotion strength of Y vs F and the dielectric environment of the condensate may or may not change with higher relative permittivities.

      (5) The authors applied the dipole moment fluctuation formula (Eq.2 in the manuscript) to calculate relative permittivity in their model condensates. Does this formula apply only to an isotropic environment? The authors' model condensates were obtained from a "slab" approach (page 4) and thus the simulation box has a rectangular geometry. Did the authors apply Equation 2 to the entire simulation box or only to the central part of the box with the condensate (see, e.g., Figure 3C in the manuscript)? If the latter is the case, is it necessary to use a different dipole moment formula that distinguishes between the "parallel" and "perpendicular" components of the dipole moment (see, e.g., Equation 16 in the above-cited Wessén et al. 2021 paper)? A brief added comment will be useful.
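
      For readers unfamiliar with the estimator under discussion, the sketch below implements the textbook isotropic dipole-fluctuation formula for conducting ("tin-foil") boundary conditions, ε = 1 + (⟨M²⟩ − ⟨M⟩²) / (3 ε₀ V k_B T). The manuscript's Equation 2 is not reproduced in this review, so the assumption that it takes this form, together with the function name and the example numbers, is hypothetical.

```python
EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
K_B = 1.380649e-23            # Boltzmann constant, J/K


def relative_permittivity(dipoles, volume_m3, temperature_k):
    """Isotropic relative permittivity from total-dipole fluctuations (SI units).

    dipoles -- sequence of (Mx, My, Mz) total dipole moments in C*m,
               one entry per trajectory frame
    """
    n = len(dipoles)
    if n == 0:
        raise ValueError("need at least one frame")
    mean = [sum(m[k] for m in dipoles) / n for k in range(3)]
    mean_sq = sum(m[0] ** 2 + m[1] ** 2 + m[2] ** 2 for m in dipoles) / n
    # <M^2> - <M>^2, summed over the three Cartesian components.
    fluctuation = mean_sq - sum(c * c for c in mean)
    return 1.0 + fluctuation / (3.0 * EPSILON_0 * volume_m3 * K_B * temperature_k)


# Two hypothetical frames with opposite dipoles along x.
eps_example = relative_permittivity([(1e-28, 0.0, 0.0), (-1e-28, 0.0, 0.0)],
                                    volume_m3=1e-25, temperature_k=300.0)
```

      The factor of 3 comes from averaging over three equivalent Cartesian directions. That equivalence is what a slab geometry may break, which is why the reviewer asks whether components parallel and perpendicular to the interface need separate treatment.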

      (6) With regard to the general role of Y and F in the phase separation of biomolecules containing positively charged Arg and Lys residues, the relative strength of cation-π interactions (cation-Y vs cation-F) should be addressed (in view of the generality implied by the title of the manuscript), or at least discussed briefly in the authors' manuscript if a detailed study is beyond the scope of their current effort. It has long been known that in the biomolecular context, cation-Y is slightly stronger than cation-F, whereas cation-tryptophan (W) is significantly stronger than either cation-Y or cation-F [Wu & McMahon, JACS 130:12554-12555 (2008)]. Experimental data from a study of EWS (Ewing sarcoma) transactivation domains indicated that Y is a slightly stronger promoter than F for transcription, whereas W is significantly stronger than either Y or F [Song et al., PLoS Comput Biol 9:e1003239 (2013)]. In view of the subsequent general recognition that "transcription factors activate genes through the phase-separation capacity of their activation domain" [Boija et al., Cell 175:1842-1855.e16 (2018)], which is applicable to EWS in particular [Johnson et al., JACS 146:8071-8085 (2024)], the experimental data in Song et al. 2013 (see Figure 3A of this reference) suggest that cation-Y interactions are stronger than cation-F interactions in promoting phase separation, thus generalizing the authors' observations (which focus primarily on Y-Y, Y-F and F-F interactions) to most situations in which cation-Y and cation-F interactions are relevant to biomolecular condensation.

      (7) Page 9: The observation of weaker effective F-F (and a few other nonpolar-nonpolar) interactions in a largely aqueous environment (as in an IDP condensate) than in a nonpolar environment (as in the core of a folded protein) is intimately related to (and expected from) the long-recognized distinction between "bulk" and "pair" as well as size dependence of hydrophobic effects that have been addressed in the context of protein folding [Wood & Thompson, PNAS 87:8921-8927 (1990); Shimizu & Chan, JACS 123:2083-2084 (2001); Proteins 49:560-566 (2002)]. It will be useful to add a brief pointer in the current manuscript to this body of relevant resources in protein science.

    3. Reviewer #2 (Public review):

      Summary:

      In this preprint, De Sancho and López use alchemical molecular dynamics simulations and quantum mechanical calculations to elucidate the origin of the observed preference of Tyr over Phe in phase separation. The paper is well written, and the simulations conducted are rigorous and provide good insight into the origin of the differences between the two aromatic amino acids considered.

      Strengths:

      The study addresses a fundamental discrepancy in the field of phase separation where the predicted ranking of aromatic amino acids observed experimentally is different from their anticipated rankings when considering contact statistics of folded proteins. While the hypothesis that the difference in the microenvironment of the condensed phase and hydrophobic core of folded proteins underlies the different observations, this study provides a quantification of this effect. Further, the demonstration of the crossover between Phe and Tyr as a function of the dielectric is interesting and provides further support for the hypothesis that the differing microenvironments within the condensed phase and the core of folded proteins is the origin of the difference between contact statistics and experimental observations in phase separation literature. The simulations performed in this work systematically investigate several possible explanations and therefore provide depth to the paper.

      Weaknesses:

      While the study is quite comprehensive and the paper well written, there are a few instances that would benefit from additional details. In the methods section, it is unclear whether the GGXGG peptides upon which the alchemical transforms are conducted are positionally restrained within the condensed/dilute phase or not. If they are not, how would the position of the peptides within the condensate alter the calculated free energies reported? It would also be interesting to see what the variation in the transfer free energy is across multiple independent replicates of the transform to assess the convergence of the simulations. Additionally, since the authors use a slab for the calculation of these free energies, are the transfer free energies from the dilute phase to the interface significantly different from those calculated from the dilute phase to the interior of the condensate? The authors mention that the contact statistics of Phe and Tyr do not show a significant difference and thereby conclude that the more favorable transfer of Tyr primarily originates from the dielectric of the condensate. However, the calculation of contacts neglects the differences in the strength of interactions involving Phe vs. Tyr. Though the authors consider the energetics of contact formation later in the manuscript, the scope of these interactions is quite limited (Phe-Phe, Tyr-Tyr, Tyr-Amide, Phe-Amide), which is not sufficient to make a universal conclusion regarding the underlying driving forces. A more appropriate statement would be that, in the context of the minimal peptide investigated, the driving force seems to be the difference in dielectric. However, it is worth mentioning that the authors do a good job of mentioning some of these caveats in the discussion section.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors address the paradox of how tyrosine can act as a stronger sticker for phase separation than phenylalanine, despite phenylalanine being higher on the hydrophobicity scale and exhibiting more prominent pairwise contact statistics in folded protein structures compared to tyrosine.

      Strengths:

      This is a fascinating problem for the protein science community with special relevance for the biophysical condensate community. Using atomistic simulations of simple model peptides and condensates as well as quantum calculations, the authors provide an explanation that relies on the dielectric constant of the medium and the hydration level that either tyrosine or phenylalanine can achieve in highly hydrophobic vs. hydrophilic media. The authors find that as the dielectric constant decreases, phenylalanine becomes a stronger sticker than tyrosine. The conclusions of the paper seem to be solid; the paper is well written and also recognises the limitations of the study. Overall, the paper represents an important contribution to the field.

      Weaknesses:

      How can the authors ensure that a condensate of GSY or GSF peptides is a representative environment of a protein condensate? First, the composition in terms of amino acids is highly limited, second the effect of peptide/protein length compared to real protein sequences is also an issue, and third, the water concentration within these condensates is really low as compared to real experimental condensates. Hence, how can we rely on the extracted conclusions from these condensates to be representative for real protein sequences with a much more complex composition and structural behaviour?

    1. eLife Assessment

      This study presents important methodologies for repeated brain ultrasound localization microscopy (ULM) in awake mice and a set of results indicating that wakefulness reduces vascularity and blood flow velocity. The data supporting these findings are solid. This study is relevant for scientists investigating vascular physiology in the brain.

    2. Reviewer #1 (Public review):

      Summary:

      Wang and colleagues present a study aimed at demonstrating the feasibility of repeated ultrasound localization microscopy (ULM) recording sessions on mice chronically implanted with a cranial window transparent to ultrasound (US). They provided quantitative information on their protocol, such as the number of contrast-enhancing microbubbles (MBs) required to get a clear image of the vasculature of a brain coronal section. They also quantified the co-registration quality over time-distant sessions and the vasodilator effect of isoflurane.

      Strengths:

      The study showed a remarkable performance in recording precisely the same brain coronal section over repeated imaging sessions. In addition, it sheds light on the vasodilator effect of isoflurane (an anesthetic whose effects are not fully understood) on the different brain vasculature compartments, although, as the Authors stated, some insights in this aspect have already been published with other imaging techniques. The experimental setting and protocol are very well described.

      Wang and co-authors submitted a revised version of their study, which shows improvements in the clarity of the data description. However, the flaws and limitations of this study are substantially unchanged.

      The main issues are:
      - Statistics are still inadequate. The TOST test proposed in this revised version is not equivalent to an ANOVA. Indeed, multivariate analyses would be the most appropriate, given that some quantifications were probably made on multiple vessels from different mice. All three reviewers mentioned the flaws in statistics as the primary concern.
      - No new data have been added, such as tests of other anesthetics.
      - The Authors still insist on using the term Vascularity, which they define as: 'proportion of the pixel count occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of the pixels with MB signal.' Why not use apparent cerebral blood volume or just CBV? Introducing an unnecessary and redundant term is not scientifically acceptable. In this revised version, vascularity is also used to indicate a higher vascular density (Line 275), which does not make sense: blood vessels are not generated in the few minutes between the isoflurane and the awake condition. Reviewer #2 also raised this point.
      - The long-term recordings mentioned by the Authors refer to the 3-week time frame analyzed in this study. However, within each acquisition, the time available for imaging is only a few minutes (< 10 min, as seen in most of the plots showing time courses) after the animals' arousal from isoflurane and before the bubbles disappear. This limitation should be acknowledged.
      - The more precise description of the number of mice and blood vessels analyzed in Figure 6 makes apparent the limited number of independent samples used to support the findings of this work. This limitation should be acknowledged. The newly provided information added as Supplementary Figure 1 should be moved to the main text, possibly to the figure legends. The limited data in support of the findings was also highlighted by Reviewer #2 and, indirectly, by Reviewer #3.
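
      The definition of vascularity quoted in this list amounts to a simple pixel count, sketched below to make the reviewer's point concrete. The threshold of zero counts, the nested-list image layout, and the function name are assumptions for illustration and are not taken from the authors' pipeline.

```python
def vascularity_percent(density_map, roi_mask, threshold=0):
    """Percentage of ROI pixels whose microbubble count exceeds `threshold`.

    density_map -- 2-D list of per-pixel microbubble counts (ULM density map)
    roi_mask    -- 2-D list of the same shape; truthy entries belong to the ROI
    """
    roi_pixels = 0
    vessel_pixels = 0
    for density_row, mask_row in zip(density_map, roi_mask):
        for density, in_roi in zip(density_row, mask_row):
            if in_roi:
                roi_pixels += 1
                if density > threshold:  # binarization step
                    vessel_pixels += 1
    if roi_pixels == 0:
        raise ValueError("ROI mask selects no pixels")
    return 100.0 * vessel_pixels / roi_pixels


# Hypothetical 2 x 3 density map with a 4-pixel ROI.
example_vascularity = vascularity_percent([[0, 3, 0], [1, 0, 0]],
                                          [[1, 1, 1], [1, 0, 0]])
```

      Written this way, the quantity depends on how many pixels received at least one localized microbubble during the acquisition, so it can change with perfusion or acquisition time without any change in the number of vessels.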

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a very interesting collection of methods and results using brain ultrasound localization microscopy (ULM) in awake mice. They emphasize the effect of the level of anesthesia on the quantifiable elements assessable with this technique (i.e. vessel diameter, flow speed, in veins and arteries, area perfused, in capillaries) and demonstrate the possibility of achieving longitudinal cerebrovascular assessment in one animal during several weeks with their protocol.<br /> The authors made a good rewriting of the article based on the reviewers' comments. One of the messages of the first version of the manuscript was that variability in measurements (vessel diameter, flow velocity, vascularity) was much more pronounced under changes of anesthesia than when considering longitudinal imaging across several weeks. This message is now somewhat mitigated, as longitudinal imaging seems to show a certain variability close to the order of magnitude observed under anesthesia. In that sense, the review process was useful in avoiding hasty conclusions and calls for further caution in ULM awake longitudinal imaging, in particular regarding precision of positioning and cancellation of tissue motion.

      Strengths:

      Even if the methods elements considered separately are not new (brain ULM in rodents, setup for longitudinal awake imaging similar to those used in fUS imaging, quantification of vessel diameters/bubble flow/vessel area), when masterfully combined as it is done in this paper, they answer two questions that have been long-running in the community: what is the impact of anesthesia on the parameters measured by ULM (and indirectly in fUS and other techniques)? Is it possible to achieve ULM in awake rodents for longitudinal imaging? The manuscript is well constructed, well written, and graphics are appealing.<br /> The manuscript has been much strengthened by the round of review, with more animals for the longitudinal imaging study.

      Weaknesses:

      Some weaknesses remain, not hindering the quality of the work, that the authors might want to answer or explain.<br /> - When considering fig 4e and fig 4j together: it seems that in fig 4e the vascularity reduction in the cortical ROI is around 30% for downward flow, and around 55% for upward flow; but when grouping both cortical flows in fig 4j, the reduction is much smaller (~5%), even at the individual level (only mouse 1 is used in fig 4e). Can you comment on that?<br /> - When considering fig 4e, fig 4j, fig 6e and fig 6i altogether, it seems that vascularity can be highly variable, whether it be under anesthesia or vascular imaging, with changes ranging from 5 to 40%. Is this vascularity quantification worth it (namely, reliable for example to quantify changes in a pathological model requiring longitudinal imaging)?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      • While the title is fair with respect to the data shown, in the summary and the rest of the paper, the comparison between anesthetized and awake conditions is systematically stated, while more caution should be used.

      First, isoflurane is one of the (many) anesthetics commonly used in pre-clinical research, and its effect on the brain vasculature cannot be generalized to all the anesthetics. Indeed, other anesthesia approaches do not produce evident vasodilation; see ketamine + medetomidine mixtures. Second, the imaged awake state is head-fixed and body-constrained in mice. A condition that can generate substantial stress in the animals. In this study, there is no evaluation of the stress level of the mice. In addition, the awake imaging sessions were performed a few minutes after the mouse woke up from isoflurane induction, which is necessary to inject the MB bolus. It is known that the vasodilator effects of isoflurane last a long time after its withdrawal. This aspect would have influenced the results, eventually underestimating the difference with respect to the awake state.

      These limitations should be clearly described in the Discussion.

      Looking at Figure 2e, it takes more than 5' to reach the 5 million MB count useful for good imaging. However, the MB count per pixel drops to a few % at that time. This information tells me that (i) repeated measurements are feasible but with limited brain coverage since a single 'wake up' is needed to acquire a single brain section and (ii) this approach cannot fit the requirements of functional ULM, which requires merging the responses to multiple stimuli to get a complete functional image. Of course, a chronic i.v. catheter would fix the issue, but this configuration is not trivial to test in the experimental setup proposed by the authors, hindering the extension of the approach to fULM.

      Thank you for highlighting these limitations, as they address aspects that were not fully considered during the experimental design and manuscript writing. In response, we have added the following paragraphs to the discussion section, addressing these limitations of our study:

      (Line 310) “Although isoflurane is widely used in ultrasound imaging because it provides long-lasting and stable anesthetic effects, it is important to note that the vasodilation observed with isoflurane is not representative of all anesthetics. Some anesthesia protocols, such as ketamine combined with medetomidine, do not produce significant vasodilation and are therefore preferred in experiments where vascular stability is essential, such as functional ultrasound imaging(47). Therefore, in future studies, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.

      Our proposed method enabled repeatable longitudinal brain imaging over a three-week period, addressing a key limitation of conventional ULM imaging and offering potential for various preclinical applications. However, there are still some limitations in this study. 

      One of the limitations is the lack of objective measures to assess the effectiveness of head-fix habituation in reducing anxiety. This may introduce variability in stress levels among mice. Recent studies suggest that tracking physiological parameters such as heart rate, respiratory rate, and corticosterone levels during habituation can confirm that mice reach a low stress state prior to imaging(48). This approach would be highly beneficial for future awake imaging studies. Furthermore, alternative head-fixation setups, such as air-floated balls or treadmills, which allow the free movement of limbs, have been shown to reduce anxiety and facilitate natural behaviors during imaging(30). Adopting these approaches in future studies could enhance the reliability of awake imaging data by minimizing stress-related confounds.

      Another limitation of this study is the potential residual vasodilatory effect of isoflurane anesthesia on awake imaging sessions. The awake imaging sessions were conducted shortly after the mice had emerged from isoflurane anesthesia, required for the MB bolus injections. The lasting vasodilatory effects of isoflurane may have influenced vascular responses, potentially contributing to an underestimation of differences in vascular dynamics between anesthetized and awake state. Future applications of awake ULM in functional imaging using an indwelling jugular vein catheter presents a promising alternative to enable more accurate functional imaging in awake animals, addressing current limitations associated with anesthesia-induced vascular effects.”

      • Statistics are often poor or not properly described. 

      The legend and the text referring to Figure 2 do not report any indication of the number of animals analyzed. I assume it is only one, which makes the findings strongly dependent on the imaging quality of THAT mouse in THAT experiment. Three mice have been displayed in Figure 3, as reported in the text, but it is not clear whether it is a mouse for each shown brain section. Figure 5 reports quantitative data on blood vessels in awake VS isoflurane states but: no indication about the number of tested mice is provided, nor the number of measured blood vessels per type and if statistics have been done on mice or with a multivariate method.

      Also, a T-test is inappropriate when the goal is to compare different brain regions and blood vessel types.

      Similar issues partially apply to Figure 6, too.

      Thank you for bringing this to our attention. 

      We acknowledge that the statistical analyses were not clearly explained in the original version. In the revised manuscript, we have ensured that the statistical methods are clearly described. 

      (Fig.4 caption) “b,c, Comparisons of vessel diameter (b) and flow velocity (c) for the selected arterial and venous segments. Statistical analysis was conducted using t-test at each measurement point along the segments.”

      (Fig.6 caption) “b,c, Comparisons of vessel diameter (b) and flow velocity (c) for the selected arterial and venous segments. Statistical analysis was conducted using the two one-sided test (TOST) procedure, which evaluates the null hypothesis that the difference between the two weeks is larger than three times the standard deviation of one week.”
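
      As an illustration of the TOST logic described in the Fig. 6 caption, the following is a minimal sketch and not the authors' implementation. It assumes independent samples, a Welch-type standard error, and a large-sample normal approximation (a t-distribution would be more appropriate for small samples); the function name `tost_equivalence` and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_equivalence(week_a, week_b, margin_sd_multiple=3.0):
    """Two one-sided tests for equivalence of two weekly measurement sets.

    Assumption: the equivalence margin is margin_sd_multiple times the
    sample standard deviation of week_a, following the caption's wording.
    Returns (difference in means, margin, overall TOST p-value).
    """
    diff = mean(week_b) - mean(week_a)
    margin = margin_sd_multiple * stdev(week_a)
    # Welch-type standard error of the difference in means.
    se = sqrt(stdev(week_a) ** 2 / len(week_a) + stdev(week_b) ** 2 / len(week_b))
    std_normal = NormalDist()
    # H0 (lower): diff <= -margin; rejected when diff is well above -margin.
    p_lower = 1.0 - std_normal.cdf((diff + margin) / se)
    # H0 (upper): diff >= +margin; rejected when diff is well below +margin.
    p_upper = std_normal.cdf((diff - margin) / se)
    # Equivalence is claimed only if both one-sided nulls are rejected,
    # so the overall p-value is the larger of the two.
    return diff, margin, max(p_lower, p_upper)


# Hypothetical vessel diameters (micrometres) from two imaging weeks.
week_1 = [20.0, 21.0, 19.5, 20.5, 20.2]
week_2 = [20.1, 20.8, 19.9, 20.4, 20.3]
diff, margin, p_value = tost_equivalence(week_1, week_2)
print(f"diff={diff:.3f}, margin={margin:.3f}, p={p_value:.3g}")
```

      A small overall p-value here supports equivalence within the chosen margin, which is the reverse of how a conventional difference test is read. Note that a margin of three standard deviations is wide, so the test can declare equivalence even when week-to-week differences are not negligible.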

      Additionally, we corrected an error in the previous comparison of the violin plots on flow velocities, where a t-test was incorrectly applied; this has now been removed.

      We acknowledge that the original version did not clearly indicate the numbers of animals in the statistical analysis. In the revised manuscript, we have added Supplementary Figure 1 to specify the mice used, and we have labeled each mouse accordingly in the figures or captions. In the revised Figures 4 and 6, we have ensured that each quantitative analysis figure or its caption clearly indicates the specific mice.

      The original Figures 1 and 2 are presented as case studies to illustrate the methodology. Because the anesthesia time required for tail vein injection varies slightly between animals, the time each mouse takes to recover from anesthesia cannot be kept consistent across all mice. For instance, in Figure 1, the mouse took nearly 500 seconds to recover from anesthesia, but this duration is not consistent across all animals, which is a limitation of the bolus injection technique. We have noted this point in the discussion (discussion on the limitation of bolus injection), and we have also clarified in the results section and figure captions that these figures represent a case study of a single mouse rather than a standardized recovery time for all animals.

      We further clarified this point at the end of the Figure 2 caption:

      (Fig.2 caption) “This figure presents a case study based on the same mouse shown in Fig 1. The x-axis for d-f begins at 500 seconds because, at this point, the mouse’s pupil size stabilized, indicating it had recovered to an awake state. Consequently, ULM images were accumulated starting from this time. It is important to note that not every mouse requires 500 seconds to fully awaken; the time to reach a stable awake state varies across individual mice.”

      We added the following statement before introducing Figure 1e:

      (Line 93) “Due to differences in tail vein injection timing and anesthesia depth, the time required for each mouse to fully awaken varied. Although it was not feasible to get pupil size stabilized just after 500 seconds for each animal, ULM reconstruction only used the data that acquired after the animal reached full pupillary dilation, to ensure that ULM accurately captures the cerebrovascular characteristics in the awake state.”

      We added the following statement before introducing Figure 2d:

      (Line 139) “To further verify that the proposed MB bolus injection method can help to achieve ULM image saturation shortly after mice awaken from anesthesia, an analysis on the change in MB concentration over time was conducted once pupil size had stabilized (T = 500s).”

      For Figures 3, 4, and 5 (in the revised version, Figures 4 and 5 have been combined into a single Figure 4), the data represents results from three individual mice, with each coronal plane corresponding to a different mouse. In the revised version, we have added labels to indicate the specific mouse in each image to improve clarity. We also recognize that some analyses in the original submission (original Figure 5) may have lacked sufficient statistical power due to the small sample size. Therefore, in the revised version, we have focused only on findings that were consistently observed across the three mice to ensure robust conclusions.

      Reviewer 1 (Recommendations For the Authors):

      • If the study's main goal is to compare awake vs anesthetized ULM, the authors should test at least another anesthetic with no evident vasodilator effect.

      Thank you for this valuable suggestion. We would like to clarify that the primary aim of our study is not to comprehensively compare the effects of anesthesia versus the awake state, as a rigorous comparison would indeed require a more controlled experimental design, including additional anesthetics, a larger cohort of mice, and broader controls to ensure sufficient statistical power. We have also added the following statement to the Discussion to clarify this point:

      (Line 314) “Therefore, in future studies, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.”

      We acknowledge that the initial organization of Figures 3–5 placed excessive emphasis on comparisons between the awake and anesthetized states, but without yielding consistently significant findings. Meanwhile, our longitudinal observations in original Figure 6 were underrepresented, despite their potential importance.

      In the revised version, we shifted our focus toward the main goal of awake longitudinal imaging. By consolidating the previous Figures 4 and 5 into the new Figure 4, we emphasize conclusions that are both more consistent and broadly applicable, avoiding areas that may lack sufficient rigor or consensus. Additionally, we expanded the quantitative analysis related to longitudinal imaging, highlighting its role as the ultimate objective of this study. The awake vs. anesthetized ULM comparison was intended to demonstrate the value of awake imaging and introduce the importance of awake longitudinal imaging. In the revised text, we have reframed this comparison to emphasize the specific response to isoflurane rather than a general response to anesthesia. For example, in Figures 3 and 4, we have replaced the original term "Anesthetized" with "Isoflurane". We have also added a discussion noting that isoflurane may induce more vasodilation than other anesthetic agents.

      (Line 310) “Although isoflurane is widely used in ultrasound imaging because it provides long-lasting and stable anesthetic effects, it is important to note that the vasodilation observed with isoflurane is not representative of all anesthetics. Some anesthesia protocols, such as ketamine combined with medetomidine, do not produce significant vasodilation and are therefore preferred in experiments where vascular stability is essential, such as functional ultrasound imaging(47).”

      • The claims made about the proposed experimental protocol to be suitable for the "long-term" (line 255) are not supported by the data and should be modified according to the presented evidence.

      Thank you for your valuable feedback. We agree that our current three-week experimental results do not yet fulfill the requirements for extended longitudinal imaging that may span several months. We have revised the relevant text accordingly. For instance, the phrase “Our proposed method enabled long-term, repeatable longitudinal brain imaging” has been modified to “Our proposed method enabled repeatable longitudinal brain imaging over a three-week period.” (Similar changes were also made in Line 67, Line 318, and Line 337.) Additionally, we have added the following paragraph in the discussion section to indicate that extending the monitoring period to several months is a meaningful direction for future exploration:

      (Line 337) “In our longitudinal study, consistent imaging results were obtained over a three-week period, demonstrating the feasibility of awake ULM imaging for this duration. However, for certain research applications, a monitoring period of several months would be valuable. Extending the duration of longitudinal awake ULM imaging to enable such long-term studies is a potential direction for future development.”

      Recommendations for improving the writing and presentation:

      • Reporting the number of mice and blood vessels and statistics for each quantitative figure.

      Thank you for highlighting this issue. We acknowledge that the quantitative figures in the previous version lacked clarity in specifying the number of mice, vessels, and associated statistics. In the revised version, we have ensured that each quantitative figure or its caption clearly indicates the specific mice, vessels, and statistical methods used. To further minimize any potential confusion, we have also added Supplementary Figure 1 to clearly label and reference each individual mouse included in the study.

      Minor corrections to the text and figures.

      • Line 22: "vascularity reduction from anesthesia" is not clear, nor is it a codified property of brain vasculature. Explain or rephrase.

      Thank you for your comment. We apologize for any confusion caused by the phrase “vascularity reduction from anesthesia” in the abstract. We agree that this phrasing was unclear without context. To improve clarity, we have revised this statement in the abstract to make it more straightforward and easier to understand. 

      (Line 24) “Vasodilation induced by isoflurane was observed by ULM. Upon recovery to the awake state, reductions in vessel density and flow velocity were observed across different brain regions.” 

      Additionally, we have added a section in the Methods titled Quantitative Analysis of ULM Images to provide a clear definition of vascularity. This section outlines how vascularity is quantified in our study, ensuring that our terminology is well-defined. 

      The following sentence shows the definition of vascularity:

      (Line 547) “Vascularity was defined as the proportion of the pixel count occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of the pixels with MB signal.”
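
      The definition above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the density map and ROI are plain nested lists, any pixel with a value above `threshold` counts as containing MB signal, and the function name `vascularity_percent` is hypothetical.

```python
def vascularity_percent(density_map, roi_mask, threshold=0.0):
    """Percentage of ROI pixels that contain microbubble (MB) signal.

    density_map: 2D nested list of MB counts per pixel (ULM vessel density map).
    roi_mask:    2D nested list of truthy/falsy values marking the ROI.
    threshold:   binarization threshold (assumed here to be 0, i.e. any signal).
    """
    roi_pixels = 0
    vessel_pixels = 0
    for density_row, roi_row in zip(density_map, roi_mask):
        for density, inside_roi in zip(density_row, roi_row):
            if inside_roi:
                roi_pixels += 1
                # Binarize: the pixel counts as vessel if it holds MB signal.
                if density > threshold:
                    vessel_pixels += 1
    if roi_pixels == 0:
        raise ValueError("ROI mask selects no pixels")
    return 100.0 * vessel_pixels / roi_pixels


# Hypothetical 3x3 density map with a 2x2 ROI in the upper-left corner.
density = [[0, 2, 0],
           [1, 0, 3],
           [0, 0, 5]]
roi = [[1, 1, 0],
       [1, 1, 0],
       [0, 0, 0]]
print(vascularity_percent(density, roi))  # 2 of the 4 ROI pixels hold signal
```

      Because the measure depends on the binarization threshold and on how many microbubbles were accumulated, comparisons between sessions are only meaningful when both are held constant.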

      We have also added a brief definition where the term is first used in the Results section:

      (Line 161) “When comparing vessel density maps, ULM images that are acquired in the awake state demonstrate a global reduction of vascularity, which refers to percentage of pixels that occupied by blood vessels.”

      • Line 76: putting the mice in a tube is also intended "To further reduce animal anxiety and minimize tissue motion" I agree with tissue motion, not with animal anxiety, which, indeed, I expect to be higher than if it could, for example, run on a ball or a treadmill.

      Thank you for pointing this out. We acknowledge the limitations of our setup regarding reducing animal anxiety. We have replaced the original phrase “to further reduce animal anxiety and minimize tissue motion” with “to further minimize tissue motion.” (Line 78) Additionally, we have added the following paragraph to the Discussion section to address the limitations of our setup in reducing anxiety.

      (Line 321) “One of the limitations is the lack of objective measures to assess the effectiveness of head-fix habituation in reducing anxiety. This may introduce variability in stress levels among mice. Recent studies suggest that tracking physiological parameters such as heart rate, respiratory rate, and corticosterone levels during habituation can confirm that mice reach a low stress state prior to imaging(48). This approach would be highly beneficial for future awake imaging studies. Furthermore, alternative head-fixation setups, such as air-floated balls or treadmills, which allow the free movement of limbs, have been shown to reduce anxiety and facilitate natural behaviors during imaging(30). Adopting these approaches in future studies could enhance the reliability of awake imaging data by minimizing stress-related confounds.”

      • Line 79: PMP has been used by Sieu et al., Nat Methods, 2015; it should be acknowledged.

      Thank you for highlighting this. We have now included the reference to Sieu et al. Nat Methods, 2015 to appropriately acknowledge their use of PMP. (Line 81)

      • Figure: is there a reason why the plots start at 500 sec? What happened before that time?

      Thank you for your question regarding the starting time in the plots. Figures 1 and 2 are case studies using a single mouse to demonstrate the feasibility of our method. The “zero” timepoint was defined as the moment when anesthesia was stopped, and the microbubble injection began. However, the mouse does not fully recover immediately after anesthesia is stopped. As shown in Figure 1e, there is a period of approximately 500 seconds during which the pupil gradually dilates, indicating recovery. Only after this period does the mouse reach a relatively stable physiological state suitable for ULM imaging, which is why the plots in Figure 2 begin at T = 500 seconds.

      We recognize that this was not sufficiently explained in the main text and figure captions. In the revised manuscript, we have clarified this timing rationale in both the Results section and the figure captions. We added the following sentence to the Results section to introduce Fig.2d:

      (Line 139) “To further verify that the proposed MB bolus injection method can help to achieve ULM image saturation shortly after mice awaken from anesthesia, an analysis on the change in MB concentration over time was conducted once pupil size had stabilized (T = 500s).”

      We also added the following statement to note that this recovery time varies across individual mice:

      (Line 154, Fig.2 caption) “This figure presents a case study based on the same mouse shown in Fig 1. The x-axis for d-f begins at 500 seconds because, at this point, the mouse’s pupil size stabilized, indicating it had recovered to an awake state. Consequently, ULM images were accumulated starting from this time. It is important to note that not every mouse requires 500 seconds to fully awaken; the time to reach a stable awake state varies across individual mice.”

      Reviewer 2 (Public Review):

      • The only major comment (calling for further work) I would like to make is the relative weakness of the manuscript regarding longitudinal imaging (mostly Figure 6), compared to the exhaustive review of the effect of isoflurane on the vasculature (3 rats, 3 imaging planes, quantification on a large number of vessels, in 9 different brain regions). The 6 cortical vessels evaluated in Figure 6 feel really disappointing. As longitudinal imaging is supposed to be the salient element of this manuscript (first word appearing in the title), it should be as good and trustworthy as the first part of the paper. Figure 6c. is of major importance, and should be supported by a more extensive vessel analysis, including various brain areas, and validated on several animals to validate the robustness of longitudinal positioning with several instances of the surgical procedure. Figure 6d estimates the reliability of flow measurements on 3 vessels only. Therefore I recommend showing something similar to what is done in Figures 4 and 5: 3 animals, and more extensive quantification in different brain regions.

      We thank the reviewer for pointing out this issue. We acknowledge that the first version of the manuscript lacked in-depth quantitative analysis in the section on the longitudinal study, which should have been a focal point. It also did not provide a sufficient number of animals to demonstrate the reproducibility of the technique. In this revised version, we have included results from more animals and conducted a more comprehensive quantitative analysis, with the corresponding text updated accordingly. Specifically, we combined the previous Figures 4 and 5 into the current Figure 4 (corresponding revised text from Line 169 to Line 207). The revised Figures 5 and 6 compare the results of the longitudinal study, presenting data from three mice (corresponding revised text from Line 224 to Line 258). Detailed information about the mice used has been added to Supplementary Figure 1, and Supplementary Figure 4 further provides a detailed display of the results for the three mice in the longitudinal study. We hope that these adjustments will provide a more thorough validation of the longitudinal imaging.

      Reviewer 2 (Recommendations For The Authors):

      Minor comments:

      • The statistical analyses are not always explained: could they be stated briefly in the legends of each figure, or gathered in a statistical methods section with details for each figure? Be sure to use the appropriate test (e.g. student t-test is used in Fig 5 k whereas normality of distribution is not guaranteed.)

      Thank you for pointing this out. We acknowledge that the statistical analyses were not clearly explained in the original version. In the revised manuscript, we have ensured that the statistical methods are clearly described. 

      (Fig.4 caption) “b,c, Comparisons of vessel diameter (b) and flow velocity (c) for the selected arterial and venous segments. Statistical analysis was conducted using t-test at each measurement point along the segments.”

      (Fig.6 caption) “b,c, Comparisons of vessel diameter (b) and flow velocity (c) for the selected arterial and venous segments. Statistical analysis was conducted using the two one-sided test (TOST) procedure, which evaluates the null hypothesis that the difference between the two weeks is larger than three times the standard deviation of one week.”

      Additionally, we corrected an error in the previous comparison of the violin plots on flow velocities, where a t-test was incorrectly applied; this has now been removed.

      • The authors use early in the manuscript the term vascularity, e.g. in "vascularity reduction", it is not exactly clear what they mean by vascularity, and would require a proper definition at that moment. If I am correct, a quantification of that "vascularity reduction" (page 5 line 132), is then done in Figures 5 d e f and j.

      Thank you for highlighting this issue. We acknowledge that our initial use of the term “vascularity” may have been unclear and potentially confusing. In the revised manuscript, we have included a clear definition of “vascularity” in the Methods section under Quantitative Analysis of ULM Images (Line 534). 

      The following sentence shows the definition of vascularity:

      (Line 547) “Vascularity was defined as the proportion of the pixel count occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of the pixels with MB signal.”

      We have also added a brief definition where the term is first used in the Results section:

      (Line 161) “When comparing vessel density maps, ULM images that are acquired in the awake state demonstrate a global reduction of vascularity, which refers to percentage of pixels that occupied by blood vessels.”

      • There is very little motion in the images presented, except for the awake "Bregma -4.2 mm" (Figure 3, directional maps), especially in the area including colliculi and mesencephalon, while the cortical vessels do not move. Can you comment on that?

      Thank you for highlighting this important aspect of motion in awake animal imaging. Motion correction is indeed a critical factor in such studies. In the original version of our discussion, we briefly addressed this issue (from Line 342 to Line 346), but we agree that a more detailed discussion is needed.

      To minimize motion artifacts, we conducted habituation to acclimate the animals to the head-fixation setup, which helps reduce anxiety during imaging. With thorough head-fixed habituation, the imaging quality is generally well-preserved. We also applied correlation-based motion correction techniques based on ULM images, which can partially correct for overall brain motion, as stated in the previous version. However, this ULM-images-based correction is limited to addressing only rigid motion.
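
      To make the idea of correlation-based rigid correction concrete, the sketch below estimates a single integer-pixel translation between two images by exhaustive search over a small shift window. It is a simplified illustration under assumed conditions (plain nested lists, unnormalized correlation, integer shifts only) and does not reproduce our processing pipeline; the function name `estimate_rigid_shift` is hypothetical.

```python
def estimate_rigid_shift(reference, moving, max_shift=2):
    """Return the integer (dy, dx) that best aligns `moving` to `reference`.

    The returned shift satisfies moving[r + dy][c + dx] ~ reference[r][c],
    i.e. `moving` is `reference` displaced by (dy, dx). Only a single global
    translation is estimated, so non-rigid deformation cannot be captured.
    """
    rows, cols = len(reference), len(reference[0])
    best_shift = (0, 0)
    best_score = float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Unnormalized cross-correlation over the overlapping region.
            score = 0.0
            for r in range(rows):
                for c in range(cols):
                    rr, cc = r + dy, c + dx
                    if 0 <= rr < rows and 0 <= cc < cols:
                        score += reference[r][c] * moving[rr][cc]
            if score > best_score:
                best_score = score
                best_shift = (dy, dx)
    return best_shift


# Hypothetical 5x5 frames: `moving` is `reference` moved down 1 and left 1.
reference = [[0] * 5 for _ in range(5)]
reference[1][2], reference[2][2], reference[2][3] = 2, 3, 1
moving = [[0] * 5 for _ in range(5)]
moving[2][1], moving[3][1], moving[3][2] = 2, 3, 1
print(estimate_rigid_shift(reference, moving))
```

      Because one translation is applied to the whole frame, a region that moves differently from the rest of the brain, such as deep structures relative to the cortex, remains uncorrected.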

      In the revised discussion, we have expanded on the limitations of our current motion correction approach and referenced recent work about more advanced motion correction methods:

      (Line 346) “While rigid motion correction is often effective in anesthetized animals, awake animal imaging presents greater challenges due to the more prominent non-rigid motion, particularly in deeper brain regions. This is evidenced in Supplementary Fig. 1 (Mouse 7), where cortical vessels remain relatively stable, but regions around the colliculi and mesencephalon exhibit more noticeable motion artifacts, indicating that displacement is more pronounced in deeper areas. To address these deeper, non-rigid motions, recent studies suggest estimating nonrigid transformations from unfiltered tissue signals before applying corrections to ULM vascular images(16,50). Such advanced motion correction strategies may be more effective for awake ULM imaging, which experiences higher motion variability. The development of more robust and effective motion correction techniques will be crucial to reduce motion artifacts in future awake ULM applications.”

      • Figure 1f maybe flip the color bar to have an upward up and downward down.

      Thank you for your suggestion. This display method indeed makes the images more intuitive. In the revised manuscript, all directional flow color bars have been flipped to ensure that upward flow is displayed as ‘up’ and downward flow as ‘down.’

      • Figure 2b the figure is a bit confusing in what is displayed between dashed lines, solid lines, dots... maybe it would be easier to read with

      - bigger dots and dashed lines in color for each of the 4 series

      - and so in the legend, thin solid lines in the corresponding color for the fit, but no solid line in the legend (to distinguish data/fit)

      - no lines for FWHM as they are not very visible, and the FWHM values are not mentioned for these examples.

      Thank you for your detailed suggestions. We agree that the original Fig. 2b appeared messy and confusing. Based on this feedback and other comments, we decided to replace the FWHM-based vessel diameter measurement with a more stable binarization-based approach. In the revised version, we selected a specific segment of each vessel and measured the diameter by calculating the distance from the vessel’s centerline to both sides after binarization. Each point on the centerline of this segment provides a diameter measurement, which can be further used to calculate the mean and standard error. This updated method is more stable and reproducible, providing reliable measurements even for vessels that are not fully saturated. It also facilitates comparison across more vessels, helping to further demonstrate the generalizability of our saturation standard. We believe these adjustments make the revised Fig. 2b clearer and more readable.
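
      The binarization-based measurement can be sketched as follows. This is a simplified illustration rather than the exact implementation: it assumes a roughly vertical vessel segment (as for penetrating cortical vessels), so that the width at each centerline point is the contiguous run of vessel pixels along the image row; the names `row_width` and `segment_diameter` and the pixel size are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def row_width(mask_row, center_col):
    """Count contiguous vessel pixels in one row that include the centerline pixel."""
    if not mask_row[center_col]:
        return 0
    left = center_col
    while left - 1 >= 0 and mask_row[left - 1]:
        left -= 1
    right = center_col
    while right + 1 < len(mask_row) and mask_row[right + 1]:
        right += 1
    return right - left + 1


def segment_diameter(mask, centerline, pixel_size_um):
    """Mean diameter and standard error along a vessel segment.

    mask:          2D nested list, truthy where the binarized ULM map is vessel.
    centerline:    list of (row, col) points along the segment.
    pixel_size_um: physical size of one pixel in micrometres (assumed isotropic).
    """
    widths = [row_width(mask[r], c) * pixel_size_um for r, c in centerline]
    mean_width = mean(widths)
    # Standard error of the mean; undefined for a single point, so report 0.
    sem = stdev(widths) / sqrt(len(widths)) if len(widths) > 1 else 0.0
    return mean_width, sem


# Hypothetical binarized segment, 5 um pixels, centerline in column 2.
mask = [[0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 1]]
centerline = [(0, 2), (1, 2), (2, 2)]
print(segment_diameter(mask, centerline, pixel_size_um=5.0))
```

      For oblique vessels the width would need to be measured along the local normal to the centerline instead of along the image row.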

      • Page 7, lines 144-147. This passage is not really clear when linking going up or down and going from the stem to the branches that it is specific to Figure 4a (and therefore to this particular location).

      Thank you for your insightful comments on our vessel classification method. We recognize the limitations of the previous approach and, in order to enhance the rigor of the study, we have opted not to continue using this method in the revised manuscript. We have removed all content related to vessel classification based on branch-in and branch-out criteria. This includes the original Classification of Cerebral Vessels section in the Methods, the relevant descriptions in the Results section under “ULM reveals detailed cerebral vascular changes from anesthetized to awake for the full depth of the brain”, the limitations of this classification method in the Discussion section, as well as related content in the original Figures 4 and 5.

      In the revised analysis, for the comparison between arteries and veins, we focus solely on penetrating vessels in the cortex. For these vessels, it is generally accepted that downward-flowing vessels are arterioles, while upward-flowing vessels are venules. Accordingly, in the revised Figures 4 and 6, we analyze arterioles and venules exclusively in the cortex, without relying on the previous classification method that could be considered controversial.

      • Page 11 line 222 "higher vascular density" seems unprecise.

      Thank you for pointing this out. We have revised the sentence to more precisely convey our observations regarding changes in vascular diameter and vascularity within the ROI. We present these findings as evidence of the vasodilation effect under isoflurane, in alignment with existing research. The revised statement is as follows:

      (Line 275) “Statistical analysis from Fig. 4 shows that certain vessels exhibit a larger diameter under isoflurane anesthesia, and the vascularity, calculated as the percentage of vascular area within selected brain region ROIs, is also higher in the anesthetized state. These findings suggest a vasodilation effect induced by isoflurane, consistent with existing research(20,40,41,43,44).”

      • Discussion: page 12, lines 257-267: it is not exactly clear how 3D imaging will help for the differentiation of veins/arteries. However, some methods have already been proposed to discriminate between arteries and veins using pulsatility (Bourquin et al., 2022) or 3D positioning when vessels are overlapped (Renaudin et al., 2023). The latter can also help estimate the out-of-plane positioning during longitudinal imaging.

      Bourquin, C., Poree, J., Lesage, F., Provost, J., 2022. In Vivo Pulsatility Measurement of Cerebral Microcirculation in Rodents Using Dynamic Ultrasound Localization Microscopy. IEEE Trans. Med. Imaging 41, 782-792. https://doi.org/10.1109/TMI.2021.3123912

      Renaudin, N., Pezet, S., Ialy-Radio, N., Demene, C., Tanter, M., 2023. Backscattering amplitude in ultrasound localization microscopy. Sci. Rep. 13, 11477. https://doi.org/10.1038/s41598-023-38531-w

      Thank you for pointing this out. We have revised the relevant paragraph in the discussion to clarify the potential advantages of advances in ULM imaging methods, such as those based on pulsatility (as described by Bourquin et al., 2022) or backscattering amplitude (as demonstrated by Renaudin et al., 2023). These established methods could be helpful for longitudinal imaging. Below is the revised text in the discussion section:

      (Line 370) “Advances in ULM imaging methods can benefit longitudinal awake imaging. For instance, dynamic ULM can differentiate between arteries and veins by leveraging pulsatility features(51). 3D ULM, with volumetric imaging array(52,53), enables the reconstruction of whole-brain vascular network, providing a more comprehensive understanding of vessel branching patterns. Meanwhile, 3D ULM also helps to mitigate the challenge of aligning the identical coronal plane for longitudinal imaging, a process that requires precise manual alignment in 2D ULM to ensure consistency. Additionally, this alignment issue can also be alleviated in 2D imaging using backscattering amplitude method, which may assist in estimating out-of-plane positioning during longitudinal imaging(54).”

      Reviewer 3 (Public Review):

      • It is unclear whether multiple animals were used in the statistical analysis.

      Thank you for bringing this to our attention. We acknowledge that the original version did not clearly indicate the use of animals in the statistical analysis. In the revised manuscript, we have added Supplementary Figure 1 to specify the mice used, and we have labeled each mouse accordingly in the figures or captions. In the revised Figures 4 and 6, we have ensured that each quantitative analysis figure or its caption clearly indicates the specific mice.

      • Generalizations are sometimes drawn from what seems to be the analysis of a single vessel.

      Thank you for pointing this out. To enhance the generalizability of our conclusions, we have expanded our analysis beyond single vessels in several parts of the study. For instance, in Figure 2, we analyzed three vessels at different depths within the same brain region of a single mouse, and we have included additional results in Supplementary Figure 2 to further support these findings. Additionally, we have revised the language in the manuscript to ensure that conclusions are appropriately qualified and avoid overgeneralization.

      In Figures 4 and 6, we extended the analysis from single vessels to larger region-of-interest (ROI) analyses across entire brain regions. Unlike single-vessel measurements, which are susceptible to bias based on specific measurement locations, ROI-based analyses are less influenced by the operator and provide more objective, generalizable insights.

      • The description of the statistical analysis is mostly qualitative.

      We recognize that some aspects of the original statistical analysis (Figures 4 and 5 in the previous version) lacked rigor and that the description was largely qualitative. The revised statistical analysis (Figures 4 and 6) presents our findings from multiple dimensions, ranging from individual vessels to individual cortical ROIs of arteries and veins, and ultimately to broader brain regions. For instance, as illustrated in the revised Figure 4f, the average cortical arterial flow speed decreases by approximately 20% from anesthesia to wakefulness, while venous flow speed decreases by an average of 40%, with the reduction in venous flow speed being significantly greater than that of arterial flow. We believe that this kind of description makes the analysis more quantitative.

      For more examples, please refer to the Results section where Figure 4 (Line 169 to Line 207) and Figure 6 (Line 224 to Line 258) are described. These sections have been extensively rewritten to emphasize quantitative interpretation of the data. Each part of the analysis now focuses more heavily on quantitative analyses that consistently show similar trends across all animals.

      • Some terms used are insufficiently defined.

      • Additional limitations should be included in the discussion.

      • Some technical details are lacking. 

      Thank you for highlighting these issues. We have made several improvements in the revised manuscript to address them. We have clarified terms such as “vascularity” (Line 547) and “saturation point” (Line 112) to ensure precision and prevent ambiguity. We have expanded the discussion (Line 310 to Line 377) to include limitations such as motion correction challenges and advances in ULM imaging methods, including dynamic ULM and backscattering amplitude techniques. We have added further details on interleaved sampling (Line 494 to Line 497), ULM tracking (Line 517 to Line 529), and quantitative analysis (Line 535 to Line 551) in the Methods section to provide a clearer understanding of our approach.

      Please refer to our other responses for more specific adjustments.

      • Without information about whether the results obtained come from multiple animals, it is difficult to conclude that the authors generally achieved their aim. They do achieve it in a single animal. The results that are shown are interesting and could have an impact on the ULM community and beyond. In particular, the experimental setup they used along with the high reproducibility they report could become very important for the use of ULM in larger animal cohorts.

      We thank the reviewer for recognizing the impact of our work. We also acknowledge that the original submission did not provide sufficient proof of reproducibility. In the revised version, we have included additional animal experiment results to ensure that the conclusions are not drawn from a single animal and that they support our stated aim more generally. (See Supplementary Figure 1 for details on the animals used.)

      Reviewer 3 (Recommendations For The Authors):

      • The manuscript would be more convincing by removing some of the superlatives used in the text. For instance, shouldn't "super-resolution ultrasound localization microscopy" simply be "ultrasound localization microscopy"? Expressions such as "first study", "essential", and "invaluable", etc could be replaced by more factual terms. The word "significant" is also used sometimes with statistics to back it up and sometimes without.

      Thank you for highlighting this issue. We have removed the superlatives throughout the manuscript to make the language more precise. For instance, we have simplified “super-resolution ultrasound localization microscopy” to “ultrasound localization microscopy” throughout the main text and removed expressions such as “first study” and “invaluable”. We also reviewed all uses of “essential” and “significant,” replacing “essential” with more modest alternatives where it does not indicate a strict requirement. Similarly, where “significant” does not refer to statistical significance, we have used other terms to avoid any ambiguity.

      • The section "Microbubble count serves as a quantitative metric for awake ULM image reconstruction" had several issues that I think should be addressed. Mainly, the authors make the case that after detecting 5 million microbubbles, there is no clear gain in detecting more. The argument is not very convincing as we know many vessels will not have had a microbubble circulate in them within that timeframe, which will be especially true in smaller vessels. While the analysis in Figure 2 shows nicely that the diameter estimate for vessels in the 20-30 um range is stable at 5 million microbubbles, it is not necessarily the case for smaller vessels. A better approach here might be to select, e.g., a total of 5 million detected microbubbles for practical reasons and then to determine which vessel parameters estimation (e.g., diameter, flow velocity) remain stable. In addition:

      a. Terms such as 'complete ULM reconstruction', 'no obvious change', 'ULM image saturation' are not well defined within the manuscript.

      Thank you for pointing out these issues and for offering a more rigorous approach. We completely agree with your suggestion. While our analysis demonstrated stable diameter estimates for vessels with diameter around 20 µm at 5 million microbubbles, this does not necessarily ensure stability for smaller vessels. Therefore, the choice of 5 million microbubbles was primarily for practical reasons. In the revised version, we have provided a more objective description and clarification of this limitation. We also recognize that terms such as “complete ULM reconstruction,” “no obvious change,” and “ULM image saturation” were not well defined and may have caused confusion, reducing the rigor of this manuscript. Based on your feedback, we have clearly defined “ULM image saturation” within the context of our study, removed absolute and ambiguous terms like “complete ULM reconstruction” and “no obvious change”. We revised the entire section accordingly:

      (Line 109) “To facilitate equitable comparison of brain perfusion at different states, a practical saturation point enabling stable quantification of most vessels needs to be established. Our observations indicated that when the cumulative MB count reached 5 million, ULM images achieved a relatively stable state. Accordingly, in this study, the saturation point was defined as a cumulative MB count of 5 million. There are also possible alternatives for ULM image normalization. For example, different ULM images can be normalized to have the same saturation rate. However, the proposed method of using the same number of cumulative MB count for normalization enables the analysis of blood flow distribution across different brain regions from a probabilistic perspective. The following analysis substantiates this criterion.

      Fig. 2a compares ULM directional vessel density maps and flow speed maps generated with 1, 3, 5, and 6 million MBs, using the same animal as shown in Fig. 1. To quantitatively confirm saturation, multiple vessel segments were selected for further analysis. Fig. 2b presents the measured vessel diameter for a specific segment at various MB counts. After binarizing the ULM map, the vessel diameter was measured by calculating the distance from the vessel centerline to the edge. Each point along the centerline of the segment provided a diameter measurement, enabling calculation of the mean and standard error. At low MB counts, vessels appeared incompletely filled, leading to inaccurate estimation of vessel diameter due to incomplete profiles. For example, at 1–2 million MBs, the binarized ULM map displayed a width of only one or two pixels along the segment. As a result, the measurements always yielded the same diameter values (two pixels, ~10 μm) with a consistently low standard error of the mean across the entire segment. With increased MB counts, the measured vessel diameter gradually rose, ultimately reaching saturation. The plots in Fig. 2b show that vessel diameter stabilized at 5 million MB count. Additionally, Fig. 2c illustrates the changes in flow velocity measured at different cumulative MB counts. The violin plots display the distribution of flow speed estimates for all valid centerline pixels within the selected segment. At low MB counts (1–3 million), flow velocity estimates fluctuated, but they stabilized as the MB count increased (4–6 million MBs). At 5 million MBs, flow velocity estimates were nearly identical to those at 6 million MBs, corroborating previous findings that vessel velocity measurements stabilize as MB count grows(39). To assess the generalizability of the 5 million MB saturation condition, vessel segments from three different mice across various brain regions were examined. The results, shown in Supplementary Fig. 2, confirm that this saturation criterion applies broadly. Although the 5 million MB threshold may not ensure absolute saturation for all vessels, it is generally effective for vessels larger than 15 μm. This MB count threshold was therefore adopted as a practical criterion.”

      b. The choice of 10 consecutive tracking frames is arbitrary and should be described as such unless a quantitative optimization study was conducted. Was there a gap-filling parameter? What was the maximum linking distance and what is its impact on velocity estimation?

      Thank you for your comment. We acknowledge that the choice of 10 consecutive tracking frames was based on our common practice rather than a specific quantitative optimization. Additionally, with the uTrack algorithm, we set both the gap-filling parameter and maximum linking distance to 10 pixels. Setting these parameters too high could potentially overestimate velocity. These details have now been added to the Methods section for clarity:

      (Line 517) “The choice of 10 consecutive frames (10 ms) was based on established practice but can be adjusted as needed. For the uTrack algorithm, two additional key parameters were specified: the maximum linking distance and the gap-filling distance, both set to 10 pixels (~50 microns). This configuration means that only bubble centroids within 10 pixels of each other across consecutive frames are considered part of the same bubble trajectory. Additionally, when the start and end points of two tracks fall within this threshold, the gap-filling parameter merges them into a single, continuous track. It is important to select these parameters carefully, as overly large values could lead to an overestimation of flow velocity. By setting the maximum linking distance to 10 pixels, we effectively limited the measurable velocity to 50 mm/s, under the assumption that no bubble would exceed a 50-micron displacement within the 1 ms interval between frames. After determining bubble tracks with the specified parameters for uTrack algorithm, accumulating the MB tracks resulted in the flow intensity map. Considering the velocity distribution across the mouse brain, this 50 mm/s limit ensures that the vast majority of blood flow is captured accurately.”
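
      The 50 mm/s limit quoted above follows directly from the linking distance and the frame interval. The short Python sketch below reproduces that arithmetic; the function name is hypothetical, and the 5 µm pixel size is an assumption inferred from "10 pixels (~50 microns)".

```python
def max_trackable_speed_mm_s(max_link_px, pixel_size_um, frame_rate_hz):
    """Largest speed a tracker can follow given its maximum linking distance.

    A bubble may move at most `max_link_px` pixels between consecutive
    frames, so the speed ceiling is that displacement times the frame rate.
    """
    displacement_um = max_link_px * pixel_size_um  # um per frame
    speed_um_s = displacement_um * frame_rate_hz   # um per second
    return speed_um_s / 1000.0                     # convert to mm per second


# Values from the quoted Methods text: 10 pixels, 1 ms between frames.
# The 5 um pixel size is an assumption (10 pixels ~ 50 microns).
limit_mm_s = max_trackable_speed_mm_s(max_link_px=10,
                                      pixel_size_um=5.0,
                                      frame_rate_hz=1000.0)
```

      Under these assumptions `limit_mm_s` evaluates to 50.0, matching the limit stated in the text. Halving the linking distance would halve the ceiling, which shows how the parameter trades off track purity against the fastest flow that can be measured.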

      c. 'The plots (Figure 2b) clearly indicate that the vessel diameter stabilized beyond 5 million MB count.' This is true for one vessel. To generalize that claim, the analysis should be performed quantitatively on a larger sample of vessels in various areas of the brain, across multiple animals.

      Thank you for pointing out this limitation. We agree that conclusions drawn from a single vessel cannot be generalized across all regions. Following your suggestion, we have added Supplementary Figure 2, where we analyzed multiple vessels from different brain regions across three mice. This expanded analysis further confirms that a 5 million MB count is sufficient to stabilize vessel diameter measurements across various samples.

      (Line 133) “To assess the generalizability of the 5 million MB saturation condition, vessel segments from three different mice across various brain regions were examined. The results, shown in Supplementary Fig. 2, confirm that this saturation criterion applies broadly. Although the 5 million MB threshold may not ensure absolute saturation for all vessels, it is generally effective for vessels larger than 15 μm. This MB count threshold was therefore adopted as a practical criterion.” 

      • "Statistical analysis validates the increase in blood flow induced by anesthesia" is a very interesting section but even though a quantitative analysis was conducted in Figure 5, the language used remains mostly qualitative. I think this section should include quantitative conclusions from the statistical analysis to increase the impact of this work.

      Thank you for your valuable feedback. We recognize that some aspects of the original quantitative analysis (Figures 4 and 5 in the previous version) lacked rigor, such as the classification of arteries, veins, and capillaries, and that the data presented in each row of Figure 5 represented only one mouse per coronal section, limiting the generalizability of statistical conclusions.

      In response to the reviewers’ feedback, the revised version incorporates a new approach by merging the previous Figure 4 and Figure 5 into a single, consolidated figure (now Figure 4). This updated figure aims to present our findings from multiple dimensions, ranging from individual vessels to individual cortical ROI of arteries and veins, and ultimately to broader brain regions. We have focused on quantitative analyses that consistently show similar trends across all animals. For instance, as illustrated in the revised Figure 4f, the average cortical arterial flow speed decreases by approximately 20% from anesthesia to wakefulness, while venous flow speed decreases by an average of 40%, with the reduction in venous flow speed being significantly greater than that of arterial flow. We believe that this approach offers more insightful analysis and enhances the overall impact of the study.

      For more examples, please refer to the revised Results section where Figure 4 is described (from Line 169 to Line 212). This section has been extensively rewritten to emphasize quantitative interpretation of the data. Each part of the analysis now focuses more heavily on quantitative analyses that consistently show similar trends across all animals.

      • In the methods, it is claimed that 6 healthy female C57 mice were used in the study, but it is hard to tell whether more than one animal is shown in the figures. It is also unclear whether the statistics were performed within or across animals. Since one of the major strengths of the manuscript is that it shows the feasibility of performing reproducible measurements using ULM, most figures should be repeated for each individual animal and provided in supplementary data and statistics should be performed across animals.

      Thank you for bringing this to our attention. We acknowledge that the original version did not clearly indicate the use of individual animals. In the revised manuscript, we have added Supplementary Figure 1 to specify the mice used, and we have labeled each mouse accordingly in the figures or captions. Additionally, we included statistics across animals in the revised Figures 4 and 6, and detailed data for each individual mouse are now provided in Supplementary Figures 3 and 4.

      • The effect of aliasing should be discussed given that 1) a high-frequency probe is used along with a correspondingly relatively low frame rate (1000 fps) and 2) Doppler filtering is used to separate upward from downward-moving microbubbles. There will be microbubbles that circulate faster than the Nyquist limit, which will thus appear as moving in the opposite direction in the Doppler spectrum. It would be important to double-check that the effect is not too important and to report this as a limitation in the discussion.

      Thank you for highlighting this important point. Aliasing is indeed a relevant issue to consider, especially for higher flow velocities in large vessels. We have added a discussion on this limitation in the revised manuscript:

      (Line 359) “Based on the maximum linking distance and gap closing parameters outlined in the Methods section, blood flow with velocities below 50 mm/s can be detected. However, the use of a directional filter to estimate flow direction may introduce aliasing. MBs moving at higher velocities may be subject to incorrect flow direction estimation due to aliasing effects. Given that the compounded frame rate is 1000 Hz, with an ultrasound center frequency of 20 MHz and a sound speed of 1540 m/s, the relationship between Doppler frequency and the axial blood flow velocity(12) indicates that aliasing will not occur for axial flow velocities below 19.25 mm/s. In all flow velocity maps presented in this study, the range is limited to a maximum of 15 mm/s, remaining below the critical threshold for aliasing. Additionally, all vessels analyzed in the violin plots for arteriovenous flow comparisons fall within this range. While cortical arterioles and venules generally exhibit moderate flow speeds, aliasing remains a factor to consider when combining directional filtering with velocity analysis.”
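
      The 19.25 mm/s threshold quoted above can be checked with the standard pulsed-Doppler Nyquist relation: the Doppler shift is f_d = 2·v·f0/c, and aliasing is avoided while f_d stays below half the frame rate, which gives v_max = c·PRF/(4·f0). The Python sketch below is only a consistency check of that arithmetic, with a hypothetical function name; we assume this is the relation the manuscript takes from reference (12).

```python
def nyquist_axial_velocity_mm_s(frame_rate_hz, center_freq_hz, sound_speed_m_s):
    """Highest axial velocity that does not alias in slow-time Doppler.

    Doppler shift: f_d = 2 * v * f0 / c.
    Nyquist condition: f_d < frame_rate / 2.
    Solving for v gives v_max = c * frame_rate / (4 * f0).
    """
    v_max_m_s = sound_speed_m_s * frame_rate_hz / (4.0 * center_freq_hz)
    return v_max_m_s * 1000.0  # convert m/s to mm/s


# Values taken from the quoted Discussion text.
v_max = nyquist_axial_velocity_mm_s(frame_rate_hz=1000.0,
                                    center_freq_hz=20e6,
                                    sound_speed_m_s=1540.0)

# The velocity maps are capped at 15 mm/s, below the aliasing threshold.
display_cap_mm_s = 15.0
cap_is_alias_free = display_cap_mm_s < v_max
```

      Note that the limit applies to the axial velocity component only, so a vessel running obliquely to the beam can carry faster flow before its direction is misassigned.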

      • The method used to classify vessels may be incorrect and may not be needed. I would recommend the authors not use it and describe the vessels as vessels that branch in or out, etc. Applying an arbitrary threshold of 2 to detect capillaries is also not very convincing. I understand that the authors might decide to maintain this nomenclature, in which case I would recommend clearly explaining it at the beginning of the manuscript along with some of the caveats that are already reported in the discussion.

      Thank you for your comments on our vessel classification method. We recognize the limitations of the previous approach and, in order to enhance the rigor of the study, we have opted not to continue using this method in the revised manuscript.

      In the revised analysis of arteries and veins, we focus solely on penetrating vessels in the cortex. For these vessels, it is generally accepted that downward-flowing vessels are arterioles, while upward-flowing vessels are venules. Accordingly, in the revised Figures 4 and 6, we analyze arterioles and venules exclusively in the cortex, without relying on the previous classification method that could be considered controversial.

      Additionally, we agree that classifying vessels with values below 2 as capillaries was not a robust approach. Thus, we have removed all related analyses from the revised manuscript.

      Minor comments:

      • Line 16: "resolves capillary-scale ..."; it is not clear that the resolution that is achieved in this work is at the capillary scale.

      Thank you for your valuable feedback. We understand that “capillary-scale” may overstate the achieved resolution in our work. To clarify, we have revised the sentence as follows:

      (Line 18) “Ultrasound localization microscopy (ULM) is an emerging imaging modality that resolves microvasculature in deep tissues with high spatial resolution.” 

      This adjustment more accurately reflects the resolution capabilities of ULM as used in our study.

      • Line 22: 'vascularity' is not well defined in the manuscript. Consider defining or using another term.

      Thank you for pointing out the need for clarification on vascularity. We acknowledge that our initial use of the term “vascularity” may have been unclear and potentially confusing. In the revised manuscript, we have included a clear definition of “vascularity” in the Methods section under Quantitative Analysis of ULM Images (Line 534). 

      The following sentence shows the definition of vascularity:

      (Line 547) “Vascularity was defined as the proportion of the pixel count occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of the pixels with MB signal.”
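
      As an illustration of this definition, the following minimal Python sketch computes vascularity for a toy density map. The function name, the zero binarization threshold, and the toy data are assumptions made for the example and do not reproduce the study's code.

```python
def vascularity_percent(density_map, roi_mask, threshold=0):
    """Percentage of ROI pixels that contain MB signal.

    density_map: list of rows with MB counts per pixel.
    roi_mask: list of rows of 0/1 values marking the ROI.
    threshold: a pixel counts as vessel if its density exceeds this value
               (assumed to be 0 here, i.e. any MB signal).
    """
    roi_pixels = 0
    vessel_pixels = 0
    for density_row, roi_row in zip(density_map, roi_mask):
        for density, in_roi in zip(density_row, roi_row):
            if in_roi:
                roi_pixels += 1
                if density > threshold:  # binarization step
                    vessel_pixels += 1
    if roi_pixels == 0:
        raise ValueError("ROI contains no pixels")
    return 100.0 * vessel_pixels / roi_pixels


# Toy data (hypothetical): the ROI covers the first two rows only.
toy_density = [
    [0, 2, 0, 1],
    [3, 0, 0, 0],
    [0, 0, 5, 0],
]
toy_roi = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
toy_vascularity = vascularity_percent(toy_density, toy_roi)
```

      In this toy case three of the eight ROI pixels contain signal, so the vascularity is 37.5%. The pixel in the third row is ignored because it lies outside the ROI.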

      We have also added a brief definition where the term is first used in the Results section:

      (Line 161) “When comparing vessel density maps, ULM images that are acquired in the awake state demonstrate a global reduction of vascularity, which refers to percentage of pixels that occupied by blood vessels.”

      • Line 30: I'm not convinced the first two sentences are useful.

      Thank you for pointing out this issue. The opening sentence of the article lacked focus and was too broad. We have rewritten the sentence as follows:

      (Line 34) “Sensitive imaging of correlates of activity in the awake brain is fundamental for advancing our understanding of neural function and neurological diseases.”

      • Line 37: 'micron-scale capillaries': this expression is unclear. Capillaries are typically micron-scaled, so it gives the impression that ULM can image ULM at the one-micron scale, which is not the case.

      Thank you for your helpful comment. We agree that “micron-scale capillaries” could be misleading, as it might imply a resolution at the single-micron level. To clarify, we have revised the sentence as follows:

      (Line 40) “ULM is uniquely capable of imaging microvasculature situated in deep tissue (e.g., at a depth of several centimeters).”

      This revised wording more accurately describes ULM’s capability without implying single-micron level resolution.

      • Line 74: I don't think motion-free imaging is possible in the context of awake animals. Consider 'limiting motion' instead.

      Thank you for pointing out the potential issue with the term “motion-free”. We agree that achieving entirely motion-free imaging is challenging, especially in the context of awake animals. In response to your suggestion, we have revised the sentence to better reflect this limitation:

      (Line 76) “To achieve consistent ULM brain imaging while allowing limited movement in awake animals, a headfixed imaging platform with a chronic cranial window was used in this study.”

      This revised wording more accurately conveys our approach to minimizing motion without implying that motion is completely eliminated.

      • Line 134:'clearly reveals decreased vessel diameter' How was that demonstrated?

      • Line 153: 'significant' according to which statistical test?

      • Line 167: 'slight increase', by how much, is it significant?

      • Line 183: 'smaller vessels' the center of the distribution is not at 10mm/s, and velocity is not necessarily correlated with diameter.

      • Line 184: 'more large vessels', see above. What is a large vessel, and how was this measured?

      • Line 205: 'significantly lower', according to which statistical test?

      We acknowledge that the original version did not use statistical terminology properly. In the revised manuscript, we have deleted the statements in question and rewritten the statistical analysis section to ensure the terms are used correctly. Please refer to the revised section “ULM reveals an increase in blood flow induced by isoflurane anesthesia” (from Line 169 to Line 209). In the revised Figures 4 and 6, we have also ensured that each quantitative analysis figure or its caption is clearly explained.

      • Line 398: the interleaved sampling scheme should be described in more detail.

      Thank you for pointing out this issue. The previous version did not clearly explain the details of interleaved sampling. We have now added the following paragraph to the Ultrasound imaging sequence section in Methods:

      (Line 494) “Interleaved sampling is employed to capture high-frequency echoes more effectively. With the system’s sampling rate limited to 62.5 MHz, the upper limit of the center frequency of the transducer passband is 15.625 MHz. To mitigate aliasing, two transmissions are sent per angle, staggered in time. This approach effectively doubles the sampling rate, ensuring more accurate image reconstruction.”
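
      The numbers in this paragraph can be reproduced with a short Python sketch. It assumes that the 15.625 MHz limit corresponds to four samples per wavelength (62.5 / 4), and that the second transmission is delayed by half a sample period so the two acquisitions can be merged sample by sample; both are assumptions made for illustration, and the function names are hypothetical.

```python
def max_center_frequency_mhz(sampling_rate_mhz, samples_per_wavelength=4):
    """Highest center frequency supported at a given sampling rate.

    Assumption: the receive chain needs four samples per wavelength,
    which is consistent with the quoted 62.5 MHz -> 15.625 MHz limit.
    """
    return sampling_rate_mhz / samples_per_wavelength


def interleave(first_acquisition, second_acquisition):
    """Merge two equally long acquisitions sample by sample.

    Assumption: the second acquisition is offset by half a sample period,
    so alternating the samples doubles the effective sampling rate.
    """
    merged = []
    for a, b in zip(first_acquisition, second_acquisition):
        merged.extend((a, b))
    return merged


native_rate_mhz = 62.5
native_limit_mhz = max_center_frequency_mhz(native_rate_mhz)
interleaved_limit_mhz = max_center_frequency_mhz(2 * native_rate_mhz)

# Toy samples (hypothetical): even-indexed and odd-indexed time points.
merged_samples = interleave([0, 2, 4], [1, 3, 5])
```

      Under these assumptions the native limit is 15.625 MHz and the interleaved limit is 31.25 MHz, which comfortably covers the 20 MHz center frequency mentioned in the Discussion. The cost is two transmissions per steering angle.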

      • Figure 1: Which mouse is it? Are these results consistent across all animals?

      • Figure 2: Which mouse is it? Are these results consistent across all animals?

      • Figure 3: Which mouse is it? Are these results consistent across all animals?

      • Figure 4: Which mouse is it? Are these results consistent across all animals?

      • Figure 5: Is it a single mouse or multiple mice? Are these results consistent across all animals?

      We acknowledge that the original version did not clearly indicate the number of animals used in the statistical analysis. In the revised manuscript, we have added Supplementary Figure 1 to specify the mice used, and we have labeled each mouse accordingly in the figures or captions. In the revised Figures 4 and 6, we have ensured that each quantitative analysis figure or its caption clearly indicates the specific mice.

      The original Figures 1 and 2 are presented as case studies to illustrate the methodology. Because the anesthesia time required for tail vein injection varies slightly between animals, the time each mouse takes to recover from anesthesia cannot be held constant. For instance, in Figure 1, the mouse took nearly 500 seconds to recover from anesthesia, but this duration is not consistent across all animals, which is a limitation of the bolus injection technique. We have noted this point in the discussion (discussion on the limitation of bolus injection), and we have also clarified in the results section and figure captions that these figures represent a case study of a single mouse rather than a standardized recovery time for all animals.

      We further clarified this point at the end of the Figure 2 caption:

      (Fig. 2 caption) “This figure presents a case study based on the same mouse shown in Fig 1. The x-axis for d-f begins at 500 seconds because, at this point, the mouse’s pupil size stabilized, indicating it had recovered to an awake state. Consequently, ULM images were accumulated starting from this time. It is important to note that not every mouse requires 500 seconds to fully awaken; the time to reach a stable awake state varies across individual mice.”

      We added the following statement before introducing Figure 1e:

      (Line 93) “Due to differences in tail vein injection timing and anesthesia depth, the time required for each mouse to fully awaken varied. Although it was not feasible to get pupil size stabilized just after 500 seconds for each animal, ULM reconstruction only used the data that were acquired after the animal reached full pupillary dilation, to ensure that ULM accurately captures the cerebrovascular characteristics in the awake state.”

      We added the following statement before introducing Figure 2d:

      (Line 139) “To further verify that the proposed MB bolus injection method can help to achieve ULM image saturation shortly after mice awaken from anesthesia, an analysis on the change in MB concentration over time was conducted once pupil size had stabilized (T = 500s).”

      For Figures 3, 4, and 5 (in the revised version, Figures 4 and 5 have been combined into a single Figure 4), the data represents results from three individual mice, with each coronal plane corresponding to a different mouse. In the revised version, we have added labels to indicate the specific mouse in each image to improve clarity. We also recognize that some analyses in the original submission (original Figure 5) may have lacked sufficient statistical power due to the small sample size. Therefore, in the revised version, we have focused only on findings that were consistently observed across the three mice to ensure robust conclusions.

      Minor corrections and typos from all reviewers:

      We would like to sincerely thank the reviewers for their careful reading of our manuscript. We appreciate the time and effort taken to point out the minor typographical errors. We have carefully addressed and corrected all the identified typos, as listed below:

      From Reviewer #1:

      • Line 316: "insensate": correct, please.

      (Line 409) “After confirming that the mouse was anesthetized, the head of the animal was fixed in the stereotaxic frame.”

      From Reviewer #3:

      • Line 15: Super-resolution ultrasound localization microscopy -- consider removing super-resolution as it gives the impression that it is different from standard ULM.

      (Line 18) “Ultrasound localization microscopy (ULM) is an emerging imaging modality that resolves microvasculature in deep tissues with high spatial resolution.”

      • Line 39: typo: activities should be activity.

      (Line 41) “ULM can also be combined with the principles of functional ultrasound (fUS) to image whole-brain neural activity at a microscopic scale.”

      • Line 47: typo: over under.

      (Line 50) “Therefore, in neuroscience research, brain imaging in the awake state is often preferred over imaging under anesthesia.”

      Once again, we are grateful for the reviewers’ thorough review and valuable input, which have helped us improve the clarity and precision of the manuscript.

    1. eLife Assessment

      This valuable paper explores the idea that transient modulations of neural gain promote switches between distinct perceptual interpretations of ambiguous stimuli. The authors provide solid evidence for this idea by pupillometry (an indirect proxy of neuromodulatory activity), fMRI, neural network modeling, and dynamical systems analyses. The highly integrative nature of this approach is rare in the field.

    2. Reviewer #1 (Public review):

      Summary:

      This paper proposes a neural mechanism underlying the perception of ambiguous images: neuromodulation changes the gain of neural circuits promoting a switch between two possible percepts. Converging evidence for this is provided by indirect measurements of neuromodulatory activity and large-scale brain dynamics which are linked by a neural network model. However, both the data analysis as well as the computational modeling are incomplete and would benefit from a more rigorous approach.

      This is a revised version of the manuscript which, in my view, is a considerable step forward compared to the original submission.

      In particular, the authors now model phasic gain changes in the RNN, based on the network's uncertainty. This is original and much closer to what is suggested by the phasic pupil responses. They also show that switching is actually a network effect because switching times depend on network configuration (Fig 2). This resolves my main comments 1 and 2 about the model.

      The mechanism, as I understand it, is different from what the authors described before in the RNN with tonic gain changes. As uncertainty increases, the network enters a regime in which the two excitatory populations start to oscillate. My intuition is that this oscillation arises from the feedback loop created by the new gain control mechanism. If my intuition is correct, I think it would be worth explaining this mechanism more explicitly in the paper.

      Overall, the modeling part of the paper has changed quite a lot and I think it is now more solid which is why I have updated my "strength of evidence" rating.

    3. Reviewer #2 (Public review):

      This paper tests the hypothesis that perceptual switches during the presentation of ambiguous stimuli are accompanied by changes in neuromodulation that alter neural gain and trigger abrupt changes in brain activity. To test this hypothesis, the study combines pupillometry, artificial recurrent network (RNN) analysis and fMRI recording. In particular, the study uses methods of energy landscape analysis inspired by physics, which is particularly interesting.

      Strengths

      - The authors should be commended for combining different methods (pupillometry, RNNs, fMRI) to test their hypothesis. This combination provides a mechanistic insight into perceptual switches in the brain and artificial neural networks.
      - The study combines different viewpoints and fields of scientific literature, including neuroscience, psychology, physics, and dynamical systems. In order to make this combination more accessible to the reader, the different aspects are presented in a pedagogical way to be accessible to all fields.
      - This combination of methods and viewpoints is rarely done, so it is very useful.
      - The authors introduce dynamic gain modulation in their recurrent neural network, which is novel. They devote a section of the paper to studying the dynamics, fixed points and convergence of this type of network.

      Weaknesses

      - The study may not be specific to perceptual switches. This is because the study relies on a paradigm in which participants report when they identify a switch in the item category. Therefore, it is unclear whether the effects reported in the paper are related to the perceptual switch itself, to attention, or to the detection of behaviourally relevant events. The authors are cautious and explicitly acknowledge this point in their study.
      - The demonstration of the causal role of gain modulation in perceptual switches is partial. This causality is clearly demonstrated in the simulation work with the RNN. However, it is not fully demonstrated in the pupil analysis and the fMRI analysis. One reason is that this work is correlative (which is already very informative). An analysis of the timing of the effect might have overcome this limitation. For example, in a previous study, the same group showed that fMRI activity in the LC region precedes changes in the energy landscape of fMRI dynamics, which is a step towards investigating causal links between gain modulation, changes in the energy landscape and perceptual switches.
      - Some effects may reflect the expectation of a perceptual switch rather than the perceptual switch itself. To mitigate this risk, the design of the fMRI task included catch trials, in which no switch occurs, to reduce the expectation of a switch. The pupil study, however, did not include such catch trials.
      - The paper uses RNN-based modelling to provide mechanistic insight into the role of gain modulation in perceptual switches. However, the RNN solves a task that differs markedly from that performed by human participants, which may limit the explanatory value of the model. The RNN is provided with two inputs characterising the sensory evidence supporting the first and last image category in the sequence (e.g. plane and shark). In contrast, observers in the task were naïve as to the identity of the last image at the beginning of the sequence. The brain first receives sensory evidence about the image category (e.g. plane) with which the sequence begins, which is very easy to recognise, then it sees a sequence of morphed images and has to discover what the final image category will be. To discover the final image category, the brain has to search a vast space of possible second images (is it a shark?, a frog?, a bird?, etc.), rather than comparing the likelihood of just two categories. This search process and the perceptual switch in the task appear to be mechanistically different from the competition between two inputs in the RNN.
      - Another aspect of the motivation for the RNN model remains unclear. The authors introduce dynamic gain modulation in the RNN, but it is not clear what the added value of dynamic gain modulation is. Both static (Fig. S1) and dynamic (Fig. 2F) gain modulation lead to the predicted effect: faster switching when the gain is larger.
      - The authors are to be commended for addressing their research questions with multiple tools and approaches. There are links between the different parts of the study. The RNN and the pupil are linked by the notion of gain modulation, the RNN and the fMRI analysis are linked by the study of the energy landscape, and the fMRI study and the pupil study are indirectly linked by previous work from this group showing that the peak in LC fMRI activity precedes a flattening of the energy landscape. These links are very interesting but could have been stronger and more complete.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the neural mechanisms underlying the change in perception when viewing ambiguous figures. Each possible percept is related to an attractor-like brain state and a perceptual switch corresponds to a transition between these states. The hypothesis is that these switches are promoted by bursts of noradrenaline that change the gain of neural circuits. The authors present several lines of evidence consistent with this view: pupil diameter changes during the time point of the perceptual change; a gain change in neural network models promotes a state transition; and large-scale fMRI dynamics in a different experiment suggests a lower barrier between brain states at the change point. However, some assumptions of the computational model seem not well justified and the theoretical analysis is incomplete. The paper would also benefit from a more in-depth analysis of the experimental data.

      Strengths:

      The main strength of the paper is that it attempts to combine experimental measurements - from psychophysics, pupil measurements, and fMRI dynamics - and computational modeling to provide an emerging picture of how a perceptual switch emerges. This integrative approach is highly useful because the model has the potential to make the underlying mechanisms explicit and to make concrete predictions.

      Weaknesses:

      A general weakness is that the link between the three parts of the paper is not very strong. Pupil and fMRI measurements come from different experiments and additional analysis showing that the two experiments are comparable should be included. Crucially, the assumptions underlying the RNN modeling are unclear and the conclusions drawn from the simulation may depend on those assumptions.

      With this comment in mind, we have made a substantial effort to better integrate the three aspects of our paper. On the pupillometry side, we now show that the dynamic uncertainty associated with perceptual categorisation shares a similar waveform with the observed fluctuations in pupil diameter around the switch point (Fig 2B). To better link the modelling to the behaviour, we have also made the gain of the activation function of each sigmoidal unit change dynamically as a function of the uncertainty (i.e. the entropy) of the network’s classification. This generates phasic changes in gain that mimic the observed phasic changes in pupil dilation, explicitly linking the dynamics of gain in the RNN to the observed dynamics of pupil diameter (our non-invasive proxy for neuromodulatory tone). The predictions of the RNN (a flattened egocentric landscape and peaks in low-dimensional brain state velocity at the time point of the perceptual switch) were tested directly in the whole-brain BOLD data, which links the modelling and the BOLD analysis. Finally, we note that whilst we agree that an experiment in which pupillometry and BOLD data were collected simultaneously would be ideal, these data were not available to us at the time of this study.

      Main points:

      Perceptual tasks in pupil and fMRI experiments: how comparable are these two tasks? It seems that the timing is very different, with long stimulus presentations and breaks in the fMRI task and a rapid sequence in the pupil task. Detailed information about the task timing in the pupil task is missing. What evidence is there that the same mechanisms underlie perceptual switches at these different timescales? Quantification of the distributions of switching times/switching points in both tasks is missing. Do the subjects in the fMRI task show the same overall behavior as in the pupil task? More information is needed to clarify these points.

      We recognize the need for a more detailed and comparative analysis of the perceptual tasks used in our pupil and fMRI experiments, particularly regarding differences in timing, task structure, and instructions. The fMRI task incorporates jittered inter-trial intervals (ITIs) of 2, 4, 6, and 8 seconds, designed to enable effective deconvolution of the BOLD response (Stottinger et al., 2018). In contrast, the pupil task presents a more rapid sequence of stimuli without ITIs. These timing differences are reflected in the mean perceptual switch points: the 8th image in the fMRI task and the 9th image in the pupil task. This small yet consistent difference suggests subtle influences of task design on behavior.

      Despite these structural and instructional differences, our analyses indicate that overall behavioral patterns remain consistent across the two modalities. The distributions of switching times align closely, and no significant behavioral deviations were observed that might suggest a fundamental difference in the underlying mechanisms driving perceptual switches. These findings suggest that the additional time and structural differences in the fMRI task do not significantly alter the behavioral outcomes compared to the pupil task.

      To address these issues, we have added paragraphs in the Results, Methods, and Limitations sections of the manuscript. In the Results section, we provide a detailed comparison of switching point distributions across the two tasks, emphasizing behavioral consistencies and any observed variations. In the Methods section, we include an expanded description of task timing, instructions, and the presence or absence of catch trials to ensure clarity regarding the experimental setups. Finally, in the Limitations section, we acknowledge the structural differences between the tasks, particularly the lack of catch trials and rapid stimulus presentation in the pupil task, and discuss how these differences may influence perceptual dynamics.

      These additions aim to clarify how task-specific factors, such as timing, instructions, and catch trials, influence perceptual dynamics while highlighting the consistency in behavioral outcomes across both experimental setups. We believe these revisions address the concerns raised and enhance the manuscript’s transparency and rigor.

      Computational model:

      (1) Modeling noradrenaline effects in the RNN: The pupil data suggests phasic bursts of NA would promote perceptual switches. But as I understand, in the RNN neuromodulation is modeled as different levels of gain throughout the trial. Making the neural gain time-dependent would allow investigation of whether a phasic gain change can explain the experimentally observed distribution of switching times.

      We thank the reviewer for this very helpful suggestion. We updated the RNN so that, post-training, gain changes dynamically as a function of the network's classification uncertainty (i.e. the entropy of the network's output). Specifically, the gain dynamics of each unit in the neural network are governed by a linear ODE with a forcing function given by the entropy of the network’s classification. This explicitly tests the hypothesis that uncertainty-driven increases in gain near the perceptual switch (when the input is maximally ambiguous) speed perceptual switches, and allows us to distinguish between tonic and phasic increases in gain (in the absence of uncertainty forcing, gain decays exponentially to a tonic value of 1). Importantly, in line with our hypothesis, we found that switch times decreased as we increased the impact of uncertainty on gain (i.e. switch times decreased as the magnitude of uncertainty forcing increased). Finally, we wish to note that although making gain dynamical is relatively simple conceptually, actually implementing it and then analysing the dynamics turned out to be highly non-trivial. To our knowledge, our model is the first RNN of reasonable size to implement dynamical gain, which required us to push the RNN modelling beyond the current state of the art (see Figs 2-4).
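To make the description above concrete, the following is a minimal sketch of entropy-forced gain dynamics. It is not the code used in the study: the exact form of the ODE, the time constant `tau`, the forcing magnitude `kappa`, and the toy probability trace are all assumptions chosen for illustration. The only properties taken from the response are that gain follows a linear ODE, is forced by the entropy of the classification, and relaxes exponentially to a tonic value of 1 when the forcing is absent.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.clip(p, eps, 1.0)
    return float(-np.sum(p * np.log(p)))

def simulate_gain(probs, dt=0.01, tau=0.1, kappa=1.0, g_tonic=1.0):
    """Euler-integrate the assumed linear ODE
        tau * dg/dt = -(g - g_tonic) + kappa * H(p(t)),
    where H is the entropy of the network's output probabilities.
    tau and kappa are hypothetical parameters, not values from the paper.
    """
    g = np.empty(len(probs))
    g_now = g_tonic
    for t, p in enumerate(probs):
        g_now += dt / tau * (-(g_now - g_tonic) + kappa * entropy(p))
        g[t] = g_now
    return g

# Toy "morph" trial: the probability of category 0 falls linearly,
# so uncertainty is maximal half way through the trial.
T = 500
p0 = np.linspace(0.99, 0.01, T)
probs = np.column_stack([p0, 1.0 - p0])

gain = simulate_gain(probs)                    # phasic bump around the midpoint
gain_no_forcing = simulate_gain(probs, kappa=0.0)  # stays at the tonic value
```

In this sketch, increasing `kappa` plays the role of increasing the impact of uncertainty on gain; with `kappa = 0` the gain remains at its tonic value throughout.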

      (2) Modeling perceptual switches: in the results, it is described that the networks were trained to output a categorical response, but the firing rates in Fig 2B do not seem categorical but rather seem to follow the input stimulus. The output signals of the network are not shown. If I understand correctly, a trivial network that would just represent the two input signals without any internal computation and relay them to the output would do the task correctly (because "the network's choice at each time point was the maximum of the two-dimensional output", p. 22). This seems like cheating: the very operation that the model should perform is to signal the change, in a categorical manner, not to represent the gradually changing input signals.

      The output of the network was indeed trained to be categorical via a cross-entropy loss function, with the output defined by the max of the projection of the excitatory hidden units onto the output weights, which is standard RNN modelling practice. As requested, we now show the output in Fig 2B. On the broader question of whether a trivially small network could solve the task, we are in total agreement that, with the right set of hand-crafted weights, a two-neuron sigmoidal network with winner-take-all readout could solve the task. We disagree, however, that using an RNN is cheating in any way. Many tasks in neuroscience can be trivially solved with a very small number of recurrent units (e.g. basically all 2AF tasks). The question we were interested in is how the brain might solve the task, and more specifically how neuromodulatory control of gain changes the dynamics of our admittedly very simple task. We could have done this by hand-crafting a small network to solve the task, but we wanted to use the RNN modelling as a means of both hypothesis testing and hypothesis generation. We now expand on and justify this modelling choice in the second paragraph of the discussion:

      “We chose to use an RNN, instead of a simpler (more transparent) model as we wanted to use the RNN as a means of both hypothesis generation and hypothesis testing. Specifically, unlike more standard neuronal models which are handcrafted to reproduce a specific effect, when building an RNN the modeller only specifies the network inputs, labels, and the parameter constraints (e.g. Dale’s law) in advance. The dynamics of the RNN are entirely determined by optimisation. Post-training manipulations of the RNN are not built in, or in any way guaranteed to work, making them more analogous to experimental manipulations of an approximately task-optimal brain-like system. Confirmatory results are arguably, therefore, a first step towards an in vitro experimental test.”
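For readers unfamiliar with the categorical readout referred to in the response to point (2), the sketch below illustrates one common way to implement it. It is an illustration only, not the study's code: the function names are hypothetical, and the use of a softmax inside the cross-entropy is an assumption (the response states only that a cross-entropy loss was used and that the choice is the max of the two-dimensional output).

```python
import numpy as np

def readout(r_exc, w_out):
    """Project excitatory firing rates onto the output weights.
    r_exc: (n_exc,) firing rates; w_out: (2, n_exc) non-negative weights.
    """
    return w_out @ r_exc  # shape (2,): one value per category

def choice(z):
    """Winner-take-all choice: index of the larger output."""
    return int(np.argmax(z))

def cross_entropy(z, label):
    """Softmax cross-entropy between output z and an integer class label
    (the softmax is an assumption made for this sketch)."""
    z = z - np.max(z)  # subtract the max for numerical stability
    log_p = z - np.log(np.sum(np.exp(z)))
    return float(-log_p[label])

# Toy example: four excitatory units, two selective for each category.
w_out = np.array([[1.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0]])
r_exc = np.array([0.9, 1.1, 0.0, 0.0])
z = readout(r_exc, w_out)
```

Because the loss penalises graded outputs, training pushes the readout towards a categorical response even though the input morphs gradually.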

      (3) The mechanism of how increased gain leads to faster switches remains unclear to me. My first intuition was that increasing the gain of excitatory populations (the situation shown in Fig. 2E) in discrete attractor models would lead to deeper attractor wells and this would make it more difficult to switch. That is, a higher gain should lead to slower decisions in this case. However, here the switching time remains constant for a gain between 1 and 1.5. Lowering the gain, on the other hand, leads to slower switching. It is, of course, possible that the RNN behaves differently than classical point attractor models or that my intuition is incorrect (though I believe it is consistent with previous literature, e.g. Niyogi & Wong-Lin 2013 (doi:10.1371/journal.pcbi.1003099) who show higher firing rates - more stable attractors - for increased excitatory gain).

      We thank the reviewer for the astute observation, which we entirely agree with. The energy landscape analysis is a method still under active development within our group and we are still learning how to best explain it and its relationship to more traditional ways of quantifying potential-like energy functions of dynamical systems which we think the reviewer has in mind. We have now included a second type of energy landscape analysis which gives a complementary perspective on the RNN dynamics and is more straightforwardly comparable to typical potential functions. We describe the new analysis in the section “Large-scale neural predictions of recurrent neural network model” as follows:

      “Crucially, there are two complementary viewpoints from which we can construct an energy landscape; the first allocentric (i.e., third-person view) perspective quantifies the energy associated with each position in state space, whereas the second egocentric (i.e., first-person view) perspective quantifies the energy associated with relative changes independent of the direction of movement or the location in state space. The allocentric perspective is straightforwardly comparable to the potential function of a dynamical system but can only be applied to low dimensional data in settings where a position-like quantity is meaningfully defined. The egocentric perspective is analogous to taking the point of view of a single particle in a physical setting and quantifying the energy associated with movement relative to the particle's initial location. An egocentric framework is thus more applicable when signal magnitude is relative rather than absolute. See Materials and Methods, and see Fig S4 for an intuitive explanation of the allocentric and egocentric energy landscape analysis on a toy dynamical system.”

      From the allocentric perspective it is entirely true that increasing gain increases the depth of the landscape, equivalent to increasing the depth of the attractor. However, because the input to the network changes dynamically the location of the approximate fixed-point attractor changes and the network state “chases” this attractor over the course of the trial. Importantly, the location of the energy minima changes more rapidly as gain increases, effectively forcing the network to rapidly change course at the point of the perceptual switch (see Fig 4). To quantify this effect we constructed a new measure - neural work - which describes the amount of “force” exerted on the low-dimensional neural trajectory by the vector field quantified by the allocentric landscape. Specifically we treat the allocentric landscape as analogous to a potential function and then leverage the fact that force is equal to the negative gradient of potential energy to calculate the work (force x displacement) done on the low dimensional trajectory at each time point. This showed that as gain increases the amount of work done on the neuronal trajectory at turning points increases analogous to the application of an external force transiently increasing the kinetic energy of an object. From the perspective of the egocentric landscape this results in a flattening of the landscape as there is a lower energy (i.e. higher probability) assigned to large deviations in the neuronal trajectory around the perceptual switch.
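The "neural work" computation described above (force as the negative gradient of the potential, work as force times displacement) can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: here the potential gradient is supplied analytically for a toy quadratic potential, whereas in the analysis described above it would presumably be derived from the empirically estimated allocentric landscape. The function name `neural_work` and its signature are hypothetical.

```python
import numpy as np

def neural_work(trajectory, potential_grad):
    """Work done on a low-dimensional trajectory by the field F = -grad U.

    trajectory: array (T, d) of positions over time.
    potential_grad: callable mapping a position (d,) to grad U at that point.
    Returns an array (T-1,) with W_t = F(x_t) . (x_{t+1} - x_t).
    """
    x = np.asarray(trajectory, dtype=float)
    dx = np.diff(x, axis=0)                               # displacement per step
    force = -np.array([potential_grad(p) for p in x[:-1]])  # F = -grad U
    return np.sum(force * dx, axis=1)                     # dot product per step

# Toy check: quadratic potential U(x) = 0.5 * |x|^2, so grad U = x.
def grad_U(p):
    return p

# Straight path from (2, 0) to the minimum at (0, 0).
path = np.linspace([2.0, 0.0], [0.0, 0.0], 201)
work = neural_work(path, grad_U)
total_work = work.sum()  # approximates U(start) - U(end) = 2
```

Moving down the potential yields positive work at every step, and the summed work approximates the drop in potential energy along the path, which is the sanity check one would expect from the force-times-displacement definition.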

      Because of the novelty of the analyses we went to great lengths to carefully explain the methods in the updated manuscript. In addition we wrote a short tutorial style MATLAB script implementing both the allocentric and egocentric landscape analysis on a toy dynamical system with a known potential function (a supercritical pitchfork bifurcation).
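The tutorial script itself is not reproduced here. As a rough indication of what such a toy analysis might look like, the sketch below (in Python rather than MATLAB) simulates a noisy system beyond a supercritical pitchfork bifurcation and estimates two landscapes as negative log probabilities. Everything in it is an assumption made for illustration, including the parameter values, the binning, the lag, and the specific estimator (energy as the negative log of an empirical density, following the statement above that lower energy corresponds to higher probability).

```python
import numpy as np

def simulate_pitchfork(r=1.0, sigma=0.5, dt=0.01, n_steps=200_000, seed=0):
    """Euler-Maruyama simulation of dx = (r*x - x**3) dt + sigma dW.
    For r > 0 the potential V(x) = -r*x**2/2 + x**4/4 has wells at +/- sqrt(r).
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps)
    x[0] = 0.0
    noise = rng.standard_normal(n_steps - 1) * sigma * np.sqrt(dt)
    for t in range(n_steps - 1):
        x[t + 1] = x[t] + (r * x[t] - x[t] ** 3) * dt + noise[t]
    return x

def neg_log_density(samples, bins):
    """Energy estimate: negative log of the histogram density per bin."""
    density, edges = np.histogram(samples, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        energy = -np.log(density)  # empty bins get infinite energy
    return centres, energy

x = simulate_pitchfork()

# Allocentric view: energy as a function of position in state space.
pos, E_allo = neg_log_density(x, bins=np.linspace(-2, 2, 41))

# Egocentric view: energy as a function of displacement magnitude at a
# fixed lag, irrespective of where in state space the movement occurred.
lag = 50
disp = np.abs(x[lag:] - x[:-lag])
d, E_ego = neg_log_density(disp, bins=np.linspace(0, 3, 31))
```

In this toy setting the allocentric estimate should recover the double-well shape of the known potential (low energy near x = ±1, a barrier near x = 0), while the egocentric estimate assigns low energy to small displacements and high energy to the large displacements that occur only during well-to-well transitions.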

      (4) From the RNN model it is not clear how changes in excitatory and inhibitory gain lead to slower/faster switching. In order to better understand the role of inhibitory and excitatory gain on switching, I would suggest studying a simple discrete attractor model (a rate model, for example as in Wong and Wang 2006 or Roxin and Ledberg, Plos Comp. Bio 2008) which will allow to study these effects in terms of a very few model parameters. The Roxin paper also shows how to map rate models onto simplified one-dimensional systems such as the one in Fig S3. Setting up the model using this framework would allow for making much stronger, principled statements about how gain changes affect the energy landscape, and under which conditions increased inhibitory gain leads to faster switching.

      One possibility is that increasing the excitatory gain in the RNN leads to saturated firing rates. If this is the reason for the different effects of excitatory and inhibitory gain changes, it should be properly explained. Moreover, the biological relevance of this effect should be discussed (assuming that saturation is indeed the explanation).

      We thank the reviewer for this excellent suggestion. After some consideration, we decided that studying a reduced model would likely not do justice to the dynamical mechanisms of the RNN, especially after making gain dynamical rather than stationary. Still, we very much share the reviewer’s concern that we need a stronger link between the (now dynamical) gain alterations and energy landscape dynamics. To this end, we now describe and interrogate the dynamics of the RNN at a circuit level through selectivity and lesion-based analyses, at a population level through analysis of the dynamical regime traversed by the network, and finally, through an extended energy landscape framework which has far stronger links to traditional potential-based descriptions of low-dimensional dynamical systems (see also our response to comment 3 above).

      At a circuit level, the speeding of perceptual switches is mediated by inhibition of the initially dominant population, which we describe in paragraphs 7 and 8 of the section “Computational evidence for neuromodulatory-mediated perceptual switches in a recurrent neural network” as follows:

      “Having confirmed our hypothesis that increasing gain as a function of the network uncertainty increased the speed of perceptual switches, we next sought to understand the mechanisms governing this effect starting with the circuit level and working our way up to the population level (c.f. Sheringtonian and Hopfieldian modes of analysis(66)). Because of the constraint that the input and output weights are strictly positive, we could use their (normalised) value as a measure of stimulus selectivity. Inspection of the firing rates sorted by input weights revealed that the networks had learned to complete the task by segregating both excitatory and inhibitory units into two stimulus-selective clusters (Fig 2C). As the inhibitory units could not contribute to the network's readout, we hypothesised that they likely played an indirect role in perceptual switching by inhibiting the population of excitatory neurons selective for the currently dominant stimulus, allowing the competing population to take over and a perceptual switch to occur.

      To test this hypothesis, we sorted the inhibitory units by the selectivity of the excitatory units they inhibit (i.e. by the normalised value of the readout weights). Inspecting the histogram of this selectivity metric revealed a bimodal distribution with peaks at each extreme strongly inhibiting a stimulus selective excitatory population at the exclusion of the other (Fig S2). Based on the fact that leading up to the perceptual switch point both the input and firing rate of the dominant population are higher than the competing population, we hypothesized that gain likely speeds perceptual switches by actively inhibiting the currently dominant population rather than exciting/disinhibiting the competing population. We predicted, therefore, that lesioning the inhibitory units selective for the stimulus that is initially dominant would dramatically slow perceptual switches, whilst lesioning the inhibitory units selective for the stimulus the input is morphing into would have a comparatively minor slowing effect on switch times since the population is not receiving sufficient input to take over until approximately half way through the trial irrespective of the inhibition it receives. As selectivity is not entirely one-to-one, we expect both lesions to slow perceptual switches but differ in magnitude. In line with our prediction, lesioning the inhibitory units strongly selective for the initially dominant population greatly slowed perceptual switches (Fig 3F upper), whereas lesioning the population selective for the stimulus the input morphs into removed the speeding effect of gain but had a comparatively small slowing effect on perceptual switches (Fig 3F lower).”

      At the population level, we characterised the dynamics of the 2D parameter space (defined by gain and the difference between the input dimensions) traversed by the network over the course of a trial as input and gain dynamically change. We describe this in paragraphs 9-14 of the section “Computational evidence for neuromodulatory-mediated perceptual switches in a recurrent neural network”, which we reprint below for the reviewer's convenience:

      “Based on the selectivity of the network firing rates we hypothesised that the dynamics were shaped by a fixed-point attractor whose location and existence were determined by gain and the input difference, and thus changed dynamically over the course of a single trial (67-70). Because of the large size of the network, we could not solve for the fixed points or study their stability analytically. Instead we opted for a numerical approach and characterised the dynamical regime (i.e. the location and existence of approximate fixed-point attractors) across all combinations of gain and input difference visited by the network. Specifically, for each combination of elements in the parameter space we ran 100 simulations with initial conditions (firing rates) drawn from a uniform distribution between [0,1], and let the dynamics run for 10 seconds of simulation time (10 times the length of the task - longer simulation times did not qualitatively change the results) without noise. As we were interested in the existence of fixed-point attractors rather than their precise location, at each time point we computed the difference in firing rate between successive time points across the network. For each simulation we computed both the proportion of trials that converged to a value below 10^-2, giving us a proxy for the presence of fixed points, and the time to convergence, giving us a measure of the “strength” of the attractor.

      Across gain values when input had unambiguous values, the network rapidly converged across all initialisations (Fig 3A & 3C-H). When input became ambiguous, however, the dynamics acquired a decaying oscillation and did not converge within the time frame of the simulation. As gain increased, the range of  values characterised by oscillatory dynamics broadened. Crucially, for sufficiently high values of gain, ambiguous  values transitioned the network into a regime characterised by high amplitude inhibition-driven oscillations (Fig 3D & 3G). Each trial can, therefore, be characterised by a trajectory through this 2-dimensional parameter space, with dynamics shaped by the dynamical regimes of each location visited (Fig 3A-B).

      When uncertainty has a small impact on gain the network has a trajectory through an initial regime characterised by the rapid convergence to a fixed point where the population representing the initial stimulus dominated whilst the other was silent (Fig 3C), an uncertain regime characterised by oscillations with all neurons partially activated (Fig 3D), and after passing through the oscillatory regime, the network once again enters a new fixed-point regime where the population representing the initial stimulus is now silent and the other is dominant (Fig 3E).

      For high gain trials, the network again started and finished in states characterised by a rapid convergence to a fixed point representing the dominant input dimension (Fig 3F-H), but differed in how it transitioned between these states. Uncertain inputs now generated high amplitude oscillations with the network flip-flopping between active and silent states (Fig 3G). We hypothesised that, within the task, this has the effect of silencing the initially dominant population, and boosting the competing population. To test this we initialised each network with parameter values well inside the oscillatory regime (u = [ .5, .5]  , gain = 1.5) with initial conditions determined by the selectivity of each unit. Excitatory units selective for input dimension 1, as well as the associated inhibitory units projecting to this population, were fully activated, whilst the excitatory units selective for input dimension 2 and the associated inhibitory units were silenced. As we predicted, when initialised in this state the network dynamics displayed an out of phase oscillation where the initially dominant population was rapidly silenced and the competing population was boosted after a brief delay (219 (ms), +/-114 Fig S3).”

      From this we concluded that at a population level, heightened gain leading up to the perceptual switch speeds the switch by transiently pushing the dynamics into an unstable dynamical regime replacing the fixed-point attractor representing the input with an oscillatory regime that actively inhibits the currently dominant population and boosts the competing population before transitioning back into a regime with a stable (approximate) fixed-point attractor representing the new stimulus (Fig 3F-H & Fig S3).

      As we describe in our response to comment 3 above, our extended energy-landscape analysis framework now includes an explicit link between the potential of the dynamical system and the allocentric landscape, whilst also explaining how a transient deepening of the allocentric landscape (which can essentially be thought of as analogous to a traditional potential function) relates to the flattening of the egocentric landscape.

      Finally, whilst we appreciate the interest in further characterising the effect of inhibitory gain compared with excitatory gain, the topic is largely orthogonal to the aims of our paper, so we have removed the discussion of inhibitory vs excitatory gain. Still, we understand that we need to do our due diligence and check that our results do not break down when we manipulate either inhibitory or excitatory gain in isolation. To this end we checked that dynamical gain still speeded perceptual switches when the effect was restricted to either inhibitory or excitatory cells. We show the behavioural plots below for the reviewer’s interest.

      Author response image 1.

      Switch time as a function of uncertainty forcing

      Alternative mechanisms:

      It is mentioned in the introduction that changes in attention could drive perceptual switches. A priori, attention signals originating in the frontal cortex may be plausible mechanisms for perceptual switches, as an alternative to LC-controlled gain modulation. Does the observed fMRI dynamics allow us to distinguish these two hypotheses? In any case, I would suggest including alternative scenarios that may be compatible with the observed findings in the discussion.

      We agree with the reviewer, in that attention is itself a confound and a process that is challenging to disentangle from the perceptual switching process in the current task. Importantly, we were not arguing for exclusivity in our manuscript, but merely testing the veracity of the hypothesis that the ascending arousal system may play a causal role in mediating and/or speeding perceptual switches. Future work with experiments that more specifically aim to dissociate these different features will be required to tease apart these different possibilities.

      Reviewer #2 (Public Review):

      Strengths

      - the study combines different methods (pupillometry, RNNs, fMRI).

      - the study combines different viewpoints and fields of the scientific literature, including neuroscience, psychology, physics, dynamical systems.

      - This combination of methods and viewpoints is rarely done, it is thus very useful.

      - Overall well-written.

      Weaknesses

      - The study relies on a report paradigm: participants report when they identify a switch in the item category. The sequence corresponds to the drawing of an object being gradually morphed into another object. Perceptual switches are therefore behaviorally relevant, and it is not clear whether the effect reported corresponds to the perceptual switch per se, or the detection of an event that should change behavior (participants press a button indicating the perceived category, and thus switch buttons when they identify a perceptual change). The text mentions that motor actions are controlled for, but this fact only indicates that a motor action is performed on each trial (not only on the switch trial); there is still a motor change confounded with the switch. As a result, it is not clear whether the effect reported in pupil size, brain dynamics, and brain states is related to a perceptual change, or a decision process (to report this change).

      We agree with the reviewer that the coupling of the motor change with the perceptual switch is confounded to some degree, but since motor preparation occurs on every trial we suspect that it is more accurate to describe it as confounded with task-relevance more than motor preparation per se.  While it is possible that pupil diameter, network topology and energy landscape features are all related to motor change rather than the perceptual switch, we note that the weight of evidence is against this interpretation, given the simple mechanistic explanation created by the coupling of perceptual uncertainty to network gain.

      - The study presents events that co-occur (perceptual switch, change in pupil size, energy landscape of brain dynamics) but we cannot identify the causes and consequences. Yet, the paper makes several claims about causality (e.g. in the abstract "neuromodulatory tone ... causally mediates perceptual switches", in the results "the system flattening the energy landscape ... facilitated an updating of the content of perception").

      We have made an effort to soften the causal language, where appropriate. In addition, we note that we have changed the title to “Gain neuromodulation mediates task-relevant perceptual switches: evidence from pupillometry, fMRI, and RNN Modelling” to reflect the fact that our claims do not extend to cases of perceptual switches where the stimulus is only passively observed.

      - Some effects may reflect the expectation of a perceptual switch, rather than the perceptual switch per se. Given the structure of the task, participants know that there will be a perceptual switch occurring once during a sequence of morphed drawings. This change is expected to occur roughly in the middle of the sequence, making early switches more surprising, and later switches less surprising. Differences in pupil response to early, medium, and late switches could reflect this expectation. The authors interpret this effect very differently ("the speed of a perceptual switch should be dependent on LC activity").

      The task includes catch trials designed to reduce the expectation of a perceptual switch. In these trials, a perceptual switch occurs either earlier or later than usual. While these trials are valuable for mitigating predictability, we did not focus extensively on them, as they were thoroughly discussed in the original paper. Additionally, due to the limited number of catch trials, it is difficult—if not impossible—to calculate a reliable mean surprise per image set.

      It is also worth noting that the pupil study does not include catch trials, which could contribute to differences in how perceptual switches are processed and interpreted between the fMRI and pupil experiments.

      - The RNN is far more complex than needed for the task. It has two input units that indicate the level of evidence for the two categories being morphed, and it is trained to output the dominant category. A (non-recurrent) network with only these two units and an output unit whose activity is a sigmoid transform of the difference in the inputs can solve the task perfectly. The RNN activity is almost 1-dimensional probably for this reason. In addition, the difficult part of the computation done by the human brain in this task is already solved in the input that is provided to the network (the brain is not provided with the evidence level for each category, and in fact, it does not know in advance what the second category will be).
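      The reviewer's minimal non-recurrent alternative can be sketched as follows. This is an illustration only, under stated assumptions: the slope of the sigmoid, the number of frames in the morph sequence, and the function names are hypothetical and are not taken from the manuscript.

```python
import math


def feedforward_readout(u1, u2, slope=10.0):
    """Sigmoid transform of the input difference; values above 0.5 favour category 1."""
    return 1.0 / (1.0 + math.exp(-slope * (u1 - u2)))


def reported_category(u1, u2):
    """Return 1 if input dimension 1 dominates, otherwise 2."""
    return 1 if feedforward_readout(u1, u2) > 0.5 else 2


# Assumed morph sequence: u1 falls linearly from 1 to 0 and u2 = 1 - u1.
n_frames = 10
sequence = [(1 - k / (n_frames - 1), k / (n_frames - 1)) for k in range(n_frames)]
reports = [reported_category(u1, u2) for u1, u2 in sequence]
```

      In this sketch the report flips exactly where the inputs cross, so on its own it has no mechanism for switches that come earlier or later than the crossover.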

      We agree that a simpler model could perform the task. We opted to use an RNN rather than hand craft a simpler model as we wanted to use the model as both a method of hypothesis testing and hypothesis generation. We now expand on and justify this modelling choice in the second paragraph of the discussion (also see our response to Reviewer 1 comment 4):

      “We chose to use an RNN, instead of a simpler (more transparent) model as we wanted to use the RNN as a means of both hypothesis generation and hypothesis testing. Specifically, unlike more standard neuronal models which are handcrafted to reproduce a specific effect, when building an RNN the modeller only specifies the network inputs, labels, and the parameter constraints (e.g. Dale’s law) in advance. The dynamics of the RNN are entirely determined by optimisation. Post-training manipulations of the RNN are not built in, or in any way guaranteed to work, making them more analogous to experimental manipulations of an approximately task-optimal brain-like system. Confirmatory results are arguably, therefore, a first step towards an in vitro experimental test.”

      In other words, a simpler model would not have been appropriate to the aims. In addition we note that low dimensional dynamics are extremely common in the RNN literature and are in no way unique to our model. 

      - Basic fMRI results are missing and would be useful, before using elaborate analyses. For instance, what are the regions that are more active when a switch is detected?

      We explicitly chose to not run a standard voxelwise statistical parametric approach on these data, as the results were reported extensively in the original study (Stottinger et al., 2018).

      - The use of methods from physics may obscure some simple facts and simpler explanations. For instance, does the flatter energy landscape in the higher gain condition reflect a smaller number of states visited in the state space of the RNN because the activity of each unit gets in the saturation range? If correct, then it may be a more straightforward way of explaining the results.

      We appreciate the reviewer's concern as this would indeed be a problem. However, this is not the case for our network. At the time point of the perceptual switch, where the egocentric landscape dynamics are at their flattest, the RNN firing rates are approximately 50% activated, nowhere near the saturation point. In addition, a flatter landscape in the egocentric and allocentric landscape analyses only occurs - mathematically speaking - when there are more states visited, not fewer.

      In addition, we note that we are very sympathetic to the complexity of our physics based analyses and have gone to great lengths to describe them in an accessible manner in both the main text and methods. We have also included tutorial style code demonstrating how the analysis can be used on a toy dynamical system in the supplementary material.

      - Some results are not as expected as the authors claim, at least in the current form of the paper. For instance, they show that, when trained to identify which of two inputs u1 and u2 is the largest (with u2=1-u1, starting with u1=1 and gradually decreasing u1), a higher gain results in the RNN reporting a switch in dominance before the true switch (e.g. when u1=0.6 and u2=0.4), and vice versa with a lower gain. In other words, it seems to correspond to a change in criterion or bias in the RNN's decision. The authors should discuss more specifically how this result is related to previous studies and models on gain modulation. An alternative finding could have been that the network output is a more (or less) deterministic function of its inputs, but this aspect is not reported.

      We appreciate this comment but it is simply not applicable to our network. There is no criterion in the RNN. We could certainly add one but this would be a significant departure from how decisions are typically modelled in RNNs. The (deterministic) readout is the max of the projection of the (instantaneous) excitatory firing rate onto the readout weights. A shift in criterion would imply that the dynamics are unaffected and the effect can be explained by a shift in the readout weights; this cannot be the case because the readout weights are stationary and the change occurs at the level of the activation function.

      We are aware that there is a large literature in decision making and psychophysics that uses the term gain in a slightly different way. Here we are strictly referring to the gain of the activation function. Although we agree that it would be interesting and important to discuss the differing uses of the term gain, this is beyond the scope of the present paper.

    1. eLife Assessment

      This useful study aimed to examine the relationship of spatial frequency selectivity of single macaque inferotemporal (IT) neurons to category selectivity. Interesting findings in this report suggest a shift in preferred spatial frequency during the response, from low to high spatial frequencies. This agrees with a coarse-to-fine processing strategy, which is in line with multiple studies in the early visual cortex. Some of the findings were difficult to evaluate because the methods are incomplete. The conclusion that single-unit spatial frequency selectivity can predict object coding requires further evidence to confirm.

    2. Reviewer #1 (Public Review):

      This study reports that spatial frequency representation can predict category coding in the inferior temporal cortex. The original conclusion was based on likely problematic stimulus timing (33 ms which was too brief). Now the authors claim that they also have a different set of data on the basis of longer stimulus duration (200 ms).

      One big issue in the original report was that the experiments used a stimulus duration that was too brief and could have weakened the effects of high spatial frequencies and confounded the conclusions. Now the authors provided a new set of data on the basis of a longer stimulus duration and made the claim that the conclusions are unchanged. These new data and the data in the original report were collected at the same time as the authors report.

      The authors may provide an explanation why they performed the same experiments using two stimulus durations and only reported one data set with the brief duration. They may also explain why they opted not to mention in the original report the existence of another data set with a different stimulus duration, which would otherwise have certainly strengthened their main conclusions.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper aimed to examine the spatial frequency selectivity of macaque inferotemporal (IT) neurons and its relation to category selectivity. The authors suggest in the present study that some IT neurons show a sensitivity for the spatial frequency of scrambled images. Their report suggests a shift in preferred spatial frequency during the response, from low to high spatial frequencies. This agrees with a coarse-to-fine processing strategy, which is in line with multiple studies in the early visual cortex. In addition, they report that the selectivity for faces and objects, relative to scrambled stimuli, depends on the spatial frequency tuning of the neurons.

      Strengths:

      Previous studies using human fMRI and psychophysics studied the contribution of different spatial frequency bands to object recognition, but as pointed out by the authors little is known about the spatial frequency selectivity of single IT neurons. This study addresses this gap and shows spatial frequency selectivity in IT for scrambled stimuli that drive the neurons poorly. They related this weak spatial frequency selectivity to category selectivity, but these findings are premature given the low number of stimuli they employed to assess category selectivity.

      The authors revised their manuscript and provided some clarifications regarding their experimental design and data analysis. They responded to most of my comments but I find that some issues were not fully addressed or were poorly addressed. The new data they provided confirmed my concern about low responses to their scrambled stimuli. Thus, this paper shows spatial frequency selectivity in IT for scrambled stimuli that drive the neurons poorly (see main comments below). They related this (weak) spatial frequency selectivity to category selectivity, but these findings are premature given the low number of stimuli to assess category selectivity.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study reports that spatial frequency representation can predict category coding in the inferior temporal cortex.

      Thank you for taking the time to review our manuscript. We greatly appreciate your valuable feedback and constructive comments, which have been instrumental in improving the quality and clarity of our work.

      The original conclusion was based on likely problematic stimulus timing (33 ms which was too brief). Now the authors claim that they also have a different set of data on the basis of longer stimulus duration (200 ms).

      One big issue in the original report was that the experiments used a stimulus duration that was too brief and could have weakened the effects of high spatial frequencies and confounded the conclusions. Now the authors provided a new set of data on the basis of a longer stimulus duration and made the claim that the conclusions are unchanged. These new data and the data in the original report were collected at the same time as the authors report.

      The authors may provide an explanation why they performed the same experiments using two stimulus durations and only reported one data set with the brief duration. They may also explain why they opted not to mention in the original report the existence of another data set with a different stimulus duration, which would otherwise have certainly strengthened their main conclusions.

      Thank you for your comments regarding the stimulus duration used in our experiments. We appreciate the opportunity to clarify and provide further details on our methodology and decisions.

      In our original report, we focused on the early phase of the neuronal response, which is less affected by the duration of the stimulus. Observations from our data showed that certain neurons exhibited high firing rates even with the brief 33 ms stimulus duration, and the results we obtained were consistent across different durations. To avoid redundancy, we initially chose not to include the results from the 200 ms stimulus duration, as they reiterated the findings of the 33 ms duration.

      However, we acknowledge that the brief stimulus duration could raise concerns regarding the robustness of our conclusions, particularly concerning the effects of high spatial frequencies. Upon reflecting on the reviewer’s comments during the first revision, we recognized the importance of addressing these potential concerns directly. Therefore, we have included the data from the 200 ms stimulus duration in our revised manuscript.

      Furthermore, our team is actively investigating the differences between fast (33 ms) and slow (200 ms) presentations in terms of SF processing. Our preliminary observations suggest similar processing of HSF in the early phase of the response for both fast and slow presentations, but different processing of HSF in the late phase. This was another reason we initially opted to publish the results from the brief stimulus duration separately, as we intended to explore the different aspects of SF processing in fast and slow presentations in subsequent studies.

      I suggest the authors upload both data sets and analyzing codes, so that the claim could be easily examined by interested readers.

      Thank you for your suggestion to make both data sets and the analyzing codes available for examination by interested readers.

      We have created a repository that includes a sample of the dataset along with the necessary codes to output the main results. While we cannot provide the entire dataset at this time due to ongoing investigations by our team, we are committed to ensuring transparency and reproducibility. The data and code samples we have provided should enable interested readers to verify our claims and understand our analysis process.

      Repository: https://github.com/ramintoosi/spatial-frequency-selectivity

      Reviewer #2 (Public Review):

      Summary:

      This paper aimed to examine the spatial frequency selectivity of macaque inferotemporal (IT) neurons and its relation to category selectivity. The authors suggest in the present study that some IT neurons show a sensitivity for the spatial frequency of scrambled images. Their report suggests a shift in preferred spatial frequency during the response, from low to high spatial frequencies. This agrees with a coarse-to-fine processing strategy, which is in line with multiple studies in the early visual cortex. In addition, they report that the selectivity for faces and objects, relative to scrambled stimuli, depends on the spatial frequency tuning of the neurons.

      Strengths:

      Previous studies using human fMRI and psychophysics studied the contribution of different spatial frequency bands to object recognition, but as pointed out by the authors little is known about the spatial frequency selectivity of single IT neurons. This study addresses this gap and shows spatial frequency selectivity in IT for scrambled stimuli that drive the neurons poorly. They related this weak spatial frequency selectivity to category selectivity, but these findings are premature given the low number of stimuli they employed to assess category selectivity.

      Thank you for your thorough review and insightful feedback on our manuscript. We greatly appreciate your time and effort in providing valuable comments and suggestions, which have significantly contributed to enhancing the quality of our work.

      The authors revised their manuscript and provided some clarifications regarding their experimental design and data analysis. They responded to most of my comments but I find that some issues were not fully addressed or were poorly addressed. The new data they provided confirmed my concern about low responses to their scrambled stimuli. Thus, this paper shows spatial frequency selectivity in IT for scrambled stimuli that drive the neurons poorly (see main comments below). They related this (weak) spatial frequency selectivity to category selectivity, but these findings are premature given the low number of stimuli to assess category selectivity.

      While we acknowledge that the number of instances per condition is relatively low, the overall dataset is substantial. Specifically, our study includes a total of 180 stimuli (6 spatial frequencies × 2 scrambled/non-scrambled conditions × 15 instances, including 9 fixed and 6 non-fixed) and 5400 trials (180 stimuli × 2 durations × 15 repetitions). Conducting these trials requires approximately one hour of experimental time per session.
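      The counts quoted above follow directly from the design factors. A short check of the arithmetic, with variable names chosen here for illustration:

```python
# Design factors as stated in the response.
n_spatial_frequencies = 6
n_scramble_conditions = 2      # scrambled / non-scrambled
n_instances = 15               # 9 fixed + 6 non-fixed
n_durations = 2                # two stimulus durations
n_repetitions = 15

n_stimuli = n_spatial_frequencies * n_scramble_conditions * n_instances
n_trials = n_stimuli * n_durations * n_repetitions

print(n_stimuli, n_trials)
```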

      Extending the number of stimuli, while potentially addressing this limitation, would significantly compromise the quality of the experiment by increasing the duration and introducing potential fatigue effects in the subjects. Despite this limitation, our findings lay important groundwork by offering novel insights into object recognition through the lens of spatial frequency. We believe this work can serve as a foundation for future experiments designed to further explore and validate these theories with expanded stimulus sets.

      Main points.

      (1) They have provided now the responses of their neurons in spikes/s and present a distribution of the raw responses in a new Figure. These data suggest that their scrambled stimuli were driving the neurons rather poorly and thus it is unclear how well their findings will generalize to more effective stimuli. Indeed, the mean net firing rate to their scrambled stimuli was very low: about 3 spikes/s. How much can one conclude when the stimuli are driving the recorded neurons that poorly? Also, the new Figure 2- Appendix 1 shows that the mean modulation by spatial frequency is about 2 spikes/s, which is a rather small modulation. Thus, the spatial frequency selectivity the authors describe in this paper is rather small compared to the stimulus selectivity one typically observes in IT (stimulus-driven modulations can be at least 20 spikes/s).

      To address the concerns regarding the firing rates and the modulation of neuronal responses by spatial frequency (SF), we emphasize several key points:

      (1) Significance of Firing Rate Differences: While it is true that the mean net firing rate to our scrambled stimuli was relatively low, the firing rate differences observed were statistically significant, with p-values approximately at 1e-5. This indicates that despite the low firing rates, the observed differences are reliable and unlikely to have occurred by chance.

      (2) Classification Rate and Modulation by SF: Our analysis showed that the difference between various SF responses led to a classification rate of 44.68%, which is 24.68 percentage points higher than the chance level. This substantial increase above the chance level demonstrates that SF significantly modulates IT responses, even if the overall firing rates are modest.

      (3) Effect Size and SF Modulation: While the effect size in terms of firing rate differences may be small, it is significant. The significant modulation of IT responses by SF, as evidenced by our statistical analyses and classification rate, supports our conclusions regarding the role of SF in driving IT responses.

      (4) Expectations for Noise-like Pure SF Stimuli: We acknowledge that IT responses are typically higher for various object stimuli. Given the nature of our pure SF stimuli, which resemble noise-like patterns, we did not anticipate high responses in terms of spikes per second. The low firing rates are consistent with the expectation for such stimuli and do not undermine the significance of the observed modulation by SF.

      We believe that these points collectively support the validity of our findings and the significance of SF modulation in IT responses, despite the low firing rates. We appreciate your insights and hope this clarifies our stance on the data and its implications.

      We added the following description to the Appendix 1 - “Strength of SF selectivity” section:

      “While the firing rates and net responses to scrambled stimuli were modest (e.g., 2.9 Hz in T1), the differences across spatial frequency (SF) bands were statistically significant (p ≈ 1e-5) and led to a classification accuracy 24.68% above chance. This demonstrates the robustness of SF modulation in IT neurons despite low firing rates. The modest responses align with expectations for noise-like stimuli, which are less effective in driving IT neurons, yet the observed SF selectivity highlights a fundamental property of IT encoding.”

      (2) Their new Figure 2-Appendix 1 does not show net firing rates (baseline-subtracted; as I requested) and thus is not very informative. Please provide distributions of net responses so that the readers can evaluate the responses to the stimuli of the recorded neurons.

      We understand the reviewer’s concern about the presentation of net firing rates. In T2 (the late time interval), the average response rate falls below the baseline, resulting in negative net firing rates, which might confuse readers. To address this, we have added the net responses to the text for clarity. Additionally, we have included the average baseline response in the figure to provide a more comprehensive view of the data.

      “To check the SF response strength, the histogram of IT neuron responses to scrambled, face, and non-face stimuli is illustrated in this figure. A Gamma distribution is also fitted to each histogram. To calculate the histogram, the neuron response to each unique stimulus is calculated for each neuron in spikes/second (Hz). In the early phase, T1, the average firing rate to scrambled stimuli is 26.3 Hz, which is significantly higher than the response in the -50 to 50 ms window, which is 23.4 Hz. In comparison, the mean response to intact face stimuli is 30.5 Hz, while non-face stimuli elicit an average response of 28.8 Hz. The average net responses to the scrambled, face, and non-face stimuli are 2.9 Hz, 7.1 Hz, and 5.4 Hz, respectively. Moving to the late phase, T2, the responses to scrambled, face, and object stimuli are 19.5 Hz, 19.4 Hz, and 22.4 Hz, respectively. The corresponding average net responses are 3.9 Hz, 4.0 Hz, and 1.0 Hz below the baseline response.”

      (3) The poor responses might be due to the short stimulus duration. The authors report now new data using a 200 ms duration which supported their classification and latency data obtained with their brief duration. It would be very informative if the authors could also provide the mean net responses for the 200 ms durations to their stimuli. Were these responses as low as those for the brief duration? If so, the concern of generalization to effective stimuli that drive IT neurons well remains.

      The firing rates for the 200 ms stimulus duration are as follows: 27.7 Hz, 30.7 Hz, and 30.4 Hz for scrambled, face, and object stimuli in T1, respectively; and 26.2 Hz, 29.1 Hz, and 33.9 Hz in T2. The average baseline firing rate (−50 to 50 ms) is 23.4 Hz. Therefore, the net responses are 4.3 Hz, 7.3 Hz, and 7.0 Hz for T1; and 2.8 Hz, 5.7 Hz, and 10.5 Hz for T2 for scrambled, face, and object stimuli, respectively.
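      The net responses above are the mean firing rates minus the 23.4 Hz baseline. A short sketch of that subtraction, using only the values quoted in this response (the dictionary layout and names are illustrative):

```python
BASELINE_HZ = 23.4  # mean firing rate in the -50 to 50 ms window

# Mean firing rates (Hz) for the 200 ms stimulus duration, as quoted above.
firing_rates_hz = {
    "T1": {"scrambled": 27.7, "face": 30.7, "object": 30.4},
    "T2": {"scrambled": 26.2, "face": 29.1, "object": 33.9},
}

# Net response = firing rate - baseline, rounded to one decimal place.
net_responses_hz = {
    interval: {stim: round(rate - BASELINE_HZ, 1) for stim, rate in rates.items()}
    for interval, rates in firing_rates_hz.items()
}

for interval, nets in net_responses_hz.items():
    print(interval, nets)
```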

      Notably, the impact of stimulus duration is more pronounced in T2, which is consistent with T2 being the later of the two time intervals. In contrast, the firing rates in T1 do not change substantially with the longer duration. As we discussed in our response to the first comment, high net responses are not typically expected for scrambled or noise-like stimuli in IT neurons. Instead, the key findings of this study lie in the statistical significance of these responses and their meaningful relationship to category selectivity. These results highlight the broader implications for understanding the role of spatial frequency in object recognition.

      We added the firing rates to the “Extended stimulus duration supports LSF-preferred tuning” part of Appendix 1 as follows.

      “For the 200 ms stimulus duration, the firing rates were 27.7 Hz, 30.7 Hz, and 30.4 Hz for scrambled, face, and object stimuli in T1, respectively, and 26.2 Hz, 29.1 Hz, and 33.9 Hz in T2. The corresponding net responses were 4.3 Hz, 7.3 Hz, and 7.0 Hz in T1, and 2.8 Hz, 5.7 Hz, and 10.5 Hz in T2. While the longer stimulus duration did not substantially increase firing rates in T1, its impact was more pronounced in T2.”

      (4) I still do not understand why the analyses of Figures 3 and 4 provide different outcomes on the relationship between spatial frequency and category selectivity. I believe they refer to this finding in the Discussion: "Our results show a direct relationship between the population's category coding capability and the SF coding capability of individual neurons. While we observed a relation between SF and category coding, we have found uncorrelated representations. Unlike category coding, SF relies more on sparse, individual neuron representations.". I believe more clarification is necessary regarding the analyses of Figures 3 and 4, and why they can show different outcomes.

      Figure 3 explores the relationship between SF coding and category coding at both the single-neuron and population levels.

      ● Figures 3(a) and 3(b) examine the relationship between a single neuron’s response pattern and object decoding in the population.

      ● Figure 3(c) investigates the relationship between a single neuron’s SF decoding capabilities and object decoding in the population.

      ● Figure 3(d) assesses the relationship between a single neuron’s object decoding capabilities and SF decoding in the population.

      In summary, Figure 3 demonstrates a relation between SF coding/response pattern at the single-neuron level and category coding at the population level.

      Figure 4, on the other hand, addresses the uncorrelated nature of SF and category coding.

      ● Figure 4(a) shows the uncorrelated relation between a single neuron’s SF decoding capability and its object decoding capability. This suggests that a neuron's ability to decode SF does not predict its ability to decode object categories.

      ● Figure 4(b) illustrates that the contribution of a neuron to the population decoding of SF is uncorrelated with its contribution to the population decoding of object categories. This further supports the idea that the mechanisms behind SF coding and object coding are uncorrelated.

      In summary, Figure 4 suggests that while there is a relation between SF coding and category coding as illustrated in Figure 3, the mechanisms underlying SF coding and object coding operate independently (in terms of correlation), highlighting the distinct nature of these processes.

      We hope this explanation clarifies why the analyses in Figures 3 and 4 present different outcomes. Figure 3 provides insight into the relationship between SF and category coding, while Figure 4 emphasizes the uncorrelated nature of these processes.

      Following your comment, to clarify the presentation of the work, we added the following description to the “Uncorrelated mechanisms for SF and category coding” section:

      “Figures 3 and 4 examine different aspects of the relationship between SF and category coding. Figure 3 highlights a relationship between SF coding at the single-neuron level and category coding at the population level. Conversely, Figure 4 demonstrates the uncorrelated mechanisms underlying SF and category coding, showing that a neuron’s ability to decode SF is not predictive of its ability to decode object categories. This distinction underscores that while SF and category coding are related at broader levels, their underlying mechanisms are independent, emphasizing the distinct processes driving each form of coding.”

      (5) The authors found a higher separability for faces (versus scrambled patterns) for neurons preferring high spatial frequencies. This is consistent for the two monkeys but we are dealing here with a small amount of neurons. Only 6% of their neurons (16 neurons) belonged to this high spatial frequency group when pooling the two monkeys. Thus, although both monkeys show this effect I wonder how robust it is given the small number of neurons per monkey that belong to this spatial frequency profile. Furthermore, the higher separability for faces for the low-frequency profiles is not consistent across monkeys which should be pointed out.

      We appreciate the reviewer’s concern regarding the relatively small number of neurons in the high spatial frequency group (16 neurons, 6% of the total sample across the two monkeys) and the consistency of the results. While we acknowledge this limitation, it is important to note that findings involving sparse subsets of neurons can still be meaningful. For example, Dalgleish et al. (2020) demonstrated that perception can arise from the activity of as few as ~14 neurons in the mouse cortex, supporting the sparse coding hypothesis. This underscores the potential robustness of results derived from small neuronal populations when the activity is statistically significant and functionally relevant.

      Regarding the higher separability for faces among neurons preferring high spatial frequencies, the consistency of this finding across both monkeys suggests that this effect is robust within this subgroup. For neurons preferring low spatial frequencies, we agree that the lack of consistency across monkeys should be explicitly noted. These differences may reflect individual variability or differences in sampling across subjects and merit further investigation in future studies.

      To address this concern, we have updated the text to explicitly discuss the small size of the high spatial frequency group, its implications, and the observed inconsistency in the low spatial frequency profiles between monkeys. We have added the following description to the discussion.

      “Next, according to Figure 3(a), 6% of the neurons are HSF-preferred and their firing rate in HSF is comparable to the LSF firing rate in the LSF-preferred group. This analysis is carried out in the early phase of the response (70-170ms). While most of the neurons prefer LSF, this observation shows that there is an HSF input that excites a small group of neurons. Importantly, findings involving small neuronal populations can still be meaningful, as studies like Dalgleish et al. (2020) have demonstrated that perception can arise from the activity of as few as ~14 neurons in the mouse cortex, emphasizing the robustness of sparse coding.”

      Regarding the separability of faces for the low-frequency profiles, we added the following to the appendix section:

      “For neurons preferring LSF, LP profile, it is important to note the lack of consistency in responses across monkeys. This variability may reflect individual differences in neural processing or variations in sampling between subjects.”

      And in the discussion:

      “Our results are based on grouping the neurons of the two monkeys; however, the results remain consistent when looking at the data from individual monkeys as illustrated in Appendix 2. However, for neurons preferring LSF, we observed inconsistency across monkeys, which may reflect individual differences or sampling variability. These findings highlight the complexity of SF processing in the IT cortex and suggest the need for further research to explore these variations.”

      * Henry WP Dalgleish, Lloyd E Russell, Adam M Packer, Arnd Roth, Oliver M Gauld, Francesca Greenstreet, Emmett J Thompson, Michael Häusser (2020) How many neurons are sufficient for perception of cortical activity? eLife 9:e58889.

      (6) I agree that CNNs are useful models for ventral stream processing but that is not relevant to the point I was making before regarding the comparison of the classification scores between neurons and the model. Because the number of features and trial-to-trial variability differs between neural nets and neurons, the classification scores are difficult to compare. One can compare the trends but not the raw classification scores between CNN and neurons without equating these variables.

      We appreciate the reviewer’s follow-up comment and agree that differences in the number of features and trial-to-trial variability between IT neurons and CNN units make direct comparisons of raw classification scores challenging. As the reviewer suggests, it is more appropriate to focus on comparing trends rather than absolute scores when analyzing the similarities and differences between these systems. In light of this, we have revised the text to clarify that our intention was not to equate raw classification scores but to highlight the qualitative patterns and trends observed in spatial frequency encoding between IT and CNN units.

      “SF representation in the artificial neural networks

      We conducted a thorough analysis to compare our findings with CNNs. To assess the SF coding capabilities and trends of CNNs, we utilized popular architectures, including ResNet18, ResNet34, VGG11, VGG16, InceptionV3, EfficientNetb0, CORNet-S, CORNet-RT, and CORNet-Z, with both ImageNet-pre-trained and randomly initialized weights. Employing feature maps from the last four layers of each CNN, we trained an LDA model to classify the SF content of input images. Figure 5(a) shows the SF decoding accuracy of the CNNs on our dataset (SF decoding accuracy with random (R) and pre-trained (P) weights, ResNet18: P=0.96±0.01 / R=0.94±0.01, ResNet34: P=0.95±0.01 / R=0.86±0.01, VGG11: P=0.94±0.01 / R=0.93±0.01, VGG16: P=0.92±0.02 / R=0.90±0.02, InceptionV3: P=0.89±0.01 / R=0.67±0.03, EfficientNetb0: P=0.94±0.01 / R=0.30±0.01, CORNet-S: P=0.77±0.02 / R=0.36±0.02, CORNet-RT: P=0.31±0.02 / R=0.33±0.02, and CORNet-Z: P=0.94±0.01 / R=0.97±0.01). Except for CORNet-Z, object recognition training increases the network's capacity for SF coding, with an improvement as large as 64% in EfficientNetb0. Furthermore, except for the CORNet family, LSF content exhibits higher recall values than HSF content, as observed in the IT cortex (p-value with random (R) and pre-trained (P) weights, ResNet18: P=0.39 / R=0.06, ResNet34: P=0.01 / R=0.01, VGG11: P=0.13 / R=0.07, VGG16: P=0.03 / R=0.05, InceptionV3: P<0.001 / R=0.05, EfficientNetb0: P=0.07 / R=0.01). The recall values of CORNet-Z and ResNet18 are illustrated in Figure 5(b). However, while the CNNs exhibited some similarities in SF representation with the IT cortex, they did not replicate the SF-based profiles that predict neuron category selectivity. As depicted in Figure 5(c), although neurons formed similar profiles, these profiles were not associated with the category decoding performances of the neurons sharing the same profile.”

      Discussion:

      “Finally, we compared SF's representation trends and findings within the IT cortex and the current state-of-the-art networks in deep neural networks.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The mean baseline firing rate of their neurons (23.4 Hz) was rather high for single IT neurons (typically around 10 spikes/s or lower). Were these well-isolated units or mainly multiunit activity?

      We confirm that the recordings in our study were from both well-isolated single units and multi-unit activity (the activity remaining after single units were isolated), sorted with our spike sorting toolbox. The higher baseline firing rate is likely due to the experimental design, particularly the inclusion of the responsive neurons from the selectivity phase. We added the following statement to the methods section.

      “In our analysis, we utilized both well-isolated single units and multi-unit activities (which represent neural activities that could not be further sorted into single units), ensuring a comprehensive representation of neural responses across the recorded population.”

    1. eLife Assessment

      This important study identifies species- and sex-specific neuronal cell types and gene expression in the preoptic area (POA) to help understand the evolutionary divergence of social behaviors. The evidence from single-nucleus RNA sequencing and immunostaining is compelling and suggests that cellular differences in the POA may contribute to behavioral variations such as mating and parental care that are apparent in two closely related deer mouse species. These rich observations provide an entry point for future hypothesis-driven experiments to demonstrate a causal role for these populations in sex- or species-variable behaviors in vertebrates. These data will be a resource that is of value to behavioral neuroscientists.

    2. Reviewer #1 (Public review):

      (1) Summary of the Paper:

      This paper by Chen et al. examines the cellular composition and gene expression of the hypothalamic medial preoptic area (MPOA) in two closely related deer mouse species (P. maniculatus and P. polionotus) that exhibit distinct social behaviors. Through single-nucleus RNA sequencing (snRNA-seq), Chen et al., identify sex- and species-specific neuronal cell types that likely contribute to differences in mating and parental care. By comparing monogamous and promiscuous species, the study provides insights into how neuronal diversity and gene expression changes in the MPOA might underlie the evolution of social behaviors.

      (2) Strengths of the Paper:

      The paper excels in several areas. First, the data presentation is clear and well-organized, making the complex findings easy to follow. The writing is straightforward and highly accessible, which enhances the overall readability. The experimental design is innovative, particularly in how they combined samples from different species into the same dataset and then used post-hoc identification to distinguish cell types by species. This dramatically controls for potential batch effects in my opinion. Additionally, the authors contextualize their findings within the framework of previously published studies on Mus musculus, providing a strong comparative analysis that enhances the significance of their work.

      (3) Weaknesses of the Paper:

      The major limitation of the study is the absence of causal experiments linking the observed changes in MPOA cell types to species-specific social behaviors. While the study provides valuable correlational data, it lacks functional experiments that would demonstrate a direct relationship between the neuronal differences and behavior. For instance, manipulating these cell types or gene expressions in vivo and observing their effects on behavior would have strengthened the conclusions, although I certainly appreciate the difficulty in this, especially in non-musculus mice. Without such experiments, the study remains speculative about how these neuronal differences contribute to the evolution of social behaviors.

    3. Reviewer #2 (Public review):

      Summary:

      The authors report several interesting species and sex differences in cell type expression that may relate to species differences in behavior. The differential cell type abundance findings build on previously observed species/sex differences in behavior and brain anatomy. These data will be a valuable resource for behavioral neuroscientists. These findings are important but the manuscript goes too far in attributing causal influences to differences in behavior. A second important problem is that dissections used for the sequencing data include other neuropeptide-rich areas of the hypothalamus like the PVN. Although histology is included, the results in the main manuscript often do not include the mPOA, making it hard to know if species/sex differences are consistent across different hypothalamic regions. The manuscript would benefit from more precise language.

      Strengths:

      The data are novel because cell-type atlases are available for only a few species.

      The authors have clearly defined appropriate steps taken to obtain trustworthy estimations of cell type abundance. Furthermore, the criteria for each cell type assignment were described in a way for readers to easily replicate. The rigor in comparing cell abundance provides convincing evidence that these species have differences in MPOA cellular composition.

      The authors have a good explanation for why 19 of the 53 neuron clusters were not classified (possible Mus/Peromyscus anatomical differences, some cell types don't have well-defined transcriptional profiles).

      Validated findings with histology.

    4. Reviewer #3 (Public review):

      Summary:

      The authors performed snRNA-seq in the pre-optic area (POA), a heterogeneous brain region implicated in multiple innate behaviors, comparing two species of Peromyscus mice that possess strikingly different parenting behaviors. P. polionotus show high levels of parental care from both sexes of parent, and P. maniculatus show lower levels of care, predominantly displayed by dams rather than sires. The overall goal of understanding the genomic basis of behavioral variation is significant and of broad interest and comparative studies in POA in these two species is an excellent approach to tackle this question. The authors correctly point out that existing studies largely compare species that are highly divergent, such as mice and humans, which confounds the association of specific neuronal populations or gene expression patterns with distinct behaviors. They identify neuronal populations with differential abundance between species and sexes, and additionally report sex and species differences in gene expression within each transcriptomic cell type. Their cell type classification is aided by mapping their Peromyscus cells onto a previously existing POA single cell dataset generated in lab mice. The detection and validation of previously observed sex differences in the Gal/Moxd1 cell type, and species differences in Avp expression provides additional support that their data are robust. Importantly, the authors demonstrate reduced sexual dimorphism in the POA of P. polionotus, compared to P. maniculatus, and prior knowledge in rats and mice. This finding suggests a potential neural substrate for the increased parental behavior in P. polionotus.

      Strengths:

      This is a pioneering comparative snRNA-seq study that provides a roadmap for similar approaches in non-traditional model organisms.

      The authors have identified populations that may underlie sex- and species- differences in parenting behavior in rodents.

      A significant strength of the manuscript is the histological validation of their most robust marker genes.

      Weaknesses:

      My primary concern is that the dataset is limited: 52,121 neuronal nuclei across 24 samples, which does not provide many cells per cluster to analyze comparatively across sex and species, particularly given the heterogeneity of the large region dissected, which contains adjacent regions such as the PVN and SCN.

      There is no explanation for the finding that there is a female-bias in gene expression across all cell types in P. polionotus.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their thoughtful comments.

      Based on their suggestions we will:

      (1) Use more accurate language to describe the hypothalamus regions under investigation in this study. While we aimed to primarily investigate the medial preoptic area (MPOA), our dissections and sequencing data in fact capture several regions of the anterior hypothalamus, including the anteroventral periventricular (AVPV), paraventricular (PVN), supraoptic (SON), and suprachiasmatic (SCN) nuclei, among others. We will revise the language in our manuscript to reflect that our study investigates the cellular evolution of the anterior hypothalamus across behaviorally divergent deer mice.

      (2) Revise our language to clarify that while our study provides a rich dataset for generating hypotheses about which cell types may contribute to behavioral differences, it does not provide any evidence of causal relationships. We hope to investigate this further in future work.

      (3) Clarify specific methodological choices for which reviewers had questions, especially about the hypothalamic regions for which we did histology to validate cell abundance differences and methodological choices related to mapping our cell clusters to Mus cell types.

      Our responses to each reviewer’s specific comments are below.

      Reviewer #1:

      The major limitation of the study is the absence of causal experiments linking the observed changes in MPOA cell types to species-specific social behaviors. While the study provides valuable correlational data, it lacks functional experiments that would demonstrate a direct relationship between the neuronal differences and behavior. For instance, manipulating these cell types or gene expressions in vivo and observing their effects on behavior would have strengthened the conclusions, although I certainly appreciate the difficulty in this, especially in non-musculus mice. Without such experiments, the study remains speculative about how these neuronal differences contribute to the evolution of social behaviors.

      Yes, we agree the study lacks functional experiments. We hope that the dataset is of value for generating hypotheses about how hypothalamic neuronal cell types may govern species-specific social behaviors, and for these hypotheses to be functionally tested by us and others in future work.

      Reviewer #2:

      Some methodology could be further explained, like the decision of a 15% cutoff value for cell type assignment per cluster, or the necessity of a multi-step analysis pipeline for gene enrichment studies.

      A 15% cutoff value for cell type assignment was chosen to include all known homology correspondences between our dataset and the Mus atlas. For example, i14:Avp/Cck cells from the Mus atlas represent Avp cells from the suprachiasmatic nuclei (SCN). Though only 17.3% of cluster 15 maps to i14:Avp/Cck, we know these two clusters correspond based on the expression of Avp and additional SCN marker genes in cluster 15 (Supp Fig 6). We will further explain this cutoff in the revised manuscript.
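
      The cutoff rule described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study: the function name, the data layout, and every number other than the 17.3% for cluster 15 are placeholders, and the confirmation by marker-gene expression (Supp Fig 6) is not modeled.

```python
CUTOFF = 0.15  # minimum fraction of a cluster's nuclei mapping to a Mus cell type

# Fraction of each cluster's nuclei mapped to each Mus atlas label.
# Only the 0.173 value for cluster 15 comes from the response; every
# other number and label below is a placeholder for illustration.
mapping_fractions = {
    "cluster_15": {"i14:Avp/Cck": 0.173, "placeholder_type_A": 0.09},
    "cluster_X": {"placeholder_type_A": 0.08, "placeholder_type_B": 0.05},
}


def assign_homology(fractions, cutoff=CUTOFF):
    """Return, per cluster, the Mus labels whose mapped fraction meets the cutoff."""
    return {
        cluster: sorted(label for label, frac in by_label.items() if frac >= cutoff)
        for cluster, by_label in fractions.items()
    }


assignments = assign_homology(mapping_fractions)
print(assignments)
```

      With a cutoff above 17.3%, the known correspondence between cluster 15 and i14:Avp/Cck would be lost, which is the motivation given for choosing 15%. Clusters with no label above the cutoff receive an empty list and remain unassigned in this sketch.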

      Our gene enrichment study includes a multi-step analysis pipeline because we wanted to control for confounders that may be introduced because of gene expression level. Genes that are more highly expressed are more accurately quantified and thus more likely to be identified as differentially expressed. Therefore, we wanted to test for gene enrichments in our set of DE genes against a background of genes with similar expression levels. We will clarify this motivation in the revised manuscript.
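
      One way to build such an expression-matched background is sketched below in Python. It is an assumption-laden illustration rather than the procedure used in the manuscript: the rank-based binning, the number of bins, the number of background genes drawn per DE gene, and all gene names and expression values are hypothetical.

```python
import random


def expression_bins(mean_expr, n_bins=10):
    """Assign each gene to a bin according to the rank of its mean expression."""
    ranked = sorted(mean_expr, key=mean_expr.get)
    return {gene: i * n_bins // len(ranked) for i, gene in enumerate(ranked)}


def matched_background(de_genes, mean_expr, per_gene=5, n_bins=10, seed=0):
    """For each DE gene, sample non-DE genes from the same expression bin.

    The same background gene may be drawn for more than one DE gene
    when several DE genes fall in one bin.
    """
    rng = random.Random(seed)
    bins = expression_bins(mean_expr, n_bins)
    de = set(de_genes)
    pool = {}
    for gene, b in bins.items():
        if gene not in de:
            pool.setdefault(b, []).append(gene)
    background = []
    for gene in de_genes:
        candidates = pool.get(bins[gene], [])
        background.extend(rng.sample(candidates, min(per_gene, len(candidates))))
    return background


# Toy input: 100 hypothetical genes with increasing mean expression.
mean_expr = {f"g{i}": float(i + 1) for i in range(100)}
de_genes = ["g5", "g50", "g95"]
background = matched_background(de_genes, mean_expr)
print(background)
```

      An enrichment statistic for a gene category would then be computed on the DE set against this matched background rather than against all expressed genes, so that expression level alone cannot drive the result.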

      The authors should exercise strong caution in making inferences about these differences being the basis of parental behavior. It is possible, given connections to relevant research, but without direct intervention, direct claims should be avoided. There should be clear distinctions of what to conclude and what to propose as possibilities for future research.

      Yes, we agree that we are unable to make direct claims about neuronal differences being the basis of parental behavior. We will revise our language to be clearer about which relationships we are hypothesizing and what we propose as possibilities for future research.

      Histology is not performed on all regions included in the sequencing analysis.

      We apologize that our language describing the hypothalamic regions included in the sequencing analysis and those included in the histology is unclear. We aimed to dissect the medial preoptic region for the sequencing analysis, but additionally captured parts of the anterior hypothalamus, including the paraventricular (PVN), supraoptic (SON), and suprachiasmatic (SCN) nuclei, among others. Our histology was performed across the entire hypothalamus and includes all regions included in the sequencing data. We will revise the manuscript to more accurately describe the hypothalamic regions that we investigated.

      Reviewer #3:

      My primary concern is that the dataset is limited: 52,121 neuronal nuclei across 24 samples, which does not provide many cells per cluster to analyze comparatively across sex and species, particularly given the heterogeneity of the region dissected. The Supplementary table reports lower UMIs/genes per cell than is typically seen as well. Perhaps additional information could be obtained from the data by not restricting the analyses to cells that can be assigned to Mus types. A direct comparison of the two Peromyscus species could be valuable as would a more complete Peromyscus POA atlas.

      Our dataset reports ~1,500 genes and ~1,000 UMIs per nucleus, which is indeed lower than is typically reported in other single-nucleus datasets. Some of this discrepancy is due to a lower quality genome and annotated transcriptome available for Peromyscus compared to Mus musculus, which results in a lower mapping rate than is typically reported in Mus studies. However, our dataset was sufficient to identify known peptidergic cell types (Supp Fig 6) and to map homology to Mus cell types for 34 (64%) of our 53 clusters. Additionally, although some of our clusters contain small numbers of cells, our differential abundance analysis accounts for the variance in cell numbers observed across samples and should be robust against any increase in variance due to small numbers. In fact, even differential abundance of very small cell clusters such as oxytocin neurons (cell type 40) was validated by histology.

      We would like to clarify that all analyses were performed on all cell clusters, regardless of whether or not they could be assigned homology to a Mus cell type. All the cell types that we identified as differentially abundant or as containing significant sex differences happened to be cell types for which homology to a Mus cell type could be defined. This may arise for a relatively uninteresting reason: cell types that have more distinct transcriptional signatures will be more accurately clustered, leading to more accurate identification of homology as well as more accurate measurements of differential abundance / expression. We will revise our language to make this clearer in our manuscript.

      In Supplement 7, it appears that most neurons can be assigned as excitatory or inhibitory, but then so many of these cells remain in the unassigned "gray blob" seen in panel 1E. Clustering of excitatory and inhibitory neurons separately, as in prior cited work in Mus POA (refs 31 and 57) may boost statistical power to detect sex and species differences in cell types. Perhaps the cells that cannot be assigned to Mus contain too few reads to be useful, in which case they should be filtered out in the QC. The technical challenges of a comparative single-cell approach are considerable, so it benefits the scientific community to provide transparency about them.

      We are not certain why we are unable to cluster and assign homology to many of our cells (i.e. cells in the unassigned “gray blob”). However, we note that even in the Mus atlas, many cells did not belong to obvious clusters by UMAP visualization and that several clusters lacked notable marker genes and were designated simply as “Gaba” and “Glut” clusters. Therefore, it is unsurprising that our own dataset also contains cells that lack the transcriptional signatures needed to be clustered and/or mapped to Mus cell types. We do know, however, that the median number of reads per nucleus is uniform across cell clusters and does not explain why some clusters could not be assigned to Mus. We will add this information to our revised manuscript.

      We do not expect a two-stage clustering (i.e. clustering first by excitatory vs. inhibitory neurons) to gain power to resolve cell types in this case. Excitatory vs. inhibitory neurons are clearly separable on our UMAP (Supp Fig 7), so that information is already being used by our clustering procedure. However, we will explore this further in our revised manuscript to see if doing so will boost statistical power.

      The Calb1 dimorphism, as observed by immunostaining, appears much more extensive in P. maniculatus compared to P. polionotus (Figures 3 E and F). This finding is not reflected in the counts of the i20:Gal/Moxd1 cluster. The use of Calb1 staining as a proxy for the Gal/Moxd1 cluster would be strengthened if the number of POA Calb1+ neurons that are found in each cluster was apparent. There may be additional Calb1+ neurons in the cells that are not annotated to a Mus cluster. This clarification would add support to the overall conclusion that there is reduced sexual dimorphism in P. polionotus.

      From the Mus MPOA atlas (which includes both single-cell sequencing data and imaging-based spatial information), it is known that the i20:Gal/Moxd1 cluster comprises sexually dimorphic cells that make up both the BNST and the SDN-POA. These sexually dimorphic cells are well-studied and known to be marked by Calb1, which we used in immunostaining as a proxy for i20:Gal/Moxd1.

      However, we would like to clarify that in our study, the immunostaining of Calb1+ neurons and the sequencing counts of the i20:Gal/Moxd1 cluster are not completely reflective of each other because our sequencing dataset only captured the ventral portion of the BNST. Therefore, our i20:Gal/Moxd1 counts contain a combination of some Calb1+ BNST cells and likely all Calb1+ SDN-POA cells and are difficult to interpret on their own. Our histology, however, covers the entire hypothalamus and is more reliable for identifying sex and species differences in each region. We will clarify this in the revised manuscript.

      The relationship between the sex steroid receptor expression and the sex bias in gene expression would be improved if the sex bias in sex steroid receptor expression was included in Supplementary Figure 10.

      We will include this in the revised manuscript.

      There is no explanation for the finding that there is a female bias in gene expression across all cell types in P. polionotus.

      We also find this observation interesting but do not yet have a good explanation for it. We plan to follow this up in future work.

    1. Author Response:

      We appreciate the reviewers' detailed feedback, which has highlighted several areas where our study could be strengthened. Although we acknowledge the relatively limited scope of our CRISPR-based gene-deletion screen, we successfully demonstrated the immunogenic role of Pccb in our syngeneic pancreatic cancer mouse model. Specifically, loss of PCCB in our mutant KRAS/p53 PIK3CA-null (αKO) cells blocked host T cell killing of tumor cells.

      Furthermore, blocking the PD1/PD-L1 interaction reverses this anti-tumor immunogenic effect. We agree with the reviewers regarding the limitations of our study, such as the sample size in our scTCR sequencing and the lack of direct cytotoxicity assays to confirm tumor-specific T cell clones. However, our results are consistent across multiple experimental approaches that strongly suggest meaningful differences in host T cell response to the three implanted tumor types, KPC, αKO and p-αKO. We agree that future mechanistic studies will be important to determine how PCCB is involved in this immunogenic response. We also agree with the reviewers that future additional studies with other KPC cell lines will strengthen our conclusion regarding PCCB. Finally, we acknowledge the inherent limitations of IHC techniques to assess the involvement of other T cell checkpoints that might also be involved in this anti-tumor immunogenic effect. In summary, despite these limitations, our findings provide novel insight into the role of PCCB in pancreatic tumor immunogenicity and contribute to the ongoing discussion of how to improve therapeutic strategies for this deadly cancer.

      Reviewer 1:

      Weaknesses:

      (1) Clonal expansion of cytotoxic T cells infiltrating the pancreatic αKO tumors

      a. Only two tumor-bearing hosts were evaluated by single-cell TCR sequencing, thus limiting conclusions that may be drawn regarding repertoire diversity and expansion.

      We agree with the reviewer that possible repertoire diversity and expansion could be observed by sequencing more tumor-bearing hosts. However, our current data reveal a marked consistency in the transcriptional expression within the two tumors analyzed per group. Importantly, these features are significantly divergent between the αKO and p-αKO groups. While recognizing the limited sample size, the observed within-group consistency and the clear distinction between groups strongly support the validity of the reported trends.

      b. High abundance clones in the TME do not necessarily have tumor specificity, nor are they necessarily clonally expanded. They may be clones which are tissue-resident or highly chemokine-responsive and accumulate in larger numbers independent of clonal expansion. Please consider softening language to clonal enrichment or refer to clone size as clonal abundance throughout the paper.

      We agree with the reviewer that it’s possible that the high abundance clones are not necessarily tumor specific. Our previous work (N. Sivaram 2019) demonstrated the critical role of increased pancreatic CD8+ T cells in αKO tumor regression within B6 mice. Therefore, antigen specific CD8+ T cell clonal expansion within the pancreas is an anticipated observation. However, as the reviewer pointed out, a portion of this expansion may be attributable to factors independent of tumor antigens. While the low T cell infiltration observed in KPC-implanted mice argues against a purely tissue-resident explanation, further investigation is required to definitively establish the tumor specificity of individual clones. We have revised the manuscript to reflect this nuance, replacing "clonal expansion" with "clonal enrichment".

      c. The whole story would be greatly strengthened by cytotoxicity assays of abundant TCR clones to show tumor antigen specificity.

      As mentioned above, we agree with the reviewer that future studies are needed to investigate each of the specific clones. Due to the extended timeframe required, it’s beyond the scope of the present study.

      (2) A genome-wide CRISPR gene-deletion screen to identify molecules contributing to Pik3ca-mediated pancreatic tumor immune evasion

      a. CRISPR mutagenesis yielded outgrowth of only 2/8 tumors. A more complete screen with an increased total number of tumors would yield much stronger gene candidates with better statistical power. It is unsurprising that candidates were observed in only one of the two tumors. Nevertheless, the authors moved forward successfully with Pccb.

      We agree that by including more mice in the CRISPR screen, it’s possible that we could have identified more candidates. Regardless, we have successfully demonstrated PCCB’s role in pancreatic tumorigenicity with our mouse model.

      (3) T cells infiltrate p-αKO tumors with increased expression of immune checkpoint

      a. In Figure 4D, cell counts are not normalized to total CD8+ T cell counts, making it difficult to directly compare aKO to p-aKO tumors. Based on quantifications from Figure 4D, I suspect normalization will strengthen the conclusion that CD8+ infiltrate is more exhausted in p-aKO tumors.

      Due to the use of distinct tumor sections for quantifying CD8+ cells and T cell checkpoint inhibitory receptor expression, direct normalization of these counts is challenging. However, we observed comparable CD8+ cell numbers between αKO and p-αKO tumors, with p-αKO tumors exhibiting nearly double the expression of immune checkpoint receptors. Therefore, even accounting for potential normalization discrepancies, we anticipate that p-αKO tumors would still demonstrate a significantly higher percentage of immune checkpoint receptor-positive cells compared to αKO tumors.
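
      The normalization argument above can be made concrete with a short sketch. The counts below are hypothetical placeholders, not values from Figure 4D, and `positive_fraction` and `fold_change` are illustrative names rather than anything used in the study. Under those assumptions, the sketch only shows that comparable CD8+ counts with roughly doubled checkpoint-positive counts give a roughly doubled normalized fraction, and that the difference would vanish only if the p-αKO CD8+ count were itself about twice as large.

```python
def positive_fraction(checkpoint_positive: float, cd8_total: float) -> float:
    """Fraction of CD8+ cells that are checkpoint-receptor positive."""
    if cd8_total <= 0:
        raise ValueError("cd8_total must be positive")
    return checkpoint_positive / cd8_total


def fold_change(reference: tuple, test: tuple) -> float:
    """Ratio of normalized fractions (test relative to reference).

    Each argument is a (checkpoint_positive, cd8_total) pair.
    """
    return positive_fraction(*test) / positive_fraction(*reference)


# Hypothetical counts per field, for illustration only.
ako = (50.0, 200.0)             # (checkpoint-positive, CD8+ total)
pako_same_cd8 = (100.0, 200.0)  # doubled checkpoint count, same CD8+ count
pako_more_cd8 = (100.0, 400.0)  # doubled checkpoint count, doubled CD8+ count

equal_cd8_fold = fold_change(ako, pako_same_cd8)
double_cd8_fold = fold_change(ako, pako_more_cd8)

print(f"fold change with comparable CD8+ counts: {equal_cd8_fold:.2f}")
print(f"fold change if CD8+ counts were doubled: {double_cd8_fold:.2f}")
```

      Because the two quantities were measured on distinct sections, any such calculation carries section-to-section variability that this sketch does not model.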

      b. Flow cytometric analysis to further characterize the myeloid compartment is incomplete (single replicate) and does not strengthen the argument that p-aKO TME is more immunosuppressive. It could, however, strengthen the argument that TIL has less anti-tumor potential if effector molecule expression in CD8+ infiltrating cells were quantified.

      We agree that including more tumor samples will strengthen the argument that p-αKO TME is more immunosuppressive. Future studies need to be done to characterize CD8+ T cells.

      (4) Inhibition of PD1/PD-L1 checkpoint leads to elimination of most p-αKO tumors

      a. It is reasonable to conclude that p-aKO tumors are responsive to immune checkpoint blockade. However, there is no data presented to support the statement that checkpoint blockade reactivates an existing anti-tumor CD8+ T cell response and does not induce a de novo response.

      We agree that future studies exploring the clonotypes of T cells infiltrating tumors in PD-1-treated mice are necessary to determine whether the observed T cell response represents reactivation of existing clones, a de novo response, or a combination of both.

      b. The discussion of these data implies that anti-PD-1 would not improve aKO tumor control, but these data are not included. As such, it is difficult to compare the therapeutic response in aKO versus p-aKO. Further, these data are at best an indirect comparison of the T cell responsiveness against tumor, as the only direct comparison is infiltrating cell count in Figure 4 and there are no public TCR clones with confirmed anti-tumor specificity to follow in the aKO versus p-aKO response.

      Since αKO tumors completely regress with 100% animal survival, we deemed anti-PD1 treatment in this group unnecessary. While we did assess anti-PD1 treatment in KPC-implanted mice, no survival benefit was observed (data not shown). The p-αKO tumor model was the only one in which anti-PD1 treatment improved survival. The complexity of the in vivo tumor microenvironment likely contributes to the lack of shared TCR clones between αKO and p-αKO tumors, even within the same tumor group. Future studies aimed at identifying tumor-specific clones may involve transferring in vivo models to in vitro assays or the generation of novel mouse strains expressing identified TCRs. However, these approaches require substantial time and resources and are beyond the scope of the present study.

      Reviewer 2:

      Weaknesses:

      (1) A major issue is that it seems these data are based on the use of a single tumor cell clone with PIK3CA deleted. Therefore, there could be other changes in this clone in addition to the deletion of PIK3CA that could contribute to the phenotype.

      We have previously tested a different KPC cell line (DT10022) with genetically downregulated PIK3CA and found mice implanted with αKO cells also showed tumor regression. However, we have not tested if deletion of Pccb in the DT10022-aKO cell line will have the same effect.

      (2) The conclusion that the change in the PCCB-deficient tumor cell line is unrelated to mitochondrial metabolic changes may be incorrect based on the data provided. While it is true that in the experiments performed, there was no statistically significant change in the oxygen consumption rate or metabolite levels, this could be due to experimental error. There is a trend in the OCR being higher in the PCCB-deficient cells, although due to a high standard deviation, the change is not statistically significant. There is also a trend for there being more aKG in this cell line, but because there were only 3 samples per cell line, there is no statistically significant difference.

      Although PCCB is known to cause metabolic changes, in the context of this study, we are comparing PCCB-deficient to PCCB & PIK3CA double-deficient cells. We did not address if PCCB loss alone would cause metabolic alteration. We suspect that is the case.

      (3) More data are required to support the authors' conclusion that there are myeloid changes in the PCCB-deficient tumor cells. Flow data are shown from only one tumor of each type.

      We agree that including more tumor samples will strengthen the argument that p-αKO TME is more immunosuppressive.

      (4) The previous published study demonstrated increased MHC and CD80 expression in the PIK3CA-deficient tumors and these differences were suggested to be the reason the tumors were rejected. However, no data concerning the levels of these proteins were provided in the current manuscript.

      Our previous hypothesis for altered MHC and CD80 levels is based on the observation that there is a dramatic increase in the number of infiltrating T cells upon Pik3ca deletion. In this study, similar levels of infiltrating T cells were observed when Pccb was deleted in αKO cells; therefore, we do not expect any changes in MHC and CD80 levels, since these tumors appear to still be recognized by the T cells. Indeed, we are able to detect clonal enrichment in p-αKO tumors.

      Reviewer 3:

      Weaknesses:

      The IHC technique that was used to stain and characterize the exhaustion status of the tumor-infiltrating T cells.

      We agree with the reviewer that incorporating multi-color IHC or flow cytometry to characterize the exhaustion status of specific T cell subtypes would provide more comprehensive information. Unfortunately, we do not have the resources to perform these studies currently.

    1. eLife Assessment

      This is a valuable study, tackling the long-standing issue of the difficulty in imaging the inferior olive and addressing the most relevant questions with a rigorous approach. The technological advance allowed the authors to generate solid experimental evidence with high-quality data. The results are presented clearly and the analyses are rigorous.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Guo and Uusisaari describes a series of experiments that employ a novel approach to address long-standing questions on the inferior olive in general and the role of the nucleo-olivary projection specifically. For the first time, they optimized the ventral approach to the inferior olive to facilitate imaging in this area that is notoriously difficult to reach. Using this approach, they are able to compare activity in two olivary regions, the PO and DAO, during different types of stimulation. They demonstrate the difference between the two regions, linked to Aldoc-identities of downstream Purkinje cells, and that there is co-activation resulting in larger events when they are clustered. Periocular stimulation also drives larger events, related to co-activation. Using optogenetic stimulation they activate the nucleo-olivary (N-O) tract and observe a wide range of responses, from excitation to inhibition. Zooming in on inhibition they test the assumption that N-O activation can be responsible for suppression of sensory-evoked events. Instead, they suggest that the N-O input can function to suppress background activity while preserving the sensory-driven responses.

      Strengths:

      This is an important study, tackling the long-standing issue of the impossibility to do imaging in the inferior olive and using that novel method to address the most relevant questions. The experiments are technically very challenging, the results are presented clearly and the analysis is quite rigorous. There is quite a lot of room for interpretation, see weaknesses, but the authors make an effort to cover many options.

      Weaknesses:

      The heavy anesthesia that is required during the experiment could severely impact the findings. Because of the anesthesia, the firing rate of IO neurons is found to be ~0.1 Hz, significantly lower than the 1 Hz found in non-anesthetized mice. This is mentioned and discussed, but what the consequences could be cannot be overstated and should be addressed more. Although the methods and results are described in sufficient detail, there are a few points that, when addressed, would improve the manuscript.

    3. Reviewer #2 (Public review):

      The authors developed a strategy to image inferior olive somata via viral GCaMP6s expression, an implanted GRIN lens, and a one-photon head-mounted microscope, providing the first in vivo somatic recordings from these neurons. The main new findings relate to the activation of the nucleo-olivary pathway, specifically that: this manipulation does not produce a spiking rebound in the IO; it exerts a larger effect on spontaneous IO spiking than stimulus (airpuff)-evoked spiking. In addition, several findings previously demonstrated in vivo in Purkinje cell complex spikes or inferior olivary axons are confirmed here in olivary somata: differences in event sizes from single cells versus co-activated cells; reduced coactivation when activating the NO pathway; more coactivation within a single zebrin compartment.

      The study presents some interesting findings, and for the most part, the analyses are appropriate. My two principal critiques are that the study does not acknowledge major technical limitations and their impact on the claims; and the study does not accurately represent prior work with respect to the current findings.

      Several significant technical limitations necessarily impact the veracity of several of the claims:

      (1) The authors use GCaMP6s, which has a tau_1/2 of >1 s for a normal spike, and probably closer to 2 s (10.1038/nature12354) for the unique and long type of olivary spikes that give rise to axonal bursts (10.1016/j.neuron.2009.03.023). Indeed, the authors demonstrate as much (Fig. 2B1). This affects at least several claims:

      a. The authors report spontaneous spike rates of 0.1 Hz. They attribute this to anesthesia, yet other studies under anesthesia recording Purkinje complex spikes via either imaging or electrophysiology report spike rates as high as 1.5 Hz (10.1523/JNEUROSCI.2525-10.2011). This discrepancy is not acknowledged and a plausible explanation is not given. Citations are not provided that demonstrate such low anesthetized spike rates, nor are citations provided for the claim that spike rates drop increasingly with increasing levels of anesthesia when compared to awake resting conditions. More likely, this discrepancy reflects spikes that are missed due to a combination of the indicator kinetics and low imaging sensitivity (see (2)), neither of which are presented as possible plausible alternative explanations.

      b. Many claims are made throughout about co-activation ("clustering"), but with the GCaMP6s rise time to peak (0.5 s), there is little technical possibility to resolve co-activation. This limitation is not acknowledged as a caveat and the implications for the claims are not engaged with in the text.

      c. The study reports an ultralong "refractory period" (L422-etc) in the IO, but this again must be tempered by the possibility that spikes are simply being missed due to very slow indicator kinetics and limited sensitivity. Indeed, the headline numeric estimate of 1.5 s (L445) is suspiciously close to the underlying indicator kinetic limitation of ~1-2 s.

      (2) The study uses endoscopic one-photon miniaturized microscope imaging. Realistically, this is expected to permit an axial point spread function (z-PSF) on the order of ~40 µm, which must substantially reduce resolution and sensitivity. This means that if there *is* local coactivation, the data in this study will very likely have individual ROIs that integrate signals from multiple neighboring cells. The study reports relationships between event magnitude and clustering, etc; but a fluorescence signal that contains photons contributed by multiple neighboring neurons will be larger than a single neuron, regardless of the underlying physiology - the text does not acknowledge this possibility or limitation.

      Second, the text makes several claims for the first multicellular in vivo olivary recordings. (L11; L324, etc). I am aware of at least two studies that have recorded populations of single olivary axons using two-photon Ca2+ imaging up to 6 years ago (10.1016/j.neuron.2019.03.010; 10.7554/eLife.61593). This technique is not acknowledged or discussed, and one of these studies is not cited. No argument is presented for why axonal imaging should not "count" as multicellular in vivo olivary recording: axonal Ca2+ reflects somatic spiking.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript by Guo and Uusisaari describes a series of experiments that employ a novel approach to address long-standing questions on the inferior olive in general and the role of the nucleo-olivary projection specifically. For the first time, they optimized the ventral approach to the inferior olive to facilitate imaging in this area that is notoriously difficult to reach. Using this approach, they are able to compare activity in two olivary regions, the PO and DAO, during different types of stimulation. They demonstrate the difference between the two regions, linked to Aldoc-identities of downstream Purkinje cells, and that there is co-activation resulting in larger events when they are clustered. Periocular stimulation also drives larger events, related to co-activation. Using optogenetic stimulation they activate the nucleo-olivary (N-O) tract and observe a wide range of responses, from excitation to inhibition. Zooming in on inhibition they test the assumption that N-O activation can be responsible for suppression of sensory-evoked events. Instead, they suggest that the N-O input can function to suppress background activity while preserving the sensory-driven responses.

      Strengths:

      This is an important study, tackling the long-standing issue of the impossibility to do imaging in the inferior olive and using that novel method to address the most relevant questions. The experiments are technically very challenging, the results are presented clearly and the analysis is quite rigorous. There is quite a lot of room for interpretation, see weaknesses, but the authors make an effort to cover many options.

      Weaknesses:

      The heavy anesthesia that is required during the experiment could severely impact the findings. Because of the anesthesia, the firing rate of IO neurons is found to be ~0.1 Hz, significantly lower than the 1 Hz found in non-anesthetized mice. This is mentioned and discussed, but what the consequences could be cannot be overstated and should be addressed more. Although the methods and results are described in sufficient detail, there are a few points that, when addressed, would improve the manuscript.

      We sincerely thank the reviewer for their encouraging comments and recognition of our study’s significance. We fully acknowledge the confounding effects of the deep anesthesia used in our experiments, which was necessary to ensure the animals’ welfare while establishing this technically demanding methodology. We elaborate on these effects below and will further clarify them in the revised manuscript.

      Ultimately, the full resolution of this issue will require recordings in awake animals, as we consider our approach an advancement from acute slice preparations but not yet a complete representation of in vivo IO function. However, key findings from our study—such as amplitude modulation with co-activation and the potential role of IO refractoriness in complex spike generation—could be further explored in existing cerebellar cortical recordings from awake, behaving animals. We hope our work will motivate re-examination of such datasets to assess whether these mechanisms contribute to overall cerebellar function.

      Reviewer #1 (Recommendations for the authors):

      On page 10 the authors indicate that 2084 events were included for DAO and 1176 for PO. Is that the total number of events? What was the average and the range per neuron and the average recording duration?

      Thank you for pointing out the lack of clarity. The sentence should say "in total, 2084 and 1176 detected events from DAO and PO were included in the study". We will add the averages and ranges of events detected per neuron in different categories, as well as the durations of the recordings (ranging from 120 s to 270 s), to the tables.

      On page 10 it is also stated that: "events in PO reached larger values than those in DAO even though the average values did not differ". Please clarify that statement. Which parameter + p-value in the table indicates this difference?

      Apologies for the omission. Currently the observation is only visible in the longer tail to the right in the PO data in Figure 2B2. We will add the range of values (3.0-75.2 vs 3.1-39.6 for PO and DAO amplitudes, respectively) to the text and the tables in the revision.

      Abbreviating airpuff to AP is confusing, I would suggest not abbreviating it.

      Understood. We will change AP to airpuff in the text. In figure labels, at least in some panels, the abbreviation will be necessary due to space constraints.

      What type of pulse was used to drive ChrimsonR? Could it be that the pulse caused a rebound-like phenomenon with the pulse duration that drove the excitation?

      As described on line 229 and in the Methods, we used 5-second trains of 5-ms LED light pulses. Importantly, these stimulation parameters were informed by our extensive in vitro examination of various stimulation patterns (Lefler et al., 2014), which consistently produced stable postsynaptic responses without inducing depolarization or rebound effects. Additionally, Loyola et al. (2024) reported no evidence of rebound activity in IO cells following optogenetic activation of N-O axons in the absence of direct neuronal depolarization. We will incorporate these considerations into the discussion, while also acknowledging that unequivocal confirmation of “direct” rebound excitation would require intracellular recordings, such as patch clamp experiments.

      The authors indicate that the excitatory activity was indistinguishable in shape from other calcium activity, but can anything be said about the timing (the scale bar in Figure 4A2 has no value, is it the same 2s pulse)?

      Apologies for the oversight in labeling the scale bar in Figure 4A2 (it is 2 s). While we deliberately refrain from making strong claims regarding the origin of the NO-evoked spikes, their timing can be examined in more detail in Figure 4 - Supplement 1, panels C and D. We will make sure this is clearly stated in the revised text.

      Did the authors check for accidental sparse transfection with ChrimsonR of olivary neurons in the post-mortem analysis?

      Good point! However, we have never seen this AAV9-based viral construct drive trans-synaptic expression in the IO, nor is this version of AAV known to have the capacity for trans-synaptic expression in general.

      No sign of retrograde labeling (via the CF collaterals in the cerebellar nuclei) was seen either. Notably, the hSyn promoter used to drive ChrimsonR expression is extremely ineffective in the IO. Thus, we doubt that such accidental labeling could underlie the excitatory events seen upon N-O stimulation. We will add these mentions with relevant references to the discussion of the revised manuscript.

      On page 18 the authors state that: "The lower SS rate was attributed to intrinsic factors of PNs, while the reduced frequency of CSs was speculated to result from increased inhibition of the IO via the nucleo-olivary (N-O) pathway targeting the same microzone." I think I understand what you mean to say, but this is a bit confusing.

      Agreed. We will rephrase this sentence to clarify that a lower SS rate in a given microzone may lead to increased activation of inhibitory N-O axons that target the region of IO that sends CF to the same microzone.

      Is airpuff stimulation not more likely to activate PO than DAO because of the related modalities (more face vs. more trunk/limbs), and thereby also more likely to drive event co-activation (as stated in the abstract)?

      We agree that the specific innervation patterns of different IO regions likely explain the discrepancy between previous reports of airpuff-evoked complex spikes in cerebellar cortical regions targeted by DAO and the absence of airpuff responses in the particular region of DAO accessible via our surgical approach. As in the present dataset virtually no airpuff-evoked events were seen in DAO regions, we are unable to directly compare airpuff-evoked event co-activation between PO and DAO. The higher co-activation for PO was observed for "spontaneous" activity.

      The Discussion addresses the question of why N-O pathway activation does not remove the airpuff response.

      Given the potentially profound effect, I would propose to expand the discussion on the role of anesthesia, including longer refractory periods but also potential disruption of normal network interactions (even though individually the stimulations work). Briefly indicating what is known about alpha-chloralose would help interpret the results as well.

      We fully agree that the anesthetic state introduces confounding factors that must be considered when interpreting our results. We will expand the discussion to address how anesthesia, particularly alpha-chloralose, as well as tissue cooling, may contribute to prolonged refractory periods and potential disruptions in normal network interactions. However, we recognize that certain aspects cannot be fully resolved without recordings in awake animals. For this reason, we characterize our preparation as an "upgraded" in vitro approach rather than a fully representative in vivo model.

      Please clearly indicate that the age range of P35-45 is for the moment of virus injection and specify the age range for the imaging experiment.

      Apologies for the oversight. We will indicate these age ranges in the results (as they are currently only specified in Methods). The P35-45 range refers to the moment of virus injection.

      The methods indicate that a low-pass filter of 1 Hz was used. I am sure this helps with smoothing, but does it not remove a lot of potentially interesting information? How would a higher low-pass filter affect the analysis and results?

      We acknowledge that applying a 1 Hz low-pass filter inevitably removes high-frequency components, including potential IO oscillations and fine details such as spike "doublets." However, given the temporal resolution constraints of our recording approach, we prioritized capturing robust, interpretable events over attempting to extract finer features that might be obscured by both the indicator kinetics and imaging speed.

      While a higher cut-off frequency could, in principle, allow more precise measurement of rise times and peak timings, it would also amplify high-frequency noise, complicating automated event detection and reducing confidence in distinguishing genuine neural signals from artifacts. Given these trade-offs, we opted for a conservative filtering approach to ensure stable event detection. Future work, particularly with faster imaging rates and improved sensors (GCaMP8s), will explore the finer temporal structure of IO activity. We will discuss these matters more extensively in the revised discussion.
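
      The trade-off described above can be illustrated with a minimal sketch. It assumes a simple first-order (RC-type) low-pass filter and a 20 Hz sampling rate; the filter type actually used in the study is not specified here, so `low_pass`, the 5 Hz comparison cut-off, and the synthetic white noise are all illustrative assumptions rather than the authors' pipeline. The sketch compares how much noise survives each cut-off and how many samples a step takes to reach 90% of its final value.

```python
import math
import random
import statistics


def low_pass(signal, cutoff_hz, fs_hz):
    """First-order low-pass filter (assumed filter type, for illustration)."""
    dt = 1.0 / fs_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out = []
    y = signal[0]
    for x in signal:
        y = y + alpha * (x - y)
        out.append(y)
    return out


def samples_to_reach(trace, level):
    """Index of the first sample at or above `level`, or None if never reached."""
    for i, value in enumerate(trace):
        if value >= level:
            return i
    return None


FS_HZ = 20.0  # assumed frame rate

# Synthetic unit-variance white noise standing in for high-frequency noise.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
noise_raw = statistics.pstdev(noise)
noise_1hz = statistics.pstdev(low_pass(noise, 1.0, FS_HZ))
noise_5hz = statistics.pstdev(low_pass(noise, 5.0, FS_HZ))

# Unit step standing in for an abrupt fluorescence onset.
step = [0.0] + [1.0] * 100
rise_1hz = samples_to_reach(low_pass(step, 1.0, FS_HZ), 0.9)
rise_5hz = samples_to_reach(low_pass(step, 5.0, FS_HZ), 0.9)

print(f"residual noise SD: raw={noise_raw:.2f}, 1 Hz={noise_1hz:.2f}, 5 Hz={noise_5hz:.2f}")
print(f"samples to 90% of step: 1 Hz={rise_1hz}, 5 Hz={rise_5hz}")
```

      In this toy setting the lower cut-off leaves less residual noise but smears the onset over more samples, which is the qualitative trade-off the response describes.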

      Reviewer #2 (Public review):

      The authors developed a strategy to image inferior olive somata via viral GCaMP6s expression, an implanted GRIN lens, and a one-photon head-mounted microscope, providing the first in vivo somatic recordings from these neurons. The main new findings relate to the activation of the nucleo-olivary pathway, specifically that: this manipulation does not produce a spiking rebound in the IO; it exerts a larger effect on spontaneous IO spiking than stimulus (airpuff)-evoked spiking. In addition, several findings previously demonstrated in vivo in Purkinje cell complex spikes or inferior olivary axons are confirmed here in olivary somata: differences in event sizes from single cells versus co-activated cells; reduced coactivation when activating the NO pathway; more coactivation within a single zebrin compartment.

      The study presents some interesting findings, and for the most part, the analyses are appropriate. My two principal critiques are that the study does not acknowledge major technical limitations and their impact on the claims; and the study does not accurately represent prior work with respect to the current findings.

      We thank the reviewer for recognising the value of the findings in our "reduced" in vivo preparation, and apologize for omissions in the work that led to critique. We will elaborate on these matters below and prepare a revised manuscript.

      The authors use GCaMP6s, which has a tau_1/2 of >1 s for a normal spike, and probably closer to 2 s (10.1038/nature12354) for the unique and long type of olivary spikes that give rise to axonal bursts (10.1016/j.neuron.2009.03.023). Indeed, the authors demonstrate as much (Fig. 2B1). This affects at least several claims:

      a. The authors report spontaneous spike rates of 0.1 Hz. They attribute this to anesthesia, yet other studies under anesthesia recording Purkinje complex spikes via either imaging or electrophysiology report spike rates as high as 1.5 Hz (10.1523/JNEUROSCI.2525-10.2011). This discrepancy is not acknowledged and a plausible explanation is not given. Citations are not provided that demonstrate such low anesthetized spike rates, nor are citations provided for the claim that spike rates drop increasingly with increasing levels of anesthesia when compared to awake resting conditions.

      We fully acknowledge that anesthesia is a major confounding factor in our study. Given the unusually invasive nature of our surgical preparation, we prioritized deep anesthesia to ensure the animals’ welfare. This, along with potential cooling effects from tissue removal and GRIN lens contact, likely contributed to the observed suppression of IO activity.

      We recognize that reported complex spike rates under anesthesia vary considerably across studies, and we will expand our discussion to provide a more comprehensive comparison with prior literature. Notably, different anesthetic protocols, levels of anesthesia, and recording methodologies can lead to widely different estimates of firing rates. While we cannot resolve this issue without recordings in awake animals, we will clarify that our observed rates likely reflect both the effects of anesthesia and specific methodological constraints. We will also incorporate additional references to studies examining cerebellar activity under different anesthetic conditions.

      More likely, this discrepancy reflects spikes that are missed due to a combination of the indicator kinetics and low imaging sensitivity (see (2)), neither of which are presented as possible plausible alternative explanations.

      We acknowledge that the combination of slow indicator kinetics and limited optical power in our miniature microscope setup constrains the temporal resolution of our recordings. However, we are confident that we can reliably detect events occurring at intervals of 1 second or longer. This confidence is based on data from another preparation using the same viral vector and optical system, where we observed spike rates an order of magnitude higher.

      That said, we do not make claims regarding the presence or absence of somatic events occurring at very short intervals (e.g., 100-ms "doublets," as described by Titley et al., 2019), as these would likely fall below our temporal resolution. We will clarify this limitation in the revised manuscript to ensure that the constraints of our approach are fully acknowledged.
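
      The resolution limits discussed above can be sketched with an idealized, noise-free model. The transient shape (exponential rise and decay), the 0.2 s rise constant (which puts the peak near 0.5 s), the 1.5 s decay half-time, and the 20 Hz frame rate are assumptions chosen to be roughly in line with the figures quoted in this exchange; counting strict local maxima is a stand-in for event detection, not the authors' method.

```python
import math

FS_HZ = 20.0                         # assumed frame rate
TAU_RISE_S = 0.2                     # assumed rise constant (peak near 0.5 s)
TAU_DECAY_S = 1.5 / math.log(2.0)    # assumed decay half-time of 1.5 s


def transient(t):
    """Idealized indicator response to a single event at t = 0."""
    if t <= 0.0:
        return 0.0
    return (1.0 - math.exp(-t / TAU_RISE_S)) * math.exp(-t / TAU_DECAY_S)


def two_event_trace(interval_s, duration_s=8.0):
    """Summed response to two events separated by `interval_s` seconds."""
    n = int(duration_s * FS_HZ)
    return [
        transient(i / FS_HZ) + transient(i / FS_HZ - interval_s)
        for i in range(n)
    ]


def count_peaks(trace):
    """Number of strict local maxima in the sampled trace."""
    return sum(
        1
        for i in range(1, len(trace) - 1)
        if trace[i - 1] < trace[i] > trace[i + 1]
    )


peaks_1s = count_peaks(two_event_trace(1.0))     # events 1 s apart
peaks_100ms = count_peaks(two_event_trace(0.1))  # 100-ms "doublet"

print(f"distinct peaks, 1 s interval: {peaks_1s}")
print(f"distinct peaks, 100 ms interval: {peaks_100ms}")
```

      Under these assumptions the two events remain separable at a 1 s interval but merge into a single peak at 100 ms. With realistic noise and limited sensitivity the separable interval could be longer, which this sketch does not address.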

      While GCaMP6s is not as sensitive as more recent variants (Zhang et al., 2023, PMID 36922596), our previous work (Dorgans et al., 2022) demonstrated that its dynamic range and sensitivity are sufficient to detect both spikes and subthreshold activity in vitro. Although the experimental conditions differ in the current miniscope experiments, we took measures to optimize signal quality, including excluding recordings with a low signal-to-noise ratio (see Methods). This need for high signal fidelity also informed our decision to limit the sampling rate to 20 fps. In future work, we plan to adopt newer GCaMP variants that were not available at the start of this project, which should further improve sensitivity and temporal resolution.

      Many claims are made throughout about co-activation ("clustering"), but with the GCaMP6s rise time to peak (0.5 s), there is little technical possibility to resolve co-activation. This limitation is not acknowledged as a caveat and the implications for the claims are not engaged with in the text.

      As noted in the manuscript (L492-), "interpreting fluorescence signals relative to underlying voltage changes is challenging, particularly in IO neurons with unusual calcium dynamics." We acknowledge that the slow rise time of GCaMP6s (~0.5 s) limits our ability to precisely resolve the timing of co-activation at very short intervals. However, given the relatively slow timescales of IO event clustering and the inherent synchrony in olivary network dynamics, we believe that the observed co-activation patterns remain meaningful, even if finer temporal details cannot be fully resolved.

      To ensure clarity, we will expand this section to explicitly acknowledge the temporal resolution limitations of our approach and discuss their implications for interpreting co-activation. While the precise timing of individual spikes within a cluster may not be resolvable, the observed increase in event magnitude with coarse co-activation suggests that clustering effects remain functionally relevant even when exact spike synchrony is not detectable at millisecond resolution.

      This finding is consistent with the idea that co-activation enhances calcium influx, leading to larger amplitude events — a relationship that does not require perfect temporal resolution to be observed. The fact that this effect persists across a broad range of clustering windows (as shown in Figure 2 Supplement 2) further supports its robustness. While we cannot make strong claims about precise spike timing within these clusters nor about the mechanism underlying enhanced calcium signal, our results demonstrate that co-activation may influence IO activity in a quantifiable way. We will clarify these points in the revised manuscript to ensure that our findings are appropriately framed given the temporal constraints of our imaging approach.

      The study reports an ultralong "refractory period" (L422-etc) in the IO, but this again must be tempered by the possibility that spikes are simply being missed due to very slow indicator kinetics and limited sensitivity. Indeed, the headline numeric estimate of 1.5 s (L445) is suspiciously close to the underlying indicator kinetic limitation of 1-2 s.

      Our findings suggest a potential refractory period limiting the frequency of events in the inferior olive under our recording conditions. This interpretation is supported by the observed inter-event interval distribution, the inability of N-O stimulation to suppress airpuff-evoked events, and lower bounds reported in earlier literature on complex spike intervals recorded in awake animals under various behavioral contexts. Taking into account the likely cooling of the tissue, a refractory period of 1.5 s is not unreasonable. We do recognize that the slow decay kinetics of GCaMP6s may cause overlapping fluorescence signals, potentially obscuring closely spaced events. This is in line with data presented in the Chen et al. (2013) manuscript describing GCaMP6s (PMID: 36922596; Figure 3b showing events detected with intervals less than 500 ms).

      The consideration of refractoriness only arose late in the project while we were investigating the explanations for lack of inhibition of airpuff-evoked spikes. Future experiments, particularly in awake animals, will be instrumental in validating this interpretation. To ensure that the refractory period is understood as one possible mechanism rather than a definitive explanation, we will rephrase the discussion to clarify that while our data are compatible with a refractory period, they do not establish it conclusively.

      The study uses endoscopic one-photon miniaturized microscope imaging. Realistically, this is expected to permit an axial point spread function (z-PSF) on the order of 40 µm, which must substantially reduce resolution and sensitivity. This means that if there *is* local coactivation, the data in this study will very likely have individual ROIs that integrate signals from multiple neighboring cells. The study reports relationships between event magnitude and clustering, etc.; but a fluorescence signal that contains photons contributed by multiple neighboring neurons will be larger than that of a single neuron, regardless of the underlying physiology. The text does not acknowledge this possibility or limitation.

      We acknowledge that the use of one-photon endoscopic imaging imposes limitations on axial resolution, potentially leading to signal contributions from neighboring neurons. To mitigate this, we applied CNMFe processing, which allows for the deconvolution of overlapping signals and the differentiation of multiple neuronal sources within shared pixels. However, as the reviewer points out, if two neurons are perfectly overlapping in space, they may be treated as a single unit.

      To clarify this limitation, we will expand the discussion to explicitly acknowledge the impact of one-photon imaging on signal separation and to emphasize that, while CNMFe helps resolve some overlaps, perfect separation is not always possible. As already noted in the manuscript (L495-), "the absence of optical sectioning in the whole-field imaging method can lead to confounding artifacts in densely labeled structures such as the IO’s tortuous neuropil." We will further elaborate on how this factor was considered in our analysis and interpretation.

      Second, the text makes several claims for the first multicellular in vivo olivary recordings. (L11; L324, etc).

      I am aware of at least two studies that have recorded populations of single olivary axons using two-photon Ca2+ imaging up to 6 years ago (10.1016/j.neuron.2019.03.010; 10.7554/eLife.61593). This technique is not acknowledged or discussed, and one of these studies is not cited. No argument is presented for why axonal imaging should not "count" as multicellular in vivo olivary recording: axonal Ca2+ reflects somatic spiking.

      We appreciate the reviewer’s point and acknowledge the important prior work using two-photon imaging to record olivary axonal activity in the cerebellar cortex. However, while axonal calcium signals do reflect somatic spiking, these recordings inherently lack information about the local network interactions within the inferior olive itself.

      A key motivation for our study was to observe neuronal activity within the IO at the level of its gap-junction-coupled local circuits, rather than at the level of its divergent axonal outputs. The fan-like spread of climbing fibers across rostrocaudal microzones in the cerebellar cortex makes them relatively easy to record in vivo, but it also means that individual imaging fields contain axons from neurons that may be distributed across different IO microdomains. As a result, while previous work has provided valuable insight into olivary output patterns, it has not allowed for the examination of coordinated somatic activity within localized IO neuron clusters.

      With apologies, we recognize that this distinction was not sufficiently emphasized in our introduction. We will clarify this key point and ensure that the important climbing fiber imaging studies are properly cited and contextualized in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      The authors state: "we found no reports that examined coactivation levels between Z+ and Z- microzones in cerebellar complex spike recordings" (L359). Multiple papers (that are not cited) using AldolaseC-tdTomato mice with two-photon Purkinje dendritic calcium imaging showed synchronization (at similar levels) within but not across Z+/Z- bands (2015, 10.1523/JNEUROSCI.2170-14.2015; 2023, https://doi.org/10.7554/eLife.86340).

      We apologize for the misleading phrasing. We will rephrase this statement to: "While complex spike coactivation within individual zebrin zones has been extensively studied (references), we found no reports directly comparing the levels of intra-zone co-activation between Z+ and Z- microzones."

      Additionally, we will ensure that the relevant studies demonstrating synchronization within zebrin zones, as well as (lack of) interactions between neighboring zones, are properly cited and discussed in the revised manuscript.

      The figures could use more proofreading, and several decisions should be reconsidered:

      Normalizing the amplitude to maximum is not a good strategy, as it can overemphasize noise or extremely small-magnitude signals, and should instead follow standard convention and present in fixed units (3A2, 4B2, and even 2C).

      As noted earlier, we have excluded recordings and cells with high noise or a low signal-to-noise ratio for event amplitudes, ensuring that such data do not influence the color-coded panels. Importantly, all quantitative analyses and traces presented in the manuscript are normalized to baseline noise level, not to maximal amplitude, ensuring that noise or low-magnitude signals do not skew the analysis.

      The decision to use max-amplitude normalization in color-coded panels was made specifically to aid visualization of temporal structure across recordings. This approach allows for clearer comparisons without the distraction of inter-cell variability in absolute signal strength. However, we recognize the potential for confusion and will revise the Results text to explicitly clarify that the color-coded visualizations use a different scaling method than the quantitative analyses.
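
      The difference between the two scaling methods discussed here can be illustrated with a minimal sketch. This is not the analysis code used in the study: the function names, the toy traces, and the choice to estimate baseline noise as the standard deviation of an initial baseline window are all assumptions made for illustration.

```python
from statistics import pstdev


def normalize_to_noise(trace, baseline_len):
    """Scale a trace by the standard deviation of its first `baseline_len` samples.

    Assumption: the baseline window contains no events, so its standard
    deviation is a reasonable stand-in for the noise level.
    """
    noise = pstdev(trace[:baseline_len])
    if noise == 0:
        raise ValueError("baseline has zero variance")
    return [x / noise for x in trace]


def normalize_to_max(trace):
    """Scale a trace so that its largest absolute value becomes 1."""
    peak = max(abs(x) for x in trace)
    if peak == 0:
        raise ValueError("trace is all zeros")
    return [x / peak for x in trace]


# Two hypothetical cells with identical baseline noise but a tenfold
# difference in event size (last sample of each trace).
baseline = [0.1, -0.1, 0.1, -0.1]
strong_cell = baseline + [4.0]
weak_cell = baseline + [0.4]

# Noise normalization keeps the tenfold difference in event size.
strong_noise = normalize_to_noise(strong_cell, len(baseline))
weak_noise = normalize_to_noise(weak_cell, len(baseline))

# Max normalization maps both events to 1 and inflates the weak cell's baseline.
strong_max = normalize_to_max(strong_cell)
weak_max = normalize_to_max(weak_cell)
```

      In this toy case, noise normalization preserves the relative event sizes of the two cells, whereas max normalization gives both events the same height and makes the baseline fluctuations of the weaker cell appear ten times larger. This is why max-normalized panels are suited to showing temporal structure only, not amplitudes.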

      x axes with no units: Figures 2B2, 2E1, 3B2, 3C2, 5B2, 5C2, 5D2.

      No colorbar units: 5A3 (and should be shown in real not normalized units).

      No y axis units: 5D1.

      No x axis label or units: 5E1.

      5E3 says "stim/baseline" for the y-axis units and then the first-panel title says "absolute frequencies" meaning it’s *not* normalized and needs a separate (accurate) y-axis with units.

      Illegibly tiny fonts: 2E1, 3E1, etc.

      We will correct all of these in the revised manuscript. Thank you for the careful reading.

    1. eLife Assessment

      This useful study presents findings on the developmental roles of Nup107, a key nucleoporin, in regulating the larval-to-pupal transition in Drosophila melanogaster through its involvement in ecdysone signaling. The evidence supporting the authors' claims is solid, with robust experimental approaches including RNAi knockdown and rescue experiments. The findings highlight Nup107's function in regulating ecdysone biosynthesis, specifically through the regulation of EcR levels and Halloween gene expression in the prothoracic gland; additionally, rescue experiments suggest that the RTK PTTH/Torso signaling pathway is disrupted upon Nup107 depletion, further emphasizing its role in ecdysone regulation. However, finding a mechanism, addressing potential off-target effects of RNAi, and exploring alternative mutant models would strengthen the findings, as the currently proposed mechanism is not fully supported by the data.

    2. Reviewer #1 (Public review):

      This study provides a thorough analysis of Nup107's role in Drosophila metamorphosis, demonstrating that its depletion leads to developmental arrest at the third larval instar stage due to disruptions in ecdysone biosynthesis and EcR signaling. Importantly, the authors establish a novel connection between Nup107 and Torso receptor expression, linking it to the hormonal cascade regulating pupariation.

      However, some contradictory results weaken the conclusions of the study. The authors claim that Nup107 is involved in the translocation of EcR from the cytoplasm to the nucleus. However, the evidence provided in the paper suggests it more likely regulates EcR expression positively, as EcR is undetectable in Nup107-depleted animals, even below background levels. Additionally, the link between Nup107 and Torso is not fully substantiated. While overexpression of Torso appears to rescue the lack of 20E production in the prothoracic gland, the distinct phenotypes of Torso and Nup107 depletion (developmental delay in the former versus complete larval arrest in the latter) complicate understanding of Nup107's precise role.

      To clarify these discrepancies, further investigation into whether Nup107 interacts with other critical signaling pathways related to the regulation of ecdysone biosynthesis, such as EGFR or TGF-β, would be beneficial and could strengthen the findings.

      In summary, although the study presents some intriguing observations, several conclusions are not well-supported by the experimental data.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Kawadkar et al. investigates the role of Nup107 in developmental progression via the regulation of ecdysone signaling. The authors identify an interesting phenotype of Nup107 whole-body RNAi depletion in Drosophila development: developmental arrest at the late larval stage. Nup107-depleted larvae exhibit mislocalization of the Ecdysone receptor (EcR) from the nucleus to the cytoplasm and reduced expression of EcR target genes in salivary glands, indicative of compromised ecdysone signaling. This mislocalization of EcR in salivary glands was phenocopied when Nup107 was depleted only in the prothoracic gland (PG), suggesting that it is not nuclear transport of EcR but the presence of ecdysone (normally secreted from PG) that is affected. Consistently, whole-body levels of ecdysone were shown to be reduced in Nup107 KD, particularly at the late third instar stage when a spike in ecdysone normally occurs. Importantly, the authors could rescue the developmental arrest and EcR mislocalization phenotypes of Nup107 KD by adding exogenous ecdysone, supporting the notion that Nup107 depletion disrupts biosynthesis of ecdysone, which arrests normal development. Additionally, they found that rescue of the Nup107 KD phenotype can also be achieved by overexpression of the receptor tyrosine kinase torso, which is thought to be the upstream regulator of ecdysone synthesis in the PG. Transcript levels of torso are also shown to be downregulated in the Nup107 KD, as are transcript levels of multiple ecdysone biosynthesis genes. Together, these experiments reveal a new role of Nup107 or nuclear pore levels in hormone-driven developmental progression, likely via regulation of levels of torso and torso-stimulated ecdysone biosynthesis.

      Strengths:

      The developmental phenotypes of an NPC component presented in the manuscript are striking and novel, and the data appears to be of high quality. The rescue experiments are particularly significant, providing strong evidence that Nup107 functions upstream of torso and ecdysone levels in the regulation of developmental timing and progression.

      Weaknesses:

      The underlying mechanism is however not clear, and any insight into how Nup107 may regulate these pathways would greatly strengthen the manuscript. Some suggestions to address this are detailed below.

      Major questions:

      (1) Determining how specific this phenotype is to Nup107 vs. to reduced NPC levels overall would give some mechanistic insight. Does knocking down other components of the Nup107 subcomplex (the Y-complex) lead to similar phenotypes? Given the published gene regulatory function of Nup107, do other gene regulatory Nups such as Nup98 or Nup153 produce these phenotypes?

      (2) In a related issue, does this level of Nup107 KD produce lower NPC levels? It is expected to, but actual quantification of nuclear pores in Nup107-depleted tissues should be added. These and the above experiments would help address a key mechanistic question - is this phenotype the result of lower numbers of nuclear pores or specifically of Nup107?

      (3) Additional experiments on how Nup107 regulates the torso would provide further insight. Does Nup107 regulate transcription of the torso or perhaps its mRNA export? Looking at nascent levels of the torso transcript and the localization of its mRNA can help answer this question. Or alternatively, does Nup107 physically bind the torso?

      (4) The depletion level of Nup107 RNAi specifically in the salivary gland vs. the prothoracic gland should be compared by RT-qPCR or western blotting.

      (5) The UAS-torso rescue experiment should also include the control of an additional UAS construct - so Nup107; UAS-control vs Nup107; UAS-torso should be compared in the context of rescue to make sure the Gal4 driver is functioning at similar levels in the rescue experiment.

      Minor:

      (6) Figures and figure legends can stand to be more explicit and detailed, respectively.

    4. Reviewer #3 (Public review):

      Summary:

      In this study by Kawadkar et al, the authors investigate the developmental role of Nup107, a nucleoporin, in regulating the larval-to-pupal transition in Drosophila through RNAi knockdown and CRISPR-Cas9-mediated gene editing. They demonstrate that Nup107, an essential component of the nuclear pore complex (NPC), is crucial for regulating ecdysone signaling during developmental transitions. The authors show that the depletion of Nup107 disrupts these processes, offering valuable insights into its role in development.

      Specifically, they find that:

      (1) Nup107 depletion impairs pupariation during the larval-to-pupal transition.

      (2) RNAi knockdown of Nup107 results in defects in EcR nuclear translocation, a key regulator of ecdysone signaling.

      (3) Exogenous 20-hydroxyecdysone (20E) rescues pupariation blocks, but rescued pupae fail to close.

      (4) Nup107 RNAi-induced defects can be rescued by activation of the MAP kinase pathway.

      Strengths:

      The manuscript provides strong evidence that Nup107, a component of the nuclear pore complex (NPC), plays a crucial role in regulating the larval-to-pupal transition in Drosophila, particularly in ecdysone signaling.

      The authors employ a combination of RNAi knockdown, CRISPR-Cas9 gene editing, and rescue experiments, offering a comprehensive approach to studying Nup107's developmental function.

      The study effectively connects Nup107 to ecdysone signaling, a key regulator of developmental transitions, offering novel insights into the molecular mechanisms controlling metamorphosis.

      The use of exogenous 20-hydroxyecdysone (20E) and activation of the MAP kinase pathway provides a strong mechanistic perspective, suggesting that Nup107 may influence EcR signaling and ecdysone biosynthesis.

      Weaknesses:

      The authors do not sufficiently address the potential off-target effects of RNAi, which could impact the validity of their findings. Alternative approaches, such as heterozygous or clonal studies, could help confirm the specificity of the observed phenotypes.

      NPC Complex Specificity: While the authors focus on Nup107, it remains unclear whether the observed defects are specific to this nucleoporin or if other NPC components also contribute to similar defects. Demonstrating similar results with other NPC components would strengthen their claims.

      Although the authors show that Nup107 depletion disrupts EcR signaling, the precise molecular mechanism by which Nup107 influences this process is not fully explored. Further investigation into how Nup107 regulates EcR nuclear translocation or ecdysone biosynthesis would improve the clarity of the findings.

      There are some typographical errors and overly strong phrases, such as "unequivocally demonstrate," which could be softened. Additionally, the presentation of redundant data in different tissues could be streamlined to enhance clarity and flow.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This study provides a thorough analysis of Nup107's role in Drosophila metamorphosis, demonstrating that its depletion leads to developmental arrest at the third larval instar stage due to disruptions in ecdysone biosynthesis and EcR signaling. Importantly, the authors establish a novel connection between Nup107 and Torso receptor expression, linking it to the hormonal cascade regulating pupariation.

      However, some contradictory results weaken the conclusions of the study. The authors claim that Nup107 is involved in the translocation of EcR from the cytoplasm to the nucleus. However, the evidence provided in the paper suggests it more likely regulates EcR expression positively, as EcR is undetectable in Nup107-depleted animals, even below background levels.

      We appreciate the concern raised in this public review. However, we must clarify that we do not claim that Nup107 regulates the translocation of EcR from the cytoplasm. We posed this only as a hypothesis, asking whether Nup107 regulates EcR nuclear translocation (9th line of the 2nd paragraph on page 6). We have spelled this out more clearly in the title of the 3rd sub-section of the Results section, and in the Discussion (8th line of the 2nd paragraph on page 11). Overall, we have expressed surprise that Nup107 is not directly involved in the nuclear translocation of EcR.

      Ecdysone acts through EcR to induce transcription of EcR itself, creating a positive autoregulatory loop that raises EcR levels through ecdysone signaling (1). Since Nup107 depletion reduces ecdysone levels, it disrupts this transcriptional autoregulatory loop of EcR expression. This can contribute to the reduced EcR levels seen in Nup107-depleted animals.

      Additionally, the link between Nup107 and Torso is not fully substantiated. While overexpression of Torso appears to rescue the lack of 20E production in the prothoracic gland, the distinct phenotypes of Torso and Nup107 depletion (developmental delay in the former versus complete larval arrest in the latter) complicate understanding of Nup107's precise role.

      We understand that the developmental phenotypes differ when Torso and Nup107 depletion are compared. However, the two molecules being compared here are very different, and the extent of Torso depletion is not evident in other studies (2). Even if the extent of depletion of Torso and Nup107 is similar, we believe that Nup107, being a more widely expressed protein, induces stronger defects owing to its importance in cellular physiology. We think that RNAi-mediated depletion of Nup107 causes a defect in 20E biosynthesis through the Halloween genes, inducing a developmental arrest.

      To clarify these discrepancies, further investigation into whether Nup107 interacts with other critical signaling pathways related to the regulation of ecdysone biosynthesis, such as EGFR or TGF-β, would be beneficial and could strengthen the findings.

      In summary, although the study presents some intriguing observations, several conclusions are not well-supported by the experimental data.

      We agree with the reviewer’s suggestion. As noted in the literature, five RTKs (torso, InR, EGFR, Alk, and Pvr) stimulate the PI3K/Akt pathway, which plays a crucial role in PG function and in controlling pupariation and body size (3). We have examined torso and EGFR signaling. Torso overexpression rescued the Nup107 defects; however, constitutively active EGFR (BL-59843) did not rescue the phenotype (data not shown). Nonetheless, we plan to examine EGFR pathway activation by measuring pERK levels in Nup107-depleted PGs.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Kawadkar et al. investigates the role of Nup107 in developmental progression via the regulation of ecdysone signaling. The authors identify an interesting phenotype of Nup107 whole-body RNAi depletion in Drosophila development: developmental arrest at the late larval stage. Nup107-depleted larvae exhibit mislocalization of the Ecdysone receptor (EcR) from the nucleus to the cytoplasm and reduced expression of EcR target genes in salivary glands, indicative of compromised ecdysone signaling. This mislocalization of EcR in salivary glands was phenocopied when Nup107 was depleted only in the prothoracic gland (PG), suggesting that it is not nuclear transport of EcR but the presence of ecdysone (normally secreted from PG) that is affected. Consistently, whole-body levels of ecdysone were shown to be reduced in Nup107 KD, particularly at the late third instar stage when a spike in ecdysone normally occurs. Importantly, the authors could rescue the developmental arrest and EcR mislocalization phenotypes of Nup107 KD by adding exogenous ecdysone, supporting the notion that Nup107 depletion disrupts biosynthesis of ecdysone, which arrests normal development. Additionally, they found that rescue of the Nup107 KD phenotype can also be achieved by overexpression of the receptor tyrosine kinase torso, which is thought to be the upstream regulator of ecdysone synthesis in the PG. Transcript levels of torso are also shown to be downregulated in the Nup107 KD, as are transcript levels of multiple ecdysone biosynthesis genes. Together, these experiments reveal a new role of Nup107 or nuclear pore levels in hormone-driven developmental progression, likely via regulation of levels of torso and torso-stimulated ecdysone biosynthesis.

      Strengths:

      The developmental phenotypes of an NPC component presented in the manuscript are striking and novel, and the data appears to be of high quality. The rescue experiments are particularly significant, providing strong evidence that Nup107 functions upstream of torso and ecdysone levels in the regulation of developmental timing and progression.

      Weaknesses:

      The underlying mechanism is however not clear, and any insight into how Nup107 may regulate these pathways would greatly strengthen the manuscript. Some suggestions to address this are detailed below.

      Major questions:

      (1) Determining how specific this phenotype is to Nup107 vs. to reduced NPC levels overall would give some mechanistic insight. Does knocking down other components of the Nup107 subcomplex (the Y-complex) lead to similar phenotypes? Given the published gene regulatory function of Nup107, do other gene regulatory Nups such as Nup98 or Nup153 produce these phenotypes?

      We thank the reviewer for raising this concern. When working with a Nup complex such as the Nup107 complex, this concern is anticipated but difficult to address, as many Nups function beyond their complex identity. Our observations with all other members of the Nup107 complex, including dELYS, suggest that none of them, apart from Nup107 itself, induces larval developmental arrest when depleted.

      In this study, we primarily focused on the Nup107 complex (outer ring complex) of the NPC. We have not examined other nucleoporins outside of this complex, such as Nup98 and Nup153. However, previous studies have reported that Nup98 and Nup153 interact with chromatin, with these investigations conducted in Drosophila S2 cells (4, 5, 6). In the future, we may check whether Nup98 and Nup153 depletion can produce the arrest phenotype.

      (2) In a related issue, does this level of Nup107 KD produce lower NPC levels? It is expected to, but actual quantification of nuclear pores in Nup107-depleted tissues should be added. These and the above experiments would help address a key mechanistic question - is this phenotype the result of lower numbers of nuclear pores or specifically of Nup107?

      We agree with the concern raised here, and we plan to assess nucleoporin intensity using the mAb414 antibody (which exclusively recognizes FG-repeat Nups) in the Nup107 depletion background. Our past observations suggest that Nup107 depletion does not affect overall nuclear pore complex assembly in Drosophila salivary glands (data not shown).

      (3) Additional experiments on how Nup107 regulates the torso would provide further insight. Does Nup107 regulate transcription of the torso or perhaps its mRNA export? Looking at nascent levels of the torso transcript and the localization of its mRNA can help answer this question. Or alternatively, does Nup107 physically bind the torso?

      While the concern regarding torso transcript levels is genuine, we have already reported in the manuscript that Nup107 levels directly regulate torso expression. When Nup107 is depleted, torso levels go down, which in turn affects ecdysone production and subsequent EcR signaling (Figure 6B of the manuscript). However, the exact nature of the regulation of torso expression by Nup107 is still unclear. Since Nup107 is known to interact with chromatin (7), it may affect torso transcription. A physiologically relevant physical interaction between Nup107 and torso in a cellular context is unlikely, given their distinct sub-cellular localizations. Investigating this further would require a significant amount of time to obtain reagents and perform experiments, and it is currently beyond the scope of this manuscript.

      (4) The depletion level of Nup107 RNAi specifically in the salivary gland vs. the prothoracic gland should be compared by RT-qPCR or western blotting.

      Although we know that the Nup107 protein signal is reduced in SG upon knockdown (Figure 3B), we have not compared the Nup107 transcript level in these two tissues (SG and PG). As suggested here, we will knock down Nup107 using SG and PG-specific drivers and quantify the Nup107 depletion level by RT-qPCR.

      (5) The UAS-torso rescue experiment should also include the control of an additional UAS construct - so Nup107; UAS-control vs Nup107; UAS-torso should be compared in the context of rescue to make sure the Gal4 driver is functioning at similar levels in the rescue experiment.

      This is a very valid point, and we took this into account while planning the experiment. To control for GAL4 dosage, we used Nup107<sup>KK</sup>;UAS-GFP as a control alongside Nup107<sup>KK</sup>;UAS-torso. This approach ensures that GAL4 dilution does not affect the observations made in the experiments. As seen in Figure S7, the presence of GFP signal in the prothoracic glands, together with their reduced size, indicates that genes downstream of both UAS sequences are transcribed and that GAL4 dilution does not play a role here.

      Minor:

      (6) Figures and figure legends can stand to be more explicit and detailed, respectively.

      We will revisit all figures and their corresponding legends to ensure appropriate and explicit details are provided.

      Reviewer #3 (Public review):

      Summary:

      In this study by Kawadkar et al, the authors investigate the developmental role of Nup107, a nucleoporin, in regulating the larval-to-pupal transition in Drosophila through RNAi knockdown and CRISPR-Cas9-mediated gene editing. They demonstrate that Nup107, an essential component of the nuclear pore complex (NPC), is crucial for regulating ecdysone signaling during developmental transitions. The authors show that the depletion of Nup107 disrupts these processes, offering valuable insights into its role in development.

      Specifically, they find that:

      (1) Nup107 depletion impairs pupariation during the larval-to-pupal transition.

      (2) RNAi knockdown of Nup107 results in defects in EcR nuclear translocation, a key regulator of ecdysone signaling.

      (3) Exogenous 20-hydroxyecdysone (20E) rescues pupariation blocks, but rescued pupae fail to close.

      (4) Nup107 RNAi-induced defects can be rescued by activation of the MAP kinase pathway.

      Strengths:

      The manuscript provides strong evidence that Nup107, a component of the nuclear pore complex (NPC), plays a crucial role in regulating the larval-to-pupal transition in Drosophila, particularly in ecdysone signaling.

      The authors employ a combination of RNAi knockdown, CRISPR-Cas9 gene editing, and rescue experiments, offering a comprehensive approach to studying Nup107's developmental function.

      The study effectively connects Nup107 to ecdysone signaling, a key regulator of developmental transitions, offering novel insights into the molecular mechanisms controlling metamorphosis.

      The use of exogenous 20-hydroxyecdysone (20E) and activation of the MAP kinase pathway provides a strong mechanistic perspective, suggesting that Nup107 may influence EcR signaling and ecdysone biosynthesis.

      Weaknesses:

      The authors do not sufficiently address the potential off-target effects of RNAi, which could impact the validity of their findings. Alternative approaches, such as heterozygous or clonal studies, could help confirm the specificity of the observed phenotypes.

      This is a very valid point, and we are aware of the consequences of off-target RNAi effects. To confirm that the phenotypes reflect genuine Nup107 knockdown and to minimize off-target effects, we used two RNAi lines (Nup107<sup>GD</sup> and Nup107<sup>KK</sup>) against Nup107. Both lines induced comparable levels of Nup107 reduction, and ubiquitous and PG-specific knockdown with either line produced similar phenotypes. Although the Nup107<sup>GD</sup> line exhibited a relatively stronger knockdown than the Nup107<sup>KK</sup> line, we preferentially used the Nup107<sup>KK</sup> line because the Nup107<sup>GD</sup> line is based on a P-element insertion whose exact landing site is unknown. Furthermore, an off-target is predicted for the Nup107<sup>GD</sup> line, in which a 19 bp sequence aligns with the bifocal (bif) sequence. The bif-encoded protein is involved in axon guidance and regulation of axon extension. In contrast, the Nup107<sup>KK</sup> line has no predicted off-target, and its precise landing site on the second chromosome is known. Thus, the Nup107<sup>KK</sup> line was ultimately used for experimentation because of its cleaner and more reliable genetic background.

      We are also investigating Nup107 knockdown in the prothoracic gland, which exhibits polyteny. Additionally, the number of cells in the prothoracic gland is quite limited, approximately 50-60 cells (8). Given this, there is a possibility that a clonal study may not yield the phenotype. However, we will consider moving forward with this approach also.

      NPC Complex Specificity: While the authors focus on Nup107, it remains unclear whether the observed defects are specific to this nucleoporin or if other NPC components also contribute to similar defects. Demonstrating similar results with other NPC components would strengthen their claims.

      We thank the reviewer for raising this concern. When working with a Nup complex such as the Nup107 complex, this concern is anticipated but difficult to address, as many Nups function beyond their complex identity. Our observations with all other members of the Nup107 complex, including dELYS, suggest that none of them, apart from Nup107, could induce larval developmental arrest. Since the study is primarily focused on the Nup107 complex (outer ring complex) of the NPC, we have not examined nucleoporins outside of this complex.

      Although the authors show that Nup107 depletion disrupts EcR signaling, the precise molecular mechanism by which Nup107 influences this process is not fully explored. Further investigation into how Nup107 regulates EcR nuclear translocation or ecdysone biosynthesis would improve the clarity of the findings.

      We appreciate the concern raised. Based on our observations, we have proposed an upstream effect of Nup107 on the PTTH-torso-20E-EcR axis that regulates developmental transitions. We know that Nup107 regulates torso levels, but we do not know whether Nup107 directly interacts with torso. We would also like to address whether Nup107 exerts control over PTTH levels.

      We must emphasize that Nup107 does not directly regulate the translocation of EcR. On the contrary, we have demonstrated that EcR translocation is 20E-dependent and Nup107-independent. Based on our observations, we have argued that Nup107 regulates the expression of the Halloween genes required for ecdysone biosynthesis. We are interested in identifying whether Nup107 associates with chromatin, either directly or through another protein, to bring about the changes in gene expression required for normal development.

      There are some typographical errors and overly strong phrases, such as "unequivocally demonstrate," which could be softened. Additionally, the presentation of redundant data in different tissues could be streamlined to enhance clarity and flow.

      We thank the reviewer for this observation. We will correct all typographical errors and temper overly strong phrases so that our statements match our conclusions.

      References:

      (1) Varghese, Jishy, and Stephen M Cohen. “microRNA miR-14 acts to modulate a positive autoregulatory loop controlling steroid hormone signaling in Drosophila.” Genes & development vol. 21,18 (2007): 2277-82. doi:10.1101/gad.439807

      (2) Rewitz, Kim F et al. “The insect neuropeptide PTTH activates receptor tyrosine kinase torso to initiate metamorphosis.” Science (New York, N.Y.) vol. 326,5958 (2009): 1403-5. doi:10.1126/science.1176450

      (3) Pan, Xueyang, and Michael B O'Connor. “Coordination among multiple receptor tyrosine kinase signals controls Drosophila developmental timing and body size.” Cell reports vol. 36,9 (2021): 109644. doi:10.1016/j.celrep.2021.109644

      (4) Pascual-Garcia, Pau et al. “Metazoan Nuclear Pores Provide a Scaffold for Poised Genes and Mediate Induced Enhancer-Promoter Contacts.” Molecular cell vol. 66,1 (2017): 63-76.e6. doi:10.1016/j.molcel.2017.02.020

      (5) Pascual-Garcia, Pau et al. “Nup98-dependent transcriptional memory is established independently of transcription.” eLife vol. 11 e63404. 15 Mar. 2022, doi:10.7554/eLife.63404

      (6) Kadota, Shinichi et al. “Nucleoporin 153 links nuclear pore complex to chromatin architecture by mediating CTCF and cohesin binding.” Nature communications vol. 11,1 2606. 25 May. 2020, doi:10.1038/s41467-020-16394-3

      (7) Gozalo, Alejandro et al. “Core Components of the Nuclear Pore Bind Distinct States of Chromatin and Contribute to Polycomb Repression.” Molecular cell vol. 77,1 (2020): 67-81.e7. doi:10.1016/j.molcel.2019.10.017

      (8) Shimell, MaryJane, and Michael B O'Connor. “Endoreplication in the Drosophila melanogaster prothoracic gland is dispensable for the critical weight checkpoint.” microPublication biology vol. 2023 10.17912/micropub.biology.000741. 21 Feb. 2023, doi:10.17912/micropub.biology.000741

    1. eLife Assessment

      This study investigates trial-by-trial inter-areal interactions in the visual cortex of the mouse and the monkey by analyzing two previously published datasets. The authors find that activity in one layer (in mice) or one area (in monkeys) can partially predict neural activity in another layer or area on the single-trial level in different experimental contexts. This valuable finding expands previously known contributions of stimulus-independent downstream activity to neural responses in the visual cortex by demonstrating how these change under varying visual stimuli as well as in the absence of visual stimulation. While the methodology is solid, the analysis for the monkey data is incomplete and would benefit from including a second animal.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors propose a "unifying method to evaluate inter-areal interactions in different types of neuronal recordings, timescales, and species". The method consists of computing the variance explained by a linear decoder that attempts to predict individual neural responses (firing rates) in one area based on neural responses in another area.

      The authors apply the method to previously published calcium imaging data from layer 4 and layers 2/3 of 4 mice over 7 days, and simultaneously recorded Utah array spiking data from areas V1 and V4 of 1 monkey over 5 days of recording. They report distributions over "variance explained" numbers for several combinations: from mouse V1 L4 to mouse V1 L2/3, from L2/3 to L4, from monkey V1 to monkey V4, and from V4 to V1. For their monkey data, they also report the corresponding results for different temporal shifts. Overall, they find the expected results: responses in each of the two neural populations are predictive of responses in the other, more so when the stimulus is not controlled than when it is, and with sometimes different results for different stimulus classes (e.g., gratings vs. natural images).

      Strengths:

      (1) Use of existing data.

      (2) Addresses an interesting question.

      Weaknesses:

      Unfortunately, the method falls short of the state of the art: both generalized linear models (GLMs), which have been used in similar contexts for at least 20 years (see the many papers, both theoretical and applied to neural population data, by e.g. Simoncelli, Paninski, Pillow, Schwartz, and many colleagues dating back to 2004), and the extension of Granger causality to point processes (e.g. Kim et al. PLoS CB 2011). Both approaches are substantially superior to what is proposed in the manuscript, since they enforce non-negativity for spike rates (the importance of which can be seen in Figure 2AB), and do not require unnecessary coarse-graining of the data by binning spikes (the 200 ms time bins are very long compared to the time scale on which communication between closely connected neuronal populations within an area, or between related areas, takes place).

      In terms of analysis results, the work in the manuscript presents some expected and some less expected results. However, because the monkey data are based on only one monkey (misleadingly, the manuscript consistently uses the plural "monkeys"), none of the results specific to that monkey, nor the comparison of that one monkey to mice, are supported by robust data. One of the main results for mice (bimodality of explained variance values, mentioned in the abstract) does not appear to be quantified or supported by a statistical test and is only present in two out of three mice. Moreover, the two data sets differ in too many aspects to allow for any conclusions about whether the comparisons reflect differences in species (mouse vs. monkey), anatomy (L2/3-L4 vs. V1-V4), or recording technique (calcium imaging vs. extracellular spiking).

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors investigated the extent of shared variability in cortical population activity in the visual cortex in mice and macaques under conditions of spontaneous activity and visual stimulation. They argue that by studying the average response to repeated presentations of sensory stimuli, investigators are discounting the contribution of variable population responses that can have a significant impact at the single trial level. They hypothesized that, because these fluctuations are to some degree shared across cortical populations depending on the sources of these fluctuations and the relative connectivity between cortical populations within a network, one should be able to predict the response in one cortical population given the response of another cortical population on a single trial, and the degree of predictability should vary with factors such as retinotopic overlap, visual stimulation, and the directionality of canonical cortical circuits.

      To test this, the authors analyzed previously collected and publicly available datasets. These include calcium imaging of the primary visual cortex in mice and electrophysiology recordings in V1 and V4 of macaques under different conditions of visual stimulation. The strength of this data is that it includes simultaneous recordings of hundreds of neurons across cortical layers or areas. However, the weaknesses of calcium dynamics (which has lower temporal resolution and misses some non-linear dynamics in cortical activity) and multi-unit envelope activity (which reflects fluctuations in population activity rather than the variance in individual unit spike trains), underestimate the variability of individual neurons. The authors deploy a regression model that is appropriate for addressing their hypothesis, and their analytic approach appears rigorous and well-controlled.

      From their analysis, they found that there was significant predictability of activity between layer II/III and layer IV responses in mice and V1 and V4 activity in macaques, although the specific degree of predictability varied somewhat with the condition of the comparison with some minor differences between the datasets. The authors deployed a variety of analytic controls and explored a variety of comparisons that are both appropriate and convincing that there is a significant degree of predictability in population responses at the single trial level consistent with their hypothesis. This demonstrates that a significant fraction of cortical responses to stimuli is not due solely to the feedforward response to sensory input, and if we are to understand the computations that take place in the cortex, we must also understand how sensory responses interact with other sources of activity in cortical networks. However, the source of these predictive signals and their impact on function is only explored in a limited fashion, largely due to limitations in the datasets. Overall, this work highlights that, beyond the traditionally studied average evoked responses considered in systems neuroscience, there is a significant contribution of shared variability in cortical populations that may contextualize sensory representations depending on a host of factors that may be independent of the sensory signals being studied.

      Strengths:

      This work considers a variety of conditions that may influence the relative predictability between cortical populations, including receptive field overlap, latency that may reflect feed-forward or feedback delays, and stimulus type and sensory condition. Their analytic approach is well-designed and statistically rigorous. They acknowledge the limitations of the data and do not over-interpret their findings.

      Weaknesses:

      The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons. The mechanistic contribution of known sources or correlates of shared variability (eye movements, pupil fluctuations, locomotion, whisking behaviors) were not considered, and these could be driving or a reflection of much of the predictability observed and explain differences in spontaneous and visual activity predictions. Previous work has explored correlations in activity between areas on various timescales, but this work only considered a narrow scope of timescales. The observation that there is some degree of predictability is not surprising, and it is unclear whether changes in observed predictability with analysis conditions are informative of a particular mechanism or just due to differences in the variance of activity under those conditions. Some of these issues could be addressed with further analysis, but some may be due to limitations in the experimental scope of the datasets and would require new experiments to resolve.

    4. Reviewer #3 (Public review):

      Neural activity in the visual cortex has primarily been studied in terms of responses to external visual stimuli. While the noisiness of inputs to a visual area is known to also influence visual responses, the contribution of this noisy component to overall visual responses has not been well characterized.

      In this study, the authors reanalyze two previously published datasets - a Ca++ imaging study from mouse V1 and a large-scale electrophysiological study from monkey V1-V4. Using regression models, they examine how neural activity in one layer (in mice) or one cortical area (in monkeys) predicts activity in another layer or area. Their main finding is that significant predictions are possible even in the absence of visual input, highlighting the influence of non-stimulus-related downstream activity on neural responses. These findings can inform future modeling work of neural responses in the visual cortex to account for such non-visual influences.

      A major weakness of the study is that the analysis includes data from only a single monkey. This makes it hard to interpret the data as the results could be due to experimental conditions specific to this monkey, such as the relative placement of electrode arrays in V1 and V4. The authors perform a thorough analysis comparing regression-based predictions for a wide variety of combinations of stimulus conditions and directions of influence. However, the comparison of stimulus types (Figure 4) raises a potential concern. It is not clear if the differences reported reflect an actual change in predictive influence across the two conditions or if they stem from fundamental differences in the responses of the predictor population, which could in turn affect the ability to measure predictive relationships. The authors do control for some potential confounds such as the number of neurons and self-consistency of the predictor population. However, the predictability seems to closely track the responsiveness of neurons to a particular stimulus. For instance, in the monkey data, the V1 neuronal population will likely be more responsive to checkerboards than to single bars. Moreover, neurons that don't have the bars in their RFs may remain largely silent. Could the difference in predictability be just due to this? Controlling for overall neuronal responsiveness across the two conditions would make this comparison more interpretable.

    5. Author response:

      Reviewer #1:

      Summary:

      In this study, the authors propose a "unifying method to evaluate inter-areal interactions in different types of neuronal recordings, timescales, and species". The method consists of computing the variance explained by a linear decoder that attempts to predict individual neural responses (firing rates) in one area based on neural responses in another area.

      The authors apply the method to previously published calcium imaging data from layer 4 and layers 2/3 of 4 mice over 7 days, and simultaneously recorded Utah array spiking data from areas V1 and V4 of 1 monkey over 5 days of recording. They report distributions over "variance explained" numbers for several combinations: from mouse V1 L4 to mouse V1 L2/3, from L2/3 to L4, from monkey V1 to monkey V4, and from V4 to V1. For their monkey data, they also report the corresponding results for different temporal shifts. Overall, they find the expected results: responses in each of the two neural populations are predictive of responses in the other, more so when the stimulus is not controlled than when it is, and with sometimes different results for different stimulus classes (e.g., gratings vs. natural images).

      Strengths:

      (1) Use of existing data.

      (2) Addresses an interesting question.

      Unfortunately, the method falls short of the state of the art: both generalized linear models (GLMs), which have been used in similar contexts for at least 20 years (see the many papers, both theoretical and applied to neural population data, by e.g. Simoncelli, Paninski, Pillow, Schwartz, and many colleagues dating back to 2004), and the extension of Granger causality to point processes (e.g. Kim et al. PLoS CB 2011). Both approaches are substantially superior to what is proposed in the manuscript, since they enforce non-negativity for spike rates (the importance of which can be seen in Figure 2AB), and do not require unnecessary coarse-graining of the data by binning spikes (the 200 ms time bins are very long compared to the time scale on which communication between closely connected neuronal populations within an area, or between related areas, takes place).

      We thank the reviewer for this suggestion. Our goal was to use a simple and unified linear ridge regression framework that can be applied to both calcium imaging (mouse) and MUAe (monkey) data.
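To make the idea of a cross-population ridge framework concrete, the following is a minimal sketch on synthetic data. It is not the analysis code from the manuscript: the function names, the penalty value `alpha`, the simple first-80%/last-20% split, and the simulated populations are all assumptions made for illustration.

```python
import numpy as np


def ridge_fit(X, Y, alpha):
    """Closed-form ridge weights for centered X (trials x predictors), Y (trials x targets)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def explained_variance(Y_true, Y_pred):
    """Per-target fraction of variance explained (can be negative on held-out data)."""
    resid = ((Y_true - Y_pred) ** 2).sum(axis=0)
    total = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - resid / total


def cross_area_ev(X, Y, alpha=1.0, train_frac=0.8):
    """Fit on the first trials and score on the held-out remainder."""
    n_train = int(train_frac * X.shape[0])
    x_mean, y_mean = X[:n_train].mean(axis=0), Y[:n_train].mean(axis=0)
    W = ridge_fit(X[:n_train] - x_mean, Y[:n_train] - y_mean, alpha)
    Y_pred = (X[n_train:] - x_mean) @ W + y_mean
    return explained_variance(Y[n_train:], Y_pred)


# Synthetic demonstration: the target population is a noisy linear readout
# of the predictor population (hypothetical sizes, not real recordings).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
Y = X @ rng.normal(size=(20, 5)) + 0.5 * rng.normal(size=(500, 5))
ev = cross_area_ev(X, Y)
print(ev.round(2))
```

Swapping the roles of `X` and `Y` gives the prediction in the opposite direction, which is how directional differences could be compared within the same framework.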

      We will perform a GLM-based analysis enforcing non-negativity as suggested, including in the GLM any additional available variables that may contribute to the neuronal responses.
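For readers unfamiliar with how a GLM enforces non-negativity, the sketch below fits a Poisson GLM with a log link, so predicted rates are exponentials of a linear combination and can never be negative. This is an assumption-laden illustration, not the planned analysis: the Poisson noise model suits spike counts, and the appropriate choice for continuous MUAe or calcium signals may differ. The function names, the small L2 penalty, and the simulated data are hypothetical.

```python
import numpy as np


def fit_poisson_glm(X, y, l2=1e-3, n_iter=50):
    """Poisson GLM with log link, fit by Newton's method.

    Returns weights with the intercept in position 0.
    """
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    w = np.zeros(Xb.shape[1])
    w[0] = np.log(y.mean() + 1e-12)              # start at the mean rate
    for _ in range(n_iter):
        rate = np.exp(np.clip(Xb @ w, -20, 20))  # clip to avoid overflow
        grad = Xb.T @ (y - rate) - l2 * w
        hess = Xb.T @ (Xb * rate[:, None]) + l2 * np.eye(len(w))
        w = w + np.linalg.solve(hess, grad)
    return w


def predict_rate(X, w):
    """Predicted rate; the exponential guarantees a strictly positive output."""
    return np.exp(w[0] + X @ w[1:])


# Synthetic demonstration with known weights (not real data)
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
w_true = np.array([0.5, 0.3, -0.2, 0.1, 0.0, 0.4])
y = rng.poisson(np.exp(w_true[0] + X @ w_true[1:]))
w_hat = fit_poisson_glm(X, y)
print(np.round(w_hat, 2))
```

Additional covariates, such as behavioral variables, would enter as extra columns of `X`.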

      We also would like to note that:

      ● Macaque data: Our MUAe data are binned at 25 ms, not 200 ms. We used the envelope of multi-unit activity as reported in the original study [1]. We did not perform spike sorting on these data and therefore, strictly speaking, this is not a point process and methods developed for point processes are not directly applicable.

      ● Mouse data: The Stringer et al. dataset [2,3] uses two-photon calcium imaging sampled at 2.5 or 3 Hz. Additionally, responses were computed by averaging two frames per stimulus (yielding an effective bin size of 666 ms or 800 ms), dictated by acquisition constraints. We will emphasize the low temporal resolution of these signals as a limitation in the discussion section, but we cannot improve the temporal resolution with our analyses. These signals are not point processes either (although there is a correlation between two-photon calcium signals and spike rates).
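The effective bin sizes quoted for the mouse data follow directly from the frame rate and the number of frames averaged. As a small worked check (the function name is hypothetical), averaging two frames gives 800 ms at 2.5 Hz and roughly 667 ms at 3 Hz, which the response reports as 666 ms:

```python
def effective_bin_ms(frame_rate_hz, frames_averaged=2):
    """Duration in milliseconds spanned by averaging consecutive imaging frames."""
    return 1000.0 * frames_averaged / frame_rate_hz


for rate in (2.5, 3.0):
    print(f"{rate} Hz -> {effective_bin_ms(rate):.1f} ms")
```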

      Regardless of these considerations, the reviewer’s points are well taken, and we will conduct additional analyses as described above.

      In terms of analysis results, the work in the manuscript presents some expected and some less expected results. However, because the monkey data are based on only one monkey (misleadingly, the manuscript consistently uses the plural ‘monkeys’), none of the results specific to that monkey, nor the comparison of that one monkey to mice, are supported by robust data.

      We will add data from at least two more monkeys, as suggested by the reviewer:

      ● First, we will include a second monkey from the same dataset [1]. This monkey was not included in the original submission because its dataset contains much less data than that of the first monkey. For example, in the lights-off condition, the number of V4 channels with a signal-to-noise ratio greater than 2 (the criterion recommended by the dataset authors for selecting electrodes) is 9-12 in this second monkey, compared to 68-74 in the first monkey [1]. Nevertheless, we will add results for this second monkey.

      ● Additionally, we will include data from a new monkey by collaborating with the Ponce lab who will collect new data for this study.

      One of the main results for mice (bimodality of explained variance values, mentioned in the abstract) does not appear to be quantified or supported by a statistical test.

      We appreciate this point. We will conduct statistical tests to quantify the degree of bimodality and clarify these findings in the results.
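The response does not specify which statistical test will be used. One simple heuristic that could serve as a first screen is Sarle's bimodality coefficient, sketched below with plain moment estimators. This is an assumption for illustration rather than the authors' method: it is not a formal hypothesis test, and a formal alternative such as Hartigan's dip test would need a dedicated package. The function name and the simulated samples are hypothetical.

```python
import numpy as np


def bimodality_coefficient(x):
    """Sarle's bimodality coefficient.

    Values above 5/9 (the value for a uniform distribution) hint at bimodality.
    Uses simple (uncorrected) moment estimators of skewness and kurtosis.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    z = (x - x.mean()) / x.std()        # standardize with population std
    skew = np.mean(z ** 3)
    excess_kurt = np.mean(z ** 4) - 3.0
    return (skew ** 2 + 1.0) / (excess_kurt + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Simulated explained-variance-like samples (not real data)
rng = np.random.default_rng(0)
unimodal = rng.normal(size=2000)
bimodal = np.concatenate([rng.normal(-3, 1, 1000), rng.normal(3, 1, 1000)])
print(round(bimodality_coefficient(unimodal), 2),
      round(bimodality_coefficient(bimodal), 2))
```

Applying such a measure per animal would also make explicit in how many mice the bimodality is present.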

      Moreover, the two data sets differ in too many aspects to allow for any conclusions about whether the comparisons reflect differences in species (mouse vs. monkey), anatomy (L2/3-L4 vs. V1-V4), or recording technique (calcium imaging vs. extracellular spiking).

      We agree that the methodological and anatomical differences between the mouse and monkey datasets make any direct cross-species comparisons hard to interpret. We explicitly discuss this point in the Discussion section, and we will add a section within the Discussion entitled “Limitations of this study”. We will further emphasize that our goal is not to attempt a direct quantitative comparison across species, and that the two experiments differ in: (i) recording modalities (calcium imaging vs. electrophysiology) and the associated temporal resolution, neuronal types, and SNR, (ii) cortical targets (layers vs. areas), (iii) sample size, (iv) stimuli, and (v) task conditions. In the revised manuscript, we will highlight that our primary aim is to investigate inter-areal interactions within each species rather than to draw comparisons across species.

      Reviewer #2:

      Summary:

      In this work, the authors investigated the extent of shared variability in cortical population activity in the visual cortex in mice and macaques under conditions of spontaneous activity and visual stimulation. They argue that by studying the average response to repeated presentations of sensory stimuli, investigators are discounting the contribution of variable population responses that can have a significant impact at the single trial level. They hypothesized that, because these fluctuations are to some degree shared across cortical populations depending on the sources of these fluctuations and the relative connectivity between cortical populations within a network, one should be able to predict the response in one cortical population given the response of another cortical population on a single trial, and the degree of predictability should vary with factors such as retinotopic overlap, visual stimulation, and the directionality of canonical cortical circuits.

      To test this, the authors analyzed previously collected and publicly available datasets. These include calcium imaging of the primary visual cortex in mice and electrophysiology recordings in V1 and V4 of macaques under different conditions of visual stimulation. The strength of this data is that it includes simultaneous recordings of hundreds of neurons across cortical layers or areas. However, the weaknesses of calcium dynamics (which has lower temporal resolution and misses some non-linear dynamics in cortical activity) and multi-unit envelope activity (which reflects fluctuations in population activity rather than the variance in individual unit spike trains), underestimate the variability of individual neurons. The authors deploy a regression model that is appropriate for addressing their hypothesis, and their analytic approach appears rigorous and well-controlled.

      We agree that both calcium imaging and multi-unit envelope recordings have inherent limitations in capturing the variability of individual neuron spiking. Among other factors, the slower temporal resolution of calcium signals can blur fast spiking events, and multi-unit envelopes can mask single-unit heterogeneity. In the Discussion, we will explicitly mention these modality-specific caveats and note that our approach is meant to capture shared variability at the population level rather than the fine temporal structure of individual neurons and individual spikes.

      From their analysis, they found that there was significant predictability of activity between layer II/III and layer IV responses in mice and V1 and V4 activity in macaques, although the specific degree of predictability varied somewhat with the condition of the comparison with some minor differences between the datasets. The authors deployed a variety of analytic controls and explored a variety of comparisons that are both appropriate and convincing that there is a significant degree of predictability in population responses at the single trial level consistent with their hypothesis. This demonstrates that a significant fraction of cortical responses to stimuli is not due solely to the feedforward response to sensory input, and if we are to understand the computations that take place in the cortex, we must also understand how sensory responses interact with other sources of activity in cortical networks. However, the source of these predictive signals and their impact on function is only explored in a limited fashion, largely due to limitations in the datasets. Overall, this work highlights that, beyond the traditionally studied average evoked responses considered in systems neuroscience, there is a significant contribution of shared variability in cortical populations that may contextualize sensory representations depending on a host of factors that may be independent of the sensory signals being studied.

      We will include a section within the Discussion to emphasize the limitations of the datasets used in this study. We also agree with and appreciate the reviewer’s description, and we will borrow some of the reviewer’s terminology to provide context in the Discussion section.

      The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons.

      We agree that the methodological and anatomical differences between the mouse and monkey datasets make any direct cross-species comparisons hard to interpret. We explicitly discuss this point in the Discussion section, and we will add a section within the Discussion entitled “Limitations of this study”. We will further emphasize that our goal is not to attempt a direct quantitative comparison across species, and that the two experiments differ in: (i) recording modalities (calcium imaging vs. electrophysiology) and the associated temporal resolution, neuronal types, and SNR, (ii) cortical targets (layers vs. areas), (iii) sample size, (iv) stimuli, and (v) task conditions. In the revised manuscript, we will highlight that our primary aim is to investigate inter-areal interactions within each species rather than to draw comparisons across species.

      Strengths:

      This work considers a variety of conditions that may influence the relative predictability between cortical populations, including receptive field overlap, latency that may reflect feed-forward or feedback delays, and stimulus type and sensory condition. Their analytic approach is well-designed and statistically rigorous. They acknowledge the limitations of the data and do not over-interpret their findings.

      Weaknesses:

      The different recording modalities and comparisons (within vs. across cortical areas) limit the interpretability of the inter-species comparisons. The mechanistic contribution of known sources or correlates of shared variability (eye movements, pupil fluctuations, locomotion, whisking behaviors) were not considered, and these could be driving or a reflection of much of the predictability observed and explain differences in spontaneous and visual activity predictions.

      We also appreciate this important point. We agree that multiple behavioral factors may significantly contribute to shared variability. In our analyses of the mouse data, we addressed non-visual influences by projecting out “non-visual ongoing neuronal activity” (as shown in Figure 6C, following the approach in Stringer et al. 2019). Additionally, we will evaluate the contribution of the behavioral measures available in the open dataset (such as running speed, whisking, pupil area, and “eigenface” components) to the predictivity of neuronal responses.
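The notion of "projecting out" ongoing activity can be illustrated with a short linear-algebra sketch: estimate a low-dimensional subspace from spontaneous activity and subtract the component of the stimulus-period responses that lies in it. This follows the general idea only; the function name, the number of components, and the synthetic arrays are assumptions, and the exact procedure used in the manuscript or in Stringer et al. 2019 may differ.

```python
import numpy as np


def project_out(responses, spontaneous, n_components=10):
    """Remove the top principal subspace of spontaneous activity from responses.

    Both arrays are (samples x neurons); n_components is a hypothetical choice.
    """
    centered = spontaneous - spontaneous.mean(axis=0)
    # Rows of vt are orthonormal directions in neuron space, ordered by variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components].T                       # neurons x components
    return responses - (responses @ basis) @ basis.T


# Synthetic demonstration (not real data)
rng = np.random.default_rng(0)
spont = rng.normal(size=(200, 30))
resp = rng.normal(size=(50, 30))
cleaned = project_out(resp, spont, n_components=5)
print(cleaned.shape)
```

Regression between populations can then be repeated on `cleaned` to ask how much predictability survives once the ongoing-activity subspace is removed.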

      For the macaque data, the head-fixed and eye-fixation conditions help minimize some of these other potential behavioral contributions. Moreover, we have performed comparisons of eyes-open versus eyes-closed conditions (see Figure 5D). We will also analyze pupil size specifically for the lights-off condition. We do not have access to any other behavioral data from monkeys.

      Previous work has explored correlations in activity between areas on various timescales, but this work only considered a narrow scope of timescales.

      We appreciate this suggestion. We will perform additional analyses to evaluate predictivity at different temporal scales, as suggested.

      The observation that there is some degree of predictability is not surprising, and it is unclear whether changes in observed predictability with analysis conditions are informative of a particular mechanism or just due to differences in the variance of activity under those conditions. Some of these issues could be addressed with further analysis, but some may be due to limitations in the experimental scope of the datasets and would require new experiments to resolve.

      Our initial analyses in Fig.6A examined the effect of variance in activity and predictability in mice. As the reviewer intuited, there is a correlation between variance and predictability, at least when presenting a stimulus. Importantly, however, this is not the case when predicting activity in the absence of any stimulus. In the macaque, we cannot compute the variance across stimuli in the checkerboard case (single stimulus), but we will compute it for the conditions of the 4 moving bars. In addition, inspired by the reviewer’s question, we will perform an analysis where we further normalize the variance in activity.

      We would like to note that our key contribution is not to merely show that some degree of predictability is possible (which we agree is not surprising) but rather: (i) to use a simple approach to quantify this predictability, (ii) to assess directional differences in predictability, (iii) to evaluate how this predictability depends on neuronal properties and receptive field overlap, (iv) how it depends on the stimuli, and, importantly, (v) to compare predictability during visual stimulation versus absence of visual input.

      We agree with the limitations in the datasets. We will include a section within the Discussion to emphasize these limitations.

      Reviewer #3:

      Neural activity in the visual cortex has primarily been studied in terms of responses to external visual stimuli. While the noisiness of inputs to a visual area is known to also influence visual responses, the contribution of this noisy component to overall visual responses has not been well characterized.

      In this study, the authors reanalyze two previously published datasets - a Ca++ imaging study from mouse V1 and a large-scale electrophysiological study from monkey V1-V4. Using regression models, they examine how neural activity in one layer (in mice) or one cortical area (in monkeys) predicts activity in another layer or area. Their main finding is that significant predictions are possible even in the absence of visual input, highlighting the influence of non-stimulus-related downstream activity on neural responses. These findings can inform future modeling work of neural responses in the visual cortex to account for such non-visual influences.
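      The regression analysis summarized above can be sketched as follows. This is a minimal illustration that assumes ridge regression with a fixed penalty and simulated data; the function names, the penalty value, the population sizes, and the train/test split are hypothetical and are not taken from the manuscript.

```python
import numpy as np

def ridge_fit(X, Y, lam):
    """Closed-form ridge weights mapping source activity X to target activity Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)

def explained_variance(Y_true, Y_pred):
    """Fraction of variance explained, computed separately for each target unit."""
    resid = ((Y_true - Y_pred) ** 2).sum(axis=0)
    total = ((Y_true - Y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - resid / total

# Simulated stand-in for two populations (hypothetical sizes and noise level).
rng = np.random.default_rng(0)
n_samples, n_source, n_target = 400, 20, 5
X = rng.standard_normal((n_samples, n_source))        # "predictor" population
W_true = rng.standard_normal((n_source, n_target))    # assumed linear coupling
Y = X @ W_true + 0.5 * rng.standard_normal((n_samples, n_target))

# Fit on the first 300 samples, evaluate on the held-out remainder.
train, test = slice(0, 300), slice(300, None)
W_hat = ridge_fit(X[train], Y[train], lam=1.0)
ev = explained_variance(Y[test], X[test] @ W_hat)
print(ev)
```

      Evaluating on held-out samples, as here, keeps the predictability estimate from being inflated by overfitting; swapping the roles of X and Y would give the prediction in the opposite direction.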

      A major weakness of the study is that the analysis includes data from only a single monkey. This makes it hard to interpret the data as the results could be due to experimental conditions specific to this monkey, such as the relative placement of electrode arrays in V1 and V4.

      We will add data from at least two more monkeys, as suggested by the reviewer:

      ● First, we will include a second monkey from the same dataset [1]. This monkey was not included in the original submission because its dataset contains much less data than that of the first monkey. For example, for the lights-off condition, the number of V4 channels with a signal-to-noise ratio greater than 2 (the criterion recommended by the dataset authors for selecting electrodes) is 9-12 in this second monkey, compared to 68-74 in the first monkey [1]. However, we will still add results for this second monkey.

      ● Additionally, we will include data from a new monkey by collaborating with the Ponce lab who will collect new data for this study.

      The authors perform a thorough analysis comparing regression-based predictions for a wide variety of combinations of stimulus conditions and directions of influence. However, the comparison of stimulus types (Figure 4) raises a potential concern. It is not clear if the differences reported reflect an actual change in predictive influence across the two conditions or if they stem from fundamental differences in the responses of the predictor population, which could in turn affect the ability to measure predictive relationships. The authors do control for some potential confounds such as the number of neurons and self-consistency of the predictor population. However, the predictability seems to closely track the responsiveness of neurons to a particular stimulus. For instance, in the monkey data, the V1 neuronal population will likely be more responsive to checkerboards than to single bars. Moreover, neurons that don't have the bars in their RFs may remain largely silent. Could the difference in predictability be just due to this? Controlling for overall neuronal responsiveness across the two conditions would make this comparison more interpretable.

      This is also a valid concern. As the reviewer noted, we controlled for the number of neurons and degree of self-consistency (Fig. 3A, 3C), and this was always done within their respective stimulus type.

      As the reviewer intuits, in Fig. 6A in mice, we show that predictability correlates with neuronal responsiveness. This observation only held during the stimulus condition and not during the gray screen condition. We also showed correlations with self-consistency metrics as a proxy for responsiveness in Fig. 6A and 6C. However, we will directly assess the impact of responsiveness in two ways: (i) by correlating predictability directly with neuronal responsiveness and (ii) by following the same subsampling approach in Fig. 3 to normalize the degree of responsiveness and recompute the predictability metrics.

      REFERENCES

      (1) Chen, X., Morales-Gregorio, A., Sprenger, J., Kleinjohann, A., Sridhar, S., van Albada, S.J., Grün, S., and Roelfsema, P.R. (2022). 1024-channel electrophysiological recordings in macaque V1 and V4 during resting state. Sci Data 9, 77. https://doi.org/10.1038/s41597-022-01180-1.

      (2) Stringer, C., Pachitariu, M., Steinmetz, N., Carandini, M., and Harris, K.D. (2019). High-dimensional geometry of population responses in visual cortex. Nature 571, 361–365. https://doi.org/10.1038/s41586-019-1346-5.

      (3) Stringer, C., Pachitariu, M., Carandini, M., and Harris, K. (2018). Recordings of 10,000 neurons in visual cortex in response to 2,800 natural images. (Janelia Research Campus). https://doi.org/10.25378/janelia.6845348.v4.

    1. eLife Assessment

      This important study offers a molecular characterization of neurons and glia in the adult nervous system of the fruit fly Drosophila melanogaster. The study focuses on the progeny of a specific set of neural stem cells, called Type II neuroblasts that contribute to the central complex, a conserved brain region that plays key roles in sensorimotor integration. The data are convincing and collected using validated methodology, generating an invaluable resource for future studies. The study will be of interest to developmental neurobiologists.

    2. Reviewer #1 (Public review):

      Summary:

      Epiney et al. use single-nuclei RNA sequencing (snRNA-seq) to characterize the lineage of Type-2 (T2) neuroblasts (NBs) in the adult Drosophila brain. To isolate cells born from T2 NBs, the authors used a genetic tool that specifically allows the permanent labeling of T2-derived cell types, which are then FAC-sorted for snRNA-seq. This effective labeling approach also allows them to compare the isolated T2 lineage cells with T1-derived cell types by a simple exclusion method. The authors begin by describing a transcriptomic atlas for all T1 and T2-derived neuronal and glial clusters, reporting that the T2-derived lineage comprises 161 neuronal clusters, in contrast to the T1 lineage which comprises 114 of them. The authors then use the expression of VAChT, VGlut, Gad1, Tbh, Ple, SerT, and Tdc2 to show that T2 neuroblasts generate all major neuron classes of fast-acting neurotransmitters. Strikingly, they show that a subset of glial and neuronal clusters have disproportionate enrichment in males or females, suggesting that T2 neuroblasts generate sex-biased cell types. The authors then proceed to characterize neuropeptide expression across T2-derived neuronal clusters and argue that the same neuropeptide can be expressed across different cell types, while similar cell types can express distinct neuropeptides. The functional implication of both observations, however, remains to be tested. Furthermore, the authors describe combinatorial transcription factor (TF) codes that are correlated with neuropeptide expression for T2-derived neurons along with an overall TF code for all T2-derived cell types, both of which will serve as an important starting point for future investigations. Finally, the authors map well-studied neuronal types of the central complex to the clusters of their T2-derived snRNA-seq dataset. They use known marker combinations, bulk RNA-seq data and highly specific split-GAL4 driver lines to annotate their T2-derived atlas, establishing a comprehensive transcriptomic atlas that would guide future studies in this field.

      Strengths:

      This study provides an in-depth transcriptomic characterization of neurons and glia derived from Type-2 neuroblast lineages. The results of this manuscript offer several future directions to investigate the mechanisms of diversifying neuronal identity. The datasets of T1-derived and T2-derived cells will pave the way for studies focused on the functional analysis of combinatorial TF codes specifying cell identity, sex-based differences in neurogenesis and gliogenesis, the relationship between neuropeptide (co)expression and cell identity, and the differential contributions of distinct progenitor populations to the same cell type.

      Weaknesses:

      The study presents several important observations based on the characterization of Type II neuroblast-derived lineages. However, mechanistic insight is missing for most observations. The idea that there is a sex-specific bias to certain T2-derived neuronal and glial clusters is quite interesting; however, the functional significance of this observation is not tested or discussed extensively. Finally, the authors do not show whether the combinatorial TF code is indeed necessary for neuropeptide expression or if this is just a correlation due to cell identity being defined by TFs. Functional knockdown of some candidate TFs for a subset of neuropeptide-expressing cells would have been helpful in this case.

    3. Reviewer #2 (Public review):

      In this manuscript, Epiney et al. present a single-nucleus sequencing analysis of Drosophila adult central brain neurons and glia. By employing an ingenious permanent labeling technique, they trace the progeny of T2 neuroblasts, which play a key role in the formation of the central complex. This transcriptomic dataset is poised to become a valuable resource for future research on neurogenesis, neuron morphology, and behavior.

      The authors further delve into this dataset with several analyses, including the characterization of neurotransmitter expression profiles in T2-derived neurons. While some of the bioinformatic analyses are preliminary, they would benefit from additional experimental validation in future studies.

    1. eLife Assessment

      The paper addresses the question of gene epistasis and asks what is the correct null model for which we should declare no epistasis. By reanalyzing synthetic gene array datasets regarding single and double-knockout yeast mutants, and considering two theoretical models of cell growth, the authors reach the valuable conclusion that the product function is a good null model. The analysis is still incomplete, as some assumptions and hypotheses are not fully justified. However, once verified, the results have the potential to be of value to the field of gene epistasis.

    2. Reviewer #1 (Public review):

      Summary:

      Detecting unexpected epistatic interactions among multiple mutations requires a robust null expectation - or neutral function - that predicts the combined effects of multiple mutations on phenotype, based on the effects of individual mutations. This study assessed the validity of the product neutrality function, where the fitness of double mutants is represented as the multiplicative combination of the fitness of single mutants, in the absence of epistatic interactions. The authors utilized a comprehensive dataset on fitness, specifically measuring yeast colony size, to analyze epistatic interactions.

      The study confirmed that the product function outperformed other neutral functions in predicting the fitness of double mutants, showing no bias between negative and positive epistatic interactions. Additionally, in the theoretical portion of the study, the authors applied a well-established theoretical model of bacterial cell growth to simulate the growth rates of both single and double mutants under various parameters. The simulations further demonstrated that the product function was superior to other functions in predicting the fitness of hypothetical double mutants. Based on these findings, the authors concluded that the product function is a robust tool for analyzing epistatic interactions in growth fitness and effectively reflects how growth rates depend on the combination of multiple biochemical pathways.
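      To make the comparison of neutral functions concrete, the sketch below computes an epistasis score as the observed double-mutant fitness minus the null expectation. Fitness is taken relative to wild type (wild type = 1). The function names and the example fitness values are illustrative assumptions; the additive and minimum forms follow common definitions in the epistasis literature and may differ in detail from those used in the manuscript.

```python
def expected_fitness(w_a, w_b, model="product"):
    """Null expectation for double-mutant fitness from single-mutant fitness values."""
    if model == "product":
        return w_a * w_b            # multiplicative combination
    if model == "additive":
        return w_a + w_b - 1.0      # fitness defects add
    if model == "min":
        return min(w_a, w_b)        # the more severe single mutant dominates
    raise ValueError(f"unknown model: {model}")

def epistasis(w_ab, w_a, w_b, model="product"):
    """Observed minus expected double-mutant fitness under the chosen null model."""
    return w_ab - expected_fitness(w_a, w_b, model)

# Hypothetical single- and double-mutant fitness values.
w_a, w_b, w_ab = 0.8, 0.5, 0.4
scores = {m: epistasis(w_ab, w_a, w_b, m) for m in ("product", "additive", "min")}
print(scores)
```

      With these made-up values the same double mutant is scored as neutral under the product function, positive under the additive function, and negative under the minimum function, which is why the choice of null model matters.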

      Strengths:

      By leveraging a previously published extensive dataset of yeast colony sizes for single- and double-knockout mutants, this study validated the relevance of the product function, commonly used in genetics to analyze epistatic interactions. The finding that the product function provides a more reliable prediction of double-mutant fitness compared to other neutral functions offers significant value for researchers studying epistatic interactions, particularly those using the same dataset.

      Notably, this dataset has previously been employed in studies investigating epistatic interactions using the product neutrality function. The current study's findings affirm the validity of the product function, potentially enhancing confidence in the conclusions drawn from those earlier studies. Consequently, both researchers utilizing this dataset and readers of previous research will benefit from the confirmation provided by this study's results.

      Weaknesses:

      This study exhibits several significant logical flaws, primarily arising from the following issues: a failure to differentiate between distinct phenotypes, instead treating them as identical; an oversight of the substantial differences in the mechanisms regulating cell growth between prokaryotes and eukaryotes; and the adoption of an overly specific and unrealistic set of assumptions in the mutation model. Additionally, the study fails to clearly address its stated objective of investigating the mechanistic origin of the multiplicative model. Although it discusses conditions under which deviations occur, it falls short of achieving its primary goal. Moreover, the paper includes misleading descriptions and unsubstantiated reasoning, presented without proper citations, as if they were widely accepted facts. Readers should consider these issues when evaluating this paper. Further details are discussed below.

      (1) Misrepresentation of the dataset and phenotypes

      The authors analyze a dataset on the fitness of yeast mutants, describing it as representative of the Malthusian parameter of an exponential growth model. However, they provide no evidence to support this claim. They assert that the growth of colony size in the dataset adheres to exponential growth kinetics; in contrast, it is known to exhibit linear growth over time, as indicated in [Supplementary Note 1 of https://doi.org/10.1038/nmeth.1534]. Consequently, fitness derived from colony size should be recognized as a different metric and phenotype from the Malthusian parameter. Equating these distinct phenotypes and fitness measures constitutes a fundamental error, which significantly compromises the theoretical discussions based on the Malthusian parameter in the study.

      (2) Misapplication of prokaryotic growth models

      The study attempts to explain the mechanistic origin of the multiplicative model observed in yeast colony fitness using a bacterial cell growth model, particularly the Scott-Hwa model. However, the application of this bacterial model to yeast systems lacks valid justification. The Scott-Hwa model is heavily dependent on specific molecular mechanisms such as ppGpp-mediated regulation, which plays a crucial role in adjusting ribosome expression and activity during translation. This mechanism is pivotal for ensuring the growth-dependency of the ribosome fraction in the proteome, as described in [https://doi.org/10.1073/pnas.2201585119]. Unlike bacteria, yeast cells do not possess this regulatory mechanism, rendering the direct application of bacterial growth models to yeast inappropriate and potentially misleading. This fundamental difference in regulatory mechanisms undermines the relevance and accuracy of using bacterial models to infer yeast colony growth dynamics.

      If the authors intend to apply a growth model with macroscopic variables to yeast double-mutant experimental data, they should avoid simply repurposing a bacterial growth model. Instead, they should develop and rigorously validate a yeast-specific growth model before incorporating it into their study.

      (3) Overly specific assumptions in the theoretical model

      The theoretical model in question assumes that two mutations affect only independent parameters of specific biochemical processes, an overly restrictive premise that undermines its ability to broadly explain the occurrence of the multiplicative model in mutations. Additionally, experimental evidence highlights significant limitations to this approach. For example, in most viable yeast deletion mutants with reduced growth rates, the expression of ribosomal proteins remains largely unchanged, in direct contradiction to the predictions of the Scott-Hwa model, as indicated in [https://doi.org/10.7554/eLife.28034]. This discrepancy emphasizes that the Scott-Hwa model and its derivatives do not reliably explain the growth rates of mutants based on current experimental data, suggesting that these models may need to be reevaluated or alternative theories developed to more accurately reflect the complex dynamics of mutant growth.

      (4) Lack of clarity on the mechanistic origin of the multiplicative model

      The study falls short of providing a definitive explanation for its primary objective: elucidating the "mechanistic origin" of the multiplicative model. Notably, even in the simplest case involving the Scott-Hwa model, the underlying mechanistic basis remains unexplained, leaving the central research question unresolved. Furthermore, the study does not clearly specify what types of data or models would be required to advance the understanding of the mechanistic origin of the multiplicative model. This omission limits the study's contribution to uncovering the biological principles underlying the observed fitness patterns.

    3. Reviewer #2 (Public review):

      The paper deals with the important question of gene epistasis, focusing on asking what is the correct null model for which we should declare no epistasis.

      In the first part, they use the Synthetic Genetic Array dataset to claim that the effects of a double mutation on growth rate are well predicted by the product of the individual effects (much more than e.g. the additive model). The second (main) part shows this is also the prediction of two simple, coarse-grained models for cell growth.

      I find the topic interesting, the paper well-written, and the approach innovative.

      One concern I have with the first part is that they claim that: "In these experiments, the colony area on the plate, a proxy for colony size, followed exponential growth kinetics. The fitness of a mutant strain was determined as the rate of exponential growth normalized to the rate in wild type cells."

      There are many works on "range expansions" showing that colonies expand at a constant velocity, the speed of which scales as the square root of the growth rate (these are called "Fisher waves", first predicted in the 1930s, and there are many experimental works on them, e.g. https://www.pnas.org/doi/epdf/10.1073/pnas.0710150104). If that's the case, the area of the colony should be proportional to growth_rate × time^2, rather than exp(growth_rate*time), so the fitness they might be using here could be the log(growth_rate) rather than growth_rate itself? That could potentially have a big effect on the results.
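      A short worked version of this scaling argument, as a sketch under assumed notation: r is the growth rate, D is an effective diffusion constant of the cells, R(t) is the colony radius, and A(t) is the colony area. None of these symbols are taken from the manuscript under review, and the expressions hold only for a circular colony at late times.

```latex
\begin{align}
v &= 2\sqrt{D r} && \text{(Fisher wave speed)} \\
R(t) &\approx v\,t && \text{(front advancing at constant velocity)} \\
A(t) &= \pi R(t)^2 \approx 4\pi D\, r\, t^2 \\
\log A(t) &\approx \log(4\pi D) + \log r + 2\log t
\end{align}
```

      Under these assumptions the area grows quadratically in time with a prefactor linear in r, so the growth rate enters log A(t) only through the additive term log r, and an exponential fit to A(t) would not return r itself.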

      Additional comments/questions:

      (1) What is the motivation for the model where the effect of two genes is the minimum of the two?

      (2) How seriously should we take the Scott-Hwa model? Should we view it as a toy model to explain the phenomenon or more than that? If the latter, then since the number of categories in the GO analysis is much more than two (47?) in many cases the analysis of the experimental data would take pairs of genes that both affect one process in the Scott-Hwa model - and then the product prediction should presumably fail? The same comment applies to the other coarse-grained model.

      (3) There are many works in the literature discussing additive fitness contributions, including Kauffman's famous NK model as well as spin-glass-type models (e.g. Guo and Amir, Science Advances 2019, Reddy and Desai, eLife 2021, Boffi et al., eLife 2023). These should be addressed in this context.

      (4) The experimental data is for deletions, but it would be interesting to know the theoretical model's prediction for the expected effects of beneficial mutations and how they interact since that's relevant (as mentioned in the paper) for evolutionary experiments. Perhaps in this case the question of additive vs. multiplicative matters less since the fitness effects are much smaller.

    1. eLife Assessment

      This study presents a valuable finding of novel markers that may potentially identify resident tendon stem/progenitor cells (TSPCs). The study also presents a comprehensive single-cell transcriptional dataset that will be of value to the field. The evidence supporting the identification of novel markers of a TSPC is incomplete, requiring clarification of current analyses, additional analyses between ages, and additional validation experiments to demonstrate that these markers are indeed specific and these cells are indeed TSPCs. This work will be of interest to biologists and engineers focused on tendons and ligaments.

    2. Reviewer #1 (Public review):

      This study is focused on identifying unique, innovative surface markers for mature Achilles tendons by combining the latest multi-omics approaches and in vitro evaluation, which would address the knowledge gap of the controversial identity of TSPCs with unspecific surface markers. The use of multi-omics technologies, in vivo characterization, in vitro standard assays of stem cells, and in vitro tissue formation is a strength of this work and could be applied for other stem cell quantification in musculoskeletal research. The evaluation and identification of Cd55 and Cd248 in TSPCs has not previously been conducted in tendons, which is considered innovative. Additionally, the study provided solid sequencing data to confirm co-expression of Cd55 and Cd248 with other well-described surface markers such as Ly6a, Tppp3, Pdgfra, and Cd34. Generally, the data shown in the manuscript support the claims that the identified surface antigens mark TSPCs in juvenile tendons.

      However, there are missing links between the scientific questions posed in the Introduction and the Methodology/Results. If the study focuses on unsatisfactory healing responses of mature tendons and understanding of mature TSPCs, at least mature Achilles tendons from mice older than 12 weeks should be analyzed and compared with tendons from juvenile/neonatal mice. However, neither the 2-week-old nor the 6-week-old mice used for characterization here are skeletally mature. Additionally, there is no complete comparison of TSPCs between 2-week-old and 6-week-old mice at the transcriptional and epigenetic levels.

      In order to distinguish TSPCs and characterize their epigenetic activities, the authors used scRNA-seq, snRNA-seq, and snATAC-seq approaches. The integration, analysis, and comparison of sequencing data across assays and/or time points are confusing and incomplete. For example, it would be more comprehensive to integrate both scRNA-seq and snRNA-seq data (if not, why were both assays used for Achilles tendons at both the 2-week and 6-week time points?). snRNA-seq and snATAC-seq data of 6-week-old mice were separately analyzed. No comparison of the differences and similarities between TSPCs of 2-week-old and 6-week-old mice was conducted.

      Given the goal of this work to identify specific TSPC markers, the specificity of Cd55 and Cd248 for TSPCs is not clear. First, based on the data shown here, Cd55 and Cd248 mark the same cell population which is identified by Ly6a, Tppp3, and Pdgfra. Although, for instance, Cd34 is expressed by other tissues as discussed here, no data/evidence is provided by this work showing that Cd55 and Cd248 are not expressed by other musculoskeletal tissues/cells. Second, the immunostaining of Cd55 and Cd248 doesn't support their specificity. What is the advantage of using Cd55 and Cd248 for TSPCs compared to using other markers?

    3. Reviewer #2 (Public review):

      Summary:

      The molecular signature of tendon stem cells is not fully identified. The endogenous location of tendon stem cells within the native tendon is also not fully elucidated. Several molecular markers have been identified to isolate tendon stem cells but they lack tendon specificity. Using the declining tendon repair capacity of mature mice, the authors compared the transcriptome landscape and activity of juvenile (2 weeks) and mature (6 weeks) tendon cells of mouse Achilles tendons and identified CD55 and CD248 as novel surface markers for tendon stem cells. CD55+ CD248+ FACS-sorted cells display a preferential tendency to differentiate into tendon cells compared to CD55neg CD248neg cells.

      Strengths:

      The authors generated a lot of data on juvenile and mature Achilles tendons, using scRNAseq, snRNAseq, and ATACseq strategies. This constitutes a resource dataset.

      Weaknesses:

      The analyses and validation of identified genes are not complete and could be pushed further. The endogenous expression of newly identified genes in native tendons would be informative. The comparison of scRNAseq and snRNAseq datasets for tendon cell populations would strengthen the identification of tendon cell populations.

    4. Reviewer #3 (Public review):

      Summary:

      In their report, Tsutsumi et al. use single nucleus transcriptional and chromatin accessibility analyses of mouse Achilles tendon in an attempt to uncover new markers of tendon stem/progenitor cells. They propose CD55 and CD248 as novel markers of tendon stem/progenitor cells.

      Strengths:

      This is an interesting and important research area. The paper is overall well written.

      Weaknesses:

      Major problems:

      (1) It is not clear what tissue exactly is being analyzed. The authors build a story on tendons, but there is little description of the dissection. The authors claim to detect MTJ and cartilage cells, but not bone or muscle cells. The tendon sheath is known to express CD55, so the population of "progenitors" may not be of tendon origin.

      (2) Cluster annotations are seemingly done with a single gene. Names are given to cells without functional or spatial validation. For example, MTJ cells are annotated based on Postn, but it is never shown that Postn is only expressed at the MTJ, and not in other anatomical locations in the tendon.

      (3) The authors compare their data to public data based on interrogating single genes in their dataset. It is now standard practice to integrate datasets (e.g., using Harmony), or at a minimum to use the gene signature scoring built into Seurat (e.g., AddModuleScore).

      (4) Progenitor populations (SP1, SP2). The authors claim these are progenitors but show very clearly that they express macrophage genes. What are they, macrophages or fibroblasts?

      (5) All omics analysis is done on single data points (from many mice pooled). The authors make many claims on n=1 per group for readouts dependent on sample number (eg frequency of clusters).

      (6) The scRNAseq atlas in Figure 1 is made by analyzing 2W and 6W tendons at the same time. The snRNAseq and ATACseq atlases are built first on 2W data, after which the 6W data is compared. Why use the 2W data as a reference? Why not analyze the two time points together as done with the scRNAseq?

      (7) Figure 5: The authors should show the gating strategy for FACS. Were non-fibroblasts excluded (e.g., immune cells, endothelia, etc.)? Was a dead cell marker used? If not, it is not surprising that fibroblasts form colonies and express fibroblast genes when compared to CD55-CD248- immune cells, dead cells, or debris. Can control genes such as Ptprc or Pecam1 be tested to rule out contamination with other cell types?
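      As an illustration of the signature-based comparison suggested in point (3), the sketch below scores each cell by the mean expression of a marker set relative to the mean of all measured genes. It is a deliberately simplified stand-in written in Python, not Seurat's AddModuleScore (which uses expression-matched control gene sets); the function name and the toy expression values are hypothetical, and the gene names are borrowed from the reviews above for readability only.

```python
import numpy as np

def module_score(expr, genes, signature):
    """Per-cell signature score: mean signature expression minus mean of all genes.

    expr      : cells x genes array of log-normalized expression
    genes     : list of gene names matching the columns of expr
    signature : list of gene names defining the module
    """
    idx = [genes.index(g) for g in signature if g in genes]
    if not idx:
        raise ValueError("none of the signature genes are present")
    return expr[:, idx].mean(axis=1) - expr.mean(axis=1)

# Toy data (made-up values): cell 0 is progenitor-like, cell 1 is immune-like.
genes = ["Cd55", "Cd248", "Ly6a", "Pdgfra", "Ptprc", "Pecam1"]
expr = np.array([
    [2.0, 2.0, 2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 3.0, 0.0],
])
scores = module_score(expr, genes, ["Cd55", "Cd248", "Ly6a", "Pdgfra"])
print(scores)
```

      Scoring a whole signature rather than a single gene makes the comparison with public datasets less sensitive to dropout in any one marker.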

      Minor problems:

      (1) Report the important tissue processing details: the type of collagenase used and the cell viability before loading into the 10x machine.

    1. eLife Assessment

      This manuscript presents an interesting new framework (VARX) for simultaneously quantifying effective connectivity in brain activity during sensory stimulation and how that brain activity is being driven by that sensory stimulation. The reviewers thought the model was original and its conclusion that intrinsic connectivity is largely unaltered during sensory stimulation is very interesting, but that future use of the model could potentially be affected by false positive conclusions. Overall, this work is important with solid evidence for its conclusions - it will be of interest to neuroscientists working on brain connectivity and dynamics.

    2. Reviewer #1 (Public review):

      This manuscript presents an interesting new framework (VARX) for simultaneously quantifying effective connectivity in brain activity during sensory stimulation and how that brain activity is being driven by that sensory stimulation. The core idea is to combine the Vector Autoregressive model that is often used to infer Granger-causal connectivity in brain data with an encoding model that maps the features of a sensory stimulus to that brain data. The authors do a nice job of explaining the framework. And then they demonstrate its utility through some simulations and some analysis of real intracranial EEG data recorded from subjects as they watched movies. They infer from their analyses that the functional connectivity in these brain recordings is essentially unaltered during movie watching, that accounting for the driving movie stimulus can protect one against misidentifying brain responses to the stimulus as functional connectivity, and that recurrent brain activity enhances and prolongs the putative neural responses to a stimulus.

      Overall, I thought this was an interesting manuscript with some rich and intriguing ideas. That said, I had some concerns also - one potentially major - with the inferences drawn by the authors on the analyses that they carried out.

      Main comments:

      (1) My primary concern with the way the manuscript is written right now relates to the inferences that can be drawn from the framework. In particular, the authors want to assert that, by incorporating an encoding model into their framework, they can do a better job of accounting for correlated stimulus-driven activity in different brain regions, allowing them to get a clearer view of the underlying innate functional connectivity of the brain. Indeed, the authors say that they want to ask "whether, after removing stimulus-induced correlations, the intrinsic dynamic itself is preserved". This seems a very attractive idea indeed. However, it seems to hinge critically on the idea of fitting an encoding model that fully explains all of the stimulus-driven activity. In other words, if one fits an encoding model that only explains some of the stimulus-driven response, then the rest of the stimulus-driven response still remains in the data and will be correlated across brain regions and will appear as functional connectivity in the ongoing brain dynamics - according to this framework. This residual activity would thus be misinterpreted. In the present work, the authors parameterize their stimulus using fixation onsets, film cuts, and the audio envelope. All of these features seem reasonable and valid. However, they surely do not come close to capturing the full richness of the stimuli, and, as such, there is surely a substantial amount of stimulus-driven brain activity that is not being accounted for by their "B" model and that is being absorbed into their "A" model and misinterpreted as intrinsic connectivity. This seems to me to be a major limitation of the framework. Indeed, the authors flag this concern themselves by (briefly) raising the issue in the first paragraph of their caveats section. But I think it warrants much more attention and discussion.

      (2) Related to the previous comment, the authors make what seems to me to be a complex and important point on page 6 (of the pdf). Specifically, they say "Note that the extrinsic effects captured with filters B are specific (every stimulus dimension has a specific effect on each brain area), whereas the endogenous dynamic propagates this initial effect to all connected brain areas via matrix A, effectively mixing and adding the responses of all stimulus dimensions. Therefore, this factorization separates stimulus-specific effects from the shared endogenous dynamic." It seems to me that the interpretation of the filter B (which is analogous to the "TRF") for the envelope, say, will be affected by the fact that the matrix A is likely going to be influenced by all sorts of other stimulus features that are not included in the model. In other words, residual stimulus-driven correlations that are captured in A might also distort what is going on in B, perhaps. So, again, I worry about interpreting the framework unless one can guarantee a near-perfect encoding model that can fully account for the stimulus-driven activity. I'd love to hear the authors' thoughts on this. (On this issue - the word "dominates" on page 12 seems very strong.)

      (3) Regarding the interpretation of the analysis of connectivity between movies and rest... that concludes that the intrinsic connectivity pattern doesn't really differ. This is interesting. But it seems worth flagging that this analysis doesn't really account for the specific dynamics in the network that could differ quite substantially between movie watching and rest, right? At the moment, it is all correlational. But the dynamics within the network could be very different between stimulation and rest I would have thought.

      (4) I didn't really understand the point of comparing the VARX connectivity estimate with the sparse-inverse covariance method (Figure 2D). What was the point of this? What is a reader supposed to appreciate from it about the validity or otherwise of the VARX approach?

      (5) I think the VARX model section could have benefitted a bit from putting some dimensions on some of the variables. In particular, I struggled a little to appreciate the dimensionality of A. I am assuming it has to involve both time lags AND electrode channels so that you can infer Granger causality (by including time) between channels. Including a bit more detail on the dimensionality and shape of A might be helpful for others who want to implement the VARX model.
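
      The dimensionality question in comment (5) can be made concrete with a minimal sketch. It assumes the usual VARX form y(t) = sum_k A_k y(t-k) + sum_k B_k x(t-k) + e(t); the lag conventions (A starting at lag 1, B at lag 0), the array layout, and the function name `simulate_varx` are illustrative assumptions, not taken from the manuscript or its code.

```python
import numpy as np

def simulate_varx(A, B, x, noise_std=0.0, seed=0):
    """Simulate y(t) = sum_k A[k-1] @ y(t-k) + sum_k B[k] @ x(t-k) + e(t).

    Assumed shapes (hypothetical layout):
      A: (n_a, d_y, d_y)  recurrent filters, lags 1..n_a (channels x channels per lag)
      B: (n_b, d_y, d_x)  input filters, lags 0..n_b-1 (channels x stimulus features per lag)
      x: (T, d_x)         stimulus features over time
    Returns y with shape (T, d_y).
    """
    rng = np.random.default_rng(seed)
    n_a, d_y, _ = A.shape
    n_b = B.shape[0]
    T = x.shape[0]
    y = np.zeros((T, d_y))
    for t in range(T):
        acc = noise_std * rng.standard_normal(d_y)  # innovation e(t)
        for k in range(1, n_a + 1):                 # recurrent (intrinsic) part
            if t - k >= 0:
                acc = acc + A[k - 1] @ y[t - k]
        for k in range(n_b):                        # stimulus-driven (extrinsic) part
            if t - k >= 0:
                acc = acc + B[k] @ x[t - k]
        y[t] = acc
    return y

# Toy dimensions: 3 channels, 2 stimulus features, 2 recurrent lags, 4 input lags.
d_y, d_x, n_a, n_b, T = 3, 2, 2, 4, 50
A = np.zeros((n_a, d_y, d_y))
A[0] = 0.5 * np.eye(d_y)
B = np.ones((n_b, d_y, d_x))
x = np.zeros((T, d_x))
x[0] = 1.0                                          # impulse in both features
y = simulate_varx(A, B, x)
print(A.shape, B.shape, y.shape)
```

      Under these assumptions, A carries one channels-by-channels matrix per time lag, which is what allows a Granger-style test of whether past activity in one channel improves prediction of another.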

      (6) A second issue I had with the inferences drawn by the authors was a difficulty in reconciling certain statements in the manuscript. For example, in the abstract, the authors write "We find that the recurrent connectivity during rest is largely unaltered during movie watching." And they also write that "Failing to account for ... exogenous inputs, leads to spurious connections in the intrinsic 'connectivity'."

    3. Reviewer #2 (Public review):

      Summary:

      The authors apply the recently developed VARX model, which explicitly models intrinsic dynamics and the effect of extrinsic inputs, to simulated data and intracranial EEG recordings. This method provides a directed measure of 'intrinsic connectivity'. They argue this model is better suited to the analysis of task neuroimaging data because it separates the intrinsic and extrinsic activity. They show: that intrinsic connectivity is largely unaltered during a movie-watching task compared to eyes open rest; intrinsic noise is reduced in the task; and there is intrinsic directed connectivity from sensory to higher-order brain areas.

      Strengths:

      (1) The paper tackles an important issue with an appropriate method.

      (2) The authors validated their method on data simulated with a neural mass model.

      (3) They use intracranial EEG, which provides a direct measure of neuronal activity.

      (4) Code is made publicly available and the paper is written well.

      Weaknesses:

      It is unclear whether a linear model is adequate to describe brain data. To the authors' credit, they discuss this in the manuscript. Also, the model presented still provides a useful and computationally efficient method for studying brain data - no model is 'the truth'.

      Appraisal of whether the authors achieve their aims:

      As a methodological advancement highlighting a limitation of existing approaches and presenting a new model to overcome it, the authors achieve their aim. Generally, the claims/conclusions are supported by the results.

      The wider neuroscience claims regarding the role of intrinsic dynamics and external inputs in affecting brain data could benefit from further replication with another independent dataset and in a variety of tasks - but I understand if the authors wanted to focus on the method rather than the neuroscientific claims in this manuscript.

      Impact:

      The authors propose a useful new approach that solves an important problem in the analysis of task neuroimaging data. I believe the work can have a significant impact on the field.

    1. eLife Assessment

      This study presents useful findings on the differences between male and hermaphrodite C. elegans connectomes and how they may result in changes in locomotory behavioural outputs. However, the study appears incomplete with respect to the relationship between sex-specific AVA wiring and male mate-finding. Another area of concern is that the analysis does not consider animal-to-animal variability in the wiring when attempting to identify significant differences between the male and hermaphrodite.

    2. Reviewer #1 (Public review):

      Summary:

      This work seeks to predict differences in neural function and behavior between male and hermaphrodite C. elegans by comparing their nervous system maps of synaptic wiring. The authors then seek to validate some of their predictions by measuring differences in neural activity or behavior, including in response to neuron-specific genetic manipulations. In particular, the authors focus on the role of neuron AVA which has notable differences in its connectivity between the male and hermaphrodite, and they use this and behavior measurements to argue for a role of AVA in mate-searching behavior in males.

      Strengths:

      A major strength of this work is its approach to investigating differences in wiring between males and hermaphrodites in a systematic and quantitative way. The work laudably takes advantage of recently available comprehensive connectomes, including across sexes of the same species, and applies concepts from network science to mining their differences. Another strength of the work is that it supplements network analysis with measurements of behavior, including with cell-specific genetic manipulations. The measurements and analysis will be of value to the scientific community.

      Weaknesses:

      The evidence to support conclusions about the special relationship between differences in AVA's wiring and male mate-finding appears incomplete. The authors selected AVA based on changes in wiring and then observed a decrease in male chemotaxis towards hermaphrodites for animals in which neuron AVA is inhibited. This is presented as evidence that specifically AVA is important for mate-finding, and therefore that changes in wiring inform changes in function. But given AVA's known role in all reversal-related locomotion, it is important to more forcefully rule out an alternative hypothesis that the observed deficits in mate-finding could be explained by any reversal circuitry motor defect (including those without wiring differences), rather than specifically attributed to AVA and its wiring. Similarly, more evidence is needed to show that deficits in reversal circuitry preferentially affect mate-seeking compared to other goal-directed navigation behaviors.

      There are some areas where methods would benefit from further justification or clarification. For example, the work would benefit from better justification for selecting sub-networks to study, or for combining bilaterally symmetric neurons. More details are also needed to better interpret calcium imaging studies, such as details about the indicator and illumination wavelength and intensity.

      Finally, there are some weaknesses inherent to the entire field of connectomic analysis that are necessarily also present here. For example, it is unclear how to weight the relative contributions of chemical versus electrical gap junctions when performing analyses of the wiring diagram, and the choice could potentially influence results. The wiring diagram also lacks information about timescales of neural dynamics or the role of neuromodulators or other molecular details that may influence the strength or function of various connections, and this poses a major challenge for predicting neural dynamics from neural wiring. For example, in their neural dynamics simulation, the authors assume that all neurons have the same conductance and reversal potentials - a standard practice - despite known diversity among neurons that limits the usefulness of this approach. It will be helpful to further acknowledge these limitations of the broader field.

    3. Reviewer #2 (Public review):

      Summary:

      In their study, Wang and co-workers aimed to identify sexual dimorphisms in the connectomes of male and hermaphrodite C. elegans, and link these to sex-related behaviors. To this end they analyzed and compared various network properties of simplified male and hermaphrodite connectome datasets, and then focused on the AVA premotor neurons, linking their distinctive connectivity with their differential influence on reversing behaviors between the two sexes.

      Strengths:

      The study employs a range of basic methods from network and computational neuroscience and provides experimental testing of one of the predictions of the analysis.

      Weaknesses:

      Various aspects of sexual dimorphism in the nervous system of C. elegans have already been described and discussed (reviewed, for example, in Emmons 2018, Walsh et al. 2021). In particular, Cook et al. (2019), who mapped the male connectome (which serves as the key data in the current study), included in their work an analysis of connectome-level differences between males and hermaphrodites. Unfortunately, the foundations of the current study are somewhat problematic, and the results it provides are rather rudimentary and do not provide substantial new insight.

      My critique of the study can be organized around several major issues.

      (1) Source data

      A large portion of the work is based on the analysis of a single male and a single hermaphrodite connectome dataset from Cook et al. 2019. These original connectomes were simplified in the current study, merging most individual neurons into neuron class nodes. As a measure of edge weight, the authors used the number of synaptic contacts between each pair of nodes. Cook et al. 2019 estimated this number to be of high variance, and even when considering unweighted connectivity (whether two nodes are at all connected or not) substantial variability exists between independent connectome datasets (e.g., Birari and Rabinowitch, 2024). Therefore, basing the analysis on synaptic weights from a single connectome (for each sex) may be somewhat unreliable.

      On top of this, a huge gap may exist between connectome structure and function, especially when overlooking: (1) the sign of the synapses (excitatory vs. inhibitory), (2) synaptic efficiency (a single strong synapse may be more efficient than multiple weak synapses), (3) the spatial distribution of the synapses (clusters of synapses, for example, may be stronger than scattered synapses). These should at the very least be acknowledged. Moreover, the pooling of electrical and chemical synapses done by the authors is problematic, as is assuming all electrical synapses are bidirectional. These and other factors may undermine the results of the analysis, and, again, at the very least should be considered and discussed.

      A minimal validation of the analysis could be achieved by sensitivity analyses. For example, studying how consistent the results are when: separately analyzing the chemical and electrical networks; binarizing synaptic contacts to existing vs. non-existing connections regardless of weight; and comparing with additional connectome datasets (at least for hermaphrodites).
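
      The kind of sensitivity analysis suggested above can be sketched as follows. The two 3-node matrices are hypothetical toy values, not connectome data, and `node_strength` (in-weight plus out-weight per node) is only one of several possible strength definitions; the sketch simply recomputes the same measure on the pooled, chemical-only, electrical-only, and binarized networks so the results can be compared.

```python
import numpy as np

# Hypothetical toy networks (rows = source node, columns = target node).
chem = np.array([[0, 5, 0],
                 [1, 0, 2],
                 [0, 0, 0]])   # directed chemical synapse counts
elec = np.array([[0, 0, 3],
                 [0, 0, 0],
                 [3, 0, 0]])   # gap junctions, entered symmetrically

def node_strength(W):
    """Total weight entering plus leaving each node."""
    return W.sum(axis=0) + W.sum(axis=1)

pooled = chem + elec
variants = {
    "pooled": pooled,
    "chemical only": chem,
    "electrical only": elec,
    "binarized": (pooled > 0).astype(int),  # existing vs. non-existing connection
}

for name, W in variants.items():
    print(f"{name:16s} strength per node: {node_strength(W)}")
```

      If a conclusion (for example, which node ranks highest) holds across all four variants, it is less likely to be an artifact of how edge weights were pooled.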

      Another important approach for validation would be synaptic labeling of key pathways, in order to establish the extent to which they maintain sexual dimorphism across the population (as performed, for example, by Cook et al., 2019; Pechuk et al. 2022).

      (2) Statistical analysis

      Comparing any two connectomes will show differences in connectivity and other network properties. The question is to what degree the differences found in the current study between two particular male and hermaphrodite connectomes transcend such basic inconsistencies. This fundamental question is not addressed in the manuscript.

      A second major concern is that a considerable portion of the results are based on improper comparisons between male and hermaphrodite connectome measures.

      In Figure 1D,I,M,V, Figure 2D,H,L, Figure 4E,I there is no sense in statistically testing the differences between hermaphrodite sex-specific (N=2) and shared nodes. The sample size is way too small. Corresponding conclusions about male-specific neurons being different from hermaphrodite-specific neurons in terms of connectivity are thus improperly founded. Similarly, the analyses in Figure 1P,S, 2O,R contain more data points, because of connectivity, but could still be misleading, since all the edges there contain either HSN or VC (just two nodes).

      More so, any claim comparing the differences between two measures in males vs. hermaphrodites should be based on a 2X2 (or 3X2) design (e.g., tested using 2-way ANOVA with an interaction term). It is erroneous to interpret comparisons between two effects without directly comparing them (Makin et al., 2019).

      When more than one comparison is performed, a one-way ANOVA should precede post hoc analyses, and corrections for multiple comparisons should be carried out and reported.

      The plots in Figure 1E,W and Figure 4F,J are illustrative but do not contain any statistical test to support the claims about which functions are emphasized in which sex. They also rely on a very superficial categorization of individual neuron class function, whereas in reality, in C. elegans many neurons serve multiple functions.

      In Figures 5-7 individual data points should be plotted, and the error bars and boxes should be defined (in all figures).

      Finally, Figure 3C,F,I,L,N,P and Figure 5A-C lack statistical analysis (e.g., via bootstrapping). In addition, the term 'significantly' in the text should be reserved for statistical significance.

      (3) Testing network predictions

      A key emphasis of the network analysis concerns the AVA premotor neurons. It is well established that reversing behavior is controlled by premotor neurons such as AVA (e.g., Maricq et al. 1995) and that AVA activity is spontaneous and coupled to reversing (e.g., Chronis et al. 2007). More so, it has already been shown that male reversal frequency is higher than that of hermaphrodites (e.g., Mah et al. 1992; Zhao et al. 2003). Similar findings in the current study are thus not very surprising. The current study does add some new detail. Namely, the higher frequency of AVA activity in adult males compared to hermaphrodites, and the presumably sex-specific roles of RIC and DVC as well as several AVA glutamate receptors, in modulating reversing. At the same time, PQR, for example, showed no such role, contrary to the predictions.

      Incidentally, AVA is not a commander neuron, but rather a command or, preferably, a premotor neuron. Altogether, the major specific focus of the analysis, predicting a sexually dimorphic role for AVA, is not very novel.

      (4) Further predictions

      The discussion section presents several additional predictions stemming from the analysis. However, to me, they seem almost arbitrary.

      The statement claiming that the authors found the male pharyngeal connectome to be more strongly wired to the main connectome as opposed to previous findings, is unclear. Sex-specific differences in connectivity between the pharyngeal and somatic networks are immediately evident from the connectomes and do not require graph theoretical tools to be discovered (page 4 and discussion of Figure 3N).

      The prediction that the AIY→RIA→RMD_DV circuit may facilitate pheromone-guided olfactory steering behavior in males is not very strong. On the one hand, it is known that males respond to sex pheromones (notably, however, if these pheromone receptors are ectopically expressed in hermaphrodites then hermaphrodites also respond to the pheromones [Wan et al. 2019]). Since these pheromone-sensing neurons are also involved in other sensory processes, it is quite trivial that the circuits involved in general sensory-based steering should be shared with specific pheromone-based steering. The fact that the interneurons in the circuit may be more strongly connected (excitatory, inhibitory, electrical?) in males could imply many things but does not add much to the picture.

      The authors also mention AFD as having more synaptic contacts with AIY in males, and link this somehow to the dimorphic expression of insulin-like peptides in AFD. However, neuropeptide-based transmission is largely independent of synaptic connections, so I don't see the relevance.

      (5) Methods

      The example provided in the Methods section for calculating graph measures is very helpful. I am not sure, however, why the length of a path was defined as the reciprocal sum of the edge weights of the connections within the path. Why the reciprocal? Is it the sum of the reciprocals? Do more synaptic contacts imply a shorter path?
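
      The two possible readings of "reciprocal sum" behave differently, which a small sketch makes explicit. The edge weights are hypothetical synaptic contact counts and the function names are illustrative. Which definition the manuscript actually uses is not stated here.

```python
def sum_of_reciprocals(weights):
    """Path length as the sum of 1/w over edges: more contacts -> shorter edge."""
    return sum(1.0 / w for w in weights)

def reciprocal_of_sum(weights):
    """Path length as 1 / (sum of w over edges)."""
    return 1.0 / sum(weights)

two_hop = [4, 2]        # hypothetical contact counts along a 2-edge path
three_hop = [4, 2, 1]   # the same path extended by one weak edge

for label, path in [("two-hop", two_hop), ("three-hop", three_hop)]:
    print(label,
          "sum of reciprocals:", sum_of_reciprocals(path),
          "reciprocal of sum:", reciprocal_of_sum(path))
```

      Under the sum-of-reciprocals reading, adding an edge always lengthens the path, as one would expect of a distance. Under the reciprocal-of-sum reading, adding an edge makes the path shorter, which is why the distinction raised in the comment matters.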

      The description in the text (as opposed to the Methods section) of node strength is not very clear: "The node strength measures how strongly a node directly possesses with other nodes in the network" - This should be clarified.

      For the RC simulation, I assume the sodium and potassium conductances are fixed. If so, they are leak currents themselves. What does the extra leak current represent? Obviously the simulation includes multiple arbitrary assumptions and parameter values. It would be useful to discuss at least the considerations for choosing the model design and parameters. I also assume that the delayed responses in the bottom neurons in Figure 4A (that still respond) are due to indirect synaptic connections (path lengths > 1)?

    1. eLife Assessment

      This study reports that activation of TFEB promotes lysosomal exocytosis and clearance of cholesterol from lysosomes, the strength of evidence for which is convincing with appropriate and validated methodology in line with current state-of-the-art. The significance of the findings is important in the context of Niemann-Pick Disease Type C as well as other subfields.

    2. Reviewer #2 (Public review):

      Summary:

      This study presents an important finding that the activation of TFEB by sulforaphane (SFN) could promote lysosomal exocytosis and biogenesis in NPC, suggesting a potential mechanism by SFN for the removal of cholesterol accumulation, which may contribute to the development of new therapeutic approaches for NPC treatment.

      Strengths:

      The cell-based assays are convincing, utilizing appropriate and validated methodologies to support the conclusion that SFN facilitates the removal of lysosomal cholesterol via TFEB activation.

      Comments on revisions:

      The authors have addressed most of my questions. I have only one minor technical point to emphasize, which does not affect the overall strength of the evidence for this project.

      The pKa values of pHrodo Green (P35368, pKa=6.757) and pHrodo Red-Dex (P10361, pKa=6.816) are very similar. Prof. Xu's article, cited in the response letter (Hu, Li et al. 2022), is an excellent example of lysosomal pH measurement. He used LysoTracker Red DND-99 for a rough estimation of lysosomal acidity, and for accurate monitoring of lysosomal pH, he employed the ratiometric OG488-dex (pKa 4.6).
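
      The relevance of the pKa values can be illustrated with a rough sketch based on the Henderson-Hasselbalch relation. It assumes, as a simplification, that each probe behaves as a single titratable group and that its readout changes in proportion to the protonated fraction; the pH window of 4.5 to 5.5 is an illustrative choice, and the pKa values are those quoted above.

```python
def protonated_fraction(pH, pKa):
    """Henderson-Hasselbalch: fraction of probe in the protonated state."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

def response_span(pKa, pH_low=4.5, pH_high=5.5):
    """Change in protonated fraction across an assumed lysosomal pH window."""
    return protonated_fraction(pH_low, pKa) - protonated_fraction(pH_high, pKa)

probes = [("pHrodo Green", 6.757), ("pHrodo Red", 6.816), ("OG488", 4.6)]
for name, pKa in probes:
    print(f"{name:13s} pKa={pKa:<6} span over pH 4.5-5.5: {response_span(pKa):.3f}")
```

      Under these assumptions, a probe with a pKa near 6.8 is almost fully protonated throughout the lysosomal range and changes little there, whereas a probe with a pKa near 4.6 changes substantially over the same window.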

    3. Author response:

      The following is the authors’ response to the original reviews.

      Although the reviewers found our work interesting, they raised several important concerns about our study. To address these concerns, we mostly performed new experiments. The most important changes are highlighted in the summary paragraphs.

      First, in response to Reviewer 1’s suggestions, we conducted the SFN experiments more systematically. We further confirmed the mechanism of SFN-induced TFEB activation in HeLa NPC1 cells with new experiments, including the effects of BAPTA-AM (a calcium chelator), FK506+CsA (calcineurin inhibitors) and NAC (a ROS scavenger) on SFN-induced TFEB nuclear translocation in HeLa NPC1 cells (New Fig. S3) and the effect of SFN on NPC1 expression (New Fig. S5). In particular, we examined the colocalization of DiO (a PM marker) staining and surface LAMP1 staining in HeLa NPC1 cells under SFN treatment to confirm exocytosis at the PM. In the main text and figure legends, we thoroughly checked the sentences for accuracy and clarified definitions. Hence, we have significantly improved the presentation and clarity in the revision.

      Second, in response to Reviewer 2’s suggestions, we performed additional experiments to demonstrate the role of TFEB in SFN-evoked lysosomal exocytosis using TFEB-KO cells (New Fig. S7B). In TFEB-KO cells, the increase in surface LAMP1 signal upon SFN treatment was significantly reduced, suggesting that SFN induces exocytosis in a TFEB-dependent manner. We also investigated the effect of U18666A on CF555-dextran endocytosis. By examining the localization of CF-dex and LAMP1, we found that CF555 is present in the lysosome under U18666A treatment (Fig for reviewers only A,B), suggesting that NPC1 deficiency/U18666A treatment has no effect on CF-dex endocytosis.

      Third, in response to Reviewer 3’s suggestions, and in addition to the experiments performed in response to the other reviewers, we examined the cytotoxicity of the SFN concentrations used in this study in various cell lines (New Fig. S10).

      In addition, according to the reviewers’ suggestions, we made clarifications and corrections wherever appropriate in the manuscript.

      Reviewer #1 (Public review):

      Summary:

      The authors are trying to determine if SFN treatment results in dephosphorylation of TFEB, subsequent activation of autophagy-related genes, exocytosis of lysosomes, and reduction in lysosomal cholesterol levels in models of NPC disease.

      Strengths:

      (1) Clear evidence that SFN results in translocation of TFEB to the nucleus.

      (2) In vivo data demonstrating that SFN can rescue Purkinje neuron number and weight in NPC1<sup>-/-</sup> animals.

      Thank you for the support!

      Weaknesses:

      (1) Lack of molecular details regarding how SFN results in dephosphorylation of TFEB leading to activation of the aforementioned pathways. Currently, datasets represent correlations.

      Thank you for raising this critical point! The reviewer is right that this manuscript did not say much about the molecular mechanism of SFN-evoked TFEB activation. This is because we explored the mechanism of SFN-induced TFEB activation in our previous study, where we showed that SFN activates TFEB via a ROS-Ca<sup>2+</sup>-calcineurin-dependent but MTOR-independent pathway (Li, Shao et al. 2021). In the current manuscript, we cited this paper but did not describe the mechanism in detail, which understandably confused the reviewers. Therefore, in the revised manuscript we added more details on the molecular mechanism of SFN-induced TFEB activation. We also further confirmed this mechanism in HeLa NPC1 cells with new experiments, including the effects of BAPTA-AM (a calcium chelator), FK506+CsA (calcineurin inhibitors) and NAC (a ROS scavenger) on SFN-induced TFEB nuclear translocation in NPC cells (New Fig. S3).

      (2) Based on the manuscript narrative, discussion, and data it is unclear exactly how steady-state cholesterol would change in models of NPC disease following SFN treatment. Yes, there is good evidence that lysosomal flux to (and presumably across) the plasma membrane increases with SFN. However, lysosomal biogenesis genes also seem to be increasing. Given that NPC inhibition, NPC1 knockout, or NPC1 disease mutations are constitutively present and the cell models of NPC disease contain lysosomes (even with SFN) how could a simple increase in lysosomal flux decrease cholesterol levels? It would seem important to quantify the number of lysosomes per cell in each condition to begin to disentangle differences in steady state number of lysosomes, number of new lysosomes, and number of lysosomes being exocytosed.

      Thank you for this constructive comment. According to our data, SFN reduced cholesterol levels in NPC1 cells by inducing lysosomal exocytosis and increasing lysosomal biogenesis. We understand the reviewer’s point that it would be really helpful to distinguish the three populations: the original lysosomes, newly formed lysosomes, and lysosomes being exocytosed. Unfortunately, owing to technical limitations, there currently seems to be no appropriate method that can clearly determine which population a given lysosome belongs to. We hope that future techniques will allow us to explore this mechanism.

      (3) Lack of evidence supporting the authors' premise that "SFN could be a good therapeutic candidate for neuropathology in NPC disease".

      Suggestion was taken! We removed this sentence. Thanks!

      Reviewer #2 (Public review):

      (4) The in vivo experiments demonstrate the therapeutic potential of SFN for NPC. A clear dose response analysis would further strengthen the proposed therapeutic mechanism of SFN.

      Thank you for this constructive suggestion. We examined the effect of two doses of SFN, 30 and 50 mg/kg, on NPC mice. As shown in Fig. 6, SFN at 50 mg/kg, but not at 30 mg/kg, prevented a degree of Purkinje cell loss in lobule IV/V of the cerebellum, suggesting a dose-correlated preventive effect of SFN. In future studies, we will continue optimizing the dosage form and amount of SFN and perform a dose-response analysis.

      (5) Additional data supporting the activation of TFEB by SFN for cholesterol clearance in vivo would strengthen the overall impact of the study.

      We thank the reviewer for this constructive comment. We detected a significant decrease in pS211-TFEB protein in brain tissues of NPC mice upon SFN treatment compared to vehicle, suggesting, for the first time, that SFN activates TFEB in brain tissue. It would be worthwhile to further examine lysosomal cholesterol levels in brain tissue to show the direct effect of SFN. However, in our hands and in the literature, filipin does not seem suitable for detecting lysosomal cholesterol accumulation in brain tissue. So far, there is no good method to directly measure lysosomal cholesterol in tissue.

      (6) In Figure 4, the authors demonstrate increased lysosomal exocytosis and biogenesis by SFN in NPC cells. Including a TFEB-KO/KD in this assay would provide additional validation of whether these effects are TFEB-dependent.

      Great suggestion! We investigated the role of TFEB in SFN-evoked lysosomal exocytosis using TFEB-KO cells. As shown in New Suppl. Fig. 7B, in TFEB-KO cells the increase in surface LAMP1 signal upon SFN treatment (15 μM, 12 h) was significantly reduced, suggesting that SFN induces exocytosis in a TFEB-dependent manner.

      (7) For lysosomal pH measurement, the combination of pHrodo-dex and CF-dex enables ratiometric pH measurement. However, the pKa of pHrodo red-dex (according to Invitrogen) is ~6.8, while lysosomal pH is typically around 4.7. This discrepancy may account for the lack of observed lysosomal pH changes between WT and U18666A-treated cells. Notably, previous studies (PMID: 28742019) have reported an increase in lysosomal pH in U18666A-treated cells.

      We understand the reviewer’s point. However, as stated in the methods and main text, we used pHrodo™ Green-Dextran (P35368, Invitrogen) rather than pHrodo Red-dextran. According to the product information from Invitrogen, pHrodo Green-dex conjugates are non-fluorescent at neutral pH but fluoresce bright green at acidic pH around 4, such as that in endosomes and lysosomes. Therefore, pHrodo Green-dex is suitable for monitoring the acidity of lysosomes (Hu, Li et al. 2022). We also used LysoTracker Red DND-99 (Thermo Scientific, L7528) to measure lysosomal pH (Fig. 4G, H), and the results are consistent with those from the pHrodo Green/CF measurement.

      The reviewer mentioned that previous studies have reported an increase in lysosomal pH in U18666A-treated cells. We understand this concern. However, in our hands, with two lysosomal pH sensors, we did not detect a change in lysosomal pH in U18666A-treated NPC1 cell models.

      (8) The authors are also encouraged to perform colocalization studies between CF-dex and a lysosomal marker, as some researchers may be concerned that NPC1 deficiency could reduce or block the trafficking of dextran along endocytosis.

      Thank you for raising this important point; the suggestion was taken! We investigated the effect of NPC1 deficiency on CF555-dextran trafficking into lysosomes by examining the localization of CF-dex and LAMP1. To clearly determine whether CF555-dex is present in the lysosome, we first used apilimod to enlarge lysosomes and then examined the relative position of CF555-dex and LAMP1. As shown in Author response image 1A,B, in HeLa cells treated with U18666A, CF555 signals (red) are clearly present inside lysosomes (LAMP1-labelled lysosomal membrane, green signal), suggesting that CF555-dex endocytosis is not affected by NPC1 deficiency (U18666A treatment).

      Author response image 1.

      The effect of NPC1 deficiency on CF555 endocytosis. HeLa cells were transiently transfected with LAMP1-GFP plasmid for 24 h. Cells were then treated with apilimod (100 nM) for 2 h to enlarge the lysosomes, followed by co-treatment with U18666A (2.5 μM, 24 h) and CF555 (12 h). (A) Each panel shows fluorescence images taken by confocal microscopy. (B) Each panel shows the fluorescence intensity of a line scan (white line) through the double-labeled object indicated by the white arrow. Scale bar, 20 μm or 2 μm (for zoom-in images).

      (9) In vivo data supporting the activation of TFEB by SFN for cholesterol clearance would significantly enhance the impact of the study. For example, measuring whole-animal or brain cholesterol levels would provide stronger evidence of SFN's therapeutic potential.

      We really appreciate the reviewer’s comments. Please see response to point #5.

      Reviewer #3 (Public review):

      (10) The manuscript is extremely hard to read due to the writing; it needs careful editing for grammar and English.

      We apologize for the shortcomings in writing and grammar. We have thoroughly checked the grammar and polished the English to improve the manuscript.

      (11) There are a number of important technical issues that need to be addressed.

      We address the technical issues in our responses to the following questions.

      (12) The TFEB influence on filipin staining in Figure 1A is somewhat subtle. In the mCherry alone panels there is a transfected cell with no filipin staining and the mCherry-TFEBS211A cells still show some filipin staining.

      Thank you for raising this point. The reviewer is right that not all mCherry-alone cells show the same level of filipin signal, and not all mCherry-TFEBS211A-transfected cells completely lack filipin signal. The statistical results were from randomly selected cells from 3 independent experiments. To avoid confusion, we have included more cells in the statistical analysis to cover all conditions, as shown in the new Fig. 1B. We hope this clarifies the issue.

      (13) Figure 1C is impressive for the upregulation of filipin with U18666A treatment. However, SFN is used at 15 microM. This must be hitting multiple pathways. Vauzour et al (PMID: 20166144) use SFN at 10 nM to 1microM. Other manuscripts use it in the low microM range. The authors should repeat at least some key experiments using SFN at a range of concentrations from perhaps 100 nM to 5 microM. The use of 15 microM throughout is an overall concern.

      We chose this concentration of SFN based on our previous study (Li, Shao et al. 2021), in which we showed that SFN (10–15 μM, 2–9 h) induces robust TFEB nuclear translocation in a dose- and time-dependent manner in HeLa cells as well as in other human cell lines without cytotoxicity. In addition, tissue concentrations of SFN can reach 3–30 μM upon broccoli consumption (Hu, Khor et al. 2006), so we used a low micromolar concentration of SFN (15 μM) in our study. Moreover, we confirmed that SFN (15 μM) induces TFEB nuclear translocation in HeLa NPC1 cells (Fig. 1F, G; Fig. 2B, G) and that this concentration of SFN is not cytotoxic (new Fig. S10).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The following comments are designed to improve and focus the authors' work.

      (14) Related to data in Figure 1. The mechanism through which TFEB can reduce Filipin in U18 conditions is unclear. Inhibition of NPC1 results in hyperactivation of mTOR through cholesterol transport at ER-Lysosome contacts (see Zoncu group publications). If mTORC is hyperactive in NPC disease models, TFEB would be expected to remain cytoplasmic and not enter the nucleus as the representative image in Figure 1A demonstrates.

      In our previous study (Li, Shao et al. 2021), we showed that SFN induces TFEB nuclear translocation in an mTOR-independent manner. Consistent with this result, in this study we confirmed that SFN-induced TFEB nuclear translocation is mTOR-independent in NPC1 cells (now Fig. S4A, B). Thus, SFN induced TFEB nuclear translocation in various NPC cells (Fig. 1F, G; Fig. 2B, G). Please also see the discussion of the mechanism of SFN in our response to point #1.

      (15) Therefore, how does overexpression of TFEB, which remains in the cytoplasm, result in a decreased filipin signal? Similar questions relate to Figure 1C-H.

      Medina et al. (Medina, Fraldi et al. 2011) showed that TFEB overexpression (not activation, so the overexpressed TFEB is in the cytoplasm) increases the pool of lysosomes in the proximity of the plasma membrane and promotes their fusion with the PM by raising intracellular Ca<sup>2+</sup> levels through the lysosomal Ca<sup>2+</sup> channel MCOLN1, leading to increased lysosomal exocytosis. Hence, TFEB overexpression alone (without TFEB activation) could reduce the filipin signal by increasing lysosomal exocytosis, and treatment with a TFEB agonist such as SFN could further boost this increase.

      (16) It would seem appropriate to measure the NPC1 and NPC2 proteins using western blot to ensure that SFN-dependent clearance of cholesterol is not due to enhanced expression of the native protein in U18-treated cells or enhanced folding of the protein in patient fibroblasts.

      Thank you for this constructive comment! NPC1 gene mutations account for about 95% of NPC cases and NPC2 mutations for about 5%, and in this study we focused on NPC1 deficiency. We therefore measured the effect of SFN on NPC1 expression in human NPC1-patient fibroblasts. Western blot analysis showed that SFN (15 μM, 24 h) treatment did not affect NPC1 expression in human NPC1-patient fibroblasts (new Fig. S5).

      (17) Related to data in Figures 1C-E. Controls are missing related to the effect SFN has on steady-state cholesterol levels. This may be insightful in providing information on the mode of action of this compound.

      Suggestion taken! We have added the SFN-only control in the new Fig. 1C-E.

      (18) The mechanism that links SFN to TFEB-dependent translocation is suggested to involve calcineurin-dependent dephosphorylation of TFEB. However, no data is provided. It would seem important to identify the mechanism(s) through which SFN positively regulates TFEB location. This would shift the manuscript and its model from correlations to causation. Experiments involving calcineurin inhibitors, or agonists of TRPML1 that have been reported as being a key source of Ca<sup>2+</sup> for calcineurin activation, may provide molecular insight.

      Please see the paragraph in response to point #1.

      (19) Related to Figure 4. Using a plasma membrane counterstain to quantify plasma membrane LAMP1 would increase the rigor of the analysis.

      Great idea! We examined the colocalization of DiO (a PM marker) staining and LAMP1 staining in HeLa NPC1 cells under SFN treatment. As shown in the new Fig. 4A, the surface LAMP1 signal (red) colocalized with DiO (green).

      (20) Related to Figure 5. How do the authors explain the kinetic disparity between SFN treatment for 24 vs 72 hrs? If TFEB is activated and promoting lysosomal biogenesis and increased lysosomal flux across the PM, why does cholesterol accumulation lag? Perhaps related to this point: are other cholesterol metabolizing enzymes that may have altered activity in NPC sensitive to SFN? A similar comment applies to the Sterol regulatory element binding protein pathway, which has been shown to be activated in models of NPC disease.

      We understand the reviewer’s point. As shown in Fig. 5C, D, in NPC1<sup>-/-</sup> MEF cells, SFN treatment for 24 h showed relatively weaker cholesterol clearance compared with the effects in human cells (Fig. 1C, D; Fig. 2E, I). We therefore explored a longer SFN treatment of 72 h (fresh SFN in medium was added every 24 h), and the 72 h treatment produced a substantial cholesterol reduction (Fig. 5C, D). This difference could be attributed to the continuous action of SFN, which could prolong exocytosis and lead to more effective cholesterol clearance. In the DMSO-treated MEF cells, the cholesterol levels are similar at 24 and 72 h; thus, 24 h of U18666A treatment has already reached the upper limit of cholesterol accumulation, and a longer treatment time would not change the cholesterol levels. Cholesterol accumulation therefore shows no lag.

      We did not investigate whether SFN regulates other cholesterol-metabolizing enzymes or sterol regulatory element-binding proteins, and we cannot rule out this possibility. In this study we mainly focused on cholesterol clearance by SFN via TFEB-mediated pathways. In our data, TFEB KO significantly diminished SFN-evoked cholesterol clearance. Hence, the contribution of other cholesterol-metabolizing enzymes or sterol regulatory element-binding proteins may not be as important as that of TFEB and is outside the scope of this study. In the future, we may explore the involvement of other possible pathways in SFN’s effects.

      (21) Related to Figure 7. The western blots for pS211-TFEB are poor. It's suggested that whole blots are shown to increase rigor.

      Thank you for the comments. We now present the blots with more surrounding area to increase rigor.

      (22) Data demonstrating the ability of SFN to improve Purkinje cell survival are exciting and pair well with the weight analysis, however, to address the overall goal of determining if "SFN could be a good therapeutic candidate for neuropathology in NPC disease" survival analysis should be tested as well.

      Please see the paragraph in response to point #3.

      Minor

      (23) Throughout the manuscript many different Fonts and font sizes are used. This is very jarring to readers. It is suggested that a more uniform approach is taken to presenting these nice datasets.

      We apologize for these oversights. We have thoroughly checked the entire manuscript to make sure that fonts and font sizes are consistent.

      (24) Related to data presentation. In general, there is a lack of alignment and organization of the figures.

      We apologize for this. We have reorganized the figures so that they are better aligned.

      (25) Line 149, SFN is missing.

      Corrected!

      Reviewer #3 (Recommendations for the authors):

      (26) In Figure 3 the authors should use multiple single siRNAs or perform a functional rescue to determine specificity.

      We understand the reviewer’s point. We designed several siRNAs and validated their efficiency. We then chose the siRNA with the best knockdown efficiency for this study, and the specificity of the siTFEB was validated by Western blot, as shown in Fig. 3A. Furthermore, we used TFEB knockout cells generated by CRISPR/Cas9 to further examine the role of TFEB in SFN-induced cholesterol clearance (Fig. 3D). Consistent with the results in the siTFEB-transfected HeLa NPC1 cells (Fig. 3B, C), SFN failed to diminish cholesterol in HeLa TFEB KO cells. The result from TFEB KO cells is even more convincing than the siRNA experiment. We also performed a functional rescue by re-expressing TFEB in TFEB KO cells, in which SFN-induced cholesterol clearance was restored (Fig. 3E, F). Collectively, these data indicate that TFEB is required for lysosomal cholesterol reduction upon SFN treatment. We therefore did not repeat this rescue experiment in the siTFEB-transfected HeLa NPC1 cells.

      (27) The label for 3D is missing.

      Corrected! Thanks!

      (28) Figure 4, although the authors use an antibody against the luminal domain of LAMP1 there could still be some permeabilization. A marker of the plasma membrane would be helpful.

      Please see the response to point #19.

      (29) Figure 4, cholesterol in the media because of lysosome exocytosis. This is where the high concentration of SFN is of concern. Is there any cell death that could explain the result? The authors should test for cell death with the SFN treatment.

      Thank you for raising this important point! We have measured the cytotoxicity of SFN at the concentrations used in this study in various cell lines (new Fig. S10). Please also see the paragraph in response to point #13.

      (30) The blot in Figure 6A is unclear. It is very hard to see any change in pS211-TFEB levels, and, given the blurry signal, the detection of phospho-TFEB is uncertain.

      Please see the summary paragraph in response to point #21.

      References:

      Hu, M. Q., P. Li, C. Wang, X. H. Feng, Q. Geng, W. Chen, M. Marthi, W. L. Zhang, C. L. Gao, W. Reid, J. Swanson, W. L. Du, R. Hume and H. X. Xu (2022). "Parkinson's disease-risk protein TMEM175 is a proton-activated proton channel in lysosomes." Cell 185(13): 2292-+.

      Hu, R., T. O. Khor, G. Shen, W. S. Jeong, V. Hebbar, C. Chen, C. Xu, B. Reddy, K. Chada and A. N. Kong (2006). "Cancer chemoprevention of intestinal polyposis in ApcMin/+ mice by sulforaphane, a natural product derived from cruciferous vegetable." Carcinogenesis 27(10): 2038-2046.

      Li, D., R. Shao, N. Wang, N. Zhou, K. Du, J. Shi, Y. Wang, Z. Zhao, X. Ye, X. Zhang and H. Xu (2021). "Sulforaphane Activates a lysosome-dependent transcriptional program to mitigate oxidative stress." Autophagy 17(4): 872-887.

      Medina, D. L., A. Fraldi, V. Bouche, F. Annunziata, G. Mansueto, C. Spampanato, C. Puri, A. Pignata, J. A. Martina, M. Sardiello, M. Palmieri, R. Polishchuk, R. Puertollano and A. Ballabio (2011). "Transcriptional activation of lysosomal exocytosis promotes cellular clearance." Dev Cell 21(3): 421-430.

    1. eLife Assessment

      This revision of important work is a versatile addition to the chemical protein modifications and bioconjugation toolbox in synthetic biology. The technology developed cleverly uses Connectase to irreversibly fuse proteins of interest together so they can be studied in their native context, with compelling well-controlled data showing the technique works for various protein partners. This work will help multiple fields to explore multi-function constructs in basic synthetic biology. This work will also be of interest to those studying fusion oncoproteins commonly expressed in various human pathologies.

    2. Reviewer #1 (Public review):

      Fuchs describes a novel method of enzymatic protein-protein conjugation using the enzyme Connectase. The author is able to make this process irreversible by screening different Connectase recognition sites to find an alternative sequence that is also accepted by the enzyme. They are then able to selectively render the byproduct of the reaction inactive, preventing the reverse reaction, and add the desired conjugate with the alternative recognition sequence to achieve near-complete conversion. I agree with the authors that this novel enzymatic protein fusion method has several applications in the field of bioconjugation, ranging from biophysical assay conduction to therapeutic development. Previously the author has published on the discovery of the Connectase enzymes and has shown its utility in tagging proteins and detecting them by in-gel fluorescence. They now extend their work to include the application of Connectase in creating protein-protein fusions, antibody-protein conjugates, and cyclic/polymerized proteins. As mentioned by the author, enzymatic protein conjugation methods can provide several benefits over other non-specific and click chemistry labeling methods. Connectase specifically can provide some benefits over the more widely used Sortase, depending on the nature of the species that is desired to be conjugated. Overall, this method provides a novel, reproducible way to enzymatically create protein-protein conjugates.

      The manuscript is well-written and will be of interest to those who are specifically working on chemical protein modifications and bioconjugation.

      Comments on revisions:

      The authors have improved the manuscript significantly by clarifying the questions raised adding new text, providing additional references and/or adding additional data. The thorough study and efficiency of the method for enzymatic protein-protein conjugation using the enzyme Connectase warrants publication of this manuscript in its current form.

    3. Reviewer #2 (Public review):

      Summary:

      Unlike previous traditional protein fusion protocols, the author claims their proposed new method is fast, simple, specific, reversible, and results in a complete 1:1 fusion. A multi-disciplinary approach from cloning and purification, biochemical analyses, and proteomic mass spec confirmation revealed fusion products were achieved.

      Strengths:

      The author provides convincing evidence that an alternative to traditional protein fusion synthesis is more efficient with 100% yields using connectase. The author optimized the protocol's efficiency with assays replacing a single amino acid and identification of a proline aminopeptidase, Bacillus coagulans (BcPAP), as a usable enzyme to use in the fusion reaction. Multiple examples including Ubiquitin, GST, and antibody fusion/conjugations reveal how this method can be applied to a diverse range of biological processes.

      Weaknesses:

      Though the ~100% ligation efficiency is an advancement, the long recognition linker may be the biggest drawback. For large native proteins that are challenging/cannot be synthesized and require multiple connectase ligation reactions to yield a complete continuous product, the multiple interruptions with long linkers will likely interfere with protein folding, resulting in non-native protein structures. This method will be a good alternative to traditional approaches as the author mentioned but limited to generating epitope/peptide/protein tagged proteins, and not for synthetic protein biology aimed at examining native/endogenous protein function in vitro.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Fuchs describes a novel method of enzymatic protein-protein conjugation using the enzyme Connectase. The author is able to make this process irreversible by screening different Connectase recognition sites to find an alternative sequence that is also accepted by the enzyme. They are then able to selectively render the byproduct of the reaction inactive, preventing the reverse reaction, and add the desired conjugate with the alternative recognition sequence to achieve near-complete conversion. I agree with the authors that this novel enzymatic protein fusion method has several applications in the field of bioconjugation, ranging from biophysical assay conduction to therapeutic development. Previously the author has published on the discovery of the Connectase enzymes and has shown its utility in tagging proteins and detecting them by in-gel fluorescence. They now extend their work to include the application of Connectase in creating protein-protein fusions, antibody-protein conjugates, and cyclic/polymerized proteins. As mentioned by the author, enzymatic protein conjugation methods can provide several benefits over other non-specific and click chemistry labeling methods. Connectase specifically can provide some benefits over the more widely used Sortase, depending on the nature of the species that is desired to be conjugated. However, due to a similar lengthy sequence between conjugation partners, the method described in this paper does not provide clear benefits over the existing SpyTag-SpyCatcher conjugation system.  Additionally, specific disadvantages of the method described are not thoroughly investigated, such as difficulty in purifying and separating the desired product from the multiple proteins used. Overall, this method provides a novel, reproducible way to enzymatically create protein-protein conjugates.

      The manuscript is well-written and will be of interest to those who are specifically working on chemical protein modifications and bioconjugation.

      I'd like to comment on two points.

      (1) The benefits over the SpyTag-SpyCatcher system. Here, the conjugation partners are fused via the 12.3 kDa SpyCatcher protein, which is considerably larger than the Connectase fusion sequence (19 aa). This is mentioned in the introduction (p. 1 ln 24-26). Furthermore, SpyTag-SpyCatcher fusions are truly irreversible, while Connectase/BcPAP fusions may be reversed (p. 8, ln 265-273). For example, target proteins (e.g., AGAFDADPLVVEI-Protein) may be covalently fused to functionalized magnetic beads (e.g., Bead-ELASKDPGAFDADPLVVEI) in order to perform a pulldown assay. After the assay, the target protein and any bound interactors could be released from the beads by the addition of a Connectase / peptide (AGAFDAPLVVEI) mixture.

      In a related technology, the SpyTag-SpyCatcher system was split into three components, SpyLigase, SpyTag and KTag (Fierer et al., PNAS 2014). The resulting method introduces a sequence between the fusion partners (SpyTag (13aa) + KTag (10aa)), which is similar in length to the Connectase fusion sequence (p. 8, ln 297 - 298). Compared to the original method, however, this approach seems to require longer incubation times, while yielding less fusion product (Fierer et al., Figure 2).

      (2) Purification of the fusion product. The method is actually advantageous in this respect, as described in the discussion (p. 8, ln 258-264). Examples are now provided in Figure 6.

      Reviewer #2 (Public review):

      Summary:

      Unlike previous traditional protein fusion protocols, the author claims their proposed new method is fast, simple, specific, reversible, and results in a complete 1:1 fusion. A multi-disciplinary approach from cloning and purification, biochemical analyses, and proteomic mass spec confirmation revealed fusion products were achieved.

      Strengths:

      The author provides convincing evidence that an alternative to traditional protein fusion synthesis is more efficient with 100% yields using connectase. The author optimized the protocol's efficiency with assays replacing a single amino acid and identification of a proline aminopeptidase, Bacillus coagulans (BcPAP), as a usable enzyme to use in the fusion reaction. Multiple examples including Ubiquitin, GST, and antibody fusion/conjugations reveal how this method can be applied to a diverse range of biological processes.

      Weaknesses:

      Though the ~100% ligation efficiency is an advancement, the long recognition linker may be the biggest drawback. For large native proteins that are challenging/cannot be synthesized and require multiple connectase ligation reactions to yield a complete continuous product, the multiple interruptions with long linkers will likely interfere with protein folding, resulting in non-native protein structures. This method will be a good alternative to traditional approaches as the author mentioned but limited to generating epitope/peptide/protein tagged proteins, and not for synthetic protein biology aimed at examining native/endogenous protein function in vitro.

      The assessment is fair, and I have no further comments to add.

      Reviewer #1 (Recommendations for the authors):

      Major/Experimental Suggestions:

      (1) Throughout the paper only one reaction shown via gels had 100% conversion to desired product (Figure 3C). It is misleading to title a paper with absolutes such as "100% product yield", when the majority of reactions show >95% product yield, without any purification. Please change the title of the manuscript to something along the lines of "Novel Irreversible Enzymatic Protein Fusions with Near-Complete Product Yield".

      The conjugation reaction is thermodynamically favored. It is driven by the hydrolysis of a peptide bond (P|GAFDADPLVVEI), which typically releases 8 - 16 kJ/mol of energy. This should result in a >99.99% complete reaction (ΔG° = -RT ln(Product/Educt)). In line with this, 99% - 100% of the less abundant educts (LysS, Figure 3A; MBP, Figure 3B; Ub-Strep, Figure 3C) are converted in the time courses (Figure 3D-F show different reaction conditions, which slow down conjugate formation). 100% conversion is also shown in Figure 5, Figure 6, and Figure S4. Likewise, an LC-MS analysis (Figure S2) showed 99.6% relative fusion product signal intensity after 4 h reaction time (0.13% and 0.25% educts). In this experiment, the proline had been removed from 99.8% of the peptide byproducts (P|GAFDADPLVVEI). It is clear that this reaction is still ongoing and that >99.99% of the prolines will be removed from the peptides in time. These findings suggest that the conjugation reaction gradually slows down as less educt becomes available, but eventually reaches completion.
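
      The thermodynamic argument above can be made concrete with a small calculation. The following is only a sketch under stated assumptions, not an analysis from the manuscript: it assumes 25 °C, treats the byproduct hydrolysis as a simple one-substrate, two-product equilibrium (peptide ⇌ proline + shortened peptide) with a 1 M standard state, and uses a hypothetical starting concentration of 10 µM. The function name and all parameter values are illustrative.

```python
import math

R = 8.314462618  # gas constant in J/(mol*K)


def fraction_hydrolysed(delta_g0, c0, temperature=298.15):
    """Equilibrium fraction of substrate hydrolysed for S <=> A + B.

    delta_g0    -- standard free energy change in J/mol (negative if favored)
    c0          -- initial substrate concentration in mol/L (assumed value)
    temperature -- absolute temperature in K (assumed 25 degrees C)
    """
    # Equilibrium constant K = [A][B]/[S], in mol/L
    k_eq = math.exp(-delta_g0 / (R * temperature))
    # Solve x^2 / (c0 - x) = K for x, written in a numerically stable form
    x = 2.0 * k_eq * c0 / (k_eq + math.sqrt(k_eq * k_eq + 4.0 * k_eq * c0))
    return x / c0


# Both ends of the 8 - 16 kJ/mol range, at a hypothetical 10 µM substrate
for dg in (-8000.0, -16000.0):
    print(dg, fraction_hydrolysed(dg, c0=10e-6))
```

      Under these assumptions, both ends of the 8 - 16 kJ/mol range give an equilibrium conversion above 99.99%; the exact value depends on the assumed concentration and temperature.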

      For some experiments, lower product yields (e.g. 97% in Figure 3B) are reported in the paper. These were calculated with Yield = 100% x Product / (Educt 1 + Educt 2 + Product). With this formula, 100% conjugation can only be achieved with exactly equimolar educt quantities, because both educt 1 and educt 2 need to be converted entirely. If one educt is available in excess, for example because of protein concentration measurement inaccuracies or pipetting errors, some of it will be left without a fusion partner. In the case of Figure 3B, about 3% more GST seems to have been in the mixture. These are methodological inaccuracies.
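
      The effect of a small educt excess on this yield formula can be illustrated with a short calculation. This is a minimal sketch with hypothetical relative quantities (1.00 and 1.03), not measured data; the function names are illustrative, and complete conversion of the limiting educt in a 1:1 fusion is assumed.

```python
def conjugation_yield(educt1, educt2, product):
    """Yield in percent: 100 * Product / (Educt 1 + Educt 2 + Product)."""
    return 100.0 * product / (educt1 + educt2 + product)


def yield_after_full_conversion(start1, start2):
    """Yield when the limiting educt is consumed completely (1:1 fusion)."""
    product = min(start1, start2)
    # Whatever exceeds the limiting educt is left without a fusion partner
    return conjugation_yield(start1 - product, start2 - product, product)


print(round(yield_after_full_conversion(1.00, 1.00), 1))  # 100.0
print(round(yield_after_full_conversion(1.00, 1.03), 1))  # 97.1
```

      With a 3% excess of one educt, the formula returns about 97% even though the limiting educt is fully converted, which matches the reasoning given for Figure 3B.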

      (2) Please provide at least one example of a purified desired product, and mention the difficulties involved as a disadvantage to this particular method. Separating BcPAP, Connectase, and the desired protein-protein conjugate may prove to be quite difficult, especially when Connectase cleaves off affinity tags.

      Examples are now provided in Figure 6. As described in the discussion (p. 8, ln 258-264), the simple product purification is one of the advantages of the method.

      (3) For the antibody conjugate, please provide an example of conjugating an edduct that would prove to be more useful in the context of antibodies. For example, as you mention in the introduction, conjugation of fluorophores, immobilization tags such as biotin, and small molecule linker/drugs are useful bioconjugates to antibodies.

      Antibody-biotinylation is now shown in Figure S6; Antibody-fluorophore conjugates are part of Figures S5 and S7.

      (4) Please assess the stability of these protein-protein conjugates under various conditions (temperature, pH, time) to ensure that the ligation via Connectase is stable over a broad array of conditions. In particular, a relevant antibody-conjugate stability assay should be done over the period of 1-week in both buffer and plasma to show applicability for potential therapeutics.

      The stability of an antibody-biotin conjugate in blood plasma over 7 days at different temperatures is now shown in Figure S7.

      Generally, Connectase introduces a regular peptide bond (Asp-Ala) with a high chemical and physical stability (e.g. 10 min incubation at 95°C in SDS-PAGE loading buffer; H2O-formic acid / acetonitrile gradients for LC-MS). The sequence may be susceptible to proteases, although this is not the case in HEK293 cells (antibody expression), E. coli, or blood plasma (Figure S7).

      (5) Please conduct functional assays with the antibody-protein/peptide conjugates to show that the antibody retains binding capabilities to the HER-2 antigen and the modification was site-selective, not interfering with the binding paratope or binding ability of the antibody in any way. This can be done through bio-layer interferometry, surface plasmon resonance, ELISA, etc.

      We plan the immobilization of the HER2 antibody on microplates and its use in an ELISA. However, this experiment requires significant testing and optimizations. It will be part of a future paper on the use of Connectase for protein immobilization.

      For now, the mass spectrometry data provide clear evidence of a single site-selective conjugation, as the C-terminal ELASKDPGAFDADPLVVEI-Strep sequence is replaced by ELASKDAGAFDADPLVVEI(-Ub). Given that the conjugation sites at the C-termini are far from the antigen binding sites, and have already been used in a number of other approaches (e.g., SpyTag, SnapTag, Sortase), it appears unlikely that these conjugations interfere with antigen binding.

      (6) Please include gels of all proteins used in ligation reactions after purification steps in the SI to show that each species was pure.

      The pure proteins are now shown in Figure S9.

      (7) Please provide the figures (not just tables) of LC/MS deconvoluted mass spectra graphs for all conjugates, either in the main text or the SI.

      Please specify which spectra you are missing. I believe all relevant spectra are shown in Figures 4, 5, and S3. The primary data can be found in Dataset S2.

      (8) Please provide more information in the methods section on exactly how the densitometry quantification of gel bands was performed with ImageJ.

      Details on the quantification with Image Studio Lite 5.2 were added in the method section (p. 17, ln 461-463).

      Minor Suggestions:

      (1) Page 1, line 19: can include one sentence on what assays these particular bioconjugations are useful for (e.g. internalization cell studies, binding assays, etc.)

      I prefer not to provide additional details here to keep the text concise and focused.

      (2) Page 1, line 22: "three to ten equivalents" instead of 3x-10x.

      Done.

      (3) Page 1, line 23: While NHS labeling is widely considered non-specific, maleimide conjugation to free cysteines is generally considered specific for engineered free cysteine residues, since native proteins often do not have free cysteine residues available for conjugation. If you are referring to the potential of maleimides to label lysines as well, that should be specifically stated.

      I modified the sentence, which now states that these methods "can be" unspecific.

      As pointed out, it is possible to achieve specificity by eliminating all other free cysteines and/or engineering a cysteine in an appropriate position. In many other cases, however (e.g., natural antibodies), several cysteines are available, or the sample contains other proteins/peptides. I did not want to go into more detail here and refer to the cited review.

      (4) Page 1, line 31: "and an oligoglycine G(1-5)-B"

      Done.

      (5) Page 1, line 34: It is not clear where in the source these specific Km values are coming from, considering these are variable based on specific conditions/substrates and tend to be reaction-specific.

      I cited another review, which lists the same values, along with a few other measurements (Jacobitz et al., Adv Protein Chem Struct Biol 2017, Table 2). It is clear that each of these measurements differs somewhat, but they are generally comparable (K<sub>M</sub>(LPETG) = 5500 - 8760 µM; K<sub>M</sub>(GGGGG) = 140 - 196 µM). I chose the cited study (Frankel et al., Biochemistry 2005), because it also investigated hydrolysis rates. In this study, the measurements are derived from the plots in Figure 2.

      (6) Page 1, line 47: the comparison to western blots feels a little like apples to oranges, even though this comparison was made in previous literature. Engineering an expressed protein to have this tag and then using the tag to detect and quantify it, feels more akin to a tagging/pull down assay than a western blot in which unmodified proteins are easily detected.

      It is akin to a frequently used type of western blot with tag-specific antibodies, e.g. Anti-His<sub>6</sub>, -Streptavidin, -HA, -cMyc, -Flag. I modified the sentence to clarify this.

      (7) Page 2, line 51: "Connectase cleaves between the first D and P amino acids in the recognition sequence, resulting in an N-terminal A-ELASKD-Connectase intermediate and a C-terminal PGAFDADPLVVEI peptide."

      I prefer the current sentence, because we assume that a bond between the aspartate and Connectase is formed before PGAFDADPLVVEI is cleaved off.

      (8) Page 3, line 94: "Exact determination is not possible due to reversibility of the reaction", the way it is stated now sounds like it is a flaw in the methods. Also, update Figure 2 to read "Estimated relative ligation rate".

      Done.

      (9) Page 3, lines 101-107: This is worded in a confusing way. It can either be X<sub>1</sub> or X<sub>2</sub> that is inactivated, depending on whether the altered amino acid is on the original protein sequence or on the desired educt to conjugate. You first give examples of how to render other amino acids inactive, but then ultimately state that proline is made inactive, so separate the two distinct possibilities a bit more clearly.

      The reaction requires the inactivation of X<sub>1</sub>, without affecting X<sub>2</sub> (ln 100 - 102). This is true, no matter whether it is X<sub>1</sub> = A, C, S, or P that is inactivated. I added a sentence to clarify this (ln 102 – 103).

      (10) Page 4, line 118: Give a one-sentence justification for why these proteins were chosen to work with (easy to express, stable, etc).

      Done.

      (11) Page 5, line 167: "payload molecules".

      Done.

      (12) Page 5, lines 170-173: Word this more clearly- "full conversion with many of these methods is difficult on antibodies due to each heavy and light chain being modified separately, resulting in only a total yield of 66% DAR4 even when 90% of each chain is conjugated."

      I rephrased the section.
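
      The 66% figure quoted in the comment above follows from simple arithmetic. The sketch below is an illustration only, not part of the manuscript: it assumes an antibody with two heavy and two light chains, one conjugation site per chain, and independent conjugation of each chain with the same efficiency. The function name is hypothetical.

```python
def full_conjugation_fraction(per_chain_yield: float, n_chains: int = 4) -> float:
    """Fraction of antibodies with every chain conjugated.

    Assumption (not from the manuscript): chains are modified independently,
    so the per-chain yields multiply.
    """
    if not 0.0 <= per_chain_yield <= 1.0:
        raise ValueError("per_chain_yield must lie between 0 and 1")
    return per_chain_yield ** n_chains


# 90% conjugation of each of the four chains gives 0.9**4 = 0.6561,
# i.e. roughly 66% fully conjugated (DAR4) product.
fraction = full_conjugation_fraction(0.9)
print(f"Fully conjugated fraction: {fraction:.1%}")
```

      Under these assumptions, raising the per-chain yield has a disproportionate effect on the final product, because the loss at each chain compounds.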

      (13) Page 8, line 290: Discuss other disadvantages of this method including difficulties purifying and in incorporating such a long sequence into proteins of interest.

      Product purification is shown in the new Figure 6. As stated above, I consider the simple purification process an advantage of the method. The genetic incorporation of the sequence into proteins is a routine process and should not pose any difficulties. The disadvantages of long linker sequences between fusion partners are now discussed (p. 8 – 9, ln 300-302).

      (14) Page 10, line 341: "The experiment is described and discussed in detail in a previously published paper.31"

      Done.

      Reviewer #2 (Recommendations for the authors):

      Minor Points:

      (1) It's unclear how the author derived 100% ligation rate with X = Proline in Figure 2 when there is still residual unligated UB-Strep at 96h. Please provide an expanded explanation for those not familiar with the protocol. Is the assumption made that there will be no UB-Strep if the assay was carried out beyond 96h?

      I clarified the figure legend. The assay shows the formation of an equilibrium between educts and products. Therefore, only ~50% Ub-Strep is used with X = Proline (see p. 2, ln 79 - 81). The "relative ligation rate" refers to the relative speed with which this equilibrium is established. The highest rate is seen with X = Proline, and it is set to 100%. The other rates are given relative to the product formation with X = Proline.

      (2) Though the qualitative depiction of the data in Figure 3 is appreciated, an accompanying graphical representation of the data in the same figure will greatly enhance reception and better comprehension of several of the author's conclusions.

      Graphs are now shown in Figure S1.

      (3) Figure 3 panel E is misaligned. Please align it with panel B above it.

      Done, thank you.

      (4) The author refers to 'The resulting circular assemblies (37% UB2...)' in the text but identifies it as UB-C2 in Figure 5B. Is this a mistake or does UB2 refer to another assembly not mentioned in the Figures? Please check for inconsistencies.

      All circular assemblies are now labeled Ub-C<sub>1-6</sub>.

      (5) Finishing with a graphical schematic that depicts the entire protocol in a simple image would be much appreciated and well-received by readers. Including the scheme with A and B proteins, the recognition linkers, the addition of connectase and BcPAP, etc. to the final resulting protein with connected linker.

      A graphical summary of the reaction is now included in Figure 6.

    1. eLife Assessment

      This manuscript addresses a mechanism by which dopamine (DA) regulates synaptic plasticity. The authors build upon their previous finding that DA applied after a timing pattern that ordinarily induces long-term depression (LTD) instead induces long-term potentiation (LTP). The new findings that this "DA-dependent LTP" involves de novo protein synthesis, a cyclic AMP signalling pathway, and calcium-permeable AMPA receptors (CP-AMPARs) are of valuable significance. The conclusions are convincing and largely supported by the evidence provided.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Fuchsberger et al. demonstrate a set of experiments which ultimately identifies the de novo synthesis of GluA1-, but not GluA2-containing Ca2+ permeable AMPA receptors as a key driver of dopamine-dependent LTP (DA-LTP) during conventional post-before-pre spike-timing dependent (t-LTD) induction. The authors further identify adenylate cyclase 1/8, cAMP, and PKA as the crucial mediators of these actions. While some comments have been identified below, the experiments presented are thorough and address the aims of the manuscript, figures are presented clearly (with minor comments), and experimental sample sizes and statistical analyses are suitable. Suitable controls have been utilized to confirm the role of Ca2+ permeable AMPAR. This work provides a valuable step forward built on convincing data towards understanding the underlying mechanisms of spike-timing dependent plasticity and dopamine.

      Strengths:

      Appropriate controls were used.

      The flow of data presented is logical and easy to follow.

      The quality of the data is solid.

      Weaknesses:

      Our concerns raised within the first round of review have been appropriately addressed by the authors.

    3. Reviewer #2 (Public review):

      Summary:

      The aim was to identify the mechanisms that underlie a form of long-term potentiation (LTP) that requires activation of dopamine (DA).

      Strengths:

      The authors have provided multiple lines of evidence that support their conclusions; namely that this pathway involves activation of a cAMP / PKA pathway that leads to the insertion of calcium permeable AMPA receptors.

      Weaknesses:

      Some of the experiments could have been conducted in a more convincing manner.

    4. Reviewer #3 (Public review):

      The manuscript of Fuchsberger et al. investigates the cellular mechanisms underlying dopamine-dependent long-term potentiation (DA-LTP) in mouse hippocampal CA1 neurons. The authors conducted a series of experiments to measure the effect of dopamine on the protein synthesis rate in hippocampal neurons and its role in enabling DA-LTP. The key results indicate that protein synthesis is increased in response to dopamine and neuronal activity in the pyramidal neurons of the CA1 hippocampal area, mediated via the activation of adenylate cyclases subtypes 1 and 8 (AC1/8) and the cAMP-dependent protein kinase (PKA) pathway. Additionally, the authors show that postsynaptic DA-induced increases in protein synthesis are required to express DA-LTP, while not required for conventional t-LTP.

      The increased expression of the newly synthesized GluA1 receptor subunit in response to DA supports the formation of homomeric calcium-permeable AMPA receptors (CP-AMPARs). This evidence aligns well with data showing that DA-LTP expression requires the GluA1 AMPA subunit and CP-AMPARs, as DA-LTP is absent in the hippocampus of a GluA1 genetic knock-out mouse model.

      Comments on revisions:

      The authors adequately addressed all my comments.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Fuchsberger et al. demonstrate a set of experiments that ultimately identifies the de novo synthesis of GluA1-, but not GluA2-containing Ca2+ permeable AMPA receptors as a key driver of dopamine-dependent LTP (DA-LTP) during conventional post-before-pre spike-timing dependent (t-LTD) induction. The authors further identify adenylate cyclase 1/8, cAMP, and PKA as the crucial mediators of these actions. While some comments have been identified below, the experiments presented are thorough and address the aims of the manuscript, figures are presented clearly (with minor comments), and experimental sample sizes and statistical analyses are suitable. Suitable controls have been utilized to confirm the role of Ca2+ permeable AMPAR. This work provides a valuable step forward built on convincing data toward understanding the underlying mechanisms of spike-timing-dependent plasticity and dopamine.

      Strengths:

      Appropriate controls were used.

      The flow of data presented is logical and easy to follow.

      The quality of the data, except for a few minor issues, is solid.

      Weaknesses:

      The drug treatment duration of anisomycin is longer than the standard 30-45 minute duration (as is the 500 µM vs 40 µM concentration) typically used in the field. Given the long-term toxicity of these kinds of drugs, it's unclear why the authors used such a long and intense drug treatment.

      In an initial set of control experiments (Figure S1C-D) we wanted to ensure that protein synthesis was definitely blocked and therefore used a relatively high concentration of anisomycin and a relatively long pre-incubation period. We agree with the Reviewer that we cannot exclude the possibility that this treatment could compromise cell health in addition to the protein synthesis block. Therefore, we carried out an additional experiment with an alternative protein synthesis inhibitor, cycloheximide, at a lower standard concentration (10 µM), which confirmed a significant reduction in the puromycin signal (Figure S1A-B). Together these results support the conclusion that the puromycin signal is specific to protein synthesis in our labelling assay.

      Furthermore, in the electrophysiology experiments, we used 500 μM anisomycin in the patch pipette solution. Under these conditions, we recorded a stable EPSP baseline for 60 minutes, indicating that the treatment did not cause toxic effects to the cell (Figure S1F). This high concentration would ensure an effective block of local translation at dendritic sites. Nevertheless, we also carried out this experiment with cycloheximide at a lower standard concentration (10 µM) and observed a similar result with both protein synthesis inhibitors (Figure 1F).

      With some of the normalizations (such as those in S1) there are dramatic differences in the baseline "untreated" puromycin intensities - raising some questions about the overall health of slices used in the experiments.

      We agree with the Reviewer that there is a large variability in the normalised puromycin signal which might be due to variability in the health of slices. However, we assume that the same variability would be present in the treated slices, which showed, despite the variability, a significant inhibition of protein synthesis. To avoid any bias by excluding slices with low puromycin signal in the control condition, we present the full dataset.

      The large set of electrophysiology experiments carried out in our study (all recorded cells were evaluated for healthy resting membrane potential, action potential firing, and synaptic responses) confirmed that, generally, the vast majority of our slices were indeed healthy. 

      Reviewer #2 (Public Review):

      Summary:

      The aim was to identify the mechanisms that underlie a form of long-term potentiation (LTP) that requires the activation of dopamine (DA).

      Strengths:

      The authors have provided multiple lines of evidence that support their conclusions; namely that this pathway involves the activation of a cAMP / PKA pathway that leads to the insertion of calcium-permeable AMPA receptors.

      Weaknesses:

      Some of the experiments could have been conducted in a more convincing manner.

      We carried out additional control experiments and analyses to address the specific points that were raised.

      Reviewer #3 (Public Review):

      The manuscript of Fuchsberger et al. investigates the cellular mechanisms underlying dopamine-dependent long-term potentiation (DA-LTP) in mouse hippocampal CA1 neurons. The authors conducted a series of experiments to measure the effect of dopamine on the protein synthesis rate in hippocampal neurons and its role in enabling DA-LTP. The key results indicate that protein synthesis is increased in response to dopamine and neuronal activity in the pyramidal neurons of the CA1 hippocampal area, mediated via the activation of adenylate cyclases subtypes 1 and 8 (AC1/8) and the cAMP-dependent protein kinase (PKA) pathway. Additionally, the authors show that postsynaptic DA-induced increases in protein synthesis are required to express DA-LTP, while not required for conventional t-LTP.

      The increased expression of the newly synthesized GluA1 receptor subunit in response to DA supports the formation of homomeric calcium-permeable AMPA receptors (CP-AMPARs). This evidence aligns well with data showing that DA-LTP expression requires the GluA1 AMPA subunit and CP-AMPARs, as DA-LTP is absent in the hippocampus of a GluA1 genetic knock-out mouse model. Overall, the study is solid, and the evidence provided is compelling. The authors clearly and concisely explain the research objectives, methodologies, and findings. The study is scientifically robust, and the writing is engaging. The authors' conclusions and interpretation of the results are insightful and align well with the literature. The discussion effectively places the findings in a meaningful context, highlighting a possible mechanism for dopamine's role in the modulation of protein-synthesis-dependent hippocampal synaptic plasticity and its implications for the field. Although the study expands on previous works from the same laboratory, the findings are novel and provide valuable insights into the dynamics governing hippocampal synaptic plasticity.

      The claim that GluA1 homomeric CP-AMPA receptors mediate the expression of DA-LTP is fascinating, and although the electrophysiology data on GluA1 knock-out mice are convincing, more evidence is needed to support this hypothesis. Western blotting provides useful information on the expression level of GluA1, which is not necessarily associated with cell surface expression of GluA1 and therefore CP-AMPARs. Validating this hypothesis by localizing the protein using immunofluorescence and confocal microscopy detection could strengthen the claim. The authors should briefly discuss the limitations of the study.

      Although it would be possible to quantify the surface expression of GluA1 using immunofluorescence, it would not be possible to distinguish between GluA1 homomers and GluA1-containing heteromers. It would therefore not be informative as to whether these are indeed CP-AMPARs. This is an interesting problem, which we have briefly discussed in the Discussion section.

      Additional comments to address:

      (1) In Figure 2A, the representative image with PMY alone shows a very weak PMY signal. Consequently, the image with TTX alone seems to potentiate the PMY signal, suggesting a counterintuitive increase in protein synthesis.

      We agree with the Reviewer that the original image was not representative and have replaced it with a more representative image.

      (2) In Figures 3A-B, the Western blotting representative images have poor quality, especially regarding GluA1 and α-actin in Figure 3A. The quantification graph (Figure 3B) raises some concerns about a potential outlier in both the DA alone and DA+CHX groups. The authors should consider running a statistical test to detect outlier data. Full blot images, including ladder lines, should be added to the supplementary data.

      We have replaced the western blot image in Figure 3A and have also presented full blot images including ladder lines in supplementary Figure S3.

      Using the ROUT method (Q=1%) we identified one outlier in the DA+CHX group of the western blot quantification. The quantification for this blot was then removed from the dataset and the experiment was repeated to ensure a sufficient number of repeats.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) As the authors perform these experiments with puromycin, these are puromycylation experiments, not SuNSET. The SuNSET protocol (surface sensing of translation) specifically refers to the detection of newly synthesized proteins externally at the plasma membrane. I'd advise updating the terminology used.

      We thank the Reviewer for pointing this out. We have updated this to ‘puromycin-based labelling assay’.

      (2) The legend presented in Figure 2F suggests WT is green and ACKO is orange, however, in Figure 2G the WT LTP trace is orange, consider changing this to green for consistency.

      We thank the Reviewer for this suggestion and agree that a matching colour scheme makes the Figure clearer. This has been updated.

      (3) In the results section, it is recommended to include units for the values presented at the first instance and only again when the units change thereafter.

      The units of the electrophysiology data were [%]; this is included in the Results section. Results of western blots and IHC images were presented as [a.u.]. While we included this in the Figures, we have not specifically added this to the text of individual results.

      (4) Two hours of pre-treatment with anisomycin vs 30 minutes of pre-treatment with cycloheximide seems hard to directly compare, as the pharmacokinetics of translational inhibition should be similar for both drugs. What was the rationale for the extremely long anisomycin pre-treatment? What controls were taken to assess slice health either prior to or following fixation? This is relevant to the below point (5).

      In an initial set of control experiments (Figure S1C-D) we wanted to ensure that protein synthesis was definitely blocked and therefore used a relatively high concentration of anisomycin and a relatively long pre-incubation period. We agree with the Reviewer that we cannot exclude the possibility that this treatment could compromise cell health in addition to the protein synthesis block. Therefore, we carried out an additional experiment with an alternative protein synthesis inhibitor, cycloheximide, at a lower standard concentration (10 µM), which confirmed a significant reduction in the puromycin signal (Figure S1A-B). Together these results support the conclusion that the puromycin signal is specific to protein synthesis in our labelling assay.

      IHC slices were visually assessed for health. The large set of electrophysiology experiments carried out in our study (all recorded cells were evaluated for healthy resting membrane potential, action potential firing, and synaptic responses) also confirmed that, generally, the vast majority of our slices were indeed healthy. 

      (5) In Supplementary Figure 1, there is a dramatic difference in the a.u. intensities across CHX (B) and AM (D), please explain the reason for this. It is understood these are normalised values to nuclear staining, please clarify if this is a nuclear area.

      We agree with the Reviewer that there is a large variability in normalised puromycin signal which may be due to variability in the health of the slices. However, we assume that the same variability would be present in the treated slices, which showed, despite the variability, a significant effect of protein synthesis inhibition. To prevent introducing bias by excluding slices with low puromycin signal in the control condition, we present the full dataset.

      The CA1 region of the hippocampus contains a dense layer of neuronal somata (pyramidal cell layer). We normalized against the nuclear area as it provides a reliable estimate of the number of neurons present in the image. This approach minimizes bias by accounting for variation in the number of neurons within the visual field, ensuring consistency and accuracy in our analysis.

      (6) Please clarify the decision to average both the last 5 minutes of baseline recordings and the last 5 minutes of the recording for the normalisation of EPSP slopes.

      The baseline usually stabilises after a few minutes of recording; the last 5 minutes were therefore used for the baseline measurement, as these are the most relevant datapoints against which to compare changes in synaptic weight. After induction of STDP, potentiation or depression of synaptic weights develops gradually. Based on previous results, evaluating the EPSP slopes at 30-40 minutes after the induction protocol gives a reliable estimate of the amount of plasticity.
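
      The normalisation described in this response can be sketched as follows. This is an illustrative outline rather than the analysis code used in the study: it assumes one EPSP slope value per minute, and the function name, the window argument, and the example numbers are all hypothetical.

```python
from statistics import mean


def normalised_epsp_slope(baseline_slopes, post_slopes, window=5):
    """Return the late EPSP slope as a percentage of baseline.

    Assumptions (not from the manuscript): each list holds one slope value
    per minute, and the last `window` values of each list are averaged.
    """
    if len(baseline_slopes) < window or len(post_slopes) < window:
        raise ValueError("need at least `window` datapoints in each period")
    baseline = mean(baseline_slopes[-window:])  # last 5 min before induction
    late = mean(post_slopes[-window:])          # last 5 min of the recording
    return 100.0 * late / baseline


# Made-up example values (arbitrary slope units), for illustration only.
baseline = [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0]
after_induction = [2.0, 2.0, 3.0, 3.0, 3.0, 3.0, 3.0]
print(normalised_epsp_slope(baseline, after_induction))  # values above 100 indicate potentiation
```

      Restricting both averages to a short window at the end of each period keeps early, unstable baseline points and the gradual onset of plasticity out of the comparison.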

      Reviewer #2 (Recommendations For The Authors):

      The concentration of anisomycin used (0.5 mM) is very high.

      As described above, in an initial set of control experiments (Figure S1C-D) we wanted to ensure that protein synthesis was definitely blocked and therefore used a relatively high concentration of anisomycin and a relatively long pre-incubation period. We agree with the Reviewer that this is higher than the standard concentration used for this drug and we cannot exclude the possibility that this treatment could compromise cell health in addition to the protein synthesis block. Therefore, we carried out an additional experiment with an alternative protein synthesis inhibitor, cycloheximide, at a lower standard concentration (10 µM), which confirmed a significant reduction in the puromycin signal (Figure S1A-B). Together these results support the conclusion that the puromycin signal is specific to protein synthesis in our labelling assay.

      Furthermore, in the electrophysiology experiments, we also used 500 µM anisomycin in the patch pipette solution. Under these conditions, we recorded a stable EPSP baseline for 60 minutes, indicating that the treatment did not cause toxic effects to the cell (Figure S1F). This high concentration would ensure an effective block of local translation at dendritic sites. Nevertheless, we also carried out this experiment with cycloheximide at a lower standard concentration (10 µM) and observed a similar result with both protein synthesis inhibitors (Figure 1F).

      The authors conclude that the effect of DA is mediated via D1/5 receptors, which based on previous work seems likely. But they cannot conclude this from their current study which used a combination of a D1/D5 and a D2 antagonist.

      We thank the Reviewer for pointing this out. We agree and have updated this in the Discussion section to ‘dopamine receptors’, without specifying subtypes.

      There is no mention that I can see that the KO experiments were conducted in a blinded manner (which I believe should be standard practice). Did they verify the KOs using Westerns?

      Only a subset of the experiments was conducted in a blinded manner. However, the results were collected by two independent experimenters, who both observed significant effects in KO mice compared to WTs (TF and ZB).

      We received the DKO mice from a former collaborator, who verified expression levels of the KO mice (Wang et al., 2003). We verified DKO upon arrival in our facility using genotyping.

      Maybe I'm misunderstanding but it appears to me that in Figure 1F there is LTP prior to the addition of DA. (The first point after pairing is already elevated). I think the control of pairing without DA should be added.

      We thank the Reviewer for pointing this out. Based on previous results (Brzosko et al., 2015) we would expect potentiation to develop over time once DA is added after pairing; however, it indeed appears in the Figure here as if there was an immediate increase in synaptic weights after pairing. It should be noted, however, that when comparing the first 5 minutes after pairing to the baseline, this increase was not significant (t(9) = 1.810, p = 0.1037). Nevertheless, we rechecked our data and noticed that this initial potentiation was biased by one cell with an increasing baseline, which had both the test and control pathway strongly elevated. We had mistakenly included this cell in the dataset, despite the unstable conditions (as stated in the Methods section, the unpaired control pathway served as a stability control). We apologise for the error and this has now been corrected (Figure 1F). In addition, we present the control pathway in Figure S1G and I.

      We have also now included the control for post-before-pre pairing (Δt = -20 ms) without dopamine in a supplemental figure (Figure S1E and F).
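
      The p value reported above for the first 5 minutes after pairing can be cross-checked from the t statistic and its degrees of freedom alone. The sketch below is an independent illustration, not the authors' analysis: it assumes a two-tailed test and integrates the Student's t density numerically with Simpson's rule, using only the Python standard library. The function names are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t_stat: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p value, via Simpson's rule on the interval [0, |t|].

    `steps` must be even for Simpson's rule.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # probability mass between 0 and |t|
    return 1 - 2 * central_half


# t(9) = 1.810 as reported in the response above.
p = two_tailed_p(1.810, 9)
print(f"two-tailed p = {p:.4f}")  # slightly above 0.10, i.e. not significant at 0.05
```

      A statistics package would normally be used for this; the hand-rolled integration is chosen here only to keep the check self-contained.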

      The Westerns (Figure 3A) are fairly messy. Also, it is better to quantify with total protein. Surface biotinylation of GluA1 and GluA2 would be more informative.

      We carried out more repeats of Western blots and have exchanged blots in Figure 3A.

      We observed that DA increases protein synthesis, we therefore cannot exclude the possibility that application of DA could also affect total protein levels. Thus quantifying with total protein may not be the best choice here. Quantification with actin is standard practice.

      While we agree with the Reviewer that surface biotinylation of GluA1 and GluA2 could in principle be more informative, we do not think it would work well in our experimental setup using acute slice preparation, as it strictly requires intact cells. Slicing generates damaged cells, which would take up the surface biotin reagents. This would cause unspecific biotinylation of the damaged cells, leading to a strong background signal in the assay.

      In Figure 4 panels D and E the baselines are increasing substantially prior to induction. I appreciate that long stable baselines with timing-dependent plasticity may not be possible but it's hard to conclude what happened tens of minutes later when the baseline only appears stable for a minute or two. Panels A and B show that relatively stable baselines are achievable.

      We agree with the Reviewer that the baselines are increasing, however, when looking at the baseline for 5 minutes prior to induction (5 last datapoints of the baseline), which is what we used for quantification, the baselines appeared stable. Unfortunately, longer baselines are not suitable for timing-dependent plasticity. In addition, all experiments were carried out with a control pathway which showed stable conditions throughout the recording.

      In general, the discussion could be better integrated with the current literature. Their experiments are in line with a substantial body of literature that has identified two forms of LTP, based on these signalling cascades, using more conventional induction patterns.

      We thank the Reviewer for this suggestion and have added more discussion of the two forms of LTP in the Discussion section.

      It would be helpful to include the drug concentrations when first described in the results.

      Drug concentrations have now been included in the Results section.

      It is now more common to include absolute t values (not just <0.05 etc).

      While we indicate significance in Figures using asterisks when p values are below the indicated significance levels, we report absolute values of p and t values in the Results section.

      Similarly full blots should be added to an appendix / made available.

      We have now included full blot images in Supplementary Figure S3.

      A 30% tolerance for series resistance seems generous to me. (10-20% would be more typical).

      We thank the Reviewer for their suggestion, and will keep this in mind for future studies. However, the error introduced by the higher tolerance level is likely to be small and would not influence any of the qualitative conclusions of the manuscript.

      Whereas series resistance is of course extremely important in voltage-clamp experiments, changes in series resistance would be less of a concern in current-clamp recordings of synaptic events. We use the amplifier as a voltage follower, and there are two problems with changes in the electrode, or access, resistance. First, there is the voltage drop across the electrode resistance. Clearly this error is zero if no current is injected and is also negligible for the currents we use in our experiments to maintain the membrane voltage at -70 mV. For example, the voltage drop would be 0.2 mV for 20 pA current through a typical 10 MOhm electrode resistance, and a change in resistance of 30% would give less than 0.1 mV voltage change even if the resistance were not compensated. The second problem is distortion of the EPSP shape due to the low-pass filtering properties of the electrode set up by the pipette capacitance and series resistance (RC). This can be a significant problem for fast events, such as action potentials, but less of a problem for the relatively slow EPSPs recorded in pyramidal cells. Nevertheless, we take on board the advice provided by the Reviewer and will use the conventional tolerance of 20% in future experiments.
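
      The voltage-drop estimate in this response is an application of Ohm's law, and can be reproduced as a short worked example. The sketch below uses the values quoted above (20 pA, 10 MOhm, 30% change in resistance); the function name is hypothetical and full lack of compensation is assumed, as in the text.

```python
def voltage_drop_mv(current_pa: float, resistance_mohm: float) -> float:
    """Voltage drop across the electrode resistance, in millivolts.

    pA * MOhm = 1e-12 A * 1e6 Ohm = 1e-6 V = 1e-3 mV.
    """
    return current_pa * resistance_mohm * 1e-3


holding_current_pa = 20.0       # current quoted in the response
electrode_res_mohm = 10.0       # typical electrode resistance quoted in the response
fractional_change = 0.30        # 30% tolerance under discussion

drop = voltage_drop_mv(holding_current_pa, electrode_res_mohm)
error = voltage_drop_mv(holding_current_pa, electrode_res_mohm * fractional_change)

print(f"voltage drop: {drop:.2f} mV")              # 20 pA x 10 MOhm
print(f"change for a 30% shift: {error:.2f} mV")   # 20 pA x 3 MOhm, below 0.1 mV
```

      The error scales linearly with the injected current, which is why it vanishes when no current is injected and stays small for the holding currents described here.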

      Reviewer #3 (Recommendations For The Authors):

      In the references, the entry for Burnashev N et al. has a different font size. Please ensure that all references are formatted consistently.

      We thank the Reviewer for spotting this and have updated the font size of this reference.

    1. eLife Assessment

      Birdsong production depends on precise neural sequences in a vocal motor nucleus HVC. In this useful biophysical model, Daou and colleagues identify specific biophysical parameters that result in sparse neural sequences observed in vivo. While the model is presently incomplete because it is overfit to produce sequences and therefore not robust to real biological variation, the model has the potential to address some outstanding issues in HVC function.

    2. Reviewer #1 (Public review):

      Summary:

      The paper presents a model for sequence generation in the zebra finch HVC, which adheres to cellular properties measured experimentally. However, the model is fine-tuned and exhibits limited robustness to noise inherent in the inhibitory interneurons within the HVC, as well as to fluctuations in connectivity between neurons. Although the proposed microcircuits are introduced as units for sub-syllabic segments (SSS), the backbone of the network remains a feedforward chain of HVC_RA neurons, similar to previous models.

      Strengths:

      The model incorporates all three of the major types of HVC neurons. The ion channels used and their kinetics are based on experimental measurements. The connection patterns of the neurons are also constrained by the experiments.

      Weaknesses:

      The model is described as consisting of micro-circuits corresponding to SSS. This presentation gives the impression that the model's structure is distinct from previous models, which connected HVC_RA neurons in feedforward chain networks (Jin et al 2007; Li & Greenside 2006; Long et al 2010; Egger et al 2020). However, the authors implement single HVC_RA neurons into chain networks within each micro-circuit and then connect the end of the chain to the start of the chain in the subsequent micro-circuit. Thus, the HVC_RA neuron in their model forms a single-neuron chain. This structure is essentially a simplified version of earlier models.

      In the model of the paper, the chain network drives the HVC_I and HVC_X neurons. The role of the micro-circuits is more significant in organizing the connections: specifically, from HVC_RA neurons to HVC_I neurons, and from HVC_I neurons to both HVC_X and HVC_RA neurons.

      How useful is this concept of micro-circuits? HVC neurons fire continuously even during the silent gaps. There are no SSS during these silent gaps.

      A significant issue of the current model is that the HVC_RA to HVC_RA connections require fine-tuning, with the network functioning only within a narrow range of g_AMPA (Figure 2B). Similarly, the connections from HVC_I neurons to HVC_RA neurons also require fine-tuning. This sensitivity arises because the somatic properties of HVC_RA neurons are insufficient to produce the stereotypical bursts of spikes observed in recordings from singing birds, as demonstrated in previous studies (Jin et al 2007; Long et al 2010). In these previous works, to address this limitation, a dendritic spike mechanism was introduced to generate an intrinsic bursting capability, which is absent in the somatic compartment of HVC_RA neurons. This dendritic mechanism significantly enhances the robustness of the chain network, eliminating the need to fine-tune any synaptic conductances, including those from HVC_I neurons (Long et al 2010).

      Why is it important that the model should NOT be sensitive to the connection strengths?

      First, the firing of HVC_I neurons is highly noisy and unreliable. HVC_I neurons fire spontaneous, random spikes under baseline conditions. During singing, their spike timing is imprecise and can vary significantly from trial to trial, with spikes appearing or disappearing across different trials. As a result, their inputs to HVC_RA neurons are inherently noisy. If the model relies on precisely tuned inputs from HVC_I neurons, the natural fluctuations in HVC_I firing would render the model non-functional. The authors should incorporate noisy HVC_I neurons into their model to evaluate whether this noise would render the model non-functional.

      Second, Kosche et al. (2015) demonstrated that reducing inhibition by suppressing HVC_I neuron activity makes HVC_RA firing less sparse but does not compromise the temporal precision of the bursts. In this experiment, the local application of gabazine should have severely disrupted HVC_I activity. However, it did not affect the timing precision of HVC_RA neuron firing, emphasizing the robustness of the HVC timing circuit. This robustness is inconsistent with the predictions of the current model, which depends on finely tuned inputs and should, therefore, be vulnerable to such disruptions.

      Third, the reliance on fine-tuning of HVC_RA connections becomes problematic if the model is scaled up to include groups of HVC_RA neurons forming a chain network, rather than the single HVC_RA neurons used in the current work. With groups of HVC_RA neurons, the summation of presynaptic inputs to each HVC_RA neuron would need to be precisely maintained for the model to function. However, experimental evidence shows that the HVC circuit remains functional despite perturbations, such as a few degrees of cooling, micro-lesions, or turnover of HVC_RA neurons. Such robustness cannot be accounted for by a model that depends on finely tuned connections, as seen in the current implementation.

      The authors examined how altering the channel properties of neurons affects the activity in their model. While this approach is valid, many of the observed effects may stem from the delicate balancing required in their model for proper function.

      In the current model, HVC_X neurons burst as a result of rebound activity driven by the I_H current. Rebound bursts mediated by the I_H current typically require a highly hyperpolarized membrane potential. However, this mechanism would fail if the reversal potential of inhibition is higher than the required level of hyperpolarization. Furthermore, Mooney (2000) demonstrated that depolarizing the membrane potential of HVC_X neurons did not prevent bursts of these neurons during forward playback of the bird's own song, suggesting that these bursts (at least under anesthesia, which may be a different state altogether) are not necessarily caused by rebound activity. This discrepancy should be addressed or considered in the model.

      Some figures contain direct copies of figures from published papers. It is perhaps a better practice to replace them with schematics if possible.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use numerical simulations to try to understand better a major experimental discovery in songbird neuroscience from 2002 by Richard Hahnloser and collaborators. The 2002 paper found that a certain class of projection neurons in the premotor nucleus HVC of adult male zebra finch songbirds, the neurons that project to another premotor nucleus RA, fired sparsely (once per song motif) and precisely (to about 1 ms accuracy) during singing.

      The experimental discovery is important to understand since it initially suggested that the sparsely firing RA-projecting neurons acted as a simple clock that was localized to HVC and that controlled all details of the temporal hierarchy of singing: notes, syllables, gaps, and motifs. Later experiments suggested that the initial interpretation might be incomplete: that the temporal structure of adult male zebra finch songs instead emerged in a more complicated and distributed way, still not well understood, from the interaction of HVC with multiple other nuclei, including auditory and brainstem areas. So at least two major questions remain unanswered more than two decades after the 2002 experiment: What is the neurobiological mechanism that produces the sparse precise bursting: is it a local circuit in HVC or is it some combination of external input to HVC and local circuitry? And how is the sparse precise bursting in HVC related to a songbird's vocalizations?

      The authors only investigate part of the first question, whether the mechanism for sparse precise bursts is local to HVC. They do so indirectly, by using conductance-based Hodgkin-Huxley-like equations to simulate the spiking dynamics of a simplified network that includes three known major classes of HVC neurons and such that all neurons within a class are assumed to be identical. A strength of the calculations is that the authors include known biophysically deduced details of the different conductances of the three major classes of HVC neurons, and they take into account what is known, based on sparse paired recordings in slices, about how the three classes connect to one another. One weakness of the paper is that the authors make arbitrary and not well-motivated assumptions about the network geometry, and they do not use the flexibility of their simulations to study how their results depend on their network assumptions. A second weakness is that they ignore many known experimental details such as projections into HVC from other nuclei, dendritic computations (the somas and dendrites are treated by the authors as point-like isopotential objects), the role of neuromodulators, and known heterogeneity of the interneurons. These weaknesses make it difficult for readers to know the relevance of the simulations for experiments and for advancing theoretical understanding.

      Strengths:

      The authors use conductance-based Hodgkin-Huxley-like equations to simulate spiking activity in a network of neurons intended to model more accurately songbird nucleus HVC of adult male zebra finches. Spiking models are much closer to experiments than models based on firing rates or on 2-state neurons.

      The authors include information deduced from modeling experimental current-clamp data such as the types and properties of conductances. They also take into account how neurons in one class connect to neurons in other classes via excitatory or inhibitory synapses, based on sparse paired recordings in slices by other researchers.

      The authors obtain some new results of modest interest such as how changes in the maximum conductances of four key channels (e.g., A-type K+ currents or Ca-dependent K+ currents) influence the structure and propagation of bursts, while simultaneously being able to mimic accurately current-clamp voltage measurements.

      Weaknesses:

      One weakness of this paper is the lack of a clearly stated, interesting, and relevant scientific question to try to answer. In the introduction, the authors do not discuss adequately which questions recent experimental and theoretical work have failed to explain adequately, concerning HVC neural dynamics and its role in producing vocalizations. The authors do not discuss adequately why they chose the approach of their paper and how their results address some of these questions.

      For example, the authors need to explain in more detail how their calculations relate to the works of Daou et al, J. Neurophys. 2013 (which already fitted spiking models to neuronal data and identified certain conductances), to Jin et al J. Comput. Neurosci. 2007 (which already discussed how to get bursts using some experimental details), and to the rather similar paper by E. Armstrong and H. Abarbanel, J. Neurophys 2016, which already postulated and studied sequences of microcircuits in HVC. This last paper is not even cited by the authors.

      The authors' main achievement is to show that simulations of a certain simplified and idealized network of spiking neurons, which includes some experimental details but ignores many others, match some experimental results like current-clamp-derived voltage time series for the three classes of HVC neurons (although this was already reported in earlier work by Daou and collaborators in 2013), and simultaneously the robust propagation of bursts with properties similar to those observed in experiments. The authors also present results about how certain neuronal details and burst propagation change when certain key maximum conductances are varied.

      However, these are weak conclusions for two reasons. First, the authors did not do enough calculations to allow the reader to understand how many parameters were needed to obtain these fits and whether simpler circuits, say with fewer parameters and simpler network topology, could do just as well. Second, many previous researchers have demonstrated robust burst propagation in a variety of feed-forward models. So what is new and important about the authors' results compared to the previous computational papers?

      Also missing is a discussion, or at least an acknowledgment, of the fact that not all of the fine experimental details of undershoots, latencies, spike structure, spike accommodation, etc may be relevant for understanding vocalization. While it is nice to know that some models can match these experimental details and produce realistic bursts, that does not mean that all of these details are relevant for the function of producing precise vocalizations. Scientific insights in biology often require exploring which of the many observed details can be ignored and especially identifying the few that are essential for answering some questions. As one example, if HVC-X neurons are completely removed from the authors' model, does one still get robust and reasonable burst propagation of HVC-RA neurons? While part of the nucleus HVC acts as a premotor circuit that drives the nucleus RA, part of HVC is also related to learning. It is not clear that HVC-X neurons, which carry out some unknown calculation and transmit information to area X in a learning pathway, are relevant for burst production and propagation of HVC-RA neurons, and so relevant for vocalization. Simulations provide a convenient and direct way to explore questions of this kind.

      One key question to answer is whether the bursting of HVC-RA projection neurons is based on a mechanism local to HVC or is some combination of external driving (say from auditory nuclei) and local circuitry. The authors do not contribute to answering this question because they ignore external driving and assume that the mechanism is some kind of intrinsic feed-forward circuit, which they put in by hand in a rather arbitrary and poorly justified way, by assuming the existence of small microcircuits consisting of a few HVC-RA, HVC-X, and HVC-I neurons that somehow correspond to "sub-syllabic segments". To my knowledge, experiments do not suggest the existence of such microcircuits nor does theory suggest the need for such microcircuits.

      Another weakness of this paper is an unsatisfactory discussion of how the model was obtained, validated, and simulated. The authors should state as clearly as possible, in one location such as an appendix, what is the total number of independent parameters for the entire network and how parameter values were deduced from data or assigned by hand. With enough parameters and variables, many details can be fit arbitrarily accurately so researchers have to be careful to avoid overfitting. If parameter values were obtained by fitting to data, the authors should state clearly what the fitting algorithm was (some iterative nonlinear method, whose results can depend on the initial choice of parameters), what the error function used for fitting (sum of least squares?) was, and what data were used for the fitting.

      The authors should also state clearly the dynamical state of the network, the vector of quantities that evolve over time. (What is the dimension of that vector, which is also the number of ordinary differential equations that have to be integrated?) The authors do not mention what initial state was used to start the numerical integrations, whether transient dynamics were observed and what were their properties, or how the results depended on the choice of the initial state. The authors do not discuss how they determined that their model was programmed correctly (it is difficult to avoid typing errors when writing several pages or more of a code in any language) or how they determined the accuracy of the numerical integration method beyond fitting to experimental data, say by varying the time step size over some range or by comparing two different integration algorithms.
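
      The accuracy check suggested above (varying the time step and comparing two integration algorithms) could be sketched as follows. This is an illustration only, not the authors' code: the passive membrane equation, all parameter values, and the function names `euler`, `rk4`, and `errors` are assumptions chosen so that an exact solution is available for comparison.

```python
import math

# Hypothetical passive membrane: tau * dV/dt = -(V - E_L) + R*I
# All values below are illustrative assumptions, not taken from the manuscript.
TAU_MS = 10.0     # membrane time constant
E_L_MV = -70.0    # leak reversal potential
DRIVE_MV = 15.0   # R*I, steady depolarisation
V0_MV = -70.0     # initial state
T_END_MS = 50.0   # integration window


def dvdt(v):
    """Right-hand side of the membrane equation (mV per ms)."""
    return (-(v - E_L_MV) + DRIVE_MV) / TAU_MS


def exact(t):
    """Closed-form solution, used as the reference."""
    v_inf = E_L_MV + DRIVE_MV
    return v_inf + (V0_MV - v_inf) * math.exp(-t / TAU_MS)


def euler(dt):
    """Forward Euler (first-order) value of V at T_END_MS."""
    v = V0_MV
    for _ in range(round(T_END_MS / dt)):
        v += dt * dvdt(v)
    return v


def rk4(dt):
    """Classical fourth-order Runge-Kutta value of V at T_END_MS."""
    v = V0_MV
    for _ in range(round(T_END_MS / dt)):
        k1 = dvdt(v)
        k2 = dvdt(v + 0.5 * dt * k1)
        k3 = dvdt(v + 0.5 * dt * k2)
        k4 = dvdt(v + dt * k3)
        v += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return v


def errors(method, steps=(0.4, 0.2, 0.1, 0.05)):
    """Absolute error at T_END_MS for a sequence of halved time steps."""
    ref = exact(T_END_MS)
    return [abs(method(dt) - ref) for dt in steps]


if __name__ == "__main__":
    for name, method in (("euler", euler), ("rk4", rk4)):
        print(name, ["%.2e" % e for e in errors(method)])
```

      For a first-order method the error should roughly halve each time the step is halved, and a higher-order method should give a much smaller error at the same step. For a full network model no exact solution exists, so the same comparison would be made between successive step sizes or between the two algorithms.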

      Also disappointing is that the authors do not make any predictions to test, except rather weak ones such as that varying a maximum conductance sufficiently (which might be possible by using dynamic clamps) might cause burst propagation to stop or change its properties. Based on their results, the authors do not make suggestions for further experiments or calculations, but they should.

    4. Author response:

      eLife Assessment

      Birdsong production depends on precise neural sequences in a vocal motor nucleus HVC. In this useful biophysical model, Daou and colleagues identify specific biophysical parameters that result in sparse neural sequences observed in vivo. While the model is presently incomplete because it is overfit to produce sequences and therefore not robust to real biological variation, the model has the potential to address some outstanding issues in HVC function.

      We are grateful for the extensive supportive comments from the reviewers, including broad, strong appreciation of the novel aspects of our manuscript. We believe these will only be strengthened in the next submission.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The paper presents a model for sequence generation in the zebra finch HVC, which adheres to cellular properties measured experimentally. However, the model is fine-tuned and exhibits limited robustness to noise inherent in the inhibitory interneurons within the HVC, as well as to fluctuations in connectivity between neurons. Although the proposed microcircuits are introduced as units for sub-syllabic segments (SSS), the backbone of the network remains a feedforward chain of HVC_RA neurons, similar to previous models.

      Strengths:

      The model incorporates all three of the major types of HVC neurons. The ion channels used and their kinetics are based on experimental measurements. The connection patterns of the neurons are also constrained by the experiments.

      Weaknesses:

      The model is described as consisting of micro-circuits corresponding to SSS. This presentation gives the impression that the model's structure is distinct from previous models, which connected HVC_RA neurons in feedforward chain networks (Jin et al 2007, Li & Greenside, 2006; Long et al 2010; Egger et al 2020). However, the authors implement single HVC_RA neurons into chain networks within each micro-circuit and then connect the end of the chain to the start of the chain in the subsequent micro-circuit. Thus, the HVC_RA neuron in their model forms a single-neuron chain. This structure is essentially a simplified version of earlier models.

      In the model of the paper, the chain network drives the HVC_I and HVC_X neurons. The role of the micro-circuits is more significant in organizing the connections: specifically, from HVC_RA neurons to HVC_I neurons, and from HVC_I neurons to both HVC_X and HVC_RA neurons.

      We thank Reviewer 1 for their thoughtful comments.

      While the reviewer is correct that the propagation of sequential activity in this model is primarily carried by HVC<sub>RA</sub> neurons in a feed-forward manner, we need to emphasize that this is true only if there is no intrinsic or synaptic perturbation to the HVC network. For example, we showed in Figures 10 and 12 how altering the intrinsic properties of HVC<sub>X</sub> neurons or of interneurons disrupts sequence propagation. In other words, while HVC<sub>RA</sub> neurons are the key drivers that carry the chain forward, the interplay between excitation and inhibition in our network and the intrinsic parameters of all classes of HVC neurons are equally important in carrying the chain of activity forward. Thus, the stability of activity propagation necessary for song production depends on a finely balanced network of HVC neurons, with all classes contributing to the overall dynamics. Moreover, all existing models that describe premotor sequence generation in HVC either assume a distributed model (Elmaleh et al., 2021), in which local HVC circuitry is not sufficient to advance the sequence but rather depends upon moment-to-moment feedback through Uva (Hamaguchi et al., 2016), or rely on intrinsic connections within HVC to propagate sequential activity. In the latter case, some models assume that HVC is composed of multiple discrete subnetworks that encode individual song elements (Glaze & Troyer, 2013; Long & Fee, 2008; Wang et al., 2008) but lack the local connectivity to link the subnetworks, while other models assume that HVC may have sufficient information in its intrinsic connections to form a single continuous network sequence (Long et al. 2010). The HVC model we present extends the concept of a feedforward network by incorporating additional neuronal classes that influence the propagation of activity (interneurons and HVC<sub>X</sub> neurons). We have shown that any disturbance of the intrinsic or synaptic conductances of these latter neurons will disrupt activity in the circuit even when the properties of HVC<sub>RA</sub> neurons are maintained.

      In regard to the similarities between our model and earlier models, several aspects of our model distinguish it from prior work. In short, while several models of how sequences are generated within HVC have been proposed (Cannon et al., 2015; Drew & Abbott, 2003; Egger et al., 2020; Elmaleh et al., 2021; Galvis et al., 2018; Gibb et al., 2009a, 2009b; Hamaguchi et al., 2016; Jin, 2009; Long & Fee, 2008; Markowitz et al., 2015), all of them rely on intrinsic HVC circuitry to propagate sequential activity, on extrinsic feedback to advance the sequence, or on both. These models do not capture the complex details of spike morphology, do not include the right ionic currents, do not incorporate all classes of HVC neurons, or do not generate realistic firing patterns as seen in vivo. Our model is the first biophysically realistic model that incorporates all classes of HVC neurons and their intrinsic properties. We tuned the intrinsic and synaptic properties based on the traces collected by Daou et al. (2013) and Mooney and Prather (2005), as shown in Figure 3. The three classes of model neurons incorporated into our network, as well as the synaptic currents that connect them, are based on Hodgkin-Huxley formalisms that contain ion channels and synaptic currents that have been pharmacologically identified. This is an advance over prior models, which primarily focused on the role of synaptic interactions or external inputs. The model is based on a feedforward chain of microcircuits that encode the different sub-syllabic segments and that interact with each other through structured feedback inhibition, defining an ordered sequence of cell firing. Moreover, while several models highlight the critical role of inhibitory interneurons in shaping the timing and propagation of bursts of activity in HVC<sub>RA</sub> neurons, our work offers an intricate and comprehensive model that helps explain this critical role played by inhibition in shaping song dynamics and ensuring sequence propagation.

      How useful is this concept of micro-circuits? HVC neurons fire continuously even during the silent gaps. There are no SSS during these silent gaps.

      Regarding the concern about the usefulness of the 'microcircuit' concept in our study, we appreciate the comment and are glad to clarify its relevance in our network. While we acknowledge that HVC<sub>RA</sub> neurons interconnect microcircuits, our model's dynamics are still best described within the framework of microcircuitry, particularly because of the firing behavior of HVC<sub>X</sub> neurons and interneurons. Here, we are referring to microcircuits in a functional sense, rather than as rigid, isolated spatial divisions (Cannon et al. 2015). A microcircuit in our model reflects the local rules that govern the interaction between all HVC neuron classes within the broader network and that are essential for proper activity propagation. For example, HVC<sub>INT</sub> neurons belonging to any microcircuit burst densely and at times other than the moments when the corresponding encoded SSS is being “sung”. What makes a particular interneuron belong to one microcircuit rather than another is merely the fact that it cannot inhibit HVC<sub>RA</sub> neurons that are housed in the microcircuit it belongs to. In particular, if HVC<sub>INT</sub> inhibits HVC<sub>RA</sub> in the same microcircuit, some of the HVC<sub>RA</sub> bursts in the microcircuit might be silenced by the dense and strong HVC<sub>INT</sub> inhibition, breaking the chain of activity. Similarly, HVC<sub>X</sub> neurons were housed within microcircuits for the following reason: if an HVC<sub>X</sub> neuron belonging to microcircuit i sends excitatory input to an HVC<sub>INT</sub> neuron in microcircuit j, and that interneuron happens to target an HVC<sub>RA</sub> neuron from microcircuit i, then the propagation of sequential activity will halt, and we’ll be in a scenario similar to the one described above for HVC<sub>INT</sub> neurons inhibiting HVC<sub>RA</sub> neurons in the same microcircuit.

      We agree that there are no sub-syllabic segments described during the silent gaps, and we thank the reviewer for pointing this out. Although silent gaps are integral to the overall process of song production, we have not elaborated on them in this model because of the lack of a clear, biophysically grounded representation of the gaps themselves at the level of HVC. Our primary focus has been on modeling the active, syllable-producing phases of the song, where the HVC network’s sequential dynamics are critical for song. However, one can think of silent gaps as being encoded by mechanisms similar to those that encode SSSs, where each gap is encoded by similar microcircuits comprised of the three classes of HVC neurons (let’s call them GAP rather than SSS) that are active only during the silent gaps. In this case, the propagation of sequential activity is carried through the GAPs from the last SSS of the previous syllable to the first SSS of the subsequent syllable. We’ll make sure to emphasize this mechanism more in the revised version of the manuscript.

      A significant issue of the current model is that the HVC_RA to HVC_RA connections require fine-tuning, with the network functioning only within a narrow range of g_AMPA (Figure 2B). Similarly, the connections from HVC_I neurons to HVC_RA neurons also require fine-tuning. This sensitivity arises because the somatic properties of HVC_RA neurons are insufficient to produce the stereotypical bursts of spikes observed in recordings from singing birds, as demonstrated in previous studies (Jin et al 2007; Long et al 2010). In these previous works, to address this limitation, a dendritic spike mechanism was introduced to generate an intrinsic bursting capability, which is absent in the somatic compartment of HVC_RA neurons. This dendritic mechanism significantly enhances the robustness of the chain network, eliminating the need to fine-tune any synaptic conductances, including those from HVC_I neurons (Long et al 2010).

      Why is it important that the model should NOT be sensitive to the connection strengths?

      We thank the reviewer for the comment. While mathematical models of highly complex nonlinear biological processes can only approximate biological realism, the current network is the first sufficiently biologically realistic network model designed for HVC that explains sequence propagation. Although including dendritic processes would make the dynamics more realistic, we do not include them in our network for several reasons. 1) The ion channels we integrated into the somatic compartment are known pharmacologically (Daou et al. 2013), but we don’t know the intrinsic properties of the dendritic compartments of HVC neurons or the cocktail of ion channels expressed there. 2) We are able to generate realistic bursting in HVC<sub>RA</sub> neurons despite the single compartment, and the main emphasis in this network is on the interactions between excitation and inhibition, the effects of ion channels in modulating sequence propagation, etc. 3) The network model already incorporates thousands of ODEs that govern the dynamics of each of the HVC neurons, so we did not want to add more complexity to the network, especially since we don’t know the biophysical properties of the dendritic compartments.

      Therefore, our present focus is on somatic dynamics and the interaction between HVC<sub>RA</sub> and HVC<sub>INT</sub> neurons, but we acknowledge the importance of dendritic processes in enhancing network resilience. Although we agree that adding dendritic processes improves robustness, we still think that somatic processes alone can offer insight into the sequential dynamics of the HVC network. While the network should be robust across a wide range of parameters, it is also essential that certain parameters are designed to filter out weaker signals, ensuring that only reliable, precise patterns of activity propagate. Hence, we specifically chose to make the HVC<sub>RA</sub>-to-HVC<sub>RA</sub> excitatory connections more sensitive (a narrow range of values) so that only strong, precise, and meaningful stimuli can propagate through the network, reflecting the high stereotypy and precision seen in song production.

      First, the firing of HVC_I neurons is highly noisy and unreliable. HVC_I neurons fire spontaneous, random spikes under baseline conditions. During singing, their spike timing is imprecise and can vary significantly from trial to trial, with spikes appearing or disappearing across different trials. As a result, their inputs to HVC_RA neurons are inherently noisy. If the model relies on precisely tuned inputs from HVC_I neurons, the natural fluctuations in HVC_I firing would render the model non-functional. The authors should incorporate noisy HVC_I neurons into their model to evaluate whether this noise would render the model non-functional.

      We acknowledge that under baseline and singing conditions, interneurons fire in an extremely noisy and imprecise manner, although they exhibit time-locked episodes in their activity (Hahnloser et al 2002, Kozhevnikov and Fee 2007). To mimic the biological variability of these neurons, our model does, in fact, include a stochastic current that reflects the intrinsic noise and random variations in interneuron firing seen in vivo (and we highlight this in the Methods). To make sure the network is resilient to this randomness in interneuron firing, we will, if necessary, investigate different approaches to enhance the noise representation even further and check its effect on sequence propagation.
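
      The kind of check described here, how an additive stochastic current affects spike timing across trials, could look like the following minimal sketch. It is not the model from the manuscript: a leaky integrate-and-fire neuron with Gaussian white-noise input stands in for the conductance-based interneuron, and every parameter value and function name (`first_spike_time`, `spike_jitter`) is an illustrative assumption.

```python
import math
import random
import statistics


def first_spike_time(noise_sd, rng, dt=0.05, t_end=100.0):
    """Time (ms) of the first threshold crossing of a noisy leaky
    integrate-and-fire neuron, or None if it never fires.

    noise_sd is the noise amplitude in mV per sqrt(ms); all constants
    below are assumed values chosen only for illustration.
    """
    tau, v_rest, v_thresh = 10.0, -65.0, -50.0
    drive = 20.0  # steady-state depolarisation (mV) from a constant input
    v = v_rest
    for i in range(int(round(t_end / dt))):
        # Euler-Maruyama step: deterministic drift plus sqrt(dt)-scaled noise
        noise = noise_sd * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        v += dt * (-(v - v_rest) + drive) / tau + noise
        if v >= v_thresh:
            return (i + 1) * dt
    return None


def spike_jitter(noise_sd, n_trials=200, seed=0):
    """Standard deviation of first-spike time across repeated trials."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    times = [first_spike_time(noise_sd, rng) for _ in range(n_trials)]
    times = [t for t in times if t is not None]
    return statistics.pstdev(times) if len(times) > 1 else float("nan")


if __name__ == "__main__":
    for sd in (0.0, 0.2, 0.5, 1.0):
        print("noise_sd=%.1f  jitter=%.3f ms" % (sd, spike_jitter(sd)))
```

      In a network setting, the analogous test would sweep the noise amplitude of the interneuron current and record whether, and with what timing precision, the sequence still propagates.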

      Second, Kosche et al. (2015) demonstrated that reducing inhibition by suppressing HVC_I neuron activity makes HVC_RA firing less sparse but does not compromise the temporal precision of the bursts. In this experiment, the local application of gabazine should have severely disrupted HVC_I activity. However, it did not affect the timing precision of HVC_RA neuron firing, emphasizing the robustness of the HVC timing circuit. This robustness is inconsistent with the predictions of the current model, which depends on finely tuned inputs and should, therefore, be vulnerable to such disruptions.

      We thank the reviewer for the comment. The differences between the findings of Kosche et al. (2015) and the predictions of our model arise from differences in the aspect of HVC function we are modeling. Our model is more sensitive to inhibition, which is a designed mechanism for achieving precise song patterning. This is a modeling simplification we adopted to capture specific characteristics of HVC function. Hence, the findings of Kosche et al. (2015) do not invalidate the approach of our model but highlight that HVC likely operates with several redundant mechanisms that together ensure temporal precision. Nevertheless, we will investigate further the effects of the degree of inhibition on song patterning.

      Third, the reliance on fine-tuning of HVC_RA connections becomes problematic if the model is scaled up to include groups of HVC_RA neurons forming a chain network, rather than the single HVC_RA neurons used in the current work. With groups of HVC_RA neurons, the summation of presynaptic inputs to each HVC_RA neuron would need to be precisely maintained for the model to function. However, experimental evidence shows that the HVC circuit remains functional despite perturbations, such as a few degrees of cooling, micro-lesions, or turnover of HVC_RA neurons. Such robustness cannot be accounted for by a model that depends on finely tuned connections, as seen in the current implementation.

      As stated previously, our model, which uses individual HVC<sub>RA</sub> neurons, is a reductive model that focuses on understanding the mechanisms that govern sequential neural activity. We agree that scaling the model to include many HVC<sub>RA</sub> neurons poses challenges, specifically concerning the summation of presynaptic inputs. However, our model can still be adapted to a larger network without requiring the level of fine-tuning currently needed. In fact, the current fine-tuning of synaptic connections in the model is a reflection of fundamental network mechanisms rather than a limitation when scaling to a larger network. In addition, one important feature of this neural network is redundancy. Even if some neurons or synaptic connections are impaired, other neurons or pathways can compensate for these changes, allowing activity propagation to remain intact.

      The authors examined how altering the channel properties of neurons affects the activity in their model. While this approach is valid, many of the observed effects may stem from the delicate balancing required in their model for proper function.

      In the current model, HVC_X neurons burst as a result of rebound activity driven by the I_H current. Rebound bursts mediated by the I_H current typically require a highly hyperpolarized membrane potential. However, this mechanism would fail if the reversal potential of inhibition is higher than the required level of hyperpolarization. Furthermore, Mooney (2000) demonstrated that depolarizing the membrane potential of HVC_X neurons did not prevent bursts of these neurons during forward playback of the bird's own song, suggesting that these bursts (at least under anesthesia, which may be a different state altogether) are not necessarily caused by rebound activity. This discrepancy should be addressed or considered in the model.

      In our HVC network model, one goal with HVC<sub>X</sub> neurons is to generate bursts in their underlying neuron population. Since HVC<sub>X</sub> neurons in our model receive only inhibitory inputs from interneurons, we rely on inhibition followed by rebound bursts orchestrated by the I<sub>H</sub> and the I<sub>CaT</sub> currents to achieve this goal. The interplay between the T-type Ca<sup>++</sup> current and the H current in our model is fundamental for generating these bursts, as the two currents are sufficient for producing the desired behavior in the network. Due to this interplay, we do not need significant inhibition to generate rebound bursts, because the conductance of the T-type Ca<sup>++</sup> current can be made stronger, leading to robust rebound bursting even when the degree of inhibition is not very strong. We will highlight this with more clarity in the revised version.

      Some figures contain direct copies of figures from published papers. It is perhaps a better practice to replace them with schematics if possible.

      We will replace the relevant figures with schematic representations where possible.

      Reviewer #2 (Public review):

      Summary:

      In this paper, the authors use numerical simulations to try to understand better a major experimental discovery in songbird neuroscience from 2002 by Richard Hahnloser and collaborators. The 2002 paper found that a certain class of projection neurons in the premotor nucleus HVC of adult male zebra finch songbirds, the neurons that project to another premotor nucleus RA, fired sparsely (once per song motif) and precisely (to about 1 ms accuracy) during singing.

      The experimental discovery is important to understand since it initially suggested that the sparsely firing RA-projecting neurons acted as a simple clock that was localized to HVC and that controlled all details of the temporal hierarchy of singing: notes, syllables, gaps, and motifs. Later experiments suggested that the initial interpretation might be incomplete: that the temporal structure of adult male zebra finch songs instead emerged in a more complicated and distributed way, still not well understood, from the interaction of HVC with multiple other nuclei, including auditory and brainstem areas. So at least two major questions remain unanswered more than two decades after the 2002 experiment: What is the neurobiological mechanism that produces the sparse precise bursting: is it a local circuit in HVC or is it some combination of external input to HVC and local circuitry?

      And how is the sparse precise bursting in HVC related to a songbird's vocalizations?

      The authors only investigate part of the first question, whether the mechanism for sparse precise bursts is local to HVC. They do so indirectly, by using conductance-based Hodgkin-Huxley-like equations to simulate the spiking dynamics of a simplified network that includes three known major classes of HVC neurons and such that all neurons within a class are assumed to be identical. A strength of the calculations is that the authors include known biophysically deduced details of the different conductances of the three major classes of HVC neurons, and they take into account what is known, based on sparse paired recordings in slices, about how the three classes connect to one another. One weakness of the paper is that the authors make arbitrary and not well-motivated assumptions about the network geometry, and they do not use the flexibility of their simulations to study how their results depend on their network assumptions. A second weakness is that they ignore many known experimental details such as projections into HVC from other nuclei, dendritic computations (the somas and dendrites are treated by the authors as point-like isopotential objects), the role of neuromodulators, and known heterogeneity of the interneurons. These weaknesses make it difficult for readers to know the relevance of the simulations for experiments and for advancing theoretical understanding.

      Strengths:

      The authors use conductance-based Hodgkin-Huxley-like equations to simulate spiking activity in a network of neurons intended to model more accurately songbird nucleus HVC of adult male zebra finches. Spiking models are much closer to experiments than models based on firing rates or on 2-state neurons.

      The authors include information deduced from modeling experimental current-clamp data such as the types and properties of conductances. They also take into account how neurons in one class connect to neurons in other classes via excitatory or inhibitory synapses, based on sparse paired recordings in slices by other researchers.

      The authors obtain some new results of modest interest such as how changes in the maximum conductances of four key channels (e.g., A-type K<sup>+</sup> currents or Ca-dependent K<sup>+</sup> currents) influence the structure and propagation of bursts, while simultaneously being able to mimic accurately current-clamp voltage measurements.

      Weaknesses:

      One weakness of this paper is the lack of a clearly stated, interesting, and relevant scientific question to try to answer. In the introduction, the authors do not discuss adequately which questions recent experimental and theoretical work have failed to explain adequately, concerning HVC neural dynamics and its role in producing vocalizations. The authors do not discuss adequately why they chose the approach of their paper and how their results address some of these questions.

      For example, the authors need to explain in more detail how their calculations relate to the works of Daou et al, J. Neurophys. 2013 (which already fitted spiking models to neuronal data and identified certain conductances), to Jin et al J. Comput. Neurosci. 2007 (which already discussed how to get bursts using some experimental details), and to the rather similar paper by E. Armstrong and H. Abarbanel, J. Neurophys 2016, which already postulated and studied sequences of microcircuits in HVC. This last paper is not even cited by the authors.

      We thank the reviewer for this valuable comment, and we agree that we did not clarify enough throughout the paper the utility of our model or how it advanced our understanding of the HVC dynamics and circuitry. To that end, we will revise several places of the manuscript and make sure to cite and highlight the relevance and relatedness of the mentioned papers.

      In short, and as mentioned to Reviewer 1, while several models of how sequence is generated within HVC have been proposed (Cannon et al., 2015; Drew & Abbott, 2003; Egger et al., 2020; Elmaleh et al., 2021; Galvis et al., 2018; Gibb et al., 2009a, 2009b; Hamaguchi et al., 2016; Jin, 2009; Long & Fee, 2008; Markowitz et al., 2015; Jin et al., 2007), all the models proposed either rely on intrinsic HVC circuitry to propagate sequential activity, rely on extrinsic feedback to advance the sequence or rely on both. These models do not capture the complex details of spike morphology, do not include the right ionic currents, do not incorporate all classes of HVC neurons, or do not generate realistic firing patterns as seen in vivo. Our model is the first biophysically realistic model that incorporates all classes of HVC neurons and their intrinsic properties.

      No existing hypothesis has been challenged by our model; rather, our model is a distillation of the various models that have been proposed for the HVC network. We go over this in detail in the Discussion. We believe that the network model we developed provides a step forward in describing the biophysics of HVC circuitry, and may throw new light on certain dynamics in the mammalian brain, particularly in the motor cortex and the hippocampus, regions where precisely timed sequential activity is crucial. We suggest that temporally precise sequential activity may be a manifestation of neural networks composed of a chain of microcircuits, each containing pools of excitatory and inhibitory neurons, with local interplay among neurons of the same microcircuit and global interplay across the various microcircuits, and with structured inhibition as well as intrinsic properties synchronizing the neuronal pools and stabilizing timing within a firing sequence.

      The authors' main achievement is to show that simulations of a certain simplified and idealized network of spiking neurons, which includes some experimental details but ignores many others, match some experimental results like current-clamp-derived voltage time series for the three classes of HVC neurons (although this was already reported in earlier work by Daou and collaborators in 2013), and simultaneously the robust propagation of bursts with properties similar to those observed in experiments. The authors also present results about how certain neuronal details and burst propagation change when certain key maximum conductances are varied.

      However, these are weak conclusions for two reasons. First, the authors did not do enough calculations to allow the reader to understand how many parameters were needed to obtain these fits and whether simpler circuits, say with fewer parameters and simpler network topology, could do just as well. Second, many previous researchers have demonstrated robust burst propagation in a variety of feed-forward models. So what is new and important about the authors' results compared to the previous computational papers?

      A major novelty of our work is the incorporation of experimental data with detailed network models. While earlier works have established robust burst propagation, our model uses realistic ion channel kinetics and feedback inhibition not only to reproduce experimental neural activity patterns but also to suggest prospective mechanisms for song sequence production in the most biophysical way possible. This aspect distinguishes our work from other feed-forward models. We go over this in detail in the Discussion. However, the reviewer is right regarding the details of the calculations conducted for the fits; we will make sure to describe these in more detail in the Methods and throughout the manuscript.

      We believe that the network model we developed provides a step forward in describing the biophysics of HVC circuitry, and may throw new light on certain dynamics in the mammalian brain, particularly in the motor cortex and the hippocampus, regions where precisely timed sequential activity is crucial. We suggest that temporally precise sequential activity may be a manifestation of neural networks composed of a chain of microcircuits, each containing pools of excitatory and inhibitory neurons, with local interplay among neurons of the same microcircuit and global interplay across the various microcircuits, and with structured inhibition as well as intrinsic properties synchronizing the neuronal pools and stabilizing timing within a firing sequence.

      Also missing is a discussion, or at least an acknowledgment, of the fact that not all of the fine experimental details of undershoots, latencies, spike structure, spike accommodation, etc may be relevant for understanding vocalization. While it is nice to know that some models can match these experimental details and produce realistic bursts, that does not mean that all of these details are relevant for the function of producing precise vocalizations. Scientific insights in biology often require exploring which of the many observed details can be ignored and especially identifying the few that are essential for answering some questions. As one example, if HVC-X neurons are completely removed from the authors' model, does one still get robust and reasonable burst propagation of HVC-RA neurons? While part of the nucleus HVC acts as a premotor circuit that drives the nucleus RA, part of HVC is also related to learning. It is not clear that HVC-X neurons, which carry out some unknown calculation and transmit information to area X in a learning pathway, are relevant for burst production and propagation of HVC<sub>RA</sub> neurons, and so relevant for vocalization. Simulations provide a convenient and direct way to explore questions of this kind.

      One key question to answer is whether the bursting of HVC-RA projection neurons is based on a mechanism local to HVC or is some combination of external driving (say from auditory nuclei) and local circuitry. The authors do not contribute to answering this question because they ignore external driving and assume that the mechanism is some kind of intrinsic feed-forward circuit, which they put in by hand in a rather arbitrary and poorly justified way, by assuming the existence of small microcircuits consisting of a few HVC-RA, HVC-X, and HVC-I neurons that somehow correspond to "sub-syllabic segments". To my knowledge, experiments do not suggest the existence of such microcircuits nor does theory suggest the need for such microcircuits.

      Recent results showed a tight correlation between the intrinsic properties of neurons and features of song (Daou and Margoliash 2020, Medina and Margoliash 2024), where adult birds that exhibit similar songs tend to have similar intrinsic properties. While this is relevant, we acknowledge that not all details may be necessary for every aspect of vocalization, and future models could simply concentrate on core dynamics and exclude certain features while still providing insights into the primary mechanisms.

      On the question of whether HVC<sub>X</sub> neurons are relevant for burst propagation, given that our model includes these neurons as part of the network for completeness: the reviewer is correct that the propagation of sequential activity in this model is primarily carried by HVC<sub>RA</sub> neurons in a feed-forward manner, but only if there is no perturbation to the HVC network. For example, we have shown how altering the intrinsic properties of HVC<sub>X</sub> neurons or of interneurons disrupts sequence propagation. In other words, while HVC neurons are the key forces that carry the chain forward, the interplay between excitation and inhibition in our network as well as the intrinsic parameters of all classes of HVC neurons are equally important forces in carrying the chain of activity forward. Thus, the stability of activity propagation necessary for song production depends on a finely balanced network of HVC neurons, with all classes contributing to the overall dynamics.

      We agree with the reviewer, however, that a potential drawback of our model is that its sole focus is on local excitatory connectivity within the HVC (Kornfeld et al., 2017; Long et al., 2010), while HVC neurons receive afferent excitatory connections (Akutagawa & Konishi, 2010; Nottebohm et al., 1982) that play significant roles in their local dynamics. For example, the excitatory inputs that HVC neurons receive from Uvaeformis may be crucial in initiating (Andalman et al., 2011; Danish et al., 2017; Galvis et al., 2018) or sustaining (Hamaguchi et al., 2016) the sequential activity. While we acknowledge this limitation, our main contribution in this work is the biophysical insight into how the patterning activity in HVC is largely shaped by the intrinsic properties of the individual neurons as well as by the synaptic properties, where excitation and inhibition play a major role in enabling neurons to generate their characteristic bursts during singing. This holds irrespective of whether an external drive is injected into the microcircuits or not. We will, however, elaborate on and investigate this further in the next submission.

      Another weakness of this paper is an unsatisfactory discussion of how the model was obtained, validated, and simulated. The authors should state as clearly as possible, in one location such as an appendix, what is the total number of independent parameters for the entire network and how parameter values were deduced from data or assigned by hand. With enough parameters and variables, many details can be fit arbitrarily accurately so researchers have to be careful to avoid overfitting. If parameter values were obtained by fitting to data, the authors should state clearly what the fitting algorithm was (some iterative nonlinear method, whose results can depend on the initial choice of parameters), what the error function used for fitting (sum of least squares?) was, and what data were used for the fitting.

      The authors should also state clearly the dynamical state of the network, the vector of quantities that evolve over time. (What is the dimension of that vector, which is also the number of ordinary differential equations that have to be integrated?) The authors do not mention what initial state was used to start the numerical integrations, whether transient dynamics were observed and what were their properties, or how the results depended on the choice of the initial state. The authors do not discuss how they determined that their model was programmed correctly (it is difficult to avoid typing errors when writing several pages or more of a code in any language) or how they determined the accuracy of the numerical integration method beyond fitting to experimental data, say by varying the time step size over some range or by comparing two different integration algorithms.

      We thank the reviewer again. The fitting process in our model occurred only at the first stage, where the synaptic parameters were fit to the Mooney and Prather as well as the Kosche results. No data were shared with us; we merely looked at the figures in those papers, checked the amplitude of the elicited currents, the magnitudes of DC-evoked excitations, etc., and replicated those in our model. While this is suboptimal, it was better for us to start with it rather than simply taking equations for synaptic currents from the literature for other types of neurons (that are not even HVC neurons or from songbirds) and integrating them into our network model. However, we will certainly highlight the details of this fitting process in the new submission. We will also provide more technical details in the Methods regarding the exact number of ODEs, the initial conditions used to run them, etc.
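
      The time-step check the reviewer describes (halving the step size and comparing two integration algorithms) can be illustrated with a minimal sketch. It is not the authors' code and does not use their model: it integrates a single passive membrane equation, chosen only because it has a closed-form solution, and every name and parameter value below (`leak_rhs`, `observed_order`, `e_leak`, `tau`, the step sizes) is a hypothetical choice made for illustration.

```python
import math

def leak_rhs(v, e_leak=-65.0, tau=10.0):
    """dV/dt of a passive membrane relaxing to e_leak (mV) with time constant tau (ms).
    Illustrative stand-in only; not one of the HVC model equations."""
    return (e_leak - v) / tau

def euler(v0, dt, t_end):
    """Forward Euler integration; returns V at t_end."""
    v = v0
    for _ in range(round(t_end / dt)):
        v += dt * leak_rhs(v)
    return v

def rk4(v0, dt, t_end):
    """Classical fourth-order Runge-Kutta integration; returns V at t_end."""
    v = v0
    for _ in range(round(t_end / dt)):
        k1 = leak_rhs(v)
        k2 = leak_rhs(v + 0.5 * dt * k1)
        k3 = leak_rhs(v + 0.5 * dt * k2)
        k4 = leak_rhs(v + dt * k3)
        v += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return v

def exact(v0, t_end, e_leak=-65.0, tau=10.0):
    """Closed-form solution of the passive membrane equation."""
    return e_leak + (v0 - e_leak) * math.exp(-t_end / tau)

def observed_order(method, v0=-50.0, dt=0.5, t_end=20.0):
    """Estimate the convergence order from the error ratio at dt and dt/2."""
    err_coarse = abs(method(v0, dt, t_end) - exact(v0, t_end))
    err_fine = abs(method(v0, dt / 2.0, t_end) - exact(v0, t_end))
    return math.log2(err_coarse / err_fine)

if __name__ == "__main__":
    print("Euler observed order:", observed_order(euler))
    print("RK4 observed order:  ", observed_order(rk4))
```

      If the observed order is close to the nominal order of each scheme (about 1 for forward Euler, about 4 for RK4), the implementation and step size are behaving as expected. A full network has no closed-form solution, so in that setting the comparison would presumably be made between runs at successive step sizes, or between two integrators, rather than against an exact answer.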

      Also disappointing is that the authors do not make any predictions to test, except rather weak ones such as that varying a maximum conductance sufficiently (which might be possible by using dynamic clamps) might cause burst propagation to stop or change its properties. Based on their results, the authors do not make suggestions for further experiments or calculations, but they should.

      We agree that making experimentally testable predictions is crucial for the advancement of the model. Our predictions include testing whether eradication of a class of neurons such as HVC<sub>X</sub> neurons disrupts activity propagation, which can be done through targeted neuron elimination. This can also be done by preventing rebound bursting in HVC<sub>X</sub> neurons through pharmacological blockade of the I<sub>h</sub> channels. Other predictions include downregulation of certain ion channels (done pharmacologically through ion channel blockers) and testing which current is fundamental for song production (there are plenty of tests based on our results, such as the SK current, the T-type Ca<sup>++</sup> current, the A-type K<sup>+</sup> current, etc.). We will incorporate these into the revised manuscript to better demonstrate the model's applicability and to guide future research directions.

    1. eLife Assessment

      This manuscript presents important findings on how structural color can be manipulated through a specific single-gene mutation in the motile bacterium Flavobacterium IR1. It provides a promising model to identify genes and molecular mechanisms supporting this widespread optical phenomenon. The story relies on convincing data with proteomic analysis and well-designed experiments, although it remains rather descriptive. This work will be of interest to biophysicists and microbiologists working on structural colors and Flavobacterium.

    2. Reviewer #1 (Public review):

      Summary:

      Structural colors (SC) are based on nanostructures reflecting and scattering light and producing optical wave interference. All kinds of living organisms exhibit SC. However, understanding the molecular mechanisms and genes involved may be complicated due to the complexity of these organisms. Hence, bacteria that exhibit SC in colonies, such as Flavobacterium IR1, can be good models.

      Based on previous genomic mining and co-occurrence with SC in flavobacterial strains, this article focuses on the role of a specific gene, moeA, in SC of Flavobacterium IR1 strain colonies on an agar plate. moeA is involved in the synthesis of the molybdenum cofactor, which is necessary for the activity of key metabolic enzymes in diverse pathways.

      The authors clearly showed that the absence of moeA shifts SC properties in a way that depends on the nutritional conditions. They further bring evidence that this effect was related to several properties of the colony, all impacted by the moeA mutant: cell-cell organization, cell motility and colony spreading, and metabolism of complex carbohydrates. Hence, by linking SC to a single gene in appearance, this work points to cellular organization (as a result of cell-cell arrangement and motility) and metabolism of polysaccharides as key factors for SC in a gliding bacterium. This may prove useful for designing molecular strategies to control SC in bacterial-based biomaterials.

      Strengths:

      The topic is very interesting from a fundamental viewpoint and has great potential in the field of biomaterials.

      The article is easy to read. It builds on previous studies with already established tools to characterize SC at the level of the flavobacterial colony. Experiments are well described and well executed. In addition, the SIBR-Cas method for chromosome engineering in Flavobacteria is the most recent and is a leap forward for future studies in this model, even beyond SC.

      Weaknesses:

      The paper appears a bit too descriptive and could be better organized. Some of the results, in particular the proteomic comparison, are not well exploited (not explored experimentally). In my opinion, the problem originates from the difficulty in explaining the link between the absence of moeA and the alterations observed at the level of colony spreading and polysaccharide utilization, and the variation in proteomic content.

      First, the effect of moeA deletion on molybdenum cofactor synthesis should be addressed.

      Second, as I was reading the entire manuscript, I kept asking myself if moeA (and by extension molybdenum cofactor) was really involved in SC or it was an indirect effect. For example, what if the absence of moeA alters the cell envelope because the synthesis of its building blocks is perturbed, then subsequently perturbates all related processes, including gliding motility and protein secretion? It would help to know if the effects on colony spreading and polysaccharide metabolism can be uncoupled. I don't think the authors discussed that clearly.

    3. Reviewer #2 (Public review):

      Summary:

      The authors constructed an in-frame deletion of moeA gene, which is involved in molybdopterin cofactor (MoCo) biosynthesis, and investigated its role in structural colors in Flavobacterium IR1. The deletion of moeA shifted colony color from green to blue, reduced colony spreading, and increased starch degradation, which was attributed to the upregulation of various proteins in polysaccharide utilization loci. This study lays the ground for developing new colorants by modifying genes involved in structural colors.

      Major strengths and weaknesses:

      The authors conducted well-designed experiments with appropriate controls and the results in the paper are presented in a logical manner, which supports their conclusions. Using statistical tests to compare the differences between the wild type and moeA mutant, and adding a significance bar in Figure 4B, would strengthen their claims regarding differences in cell motility. Additionally, in the result section (Figure 6), the authors suggest that the shift in blue color is "caused by cells which are still highly ordered but narrower", which to my knowledge is not backed up by any experimental evidence.

      Overall, this is a well-written paper in which the authors effectively address their research questions through proper experimentation. This work will help us understand the genetic basis of structural colors in Flavobacterium and open new avenues to study the roles of additional genes and proteins in structural colors.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Structural colors (SC) are based on nanostructures reflecting and scattering light and producing optical wave interference. All kinds of living organisms exhibit SC. However, understanding the molecular mechanisms and genes involved may be complicated due to the complexity of these organisms. Hence, bacteria that exhibit SC in colonies, such as Flavobacterium IR1, can be good models.

      Based on previous genomic mining and co-occurrence with SC in flavobacterial strains, this article focuses on the role of a specific gene, moeA, in SC of Flavobacterium IR1 strain colonies on an agar plate. moeA is involved in the synthesis of the molybdenum cofactor, which is necessary for the activity of key metabolic enzymes in diverse pathways.

      The authors clearly showed that the absence of moeA shifts SC properties in a way that depends on the nutritional conditions. They further bring evidence that this effect was related to several properties of the colony, all impacted by the moeA mutant: cell-cell organization, cell motility and colony spreading, and metabolism of complex carbohydrates. Hence, by linking SC to a single gene in appearance, this work points to cellular organization (as a result of cell-cell arrangement and motility) and metabolism of polysaccharides as key factors for SC in a gliding bacterium. This may prove useful for designing molecular strategies to control SC in bacterial-based biomaterials.

      Strengths:

      The topic is very interesting from a fundamental viewpoint and has great potential in the field of biomaterials.

      Thank you for your comments.

      The article is easy to read. It builds on previous studies with already established tools to characterize SC at the level of the flavobacterial colony. Experiments are well described and well executed. In addition, the SIBR-Cas method for chromosome engineering in Flavobacteria is the most recent and is a leap forward for future studies in this model, even beyond SC.

      We appreciate these comments.

      Weaknesses:

      The paper appears a bit too descriptive and could be better organized. Some of the results, in particular the proteomic comparison, are not well exploited (not explored experimentally). In my opinion, the problem originates from the difficulty in explaining the link between the absence of moeA and the alterations observed at the level of colony spreading and polysaccharide utilization, and the variation in proteomic content.

      We will look at the organisation of the manuscript carefully in the forthcoming detailed revision, as suggested. In terms of the proteomics, there are clearly a large number of proteins affected by the moeA deletion. In terms of experimental exploration, we chose spreading, structural colour formation and starch degradation to test phenotypically, as the most relevant. For example, in L615-617, we discuss the downregulation of GldL (which is known to be involved in Flavobacterial gliding motility [Shrivastava et al., 2013]) in the moeA KO as a possible explanation for the reduced colony spreading of the moeA mutant. Changes in polysaccharide (starch) utilization were seen on solid medium, as well as in the proteomic profile, where we observed the upregulation of carbohydrate metabolism proteins linked to PUL (polysaccharide utilisation locus) operons (Terrapon et al., 2015), such as PAM95095-90 (Figure 8), and other carbohydrate metabolism-related proteins, including a pectate lyase (Table S7) which is involved in starch degradation (Aspeborg et al., 2012). As noted in L555-566 and Figure 9, starch metabolism was tested experimentally.

      First, the effect of moeA deletion on molybdenum cofactor synthesis should be addressed.

      MoeA is the last enzyme in the MoCo synthesis pathway; thus, if only MoeA is absent, the cell would accumulate MPT-AMP (molybdopterin-adenosine monophosphate) (Iobbi-Nivol & Leimkühler, 2013), and the expressed molybdoenzymes would not be functional. In L582-585, we commented on how the lack of molybdenum cofactor may affect the synthesis of molybdoenzymes. However, if you meant analysing the presence of the small molecules, the cofactors, involved in these pathways, that was an assay we were not able to perform. Moreover, in L585-587, we addressed how the deletion of moeA affected the proteins encoded by the rest of the genes in the operon.

      Second, as I was reading the entire manuscript, I kept asking myself if moeA (and by extension molybdenum cofactor) was really involved in SC or it was an indirect effect. For example, what if the absence of moeA alters the cell envelope because the synthesis of its building blocks is perturbed, then subsequently perturbates all related processes, including gliding motility and protein secretion? It would help to know if the effects on colony spreading and polysaccharide metabolism can be uncoupled. I don't think the authors discussed that clearly.

      The message of the paper is that the moeA gene, as predicted from a previous genomics analysis, is important in SC. This is based on the representation of the moeA gene in genomes of bacteria that display SC. This analysis does not predict the mechanism. When the gene was knocked out, a significant change in structural colour occurred, supporting this hypothesis. Whether this effect is direct or indirect is difficult to assess, as this referee rightly suggests. In order to follow up this central result, we performed proteomics (both intra- and extracellular). As we observed, the deletion of a single gene generated many changes in the proteomic profile, and thus in biological processes. Based on the known functions of molybdenum cofactor, we could only hypothesize that pterin metabolism is important for SC, not exactly how.

      We intend to discuss the links between gliding/spreading and polysaccharide metabolism more clearly, with reference to the literature, as quite a bit is known here including possible links to SC.

      Reviewer #2 (Public review):

      Summary:

      The authors constructed an in-frame deletion of moeA gene, which is involved in molybdopterin cofactor (MoCo) biosynthesis, and investigated its role in structural colors in Flavobacterium IR1. The deletion of moeA shifted colony color from green to blue, reduced colony spreading, and increased starch degradation, which was attributed to the upregulation of various proteins in polysaccharide utilization loci. This study lays the ground for developing new colorants by modifying genes involved in structural colors.

      Major strengths and weaknesses:

      The authors conducted well-designed experiments with appropriate controls and the results in the paper are presented in a logical manner, which supports their conclusions.

      We appreciate your comment.

      Using statistical tests to compare the differences between the wild type and moeA mutant, and adding a significance bar in Figure 4B, would strengthen their claims regarding differences in cell motility.

      Thank you. Figure 4B contains the significance bars that represent the standard deviation of the mean value of the three replicates, but we will modify it to make them clearer.

      Additionally, in the result section (Figure 6), the authors suggest that the shift in blue color is "caused by cells which are still highly ordered but narrower", which to my knowledge is not backed up by any experimental evidence.

      Thanks. We mentioned that the mutant cells are narrower than the wild type based on the estimated periodicity resulting from the goniometry analysis (L427-430). We will now say “likely to be narrower based on the estimated periodicity from the optical analysis” rather than just “narrower” in the revision.

      Overall, this is a well-written paper in which the authors effectively address their research questions through proper experimentation. This work will help us understand the genetic basis of structural colors in Flavobacterium and open new avenues to study the roles of additional genes and proteins in structural colors.

      Much appreciated.

      REFERENCES

      Aspeborg, Henrik, Pedro M. Coutinho, Yang Wang, Harry Brumer, and Bernard Henrissat. "Evolution, substrate specificity and subfamily classification of glycoside hydrolase family 5 (GH5)." BMC evolutionary biology 12 (2012): 1-16.

      Iobbi-Nivol, Chantal, and Silke Leimkühler. "Molybdenum enzymes, their maturation and molybdenum cofactor biosynthesis in Escherichia coli." Biochimica et Biophysica Acta (BBA)-Bioenergetics 1827, no. 8-9 (2013): 1086-1101.

      Shrivastava, Abhishek, Joseph J. Johnston, Jessica M. Van Baaren, and Mark J. McBride. "Flavobacterium johnsoniae GldK, GldL, GldM, and SprA are required for secretion of the cell surface gliding motility adhesins SprB and RemA." Journal of bacteriology 195, no. 14 (2013): 3201-3212.

      Terrapon, Nicolas, Vincent Lombard, Harry J. Gilbert, and Bernard Henrissat. "Automatic prediction of polysaccharide utilization loci in Bacteroidetes species." Bioinformatics 31, no. 5 (2015): 647-655.

    1. eLife Assessment

      This fundamental research conducted a molecular comparison between smooth muscle cells and adjacent fibroblast cells within lung blood vessels affected by pulmonary arterial hypertension. The study identified distinct disease-related states in each cell type and provided deeper insights into their interactions and communication. While certain conclusions should be interpreted with caution due to inherent methodological limitations, the study's findings remain convincing and robust. This is supported by the use of advanced and complementary techniques, as well as the rare isolation of diseased lung blood vessel cells from the same donor, enabling direct comparison.

    2. Reviewer #1 (Public review):

      Summary:

      The authors isolated and cultured pulmonary artery smooth muscle cells (PASMC) and pulmonary artery adventitial fibroblasts (PAAF) of the lung samples derived from the patients with idiopathic pulmonary arterial hypertension (PAH) and the healthy volunteers. They performed RNA-seq and proteomics analyses to detail the cellular communication between PASMC and PAAF, which are the main target cells of pulmonary vascular remodeling during the pathogenesis of PAH. The authors revealed that PASMC and PAAF retained their original cellular identity and acquired different states associated with the pathogenesis of PAH, respectively.

      Strengths:

      Although previous studies have shown that PASMC and PAAF cells each have an important role in the pathogenesis of PAH, there have been scarce reports focusing on the interactions between PASMC and PAAF. These findings may provide valuable information for elucidating the pathogenesis of pulmonary arterial hypertension.

      Comments on revisions:

      The authors adequately responded to my concerns and revised their manuscript to elaborate on the new data from new experiments and address my queries. Although some of the issues I initially raised could not be fully resolved, the revised manuscript has been significantly improved. This manuscript provides essential insights into the communications across the PASMCs and PAAFs in PAH. This would greatly interest various researchers in both basic and clinical fields.

    3. Reviewer #2 (Public review):

      Summary:

      Utilizing a combination of transcriptomic and proteomic profiling as well as cellular phenotyping from source-matched PASMC and PAAFs in IPAH, this study sought to explore a molecular comparison of these cells in order to track distinct cell fate trajectories and acquisition of their IPAH-associated cellular states. The authors also aimed to identify cell-cell communication axes in order to infer mechanisms by which these two cells interact and depend upon external cues. This study will be of interest to the scientific and clinical communities of those interested in pulmonary vascular biology and disease. It also will appeal to those interested in lung and vascular development as well as multi-omic analytic procedures.

      Strengths:

      (1) This is one of the first studies using orthogonal sequencing and phenotyping for characterization of source-matched neighboring mesenchymal PASMC and PAAF cells in healthy and diseased IPAH patients. This is a major strength which allows for direct comparison of neighboring cell types and the ability to address an unanswered question regarding the nature of these mesenchymal "mural" cells at a precise molecular level.

      (2) Unlike a number of multi-omic sequencing papers that read more as an atlas of findings without structure, the inherent comparative organization of the study and presentation of the data were valuable in aiding the reader in understanding how to discern the distinct IPAH-associated cell states. As a result, the reader not only gleans greater insight into these two interacting cell types in disease but also now can leverage these datasets more easily for future research questions in this space.

      (3) There are interesting and surprising findings in the cellular characterizations, including the low proliferative state of IPAH-PASMCs as compared to the hyperproliferative state in IPAH-PAAFs. Furthermore, the cell-cell communication axes involving ECM components and soluble ligands provided by PAAFs that direct cell state dynamics of PASMCs offer some of the first and foundational descriptions of what are likely complex cellular interactions that await discovery.

      (4) Technical rigor is quite high in the -omics methodology and in vitro phenotyping tools used.

      Weaknesses:

      There are some weaknesses in the methodology that should temper the conclusions:

      (1) The number of donors sampled for PAAF/PASMCs was relatively small for both healthy controls and IPAH patients. Thus, while the level of detail of -omics profiling was quite deep, the generalizability of their findings to all IPAH patients or Group 1 PAH patients is limited. In the revised manuscript, the authors addressed this concern with important text changes and additional data.

      (2) While the study utilized early passage cells, these cells nonetheless were still cultured outside the in vivo milieu prior to analysis. Thus, while there is an assumption that these cells do not change fundamental behavior outside the body, that is not entirely proven for all transcriptional and proteomic signatures. As such, the major alterations that are noted would be more compelling if validated from tissue or cells derived directly from in vivo sources. Without such validation, the major limitation of the impact and conclusions of the paper is that the full extent of the relevance of these findings to human disease is not known. The authors addressed this concern appropriately with significant text changes to clarify these limitations for the reader.

      (3) While the presentation of most of the manuscript was quite clear and convincing, the terminology and conclusions regarding "cell fate trajectories" throughout the manuscript did not seem to be fully justified. That is, all of the analyses were derived from cells originating from end-stage IPAH, and otherwise, the authors were not lineage tracing across disease initiation or development (which would be impossible currently in humans). So, while the description of distinct "IPAH-associated states" makes sense, any true cell fate trajectory was not clearly defined. The revised manuscript has removed this terminology and replaced it with more precise language.

      Comments on revisions:

      The authors were quite responsive to all of my concerns, offering both important revisions to the presentation of the work as well as new data. While some of the limitations were not fully resolved (and the authors provide appropriate justification for this), the revised manuscript is much improved. It will be of great interest to both the scientific and clinical communities.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study explored a molecular comparison of smooth muscle and neighboring fibroblast cells found in lung blood vessels afflicted by a disease called pulmonary arterial hypertension. In doing so, the authors described distinct disease-associated states of each of these cell types with further insights into the cellular communication and crosstalk between them. The strength of evidence was convincing through the use of complementary and sophisticated tools, accompanied by rare isolation of human diseased lung blood vessel cells that were source-matched to the same donor for direct comparison.

      We thank the editors and reviewers for their highly positive and encouraging assessment of our manuscript detailing the cell state changes of arterial smooth muscle cells and fibroblasts in the pulmonary bed. We addressed the reviewers’ major comments in the revised manuscript by providing validation of key in vitro findings, such as preserved marker localization and increased GAG deposition in IPAH pulmonary arteries. We additionally provide a comparison of transcriptomic profiles spanning fresh, very early and late passage cells. Finally, we present expanded experimental data in support of cellular crosstalk, including testing of additional PAAF ligands on donor PASMC and the influence of PTX3/HGF on IPAH PASMC.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors isolated and cultured pulmonary artery smooth muscle cells (PASMC) and pulmonary artery adventitial fibroblasts (PAAF) of the lung samples derived from the patients with idiopathic pulmonary arterial hypertension (PAH) and the healthy volunteers. They performed RNA-seq and proteomics analyses to detail the cellular communication between PASMC and PAAF, which are the main target cells of pulmonary vascular remodeling during the pathogenesis of PAH. The authors revealed that PASMC and PAAF retained their original cellular identity and acquired different states associated with the pathogenesis of PAH, respectively.

      Strengths:

      Although previous studies have shown that PASMC and PAAF cells each have an important role in the pathogenesis of PAH, there have been scarce reports focusing on the interactions between PASMC and PAAF. These findings may provide valuable information for elucidating the pathogenesis of pulmonary arterial hypertension.

      We appreciate the reviewer’s positive view of our study.

      Weaknesses:

      The results of proteome analysis using primary culture cells in this paper seem a bit insufficient to draw conclusions. In particular, the authors described "We elucidated the involvement of cellular crosstalk in regulating cell state dynamics and identified pentraxin-3 and hepatocyte growth factor as modulators of PASMC phenotypic transition orchestrated by PAAF." However, the presented data are considered limited and insufficient.

      We thank the reviewer for drawing our attention to this point and have accordingly modified the conclusion section to read: “We investigated the involvement of cellular crosstalk….” Moreover, we provide further experimental evidence demonstrating the effect of both PTX3 and HGF on cell state marker expression in IPAH-PASMC cells (Figure 7H). In addition, we clarify the selection strategy applied to investigate particular PAAF-secreted ligands and test three additional ligands on donor PASMC (Figure S8), supporting the original focus on PTX3 and HGF.

      Reviewer #2 (Public Review):

      Summary:

      Utilizing a combination of transcriptomic and proteomic profiling as well as cellular phenotyping from source-matched PASMC and PAAFs in IPAH, this study sought to explore a molecular comparison of these cells in order to track distinct cell fate trajectories and acquisition of their IPAH-associated cellular states. The authors also aimed to identify cell-cell communication axes in order to infer mechanisms by which these two cells interact and depend upon external cues. This study will be of interest to the scientific and clinical communities of those interested in pulmonary vascular biology and disease. It also will appeal to those interested in lung and vascular development as well as multi-omic analytic procedures.

      We thank the reviewer for the overall highly positive assessment of our study.

      Strengths:

      (1) This is one of the first studies using orthogonal sequencing and phenotyping for the characterization of source-matched neighboring mesenchymal PASMC and PAAF cells in healthy and diseased IPAH patients. This is a major strength that allows for direct comparison of neighboring cell types and the ability to address an unanswered question regarding the nature of these mesenchymal "mural" cells at a precise molecular level.

      We value the reviewer’s kind and objective summary of our study.

      (2) Unlike a number of multi-omic sequencing papers that read more as an atlas of findings without structure, the inherent comparative organization of the study and presentation of the data were valuable in aiding the reader in understanding how to discern the distinct IPAH-associated cell states. As a result, the reader not only gleans greater insight into these two interacting cell types in disease but also now can leverage these datasets more easily for future research questions in this space.

      We thank the reviewer for this highly positive comment.

      (3) There are interesting and surprising findings in the cellular characterizations, including the low proliferative state of IPAH-PASMCs as compared to the hyperproliferative state in IPAH-PAAFs. Furthermore, the cell-cell communication axes involving ECM components and soluble ligands provided by PAAFs that direct cell state dynamics of PASMCs offer some of the first and foundational descriptions of what are likely complex cellular interactions that await discovery.

      We agree with the reviewer’s assessment that some of the novel data in our study help to formulate testable hypotheses that can be pursued in more focused follow-up research.

      (4) Technical rigor is quite high in the -omics methodology and in vitro phenotyping tools used.

      We are grateful for the reviewer’s assessment of our work and positive recognition.

      Weaknesses:

      There are some weaknesses in the methodology that should temper the conclusions:

      (1) The number of donors sampled for PAAF/PASMCs was small for both healthy controls and IPAH patients. Thus, while the level of detail of -omics profiling was quite deep, the generalizability of their findings to all IPAH patients or Group 1 PAH patients is limited.

      We appreciate the reviewer’s concerns regarding the generalizability of the findings and have acknowledged this as a study limitation in the discussion: “A low case number and end-stage disease samples used for omics characterization represents a study limitation that has to be taken into account before assuming similar findings would be evident in the entire PAH patient population over the course of the disease development and progression”. We have addressed this issue by performing validation of key in vitro findings using fresh cells or assessment of FFPE lung material from additional independent samples in the revised manuscript (Figures 2D, 3D, 3H, 4H). For transparency, we provide the number of biological samples in the results section of the modified manuscript.

      (2) While the study utilized early passage cells, these cells nonetheless were still cultured outside the in vivo milieu prior to analysis. Thus, while there is an assumption that these cells do not change fundamental behavior outside the body, that is not entirely proven for all transcriptional and proteomic signatures. As such, the major alterations that are noted would be more compelling if validated from tissue or cells derived directly from in vivo sources. Without such validation, the major limitation of the impact and conclusions of the paper is that the full extent of the relevance of these findings to human disease is not known.

      We thank the reviewer for this constructive and excellent suggestion. The comparison of fresh and cultured cells revealed a strong and early divergence of differentially regulated pathways for PAAF, but a more gradual transition for PASMC. The results of this analysis are included in the new Figures 2D, 3D, 3H, and 4H. Implications are discussed in the revised manuscript: “However, the same mechanism renders cells susceptible to phenotypic change induced simply by extended in vitro culturing, testified by broad expression profile differences between fresh and cultured cells. This is a common caveat in cell biology research and represents a technical and practical tradeoff that requires cross validation of key findings. Using a combination of archived lung tissue and available single cell RNA sequencing dataset of human pulmonary arteries, we show that some of the key defining phenotypic features of diseased cells, such as altered proliferation rate and ECM production, are preserved and gradually lost upon prolonged culturing”.

      (3) While the presentation of most of the manuscript was quite clear and convincing, the terminology and conclusions regarding "cell fate trajectories" throughout the manuscript did not seem to be fully justified. That is, all of the analyses were derived from cells originating from end-stage IPAH, and otherwise, the authors were not lineage tracing across disease initiation or development (which would be impossible currently in humans). So, while the description of distinct "IPAH-associated states" makes sense, any true cell fate trajectory was not clearly defined.

      In accordance with the reviewer’s comment, we have modified the wording to exclude the “cell fate trajectory” phrase and replace it with “acquisition of disease cell state”.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) In Figure 1, PASMC and PAAF were collected from the lungs of healthy donors and analyzed for transcriptomics and proteomics; in Figure 1A, it can be taken as if both cells from IPAH patients were also analyzed, but this is not reflected in the results. In Figure 1D, immunostaining of normal lungs confirms the localization of PASMC and PAAF markers found by transcriptomics. The authors describe a strong, but not perfect, correlation between the transcriptomics and proteomics data from Figure S1, but the gene names of each cellular marker they found should also be listed. In addition, the authors have observed the expression of markers characteristic of PASMC and PAAF in pulmonary vessels of healthy subjects by IH, but is there any novelty in these markers? Furthermore, are the expression sites of these markers altered in IPAH patients?

      In the revised manuscript we have adjusted the schematic to reflect the fact that only donor cells are compared in Figure 1. We additionally provide a correlation of cell type markers between proteomic and transcriptomic data sets for those molecules that are detected in both datasets (Figure S1B).

      We provide clarification on the novelty aspect in the result section: “Some of the molecules were previously associated with predominant SMC, such as RGS5 and CSPR1 (Crnkovic et al., 2022; Snider et al., 2008), or adventitial fibroblast, such as SCARA5, CFD and MGST1 (Crnkovic et al., 2022; Sikkema et al., 2023) expression”. Except for RGS5, expression and localization of other markers in IPAH was previously unknown.

      The conservation of expression sites for reported markers was validated in IPAH in the revised manuscript (Figure 2D), with IGFBP5 showing dual localization in both cell types. Moreover, results in Figure 1D, 1E and 2D support the validity of omics findings and preservation of key markers during passaging.

      (2) In Figure 2, the authors compare PASMC and PAAF derived from IPAH patients and donors. The results show that transcriptomics and proteomics changes are clearly differentiated by cell type and not by pathological state. In the pathological state, transcriptional changes are more pronounced. The GO analysis of the factors that showed significant changes in each cell type is shown in Figure 2E, but the differences between the GO analysis of the transcriptomics and proteomics results are not clearly shown. The reviewer believes that the advantages of a combined analysis of both should be indicated. Also, in Figure 2G, the GAG content in PA appears to be elevated in only 3 cases, while the other 5 cases appear to be at the same level as the donor; is there a characteristic change in these 3 cases? Figure 2I shows that the phenotype of PAAF changes with cell passages. Since this phenomenon would be interesting and useful to the reader, additional discussion regarding the mechanism would be desired.

      We integrated both datasets to achieve a stronger and more meaningful analysis, given the incomplete correlation between the transcriptomic and proteomic datasets, as indicated in the results section: “Comparative analysis of transcriptomic and proteomic data sets revealed a strong, but not complete level of linear correlation between the gene and protein expression profiles (Figure S1B, C). We therefore decided to use an integrative dataset and analyzed all significantly enriched genes and proteins (-log10(P)>1.3) between both cell types to achieve stronger and more robust analysis”. In general, the proteomic profile showed fewer significant differences and a smaller extent of change than the transcriptomic profile, likely due to technical limitations and the sensitivity of the method, as evidenced by the complete absence of the top transcriptomic molecules (RGS5, ADH1C, IGFBP5, CFD, SCARA5) from the protein dataset.
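
      As a minimal illustration of the threshold quoted above, the sketch below shows that a cutoff of -log10(P) > 1.3 corresponds approximately to P < 0.05, and how features passing it in either dataset could be pooled. The function names, the toy P values and the choice of a union across datasets are assumptions made for illustration; the response does not specify how the integration was implemented.

```python
import math

# -log10(0.05) is about 1.301, so a cutoff of 1.3 is roughly P < 0.05.
CUTOFF = 1.3


def is_significant(p_value, cutoff=CUTOFF):
    """Return True when -log10(P) exceeds the cutoff."""
    return -math.log10(p_value) > cutoff


def integrate(transcript_p, protein_p, cutoff=CUTOFF):
    """Pool features that pass the cutoff in either dataset (assumed union)."""
    hits = set()
    for table in (transcript_p, protein_p):
        hits.update(name for name, p in table.items() if is_significant(p, cutoff))
    return sorted(hits)


# Toy, made-up P values keyed by hypothetical feature names.
toy_transcripts = {"gene_1": 0.001, "gene_2": 0.20, "gene_3": 0.04}
toy_proteins = {"gene_1": 0.03, "gene_2": 0.01, "gene_4": 0.60}

integrated = integrate(toy_transcripts, toy_proteins)
print(integrated)
```

      In this toy case a feature that is significant in only one dataset is still retained, which is one plausible reading of "integrative dataset"; an intersection would be the stricter alternative.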

      To strengthen the findings of increased GAG in IPAH pulmonary arteries, we have performed compartment-specific, quantitative image analysis of Alcian blue staining on additional donor and patient samples (n=10 for each condition). The new analysis totaling around 40 PA confirmed significantly increased deposition of GAG in IPAH pulmonary arteries.

      We have addressed the issue of phenotypic change with prolonged cell culture in the revised manuscript by systematically comparing enrichment for biological processes between fresh (Crnkovic et al., 2022: GSE210248), very early (this study: GSE255669) and later passage cells (Chelladurai et al., 2022: GSE144932; Gorr et al., 2020: GSE144274). We observed cell type differences in the rate of change of phenotypic features, with PAAF showing a faster shift early during culturing that could, for some of the features, be due to isolation from the immunomodulatory environment or the presence of the hydrocortisone supplement in the PAAF cell media. These points have been described in the revised results section and mentioned in the discussion.

      (3) The authors claim that one feature of this paper is the use of "very early passage (p1)" of pulmonary artery smooth muscle cells (PASMC). Since there are other existing (previously reported) data that are publicly available, such as RNA-seq data using cells with 2-4 cell passages, it may be possible to show that fewer passages are better in primary culture by comparing the data presented in this paper.

      Following the reviewers’ comments, we have performed a systematic comparison of fresh (Crnkovic et al., 2022: GSE210248), very early (this study: GSE255669) and later passage cells (Chelladurai et al., 2022: GSE144932; Gorr et al., 2020: GSE144274) in the revised manuscript in order to comprehensively address the issue and define changes occurring as a result of prolonged in vitro conditions (Figure 3H). The results showed that the expression profile of early passage cells retains some of the key phenotypic features displayed by cells in their native environment, with PASMC displaying a more gradual loss of phenotypic characteristics compared to PAAF. Interestingly, PAAF displayed a striking inverse enrichment for inflammatory/NF-kB signaling between fresh and cultured PAAF, which could potentially be caused by the hydrocortisone supplement in the PAAF cell media or by the isolation from its highly immunomodulatory environment. These points have been described in the revised results section and mentioned in the discussion.

      (4) The authors describe a study characterized by decreased expression of "cytoskeletal contractile elements" in pulmonary artery smooth muscle cells (PASMC) derived from patients with IPAH. What are the implications of this result, and does it arise from the use of smooth muscle in patients resistant to pulmonary artery smooth muscle dilating agents? A discussion on this issue needs to be made in a way that is easy for the reader to understand.

      The reviewer raises an interesting point regarding the loss of the contractile markers and the response to vasodilating therapy. We would speculate that an isolated decrease in contractile machinery, without concomitant change in ECM and other PASMC features, would dampen both the contraction and relaxation properties of the single PASMC, affecting not only its response to dilating agents, but also to vasoconstrictors. Clinical consequences and responsiveness to dilating agents are more difficult to predict, since the vasoactive response would additionally depend on mechanical properties of the pulmonary artery defined by cellular and ECM composition. Nevertheless, we believe that decreased expression of contractile machinery reflects an intrinsic, “programmed” response of SMC to remodeling, rather than vasodilator therapy-induced selection pressure, since a similar phenotypic change is observed in SMC from systemic circulation and in various animal models without exposure to PAH medication. These considerations have been included in the revised discussion section.

      (5) There are a lot of secreted proteins that increase or decrease in Figure 6G, but there is scant reason to focus on PTX3 and HGF among them. The authors need to elaborate on the above issue.

      We regret the lack of clarity and provide an improved explanation of the ligand selection strategy in the revised manuscript. In order to prioritize the potential hits, we first used hierarchical clustering to group co-regulated ligands into a smaller number of groups. We then prioritized the ligands that lacked or had limited information with respect to IPAH. Based on these results, we analyzed the effect of three additional ligands on PASMC cell state marker expression (Figure S8). These additional data supported the initial focus on PTX3 and HGF.
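
      The grouping step described above can be sketched as follows. This is only an illustration under stated assumptions: the response does not give the distance metric, linkage method or cut height, so the sketch assumes a correlation distance (1 - Pearson r) and single linkage, where cutting the dendrogram at a fixed height is equivalent to joining every pair of profiles closer than that height. The ligand names and expression values are made up.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def cluster_ligands(profiles, max_distance=0.5):
    """Single-linkage clusters at a fixed cut height (distance = 1 - r)."""
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the cluster representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Merge every pair whose correlation distance is within the cut height.
    for a, b in combinations(profiles, 2):
        if 1 - pearson(profiles[a], profiles[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Toy, made-up profiles for hypothetical ligands across four samples.
toy_profiles = {
    "ligand_A": [1.0, 2.0, 3.0, 4.0],
    "ligand_B": [2.0, 4.1, 6.0, 8.2],
    "ligand_C": [4.0, 3.0, 2.0, 1.0],
    "ligand_D": [8.0, 6.1, 4.0, 2.1],
}

clusters = cluster_ligands(toy_profiles)
print(clusters)
```

      Each resulting group can then be represented by one or two candidates, which is the kind of reduction the prioritization step relies on.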

      Minor comments:

      (1) Regarding the number of specimens used in the Result, it would be more helpful to the reader if the number of samples were also mentioned in the text.

      We have included the number of used samples in manuscript text.

      (2) There is no explanation of what R2Y represents in Figure 2B. This reviewer is not able to understand the statistical analysis of Figure 2H. The detailed results should be explained.

      We apologize for the oversight in the labeling of Figure 2B and have modified the figure legend: “Orthogonal projection to latent structures-discriminant analysis (OPLS-DA) T score plots separating predictive variability (x-axis), attributed to biological grouping, and non-predictive variability (technical/inter-individual, y-axis). Monofactorial OPLS-DA model for separation according to cell type or disease. C) Bifactorial OPLS-DA model considering cell type and disease simultaneously. Ellipse depicting the 95% confidence region, Q2 denoting model’s predictive power (significance: Q2>50%) and R2Y representing proportion of variance in the response variable explained by the model (higher values indicating better fit)”.
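
      For readers unfamiliar with these two statistics, the conventional definitions can be written as below. This is the standard textbook form, given as a sketch; the exact computation used by the authors' software (for example, the cross-validation scheme) is not stated in the response. Here $y_i$ is the observed response for sample $i$, $\hat{y}_i$ the fitted value from the full model, $\hat{y}_{(i)}$ the prediction for sample $i$ from a model fitted without it during cross-validation, and $\bar{y}$ the mean response.

```latex
R^2Y = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\qquad
Q^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_{(i)}\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2}
```

      R2Y therefore measures goodness of fit on the data used to build the model, whereas Q2 measures how well held-out samples are predicted, which is why Q2 is typically lower than R2Y.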

      We also modified the figure legend wording for the analysis in Figure 2H (new Figure 3E) to clarify the independent factors whose interaction was investigated using 3-way ANOVA: “Interaction effects of stimulation, cell type, and disease state on cellular proliferation were analyzed by 3-way ANOVA. Significant interaction effects are indicated as follows: * for stimulation × cell type interactions and # for cell type × disease state interactions (both *, # p<0.05)”.

      (3) In Figure 3, the authors examined whether there were molecular abnormalities common to IPAH-PASMC and IPAH-PAAF and found that the number of commonly regulated genes and proteins was limited to 47. Further analysis of these regulators by STRING analysis revealed that factors related to the regulation of apoptosis are commonly altered in both cells. On the other hand, the authors focused on mitochondria, as SOD2 is downregulated, and found an increase in ROS production specific to PASMC, indicating that mitochondrial dysfunction is common to PASMC and PAAF in IPAH, but downstream phenomena are different between cell types. Factors associated with apoptosis regulation have been found to be both upward and downward regulated, but the actual occurrence of apoptosis in both cell types has not been addressed.

      We have performed TUNEL staining on FFPE lung tissue from donors and IPAH patients, which revealed apoptosis as a rare event in both PASMC and PAAF under both conditions. Therefore, no meaningful quantification could be conducted. An example of a pulmonary artery in which a rare positive signal could be found in either PAAF or PASMC is provided in Figure 4H.

      Unfortunately, the association of a particular gene with a pathway is by default arbitrary and potentially ambiguous. In particular, factors identified as associated with apoptosis are also involved in the regulation of inflammatory signaling (BIRC3, DDIT3) and amino acid metabolism (SHMT1). Nevertheless, mitochondria represent a crucial cellular hub for apoptosis regulation and, as shown in the current study, display significant functional alterations in IPAH in both cell types, aligning with reduced mitochondrial superoxide dismutase (SOD2) expression.

      (4) The meaning of the gray circle in Figure 3C should be clarified. Similarly, the meaning of the color in Fig. 3D should be clearly explained. In Figure 3E-G, each cell is significantly different from 18-61 cells, and the number of each cell and the reason should be described.

      We regret the confusion and now provide a better explanation in the figure legend: “gray nodes representing their putative upstream regulators”, “with color coding reflecting the IPAH dependent regulation”. In the revised Figure panels 4E-G (old 3E-G) we provide the exact number of cells measured in each condition. Although we tried to have comparable cell confluency at the time of measurement, different proliferation rates between cells of different cell types and conditions led to different numbers of measured cells per donor/patient.

      (5) In Figure 4, the authors focus on factors that vary in different directions between cells, revealing fingerprints of molecular changes that differ between cell types: IPAH-PASMC acquires a synthetic phenotype with enhanced regulation of chemotaxis elements, whereas IPAH-PAAF acquires fast-cycling cell characteristics. Next, focusing on the ECM components that were specifically altered in IPAH-PASMC, Nichenet analysis in Figure 5 suggested that ligands from PAAF may act on PASMC, and the authors focused on integrin signaling to examine ECM contact and changes in cell function. The results indicate that adhesion to laminin is poor in PASMC. Although no difference was observed between donor and IPAH PASMCs, a discussion of the reasons for this would be helpful to readers.

      Both donor and IPAH PASMCs respond similarly to laminin. However, our key finding is the downregulation of laminin in IPAH PAAF, which likely leads to a skewed laminin-to-collagen ratio and altered ECM composition in remodeled arteries. This shift in the ECM class results in altered PASMC behavior, affecting both donor and IPAH cells similarly. In the revised manuscript, we demonstrate that PASMC largely retain the expression pattern of integrin subunits that serve as high-affinity collagen and laminin receptors, with higher levels compared to PAAF (Figure 6F, G). Furthermore, we speculate that the distinct cellular phenotypic responses to collagen versus laminin coatings may arise from different downstream signaling pathways activated by the various integrin subunits (Nguyen et al., 2000). These considerations have been included in the revised discussion: “The comparable responses of donor and IPAH PASMC likely result from their shared integrin receptor expression profiles. Meanwhile, ECM class switching engages different high-affinity integrin receptors, which activate alternative downstream signaling pathways (Nguyen et al., 2000) and lead to differential responses to collagen and laminin matrices. We thus propose a model in which laminins and collagens act as PAAF-secreted ligands, regulating PASMC behavior through their ECM-sensing integrin receptors.”

      (6) Since Figure 3B and Figure 4A seem to show the same results, why not combine them into one?

      Indeed, these figure panels show the same results, but the focus of the investigations in each Figure is different. We therefore opted to keep the panels separate for better clarity and a logical link to other panels in the same figure.

      (7) In Figure 6, the interaction analysis of scRNAseq data with respect to signaling between PASMC and PAAF was performed using Nichenet and CellChat, showing that signaling from PAAF to PASMC is biased toward secreted ligands and that a functionally relevant set of soluble ligands is impaired in the IPAH state. From there, they proceeded with co-culture experiments and showed that co-culturing healthy PASMC with PAAF from IPAH patients abolished PASMC markers of the healthy state. Furthermore, the authors attempted to identify ligands produced by IPAH PAAFs that induce functional changes in PASMCs and found that HGF is a factor that downregulates the expression of contractile markers in PASMCs. Further insights may be gained by including IPAH-derived cells in the co-culture experiments. Also, no beneficial effect of pentraxin3 was found in Figure 6H. The authors should examine the effect of pentraxin3 on PASMC cells derived from IPAH patients, rather than healthy donors.

      We tested the influence of IPAH-PASMC on donor-PAAF and found no effect on the expression of the selected markers. We thank the reviewer for the suggestion to conduct the experiments on IPAH-PASMC. The new data show that both PTX3 and HGF have a significant effect on IPAH-PASMC, but one that differs from their effect on donor-PASMC. Whereas PTX3 lacks an effect on donor PASMC, it leads to downregulation of some of the contractile markers in IPAH PASMC, while HGF upregulates the synthetic marker VCAN in IPAH PASMC. These results are now included in Figure 7H.

      Reviewer #2 (Recommendations For The Authors):

      The authors should double-check for grammar and typos in the manuscript. I caught a few such as "therefor" and others, but there could be more.

      We thank the reviewer for the effort and time in reading and evaluating the manuscript. To the best of our knowledge, we have corrected the grammatical errors in the revised manuscript.

    1. eLife Assessment

      The paper presents a valuable theoretical treatment of the role of the passage of time in optimal decision strategies in pursuit-based tasks. The computational evidence and methodologies employed are novel, and the authors offer solid evidence for the majority of the claims.

    2. Reviewer #2 (Public review):

      Summary:

      This paper from Sutlief et al. focuses on an apparent contradiction observed in experimental data from two related types of pursuit-based decision tasks. In "forgo" decisions, where the subject is asked to choose whether or not to accept a presented pursuit, after which they are placed into a common inter-trial interval, subjects have been shown to be nearly optimal in maximizing their overall rate of reward. However, in "choice" decisions, where the subject is asked which of two mutually-exclusive pursuits they will take, before again entering a common inter-trial interval, subjects exhibit behavior that is believed to be sub-optimal. To investigate this contradiction, the authors derive a consistent reward-maximizing strategy for both tasks using a novel and intuitive geometric approach that treats every phase of a decision (pursuit choice and inter-trial interval) as vectors. From this approach, the authors are able to show that previously-reported examples of sub-optimal behavior in choice decisions are in fact consistent with a reward-maximizing strategy. Additionally, the authors are able to use their framework to deconstruct the different ways the passage of time impacts decisions, demonstrating the time cost contains both an opportunity cost and an apportionment cost, as well as examine how a subject's misestimation of task parameters impacts behavior.

      Strengths:

      The main strength of the paper lies in the authors' geometric approach to studying the problem. The authors chose to simplify the decision process by removing the highly technical and often cumbersome details of evidence accumulation that is common in most of the decision-making literature. In doing so, the authors were able to utilize a highly accessible approach that is still able to provide interesting insights into decision behavior and the different components of optimal decision strategies.

      Weaknesses:

      The authors have made great improvements to the strength of their evidence through revision, especially concerning their treatment of apportionment cost. However, I am concerned that the story this paper tells is far from concise, and that this weakness may limit the paper's audience and overall impact. I would strongly suggest making an effort to tighten up the language and structure of the paper to improve its readability and accessibility.

    3. Reviewer #3 (Public review):

      Summary:

      The goal of the paper is to examine the objective function of total reward rate in an environment to understand behavior of humans and animals in two types of decision-making tasks: 1) stay/forgo decisions and 2) simultaneous choice decisions. The main aims are to reframe the equation of optimizing this normative objective into forms that are used by other models in the literature like subjective value and temporally discounted reward. One important contribution of the paper is the use of this theoretical analysis to explain apparent behavioral inconsistencies between forgo and choice decisions observed in the literature.

      Strengths:

      The paper provides a nice way to mathematically derive different theories of human and animal behavior from a normative objective of global reward rate optimization. As such, this work has value in trying to provide a unifying framework for seemingly contradictory empirical observations in literature, such as differentially optimal behaviors in stay-forgo v/s choice decision tasks. The section about temporal discounting is particularly well motivated as it serves as another plank in the bridge between ecological and economic theories of decision-making. The derivation of the temporal discounting function from subjective reward rate is much appreciated as it provides further evidence for potential equivalence between reward rate optimization and hyperbolic discounting, which is known to explain a slew of decision-making behaviors in the economics literature.

      Weaknesses:

      (1) Readability and organization:

      While I appreciate the detailed analysis and authors' attempts to provide as many details as possible, the paper would have benefitted from a little selectivity on behalf of the authors so that the main contributions aren't buried by the extensive mathematical detail provided.

      For instance, in Figure 5, the authors could have kept the most important figures (A, B and G) to highlight the most relevant terms in the subjective value instead of providing all possible forms of the equation.

      Further, in subfigure 5E, is there a reason that the outside reward r_out is shown to be zero? The text referencing 5E is also very unclear: "In so downscaling, the subjective value of a considered pursuit (green) is to the time it would take to traverse the world were the pursuit not taken, 𝑡_out, as its opportunity cost subtracted reward (cyan) is to the time to traverse the world were it to be taken (𝑡_in+ 𝑡_out) (Figure 5E)."

      In the abstract, the malapportionment of time is mentioned as a possible explanation for reconciling observed empirical results between simultaneous and sequential decision-making. However, perhaps due to the density of mathematical detail presented, the discussion of the malapportionment hypothesis is pushed all the way to the end of the discussion section.

      (2) Apportionment Cost definition and interpretation

      This additional cost arises in their analyses from redefining the opportunity cost in terms of just "outside" rewards so that the subjective value of the current pursuit and the opportunity cost are independent of each other. However, in doing so, an additional term arises in defining the subjective value of a pursuit, named here the "apportionment cost". The authors have worked hard to provide a definition to conceptualize the apportionment cost though it remains hard to intuit, especially in comparison to the opportunity cost. The additive form of apportionment cost (Equation 9) doesn't add much in way of intuition or their later analyses for the malapportionment hypothesis. It appears that the most important term is the apportionment scaling term so just focusing on this term will help the reader through the subsequent analyses.

      (3) Malapportionment Hypothesis: From where does this malapportionment arise?

      The authors identify the range of values for t_in and t_out in Figure 18, the terms comprising the apportionment scaling term, that lead to optimal forgo behaviors despite suboptimally rejecting the larger-later (LL) choice in choice decisions. They therefore conclude that the observed behavior reflects a lower apportionment scale, which arises from underestimating the time required outside the pursuit (t_out) or overestimating the time required at the current pursuit (t_in). What is not discussed though is whether and how the underestimation of t_out and overestimation of t_in can be dissociated, though it is understood that empirical demonstration of this dissociation is outside the scope of this work.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      (1) Although there are many citations acknowledging relevant previous work, there often isn't a very granular attribution of individual previous findings to their sources. In the results section, it's sometimes ambiguous when the paper is recapping established background and when it is breaking new ground. For example, around equation 8 in the results (sv = r - rho*t), it would be good to refer to previous places where versions of this equation have been presented. Offhand, McNamara 1982 (Theoretical Population Biology) is one early instance and Fawcett et al. 2012 (Behavioural Processes) is a later one. Line 922 of the discussion seems to imply this formulation is novel here.

      We would like to clarify that equation 8 of the original manuscript, sv = r - rho*t, as we derive it, is not new: it is similarly expressed in prior foundational work by McNamara (1982). We thank the reviewer for drawing our attention to the extension of this form by Fawcett, McNamara, and Houston (2012).

      We now so properly acknowledge this foundational work and extension in the results section…

      “This global reward-rate equivalent immediate reward (see Figure 4) is the subjective value of a pursuit, svPursuit (or simply, sv, when the referenced pursuit can be inferred), as similarly expressed in prior foundational work (McNamara 1982), and subsequent extensions (see Fawcett, McNamara, Houston (2012)).”

      …and in the Discussion section at the location referenced by the reviewer:

      “From it, we re-expressed the pursuit’s worth in terms of its global reward rate-equivalent immediate reward, i.e., its ‘subjective value’, reprising McNamara’s foundational formulation (McNamara 1982).”

      (2) The choice environments that are considered in detail in the paper are very simple. The simplicity facilitates concrete examples and visualizations, but it would be worth further consideration of whether and how the conclusions generalize to more complex environments. The paper considers a "forgo" scenario in which the agent can choose between sequences of pursuits like A-B-A-B (engaging with option B at all opportunities, which are interleaved with a default pursuit A) and A-A-A-A (forgoing option B). It considers "choice" scenarios where the agent can choose between sequences like A-B-A-B and A-C-A-C (where B and C are larger-later and smaller-sooner rewards, either of which can be interleaved with the default pursuit). Several forms of additional complexity would be valuable to consider. [A] One would be a greater number of unique pursuits, not repeated identically in a predictable sequence, akin to a prey-selection paradigm. It seems to me this would cause t_out and r_out (the time and reward outside of the focal prospect) to be policy-dependent, making the 'apportionment cost' more challenging to ascertain. Another relevant form of complexity would be if there were [B] variance or uncertainty in reward magnitudes or temporal durations or if [C] the agent had the ability to discontinue a pursuit such as in patch-departure scenarios.

      A) We would like to note that the section “Deriving Optimal Policy from Forgo Decision-making worlds” addresses the reviewer’s scenario of an n-number of pursuits, each occurring at its own frequency, as in prey selection, not repeating identically in a predictable sequence. Within our subsection “Parceling the world…”, we introduce the concept of dividing a world (such as that) into the considered pursuit type, and everything outside of it. ‘Outside’ would include any number of other pursuits currently part of any policy, as the reviewer intuits, thus making t<sup>out</sup> and r<sup>out</sup> policy dependent. Nonetheless, a process of excluding (forgoing) pursuits by comparing the ‘in’ to the ‘out’ reward rate (section “Reward-rate optimizing forgo policy…”) or its equivalent sv (section “The forgo decision can also be made from subjective value”), would iteratively lead to the global reward rate maximizing policy. This manner of parceling into ‘in’ and ‘out’ thus simplifies visualization of what can be complex worlds. Simpler cases that resemble common experimental designs are given in the manuscript to enhance intuition.

      We thank the reviewer for this keen suggestion. We now include example figures (Supplemental 1 & 2) for multi-pursuit worlds which have the same (Supplemental 1) and different pursuit frequencies (Supplemental 2), which illustrate how this evaluation leads to reward-rate optimization. This addition demonstrates how an iterative policy would lead to reward rate maximization and emphasizes how parcellating a world into ‘in’ and ‘out’ of the pursuit type applies and is a useful device for understanding the worth of any given pursuit in more complex worlds. The policy achieving the greatest global reward rate can be realized through an iterative process where pursuits with lower reward rates than the reward rate obtained from everything other than the considered pursuit type are sequentially removed from the policy.
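
      The iterative process described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the world (a default pursuit that cannot be forgone plus three pursuit types with per-encounter reward, time, and encounter frequency), all names, and all numbers are hypothetical. At each step the pursuit whose 'in' reward rate falls furthest below the 'out' reward rate is removed, until no pursuit in the policy is worse than everything outside it.

```python
# Illustrative world; every name and number here is hypothetical.
DEFAULT = {"reward": 0.0, "time": 10.0}  # default pursuit that cannot be forgone
PURSUITS = {
    # reward and time per encounter; freq = encounters per traversal of the world
    "A": {"reward": 4.0, "time": 2.0, "freq": 1},
    "B": {"reward": 3.0, "time": 6.0, "freq": 1},
    "C": {"reward": 1.0, "time": 10.0, "freq": 2},
}


def global_rate(policy):
    """Reward rate of the whole world under a policy (the set of accepted pursuits)."""
    reward = DEFAULT["reward"] + sum(
        PURSUITS[p]["reward"] * PURSUITS[p]["freq"] for p in policy
    )
    time = DEFAULT["time"] + sum(
        PURSUITS[p]["time"] * PURSUITS[p]["freq"] for p in policy
    )
    return reward / time


def iterative_forgo(policy):
    """Sequentially forgo pursuits whose 'in' rate is below their 'out' rate."""
    policy = set(policy)
    while True:
        failing = []
        for name in policy:
            in_rate = PURSUITS[name]["reward"] / PURSUITS[name]["time"]
            # 'out' rate: the world's reward rate with this pursuit type excluded
            out_rate = global_rate(policy - {name})
            if in_rate < out_rate:
                failing.append((in_rate, name))
        if not failing:
            return policy
        # Remove the least profitable failing pursuit, then re-evaluate the rest
        policy.remove(min(failing)[1])


final_policy = iterative_forgo(PURSUITS)
print(sorted(final_policy), round(global_rate(final_policy), 3))
```

      In this toy world the procedure forgoes only pursuit C, and the resulting policy matches the best policy found by enumerating all subsets of pursuits.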

      B) We would also emphasize that the formulation here contends with variance or uncertainty in the reward magnitudes or temporal durations. The ‘in’ pursuit is the average reward and the average time of the considered pursuit type, as is the ‘out’ the average reward and average time outside of the considered pursuit type.

      C) In this work, we consider the worth of initiating one-or-another pursuit (from having completed a prior one), and not the issue of continuing within a pursuit (having already engaged it), as in patch/give-up. Handling worlds in which the agent may depart from within a pursuit, which is to say ‘give-up’ (as in patch foraging), is outside the scope of this work.

      (3) I had a hard time arriving at a solid conceptual understanding of the 'apportionment cost' around Figure 5. I understand the arithmetic, but it would help if it were possible to formulate a more succinct verbal description of what makes the apportionment cost a useful and meaningful quality to focus on.

      We thank the reviewer for pressing for a succinct and intuitive verbal description.

      We added the following succinct verbal description of apportionment cost… “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.” This definition appears in new paragraphs (as below) describing apportionment cost in the results section “Time’s cost: opportunity & apportionment costs determine a pursuit’s subjective value”, and is accompanied by equations for apportionment cost, and a figure giving its geometric depiction (Figure 5). We also expanded original figure 5 and its legend (so as to illustrate the apportionment scaling factor and the apportionment cost), and its accompanying main text, to further illustrate and clarify apportionment cost, and its relationship to opportunity cost, and time’s cost.

      “What, then, is the amount of reward by which the opportunity cost-subtracted reward is scaled down to equal the sv of the pursuit? This amount is the apportionment cost of time. The apportionment cost of time (height of the brown vertical bar, Figure 5F) is the global reward rate after taking into account the opportunity cost (slope of the magenta-gold dashed line in Figure 5F) times the time of the considered pursuit. Equally, the difference between the inside and outside reward rates, times the time of the pursuit, is the apportionment cost when scaled by the pursuit’s weight, i.e., the fraction that the considered pursuit is to the total time to traverse the world (Equation 9, right hand side). From the perspective of decision-making policies, apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration (Equation 9 center, Figure 5F).

      Equation 9. Apportionment Cost.

      While this difference is the apportionment cost of time, the opportunity cost of time is the amount that would be expected from a policy of not taking the considered pursuit over a time equal to the considered pursuit’s duration. Together, they sum to Time’s Cost (Figure 5G). Expressing a pursuit’s worth in terms of the global reward rate obtained under a policy of accepting the pursuit type (Figure 5 left column), or from the perspective of the outside reward and time (Figure 5 right column), are equivalent. However, the latter expresses sv in terms that are independent of one another, conveys the constituents giving rise to global reward rate, and provides the added insight that time’s cost comprises an apportionment as well as an opportunity cost.”
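
      The three verbal descriptions of apportionment cost given above can be checked against one another algebraically. The following is a sketch reconstructed only from definitions stated in this response (sv as the opportunity-cost-subtracted reward scaled by t<sup>out</sup>/(t<sup>in</sup> + t<sup>out</sup>), rho<sup>in</sup> = r<sup>in</sup>/t<sup>in</sup>, rho<sup>out</sup> = r<sup>out</sup>/t<sup>out</sup>); it is not a transcription of the manuscript's Equation 9, and the symbol \rho_g for the global reward rate under a policy of taking the pursuit is notation assumed here.

```latex
\begin{align}
sv &= \left(r_{in} - \rho_{out}\, t_{in}\right)\frac{t_{out}}{t_{in} + t_{out}},
\qquad \rho_g = \frac{r_{in} + r_{out}}{t_{in} + t_{out}} \\[4pt]
\text{apportionment cost}
  &= \left(r_{in} - \rho_{out}\, t_{in}\right) - sv
   = \frac{r_{in} - \rho_{out}\, t_{in}}{t_{in} + t_{out}}\; t_{in} \\[4pt]
  &= \left(\rho_g - \rho_{out}\right) t_{in} \\[4pt]
  &= \left(\rho_{in} - \rho_{out}\right) t_{in}\,\frac{t_{in}}{t_{in} + t_{out}}
\end{align}
```

      Under these assumptions, the second line corresponds to the opportunity-cost-adjusted rate times the pursuit's time, the third to the difference between the take and not-take policies over the pursuit's duration, and the fourth to the inside-minus-outside rate difference scaled by the pursuit's weight.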

      The above definition of apportionment cost adds to other stated relationships of apportionment cost found throughout the paper (original lines 434,435,447,450).

      I think Figure 6C relates to this, but I had difficulty relating the axis labels to the points, lines, and patterned regions in the plot.

      We thank the reviewer for pointing out that this figure can be made to be more easily understood.

      We have done so by breaking its key features over a greater number of plots so that no single panel is overloaded. We have also changed text in the legend to clarify how apportionment and opportunity costs add to constitute time’s cost, and also correspondingly in the main text.

      I also was a bit confused by how the mathematical formulation was presented. As I understood it, the apportionment cost essentially involves scaling the rest of the SV expression by t<sup>out</sup>/(t<sup>in</sup> + t<sup>out</sup>).

      The reviewer’s understanding is correct: the amount of reward of the pursuit that remains after subtracting the opportunity cost, when so scaled, is equivalent to the subjective value of that pursuit. The amount by which that scaling decreases the rest of the SV expression is equal to the apportionment cost of time.

      The way this scaling factor is written in Figure 5C, as 1/(1 + (1/t<sup>out</sup>) t<sup>in</sup>), seems less clear than it could be.

      To be sure, we present the formula in original Figure 5C in this manner to emphasize the opportunity cost subtraction as separable from the apportionment rescaling, expressing the opportunity cost subtraction and the apportionment scaling component of the equation as their own terms in parentheses.

      But we understand the reviewer to be referring to the manner by which we chose to express the scaling term. We presented it in this way in the original manuscript, (rather than its more elegant form recognized by the reviewer) to make direct connection to temporal discounting literature. In this literature, discounting commonly takes the same mathematical form as our apportionment cost scaling, but whereas the steepness of discounting in this literature is controlled by a free fit parameter, k, we show how for a reward rate maximizing agent, the equivalent k term isn’t a free fit parameter, but rather is the reciprocal of the time spent outside the considered pursuit type.

      We take the reviewer’s advice to heart, and now first express subjective value in the format that emphasizes opportunity cost subtraction followed by an apportionment downscaling, identifying the apportionment scaling term, t<sup>out</sup>/(t<sup>out</sup> + t<sup>in</sup>), i.e., the outside weight. Figure 5 now shows the geometric representation of apportionment scaling and apportionment cost. Only subsequently, in the discounting function section of the revised manuscript, do we rearrange this subjective value expression to resemble the standard discounting function form.

      Also, the apportionment cost is described in the text as being subtracted from sv rather than as a multiplicative scaling factor.

      What we describe in the original text is how apportionment cost is a component of time’s cost, and how sv is the reward less time’s cost. It would be correct to say that apportionment cost and opportunity cost are subtracted from the pursuit’s reward to yield the subjective value of the pursuit. This is what we show in the original Figure 5D graphically. Original Figure 5 and accompanying formulas at its bottom show the equivalence of expressing sv in terms of subtracting time’s cost as calculated from the global reward rate under a policy of accepting the considered pursuit, or, of subtracting opportunity cost and then scaling the opportunity cost subtracted reward by the apportionment scaling term, thereby accounting for the apportionment cost of time.

      The revision of original figure 5, its figure legend, and accompanying text now make clear the meaning of apportionment cost, how it can be considered a subtraction from the reward of a pursuit, or, equivalently, how it can be thought of as the result of scaling down of opportunity cost subtracted reward.
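
      The equivalences discussed in this exchange can be checked numerically. The sketch below is illustrative only: the function names and the example values are assumptions, and the three functions encode (i) reward less time's cost computed from the global reward rate, (ii) opportunity cost subtraction followed by apportionment scaling, and (iii) the discounting-style form in which the apparent k equals the reciprocal of t<sup>out</sup>.

```python
def sv_from_global_rate(r_in, t_in, r_out, t_out):
    """sv as reward less time's cost, using the global reward rate."""
    rho_global = (r_in + r_out) / (t_in + t_out)
    return r_in - rho_global * t_in


def sv_from_scaling(r_in, t_in, r_out, t_out):
    """sv as opportunity-cost-subtracted reward, scaled by the outside weight."""
    opportunity_cost = (r_out / t_out) * t_in
    return (r_in - opportunity_cost) * t_out / (t_in + t_out)


def sv_discounting_form(r_in, t_in, r_out, t_out):
    """Same quantity written like a hyperbolic discount, with k = 1 / t_out."""
    k = 1.0 / t_out
    opportunity_cost = (r_out / t_out) * t_in
    return (r_in - opportunity_cost) / (1.0 + k * t_in)


def apportionment_cost(r_in, t_in, r_out, t_out):
    """Amount removed from the opportunity-cost-subtracted reward by the scaling."""
    opportunity_cost = (r_out / t_out) * t_in
    return (r_in - opportunity_cost) * t_in / (t_in + t_out)


# Hypothetical example world
example = dict(r_in=5.0, t_in=2.0, r_out=3.0, t_out=8.0)
print(sv_from_global_rate(**example))
print(sv_from_scaling(**example))
print(sv_discounting_form(**example))
```

      For any positive times the three expressions agree, and reward less opportunity cost less apportionment cost returns the same sv, which is the subtraction form raised by the reviewer.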

      It could be written as a subtraction, by subtracting a second copy of the rest of the SV expression scaled by t_in/(t_in + t_out). But that shows the apportionment cost to depend on the opportunity cost, which is odd because the original motivation on line 404 was to resolve the lack of independence between terms in the SV expression.

      On line 404 of the original manuscript, we point out that the simple equation―which is a reprisal of McNamara’s insight―is problematic in that its terms on the RHS are not independent: the global reward rate is dependent on the considered pursuit’s reward (see Fig5B). The alternative expression for subjective value that we derive expresses sv in terms that are all independent of one another. We may have unintentionally obscured that fact by having already defined rho<sup>in</sup> as r<sup>in</sup>/ t<sup>in</sup> and rho<sup>out</sup> as r<sup>out</sup>/t<sup>out</sup> on lines 306 and 307.

      Therefore, in the revision, Ap 8 is expressed so as to keep clear that it uses terms that are all independent of one another, and only subsequently is this formula expressed with the simplifying substitution, rho<sup>out</sup>.

      That all said, we understand the reviewer’s point to be that the parenthetical terms relating the opportunity cost and the apportionment rescaling both contain within them the parameter t<sup>out</sup>, and in this way these concepts we put forward to understand the alternative equation are non-independent. That is correct, but it isn’t at odds with our objective to express SV in terms that are independent of one another (which we do). Our motivation in introducing these concepts is to provide insight and intuition into the cost of time (especially now with a clear and simple definition of apportionment cost stated). We go to lengths to demonstrate their relationship to each other.

      (4) In the analysis of discounting functions (line 664 and beyond), the paper doesn't say much about the fact that many discounting studies take specific measures to distinguish true time preferences from opportunity costs and reward-rate maximization.

      We understand the reviewer’s comment to connote that temporal decision-making worlds in which delay time does not preclude reward from outside the current pursuit are a means to distinguish time preference from the impact of opportunity cost. One contribution of this work is to demonstrate that, from a reward-rate maximization framework, an accounting of opportunity cost is not sufficient to understand apparent time preferences as distinguishable from reward-rate maximization. The apportionment cost of time must also be considered to have a full appreciation of the cost of time. For instance, let us consider a temporal decision-making world in which there is no reward received outside the considered pursuit. In such a world, there is no opportunity cost of time, so apparent temporal discounting functions would appear as if purely hyperbolic as a consequence of the apportionment cost of time alone. Time preference, as revealed experimentally by the choices made between a SS and a LL reward, then, seems confounding, as preference can reverse from a SS to a LL option as the displacement of those options (maintaining their difference in time) increases (Green, Fristoe, and Myerson 1994; Kirby and Herrnstein 1995). While this shift, the so-called “Delay effect”, could potentially arise as a consequence of some inherent time preference bias of an agent, we demonstrate that a reward-rate maximal agent exhibits hyperbolic discounting, and therefore it would also exhibit the Delay effect, even though it has no time preference.

      In the revision we now make reference to the Delay Effect (in abstract, results new section “The Delay Effect” with new figure 14, and in the discussion), which is taken as evidence of time preference in human and animal literature, and note explicitly how a reward-rate maximizing agent would also exhibit this behavior as a consequence of apparent hyperbolic discounting.
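
      The preference reversal described above can be illustrated with a small numerical sketch. It assumes a world with no outside reward, so that subjective value reduces to the reward divided by (1 + t<sup>in</sup>/t<sup>out</sup>); the option values, the outside time, and the function names are hypothetical choices made for illustration.

```python
T_OUT = 2.0  # hypothetical time spent outside the considered pursuit
SS = {"reward": 2.0, "delay": 1.0}  # smaller-sooner option
LL = {"reward": 3.0, "delay": 4.0}  # larger-later option


def subjective_value(option, added_delay, t_out=T_OUT):
    """sv with zero outside reward: hyperbolic in the time spent in the pursuit."""
    t_in = option["delay"] + added_delay
    return option["reward"] / (1.0 + t_in / t_out)


def global_rate(option, added_delay, t_out=T_OUT):
    """Global reward rate of a policy that always chooses this option."""
    return option["reward"] / (t_out + option["delay"] + added_delay)


def preferred(added_delay):
    """Option with the larger subjective value at a common added delay."""
    if subjective_value(SS, added_delay) > subjective_value(LL, added_delay):
        return "SS"
    return "LL"


# Displace both options by the same added delay, keeping their difference fixed
for added_delay in (0.0, 10.0):
    print(added_delay, preferred(added_delay))
```

      With these numbers the agent prefers SS when no delay is added and LL when both options are displaced by ten time units, and in each case the preferred option is also the one yielding the higher global reward rate, so the reversal arises without any time preference.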

      In many of the human studies, delay time doesn't preclude other activities.

      Our framework is generalizable to worlds in which being in pursuit does not preclude an agent from receiving reward during that time at the outside reward rate. Original Ap 13 solves for such a condition, and shows that in this context, the opportunity cost of time drops out of the SV equation, leaving only the consequences of the apportionment cost of time. We made reference to this case on lines 1032-1034 of the original manuscript: “In this way, such hyperbolic discounting models [models that do not make an accounting of opportunity cost] are only appropriate in worlds with no “outside” reward, or, where being in a pursuit does not exclude the agent from receiving rewards at the rate that occurs outside of it (Ap. 13).”

      The note and reference are fleeting in the original work. We take the reviewer’s suggestion and now add paragraphs in the discussion on the difference between humans and animals in apparent discounting, making specific note of human studies in which delay time doesn’t preclude receiving outside reward while engaged in a pursuit. Relatedly, hyperbolic discounting is oft considered to be less steep in humans than in animals. As the reviewer points out, these assessments are frequently made under conditions in which being in a pursuit does not preclude receiving reward from outside the pursuit. When humans are tested under conditions in which outside rewards are precluded, they exhibit far steeper discounting. We now include citation to that observation (Jimura et al. 2009). We handle such conditions in original Ap 13, and show how, in such worlds, the opportunity cost of time drops out of the equation. The consequence of this is that the apparent discounting function would become less steep (the agent would appear as if more patient), consistent with reports.

      “Relating to the treatment of opportunity cost, we also note that many investigations into temporal discounting do not make an explicit distinction between situations in which 1) subjects continue to receive the usual rewards from the environment during the delay to a chosen pursuit, and 2) situations in which during a chosen pursuit’s delay no other rewards or opportunities will occur (Kable & Glimcher, 2007; Kirby & Maraković, 1996; McClure, Laibson, Loewenstein, & Cohen, 2004). Commonly, human subjects are asked to answer questions about their preferences between options for amounts they will not actually earn after delays they will not actually have to wait, during which it is unclear whether they are really investing time away from other options or not (Rosati et al., 2007). In contrast, in most animal experiments, subjects actually receive reward after different delays during which they do not receive new options or rewards. By our formulation, when a pursuit does not exclude the agent from receiving rewards at the rate that occurs outside, the opportunity cost of time drops out of the subjective value equation (Ap 12).

      Equation 10. The value of initiating a pursuit when pursuit does not exclude receiving rewards at the outside rate (Ap 12)

      Therefore, the reward-rate maximizing discounting function in these worlds is functionally equivalent to the situation in which the outside reward rate is zero, and will, lacking an opportunity cost, be less steep. This rationalizes why human discounting functions are often reported to be longer (gentler) than animal discounting functions: they are typically tested in conditions that negate opportunity cost, whereas animals are typically tested in conditions that enforce opportunity costs. Indeed, when humans are made to wait for actually received reward, their observed discounting functions are much steeper (Jimura et al. 2009).”

      In animal studies, rate maximization can serve as a baseline against which to measure additional effects of temporal discounting. This is an important caveat to claims about discounting anomalies being rational under rate maximization (e.g., line 1024).

      We agree that the purpose of this reward-rate maximizing framework is to serve as a point of comparison in which effects of temporal intervals and rewards that define the environment can be analyzed to better understand the manner in which animals and humans deviate from this ideal behavior. Our interest in this work is in part motivated by a desire to have a deeper understanding of what “true” time preference means. Using the reward-rate maximizing framework here provides a means to speak about time preferences (i.e., biases) in terms of deviation from optimality. From this perspective, a reward-rate maximal agent doesn’t exhibit time preference: its actions are guided solely by reward-rate optimizing valuation. Therefore, one contribution of this work is to show that purported signs of time preference (hyperbolic discounting, magnitude, sign, and (now) delay effect) can be explained without invoking time preference. Whatever errors from optimality remain following a proper accounting of reward-rate maximizing behavior should then, and only then, be considered through the lens of time preference (bias).

      (5) The paper doesn't feature any very concrete engagement with empirical data sets. This is ok for a theoretical paper, but some of the characterizations of empirical results that the model aims to match seem oversimplified. An example is the contention that real decision-makers are optimal in accept/reject decisions (line 816 and elsewhere). This isn't always true; sometimes there is evidence of overharvesting, for example.

      We would like to note that the scope of this paper is limited to examining the value of initiating a pursuit, rather than the value of continuing within a pursuit. The issue of continuing within a pursuit constitutes a third fundamental topology, which could be called give-up or patch-foraging; it is complex and warrants its own paper. In Give-up topologies, which are distinct from Forgo and Choice topologies, the reviewer is correct in pointing out that the preponderance of evidence demonstrates that animals and humans are as if overpatient, adopting a policy of investing more time within a pursuit than is warranted. In Forgo instances, however, the evidence supports near optimality.

      (6) Related to the point above, it would be helpful to discuss more concretely how some of this paper's theoretical proposals could be empirically evaluated in the future. Regarding the magnitude and sign effects of discounting, there is not a very thorough overview of the several other explanations that have been proposed in the literature. It would be helpful to engage more deeply with previous proposals and consider how the present hypothesis might make unique predictions and could be evaluated against them.

      We appreciate the reviewer’s point that there are many existing explanations for these various ‘anomalous’ effects. We hold that the point of this work is to demonstrate that these effects are consistent with a reward-rate maximizing framework so do not require additional assumptions, like separate processes for small and large rewards, or the inclusion of a utility function.

      Nonetheless, there is a diversity of explanations for the sign and magnitude effects and (now with its explicit inclusion in the revision) the delay effect. Therefore, we now also include reference to additional work which proffers alternative explanations for the sign and magnitude effects (as reviewed by Kalenscher and Pennartz 2008; Frederick et al. 2002), as well as a scalar timing account of non-stationary time preference (Gibbon, 1977).

      With respect to predictions, this framework makes the following with regard to the magnitude, sign, and (now in the revision) delay effects. In the Discussion, Magnitude effect subsection: “The Magnitude Effect should be observed, experimentally, to diminish when 1) increasing the outside time while holding the outside reward constant, (thus decreasing the outside reward rate), or when 2) decreasing the outside reward while holding the outside time constant (thus decreasing the outside reward rate). However, 3) the Magnitude Effect would exaggerate as the outside time increased while holding the outside reward rate constant.” In the Sign effect subsection: “…we then also predict that the size of the Sign effect would diminish as the outside reward rate decreases (and as the outside time increases), and in fact would invert should the outside reward rate turn negative (become net punishing), such that punishments would appear to discount more steeply than rewards.” In the Delay effect subsection: “...a sign of irrationality is that a preference reversal occurs at delays greater than what a reward-rate-maximizing agent would exhibit.”
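
      The Magnitude and Sign effect predictions quoted above can be checked numerically. The following is a minimal sketch, not code from the manuscript: it assumes the subjective value takes the form sv = (r_in - rho_out * t_in) / (1 + t_in / t_out), which is our reading of the verbal description of the opportunity-cost-subtracted reward scaled by time's apportionment, and the function names and parameter values are hypothetical.

```python
def discount(r_in, t_in, r_out, t_out):
    """Magnitude-normalized subjective value (sv / r_in) of a pursuit.

    Assumed form: sv = (r_in - rho_out * t_in) / (1 + t_in / t_out).
    """
    rho_out = r_out / t_out                      # outside reward rate
    sv = (r_in - rho_out * t_in) / (1.0 + t_in / t_out)
    return sv / r_in


def magnitude_effect(t_in, r_out, t_out, small=1.0, large=4.0):
    """Gap between the normalized values of a large and a small reward
    at the same delay; a larger gap means a stronger Magnitude Effect."""
    return discount(large, t_in, r_out, t_out) - discount(small, t_in, r_out, t_out)


def sign_effect(t_in, r_out, t_out, magnitude=1.0):
    """Gap between the normalized values of a punishment and a reward of
    equal magnitude; positive means rewards appear to discount more steeply."""
    return discount(-magnitude, t_in, r_out, t_out) - discount(magnitude, t_in, r_out, t_out)


if __name__ == "__main__":
    base = magnitude_effect(t_in=2.0, r_out=1.0, t_out=6.0)
    # 1) longer outside time, same outside reward
    print("baseline:", base)
    print("longer t_out, same r_out:", magnitude_effect(2.0, 1.0, 12.0))
    # 2) smaller outside reward, same outside time
    print("smaller r_out, same t_out:", magnitude_effect(2.0, 0.5, 6.0))
    # 3) longer outside time, same outside reward rate (r_out doubles with t_out)
    print("longer t_out, same rate:", magnitude_effect(2.0, 2.0, 12.0))
    # Sign effect with a positive and a negative outside reward rate
    print("sign effect, rho_out > 0:", sign_effect(2.0, 1.0, 6.0))
    print("sign effect, rho_out < 0:", sign_effect(2.0, -1.0, 6.0))
```

      Under this assumed form, cases 1 and 2 shrink the gap relative to baseline, case 3 enlarges it, and the Sign effect changes direction when the outside reward rate turns negative.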

      A similar point applies to the 'malapportionment hypothesis' although in this case there is a very helpful section on comparisons to prior models (line 1163). The idea being proposed here seems to have a lot in common conceptually with Blanchard et al. 2013, so it would be worth saying more about how data could be used to test or reconcile these proposals.

      We thank the reviewer for finding the section on model comparisons to be very helpful. We believe the text previously dedicated to this issue to be sufficient in this regard. We have, however, added substantively to the Malapportionment Hypothesis section (Discussion) and its accompanying figure, to make explicit a number of predictions from the Malapportionment Hypothesis as it relates to Hyperbolic discounting, the Delay Effect, and the Sign and Magnitude Effects.

      Reviewer #1 Recommendations

      (1) As a general note about the figures, it would be helpful to specify, either graphically or in the caption, what fixed values of reward sizes and time intervals are being assumed for each illustration.

      Thank you for the suggestion. We attempted to keep graphs as uncluttered as possible, but agree that for original figures 4, 5, 16, and 17, which didn’t have numbered axes, we should provide the amounts in the captions of the revised figures (4, 5, and now 17, 18). These figures did not have numerics because their shapes and display are meant to illustrate the form of the relationship between vectors, which is general to the values they may take.

      We now include in the captions for these figures the parameter amounts used.

      (2) Should Equation 2 have t in the denominator instead of r?

      Indeed. We thank the reviewer for catching this typographical error.

      We have corrected it in the revision.

      (3) General recommendation:

      My view is that in order for the paper's eLife assessment to improve, it would be necessary to resolve points 1 through 4 listed under "weaknesses" in my public review, which pertain to clarity and acknowledgement of prior work. I think a lot hinges on whether the authors can respond to point #3 by making a more compelling case for the usefulness and generality of the 'apportionment cost' concept, since that idea is central to the paper's contribution.

      We believe these critical points (1-4) to improve the paper will now have been addressed to the reviewer’s satisfaction.

      Reviewer #2 (Public review):

      While the details of the paper are compelling, the authors' presentation of their results is often unclear or incomplete:

      (1) The mathematical details of the paper are correct but contain numerous notation errors and are presented as a solid block of subtle equation manipulations. This makes the details of the authors' approach (the main contribution of the paper to the field) highly difficult to understand.

      We thank the reviewers for having detected typographical errors regarding three equations. They have been corrected. The first typographical error in the original main text (Line 277) regards equation 2 and has been corrected so that equation 2 appears correctly as

      The second typo regards the definition of the considered pursuit’s reward rate, which appears in the original main text (line 306), and has been corrected to appear as

      The third typographical error occurred in conversion from Google Sheets to Microsoft Word appearing in the original main text (line 703) and regards the subjective value expression when no reward is received in an intertrial interval (ITI). It has been corrected to appear as

      (2) One of the main contributions of the paper is the notion that time’s cost in decision-making contains an apportionment cost that reflects the allocation of decision time relative to the world. The authors use this cost to pose a hypothesis as to why subjects exhibit sub-optimal behavior in choice decisions. However, the equation for the apportionment cost is never clearly defined in the paper, which is a significant oversight that hampers the effectiveness of the authors' claims.

      We thank the reviewer for pressing on this critical point. Reviewers commonly identified a need to provide a concise and intuitive definition of apportionment cost, and to explicitly solve and provide for its mathematical expression.

      We added the following succinct verbal description of apportionment cost… “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.” This definition appears in new paragraphs (as below) describing apportionment cost in the results section “Time’s cost: opportunity & apportionment costs determine a pursuit’s subjective value”, and is accompanied by equations for apportionment cost, and a figure giving its geometric depiction (Figure 5). We also expanded original figure 5 and its legend (so as to illustrate the apportionment scaling factor and the apportionment cost), and its accompanying main text, to further illustrate and clarify apportionment cost, and its relationship to opportunity cost, and time’s cost.

      “What, then, is the amount of reward by which the opportunity cost-subtracted reward is scaled down to equal the sv of the pursuit? This amount is the apportionment cost of time. The apportionment cost of time (height of the brown vertical bar, Figure 5F) is the global reward rate after taking into account the opportunity cost (slope of the magenta-gold dashed line in Figure 5F) times the time of the considered pursuit. Equally, the difference between the inside and outside reward rates, times the time of the pursuit, is the apportionment cost when scaled by the pursuit’s weight, i.e., the fraction that the considered pursuit is to the total time to traverse the world (Equation 9, right hand side). From the perspective of decision-making policies, apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration (Equation 9 center, Figure 5F).

      Equation 9. Apportionment Cost.

      While this difference is the apportionment cost of time, the opportunity cost of time is the amount that would be expected from a policy of not taking the considered pursuit over a time equal to the considered pursuit’s duration. Together, they sum to Time’s Cost (Figure 5G). Expressing a pursuit’s worth in terms of the global reward rate obtained under a policy of accepting the pursuit type (Figure 5 left column), or from the perspective of the outside reward and time (Figure 5 right column), are equivalent. However, the latter expresses sv in terms that are independent of one another, conveys the constituents giving rise to global reward rate, and provides the added insight that time’s cost comprises an apportionment as well as an opportunity cost.”
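
      The three verbal descriptions of apportionment cost given above (global rate after opportunity cost times pursuit time, rate difference scaled by the pursuit's weight, and the difference between policies) can be checked against one another numerically. The sketch below is illustrative only and is not taken from the manuscript: the symbols r_in, t_in, r_out, t_out, the function name, and the example values are assumptions based on our reading of the text.

```python
def time_costs(r_in, t_in, r_out, t_out):
    """Decompose time's cost for one considered pursuit (t_in, t_out > 0).

    r_in, t_in   : reward and time inside the considered pursuit
    r_out, t_out : average reward and time spent outside it
    """
    rho_in = r_in / t_in                          # inside reward rate
    rho_out = r_out / t_out                       # outside reward rate
    rho_g = (r_in + r_out) / (t_in + t_out)       # global rate when taking the pursuit

    # Reward expected from NOT taking the pursuit, over a time equal to t_in
    opportunity = rho_out * t_in

    # Apportionment cost, written three ways that should agree:
    by_policy = (rho_g - rho_out) * t_in                            # taking vs not taking
    by_weight = (rho_in - rho_out) * t_in * (t_in / (t_in + t_out))  # scaled by pursuit's weight
    by_rate = ((r_in - opportunity) / (t_in + t_out)) * t_in        # global rate after opportunity cost

    times_cost = opportunity + by_policy          # the two costs sum to time's cost
    return {
        "opportunity": opportunity,
        "apportionment": by_policy,
        "apportionment_by_weight": by_weight,
        "apportionment_by_rate": by_rate,
        "times_cost": times_cost,
        "sv": r_in - times_cost,                  # subjective value of the pursuit
    }


if __name__ == "__main__":
    for name, value in time_costs(r_in=4.0, t_in=2.0, r_out=1.0, t_out=6.0).items():
        print(f"{name}: {value:.4f}")
```

      In this reading, time's cost also equals the global reward rate times the pursuit's duration, so subtracting it from the inside reward gives the same subjective value as scaling the opportunity-cost-subtracted reward by t_out / (t_in + t_out).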

      (3) Many of the paper's figures are visually busy and not clearly detailed in the captions (for example, Figures 6-8). Because of the geometric nature of the authors' approach, the figures should be as clean and intuitive as possible, as in their current state, they undercut the utility of a geometric argument.

      We endeavored to make our figures as simple as possible. We have made in the revision changes to figures that we believe improve their clarity. These include: 1) breaking some figures into more panels when more than one concept was being introduced (such as in revised Figure 5 , 6, 7, and 8), 2) using the left hand y axis for the outside reward, and the right hand axis for the inside reward when plotting the “in” and “outside” reward, and indicating their respective numerics (which run in opposite directions), 3) adding a legend to the figures themselves where needed (revised figures 10, 11, 12, 14) 4) adding the values used to the figure captions, where needed, and 5) ensuring all symbols are indicated in legends.

      (4) The authors motivate their work by focusing on previously-observed behavior in decision experiments and tell the reader that their model is able to qualitatively replicate this data. This claim would be significantly strengthened by the inclusion of experimental data to directly compare to their model's behavior. Given the computational focus of the paper, I do not believe the authors need to conduct their own experiments to obtain this data; reproducing previously accepted data from the papers the authors' reference would be sufficient.

      Our objective was not to fit experimentally observed data, as is commonly the goal of implementation/computational models. Rather, as a theory, our objective is to rationalize the broad, curious, and well-established pattern of temporal decision-making behaviors under a deeper understanding of reward-rate maximization, and from that understanding, identify the nature of the error being committed by whatever learning algorithm and representational architecture is actually being used by humans and animals. In doing so, we make a number of important contributions. By identifying and analyzing reward-rate-maximizing equations, we 1) provide insight into what composes time’s cost and how the temporal structure of the world in which it is embedded (its ‘context’) impacts the value of a pursuit, 2) rationalize a diverse assortment of temporal decision-making behaviors (e.g., Hyperbolic discounting, the Magnitude Effect, the Sign Effect, and the Delay effect), explaining them with no assumed free-fit parameter, and then, by analyzing error in parameters enabling reward-rate maximization, 3) identify the likely source of error and propose the Malapportionment Hypothesis. The Malapportionment Hypothesis identifies the underweighting of a considered pursuit’s “outside”, and not error in pursuit’s reward rates, as the source of error committed by humans and animals. It explains why animals and humans can present as suboptimally ‘impatient’ in Choice, but as optimal in Forgo. At the same time, it concords with numerous and diverse observations in decision making regarding whether to initiate a pursuit. The nature of this error also, then, makes numerous predictions. These insights inform future computational and experimental work by providing strong constraints on the nature of the algorithm and representational architecture used to learn and represent the values of pursuits. Rigorous test of the Malapportionment Hypothesis will require wholly new experiments.

      In the revision, we also now emphasize and add predictions of the Malapportionment Hypothesis, updated its figure (Figure 21), its legend, and its paragraphs in the discussion.

      “We term this reckoning of the source of error committed by animals and humans the Malapportionment Hypothesis, which identifies the underweighting of the time spent outside versus inside a considered pursuit but not the misestimation of pursuit rates, as the source of error committed by animals and humans (Figure 21). This hypothesis therefore captures previously published behavioral observations (Figure 21A) showing that animals can make decisions to take or forgo reward options that optimize reward accumulation (Krebs et al., 1977; Stephens and Krebs, 1986; Blanchard and Hayden, 2014), but make suboptimal decisions when presented with simultaneous and mutually exclusive choices between rewards of different delays (Logue et al., 1985; Blanchard and Hayden, 2015; Carter and Redish, 2016; Kane et al., 2019). The Malapportionment Hypothesis further predicts that apparent discounting functions will present with greater curvature than what a reward-rate-maximizing agent would exhibit (Figure 21B). While experimentally observed temporal discounting would have greater curvature, the Malapportionment Hypothesis also predicts that the Magnitude (Figure 21C) and Sign effect (Figure 21D) would be less pronounced than what a reward-rate-maximizing agent would exhibit, with these effects becoming less pronounced the greater the underweighting. Finally, with regards to the Delay Effect (Figure 21E), the Malapportionment Hypothesis predicts that preference reversal would occur at delays greater than that exhibited by a reward-rate-maximizing agent, with the delay becoming more pronounced the greater the underweighting outside versus inside the considered pursuit by the agent.”

      (5) While the authors reference a good portion of the decision-making literature in their paper, they largely ignore the evidence-accumulation portion of the literature, which has been discussing time-based discounting functions for some years. Several papers that are both experimentally-(Cisek et al. 2009, Thurs et al. 2012, Holmes et al. 2016) and theoretically-(Drugowitsch et al. 2012, Tajima et al. 2019, Barendregt et al. 22) driven exist, and I would encourage the authors to discuss how their results relate to those in different areas of the field.

      In this manuscript, we consider the worth of initiating one or another pursuit having completed a prior one, and not the issue of continuing within a pursuit having already engaged in it. The worth of continuing a pursuit, as in patch-foraging/give-up tasks, constitutes a third fundamental time decision-making topology which is outside the scope of the current work. It engages a large and important literature, encompassing evidence accumulation, and requires a paper on the value of continuing a pursuit in temporal decision making, in its own right, that can use the concepts and framework developed here. The excellent works suggested by the reviewer will be most relevant to that future work concerning patch-foraging/give-up topologies.

      Reviewer #2 Recommendations:

      (1) In Equation 1, the term rho_d is referred to as the reward rate of the default pursuit, when it should be the reward of the default pursuit.

      Regarding Equation 1, it is formulated to calculate the average reward received and average time spent per unit time spent in the default pursuit. So, f<sub>i</sub> is the encounter rate of pursuit i for one unit of time spent in the default pursuit (lines 259-262). Added to the summation in the numerator, we have the average reward obtained in the default pursuit per unit time (ρ<sub>d</sub>), and in the denominator we have the time spent in the default pursuit per unit time (1).

      We have added clarifying text to assist in meaning of the equation in Ap 1, and thank the reviewer for pointing out this need.
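
      To make the bookkeeping in this description concrete, the following minimal sketch computes a global reward rate per unit time spent in the default pursuit. It is an illustration under assumptions, not the manuscript's Equation 1 verbatim: we assume the form (rho_d + sum of f_i * r_i) / (1 + sum of f_i * t_i), and the function name, tuple layout, and example numbers are hypothetical.

```python
def global_reward_rate(pursuits, rho_d):
    """Global reward rate of a world reached from a default pursuit.

    pursuits : iterable of (f_i, r_i, t_i) tuples, where f_i is the encounter
               rate of pursuit i per unit time in the default pursuit, and
               r_i, t_i are that pursuit's reward and duration
    rho_d    : reward obtained in the default pursuit per unit time spent in it
    """
    pursuits = list(pursuits)
    # Average reward per unit of default time: default reward plus encountered pursuits
    reward = rho_d + sum(f * r for f, r, _ in pursuits)
    # Average time per unit of default time: the unit itself (1) plus time in pursuits
    time = 1.0 + sum(f * t for f, _, t in pursuits)
    return reward / time


if __name__ == "__main__":
    world = [(0.5, 4.0, 2.0), (0.25, 8.0, 4.0)]   # hypothetical (f_i, r_i, t_i)
    print(global_reward_rate(world, rho_d=0.0))
    print(global_reward_rate(world, rho_d=0.5))
```

      Because both sums are taken per unit of default time, rho_d enters the numerator as a rate while the matching term in the denominator is simply 1.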

      (2) The notation for "in" and "out" of a considered pursuit type begins as being used to describe the contribution from a single pursuit (without inter-trial interval) towards global reward rate and the contribution of all other factors (other possible pursuits and inter-trial interval) towards global reward rate, respectively, but is then used to describe the pursuit's contribution and the inter-trial interval's contribution, respectively, to the global reward rate. This should be cleaned up to be consistent throughout, or at the very least, it should be addressed when this special case is considered the default.

      As understood by the reviewer, “in” and “out” of the considered pursuit type describes the general form by which a world can be cleaved into these two parts: the average time and reward received outside of the considered pursuit type for the average time and reward received within that pursuit type. A specific, simple, and common experimental instance would be a world composed of one or another pursuit and an intertrial interval.

      We now make clear how such a world composed of a considered pursuit and an inter-trial interval would be but one special case. In example cases where t<sup>out</sup> represents the special case of an inter-trial interval, this is now stated clearly. For instance, we do so when discussing how a purely hyperbolic discounting function would apply in worlds in which no reward is received in t<sup>out</sup>, stating that this is common in experimental designs where t<sup>out</sup> represents an inter-trial interval with no reward. Importantly, by the new inclusion in the revision of illustrated worlds that have n pursuits that could occur from a default pursuit at 1) equal frequency (Supplemental 1), and 2) differing frequencies (Supplemental 2), we make more clear the generalizability and utility of this t<sup>out</sup>/t<sup>in</sup> concept.

      (3) Figure 5 should make clear the decomposition of time's cost both graphically and functionally. As it stands, the figure does not define the apportionment cost.

      In the revision of original Figure 5, we now further decompose the figure to effectively convey 1) what opportunity cost and 2) (especially) what apportionment cost are, both graphically and mathematically, 3) how time’s cost is composed of them, 4) how the apportionment scaling term scales the opportunity-cost-subtracted reward by time’s allocation to equal the subjective value, and 5) the equivalence between the expression of time’s cost using terms that are not independent of one another and the expression of time’s cost using terms that are independent of one another.

      (4) Figures 6-8 do not clearly define the dots and annuli used in panels B and C.

      We have further decomposed figures 6-8 so that the functional form of opportunity, apportionment, and time’s cost can be more clearly appreciated, and what their interrelationship is with respect to changing outside reward and outside time, and clearly identify symbols used in the corresponding legends.

      (5) The meaning of a negative subjective value should be specifically stated. Is it the amount a subject would pay to avoid taking the considered pursuit?

      As the reviewer intuits, negative subjective value can be considered the amount an agent ought be willing to pay to avoid taking the considered pursuit.

      We now include the following lines in “The forgo decision can also be made from subjective value” section in reference to negative subjective value…

      “A negative subjective value thus indicates that a policy of taking the considered pursuit would result in a global reward rate that is less than a policy of forgoing the considered pursuit. Equivalently, a negative subjective value can be considered the amount an agent ought be willing to pay to avoid having to take the considered pursuit.”
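
      The equivalence stated in this passage, that a negative subjective value coincides with a lower global reward rate under a policy of taking than under a policy of forgoing, can be illustrated with a short check. This is a sketch under assumptions rather than the manuscript's method: it assumes sv = (r_in - rho_out * t_in) / (1 + t_in / t_out), assumes that forgoing yields the outside reward rate, and uses hypothetical function names and values.

```python
def subjective_value(r_in, t_in, r_out, t_out):
    """Assumed form: opportunity-cost-subtracted reward, scaled by apportionment."""
    rho_out = r_out / t_out
    return (r_in - rho_out * t_in) / (1.0 + t_in / t_out)


def policy_rates(r_in, t_in, r_out, t_out):
    """Global reward rates under a policy of taking vs forgoing the pursuit."""
    rate_take = (r_in + r_out) / (t_in + t_out)   # pursuit plus its outside
    rate_forgo = r_out / t_out                    # outside only (assumption)
    return rate_take, rate_forgo


if __name__ == "__main__":
    for r_in in (-2.0, 0.2, 4.0):                 # punishment, meagre reward, good reward
        sv = subjective_value(r_in, 2.0, 1.0, 6.0)
        take, forgo = policy_rates(r_in, 2.0, 1.0, 6.0)
        decision = "forgo" if sv < 0 else "take"
        print(f"r_in={r_in}: sv={sv:.3f}, take={take:.3f}, forgo={forgo:.3f} -> {decision}")
```

      In each case the sign of the subjective value agrees with the comparison of the two policy rates, which is why the forgo decision can be made from subjective value alone.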

      (6) Why do you define the discounting function as the normalized subjective value? This choice should be justified, via literature citations or a well-described logical argument.

      The reward magnitude normalized subjective value-time function is commonly referred to as the temporal discounting function as it permits comparison of the discount rate isolated from a difference in reward magnitude and/or sign and is deeply rooted in historical precedent. As the reviewer points out, the term is overloaded, however, as investigations in which comparisons between the form of subjective value-time functions is not needed tend to refer to these functions as temporal discounting functions as well.

      We make clear in the revised text in the introduction our meaning and use of the term, the justification in doing so, and its historical roots.

      “Historically, temporal decision-making has been examined using a temporal discounting function to describe how delays in rewards influence their valuation. Temporal discounting functions describe the subjective value of an offered reward as a function of when the offered reward is realized. To isolate the form of discount rate from any difference in reward magnitude and sign, subjective value is commonly normalized by the reward magnitude when comparing subjective value-time functions (Strotz, 1956, Jimura, 2009). Therefore, we use the convention that temporal discounting functions are the magnitude-normalized subjective value-time function (Strotz, 1956).”

      Special addition. In investigating the historical roots of the discounting function prompted by the reviewer, we learned (Grüne-Yanoff 2015) that it was Mazur that simply added the “1+k” in the denominator of the hyperbolic discounting function. Our derivation for the reward-rate optimal agent makes clear why apparent temporal discounting functions ought have this general form.

      Therefore, we add the following to the “Hyperbolic Temporal Discounting Function” section in the discussion…

      “It was Ainslie (Ainslie, 1975) who first understood that the empirically observed “preference reversals” between SS and LL pursuits could be explained if temporal discounting took on a hyperbolic form, which he initially conjectured to arise simply from the ratio of reward to delay (Grüne-Yanoff 2015). This was problematic, however, on two fronts: 1) as the time nears zero, the value curve goes to infinity, and 2) there is no accommodation of differences observed within and between subjects regarding the steepness of discounting. Mazur (Mazur, 1987) addressed these issues by introducing 1 + k into the denominator, providing for the now standard hyperbolic discounting function, 1/(1 + kt). Introduction of “1” solved the first issue, though “it never became fully clear how to interpret this 1” (Grüne-Yanoff 2015; interviewing Ainslie). Introduction of the free-fit parameter, k, accommodated the variability observed across and within subjects by controlling the curvature of temporal discounting, and has become widely interpreted as a psychological trait, such as patience, or willingness to delay gratification (Frederick et al., 2002).”

      …continuing later in that section to explain why the reward-rate optimal agent would exhibit this general form…

      “Regarding form, our analysis reveals that the apparent discounting function of a reward-rate-maximizing agent is a hyperbolic function…

      …which resembles the standard hyperbolic discounting function, 1/(1 + kt), in the denominator, where k = 1/t<sub>out</sub>. Whereas Mazur introduced 1 + k to t in the denominator to 1) force the function to behave as t approaches zero, and 2) provide a means to accommodate differences observed within and between subjects, our derivation gives cause to the terms 1 and k, their relationship to one another, and to t in the denominator. First, from our derivation, “1” actually signifies taking t<sub>out</sub> amount of time expressed in units of t<sub>out</sub> (t<sub>out</sub>/t<sub>out</sub>=1) and adding it to t<sub>in</sub> amount of time expressed in units of t<sub>out</sub> (i.e., the total time to make a full pass through the world expressed in terms of how the agent apportions its time under a policy of accepting the considered pursuit).”
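
      The correspondence between the terms “1” and k and the times inside and outside the pursuit can be written out step by step. The following is a hedged reconstruction, not the manuscript's own equation: it assumes the subjective value form sv = (r_in - ρ_out t_in) / (1 + t_in/t_out) and sets the outside reward rate to zero to recover the purely hyperbolic case.

```latex
\begin{align}
  sv &= \frac{r_{in} - \rho_{out}\, t_{in}}{1 + \dfrac{t_{in}}{t_{out}}}
     && \text{(assumed subjective value)} \\[4pt]
  \frac{sv}{r_{in}} &= \frac{1 - \dfrac{\rho_{out}\, t_{in}}{r_{in}}}{1 + \dfrac{t_{in}}{t_{out}}}
     && \text{(magnitude-normalized)} \\[4pt]
  \left.\frac{sv}{r_{in}}\right|_{\rho_{out}=0}
     &= \frac{1}{\dfrac{t_{out}}{t_{out}} + \dfrac{t_{in}}{t_{out}}}
      = \frac{1}{1 + k\, t_{in}},
     \qquad k = \frac{1}{t_{out}}
     && \text{(no outside reward)}
\end{align}
```

      Read this way, the denominator is the total time for one pass through the world, t_out + t_in, expressed in units of t_out, which is why the “1” and the k are not independent free parameters in this account.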

      Additional Correction. In revising the section, “Hyperbolic Temporal Discounting Functions” in the discussion, we also detected an error in our description of the meaning of suboptimal bias for SS. In the revision, the sentence now reads…

      “More precisely, what is meant by this suboptimal bias for SS is that the switch in preference from LL to SS occurs at an outside reward rate that is lower (and/or an outside time that is greater) than what an optimal agent would exhibit.”

      (7) Figure 15B should have negative axes defined for the pursuit's now negative reward.

      Yes- excellent point.

      To remove ambiguity regarding the valence of inside and outside reward magnitudes, we have changed all such figures so that the left hand y-axis is used to signify the outside reward magnitude and sign, and so that the right hand y-axis is used to signify the inside reward magnitude and sign.

      With respect to the revision of original 15B, this change now makes clear that the inside reward label and numerics on the right hand side of the graph run from positive (top) to negative (bottom) values so that it can now be understood that the magnitude of the inside reward is negative in this figure (ie, a punishment). The left hand y-axis labeling the outside reward magnitude has numerics that run in the opposite direction, from negative (top) to positive (bottom). In this figure, the outside reward rate is positive whereas the inside reward rate is negative.

      (8) When comparing your discounting function to the TIMERR and Heuristic models, it would be useful to include a schematic plot illustrating the different obtainable behaviors from all models rather than just telling the reader the differences.

      We hold that the descriptions and references are sufficient to address these comparisons.

      (9) I would strongly suggest cleaning up all appendices for notation…

      The typographical errors that have been noted in these reviews have all been corrected. We believe the reviewer to be referring here to the manner that we had cross-referenced Equations in the appendices and main text which can lead to confusion between whether an equation number being referenced is in regard to its occurrence in the main text or its occurrence in the appendices.

      In the revision, we eliminate numbering of equations in the appendices except where an equation that occurs in an appendix is referenced within the main text. In the main text, important equations are numbered sequentially and note the appendix from which they derive. If an equation in an appendix is referenced in the main text, this is noted within the appendix from which it derives.

      …and replacing some of the small equation manipulations with written text describing the goal of each derivation.

      To increase clarity, we have taken the reviewer’s helpful suggestion, adding helper text in the appendices where needed, and have bolded the equations of importance within the Appendices (rather than removing the equation manipulations that make clear the steps of derivation).

      (10) I would suggest moving the table in Appendix 11 to the main text where misestimation is referenced.

      So moved. This appendix now appears in the main text as table 1 “Definitions of misestimating global reward rate-enabling parameters”.

      Reviewer #3 (Public review):

      One broad issue with the paper is readability. Admittedly, this is a complicated analysis involving many equations that are important to grasp to follow the analyses that subsequently build on top of previous analyses.

      But, what's missing is intuitive interpretations behind some of the terms introduced, especially the apportionment cost without referencing the equations in the definition so the reader gets a sense of how the decision-maker thinks of this time cost in contrast with the opportunity cost of time.

      We thank the reviewer for encouraging us to formulate a succinct and intuitive verbal statement as to the nature of apportionment cost.

      We added the following succinct verbal description of apportionment cost… “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.” This definition appears in a new paragraph (as below) describing apportionment cost in the results section “Time’s cost: opportunity & apportionment costs determine a pursuit’s subjective value”, and is accompanied by equations for apportionment cost, and a figure giving its geometric depiction (Figure 5). We also expanded original figure 5 and its legend (so as to illustrate the apportionment scaling factor and the apportionment cost), and its accompanying main text, to further illustrate and clarify apportionment cost, and its relationship to opportunity cost, and time’s cost.

      “What, then, is the amount of reward by which the opportunity cost-subtracted reward is scaled down to equal the sv of the pursuit? This amount is the apportionment cost of time. The apportionment cost of time (height of the brown vertical bar, Figure 5F) is the global reward rate after taking into account the opportunity cost (slope of the magenta-gold dashed line in Figure 5F) times the time of the considered pursuit. Equally, the difference between the inside and outside reward rates, times the time of the pursuit, is the apportionment cost when scaled by the pursuit’s weight, i.e., the fraction that the considered pursuit is to the total time to traverse the world (Equation 9, right hand side). From the perspective of decision-making policies, apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration (Equation 9 center, Figure 5F).

      Equation 9. Apportionment Cost.

      While this difference is the apportionment cost of time, the opportunity cost of time is the amount that would be expected from a policy of not taking the considered pursuit over a time equal to the considered pursuit’s duration. Together, they sum to Time’s Cost (Figure 5G). Expressing a pursuit’s worth in terms of the global reward rate obtained under a policy of accepting the pursuit type (Figure 5 left column), or from the perspective of the outside reward and time (Figure 5 right column), are equivalent. However, the latter expresses sv in terms that are independent of one another, conveys the constituents giving rise to global reward rate, and provides the added insight that time’s cost comprises an apportionment as well as an opportunity cost.”

      The above definition of apportionment cost adds to other stated relationships of apportionment cost found throughout the paper (original lines 434,435,447,450).
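
      As a numerical check on the three verbal readings of apportionment cost quoted above (policy difference, weighted rate difference, and scaled net reward), the following minimal sketch evaluates each for one set of values. The function and variable names are illustrative assumptions, not taken from the manuscript, and the closed forms are inferred from the quoted text and from the expression proposed by the reviewer.

```python
import math


def apportionment_cost(r_in, t_in, r_out, t_out):
    """Return three readings of apportionment cost that should agree."""
    rho_in = r_in / t_in                        # inside reward rate
    rho_out = r_out / t_out                     # outside reward rate
    rho_take = (r_in + r_out) / (t_in + t_out)  # global rate under a policy of taking
    weight = t_in / (t_in + t_out)              # pursuit's share of the total time

    # (a) taking vs. not taking the pursuit, over a time equal to its duration
    policy_difference = (rho_take - rho_out) * t_in
    # (b) inside-minus-outside rate, times pursuit time, scaled by the weight
    weighted_rate_difference = (rho_in - rho_out) * t_in * weight
    # (c) opportunity-cost-subtracted reward, scaled by the weight
    scaled_net_reward = (r_in - rho_out * t_in) * weight
    return policy_difference, weighted_rate_difference, scaled_net_reward


def decompose(r_in, t_in, r_out, t_out):
    """Split r_in into opportunity cost, apportionment cost, and subjective value."""
    opportunity = (r_out / t_out) * t_in
    apportionment = apportionment_cost(r_in, t_in, r_out, t_out)[0]
    subjective_value = r_in - opportunity - apportionment
    return opportunity, apportionment, subjective_value


if __name__ == "__main__":
    print(apportionment_cost(4.0, 2.0, 3.0, 6.0))
    print(decompose(4.0, 2.0, 3.0, 6.0))
```

      With an inside reward of 4 over 2 time units and an outside reward of 3 over 6 time units, all three readings give 0.75, the opportunity cost is 1, and the remaining subjective value is 2.25.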

      Re-analysis of some existing empirical data through the lens of their presented objective functions, especially later when they describe sources of error in behavior.

      Our objective was not to fit experimentally observed data, as is commonly the goal of implementation/computational models. Rather, as a theory, our objective is to rationalize the broad, curious, and well-established pattern of temporal decision-making behaviors under a deeper understanding of reward-rate maximization, and from that understanding, identify the nature of the error being committed by whatever learning algorithm and representational architecture is actually being used by humans and animals. In doing so, we make a number of important contributions. By identifying and analyzing reward-rate-maximizing equations, we 1) provide insight into what composes time’s cost and how the temporal structure of the world in which it is embedded (its ‘context’) impacts the value of a pursuit, 2) rationalize a diverse assortment of temporal decision-making behaviors (e.g., Hyperbolic discounting, the Magnitude Effect, the Sign Effect, and the Delay effect), explaining them with no assumed free-fit parameter, and then, by analyzing error in parameters enabling reward-rate maximization, 3) identify the likely source of error and propose the Malapportionment Hypothesis. The Malapportionment Hypothesis identifies the underweighting of a considered pursuit’s “outside”, and not error in pursuit’s reward rates, as the source of error committed by humans and animals. It explains why animals and humans can present as suboptimally ‘impatient’ in Choice, but as optimal in Forgo. At the same time, it concords with numerous and diverse observations in decision making regarding whether to initiate a pursuit. The nature of this error also, then, makes numerous predictions. These insights inform future computational and experimental work by providing strong constraints on the nature of the algorithm and representational architecture used to learn and represent the values of pursuits. Rigorous test of the Malapportionment Hypothesis will require wholly new experiments.

      In the revision, we also now emphasize and add predictions of the Malapportionment Hypothesis, augmenting its figure (Figure 21), its legend, and its paragraphs in the discussion.

      “We term this reckoning of the source of error committed by animals and humans the Malapportionment Hypothesis, which identifies the underweighting of the time spent outside versus inside a considered pursuit but not the misestimation of pursuit rates, as the source of error committed by animals and humans (Figure 21). This hypothesis therefore captures previously published behavioral observations (Figure 21A) showing that animals can make decisions to take or forgo reward options that optimize reward accumulation (Krebs et al., 1977; Stephens and Krebs, 1986; Blanchard and Hayden, 2014), but make suboptimal decisions when presented with simultaneous and mutually exclusive choices between rewards of different delays (Logue et al., 1985; Blanchard and Hayden, 2015; Carter and Redish, 2016; Kane et al., 2019). The Malapportionment Hypothesis further predicts that apparent discounting functions will present with greater curvature than what a reward-rate-maximizing agent would exhibit (Figure 21B). While experimentally observed temporal discounting would have greater curvature, the Malapportionment Hypothesis also predicts that the Magnitude (Figure 21C) and Sign effect (Figure 21D) would be less pronounced than what a reward-rate-maximizing agent would exhibit, with these effects becoming less pronounced the greater the underweighting. Finally, with regards to the Delay Effect (Figure 21E), the Malapportionment Hypothesis predicts that preference reversal would occur at delays greater than that exhibited by a reward-rate-maximizing agent, with the delay becoming more pronounced the greater the underweighting outside versus inside the considered pursuit by the agent.”
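
      One way to picture the contrast between Forgo and Choice under the Malapportionment Hypothesis is sketched below. It assumes, purely for illustration, that underweighting can be modeled by multiplying the outside time by a factor `outside_weight` below 1 while leaving the reward rates untouched; that parameterization, the function names, and the example values are assumptions of this sketch rather than the manuscript's formulation.

```python
def subjective_value(r_in, t_in, rho_out, t_out, outside_weight=1.0):
    """Subjective value with the outside time scaled by outside_weight.

    outside_weight = 1.0 is an accurate agent; values below 1.0 underweight
    the time spent outside the considered pursuit (hypothetical model).
    """
    perceived_t_out = outside_weight * t_out
    return (r_in - rho_out * t_in) / (1.0 + t_in / perceived_t_out)


def choose(options, rho_out, t_out, outside_weight=1.0):
    """Pick the option name with the highest subjective value (Choice)."""
    return max(
        options,
        key=lambda name: subjective_value(
            *options[name], rho_out, t_out, outside_weight
        ),
    )


def accept(r_in, t_in, rho_out, t_out, outside_weight=1.0):
    """Take the pursuit when its subjective value is positive (Forgo)."""
    return subjective_value(r_in, t_in, rho_out, t_out, outside_weight) > 0


if __name__ == "__main__":
    # (reward, time) for a smaller-sooner and a larger-later pursuit
    options = {"SS": (2.0, 2.0), "LL": (5.0, 8.0)}
    print(choose(options, rho_out=0.1, t_out=20.0))                      # accurate
    print(choose(options, rho_out=0.1, t_out=20.0, outside_weight=0.1))  # underweighted
```

      In this sketch the accurate agent selects the larger-later option while the underweighting agent selects the smaller-sooner one. Forgo decisions are unaffected because the sign of the subjective value depends only on the reward rates, which are left unchanged.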

      Reviewer #3 Recommendations:

      As mentioned above, the readability of this paper should be improved so that the readers can follow the derivations and your analyses better. To this end, careful numbering of equations, following consistent equation numbering formats, and differentiating between appendix referencing and equation numbering would have gone a long way in improving the readability of this paper. Some specific questions are noted below.

      To increase clarity, in the revision we eliminated numbering of equations in the appendices except where an equation occurs in an appendix that is referenced within the main text. In the main text, important equations are thus numbered sequentially as they appear and note the appendix from which they derive. If an equation in an appendix is referenced in the main text, it is noted within the appendix from which it derives.

      (1) In general, it is unclear what the default pursuit is. From the schematic on the left (forgo decision), it appears to be the time spent in between reward-giving pursuits. However, this schematic also allows for smaller rewards to be attained during the default pursuit as do subsequent equations that reference a default reward rate. Here is where an example would have really benefited the authors in getting their point across as to what the default pursuit is in practice in the forgo decisions and how the default reward rate could be modulated.

      (1) The description of the default pursuit has been modified in section “Forgo and Choice decision topologies” to now read… “After either the conclusion of the pursuit, if accepted, or immediately after rejection, the agent returns to a pursuit by default (the “default” pursuit). This default pursuit effectively can be a waiting period over which reward could be received, and reoccurs until the next pursuit opportunity becomes available.” (2) Additionally, helper text has been added to Ap1 regarding the meaning of time and reward spent in the default pursuit. Finally, (3) new figures concerning n-pursuits occurring at the same (Supplement 1) or different (Supplement 2) frequencies from a default pursuit are now added, providing examples as suggested by the reviewer.

      (2) I want to clarify my understanding of the topologies in Figure 1. In the forgo, do they roam in the "gold" pursuit indefinitely before they are faced with the purple pursuit? In general, comparing the 2 topologies, it seems like in the forgo decision, they can roam indefinitely in the gold topology or choose the purple but must return to the gold.

      The reviewer’s understanding of the topology is correct. The agent loops across one unit time in the default gold pursuit indefinitely, though the purple pursuit (or any pursuit that might exist in that world) occurs on exit from gold at its frequency per unit time. The default gold pursuit will then itself have an average duration in units of time spent in gold. As the reviewer states, the agent can re-enter into gold from having exited gold, and can enter gold from having exited purple, but cannot re-enter purple from having exited purple; rather, it must enter into the default pursuit.

      …Another point here is that this topology is highly simplified (only one considered pursuit). So it may be helpful to either add a schematic for the full topology with multiple pursuits or alternatively, provide the corresponding equations (at least in appendix 1 and 2) for the simplified topology so you can drive home the intuition behind derived expressions in these equations.

      We understand the reviewer to be noting that, while the illustrated example is of the simple topology, the mathematical formulation handles the case of n pursuits, and that illustrating a world in which there are a greater number of pursuits, corresponding to original appendices 1&2, would assist readers in understanding the generality of these equations.

      An excellent suggestion. We have now added n-pursuit world illustrations, in which each pursuit occurs at the same (Supplemental Figure 1) or at different frequencies (Supplemental Figure 2), to the manuscript, and have added text to assist in understanding the form of the equation and its relationship to unit time in the default pursuit in the main text and in the appendices.

      (3) In Equation and Appendix 1, there are a few things that are unclear. Particularly, why is the expected time of the default option E(t_default )= 1/(∑_(i=1)^n f_i )? Similarly, why is the E(r_default )= ρ_d/(∑_(i=1)^n f_i )? Looking at the expression for E(r_default ), it implies that across all pursuits 1 through n, the default option is encountered only once. Ultimately, in Equation 1.4, (and Equation 1), the units of the two terms in the numerator don't seem to match. One is a reward rate (ρ_d) and the other is a reward value. This is the most important equation of the paper since the next several equations build upon this. Therefore, the lack of clarity here makes the reader less likely to follow along with the analysis in rigorous detail. Better explanations of the terms and better formatting will help alleviate some of these issues.

      The equation is formulated to calculate the average reward received and average time spent per unit time spent in the default pursuit. So, f_i is the encounter rate of pursuit i for one unit of time spent in the default pursuit. Added to the summation in the numerator we have the average reward obtained in the default pursuit per unit time (ρ_d) and in the denominator we have the time spent in the default pursuit per unit time (1).

      Text explaining the above equation has been added to Ap 1.
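
      The bookkeeping described in this reply can be sketched as follows: both numerator and denominator are tallied per one unit of time spent in the default pursuit, so the ρ_d term is a rate multiplied by an implicit unit of time and therefore carries units of reward. The function name and the tuple layout are assumptions made for illustration.

```python
def global_reward_rate(rho_d, pursuits):
    """Global reward rate for a world of n pursuits.

    rho_d    -- reward rate of the default pursuit
    pursuits -- iterable of (f_i, r_i, t_i): encounter rate per unit time in
                the default pursuit, reward, and duration of pursuit i
    """
    # Reward collected per unit time in the default pursuit:
    # rho_d * 1 (default reward) plus the expected reward from encounters.
    reward_per_unit_default = rho_d * 1.0 + sum(f * r for f, r, _ in pursuits)
    # Time elapsed per unit time in the default pursuit:
    # the unit itself plus the expected time spent inside encountered pursuits.
    time_per_unit_default = 1.0 + sum(f * t for f, _, t in pursuits)
    return reward_per_unit_default / time_per_unit_default


if __name__ == "__main__":
    world = [(0.2, 4.0, 2.0), (0.1, 10.0, 5.0)]
    print(global_reward_rate(0.5, world))
```

      Writing the default term as `rho_d * 1.0` makes the unit of time explicit, which is the point at issue in the reviewer's question about mismatched units.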

      (4) In equation and appendix 2, I'm trying to relate the expressions for t_out and r_out to the definitions "average time spent outside the considered pursuit". If I understand the expression in Equation 2.4 on the right-hand side, the numerator is the total time spent in all of the pursuits in the environment and the denominator refers to the number of times the considered pursuit is encountered. It is unclear as to why this is the average time spent outside the considered pursuit. In my mind, the expression for average time spent outside the considered pursuit would look something like t_out=1+ ∑_(i≠in)〖p_i t_i 〗= 1+ ∑_(i≠in)〖f_i/(∑_(j=1)^n f_j ) * t_i 〗. It is unclear how these expressions are then equivalent.

      Regarding the following equation,

      t_out = (1 + ∑_(i≠in) f_i t_i)/f_in

      f_i is the probability that pursuit i will be encountered during a single unit of time spent in the default pursuit. The numerator of the expression is the average amount of time spent across all pursuits, excepting the considered pursuit, per unit time spent in the default pursuit. Note that the + 1 in the numerator is accounting for the unit of time spent in the default pursuit and is added outside of the sum. Since f_in is the probability that the considered pursuit will be encountered per unit of time spent in the default pursuit, 1/f_in is the average amount of time spent in the default pursuit between encounters of the considered pursuit. By multiplying the average time spent across all outside pursuits per unit of time in the default pursuit by the average amount of time spent in the default pursuit between encounters of the considered pursuit, we get the average amount of time spent outside the considered pursuit per encounter of the considered pursuit. This is calculated as if the pursuit encounters are mutually exclusive within a single unit of time spent within the default pursuit, as this is the case as the length of our unit time (delta t) approaches zero.

      The above text explaining the equation has been added to Ap 2.
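
      The verbal recipe above (time outside per unit of default time, multiplied by the default time between encounters of the considered pursuit) can be sketched as follows. The outside reward is computed by the same recipe on the assumption that it parallels the outside time; the function name and data layout are illustrative, not the manuscript's.

```python
def outside_time_and_reward(considered, rho_d, pursuits):
    """Average time and reward outside the considered pursuit, per encounter.

    considered -- index of the considered pursuit in `pursuits`
    rho_d      -- reward rate of the default pursuit
    pursuits   -- list of (f_i, r_i, t_i) tuples
    """
    f_in = pursuits[considered][0]
    others = [p for i, p in enumerate(pursuits) if i != considered]

    # Per unit of time in the default pursuit, excluding the considered pursuit.
    time_per_unit_default = 1.0 + sum(f * t for f, _, t in others)
    reward_per_unit_default = rho_d + sum(f * r for f, r, _ in others)

    # Average default time that elapses between encounters of the considered pursuit.
    default_time_between_encounters = 1.0 / f_in

    t_out = time_per_unit_default * default_time_between_encounters
    r_out = reward_per_unit_default * default_time_between_encounters
    return t_out, r_out


if __name__ == "__main__":
    world = [(0.2, 4.0, 2.0), (0.1, 10.0, 5.0)]
    t_out, r_out = outside_time_and_reward(0, 0.5, world)
    _, r_in, t_in = world[0]
    # Inside plus outside should recover the global reward rate of the whole world.
    print((r_in + r_out) / (t_in + t_out))
```

      Combining the inside and outside quantities in this way reproduces the global reward rate obtained directly from the n-pursuit sum, which is one way to see that the two formulations describe the same world.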

      (5) In Figure 3, one huge advantage of this separation into in-pursuit and out-of-pursuit patches is that the optimal reward rate maximizing rule becomes one that compares ρ_in and ρ_out. This contrasts with an optimal foraging rule which requires comparing to the global reward rate and therefore a circularity in solution. In practice, however, it is unclear how ρ_out will be estimated by the agent.

      How, in practice, a human or animal estimates the reward rates―be they the outside and/or global reward rate under a policy of accepting a pursuit―is the crux of the matter. This work identifies equations that would enable a reward-rate maximizing agent to calculate and execute optimal policies and emphasizes that the effective reward rates and weights of pursuits must be accurately appreciated for global reward rate optimization. In so doing, it makes a reckoning of behaviors commonly but erroneously treated as suboptimal. Then, by examining the consequences of misestimation of these enabling parameters, it identifies mis-weighting pursuits as the nature of the error committed by whatever algorithm and representational architecture is being used by humans and animals (the Malapportionment Hypothesis). This curious pattern identified and analyzed in this work thus provides a clue into the nature of the learning algorithm and means of representing the temporal structure of the environment that is used by humans and animals―the subject of future work.
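
      The reviewer's observation that the forgo rule reduces to comparing ρ_in with ρ_out, without reference to the global reward rate, can be illustrated with a minimal sketch. It assumes positive durations and treats all four quantities as known to the agent; how they would be estimated is exactly the open question discussed above. The function names are hypothetical.

```python
def accept_by_rate_comparison(r_in, t_in, r_out, t_out):
    """Accept when the inside reward rate exceeds the outside reward rate."""
    return r_in / t_in > r_out / t_out


def accept_by_global_rate(r_in, t_in, r_out, t_out):
    """Accept when taking the pursuit yields a higher global reward rate."""
    rate_if_taking = (r_in + r_out) / (t_in + t_out)
    rate_if_forgoing = r_out / t_out
    return rate_if_taking > rate_if_forgoing


if __name__ == "__main__":
    cases = [(3.0, 1.0, 2.0, 4.0), (1.0, 2.5, 2.0, 1.0), (-2.0, 1.0, -1.0, 4.0)]
    for case in cases:
        print(case, accept_by_rate_comparison(*case), accept_by_global_rate(*case))
```

      The two rules agree because the rate under a policy of taking is a time-weighted average of the inside and outside rates, so it exceeds the outside rate exactly when the inside rate does.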

      We note, however, that we do discuss existing models that grapple with how, in practice, a human or animal may estimate the outside reward rate. Of particular importance is the TIMERR model, which estimates the outside reward rate from its past experience, and can make an accounting of many qualitative features widely observed. However, while appealing, it would mix prior ‘in’ and ‘outside’ experiences within that estimate, and so would fail to perform forgo tasks optimally. Something is still amiss, as this work demonstrates.

      (6) The apportionment time cost needs to be explained a little bit more intuitively. For instance, it is clear that the opportunity cost of time is the cost of not spending time in the rest of the environment relative to the current pursuit. But given the definition of apportionment cost here in lines 447-448 "The apportionment cost relates to time's allocation in the world: the time spent within a pursuit type relative to the time spent outside that pursuit type, appearing in the denominator." The reference to the equation (setting aside the confusion regarding which equation) within the definition makes it a bit harder to form an intuitive interpretation of this cost. Please reference the equation being referred to in lines 447-448, and again, an example may help the authors communicate their point much better.

      We thank the reviewer for pressing on this critical point.

      Action: We added the following succinct verbal description of apportionment cost… “Apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration.” This definition appears in a new paragraph (as below) describing apportionment cost in the results section “Time’s cost: opportunity & apportionment costs determine a pursuit’s subjective value”, and is accompanied by equations for apportionment cost, and a figure giving its geometric depiction (Figure 5).

      “What, then, is the amount of reward by which the opportunity cost-subtracted reward is scaled down to equal the sv of the pursuit? This amount is the apportionment cost of time. The apportionment cost of time (height of the brown vertical bar, Figure 5F) is the global reward rate after taking into account the opportunity cost (slope of the magenta-gold dashed line in Figure 5F) times the time of the considered pursuit. Equally, the difference between the inside and outside reward rates, times the time of the pursuit, is the apportionment cost when scaled by the pursuit’s weight, i.e., the fraction that the considered pursuit is to the total time to traverse the world (Equation 9, right hand side). From the perspective of decision-making policies, apportionment cost is the difference in reward that can be expected, on average, between a policy of taking versus a policy of not taking the considered pursuit, over a time equal to its duration (Equation 9 center, Figure 5F).

      Equation 9. Apportionment Cost.

      While this difference is the apportionment cost of time, the opportunity cost of time is the amount that would be expected from a policy of not taking the considered pursuit over a time equal to the considered pursuit’s duration. Together, they sum to Time’s Cost (Figure 5G). Expressing a pursuit’s worth in terms of the global reward rate obtained under a policy of accepting the pursuit type (Figure 5 left column), or from the perspective of the outside reward and time (Figure 5 right column), are equivalent. However, the latter expresses sv in terms that are independent of one another, conveys the constituents giving rise to global reward rate, and provides the added insight that time’s cost comprises an apportionment as well as an opportunity cost.”

      (7) The analyses in Figures 6 and 7 give a nice visual representation of how the time costs are distributed as a function of outside reward and time spent. However, without an expression for apportionment cost it is hard to intuitively understand these visualizations. This also relates to the previous point of requiring a more intuitive explanation of apportionment costs in relation to the opportunity cost of time. Based on my quick math, it seems that an expression for apportionment cost would be as follows: (r_in- ρ_out*t_in)*(t_in⁄t_out )/(t_in⁄t_out +1 ). The condition described in Figure 7 seems like the perfect place to compute the value of just apportionment cost when the opportunity cost is zero. It would be helpful to introduce the equation here.

      We designed original figure 7, as the reviewer appreciates, to emphasize that time has a cost even when there is no opportunity cost, being due entirely to the apportionment cost of time.

      We now provide the mathematical expression of apportionment cost and apportionment scaling in Figure 5, the point in the main text of its first occurrence.

      …and have expanded original figure 5, its legend (so as to illustrate the apportionment scaling factor and the apportionment cost), and its accompanying main text, to further illustrate and clarify apportionment cost, and its relationship to opportunity cost, and time’s cost.

      (8) The analysis regarding choice decisions is relatively straightforward, pending the concerns for the main equations listed above for the forgo decisions. Legends certainly would have helped me grasp Figures 10-12 better.

      We believe the reviewer is referring to missing labels for the Sooner Smaller pursuit, and the Larger Later Pursuit in these figures? We used the same conventions as in Figure 9, but we see now that adding these labels to these figures would be helpful, and add them in the revision.

      We have now added to the figures themselves figure legends indicating the Sooner Smaller Pursuit and the Larger Later Pursuit. We have also added to the main text to emphasize the points made in these figures regarding the impact of opportunity cost and apportionment cost.

      (9) The derivation of the temporal discounting function from subjective reward rate is much appreciated as it provides further evidence for potential equivalence between reward rate optimization and hyperbolic discounting, which is known to explain a slew of decision-making behaviors in the economics literature.

      We thank the reviewer and greatly appreciate this recognition.

      In response to the reviewer’s comment, we have added text that further relates reward rate optimization to hyperbolic discounting…

      (1) We add discussion of how our normative derivation gives explanation to Mazur’s ad hoc addition of 1 + k to Ainslie’s reward/time hyperbolic discounting conception. See new first paragraph under “Hyperbolic Temporal Discounting Functions” for the historical origins of the standard hyperbolic equation (which are decidedly not normatively derived). And then see our discussion (new second paragraph in sections “The apparent discounting function of global….”) of how our normative derivation gives explanation to “1”, “k”, and their relationship to each other.

      (2) We add explicit treatment of the Delay Effect in a new “The Delay Effect” section of the results along with a figure, and in its corresponding Discussion section.
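
      As a sketch of how the “1” and “k” of the standard hyperbolic form might arise, subtracting the apportionment cost (in the form given by the reviewer) from the opportunity-cost-subtracted reward leaves the first expression below. Setting the outside reward rate to zero then gives a form that can be compared with Mazur's equation. The identification of k with 1/t_out is a reading of the expressions quoted in this reply, not a quotation from the revised manuscript.

```latex
\begin{align}
sv &= \frac{r_{in} - \rho_{out}\, t_{in}}{1 + t_{in}/t_{out}}
   && \text{(reward net of opportunity and apportionment costs)} \\
sv\,\big|_{\rho_{out}=0} &= \frac{r_{in}}{1 + k\, t_{in}}, \qquad k = \frac{1}{t_{out}}
   && \text{(compare Mazur's } V = \tfrac{A}{1 + kD}\text{)}
\end{align}
```

      Under this reading, the “1” corresponds to the pursuit's own share of the time and k to the reciprocal of the time spent outside it, so the steepness of apparent discounting would depend on the temporal structure of the world rather than on a free-fit parameter.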

      Minor comments:

      (1) Typo in equation 2, should be t_i in the denominator within the summation, not r_i .

      We thank the reviewer for catching this typo, and have corrected it in the revision.

      (2) Before equation 6, typo when defining ρ_in= r_in/(t_in.). Should be t_in in the denominator, not r_out.

      We thank the reviewer for catching this typo, and have corrected it in the revision.

      (3) Please be consistent with equation numbers, placement of equation references, and the reason for placing appendix numbers. This will improve readability immensely.

      To increase clarity, in the revision we eliminated numbering of equations in the appendices except where an equation occurs in an appendix that is referenced within the main text. In the main text, important equations are thus numbered sequentially and note the appendix from which they derive. If an equation in an appendix is referenced in the main text, it is noted within the appendix from which it derives.

      (4) Line 505 - "dominants" should be dominates.

      Typo fixed as indicated.

      (5) Figures 10-12: add legends to the figures.

      Now so included.

      (6) Lines 701-703: please rewrite the equation separately. It is highly unclear what rt is here.

      We thank the reviewer for bringing attention to this error. The error arose in converting from Google Sheets to Microsoft Word.

      The equation has now been corrected.

      Additional citations noted in reply and appearing in Main text

      Ainslie, George. 1975. “Specious Reward: A Behavioral Theory of Impulsiveness and Impulse Control.” Psychological Bulletin 82 (4): 463–96.

      Frederick, Shane, George Loewenstein, and Ted O’Donoghue. 2002. “Time Discounting and Time Preference: A Critical Review.” Journal of Economic Literature 40: 351–401.

      Gibbon, John. 1977. “Scalar Expectancy Theory and Weber’s Law in Animal Timing.” Psychological Review 84: 279–325.

      Green, Leonard, Nathanael Fristoe, and Joel Myerson. 1994. “Temporal Discounting and Preference Reversals in Choice between Delayed Outcomes.” Psychonomic Bulletin & Review 1: 383–89.

      Grüne-Yanoff, Till. 2015. “Models of Temporal Discounting 1937-2000: An Interdisciplinary Exchange between Economics and Psychology.” Science in Context 28 (4): 675–713.

      Jimura, Koji, Joel Myerson, Joseph Hilgard, Todd S. Braver, and Leonard Green. 2009. “Are People Really More Patient than Other Animals? Evidence from Human Discounting of Real Liquid Rewards.” Psychonomic Bulletin & Review 16: 1071–75.

      Kalenscher, Tobias, and Cyriel M. A. Pennartz. 2008. “Is a Bird in the Hand Worth Two in the Future? The Neuroeconomics of Intertemporal Decision-Making.” Progress in Neurobiology 84 (3): 284–315.

      Kirby, Kris N., and R. J. Herrnstein. 1995. “Preference Reversals Due to Myopic Discounting of Delayed Reward.” Psychological Science 6 (2): 83–89.

      Mazur, James E. 1987. “An Adjusting Procedure for Studying Delayed Reinforcement.” In The Effect of Delay and of Intervening Events on Reinforcement Value, 55–73. Quantitative Analyses of Behavior, Vol. 5. Hillsdale, NJ, US: Lawrence Erlbaum Associates, Inc.

      McNamara, John. 1982. “Optimal Patch Use in a Stochastic Environment.” Theoretical Population Biology 21 (2): 269–88.

      Rosati, Alexandra G., Jeffrey R. Stevens, Brian Hare, and Marc D. Hauser. 2007. “The Evolutionary Origins of Human Patience: Temporal Preferences in Chimpanzees, Bonobos, and Human Adults.” Current Biology 17: 1663–68.

      Strotz, R. H. 1956. “Myopia and Inconsistency in Dynamic Utility Maximization.” The Review of Economic Studies 23: 165–80.

    1. eLife Assessment

      This useful study by Gao et al identifies Hspa2 as a heterogeneous transcript in the early embryo and proposes a plausible mechanism showing interactions with Carm1. The authors propose that variability in HSPA2 levels among blastomeres at the 4-cell stage skews their relative contribution to the embryonic lineage. Given only 4 other heterogeneous transcripts/non-coding RNA have been proposed to act similarly at or before the 4-cell stage, this would be a key addition to our understanding of how the first cell fate decision is made. While this is a solid study, further data are needed to fully support the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of HSPA2 during mouse preimplantation development. Knocking down HSPA2 in zygotes, the authors describe lower chances of developing into blastocysts, which show a reduced number of inner cell mass cells. They find that HSPA2 mRNA and protein levels show some heterogeneity among blastomeres at the 4-cell stage and propose that HSPA2 could contribute to skewing their relative contribution to embryonic lineages. To test this, the authors try to reduce HSPA2 expression in one of the 2-cell stage blastomeres and propose that it biases their contribution towards extra-embryonic lineages. To explain this, the authors propose that HSPA2 would interact with CARM1, which controls chromatin accessibility around genes regulating differentiation into embryonic lineage.

      Strengths:

      (1) The study offers simple and straightforward experiments with large sample sizes.

      (2) Unlike most studies in the field, this research often relies on both mRNA and protein levels to analyse gene expression and differentiation.

      Weaknesses:

      (1) Image and statistical analyses are not well described.

      (2) The functionality of the overexpression construct is not fully validated.

      (3) Tracking of KD cells in embryos injected at the 2-cell stage with GFP is unclear.

      (4) A key rationale of the study relies on measuring small differences in the levels of mRNA and proteins using semi-quantitative methods to compare blastomeres. As such, it is not possible to know whether those subtle differences are biologically meaningful. For example, the lowest HSPA2 level of the embryo with the highest level is much higher than the top cell from the embryo with the lowest level. What does this level mean then? Does this mean that some blastomeres grafted from strong embryos would systematically outcompete all other blastomeres from weaker embryos? That would be very surprising. I think the authors should be more careful and consider the lack of quantitative power of their approach before reaching firm conclusions. Although to be fair, the authors only follow a long trend of studies with the same intrinsic flaw of this approach.

      (5) Some of the analyses on immunostaining do not take into account that this technique only allows for semi-quantitative measurements and comparisons.

      a) Some of the microscopy images are shown with an incorrect look-up table.

      b) Some of the schematics are incorrect and misleading.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Gao et al. use RNA-seq to identify Hspa2 as one of the earliest transcripts heterogeneously distributed between blastomeres. Functional studies are performed using siRNA knockdown showing Hspa2 may bias cells toward the ICM lineage via interaction with the known methyltransferase CARM1.

      Strengths:

      This study tackles an important question regarding the origins of the first cell fate decision in the preimplantation embryo. It provides novelty in its identification of Hspa2 as a heterogeneous transcript in the early embryo and proposes a plausible mechanism showing interactions with Carm1. Multiple approaches are used to validate their functional studies (FISH, WB, development rates, proteomics). Given only 4 other transcripts/RNA have been identified at or before the 4-cell stage (LincGET, CARM1, PRDM14, HMGA1), this would be an important addition to our understanding of how TE vs ICM fate is established.

      Weaknesses:

      The RNA-seq results leading the authors to focus on Hspa2 are not included in the manuscript. This dataset would serve as an important resource but is neither included nor discussed. Nor is it mentioned whether Hspa2 was identified in prior RNA-seq embryo studies (for example Deng Science 2014).

      Furthermore, the authors show that Hspa2 knockdown at the 1-cell stage lowers total Carm1 levels at the 4-cell stage. However, it is unclear how total abundance within the embryo alters lineage specification within blastomeres. The authors go on to propose a plausible mechanism involving Hspa2 and Carm1 interaction, but do not discuss how expression levels may be involved.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of HSPA2 during mouse preimplantation development. Knocking down HSPA2 in zygotes, the authors describe lower chances of developing into blastocysts, which show a reduced number of inner cell mass cells. They find that HSPA2 mRNA and protein levels show some heterogeneity among blastomeres at the 4-cell stage and propose that HSPA2 could contribute to skewing their relative contribution to embryonic lineages. To test this, the authors try to reduce HSPA2 expression in one of the 2-cell stage blastomeres and propose that it biases their contribution towards extra-embryonic lineages. To explain this, the authors propose that HSPA2 would interact with CARM1, which controls chromatin accessibility around genes regulating differentiation into embryonic lineage.

      Strengths:

      (1) The study offers simple and straightforward experiments with large sample sizes.

      Thanks for your kind recognition.

      (2) Unlike most studies in the field, this research often relies on both mRNA and protein levels to analyse gene expression and differentiation.

      Thanks for your kind recognition.

      Weaknesses:

      (1) Image and statistical analyses are not well described.

      Thanks for your advisable comment. We have redescribed the image and statistical analyses in our revised version (lines 255-257).

      (2) The functionality of the overexpression construct is not validated.

      Thanks for your kind suggestion. We validate the functionality of the overexpression construct in our revised version (Figure S3).

      (3) Tracking of KD cells in embryos injected at the 2-cell stage with GFP is unclear.

      Thanks for your kind suggestion. We randomly co-injected green fluorescent protein (Gfp) mRNA as a lineage tracer with either Hspa2-siRNA or NC-FAM into one blastomere of the 2-cell embryo, and then monitored embryo development to the blastocyst stage (line 342-344).

      (4) A key rationale of the study relies on measuring small differences in the levels of mRNA and proteins using semi-quantitative methods to compare blastomeres. As such, it is not possible to know whether those subtle differences are biologically meaningful. For example, the lowest HSPA2 level of the embryo with the highest level is much higher than the top cell from the embryo with the lowest level. What does this level mean then? Does this mean that some blastomeres grafted from strong embryos would systematically outcompete all other blastomeres from weaker embryos? That would be very surprising. I think the authors should be more careful and consider the lack of quantitative power of their approach before reaching firm conclusions. Although to be fair, the authors only follow a long trend of studies with the same intrinsic flaw of this approach.

      Thanks for your advisable comment. Indeed, although the approach drew on previous research (Zhou Cell 2018), we were clearly aware that this approach can only reflect relative comparisons. This means that the relative differences among the blastomeres from the same embryo were detected and compared. We did not compare the absolute levels of mRNA between different embryos. We also offered simple and straightforward experiments with large sample sizes to confirm this conclusion.

      (5) Some of the analyses on immunostaining do not take into account that this technique only allows for semi-quantitative measurements and comparisons.

      a) Some of the microscopy images are shown with an incorrect look-up table.

      b) Some of the schematics are incorrect and misleading.

      Thanks for your advisable comment. We revised microscopy images and schematics in our revised version.

      Reviewer #2 (Public review):

      Summary:

      In this study, Gao et al. use RNA-seq to identify Hspa2 as one of the earliest transcripts heterogeneously distributed between blastomeres. Functional studies are performed using siRNA knockdown showing Hspa2 may bias cells toward the ICM lineage via interaction with the known methyltransferase CARM1.

      Strengths:

      This study tackles an important question regarding the origins of the first cell fate decision in the preimplantation embryo. It provides novelty in its identification of Hspa2 as a heterogeneous transcript in the early embryo and proposes a plausible mechanism showing interactions with Carm1. Multiple approaches are used to validate their functional studies (FISH, WB, development rates, proteomics). Given only 4 other transcripts/RNA have been identified at or before the 4-cell stage (LincGET, CARM1, PRDM14, HMGA1), this would be an important addition to our understanding of how TE vs ICM fate is established.

      Thanks for your kind recognition.

      The RNA-seq results leading the authors to focus on Hspa2 are not included in the manuscript. This dataset would serve as an important resource but is neither included nor discussed. Nor is it mentioned whether Hspa2 was identified in prior RNA-seq embryo studies (for example Deng Science 2014).

      Thanks for your advisable comment. To identify genes that show significantly high variability across blastomeres in the same embryo, we regressed out the embryo effect by establishing a new method, which will be published and uploaded to the database in the future. Thus, the RNA-seq results leading us to focus on Hspa2 are not included in the manuscript.

      In addition, the functional studies are centered on Hspa2 knockdown at the zygote (1-cell) stage, which would largely target maternal transcript. Given the proposed mechanism relies on Hspa2 heterogeneity post-ZGA (late 2-cell stage), the knockdown studies don't necessarily test this and thus don't provide direct support to the authors' conclusions. The relevance of the study would be improved if the authors could show that zygotic knockdown leads to symmetric Hspa2 levels at the late 2-cell and/or 4-cell stage. It may be possible that zygotic knockdown leads to lower global Hspa2 levels, but that asymmetry is still generated at the 4-cell stage.

      Thanks for your advisable comment. We show the Hspa2 levels at the late 2-cell and 4-cell stages after zygotic knockdown in our revised version (Figure S1 G-H, line 450-452).

      Furthermore, the authors show that Hspa2 knockdown at the 1-cell stage lowers total Carm1 levels at the 4-cell stage. However, it is unclear how total abundance within the embryo alters lineage specification within blastomeres. The authors go on to propose a plausible mechanism involving Hspa2 and Carm1 interaction, but do not discuss how expression levels may be involved.

      Thanks for your advisable comment. Previous research suggests that heterogeneous activity of the methyltransferase CARM1 results in differential methylation of histone H3R26 to modulate establishment of lineage specification (Zernicka-Goetz Cell 2018). Thus, we did not discuss how total abundance within the embryo alters lineage specification.

      Recommendations for the authors:  

      Reviewer #1 (Recommendations for the authors):

      (1) Major issue with analyses:

      Image analysis needs to be much better explained than simply saying that ImageJ was used. Where are cells measured (at their equatorial plane? What is the size of the ROI?)? Ideally, the ROI and/or raw measurements should be provided.

      Thanks for your advisable comment. We redescribe the Image analysis in our revised version (line 187-194). 

      What are the objective criteria determining whether a cell is counted as GFP positive, CDX2 positive, or OCT4 positive? This is very unclear and key to the interpretation of many experiments.

      Thanks for your advisable comment. Cells containing fluorescence signals above background noise were counted as positive.
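
      One way to make "above background noise" an objective criterion is to fix a numerical threshold in advance. The following is only an illustrative sketch, not the authors' procedure: the function name, the background values, and the choice of mean plus three standard deviations (`k=3.0`) are all assumptions.

```python
from statistics import mean, stdev

def is_positive(cell_intensity, background_samples, k=3.0):
    """Return True if a cell's intensity exceeds background mean + k * SD.

    The rule and the default k are assumptions for illustration only.
    """
    threshold = mean(background_samples) + k * stdev(background_samples)
    return cell_intensity > threshold

# Hypothetical intensities measured in cell-free background regions.
background = [10.0, 12.0, 11.0, 9.0, 13.0]

print(is_positive(30.0, background))  # well above the threshold
print(is_positive(12.0, background))  # within background fluctuations
```

      Reporting the rule and the value of `k` alongside the counts would let readers reproduce which cells were scored as GFP-, CDX2- or OCT4-positive.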

      Statistical analyses mention ANOVA in the methods but the Student's t-test in the figure legend. Which is which? Most data are heavily normalized, which would unlikely fit the description for Student's t-test analyses.

      Thanks for your advisable comment. We redescribe the statistical analyses in our materials and methods (line 253-260).

      Figure 5H describes a relative fluorescence intensity with control at 1. The legend describes a normalization to "DNA" (I guess the authors meant DAPI), which is unlikely to give 1. This suggests that additional normalization was done and is not described. Is that the case? Also, since the authors propose that HSPA2 would control Histone modification and chromatin packing, I do not think that using DAPI is an appropriate way of normalizing the fluorescence signal.

      Thanks for your advisable comment. We replaced DNA with DAPI in our revised version. Based on previous studies, we adopted DAPI as a normalized fluorescence signal (Zhou Cell 2018, Zernicka-Goetz Cell 2018).
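
      The reviewer's point about a second, undescribed normalization can be illustrated with a small sketch. It assumes a two-step scheme (signal divided by DAPI, then rescaled by the control group's mean ratio); whether this is what was actually done is not stated in the response, and all names and numbers below are hypothetical.

```python
from statistics import mean

def relative_intensity(signal, dapi, control_ratios):
    """Signal/DAPI ratio rescaled so that the control group's mean ratio is 1."""
    return (signal / dapi) / mean(control_ratios)

# Hypothetical (signal, DAPI) pairs for control embryos.
control = [(200.0, 400.0), (220.0, 400.0), (180.0, 400.0)]
control_ratios = [s / d for s, d in control]  # 0.5, 0.55, 0.45: not 1

# Only after the second step does the control mean become 1.
rescaled_control = [relative_intensity(s, d, control_ratios) for s, d in control]
print(mean(control_ratios), mean(rescaled_control))
```

      The raw signal/DAPI ratios average 0.5 here, so a control value of exactly 1 implies a further division by the control mean, which is the step the reviewer asks to be described.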

      Figure 1E shows data normalized to the lowest level while Figure 1H is normalized to the highest level. A consistent representation would be welcome.

      Thanks for your advisable comment. We revised the Figure 1H in our revised version.

      Is Figure 1C showing a t-test between correlations?

      Yes, Figure 1C shows the t-test between correlations.

      (2) Major issue with the interpretation of semi-quantitative methods and measurements:

      qPCR, WB, immunostaining are all semi-quantitative methods that require some kind of normalization due to non-linear bias in the way the molecules are picked up. Such normalization makes it difficult to know whether a detectable difference is meaningful biologically speaking i.e. if a difference of 1 CT between blastomeres can be detected after qPCR, is it meaningful? If that were the case, then embryos with lower CT than others (Figure 1D) would not be able to develop into blastocyst, like siRNA injected embryos, or grafting a blastomere with a high CT onto an embryo with low CT would lead to the systematic differentiation of these strong blastomeres into ICM.

      Thanks for your advisable comment. The CT values represent the relative mRNA levels of Hspa2 between blastomeres, and a higher CT value represents lower expression of Hspa2 at the mRNA level. Figure 1D shows the Hspa2 mRNA levels between blastomeres. The blastomere with low-level expression of Hspa2 mRNA is not biased toward an ICM fate.
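
      To make the scale of "a difference of 1 CT" concrete, the sketch below converts a CT difference into a fold change. It assumes ideal amplification (product doubles every cycle, `efficiency=2.0`) and compares raw CT values without reference-gene normalization; both are simplifying assumptions and not a description of the authors' pipeline.

```python
def fold_change(ct_a, ct_b, efficiency=2.0):
    """Expression of sample A relative to sample B from raw CT values.

    A higher CT means the threshold was reached later, i.e. lower expression.
    """
    return efficiency ** (ct_b - ct_a)

print(fold_change(24.0, 25.0))  # A reaches threshold one cycle earlier
print(fold_change(25.0, 24.0))  # A reaches threshold one cycle later
```

      Under these assumptions a one-cycle difference corresponds to a two-fold difference in template, which is the magnitude whose biological relevance the reviewer questions.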

      The same goes for fluorescence analyses (Figure 1F). Can the authors also provide the measurements for DAPI as they did for HSPA2? I am sure that with enough measurements, DAPI is variable enough to give a statistical difference among blastomeres with questionable biological meaning.

      I think the reasoning used here (unfortunately following the reasoning that has been used in a series of studies by other groups) of ranking blastomeres after semi-quantitative measurement is fundamentally flawed.

      Thanks for your advisable comment. The DAPI signal was determined at the maximal area using a custom Python script. Based on previous studies, we adopted DAPI as a normalized fluorescence signal (Zhou Cell 2018). This approach normalizes embryo-to-embryo variance arising from technical causes.

      (3) Major issue with overexpression experiment:

      While the siRNA experiment is partially validated by qPCR and WB measurements of HSPA2 after KD, the overexpression experiment is not. Do the authors have any evidence that the construct they use is produced into protein and functional? Can the authors check by WB? Can the authors rescue the siRNA with their overexpression?

      Thanks for your advisable comment. We verified the overexpression experiment by WB in our revised version (Figure S3, line 360-361). Considering that siRNA degrades mRNA and prevents the mRNA translation process, we did not co-inject the siRNA with the overexpression construct.

      The lack of effect of HSPA2 overexpression on blastocyst formation is difficult to reconcile with the interpretation from the authors that levels of HSPA2 bias lineages.

      Have the authors tried lower concentrations? Have the authors tried FISH on their half-injected 2-cell embryos? Of course, if the antibody against HSPA2 would work with immunostaining, that would be ideal.

      Thanks for your advisable comment. We chose the concentrations for our study based on previous research (Zernicka-Goetz Cell 2016). To verify that injection into one blastomere at the 2-cell stage was successful, we observed green fluorescence after co-injecting GFP mRNA with either siRNA or NC-FAM into one blastomere of the two-cell embryos. Thus, we did not try FISH on half-injected 2-cell embryos. We tried to perform immunostaining experiments with various HSPA2 antibodies (Proteintech: 12797-1-AP, Abcam: ab108416), but no good results were achieved.

      Author response image 1.

      (4) Major issue with tracking of injected cells:

      It is unclear what counts as a GFP-positive cell. In Figure 3D, most cells appear to have the same level of GFP.

      Thanks for your advisable comment. Cells containing green fluorescence signals above background noise were counted as GFP-positive in Figure 3D. Most cells seem to have the same level of GFP because they are daughter cells of the blastomeres injected with GFP.

      In the images of GFP-expressing cells used to track the control of KD cells shown in Figure 3A, it seems that the control embryos have mostly GFP cells in the ICM. Is that the case, or just a bad example?

      Thanks for your advisable comment. The green fluorescent signals in Figure 3A represented OCT4 protein, an ICM marker.

      Can the authors do FISH against HSPA2 and visualize their GFP cells to validate the heterogeneous expression in situ?

      Thanks for your advisable comment. We have verified the heterogeneous expression of HSPA2 in Figure 1.

      (5) Issue with fluorescent images:

      Many images are shown with inappropriate look-up tables with saturated DAPI, OCT4, CDX2, and FISH. This raises the doubt that analyses were made on saturated images, which would be incorrect.

      The LUT of Figure 5H should be adjusted similarly between the control and siRNA.

      Thanks for your advisable comment. We revised some images which showed inappropriate lookup tables in our revised version. The LUT of Figure 5H had been adjusted between the control and siRNA. 

      (6) Issue with schematics:

      Schematics of blastomere isolation grown into blastocyst-like structures are misleading since the final blastocyst-like structure should not have a zona pellucida and should have fewer cells than regular blastocysts.

      Thanks for your advisable comment. We revised schematics of blastomere grown into morula in our revised version (Figure 1A and Figure S1A).

      The summary schematics in the final figure should not state HSPA2 -/- since experiments in the study did not use KO but KD.

      Thanks for your advisable comment. We revised the summary schematics in our revised version.

      The blastocysts are the same sizes as the cleavage stage or morula embryos which implies that cells lose volume to the lumen, which is not the case.

      Thanks for your advisable comment. We revised the schematics in our revised version.

      (7) Issue with data presentation:

      In the tables within the figures, the number of decimals given should be the same for the mean and SE (one decimal should be more than enough).

      Thanks for your advisable comment. We revised the figure 2H in our revised version.

      The comparison of cell number and distribution within embryos (e.g. Figure 2B) would be best represented by a correlation analysis of TE vs ICM cells.

      Thanks for your advisable comment. We added a figure showing a correlation analysis of TE vs ICM cells in our revised version (Figure 3B).

      The docking simulations are described in the main text as "experiments".

      Thanks for your advisable comment. We redescribed the docking simulations in our revised version.

      (8) Issue with data interpretation:

      The reduced number of ICM cells is interpreted as a slowed-down cell cycle. This could also be explained by failed cytokinesis and the generation of binucleated or polyploid cells. Have the authors checked for that? For example, by looking at their DAPI staining. 

      Thanks for your advisable comment. Our RNA-seq results revealed that the differentially expressed genes (DEGs) at the blastocyst stage after HSPA2 knockdown are closely related to negative regulation of cell cycle, G1/S transition of mitotic cell cycle, mitotic cell cycle phase transition and regulation of mitotic cell cycle phase transition. Additionally, a previous study demonstrated that knockdown of HSPA2 reduced cell proliferation and led to G1/S phase cell cycle arrest (Hu Ann Transl Med 2019). The lower cell number in the ICM may also be associated with failed cytokinesis and the generation of binucleated or polyploid cells. Thus, we speculate that HSPA2 has a role in ICM lineage establishment, although half of the ICM cells were able to survive with HSPA2 deficiency (line 463-472).

      It is unclear to me why reduced ICM should lead to fewer blastocysts. Blastocysts should be able to form as long as their TE is fine. In Figure 2G, embryos seem to be cultured in close proximity, which is fine if they are healthy but not if some of the embryos start dying and releasing toxic compounds (e.g. ROS). Have the authors tried removing the dying KD embryos to see if the development of the remaining embryos would improve?

      Thanks for your advisable comment. We think HSPA2 may affect blastocyst development by affecting other signaling pathways, and the enriched GO terms were closely related to blastocyst development (Figure 2E). There was no significant difference in morula formation rate between the Hspa2-KD group and the NC group, so the assumption that toxic compounds released by some embryos lower the blastocyst rate may not be correct. Indeed, the rate of blastocyst formation in Hspa2-KD embryos was significantly reduced when a few embryos were cultured separately. In addition, we discussed the possibility that the lower cell number in the ICM may also be associated with failed cytokinesis and the generation of binucleated or polyploid cells.

      Author response image 2.

      Reviewer #2 (Recommendations for the authors):

      One of the significant findings in the paper is the discovery portion where Hspa2 is identified as a heterogeneous transcript. To improve the logic and impact of the manuscript, it may benefit from reorganizing some of the figures and text. For example:

      (1) The paragraph in the introduction (Lines 56-68) should be moved to the discussion as the Hspa2 reveal should be in section 3.1, not prior to the RNA-seq results presented in Figure 1.

      Thanks for your advisable comment. We think it is more logical that HSPA2 needs to be introduced in the introduction.

      (2) Add text at the beginning of Section 3.1 to describe the rationale and results for the RNA-seq. It would help the readers if the authors clearly stated why they chose the 4-cell stage.

      Thanks for your advisable comment. We explain why we chose the 4-cell stage in our revised version (line 272-273).

      (3) As this is the first time Hspa2 is identified, consider moving Figure S1C to the main figure to show expression throughout development.

      Thanks for your advisable comment. We moved Figure S1C to the main figure in our revised version (line 286-291).

      (4) Figure 1C: the correlation between Hspa2 and ICM markers would be strengthened if additional transcripts were used (Oct4, Sox2, Sox21). The graph in 1C would also be more informative if represented as a scatter plot with correlation coefficients (Nanog log2TPM vs Hspa2 log2TPM), rather than bar graphs.

      Thanks for your advisable comment. We chose Nanog because, among ICM markers, its correlation with Hspa2 was the strongest in our results. Figure 1C shows that the positive correlation between Nanog and Hspa2 expression is stronger than that of random gene pairs (n=100, where n is the number of random gene pairs). We therefore think the bar graph in Figure 1C is easier to understand.
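
      The comparison described here (one gene pair against a background of random pairs) can be sketched as follows. This is a hedged illustration only: the helper names and all numbers are hypothetical, and the authors' actual test (a t-test between correlations, per their reply above) may differ.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def fraction_at_least(target_r, background_rs):
    """Fraction of background correlations that are at least target_r."""
    return sum(r >= target_r for r in background_rs) / len(background_rs)

# Hypothetical log2(TPM) values across blastomeres for two genes.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]
r = pearson(gene_a, gene_b)

# Hypothetical correlations for random gene pairs.
random_pair_rs = [0.1, 0.9, -0.2, 0.3]
print(r, fraction_at_least(0.8, random_pair_rs))
```

      A scatter plot of the two genes annotated with `r`, as the reviewer suggests, would show the same coefficient together with the underlying per-blastomere values.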

      (5) Figure 1D: how were individual blastomeres grouped into B1-4? Individually run and then pooled based on relative expression?

      Thanks for your advisable comment. Blastomeres are named B1 to B4 according to increasing Hspa2 concentration in figure 1E.

      (6) Figures 1F, 1I, 5H: the DAPI channel appears to be saturated, but is used to normalize fluorescence intensity and may incorrectly account for light scattering within the embryo. Please clarify by adding more details regarding image analysis. Were partial stacks through the nucleus used for analysis, or max projections? Graph axes should be "relative fluorescence intensity."

      Thanks for your advisable comment. We added the details of the fluorescence image analysis. The graph axes have been revised in our revised version.

      (7) Line 278: the results in Figure S1C would benefit from more text regarding expression patterns throughout development. The maternal transcript appears to have a sharp downregulation by the early 2-cell stage, and is then upregulated coinciding with ZGA.

      Thanks for your advisable comment. We added more description of the figure in the main text (lines 285-290).

      (8) For the analyses in Figure 2 I-J and 2K-L, were arrested embryos excluded from analysis? This is an important detail as including arrested embryos would significantly bias the RNA-seq results. 

      Thanks for your advisable comment. The arrested embryos were excluded in Figure 2 I-J and 2K-L.

      (9) Figures 2G-H would be aided by converting the table in 2H to a bar graph and adding development rates for all stages (2-, 4-, 8-, morula, and blast). This would also show when an arrest occurs.

      Thanks for your advisable comment. We converted the table in 2H to a bar graph.

      (10) Blast rates are represented with too many significant digits (Figures 2H, 4B). They should only be reported to the closest ones given the unit of measure (number of blasts divided by number of zygotes). For instance, a blast rate of 81.63 ± 2.000 reflects excessive precision that is not measured in the data, it should rather read 82 ± 2%. This is also true for % cells (Figures 3E, 4H).

      Thanks for your advisable comment. Values were rounded down to one decimal place.
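
      Reporting the mean and SE with the same number of decimals can be done with a single formatting helper. The sketch below is illustrative: the function name is hypothetical, and Python's format specifier rounds to the nearest value rather than rounding down.

```python
def format_rate(mean_value, se, decimals=0):
    """Format a percentage as 'mean ± SE%' with matching decimal places."""
    return f"{mean_value:.{decimals}f} ± {se:.{decimals}f}%"

print(format_rate(81.63, 2.000))              # reviewer's suggested precision
print(format_rate(81.63, 2.000, decimals=1))  # one decimal place
```

      Using one `decimals` argument for both numbers guarantees that the mean and SE never appear with different precision in the same table.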

      (11) The clarity and impact of Figure 3A and 3D would benefit from 2D slices through the ICM. 

      Thanks for your advisable comment. In order to give a more comprehensive understanding of the 3D structure of the blastocysts in Figure 3A and 3D, we did not choose 2D slices.

      (12) To improve clarity and logic, separate the 1-cell and 2-cell knockdown experiments in the text and figures:

      a) 1-cell knockdown with RNA-seq results (Fig 2A-F).

      b) 1-cell knockdown showing less ICM/pluripotency markers in (combine Figures 2G-M and Figures 3A-B; "new Fig 3").

      c) 2-cell knockdown tracing lineage (Figures 2D-E; "new Fig 4").

      The new Figures 3 and 4 should mirror one another (i.e. for each knockdown experiment, development rates and cell counts should be included). For the 2-cell knockdown (Figures 2 D-E), what were the developmental rates (8-cell, morula, blast)?

      Thanks for your advisable comment. However, to preserve the overall logic of the article, we did not separate the 1-cell and 2-cell knockdown experiments in the text and figures. We added the developmental rates (8-cell, morula, blast) of the 2-cell knockdown group in our revised version (Figure S2).

      For the overexpression experiment (Figure 4), why were injections performed at the zygote stage versus the 2-cell stage? Given the significant downregulation of maternal transcript demonstrated in Figure S1C, it seems plausible that the injected RNA was also downregulated.

      Thanks for your advisable comment. For the overexpression experiment, we first chose to inject Hspa2 mRNA at the zygote stage and found that the overexpression of Hspa2 does not induce blastomeres to bias toward an ICM fate. The qRT-PCR results indicated that the expression level of Hspa2 in the overexpression group was significantly increased compared with the normal group at the 4-cell and blastocyst stages (Figure 4C, 4D). In addition, there is no guarantee that an equal amount of Hspa2 mRNA would be injected into each blastomere at the 2-cell stage. Thus, we did not microinject Hspa2 mRNA at the 2-cell stage.

      The 3.5 subheading overstates the results as the Hspa2-Carm1 interaction is not linked to lineage segregation. For example, a more specific subtitle might be, "Hspa2 interacts with Carm1 and alters H3R26me2 levels."

      Thanks for your advisable comment. We revised the subtitle in our revised version (line 376).

      Figures 5B-C and 5D-E. The qRT-PCR and WB analysis of knockdown blasts shows a correlation between Hspa2 downregulation and Carm1 downregulation. However, if the proposed mechanism is Hspa2 binding to Carm1 to mediate downstream methylation, why would it be expected to alter transcript levels at the 4-cell or blast stage? Please add further details and discussion in the results and discussion sections.

      Thanks for your advisable comment. We chose to work at the 4-cell stage because previous studies on CARM1 have focused on the 4-cell stage (Zernicka-Goetz Cell 2018, 2016).

      In the discussion, the statement in Lines 430-431 is an overinterpretation: "the heterogeneity of HSPA2... acts as an upstream factor to drive [the] first cell-fate decision." The knockdown experiments don't alter heterogeneity per se, but total abundance. Furthermore, the results do not show that heterogeneity drives heterogeneity in H3R26me2 patterns, for example.

      Thanks for your advisable comment. We redescribe the relevant statement in the discussion.

      More needs to be said regarding the ICM cells that persisted in the 1-cell KD experiment (Fig 3B). Lines 449-450 point out this result, but do not propose any plausible explanations. For instance, ICM cells may still form due to the incomplete knockdown achieved or the possibility that redundant pathways exist.

      Thanks for your advisable comment. We redescribe the relevant statement in our revised version (line 468-473).

      The 5th paragraph of the discussion seems incomplete. The authors point out a possible link between Hspa2 and Hippo and Wnt signaling pathways, but need to expand their discussion on how this may act as an additional mechanism incorporating Hspa2 with lineage segregation.

      Thanks for your advisable comment. We redescribe the 5th paragraph of the discussion (line 483-494).

      Statistics: all comparisons with greater than 2 groups should be performed with a one-way ANOVA and multiple comparisons, rather than Student's t-test (Figures 1B, 1D, 1E, 1F).

      All figure legends lack statistical test details.

      Thanks for your advisable comment. Statistical test details have been added to all figure legends.
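
      For comparisons across more than two groups (for example blastomeres B1 to B4), the one-way ANOVA F statistic recommended by the reviewer can be computed as in the sketch below. The data are hypothetical, and the function returns only F; obtaining a p-value and running multiple-comparison tests would normally be left to a statistics package such as SciPy (`scipy.stats.f_oneway`).

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical relative expression values for three groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat = one_way_anova_f(groups)
print(f_stat)
```

      Unlike repeated pairwise t-tests, a single F test evaluates all groups at once, which avoids inflating the false-positive rate before any post hoc comparison is made.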

      Minor comments:

      In all graphs, individual blastomere expression levels should be represented as box-whisker/bar/scatter/violin plots since the comparison is groups rather than time points (i.e. symbols should not be connected with a line in Figures 1B, 1D, 1F-G, 1I, S1D, S1F).

      Thanks for your advisable comment. Each colored line represents a single cell, and the dots of the same color represent the blastomere of the same cell. Thus, we use a line to represent individual blastomeres.

      For all fluorescent images, having two representative images may be confusing for the reader. Figures may be improved by just including one representative image for each stage/treatment (Figures 1F, 1I, S1F, 3A, 3D, 4E, 4G).

      Thanks for your advisable comment. The figures now include just one representative image for each stage in our revised version. In addition, two representative images from each group are shown for each treatment (Figures 3A, 3D, 4E, 4G).

      The manuscript would be improved with thorough grammar and typo editing.

      For example:

      (1) Lines 18, 73, the wording is confusing, consider: "knockdown of Hspa2 in one of the two-cell blastomeres biased its progeny towards the trophectoderm lineage.".

      (2) Line 23, overstatement. Consider: "we demonstrated that HSPA2 levels correlate with ICM-associated genes and that it interacts with the CARM1.".

      (3) Line 25 confusing wording, "via the execution of commitment and differentiation phases.".

      (4) Line 37, replace "that" with "of;" replace "cell-fate decisions" with "cell-fate decision".

      (5) Line 40: needs space before (CARM1).

      (6) Line 43: the wording is confusing, consider "can result in higher expression levels of".

      (7) Line 45: wording, consider "Recent [studies have] further suggested".

      (8) Line 70: plurality, consider "analyzed gene expression pattern".

      (9) Line 73 typo: "prevents its".

      (10) Line 76-77 wording, consider "Hspa2 expression patterns can bias cell fate in the mouse embryo".

      (11) Line 276: remove "in whole embryos," since MII eggs are not embryos.

      (12) Line 617 "There" should be "Three".

      (13) Axis label in Fig 3b "Totle" should be "Total".

      (14) Lines 417, 419 missing spaces.

      (15) Line 448 missing word, "interfering [with] the cell cycle".

      (16) Line 462 incorrect word, "[a]polar cells being specified as ICM".

      (17) Line 469 incorrect plural, "cell differentiation".

      Thanks for your advisable comment. We revised the whole manuscript carefully according to the reviewers' suggestions.